Vincent Sitzmann: Implicit Neural Scene Representations

  • Published 30 Jun 2024
  • Talk @ Tübingen seminar series of the Autonomous Vision Group
    uni-tuebingen.de/en/faculties...
    Implicit Neural Scene Representations
    Vincent Sitzmann (Stanford)
    Abstract: How we represent signals has major implications for the algorithms we build to analyze them. Today, most signals are represented discretely: images as grids of pixels, shapes as point clouds, audio as grids of amplitudes, etc. If images weren't pixel grids, would we be using convolutional neural networks today? What makes a good or bad representation? Can we do better? I will talk about leveraging emerging implicit neural representations for complex, large signals such as room-scale geometry, images, audio, video, and physical signals defined via partial differential equations (see the code sketch after this description for the core idea). By embedding an implicit scene representation in a neural rendering framework and learning a prior over these representations, I will show how we can enable 3D reconstruction from only a single posed 2D image. Finally, I will show how gradient-based meta-learning can enable fast inference of implicit representations, and how the features we learn in the process are already useful for the downstream task of semantic segmentation.
    Bio: Vincent Sitzmann just finished his PhD at Stanford University with a thesis on "Self-Supervised Scene Representation Learning". His research interest lies in neural scene representations: the way neural networks learn to represent information about our world. His goal is to allow independent agents to reason about our world given visual observations, such as inferring a complete model of a scene, with information on geometry, materials, lighting, etc., from only a few observations, a task that is simple for humans but currently impossible for AI. In July, Vincent will join Joshua Tenenbaum's group at MIT CSAIL as a postdoc. vsitzmann.github.io/
  • Science & Technology
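
A minimal sketch of the abstract's core idea, in PyTorch: the signal (here a toy image) is stored as the weights of a network that maps continuous coordinates to signal values. The architecture, layer sizes, and training loop below are illustrative assumptions, not the talk's exact setup.

```python
import torch
import torch.nn as nn

# An implicit representation: an MLP that maps (x, y) coordinates
# in [-1, 1]^2 to pixel intensities, so the "image" lives in the weights.
class ImplicitImage(nn.Module):
    def __init__(self, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, coords: torch.Tensor) -> torch.Tensor:
        return self.net(coords)

# Fit the network to a single image by regressing intensities at
# sampled coordinates. A random tensor stands in for real image data.
h = w = 64
ys, xs = torch.meshgrid(
    torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij")
coords = torch.stack([xs, ys], dim=-1).reshape(-1, 2)
target = torch.rand(h * w, 1)

model = ImplicitImage()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    loss = ((model(coords) - target) ** 2).mean()
    loss.backward()
    opt.step()
```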

COMMENTS • 6

  • @animeshkarnewar3 4 years ago +6

    Wow, these SIRENs (sinusoidal representation networks) are basically neural Fourier transforms :D! Great work. Thanks a lot for sharing the video!
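
The SIREN the commenter refers to uses sine activations with a frequency factor and a specific initialization; below is a sketch of one such layer following the SIREN paper's scheme (w0 = 30, uniform weight init). The layer sizes are illustrative.

```python
import math
import torch
import torch.nn as nn

class SineLayer(nn.Module):
    """One SIREN layer: sin(w0 * (Wx + b)) with the paper's init."""
    def __init__(self, in_dim: int, out_dim: int,
                 w0: float = 30.0, is_first: bool = False):
        super().__init__()
        self.w0 = w0
        self.linear = nn.Linear(in_dim, out_dim)
        with torch.no_grad():
            # First layer: U(-1/n, 1/n); later layers: U(-sqrt(6/n)/w0, +sqrt(6/n)/w0)
            bound = 1.0 / in_dim if is_first else math.sqrt(6.0 / in_dim) / w0
            self.linear.weight.uniform_(-bound, bound)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.sin(self.w0 * self.linear(x))

# A full SIREN is a stack of sine layers with a linear output head,
# i.e. a learned composition of sinusoids, hence the Fourier intuition.
siren = nn.Sequential(
    SineLayer(2, 256, is_first=True),
    SineLayer(256, 256),
    nn.Linear(256, 1),
)
```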

  • @Kirbykradle 3 years ago +5

    I've got to know what happens to those audio signals or images if you try to extrapolate beyond the bounds of the initial signal.

    • @FallenPatta 2 years ago

      I suspect it becomes garbage. Out-of-distribution generalization isn't really something that emerges by chance. But the sinusoidal network might produce some periodic signals beyond the training bounds.
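
One way to probe this question empirically (a toy sketch; the target signal, architecture, and hyperparameters are illustrative assumptions): fit a small sine-activated network on [-1, 1], then query it beyond that range. Consistent with the reply above, the output stays bounded and oscillatory outside the training domain, but nothing constrains it to match the true signal.

```python
import torch
import torch.nn as nn

class Sine(nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.sin(30.0 * x)

# Fit a tiny sine network to a 1D periodic signal on [-1, 1].
net = nn.Sequential(nn.Linear(1, 64), Sine(), nn.Linear(64, 1))
x = torch.linspace(-1, 1, 256).unsqueeze(-1)
y = torch.sin(4 * torch.pi * x)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(500):
    opt.zero_grad()
    ((net(x) - y) ** 2).mean().backward()
    opt.step()

# Query beyond the training bounds: the output stays bounded and
# oscillatory (sines through a linear head), but is not the true signal.
x_out = torch.linspace(1, 3, 256).unsqueeze(-1)
with torch.no_grad():
    print(net(x_out).abs().max())
```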

  • @YYYValentine 3 years ago +1

    Can this run in real time?

    • @KolossosDD 2 years ago

      Yes: ua-cam.com/video/j8tMk-GE8hY/v-deo.html

  • @hanyanglee9018 2 years ago +1

    10:46, emmm, I can do this with my mouth too.