Vincent Sitzmann: Implicit Neural Scene Representations
- Published 30 Jun 2024
- Talk @ Tübingen seminar series of the Autonomous Vision Group
uni-tuebingen.de/en/faculties...
Vincent Sitzmann (Stanford)
Abstract: How we represent signals has major implications for the algorithms we build to analyze them. Today, most signals are represented discretely: Images as grids of pixels, shapes as point clouds, audio as grids of amplitudes, etc. If images weren't pixel grids - would we be using convolutional neural networks today? What makes a good or bad representation? Can we do better? I will talk about leveraging emerging implicit neural representations for complex & large signals, such as room-scale geometry, images, audio, video, and physical signals defined via partial differential equations. By embedding an implicit scene representation in a neural rendering framework and learning a prior over these representations, I will show how we can enable 3D reconstruction from only a single posed 2D image. Finally, I will show how gradient-based meta-learning can enable fast inference of implicit representations, and how the features we learn in the process are already useful to the downstream task of semantic segmentation.
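The core idea of the abstract can be sketched in a few lines: an implicit neural representation is a small MLP that maps continuous coordinates to signal values (e.g. (x, y) to a pixel intensity). The sine activations and initialization bounds below follow the published SIREN recipe; the layer widths and omega0 = 30 are illustrative choices, not values taken from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

def siren_layer(in_dim, out_dim, omega0=30.0, first=False):
    # SIREN initialization: U(-1/in_dim, 1/in_dim) for the first layer,
    # U(-sqrt(6/in_dim)/omega0, sqrt(6/in_dim)/omega0) for later layers.
    bound = 1.0 / in_dim if first else np.sqrt(6.0 / in_dim) / omega0
    W = rng.uniform(-bound, bound, size=(in_dim, out_dim))
    b = np.zeros(out_dim)
    return W, b, omega0

def siren_forward(layers, coords):
    # Apply sin(omega0 * (h W + b)) at every layer except a linear output layer.
    h = coords
    for i, (W, b, omega0) in enumerate(layers):
        z = h @ W + b
        h = np.sin(omega0 * z) if i < len(layers) - 1 else z
    return h

# Represent a grayscale image as a continuous function: 2D coords -> 1 value.
layers = [siren_layer(2, 64, first=True), siren_layer(64, 64), siren_layer(64, 1)]
xy = np.stack(np.meshgrid(np.linspace(-1, 1, 8), np.linspace(-1, 1, 8)), -1).reshape(-1, 2)
out = siren_forward(layers, xy)
print(out.shape)  # (64, 1): one predicted value per queried coordinate
```

In practice the weights are fit by gradient descent so that the network's output matches the signal at the sampled coordinates; after fitting, the signal can be queried at any continuous coordinate, not just on the original grid.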
Bio: Vincent Sitzmann just finished his PhD at Stanford University with a thesis on "Self-Supervised Scene Representation Learning". His research interest lies in neural scene representations - the way neural networks learn to represent information about our world. His goal is to allow independent agents to reason about our world given visual observations, such as inferring a complete model of a scene with information on geometry, material, lighting, etc. from only a few observations - a task that is simple for humans, but currently impossible for AI. In July, Vincent will join Joshua Tenenbaum's group at MIT CSAIL for a postdoc. vsitzmann.github.io/ - Science & Technology
Wow, these SIRENs (sinusoidal representation networks) are basically neural Fourier transforms :D! Great work :clap: :clap:. Thanks a lot for sharing the video!
I've got to know what happens to those audio signals or images if you try to extrapolate beyond the bounds of the initial signal
I suspect it becomes garbage. Out-of-distribution generalization isn't really something that emerges by chance. But maybe the sinusoidal network will produce some periodic signals.
Can this be realtime?
Yes: ua-cam.com/video/j8tMk-GE8hY/v-deo.html
10:46, emmm, I can do this with my mouth too.