Vision & Graphics Seminar at MIT
Qifeng Chen - Exploring Invertibility in Image Processing and Restoration
Jan 25th, 2022 @ MIT CSAIL
Abstract: Today’s smartphones have enabled numerous stunning visual effects from denoising to beautification, and we can easily share high-quality JPEG images on the internet, but it is still valuable for photographers and researchers to keep the original raw camera data for further post-processing (e.g., retouching) and analysis. However, the huge size of raw data hinders its adoption in practice. Can we almost perfectly restore the raw data from a compressed RGB image and thus avoid storing any raw data at all? This question leads us to design an invertible image signal processing (ISP) pipeline. We then further explore invertibility in other image processing and restoration tasks, including image compression, reversible image conversion (e.g., image-to-video conversion), and embedding novel views in a single JPEG image. Finally, we demonstrate a general framework for restorable image processing operators based on quasi-invertible networks.
Speaker bio: Qifeng Chen is an assistant professor at The Hong Kong University of Science and Technology. He received his Ph.D. in computer science from Stanford University in 2017. His research interests include image processing and synthesis, 3D vision, and autonomous driving. He was named one of the 35 Innovators Under 35 in China by MIT Technology Review and received the Google Faculty Research Award in 2018. He won 2nd place worldwide at the ACM-ICPC World Finals and a gold medal at the IOI. He co-founded the startup Lino in 2017.
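The invertibility theme of the abstract is commonly realized with coupling layers, as in NICE/RealNVP-style invertible networks. The sketch below is only a minimal illustration of an additive coupling layer, not Chen's actual architecture; `net` is an arbitrary placeholder function (invertibility never depends on it).

```python
import numpy as np

def coupling_forward(x, net):
    # Split the signal; transform one half conditioned on the other.
    x1, x2 = np.split(x, 2)
    y1 = x1
    y2 = x2 + net(x1)          # additive coupling: trivially invertible
    return np.concatenate([y1, y2])

def coupling_inverse(y, net):
    y1, y2 = np.split(y, 2)
    x1 = y1
    x2 = y2 - net(y1)          # exact inverse: subtract the same residual
    return np.concatenate([x1, x2])

# Toy "network": any function of the first half works here.
net = lambda h: np.tanh(3.0 * h)

x = np.random.randn(8)
y = coupling_forward(x, net)
x_rec = coupling_inverse(y, net)
assert np.allclose(x, x_rec)   # reconstruction is exact up to float error
```

Stacking many such layers (with the roles of the halves swapped) gives an expressive yet exactly invertible map, which is the basic mechanism behind invertible image processing pipelines.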
Views: 1,048

Videos

Gerard Pons-Moll - Robust Neural Field Models, and Humans Interacting in the 3D World
1.1K views · 2 years ago
Feb 12th 2022 at MIT CSAIL Title: Robust Neural Field Models, and Humans Interacting in the 3D World Abstract: The field of 3D shape representation learning and reconstruction has been revolutionised by combinations of neural networks with implicit surfaces and field representations. However, most models are not robust to rotations and translations of the input shape, lack detail, need to be tr...
Moritz Böhle - B-cos networks: Alignment is All We Need for Interpretability
1.8K views · 2 years ago
Abstract: Deep Neural Networks (DNNs) constitute powerful models and have successfully been deployed in a wide range of tasks. However, it has proven difficult to extract explanations for DNN decisions that are both human-interpretable and faithful to the underlying model. For example, while a single linear transform can accurately summarise a piece-wise linear model, these transforms are noisy...
Chen-Hsuan Lin - Learning 3D Registration and Reconstruction from the Visual World
2.1K views · 3 years ago
Sep 21st 2021 at MIT CSAIL Abstract: Humans learn to develop strong senses for 3D geometry by looking around in the visual world. Through pure visual perception, not only can we recover a mental 3D representation of what we are looking at, but meanwhile we can also recognize the location we are looking at the scene from. In this talk, I will discuss the problems of learning geometric alignment ...
Zachary Teed - Optimization Inspired Neural Networks for Multiview 3D
8K views · 3 years ago
Oct 5th, 2021 @ MIT CSAIL Abstract: Multiview 3D has traditionally been approached as an optimization problem. The solution is produced by an algorithm which searches over continuous variables (camera pose, depth, 3D points) to satisfy both geometric constraints and visual observations. In contrast, deep learning offers an alternative strategy where the solution is produced by a general-purpose...
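As a toy illustration of the "optimization inspired" theme above, a fixed number of gradient-descent iterations on a least-squares objective can be unrolled as a differentiable, network-like module. This is a generic sketch, not Teed's method (his work uses learned update operators); `A`, `b`, and the step size are made-up values.

```python
import numpy as np

def unrolled_solver(A, b, steps=200, lr=0.1):
    """Unroll gradient descent on ||Ax - b||^2 as a fixed-depth 'network'.
    Every iteration is differentiable, so the loop could sit inside a
    learned model, with lr (or the update itself) produced by a network."""
    x = np.zeros(A.shape[1])
    for _ in range(steps):
        grad = A.T @ (A @ x - b)   # gradient of the quadratic objective
        x = x - lr * grad          # one 'layer' of the unrolled solver
    return x

A = np.array([[2.0, 0.0], [0.0, 1.0]])
b = np.array([4.0, 3.0])
x = unrolled_solver(A, b)
# converges toward the least-squares solution [2, 3]
```

The appeal of this hybrid view is that the solution still satisfies the geometric objective, while learned components can steer or accelerate the search.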
Joao Carreira - More general perception
1K views · 3 years ago
Tuesday, July 6th. MIT, CSAIL Abstract: With ever growing datasets and computational power there seems to be ever diminishing value for embedding domain knowledge into the design of feature extractors. In this talk I will review some of the research done at DeepMind around video understanding and how that motivated our attempt at creating a scalable perception model that makes no architectural ...
Silvia Sellán - A deep dive into Swept Volumes
829 views · 3 years ago
June 1st 2021. MIT CSAIL Abstract: Given a solid 3D shape and a trajectory of it over time, we compute its swept volume - the union of all points contained within the shape at some moment in time. We consider the representation of the input and output as implicit functions, and lift the problem to 4D spacetime, where we show the problem gains a continuous structure which avoids expensive global...
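The implicit-function view in this abstract can be crudely approximated by dense time sampling: the swept volume's implicit function is (approximately) the minimum over time of the moving shape's signed distance. The sketch below uses that brute-force assumption; the talk's actual method instead lifts the problem to continuous 4D spacetime.

```python
import numpy as np

def circle_sdf(p, center, r=1.0):
    """Signed distance to a circle of radius r (negative = inside)."""
    return np.linalg.norm(p - center) - r

def swept_sdf(p, trajectory, r=1.0):
    """Approximate swept-volume implicit: a point is inside the swept
    volume iff it is inside the shape at *some* sampled time, i.e. the
    minimum signed distance over the trajectory is negative."""
    return min(circle_sdf(p, c, r) for c in trajectory)

# A unit circle translating from (0, 0) to (4, 0), densely sampled in time.
traj = [np.array([t, 0.0]) for t in np.linspace(0.0, 4.0, 100)]

assert swept_sdf(np.array([2.0, 0.0]), traj) < 0   # mid-path: swept over
assert swept_sdf(np.array([2.0, 3.0]), traj) > 0   # far above: never covered
```

The weakness of this sampling approach (missed intersections between time samples, cost growing with sample count) is exactly what motivates the continuous spacetime formulation described above.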
Noah Snavely - On Reflection
819 views · 3 years ago
May 5th 2021. MIT CSAIL Abstract: How can you tell if you are in a mirror world? Let's talk about the slightly askew universe of reflected images. Bio: Noah Snavely is an associate professor of Computer Science at Cornell University and Cornell Tech, and also a researcher at Google Research. Noah's research interests are in computer vision and graphics, in particular 3D understanding and depict...
Hsueh-Ti Derek Liu - Towards Scalable Geometry Processing
764 views · 3 years ago
May 18th, 2021 Abstract: Recent advances in 3D scanning technologies have brought us massive highly detailed geometric data with millions of vertices. However, most existing algorithms are still capped at processing meshes with tens of thousands of vertices. The main reason is the lack of scalable numerical solvers that can operate on curved surfaces. In this talk, I will present two complement...
Thijs Roumen - Portable Laser Cutting
281 views · 3 years ago
Tuesday May 11th. MIT, CSAIL Abstract: A portable format for laser cutting would enable millions of users to benefit from laser-cut models, as opposed to the thousands of tech enthusiasts who engage with laser cutting today. What holds back widely adopted use is the limited ability to modify and fabricate existing models. It may seem like a portable format already exists, as laser-cut models are alre...
Xinshuo Weng - A Paradigm Shift for Perception and Prediction Pipeline in Autonomous Driving
1.8K views · 3 years ago
Tuesday April 20th. MIT, CSAIL Abstract: The perception and prediction pipeline (3D object detection, multi-object tracking, and trajectory forecasting) is a key component of autonomous systems such as self-driving cars. Although significant advances have been achieved in each individual module of this pipeline, little attention has been paid to improving the pipeline itself. In this talk, I will i...
Gedas Bertasius - Video Understanding with Modern Language Models
1.7K views · 3 years ago
March 30, 2021. MIT, CSAIL Abstract: Humans understand the world by processing signals from both vision and language. Similarly, we believe that language can be useful for developing better video understanding systems. In this talk, I will present several video understanding frameworks that incorporate models from the language domain. First, I will introduce TimeSformer, the first convolution-f...
Chris Yu - Repulsive Curves
966 views · 3 years ago
March 23rd, 2021. MIT CSAIL Abstract: Curves play a fundamental role across computer graphics, physical simulation, and mathematical visualization, yet most tools for curve design do nothing to prevent crossings or self-intersections. This talk will discuss efficient algorithms for (self-)repulsion of plane and space curves that are well-suited to problems in computational design. Our starting ...
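A toy version of curve repulsion: penalize closeness between non-adjacent vertices of a polyline, so that minimizing the energy pushes near-intersections apart. The real method uses a tangent-point energy with a careful discretization and acceleration scheme; this inverse-distance sum is only a hedged illustration of the idea.

```python
import numpy as np

def repulsive_energy(pts):
    """Toy repulsion energy for a polyline: sum of inverse distances
    between non-adjacent vertices. Blows up as the curve approaches
    self-intersection, which is the property a repulsive energy needs."""
    E = 0.0
    n = len(pts)
    for i in range(n):
        for j in range(i + 2, n):        # skip adjacent vertices
            E += 1.0 / np.linalg.norm(pts[i] - pts[j])
    return E

# A tight, nearly self-intersecting zigzag has much higher energy
# than a spread-out polyline with the same number of vertices.
tight = np.array([[0, 0], [1, 0.1], [0.1, 0.2], [1, 0.3]], float)
loose = np.array([[0, 0], [1, 0], [2, 0], [3, 0]], float)
assert repulsive_energy(tight) > repulsive_energy(loose)
```

Gradient descent on such an energy (subject to design constraints) is the basic loop behind repulsion-based curve design.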
Ayush Tewari - Self-Supervised 3D Digitization of Faces
2.1K views · 3 years ago
March 9th, 2021. MIT CSAIL Abstract: Photorealistic and semantically controllable digital models of human faces are important for a wide range of applications in movies, virtual reality, and casual photography. Recent approaches have explored digitizing faces from a single image using priors commonly known as 3D morphable models (3DMMs). In this talk, I will discuss methods for high-quality mon...
Dmitry Sokolov - How to compute locally invertible maps
433 views · 3 years ago
03/02/2021 Mapping a triangulated surface to the 2D plane (or a tetrahedral mesh to 3D space) is one of the most fundamental problems in geometry processing. The critical property of a good map is (local) invertibility, and it is not an easy one to obtain. We propose a mapping method inspired by the mesh untangling problem. In computational physics, untangling plays an important role in mesh generation: ...
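Local invertibility of a piecewise-linear map can be checked per element via the sign of the Jacobian determinant: the map is locally invertible exactly when no triangle is inverted. A minimal 2D sketch of that check (not the untangling method itself):

```python
import numpy as np

def triangle_jacobian_det(ref_tri, mapped_tri):
    """Determinant of the affine map taking a reference triangle to its
    image. Positive determinants on all triangles => the piecewise-linear
    map is locally invertible (no inverted/'tangled' elements)."""
    R = np.column_stack([ref_tri[1] - ref_tri[0], ref_tri[2] - ref_tri[0]])
    M = np.column_stack([mapped_tri[1] - mapped_tri[0],
                         mapped_tri[2] - mapped_tri[0]])
    J = M @ np.linalg.inv(R)   # per-triangle Jacobian of the affine map
    return np.linalg.det(J)

ref = np.array([[0, 0], [1, 0], [0, 1]], float)
good = np.array([[0, 0], [2, 0], [0, 3]], float)       # stretched, orientation kept
flipped = np.array([[0, 0], [0, 1], [1, 0]], float)    # orientation reversed

assert triangle_jacobian_det(ref, good) > 0    # locally invertible
assert triangle_jacobian_det(ref, flipped) < 0 # inverted triangle
```

Untangling-style methods optimize the vertex positions until every such determinant is positive.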
Mengye Ren - Towards Continual and Compositional Few-shot Learning
1.6K views · 3 years ago
Katerina Fragkiadaki - 3D Vision with 3D View-Predictive Neural Scene representations
922 views · 3 years ago
Jon Barron - Understanding and Extending Neural Radiance Fields
66K views · 3 years ago
Bolei Zhou - Inverting Latent Space of GANs for Real Image Editings
7K views · 3 years ago
Aditi Raghunathan - Surprises in the quest for robustness in ML
1.5K views · 3 years ago
Amlan Kar - Learning to Create and Label Data
562 views · 3 years ago
Krishna Murthy - Building differentiable models of the 3D world
653 views · 3 years ago
Spandan Madan - On the Capability of CNNs to Generalize to Unseen Category-Viewpoint Combinations
285 views · 3 years ago
Taesung Park - Machine Learning for Deep Image Manipulation
1.6K views · 3 years ago
Dídac Surís - Learning the Predictability of the Future
920 views · 3 years ago
Olga Russakovsky - Fairness in visual recognition
1.1K views · 4 years ago
Philipp Krähenbühl - Point-based object detection
13K views · 4 years ago
Shubham Tulsiani - Self-supervised Reconstruction and Interaction
4K views · 4 years ago
Devi Parikh - Some Vision + Language, more AI + Creativity
755 views · 4 years ago
Vittorio Ferrari - Recent research on 3D Deep Learning
3.8K views · 4 years ago

COMMENTS

  • @codebycandle · 7 months ago

    ...a good reminder to keep up w/ my pytorch studies.

  • @hanayear · 7 months ago

    The English subtitles are not in-sync with the video !! someone please help 😭

  • @mattwillis3219 · 8 months ago

    What an incredible time we live in, where one of the authors of the paper can explain it to the masses via a public forum like this! Incredible and mind-expanding work, guys! Thank you so much :)

  • @prbprb2 · 10 months ago

    Can someone give a link to the Colab discussed around 12:00

  • @mirukunoneko1375 · 11 months ago

    cc is a bit offset but overall is great!

  • @VictorRodrigues-j9k · 1 year ago

    Loved this talk, thanks =)

  • @hehehe5198 · 1 year ago

    very good explanation

  • @Patrick-vq4qz · 1 year ago

    Awesome talk!

  • @jouweriahassan8922 · 1 year ago

    What's the difference between this and photogrammetry?

    • @anirbanmukherjee5181 · 1 year ago

      Intuitively, the main difference is that photogrammetry tries to build an actual 3D model from the given images, while a NeRF model learns what the images from different viewpoints will look like without actually building an explicit 3D model. Not sure about this point, but NeRFs are probably better given a certain number of images

  • @mattnaganidhi942 · 1 year ago

    Noice 👍

  • @rohitdhankar360 · 1 year ago

    @12:00 -- Thanks for the interruption and asking if the TIME taken for Inference did not matter for a use case then how will CenterNet compare to other detectors -- thanks

  • @norlesh · 1 year ago

    45:32 - "we're never going to get real-time NeRF" and then came Instant-NeRF ... never say never

  • @zjulion · 1 year ago

    nice talk. keep going

  • @theCuriousCuratorML · 1 year ago

    Where is that notebook the speaker is talking about?

    • @rahulor3773 · 1 year ago

      Please provide the link if you have it already.. Thanks in advance!

  • @arcfilmproductions7297 · 1 year ago

    What's the difference between this and the 3d scans you get on an ipad pro? Apart from the fact this looks better. Just trying to get my head around it.

  • @jimj2683 · 1 year ago

    One day these algorithms will be so good that you can simply feed all the photos on the internet (including Google Street View and Google images) and out comes a 3d digital twin of the planet. Fully populated by NPCs and driving cars. Essentially GTA for the entire planet.... With enough compute power there is no reason this will not work when combined with generative AI that fills in stuff that is missing by drawing experience from trillions of images/video/3d capture. Imagine giving a photo to a human 3d artist. He will be able to slowly make the scene in 3d from just the photo by using real world experience he has had. Here is a rule of thumb with AI: Everything a human can do (even if it is super slow), AI will eventually be able to do much much faster. Things are going to speed up a lot from here. Cancer research, alzheimer cures, aging reversal etc... Exciting times.

  • @hulo_beral · 1 year ago

    Very informative video !

  • @jeffreyalidochair · 1 year ago

    a practical question: how do people figure out the viewing angle and position for a scene that's been captured without that dome of cameras? the dome of cameras makes it easy to know the exact viewing angle and position, but what about just a dude with one camera walking around the scene taking photos of it from arbitrary positions? how do you get theta and phi in practice?

    • @alexandrukis776 · 1 year ago

      These papers usually use COLMAP to estimate the camera position for every captured image for real-world datasets. For the synthetic dataset (e.g. the yellow tractor), they just take the camera positions from Blender, or whatever software they use to render the object.

  • @SafouaneElGhazouali · 1 year ago

    Very nice work !! keep it up Drs.

  • @BINLIU-q3e · 1 year ago

    What a WONDERFUL AMAZING work, Orz

  • @darianogina148 · 1 year ago

    Could you please tell how to make NeRF representation meshable?

  • @prathameshdinkar2966 · 1 year ago

    I hit the 1Kth like!

  • @francescos7361 · 1 year ago

    Thanks, multilevel optical flow.

  • @briandelhaisse1112 · 2 years ago

    Very good explanation! Thanks for the talk.

  • @TechRyze · 2 years ago

    I'm curious to know - when he said at the end that he only has 3 scenes ready to show... considering he mentioned only using 'normal' random public photos - why would this be? Is this related to the computational time required to render the finished product, or for some other reason? If the software works, then surely, given the required amount of time and computational resources, this technique could be used on a potentially infinite number of scenes, using high-quality photos sourced online. Is there a manual element to this process that I've missed here, or is access to the rendering / processing time and resources the limitation?

  • @zhiruigao · 2 years ago

    great idea! design optimization in a differentiable manner and integrate it into deep networks

  • @kefeiyao7784 · 2 years ago

    Great explanation indeed. I have one question: is it ray tracing or ray marching? From the talk, I seemed to find it to be ray marching, but the actual phrasing in the talk was ray tracing.

  • @崔子藤 · 2 years ago

    I like it😃

  • @seanchang2876 · 2 years ago

    Hi, I'm just wondering how to know the ground-truth RGB color for each (x,y,z) spatial location?

    • @wishful9742 · 2 years ago

      Hi, you don't need that data. The neural net produces the RGB and alpha for each point along the ray (emitted from the pixel along the view direction); once we have RGBA for all points on the ray, we can obtain the final pixel RGB color using ray marching (so all of the parameters along the ray result in the RGB of the pixel). We can then compare the actual pixel with the obtained pixel and learn from it to produce better parameters along the ray.

    • @miras3780 · 2 years ago

      @@wishful9742 hi, may I ask how exactly ray marching works? I am not sure how the MLP knows that the scene is occluded at a certain distance. Does it also learn the sigma values? Or is the distance to the occluded point calculated from the camera intrinsic and extrinsic properties? (I am new to NeRF)

    • @wishful9742 · 2 years ago

      @@miras3780 Hello, for each point along the ray, the MLP predicts a color and an opacity value. The final pixel is simply the weighted sum of the colors (weighted by their opacity values). This is one way of ray marching, and there are other algorithms of course. Please watch 10:35 to 13:50
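The compositing described in these replies follows the standard NeRF rendering rule: alpha comes from density and step size, transmittance is a cumulative product, and the pixel is a weighted sum of sample colors. A minimal sketch with made-up sample values along one ray:

```python
import numpy as np

def composite(colors, sigmas, deltas):
    """NeRF-style compositing along one ray: per-sample alpha from
    density sigma and step size delta, transmittance as the cumulative
    product of (1 - alpha), pixel color as the weighted sum of colors."""
    alphas = 1.0 - np.exp(-sigmas * deltas)               # per-sample opacity
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas]))[:-1]
    weights = trans * alphas                              # contribution per sample
    return (weights[:, None] * colors).sum(axis=0), weights

# Three samples: empty space, a dense red surface, then green behind it.
colors = np.array([[0, 0, 1], [1, 0, 0], [0, 1, 0]], float)
sigmas = np.array([0.0, 50.0, 50.0])
deltas = np.array([0.1, 0.1, 0.1])
pixel, w = composite(colors, sigmas, deltas)
# The dense red sample occludes the green sample behind it.
assert pixel[0] > 0.9 and pixel[1] < 0.1
```

Because every step is differentiable, comparing `pixel` against the ground-truth photo pixel gives a loss that trains the MLP's colors and densities, exactly as the replies describe.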

  • @muskduh · 2 years ago

    Thanks for the video

  • @twobob · 2 years ago

    popping the link to the videos in the description of the video would make a lot of sense. Enjoyed the nerf paper.

  • @jiqi4451 · 2 years ago

    How to acquire the slides for this tutorial?

  • @anhvo9329 · 2 years ago

    loving it

  • @cem_kaya · 2 years ago

    thanks for sharing this presentation

  • @baselomari3657 · 2 years ago

    Glad to see Seth Rogan successful with this career change.

  • @prometheususa · 2 years ago

    brilliant explanation!

  • @qichaoying4478 · 2 years ago

    IIC-Net is a really innovative work

  • @SheikahZeo · 2 years ago

    NeRF outputs transparency, but all the demo videos seem to only have opaque surfaces. Does it actually work with semi-transparent objects?

    • @SheikahZeo · 2 years ago

      The colour output will be constant along a freely propagating ray. Seems you waste time recomputing the whole network when you really are just interested in the density

    • @Cropinky · 1 year ago

      works that come after vanilla nerf deal with opaqueness better than the vanilla nerf does

  • @sirpanek3263 · 2 years ago

    Do you see any use for this with drone imagery and fields of crops? This wouldn't work for stitching images, I'm guessing…

  • @ritwikraha · 2 years ago

    Excellent explanation!!!

  • @酷比焍二 · 2 years ago

    I think cycle-consistency constraint is more effective for activating structure code and texture code

  • @harshanand7230 · 2 years ago

    Is there a transcript for this video?

  • @yunhokim7846 · 2 years ago

    This is super helpful Thank you so much

  • @web3.0_metaverse_XR · 2 years ago

    Thank you so much for this tutorial

  • @xiaowang7740 · 3 years ago

    thanks

  • @user-dn7vd7ys8v · 3 years ago

    Can I run RAFT on Windows?

  • @trachea123 · 3 years ago

    Great work!

  • @tomg4324 · 3 years ago

    lol

  • @housedecoratingworld3867 · 3 years ago

    My name is amlan kar