Now we know MKBHD watches your videos.
What a time to be alive! For those who haven't seen it: click the link, listen carefully at 14:58, and check the first comment! - ua-cam.com/video/yCBEumeXY4A/v-deo.html
@@TwoMinutePapers Well done, you deserve it
🤝🤲
Lol, he even almost winked at the camera.
@@TwoMinutePapers Did you write the time incorrectly?
4:42 for a second I thought "HOLY SHIT THE AI VERSION LOOKS BETTER THAN THE ORIGINAL"
The lighting on the plant gave it away a bit but still quite amazing.
Yeah, it seems like the AI doesn't model subsurface scattering properly, but given how much it does, I think that is a perfectly acceptable tradeoff for shaving off the hours of manual labor it would take to reconstruct these things by hand
@@adicsbtw Absolutely
@@adicsbtw Given everything else the AI can already do here, I would expect SSS modeling to be in the next paper. If the AI can already do the current scene, they could then use that as input and use an additional AI system to figure out the best SSS settings for each material.
@@adicsbtw more like the ai doesn't know about subsurface scattering, at all
Same, looking at the right hot dogs I thought "these look more real than the left ones". Then the labels were put back correctly and I figured out why.
Haha I actually thought the tree in the scene rendering looked better on the AI side. In hindsight that turned out to make sense 😂 awesome video. Can't wait to start working with these tools!
Same here, felt vindicated when the label swap was revealed, haha.
The hotdog looks better in the actual AI image to me.
@@Walter5850 yup, AI has little information about plant cells' transparency. It can not know everything...yet
Haha, exactly my thoughts while viewing this 😂
I think the AI-side tree's leaves might be missing transmission/subsurface scattering. They're producing black shadows instead of letting light through from their undersides.
This is a mind-blowing improvement over current photogrammetry techniques. This means that we can fully capture not just the 3D shape and texture of a one-of-a-kind object in a museum, but also its materials. This is going to be a massive deal for archiving historical items.
We could have a whole digital archive, and by using VR headsets we could make it completely interactable as well
I see virtual museums becoming much easier to make.
maybe in 2 years we can take a single photo with our smartphone and turn it into a full 3D object.
@@mrsnoo86 I'm pretty sure this tech exists already.
4:50 regardless of whether you swapped the labels or not, the shading of the plant is still a difference that I found.
This is so insane, in a couple years we might have virtual worlds with AI assistants who just create objects on our command. Imagine a VR "game" where you can literally create the world and the objects in it just by describing them. What a time to be alive.
Meta released a video a while ago about generating virtual worlds from voice commands. The graphics were very basic, but it was still fascinating to see stuff appear as it was described
@@lasagnadipalude8939 Sounds like powerful version of Scribblenauts haha
So, a magic chant?
@@lasagnadipalude8939 I wish Meta had spent time on a more aesthetic graphics package. It's all just so hard to look at; I can't get excited about it.
I'm surprised OpenAI hasn't already tried a text-to-model or text-to-animation AI
This really does look very close to allowing us to easily make background props from an image. That this would ever be possible was derided by some in the industry not so long ago. The idea of being able to create elaborate photoreal scenes from photos, and presumably drawings too, is amazing. If the drawings work, then combined with other algorithms (including pose estimation and facial performance capture) it really will be possible to create animated movies from scratch cheaply and quickly. Since I have an animated feature film script ready to go (but of course no funding), I have a huge interest in this work. I suggest starting with animation only because it may conceivably be less work than a photoreal movie, since one has greater latitude with styling.
@@F.Ragnarok If you support the channel you get early access to videos
A combination between this AI Model and DALL-E 2 will be craaaaazy 🤩
Yes, 3D art. Now make it 3D printable and boom. Or better, free assets for game makers
What is DALLE 2 and can I use it?
@@blakksheep736 OpenAI GPT-3 based image generator, version 2: Describe something and it will draw it, going as far as copying art styles
I hope I can make a 3D image of my passed out sister
@@blakksheep736 nope, not open to the public yet, but type dall-e mini (same thing but far less powerful, yet still amazing and far easier to get access to since no requirements needed)
As an engineer, I'm PROUD of those technical improvements.
Thank you for your videos :)
Take a shot every time this guy uses a conjunction. Well. And. Well. Well. Well. And. And. But. Or. And. And.
Good video though, I love this tech. Can't wait for the future.
I said to myself, "well, ok, the recreation is impressive, but the lighting makes the plant look a little plastic." When I realized that it was reversed, I could not believe my eyes. What an incredible paper!
This is so helpful for so many industries that it’s impossible to list them all
So essentially a Photogrammetry auto modeler, awesome!
I was having trouble telling the difference between this and standard Photogrammetry. What is the difference?
@@joelface Not really any difference, except this method can apparently output a pretty decent result in two minutes whereas regular photogrammetry can take hours. One important thing I did notice was the reconstruction of an object with a reflective surface like the saxophone, as the lack of an opaque texture would make it impossible to form a point cloud from the image references. So I wonder what the input method was; if it was just the shitty spinning gif, that's seriously impressive, but I doubt it.
@@danttwaterfall Yeah, I could see the biggest advantage of this being how it deals with reflections, complex materials, and environments.
Photogrammetry is purely math/algorithm based, and since IIRC we haven't solved the issues with reflections, this may be able to get better, more true-to-life results.
What I'd like to see next is a model that will recreate the object at hand and the environment around the object such that reflections are explained by the created environment. That way we can get full separation of object, lighting and environment.
Ideally, we'd be able to place a mirrored sphere in a room. Take a 360 series of photos or video of the sphere and recreate the entire environment while leaving a perfect sphere with a reflective texture in the center.
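That mirrored-sphere idea is essentially the classic light-probe capture trick. Here is a rough numpy sketch of how a photo of a mirrored sphere maps to reflection directions, assuming an orthographic camera; the function name is made up for illustration and this is not from the paper or the video:

```python
import numpy as np

def sphere_image_to_directions(size):
    """For each pixel of an orthographically photographed mirrored sphere,
    return the world-space direction whose light that pixel reflects.
    Assumes the camera looks along -Z and the sphere fills the image."""
    # Pixel coordinates in [-1, 1] across the sphere image
    ys, xs = np.mgrid[-1:1:size * 1j, -1:1:size * 1j]
    r2 = xs**2 + ys**2
    inside = r2 <= 1.0                      # pixels that actually hit the sphere
    # Sphere surface normal at each pixel (orthographic projection)
    nz = np.sqrt(np.clip(1.0 - r2, 0.0, 1.0))
    normals = np.stack([xs, ys, nz], axis=-1)
    # Incident ray direction: camera rays travel along (0, 0, -1)
    incident = np.array([0.0, 0.0, -1.0])
    # Mirror reflection: r = d - 2 (d . n) n
    d_dot_n = normals @ incident
    reflected = incident - 2.0 * d_dot_n[..., None] * normals
    return reflected, inside

dirs, mask = sphere_image_to_directions(256)
print(dirs.shape, int(mask.sum()), "sphere pixels")
```

Binning those directions (together with the photographed sphere colors) into a latitude-longitude image gives you an environment map, which is the "recreate the environment from its reflections" part of the idea.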
So, here's an application for this tech I'd love to see. Convert an entire online store catalog to 3D objects. Use those 3D objects in a virtual marketplace. Seems like a store could use this AI to relatively quickly convert their entire database of products to 3D representations. And if they want higher quality 3D objects they could simply add more photos of the products.
So I started reading your comment and thought it was going in a different direction. This is a super cool idea, but I thought of another application for the same idea.
Assuming this kind of tech could accurately recreate the real-world dimensions of the items, imagine being able to virtually try on any piece of clothing. Or redecorate your living room in a digital space and see if that new sofa would fit where you want.
I know some of this stuff is already sort of possible, but it usually requires quite a bit more effort on the human end. But imagine being able to just take a front and back full-body picture, throw in a couple of simple measurements like your height, and have it generate at least an accurately sized fashion model. Or taking a panorama of your living room and having it create the 3D space and all the furniture as models.
With these things I always want a browser-based service where you can just upload your reference, set the time for it to run, and get the output to go. Stuff like this model reconstruction and the video quality upscaler are my favorites.
I definitely didn't expect this development to come so fast. Combine this with some style transfer and you can use photos to make real video game environments, which is a thing I didn't expect to be possible for a few years at least. I hope the implementation is such that the output is easily modifiable by humans, however. It might be hard to make the model be changeable in ways that are useful or meaningful. How can you deform it? Bending? Cracking? Make holes? Fuse with other objects? The condiments on the hot dog plate were treated as solid objects; will it be possible to apply material properties to the objects after scanning? Will it be possible for a model to distinguish between different objects in a scene and allow application of different rules to each, or is everything in the model considered a single object?
Object recognition has been worked on for a long time, maybe it'll be possible to add that to this one, so it would not only reconstruct the hotdog, but know that condiments, bun, etc are separate objects from the sausage.
For a game asset you need to follow the guidelines for that specific game engine and asset type; the generated result shown in this video is pretty much unusable for that.
@@jendabekCZ What do you mean? If this AI can create a .obj or .fbx file and the mesh is not straight up broken, then it's usable in all 3D game engines.
To what degree it's usable is another matter and partly up to the developer to figure out, but game engines don't have too different guidelines for 3d models.
If you mean like low-poly mobile game vs high-poly multi-LOD AAA 3D model, then I guess yeah, but I'm sure it's not a big reach to tell the AI a target triangle count (a rough sketch of that idea follows below). Ofc animated and interactive objects are more challenging and would likely need a human tweaking the model, remodeling some of it etc, but the first use case would be static environments and objects anyway.
For example if I wanted to put the saxophone into a game as a working instrument, I would need to separate the buttons into separate objects and animate/script them to work in-game, but compared to making the whole saxophone model from scratch, the time spent tweaking is minimal.
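Hitting a target triangle count is already routine in existing geometry libraries. A minimal sketch using Open3D's quadric decimation; the file names are placeholders and this is generic mesh cleanup, not the paper's pipeline:

```python
import open3d as o3d

# Load whatever the reconstruction exported (placeholder path)
mesh = o3d.io.read_triangle_mesh("reconstructed_saxophone.obj")
mesh.remove_duplicated_vertices()
mesh.remove_degenerate_triangles()

# Collapse the dense reconstruction down to a game-friendly budget
target_tris = 20_000
low_poly = mesh.simplify_quadric_decimation(target_number_of_triangles=target_tris)
low_poly.compute_vertex_normals()

o3d.io.write_triangle_mesh("saxophone_lowpoly.obj", low_poly)
print(len(mesh.triangles), "->", len(low_poly.triangles), "triangles")
```

Engine-specific work like LODs, collision meshes, and rigging would still be on the developer, but the triangle-budget part is mechanical.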
4:54 Biggest difference I noticed right away was the plant. The shadowing and shading on the plant is definitely different there. Other than that this is flipping amazing. What else could even be in the next paper besides performance upgrades? 10/10
Absolutely incredible, soon with a video input it could recreate deformable objects
This is going to be great for so many people. I can't imagine how easy modelling objects for games etc will be in the future. Just take a picture of something and turn it into a 3D model, or rather let the AI do it... Amazing!
The topology is horrendous and can't be fixed easily, so no animation is possible (as in good animation, by a human team, in a timeframe that makes sense), and it's too dense to be seriously considered as a simple prop.
This improvement is needed to make the Metaverse a reality. Plus if you can combine this with the new DALL-E 2 you get the poor man's holodeck. I can imagine that with a system like that you could take the description of a setting from a book like Harry Potter and generate your own personal Hogwarts. Exciting times indeed.
I was so amazed by the AI reconstructed scene and its beautiful subsurface scattering! It's amazing how the reconstruction looked even better than the target scene, and then you tell us you switched the labels -_-'
It is absolutely insane how in just a matter of decades we went from highly distinguishable polygons, to real-time rendering of realistic scenes, to photogrammetry, and now this, with the computing power to create realistic results in a matter of minutes. Truly astonishing what great minds can accomplish
This is like the photogrammetry technique but on steroids, because the old technique requires scanned objects to be matte. It can extract only diffuse textures, not full-blown materials and lighting like this. My mind has been blown.
Absolutely insane. Can't wait to start seeing this be used in game development.
This is truly groundbreaking, just with a set of photos/videos we'll be able to obtain a full PBR 3D reconstruction of anything we capture. An advanced version of Photogrammetry, with the ability to generate automatic materials, mind-blowing
I wanna see this combined with the most recent works from both Wenzel Jakob and Keenan Crane (I'm not actually 100% sure, but I think those two works can complement each other)
In that case, instead of geometry, you would get out a heterogeneous signed distance field. You could then "simply" turn that into geometry, but the benefit is, at first, an unbiased, mesh-free, in a sense "infinite resolution" result that I suspect might be really useful for building a nicer mesh than what this method gives you.
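For context, "simply" turning a sampled SDF into geometry usually means running marching cubes over a grid. A small sketch with scikit-image, using a toy sphere SDF as a stand-in for whatever field such a method would actually produce:

```python
import numpy as np
from skimage import measure

# Sample a toy signed distance field (a sphere of radius 0.6) on a grid;
# in practice this grid would be sampled from the learned/optimized SDF.
n = 64
grid = np.linspace(-1.0, 1.0, n)
x, y, z = np.meshgrid(grid, grid, grid, indexing="ij")
sdf = np.sqrt(x**2 + y**2 + z**2) - 0.6

# Extract the zero level set as a triangle mesh
step = grid[1] - grid[0]
verts, faces, normals, _ = measure.marching_cubes(sdf, level=0.0, spacing=(step, step, step))
print(verts.shape[0], "vertices,", faces.shape[0], "triangles")
```

The finer the sampling grid, the closer the extracted mesh tracks the "infinite resolution" field, at the cost of triangle count.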
This sounds so cool! Reminds me of 2D vector graphics - describing the geometries as relative paths. I wonder where procedurally generated game landscapes fit in with this line...
This could change video games for ever
Imagine making a dall-e 2 image from description and then feeding the image into this and getting a 3d model just like that, you'd get a complex 3d model just from a simple description
8-10 years from now. Unless these researchers are given millions of dollars to purchase models
It would be important to know how well it handles transparencies, as that's one of the issues with other techniques, like photogrammetry
Also reflections. But as you can see here, reflection doesn't seem to be a problem.
As far as I can tell, it's reinforced through iterative comparison. The AI tries to create the thing, it gets compared to the original, and then it identifies where the confidence of the creation is too low (mistakes were made, which is where transparency would start screwing things up). Then, instead of starting with a torus / box / whatever, it would start with the failed mesh it created and iterate again. This is where transparency would get compared over and over until it's accurate.
I presume anyway. I'm trying to get my hands on this now.
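A toy sketch of that compare-and-refine loop, shrunk down to optimizing an image instead of a full scene; this is just plain gradient descent on a reconstruction error, not a claim about what the paper actually does:

```python
import torch

target = torch.rand(3, 64, 64)                        # stand-in for the reference render/photo
guess = torch.zeros(3, 64, 64, requires_grad=True)    # start from a blank "scene"
optimizer = torch.optim.Adam([guess], lr=0.05)

for step in range(200):
    optimizer.zero_grad()
    # "Render" the current guess (identity here; a real system would run a
    # differentiable renderer over geometry, materials and lighting)
    rendered = guess
    loss = torch.mean((rendered - target) ** 2)        # compare to the original
    loss.backward()                                    # find where the guess is worst
    optimizer.step()                                   # refine and iterate again

print("final error:", loss.item())
```

Swap the identity "render" for a differentiable renderer over geometry, materials and lighting, and you get the general shape of this kind of analysis-by-synthesis reconstruction.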
So take the next iteration of DALL-E 2, give it a text description of what you want, generate a few images, put the result into this AI and generate a model. Next, find an AI that optimizes models for manufacture to adjust the mesh and make the object manufacturable, all in a few hours, and you have effectively automated 75% of my job as an Industrial Designer. Both exciting and terrifying...
I noticed the specular problems on the front of the plate right away. It's really good when there is more visual detail to obscure the little issues.
I appreciate how difficult this problem is, I tried to convincingly re-light some 3D videos 😅
I’m not sure what price Nvidia expects people to pay for this technology, but this makes me hopeful that photogrammetry is going to be way more accessible.
It's Nvidia, it will cost an arm and a leg, your soul and Huang already ate your baby.
I think I found the source code for free, I'm not sure if it's real though.
I can't wait for a phone-based input app. This kind of "AI photogrammetry" is right up my alley.
5:00 Saul Goodman, is that you? This man just switched the plaintiffs 😆😅
Haha, I got that reference.
@@joelface yea, my man, its all good mahn
You are the best! Not NVIDIA! You are the first to explain AI Virtualization in simple words. Thanks!
This is insane. With this technology there are so many branches of new possibilities for 3D modelling, game design, architecture, historical artifacts, even education, and much more
Imagine this, you turn on your VR headset(That has fully coloured cameras on the front of it), walk around for 60 seconds while looking around, and bam, your room is moved into the 3d world.
And maybe with object recognition, each object in your room could be individually scanned so everything in your room would count as an individual object and not 1 whole entity.
Getting closer and closer to where you can feed a movie to an AI system and it'll recreate it so you can experience it entirely in VR space.
If I understand correctly, the newer Oculus headsets will use some sort of system to scan your physical space (it already does it for spatial reference and potential hazards in play area) to allow an overlay of color and likely more immersive AR elements. I imagine the system in this video would work great with that.
It didn't give the leaves in the tree scene any subsurface scattering. But a few more papers down the line...
It’s amazing how it even captures those granular details!
They also put the jelly on the hot dog.
I expect faster convergence speeds in the future from this work. Say a factor of 30 at least but with the same hardware.
I am not a light transport researcher by trade, but at 5:01 I did notice a difference with the ai reconstruction. It looks like the leaves don’t deal with subsurface light scattering in the reconstruction. Not to nit-pick, I’m always super impressed by these things.
I would definitely watch a ghost hunting show where cameras and recorders are put up in some suspected haunted house, but where an AI like this was improving the vague and noisy data obtained. That could be a quite scary experience..
At 4:40 Yeah, it doesn't matter that the labels are swapped. The changes in lighting on the plant are very different.
The technology is really cool though and does a really good job of reconstruction :)
You swapped the labels, but I did see differences. The trees are lit VERY differently. I guess the reconstruction is missing a few light bounces, and subsurface scattering entirely
I was actually confused when the target scene looked slightly more fuzzy, but when you turned it around it made sense, still incredible though, almost identical!
I'm just absolutely floored.. soon we're gonna have movies... like LIVE ACTION MOVIES... with completely CGI animation.. that looks 100% realistic.. and people in their OWN HOMES will be able to make them... absolutely insane...
I think with these physics simulations it would now be possible to finally start interpreting (virtual) mechanics, label behaviour and emotions, and finally get our virtual actors for new 3D movies. We could even help the AI with footage of the voice actors and their expressions, and transfer them onto the virtual actors.
3:20 smash it, smash the priceless artifact, smash it into a million tiny pieces
Trees, cliffs, beaches, buildings... extremely complex objects. Hell. Guns, armors, all can be imported now.
My dream is complete
4:41 - I immediately saw your shenanigans, and was about to call you on it!
There was more detail with horizontal striping on the plant pot in the right hand image.
Very exciting stuff :) Kudos to the researchers and you for presenting it in such an accessible way!
Yeah Kudo's to the researchers... fuck the AI. 🤨
I'm waiting for a two-minute paper that makes installing the software from all these papers much more user friendly. That would really be a hold-on-to-your-papers moment for sure.
4:00 Hahaha. Good ol' jelly box throwing :D
The AI failed to notice the subsurface scattering in that plant model. But still very impressive. Amazing how far we've come.
Imagine feeding live footage from several cameras around the room returning a virtual room that could be used with vr for metaverse kinda things
The ultimate end station of this tech would be to make a video of a place and output a 3D space with objects that you can move around. And then load that place into your favorite game!
Do you agree?
Take it a step further. Type a text prompt, use DALL-E 2 to generate an image, put it in the AI simulator and turn it into an object, change the settings, connect it to a 3D printer, and have it print out the real thing..
How about using an old video to make a 3D space and place it into your favorite game? Like a remastered movie scene.
@@TimoBoll22 Nice! Good thinking
3:17 I thought he was going to say, you know what is coming! The NFT !!!
Down the line, having robust AI powered remeshing would be amazing.
Edit: solving mesh topology seems like a perfect task for an AI.
Didn't you see the 3ds Max Quad Remesher?
Nah no thanks
@@Skynet_the_AI Oh, c'mon, you'll love it. Just try it once ☺️
@@MrGTAmodsgerman oh yeah, I did, it was so long ago that I forgot. They also made plugins for Maya, Blender etc. At the time it didn't seem as foolproof as it could be. I should revisit it, thanks.
@@toshchakfox It's now implemented & improved for 3ds Max in general. It's very different. It's a game changer, since it really does a good job now and it only requires some learning by doing to get the right results. Arrimus3D on YT has made several videos about it.
The Times They Are a-Changin....
I remember, years ago, watching an episode of Scooby-Doo (or maybe it was one of the many movies) where the Scooby Gang were scanned in a big 3D scanner and perfect 3D game versions of them were created. That was sci-fi to me... and now computers can do that without even needing the scanner, with just a bunch of photos. AI is amazing!
This is an amazing way to significantly speed up development of realistic game environments, even to create humans. There's a clear downside in that the topology you get is disorganized and the number of unnecessary vertices is abysmal, but fixing it will always be faster than doing models from scratch, so it's a significant improvement regardless and something that will become the standard of photorealistic 3D workflows in no time (assuming it hasn't already, my specialty isn't photorealism so I wouldn't know).
An amazing breakthrough, we can expect very high quality games with significantly more content thanks to this technology.
This would be extremely helpful within the game development and 3d animation industries. Modeling takes a huge amount of time and as such something that could make it automatically from concept art is a huge help.
These types of algorithms will make the content creator's life so easy.
For everyone saying that the plant looked different: I think it's the translucency, the effect where an object isn't completely opaque, but you can see the light from the backside. Maybe the AI can't create translucency maps yet.
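For the curious, here is a tiny numpy illustration of what a transmission (translucency) term changes, using a common cheap foliage approximation rather than anything from the paper:

```python
import numpy as np

def shade_leaf(normal, light_dir, albedo, translucency):
    """Front diffuse lighting plus a simple back-lighting (transmission) term."""
    n = normal / np.linalg.norm(normal)
    l = light_dir / np.linalg.norm(light_dir)
    front = max(np.dot(n, l), 0.0)          # ordinary diffuse lighting
    back = max(np.dot(-n, l), 0.0)          # light arriving from behind the leaf
    return albedo * front + translucency * back

albedo = np.array([0.1, 0.4, 0.1])
translucency = np.array([0.3, 0.6, 0.2])    # greenish glow when back-lit
normal = np.array([0.0, 0.0, 1.0])

print("lit from the front:", shade_leaf(normal, np.array([0.0, 0.3, 1.0]), albedo, translucency))
print("lit from behind:   ", shade_leaf(normal, np.array([0.0, 0.3, -1.0]), albedo, translucency))
print("no translucency:   ", shade_leaf(normal, np.array([0.0, 0.3, -1.0]), albedo, np.zeros(3)))
```

Without the second term the back-lit side comes out black, which matches the "black shadows" people are noticing on the leaves.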
dude DALL-E 2 + this = next level games and I'm loving it
My mind was blown with the tree scene. I was looking at the tree and how the shadows moved and it looked like the actual real one had a bit more color when the light was behind it but otherwise it was absolutely crazy when he said, "I have swapped the labels" 🤯
I would like to see an entire tree scanned. All the leaves and complex lighting should be a good challenge.
For the scene, the shadows of the AI reconstruction are a lot deeper, especially on the plant.
Don't know the cause, but I still fell for the swap.
I think the AI system didn't have the option to enable SSS for the generated material. I would expect that oversight to be fixed in the next paper.
What I would love to see is this AI taking in an animated model doing poses and outputting a JOINTED 3d model that can be posed into the same or other poses. That kind of technology would be fantastic for motion-capture based game development.
I was weirded out by how much more detail the pot and the leaves had on the "Ai rendered" version, but after the plot twist it was understandable.
I would love to see people's family members "coming back to life" in VR through old photos being scanned. I know a lot of families have memento boxes of high quality slides and photos that aren't digitized. If this process could be used to create UE5 MetaHumans, each family could have a digital family tree dating back generations for descendants to visit in VR or other platforms rather than going to a cemetery!
me at 4:41: wow, the reconstruction looks even better than the original!
4:58: ok now it makes sense
This will make 3D reference objects more accessible, like for 3D printing, and also give better results than some current methods. Nvidia is driving the revolution
In the past, if you wanted to save the image of a landscape, you had to pay someone to draw or paint it.
Now, you can just use a camera.
Today, if you want to save the form of an object, you have to pay someone to model it.
Maybe in the future, we will be able to get the same thing by also using a kind of camera.
That narration tho...
WeeEEeeLl~
BuuUUUuut~
ImrooOOOOvees~
Just as smooth as sandpaper through my ears
I think he's registered his voice through an AI voice tool and now he's just writing copy for his videos.
That Kinect ad with the skateboard comes through in the end!
his voice at the end of every sentence: 📈
Nvidia will go crazy for the RTX 4000 series for sure
If this can easily produce a model with precise scale, then I can imagine some pretty amazing applications in the medical prosthetics (or even fashion) industry. Imagine custom shoes based on pictures of your feet, or ordering a dental retainer by smiling for the camera? More practically, creating a perfect-fit 3D printed cast for an arm without needing a high resolution scanner would be pretty amazing. It IS a great time to be alive!
The first use case that came to my mind is reconstructing Google Maps' trees (or buildings) lmao. We can then say that's the entrance to the real world's digital twin.
Or it can honestly be used in "metaverse"(games) quick scale-up development. I guess
Perfect
Christ, people still about that meta verse crap. There are bunch of vr spaces created before that shitty corporate place ya know?
@@me-ry9ee that’s why I said (game) chill out
For Rockstar's GTA VI maybe?
Yeah baby! Been doing photogrammetry for 9 years professionally. Going to be digging at this asap
Boom! The path to photorealistic simulations, VR, gaming, AI and human training, and what not!
4:40 if we can't spot any differences, in this case it doesn't matter which is the original. Still extraordinary.
I can spot a difference though. The leaves of the tree are more opaque on the one on the left. The one on the right has some transparency to it when the light hits it from behind. Still super impressive, hope that one day soon I'll be able to integrate something like this with my work.
I am learning Unreal Engine right now and I'm trying to make a game. Tools like these will change my workflow forever. As a solo developer I'm thinking about how many countless hours I will be spending on modeling assets for my game, but in the future this tech could make it so much easier.
I'm afraid this will be hard to use in games, as the geometry is generated from a signed distance field. Turning that into an efficient mesh will still require a lot of manual work
@@shadamethyst1258 What do you mean? It is a mesh, although a quite messy one. It doesn't look very different from what you'd get using photogrammetry. There are some powerful remeshing tools that could clean it up, but chances are that won't be necessary using UE5's Nanite virtualized geometry rendering.
@@GS-tk1hk Yep! Technology like Nanite will make this easy to use.
And the assets could also be used in other software like blender where your poly budget is much higher:)
@@GS-tk1hk I'm not up to date on remeshing technologies it seems, I'll look into it
@@shadamethyst1258 Nanite is a new technology in Unreal Engine; it's not remeshing per se, but more of a mesh streaming technology where high-poly meshes are imported into the game and the density of those meshes changes dynamically as the player moves closer and further away.
In the swapped scene I instantly noticed the black vase had less light on it, so the AI version was the one with less GI and radiosity
This seems like it could be useful in recreating assets from video games.
I can't wait for articles like "They printed my game and put it on the app store."
So now... we create a unique scene inside of DALL-E 2 and then put that into Nvidia's AI to make the scene 3D. Then put your VR goggles on and explore it as if it's a real physical space.
I remember reading a Doraemon comic back then, about the robot from the future, showing that a single photo could be transformed into a 3D sphere environment. That was science fiction back then, but now it's reality. What a time to be alive!
Yo it even fixes the f-stops in rendering at 5:00
Love these kinds of new technological leaps ❤️❤️
Two more papers down the line, the hot dog would be pushed by the jelly and the sauce would stain it (assuming they used the same target scene and virtual jelly).
Well, actually at 4:39 I thought: Wow! The AI even added subsurface scattering to the leaves when it wasn't there in the reference... but no, we are not there yet.
But it is impressive, no matter what.
3D modellers will soon be saying "they took our jobs!!"
Joking aside This looks incredibly handy for us environment designers - I mean it seems like an almost superior alternative to photogrammetry, or am I crazy?
Perhaps it's more useful for real-time applications in that regard.
Anyway, with auto topology also on the rise it'll be a breeze to get assets into games as long as you've got reference images (which you do, 99% of the time)
No... I AM THE CRAZY ONE. Ffffuck!
Imagine drawing a few sketches as a concept artist and then generating 3d assets in ~10 minutes
Not really, because game assets are almost never just a (poor & unoptimized) duplicate of a real object. Also no AI has an idea what is actually important for a proper nice looking asset matching all the guidelines of the specific game engine / studio / asset type etc.
@@jendabekCZ I wouldn't totally agree, as I think what makes a good optimised game asset is pretty straightforward and could be AI-handled. Basically, a minimal amount of vertices while keeping maximum detail, and past a certain threshold put this detail on a normal map instead. Of course I'm simplifying it, but my point is these rules are not really subjective.
Now on the artistic side, that's something else, but have you already tried Nvidia's GauGAN demo? It is capable of applying a specific style to its output to match given references
Edit: Of course, I'm just speculating on future solutions and how it could evolve
They're taking all the fun out of this world.
I can see it being used to identify bodies including those with partial remains. Even narrowing down the candidates to 100 out of a million would be helpful.
Waiting for your video on Deepmind's GATO. Good work man!
Imagine importing a video of a person, and the ai would reconstruct the person and even rig and animate it.
I'm just looking forward to liking together PaLM, Dall-E 2, and this. We'll be able to describe a scene, PaLM will interpret it into instructions for Dall-E 2, and then Nvidia's AI will turn it into a 3d environment that we can interact with and explore.
show more robotic arms papers learning to manipulate objects in the real world i find it fascinating
4:40 I knew the labels were already swapped because the plant pot on the right has way more detail