Me in the 90's watching Blade Runner: - "That's BS! You can't see back side of an object from a photograph!"
25 years later: - "Hold on to your papers..."
You still can't *see* it, you can only guesstimate what it probably looks like.
It is a guess based on the most likely appearance the "AI" would estimate. That is nice for creating body models from a picture, but it will not help to find evidence for something, as e.g. "Enemy of the State" has shown.
In fact it was possible even back then
Unfortunately or fortunately still BS. The info you don't have - you simply don't have.
This "technology" only shows you what is already known. It cannot show anything that is unknown. Imagine you see a photo: what would you say is behind the door? You enumerate what you've seen in your life and come up with a couple of guesses, based on the context and your intuition. If the door hides something you have never seen, the chance that this technology shows you exactly what is behind the door is 0.0%.
So just like this technology - use your imagination. You'll never get a thing like in Blade Runner (1982).
Oh, maybe - if photographs one day store all light rays (not just the frontally exposed ones), or other forms of not-in-line-of-sight information.
@@tmsaskg I think it's imaginable that a "pseudo-omniscient" AI could take your request for what's behind the door and use the totality of all audio/video/electromagnetic data collected at that instant to reconstruct the scene behind the obstruction. Sort of like that cell phone aggregation device from The Dark Knight.
Woah. I wonder what this is going to do for motion capture for movies, and games. Maybe in the future you'll be able to put yourself into a game even! Nice video.
Check this out from a month ago regarding that, really cool! :) - ua-cam.com/video/YCur6ir6wmw/v-deo.html
@@TwoMinutePapers Sweet! I’ll check that out for sure.
I know right.
ok
How did this conversation happen 1 day ago when the video was released 3 minutes ago? :o Timezone issue?
"2 more papers down the line"
*_Guy who made 1 paper down the line_*_ has left the chat_
The phrase "two papers down the line" does not ignore or discredit earlier steps forward. Instead it recognizes that a few small steps forward will add up to a huge leap, even though the "first paper" might be an even longer step than the "second paper".
technically only the 1st 2 papers fail to be "2 more papers down the line", because after that every paper was 2 more papers down at some point xD
@@anteshell no shit sherlock
@@anteshell the guy was just joking 🙃
Stupid question, but would it technically be possible to not have a frame-by-frame reconstruction for video, but instead do a reconstruction of the first frame, then have the algorithm connect vertices to new posture/detail positions, with every "new detail" adding more vertices to the existing geometry? Not sure if this is exactly what "frame-by-frame reconstruction" means, but having a 2-frame vertex position estimation would be next level in terms of smoothness. Again ... stupid thought xD
Not stupid at all. I believe what you are talking about would be phenomenal in terms of smoothness since there would be no remeshing. Personally I'm not qualified to say whether it's possible, but it sounds very simple and I would love to see a paper on it in the future. For it to be done, one of the things that has to be thought about is what sort of algorithm is used to translate vertices to their new spots. Something like a shrink wrap might work.
Interesting
Do you mean reconstructing the entire first frame in 3D with geometry and using the following frames to add detail to that geometry? That is kind of what the AI does with the humans. We clearly see a lot of flickering, but I'm pretty sure the previous and next frames are connected in a way that some information gets transferred, so as not to see weirdly changing shapes. Although, when looking at the feet in real video, there is a lot of flickering, which could mean the algorithm takes information from only one frame. You might be onto something here.
Interesting, I thought about a 2-step process: first use the video files to generate a high-detail 3D model, and then use this model to recreate the poses. Maybe that could eliminate flickering, because it can observe the future.
So you're talking about making the algorithm time-consistent. Well, that's always desirable, but not always so easy to achieve.
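The "shrink wrap" idea from this thread can be sketched in a few lines. This is purely an illustrative toy, not the paper's method: keep the frame-1 mesh topology fixed and pull each vertex toward its nearest sample point on the next frame's reconstructed surface, so nothing is ever remeshed. All names and parameters here are made up for the sketch.

```python
# Illustrative "shrink wrap" step for temporal coherence (not PIFuHD's method):
# keep frame-1 connectivity, move vertices toward the next frame's surface.
import numpy as np

def shrink_wrap_step(vertices, target_points, step=0.5):
    """Move each vertex a fraction of the way toward its nearest target point.

    vertices:      (V, 3) current mesh vertex positions
    target_points: (T, 3) point samples of the new frame's surface
    Faces/connectivity are untouched, so there is no remeshing flicker.
    """
    # Brute-force nearest neighbour; a real system would use a k-d tree.
    diffs = vertices[:, None, :] - target_points[None, :, :]   # (V, T, 3)
    dists = np.linalg.norm(diffs, axis=2)                      # (V, T)
    nearest = target_points[np.argmin(dists, axis=1)]          # (V, 3)
    return vertices + step * (nearest - vertices)

# Toy usage: four vertices pulled toward a slightly shifted copy of themselves,
# standing in for "the same person, one frame later".
verts = np.array([[0.0, 0.0, 0.0], [1, 0, 0], [0, 1, 0], [1, 1, 0]])
target = verts + np.array([0.2, 0.0, 0.0])   # frame 2 moved 0.2 along x
for _ in range(10):
    verts = shrink_wrap_step(verts, target)
```

After a few iterations the vertices converge onto the new surface while the mesh structure stays fixed, which is exactly the smoothness benefit discussed above.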
Two more papers down the line I expect:
1) Perfecting the examples with hidden parts (2:16 approximate the foot to be symmetric)
2) Perfecting video coherence (3:19)
3) Understand pieces of clothing and make multiple models: Human, (glasses, earrings...), shirt, trousers, left boot, right boot ...
Then you will take a photo and import yourself to Blender. And watch interactions of clothes in simulated running :D.
That would be great.
I don't like how you use the word "Perfect."
I think better/more accurate would be a better term
People would use it to create videos of people without the clothes....🙈
@@JKJKJK794 They do that already.
@@Ch50304 Don't know who "they" is supposed to be but I'm not aware of any application that can do this well yet
No, my papers!! I didn't hold on tight enough...
I hate it when that happens.
Finally, an ML model to estimate how thicc people are.
T H I C C
The output of the network gives you how many C's to add at the end
DeepThiccʕʘ‿ʘʔ
If this isn't in Blender yet I'm gonna be the first to code it. Looks way more convenient than photo mapping.
W: Does this dress make me look fat?
AI: No. You do.
"judging by our AI's estimation of your pose and movement, you were, in fact, resisting"
When you showed it could reconstruct the backside of the model My jaw dropped
Guess you didn't hold onto your papers!
@@theoneandonly1833 He did, that's why he couldn't hold onto his jaw. Same happened to me lol.
same
The models had pretty backsides.
Same. I just went "wait, but that data is completely fabricated, right?" only for it to be confirmed a moment later cause I couldn't believe there wasn't another picture the AI was working with.
ah yes, finally i can see myself in Sims and simulate my decaying life :)
And tinker with some stuff..
I think we all know what will happen ( ͡° ͜ʖ ͡°)
@@miqbal2507 wait for yourself to go into the pool then remove the ladder?
With tech like this we will have 3d remasters of old movies, where you can actually walk through the scenes.
yes indeed
Imagine creating your own video game character just from a picture.
I am sorry, this is much too amazing; I could no longer hold onto my papers.
What would be interesting is to allow these models to be functional and programmable - "run up stairs", "model 1, dance tango with model 2 to the rhythm of song x". It will come, and then add facial overlays and everyone will be able to have their own Hollywood!
2:16 someone get top-left girl a foot doctor
lol get it.
The bottom-left one has different shoes on her feet.
The problem seems to be that the foot is behind the other one, so the algorithm doesn't fill it in properly.
As an intelligent person would.
Two more papers down the line!
I am hyped! HYPE!
The applications of this work, two papers down the line, are mind-blowing! Thank you Dr. Zsolnai-Fehér.
Soon we'll have live 3D models of sports events and we can watch them with VR goggles from any point we want. We can hover over a soccer field, go stand in the middle of it and look around as if we were there. Or we can pin our point of view to some specific player. Going to be amazing.
This is insane. It would be like reconstructing a crime scene in 3D from one camera angle and being able to see it from all angles afterwards.
I doubt the gained information would be usable at court. It is an AI reconstruction, not a real record of a crime.
This won't be of any use outside recreation/entertainment/advertisement/whatever soft, non-crucial industries. Unfortunately this is only visual fantasy, misleading reality.
You won't get any important, scene-specific information you didn't already have - it was already in the database used to fill in the missing parts. This is an illusion.
People will be framed for crimes they did not commit using deepfake technology.
Like the Star Wars: The Clone Wars episode where Anakin and Ahsoka are trying to figure out who bombed the hangar?!?
That'd be cool as fuck!
It won't be strictly evidence, but it could help in visualizing the crime scene, where certain professionals can make edits or additions if the algorithm misses something. And anything out of view from the original camera could possibly be highlighted/color-coded in the simulation.
While the video is not directly about VR, it's great watching this channel for all the individual pieces that can lead to such things as converting an entire movie to interactive 3D VR. Current programs have been shown to improve resolution and frame rate, repair damage, generate 3D geometry of scenes, predict hidden information, and, as this video shows, build detailed models of characters.
I really can see all these and more being utilized in the future to process a movie, turning it into a full virtual theater you can watch from any and all angles, including from inside the scene.
One of the best channels on YouTube (content)... Being a grad student, I've not got much to offer other than commentary, accolades, and gratitude for your work of producing this channel.
The fact it's able to generate this much detail from a single RGB is pretty damn insane if you think about it.
Honestly the video results were still a lot better than I was expecting.
I can tell it guesses better than most modelers can in their first hour, and it does it 100x faster.
2 papers down the line, no more mocap suits!
And the algorithm makes nice backsides.
A better way to do video reconstruction would be to use the extra data to make a more detailed and accurate 3D model, and then use an AI that detects poses to pose the model that was created.
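The two-stage idea above (build one detailed model once, then drive it per frame with detected pose parameters) can be illustrated with a toy forward-kinematics sketch. Everything here is a stand-in: the "model" is just a 2-bone planar arm and the "detected pose" is a pair of joint angles that a hypothetical pose estimator would supply per frame.

```python
# Toy illustration of "one model, driven by per-frame detected poses":
# forward kinematics for a 2-bone chain in the plane. Not a real rig.
import numpy as np

def pose_arm(joint_angles, bone_lengths=(1.0, 1.0)):
    """Pose a fixed 2-bone 'model' with detected joint angles.

    joint_angles: (a1, a2) in radians; a2 is relative to the first bone.
    Returns the (x, y) positions of the elbow and wrist joints.
    """
    a1, a2 = joint_angles
    l1, l2 = bone_lengths
    elbow = np.array([l1 * np.cos(a1), l1 * np.sin(a1)])
    wrist = elbow + np.array([l2 * np.cos(a1 + a2), l2 * np.sin(a1 + a2)])
    return elbow, wrist

# Re-posing the same "model" with one frame's detected angles:
# upper arm straight up, forearm bent 90 degrees to point right.
elbow, wrist = pose_arm((np.pi / 2, -np.pi / 2))
```

The detailed geometry is built once; only the cheap pose parameters change per frame, which is why this route avoids the per-frame reconstruction flicker discussed in the other threads.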
I'M HOLDING TO MY PAPERS
You better
Toilet papers
"I'm holding on to my papers, but my jaw keeps hitting the floor."
Definitely rough around the edges, but very promising. Something interesting I saw: look at the lower-left woman's foot at 1:59, the network gave her a high-heel!
I am holding onto my papers ✋📝
Awesome coverage of PiFuHD!
I myself tried to explain the paper as well as I could a couple of months ago, but I love your viewpoint on it, as always!
Keep up the great work! I love to see someone else explain the same paper as I do, so we can have multiple views and better understand it.
Super clear and interesting as always!
You are very kind, thank you so much for dropping by! Agreed, the more angles, the better!
@@TwoMinutePapers "The more angles, the better" in a paper that generates the back from a still picture of a front-facing person... I see what you did there!
This is a huge step already, it doesn't need to be perfect, that can be done later.
2 more papers down the line, you can take an entire dance performance and throw it onto any Blender model of your choosing.
Imagine this for VR, you could video call someone (who has no special equipment) and see them in 3D
While the technology is truly impressive, seeing as it can reconstruct so well from limited resources, I do hope there are AIs in the works that utilise similar techniques for photogrammetry. An AI like the one shown could use the large amount of data about vertex positions and combine it with detail inferred from the photos. A sort of 3D AI-based denoising. It would also cut down on the number of pictures needed to get quality models.
"2 more papers down the line"
Born too late to explore the sea, too early to explore the stars, but just right to explore AI! What a time to be alive!
Wow! Truly astonishing! Probably I did not hold on to my papers, because the backside blew me away! Thank you Károly for presenting it!
So computers can recognize and reconstruct super hot models.
We're only a few steps away from this ai recreating original characters by just posing for a picture. Amazing.
These 3D models of people show just how similar we are. When there are no details like those in the boat, we look even more similar, so similar that I can't even tell what the difference is.
I have used their algorithm but never knew any in-depth information about it. Thank you for making this video.
Wow! I really hope the next paper on this topic covers smoothing the animations of the generated models :)
I see massive potential for this! Feed the model one picture of the front and one picture of the back of a person t-posing, and BOOM! Instant photoscanned human! Slap a rigify rig on that bad boy and download some animations from Mixamo and you've got a passable CGI human you can use in an Ian Hubert-esque shot!
WHAT A TIME TO BE ALIVE!
Amazing work! Thank you for sharing. I particularly like reading the paper and the Colab notebook that researchers shared. Outstanding!
Sometimes I wish I could just try these experiments for myself. Just upload a picture and let it generate a model. I know it's computationally expensive to do that, but still, one can dream.
Interestingly at 2:10 in the top right image it identifies one of her legs to be shorter than the other, rather than being the same length but positioned deeper into the plane of the screen.
YouTube compression when the wireframe people start moving: *_I'm about to blur this man's whole vid_*
The crazy thing about this channel is that the content just keeps getting better naturally. Of course props to 2MP for the editing of the video, but the actual computer science and presentations are exponentially increasing in quality
These are complicated meshes of all the combined data, but they still need to make decisions, like not assuming a boy wearing socks has high-heeled shoes when his feet are hidden from sight.
This makes motion capture so much easier.
Can't imagine what this will be 2 papers down the line. Amazing!!🙏🏼
I think 3 papers down the line they will be able to create a moving model of the person, even with clothes movement depending on the arbitrary path you 'command' the character to follow
I can't wait until 2050 when my boss is using this to make sure I am working
Imagine combining this with reliable holographic technology and finally being able to deliver Order 66 like Darth Sidious intended.
I think this could really be improved by a latent model of human surface anatomy (so instead of directly inferring the shape, it infers the human's vital appearance stats like height, body mass, etc.), as well as some temporal coherence (when run on a video).
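One cheap way to get the temporal coherence mentioned above is to smooth the per-frame estimates (whether latent body parameters like height, or raw vertex positions) with an exponential moving average, so a single flickery frame can't yank the model around. A hedged sketch with made-up numbers, not anything from the paper:

```python
# Illustrative temporal smoothing of per-frame parameter estimates via an
# exponential moving average (EMA). Purely a sketch.
import numpy as np

def ema_smooth(per_frame_estimates, alpha=0.3):
    """Exponentially smooth a sequence of per-frame estimates.

    per_frame_estimates: (F, D) array, one D-dimensional estimate per frame
    alpha: weight of the newest frame (1.0 = raw/flickery, near 0 = frozen)
    """
    smoothed = np.empty_like(per_frame_estimates, dtype=float)
    smoothed[0] = per_frame_estimates[0]
    for f in range(1, len(per_frame_estimates)):
        smoothed[f] = alpha * per_frame_estimates[f] + (1 - alpha) * smoothed[f - 1]
    return smoothed

# Noisy per-frame "height" estimates jittering around a true 1.75 m:
rng = np.random.default_rng(0)
raw = 1.75 + 0.05 * rng.standard_normal((100, 1))
smooth = ema_smooth(raw)
```

The smoothed sequence has visibly less frame-to-frame jitter than the raw one, at the cost of lagging slightly behind fast real motion - the usual trade-off for any temporal filter.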
Oh man, I love it when AIs construct missing information in a plausible way. This offers so many possibilities.
What a time to be alive!
"It can also reconstruct the backside"
well, you're not exactly wrong...
When I was younger I always wondered how holograms in movies would actually work but seeing all these AI constructed images it is clear as day how viable something like that can be.
my paperweight wasn't heavy enough to hold onto my papers
I’m ready to be in GTA 8
Now you just need to use the video footage to interpolate and smooth a full model rather than recreating it from scratch each frame. That way each frame adds to the precision of the entire model, so it would only take a 360 video to get a near-perfect 3D model. Use each frame as you would use 2 cameras to get 3D depth, except instead of 2 eyes you would have hundreds from lots of different angles.
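The "each frame adds precision" intuition above is just noise averaging: if every frame gives an independent noisy estimate of the same (already aligned) surface points, fusing F frames shrinks the error roughly by the square root of F. A toy demonstration, which assumes alignment is already solved (the hard part in a real pipeline is estimating camera pose per frame):

```python
# Toy fusion of per-frame reconstructions: averaging aligned noisy point sets
# reduces per-point error roughly as 1/sqrt(number of frames). Sketch only.
import numpy as np

def fuse_frames(per_frame_points):
    """per_frame_points: (F, N, 3) aligned noisy point sets -> (N, 3) fused."""
    return per_frame_points.mean(axis=0)

rng = np.random.default_rng(1)
true_points = rng.uniform(-1, 1, size=(500, 3))                  # ground truth
noisy = true_points + 0.05 * rng.standard_normal((60, 500, 3))   # 60 frames

fused = fuse_frames(noisy)
single_err = np.abs(noisy[0] - true_points).mean()   # one-frame error
fused_err = np.abs(fused - true_points).mean()       # 60-frame fused error
```

With 60 frames the fused error is several times smaller than any single frame's, which is the statistical reason a 360 video beats a single photo.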
We need easy demos that everyone can use.
What a time to be alive indeed, those darn AIs taking all our jobs.
In 2 papers the video will be better than the still image, because they will somehow use all the details from the previous and next frames to add even MOAR DETAILS.
Great, all we need now is for it to create mind simulations and we got Roko's basilisk.
This could be useful for game dev: you can quickly and easily create a diverse "cast" of background characters that look realistic and not just as if they were copies.
As soon as it learns to wrap and complete the image around the 3D model we could make some cool looking hologram effects in videogames.
Having this in realtime + AR glasses could let two people see each other's entire bodies as if they were in the same room while being on different sides of the globe!
Thank you Mr Dr Man, I really appreciate you showing us a glimpse of the near future :D
Now imagine creating a space that has cameras set up in a room. Now imagine inputting this into that space. Now think of how that could be used as a real-time representation of you in VR.
Damn, I tore my paper!
As Archimedes said - give me a normal and depth and I can relight the world!
You remember as kids when there was a sci-fi thing and they would look at something on a computer and they would go ENHANCE or ROTATE or ANALYZE and the computer would do some crazy thing and we'd all go 'PFFFFFT that's not how computers work!'
Well. it is now.
What about the texture/colour from the photo? Can this be brought onto the 3D model??? Love this... what a time to be alive. Please let me know!
I'm still overwhelmed and baffled by what deformation simulation processing has achieved.
This makes all my Motion Capture classes seem useless... but at the same time... THANK GOD.
I had to hold 2 stacks of papers for this one
Imagine making a game or something, and you need a character model, so you snap a picture of someone, and import it.
Soon we will be able to project our avatars over our skin
I'm going to walk into Harvard's faculty room.
If they ask me to leave, I'll tell them I'm a fellow scholar.
If they ask for my credentials, I'll tell them to hold on to their papers! And show them Two Minute Papers.
Then I'll sit on a chair, put my arms behind my head and my legs on the desk, and say "what a time to be alive!"
Hellraiser quote: "We Are No More Surprises"
This is the Chinese Gait Recognition that the Minute Hour predicted.
NO WAY ITS OUT PUBLICLY IM SO HYPED AAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Okay I don't know what to expect from 2020 anymore... This is really amazing :O
So many papers perfect for illusion games
This is the start of a new 3D revolution.
Are you kidding me??? *DONT EXPECT MIRACLES?!?!?!?* THE STILL IMAGES ALONE ARE MIRACLES
I wish this could extract textures as well. Maybe in the near future
A couple of years from now, imagine what applications this technology will see in the video game and movie industries...
This AI is like every young guy out there, estimating the invisible backside of each model.
Can't wait to mocap from a single footage.
For those of us who are here watching this on its upload date, November 3rd, 2020, we should understand that we were never part of the problem.
In the future, there is no escape from the machines
and so it begins
WHAT A TIME TO BE ALIVE !
What about using this technique to capture and reconstruct geometry of people for use as background NPCs in video games? Of course this would need to get a lot more accurate, but what if, instead of modelling each person, you could pay a person off the street to model for you? You would obviously need poly count constraints, but this could work to alleviate some work from 3D artists.
I live for hearing "What a time to be alive"
Anybody else recalling "The Most Realistic Video Game" from Studio C and making comparisons?
This + V.r. = A literal Virtual you!
I wonder if there will ever be an AI that can remaster old video footage by showing it some of the seen objects from good-quality reference images. That's my dream: to make old pixelated videos from VHS into HD by giving the software good scans/photos of the objects from the video.