This AI Learned To Stop Time! ⏱
- Published 8 Sep 2024
- ❤️ Check out Lambda here and sign up for their GPU Cloud: lambdalabs.com...
📝 The paper "Neural Scene Flow Fields for Space-Time View Synthesis of Dynamic Scenes" is available here:
www.cs.cornell....
❤️ Watch these videos in early access on our Patreon page or join us here on YouTube:
- / twominutepapers
- / @twominutepapers
🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
Aleksandr Mashrabov, Alex Haro, Alex Serban, Andrew Melnychuk, Angelos Evripiotis, Benji Rabhan, Bryan Learn, Christian Ahlin, Eric Haddad, Eric Martel, Gordon Child, Haris Husic, Ivo Galic, Jace O'Brien, Javier Bustamante, John Le, Jonas, Kenneth Davis, Lorin Atzberger, Lukas Biewald, Matthew Allen Fisher, Mark Oates, Michael Albrecht, Nikhil Velpanur, Owen Campbell-Moore, Owen Skarpness, Ramsey Elbasheer, Robin Graham, Steef, Taras Bobrovytsky, Thomas Krcmar, Torsten Reil, Tybie Fitzhugh, Ueli Gallizzi.
If you wish to appear here or pick up other perks, click here: / twominutepapers
Thumbnail background image credit: pixabay.com/im...
Károly Zsolnai-Fehér's links:
Instagram: / twominutepapers
Twitter: / twominutepapers
Web: cg.tuwien.ac.a...
So much progress so quickly! The new camera 'warp stabilisation' without cropping (at the end of your video, i.e. 6:22) will be very valuable just on its own, let alone things like being able to track out or in so that it doesn't look like a zoom. This could save a fortune on a shoot, because setting up for a track, even with a slider, still takes a lot of time, and there is always the matter of pulling focus (though some AI algorithms can re-sharpen soft shots, I think!).
hello time traveler
19 hour old comment on a nine minute old vid
yo is covid-21 a thing in 19 hours?
@@strebicux6174 patreon preview(
Watch out blurry, shaky Bigfoot and UFO videos, AI is coming for you!
Edit: actually, I want to see what this paper would do to the moon landing footage and the Zapruder film.
Nerf has come a log way since the supersoaker in the 90's
yes.
😂😂😂 lowkey lowkey
@The Xtreme Crafter log*
@@pig_master101 *ln
Lol. NERF or nothing!
Lonnie Johnson was a genius, did some work for NASA JPL. I think that's when he came up w/ the super soaker concept by accident. Ended up selling his product to Hasbro which put it in their NERF line. Made my childhood summers extra fun, completely changed water gun fights
I see
that means it's the same type as Star Platinum
The W doesn't stand for Wild, it stands for WARUDO
@@nixel1324 And D-NERF stands for DIO-NERF
I was waiting for this!
Why are these companies so stubborn about admitting that they made a JoJo reference?...
No
I feel like every day we get closer to actually being able to "zoom and enhance" like on TV cop shows
Except it's not real information, so it would hopefully never be admissible in court.
@@Yobleck let’s hope
@@Yobleck Even if it's not legitimate evidence in court, cops could and probably would use it in interrogations. Kinda like lie detectors: even though those tests aren't real evidence, police can use them to pressure you into a confession.
AI Superresolution is actually another big topic of study!
The funniest aspect is that mainstream media is finally catching on that all that 'zoom and enhance' nonsense of older shows was unrealistic...
I'm waiting for the video where the A.I. makes one for him lol.
Where he shows how the AI completely made his latest video by itself.
Plot twist; the vids are already ai
He’s already made videos on AI writing his script and doing text-to-speech with his voice. Just need something for the visuals
A while ago I thought there was one. In the AI-watching-video-games one, I was told of a switcheroo in the comments and kept waiting for the reveal. But then it turned out the switch was just showing two videos side by side, and then he revealed the faster one was actually the AI
Laughs in carykh
I love that in all of Two Minute Papers videos, what was once a "new method" becomes a "previous method". What a time to be alive!
Next step: AI learned to reverse time
Final step: AI learned how to kill John Connor with a sniper rifle and not with a stupid pistol and shotgun)
To be fair, a well-calibrated android with a good AI could kill you from really impressive ranges with just a pistol. As soon as it gets line of sight on you, it should already have been able to aim the pistol at you accurately and pull the trigger. No chase scenes, nothing. It sees you, and you are dead before you can think.
Fighting against a well-made war machine of the future is like fighting against an aimbotter. The most unrealistic part of the Terminator movies was that humans could ever stand a chance.
@@josephburchanowski4636 Pretty much. Still, perhaps a higher consciousness would somewhat weaken the machine's simpler functions, like being able to lock onto a target in a few hundred milliseconds and take headshots from miles away. We can see that when multiple functions are stacked, the whole process slows down considerably. So you can either have infiltrator-terminator-type androids that can pass off as human, with all the computationally intensive simulations, or you can have a relatively simple Spot-like robot with an aimbot turret on its back, 360-no-scoping people from miles away at some three kills a second. But it can be taken out easily, as long as you don't get in its FOV.
@@josephburchanowski4636 Terminator but it's CS spinbotters. Great Corridor or rocket jump video idea
@@josephburchanowski4636 Even a perfectly aimed pistol gives a much higher dispersion on the target than a rifle. The barrel is too short, so an imperfection of the same size results in a larger deviation from the straight line. Hence, if I was the Terminator's boss, I'd order it to obtain a rifle.
@@josephburchanowski4636 so true! I mean, or just literally send back a virus that is specifically synchronized to John Connor's DNA and programmed to kill instantly. The entire movie would have been just, John Connor being a miscarriage, checkmate. Granted, it was the 90's and people loved explosions... they still do most likely.
Even a few years from now, image and video editing is going to be insane. My biggest problem with these is that I never know how to actually use these algorithms.
You won't need to. They'll show up as plugins in your editor (2d or 3d).
I think the biggest problem will be that we won't be able to trust any images or videos.
The security cameras business may be destroyed - or find a way to digitally sign their pictures. I'm not sure it would be easy, given that cameras - and the signing keys - are in control of who knows who.
@@DillonThomasDigital But they don't at the moment
@@vladimirdyuzhev It's still useful for the person who owns the cameras, assuming they haven't been hacked somehow, even if it isn't reliable evidence
@@vladimirdyuzhev That problem is still relatively far in the future. These algorithms leave traces. My main concern is how this'll increase distrust for less educated people. And I don't mean that in a demeaning way, I mean people who don't have a basic understanding of how these things work. Which is a pretty big group as of right now.
The future will depend more and more on government entities to be transparent and educate the people. The future powers of the world will be the countries that manage this and maintain peace within their own borders.
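To sketch the "digitally sign their pictures" idea from this thread (stdlib Python only): a real camera would use an asymmetric signature with the private key in secure hardware, so the shared HMAC key below is just a made-up stand-in, and the frame bytes are placeholder data, not a real image format.

```python
import hashlib
import hmac

# Stand-in for a key provisioned into the camera's secure hardware.
CAMERA_KEY = b"secret-key-burned-into-the-camera"

def sign_frame(frame_bytes):
    """Produce a tamper-evident tag over the raw frame bytes."""
    return hmac.new(CAMERA_KEY, frame_bytes, hashlib.sha256).hexdigest()

def verify_frame(frame_bytes, signature):
    """Check a frame against its tag; any edit invalidates it."""
    return hmac.compare_digest(sign_frame(frame_bytes), signature)

frame = b"\x00\x01\x02 raw pixel data"
tag = sign_frame(frame)
print(verify_frame(frame, tag))              # True: untouched footage
print(verify_frame(frame + b"edited", tag))  # False: any AI edit breaks the tag
```

Of course, as noted above, this only proves the bytes came from whoever holds the key, which is exactly the hard part in practice.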
I gotta admit, I sometimes don't hold on to my papers and just continue watching the video
I'm using NeRFs for my Bachelor's thesis and it's super exciting stuff. The basic idea behind it isn't all that complicated; all these papers do (on a high level) is add new input dimensions (time/latent codes/etc.) and play around with how the camera is interpreted. It is amazing how such "simple" adaptations produce such good results. I'm trying to replicate papers that use NeRF for facial avatars, i.e. video synthesis of talking heads; the basic idea is simply to interpret head movement as camera movement (to get the input angles for training the NeRF) and add expression parameters captured with some face model to the input. Voila, controllable talking heads! I'm sure this channel will review those papers in the future; it's the next level for "deep fakes" imo.
What would be neat is to nerf scan your head and then animate it based on mic input.
@@Klaster_1 I think long term this is what is going to happen. There is already a paper using DeepSpeech that learns a NeRF from a video and audio signals, then you simply change to a different (feature encoded) audio signal and you get as output a video of the person saying the new audio (there's some extra fine tuning involved). This paper already looks better than any other GAN etc approaches. If you are interested, google AD-Nerf. Sadly, most of these really promising papers did not release their code yet and I'm really struggling trying to reproduce them.
Hey I’m trying to use this effect in a music video could you help me?
@@mercerprince1991 Sadly, unless you are familiar with computer graphics and machine learning, this might be difficult. You can navigate to the project page and then to the github directory of the paper. Then you need to accomplish two things: Firstly, install the model with all its dependencies, secondly preprocess the video using colmap and the given scripts. Then you need to train the network, which, unless you own several high end GPUs, will take several days. Once you trained it on your data, you need to understand how to generate outputs, which is another difficult task unless you're experienced in the field. This stuff is not really ready for the general consumer.
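To make the "add new input dimensions" point from this thread concrete: a NeRF is just a function from (position, viewing direction) to (color, density), and a dynamic NeRF simply appends time to the input. The sinusoidal positional encoding below is the standard trick from the original NeRF paper; the function names themselves are just illustrative, not from any paper's code.

```python
import math

def positional_encoding(p, num_freqs=4):
    """Map a scalar coordinate to [sin(2^k * pi * p), cos(2^k * pi * p)]
    for k = 0..num_freqs-1, so the network can represent fine detail."""
    out = []
    for k in range(num_freqs):
        freq = (2 ** k) * math.pi
        out.append(math.sin(freq * p))
        out.append(math.cos(freq * p))
    return out

def encode_input(x, y, z, t, theta, phi, num_freqs=4):
    """A dynamic NeRF's input: the usual position (x, y, z) and viewing
    direction (theta, phi), plus the extra time dimension t."""
    enc = []
    for coord in (x, y, z, t, theta, phi):
        enc.extend(positional_encoding(coord, num_freqs))
    return enc

enc = encode_input(0.1, 0.2, 0.3, 0.5, 0.0, 1.0)
print(len(enc))  # 6 coordinates x 4 frequencies x (sin, cos) = 48
```

The trained MLP then maps this encoded vector to an RGB color and a density, which a volume renderer integrates along each camera ray.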
How could one, if at all, extract vertex data from the synthesized frames? I initially thought the AI was doing this, but perhaps it is just outputting pixels based on your desired viewing angle?
I am 100% sure people will use this to create memes someday.
edit: a word and removed a space
jojo reference
The Warudo technique
Memes and porn, basically
For example, people were afraid deepfakes were going to be used for fake news and fake information on social media... it's obviously become porn and memes XD
TOKI WO TOMARE
@@calexito9448 always is.
These men really bent time and travelled back to comment 18 hours ago
Or they're Patreons
WHO KNOWS 🤷♂️
@@romainhedouin *members of the channel
These are all ideas that many of us would have dreamed about, but here are some people actually making them happen. 👍
These are my FAVORITE types of AI techniques. Spatial reconstruction is what I am very excited about!
Imagine being able to create a full VR experience just by taking a few videos and pictures from an event! I would love to see something like the Red Bull soapbox race from any point of view and in full VR!
Literally just thought about this too! It would really help with immersion and hopefully take care of sharpness.
“wait, whats that in the reflection in the murderer’s sunglasses? _Enhance!!”_
is now possible?
Even the reflection in the eye of a fly that reflects in the mirror. It's typically a rickroll once enhanced.
Pretty much, or it will be soon. The problem is that the enhancements will put what the AI has learned to expect to be there, just like a human artist would by using their imagination. It won't be what was there in reality, except maybe by lucky chance.
@@BrooksMoses Yes, all it does is generate something that looks convincing, it can't add information that wasn't there
@@circuit10 It can and does, which is the issue. It can't add the information that was there but concealed. It just comes up with something, anything, that never was there.
@@dingus_doofus That’s what I’m saying, it makes up something that looks convincing
5:30 The shadow of the man moves on the grass as if it's not the camera moving but the man himself.
Fantastic! The new shot with the guy backflipping is impressive, and it seems to realize that his shadow is connected to his position in how it's cast, but doesn't quite get that the light-source isn't moving. Still, it's fantastic what it can do.
Dang I did not think the "Halliday's Journals" from Ready Player One could ever really exist but I stand corrected and that's awesome!
Honestly speaking as a filmmaker the stabilisation application is really exciting. The amount of footage this could save would be invaluable. Seems like the technology is only a year or two away from something that could be implemented in editing software. My papers have flown away at this point, I couldn't hold on.
You scroll down to find some za warudo memes don't ya.
damn you're right
The only reason I'm here
_"Toki wo tomare..."_
I don't know what you're talking about 😨
let us do our job then...
If there was ever a channel that should have done an April Fool's video, it's Two Minute Papers
No one would have known, because every paper he covers is so mind-boggling
What could be more unrealistic than the evolution of these real technologies? hahahah
@@a.thiago3842 Maybe something like "New AI technique synthesizes entire movie from text descriptions!" xD
It would be so dangerous, because 99% of the viewers would believe it and maybe tell their friends, and their friends their friends...
It would be the perfect time to make an entirely AI generated video
"Praise the papers!" lol I love that!
5:30
I love how the shadow is hovering over the ground while time is frozen 😂
This is remarkable!! Now that newer phones are coming with LIDAR tech, the additional information is sure to enhance the accuracy even more!
AI: watch jojo
After learning
AI: fool do you think iam Your AI
You wrong this is me DIO "tokio tomare *za warudo*"
BBBAKAMONOGA
Heey Baby!
It’s absolutely crazy how rapidly these types of things are advancing!
You know, it would be 10 times scarier if AI were able to generate HD one-minute videos with sound just from an idea
I'm guessing it will exist in our lifetime. Not even in a very very distant future.
Yeah, no, that's bound to happen. I'd bet it'd be possible within the next 15 years; no later.
2-5years
Imagine being able to create an entire new cartoon or anime episode just by typing the script or giving some ideas
@@LightVelox That will cause a sea of low-quality copycat animes :(((( and yet it is bound to happen.
I can totally imagine deep learning teaching itself to infer 3D scenes from 2D images.
1. With the help of some 3D simulation software, the AI can generate a fictional 3D scene with trees, rivers, grass, humans, animals etc.
2. The AI then takes snapshots of the 3D scene for training.
3. Based on these snapshots, it can score itself by comparing the inferred 3D scene vs the actual 3D scene it generated, thus improving the model. This can be repeated indefinitely.
4. Once trained, the model could probably make accurate inference on 3D scenes from 2D images, even from completely different angles.
I would really like to see this applied on animated media. Characters which only have a 2D representation and automatically generating a 3D view of them.
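The numbered steps above can be sketched as a toy example: everything here (`make_scene`, `snapshot`, the depth-recovery step) is a made-up stand-in, just to show how a generated 3D scene gives free supervision for inferring 3D from 2D, since the ground truth is known by construction.

```python
import random

rng = random.Random(0)

def make_scene():
    """Step 1: the 'simulator' generates a fictional 3D scene (here: points)."""
    return [(rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(0, 1))
            for _ in range(8)]

def snapshot(scene, shift):
    """Step 2: take a 2D snapshot - a toy projection where the apparent
    x position depends on depth z and the camera shift."""
    return [(x + shift * z, y) for (x, y, z) in scene]

def loss(pred_scene, true_scene):
    """Step 3: score the inferred scene against the one we generated."""
    return sum(abs(a - b)
               for p, t in zip(pred_scene, true_scene)
               for a, b in zip(p, t))

# Step 4: infer depth from two snapshots; under this toy projection the
# apparent-position difference between the views is exactly the depth.
scene = make_scene()
view0, view1 = snapshot(scene, 0.0), snapshot(scene, 1.0)
inferred = [(x0, y0, x1 - x0) for (x0, y0), (x1, _) in zip(view0, view1)]
print(round(loss(inferred, scene), 6))  # -> 0.0, perfect recovery here
```

A real system would replace the closed-form inversion with a trained network and a differentiable renderer, but the self-scoring loop is the same shape.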
Incredible! Video stabilization is something that has been researched a lot and this is an amazing breakthrough.
"this ai learned to stop time!"
jojo fans : ..........he what?
Sometimes a paper comes across this channel that makes me downright _excited_ .
This stuff is making things that I grew up knowing as a punchline for its impossibility into automated systems and it's amazing to watch develop.
The video stabilization in the last example reminds me of what foveated rendering could look like in a VR game. We're approaching being able to do this in real time, like in Minority Report, with any video you drop in becoming a fully realized environment. Really the only thing missing is the hologram.
5:34 I'd also like to point out that the shadows are moving, in a "frozen" image so there's still progress to be made
AI: I can stop the time.
Space bar: Am I a joke to you?
Looks like we are finally approaching what I've been hoping for for a while now. I've been thinking there should be a way to effectively "integrate" over a video to create a "scene" that represents the area in the video and the events that happen in it. I'm not sure how close this is to effectively generating an internal 3D representation of an entire scene, but it looks like it's heading there.
If it does, then the next step should be to add super-resolution to the method. It could work well with it for two reasons. One, any slightly moving video should be collecting new information about the scene that could be used to enhance super-resolution. Two, any model that generates an internal 3D representation of the scene should be able to effectively "re-render" the scene at a higher resolution like you would an animation project.
Video seems to get the short end of the stick when it comes to photogrammetry, but with a good enough algorithm/AI it should make a fantastic tool, given how easy it is to collect massive amounts of data with it
I can see this being used in the future for movies and maybe even in phone video and video editors. You kind of showed the potential of it at the end of the video with that stabilization demo.
Can you cover some new photo colorization methods? I feel like there is still no standard way to colorize black-and-white images that looks realistic; they all have a lot of flaws
Check this out! ua-cam.com/video/EjVzjxihGvU/v-deo.html
Another youtuber did a video about the current ones, the best is from the website "myHeritage" and it's near perfect, and it even makes short clips from still photos. I tested it with a few old pictures from my grandparents and was stunned
We're just two papers away from the comparison literally being the photograph investigation scene from Blade Runner
Imagine this in sports real-time replay with VR headsets
Beautiful. I wouldn't want official refereeing on this of course, but the replay potential is awesome
Got some pretty strong Matrix bullet-time feelings from this technique
The old NeRF output at 2:12 looks really cool
Imagine what shooting with a great camera could do! Mind blown!
Loved the shadows around 5:31. Seems pretty interesting how that actually happens in the NN.
5:30 moving shadow is trippy as hell
'What a time to be alive!' - wonderful enthusiasm! Thanks for the great video!
Imagine being able to watch a movie in 3D from different angles. That would be sooooo amazing.
AI learned how to stop time
*Meanwhile my AI is having a hard time classifying cats and dogs*
Prob underfitting
looking forward to that stabilisation! and also maybe in the future quick and cheap 3D scanning!
YouTube: *recommends this channel
Me: *finds a random piece of paper before watching
My first thought is to try to apply it to old TV shows/movies that were filmed in poor quality and/or different aspect ratios. Currently we have to choose between stretching/scaling or shrinking it and stuffing it inside the actual frame. It would be lovely if it could just expand the view to whatever resolution/ratio I'm using. I'm sure there are better applications, but just the same, let's start with giving me my childhood media in all of its full-HD glory. There are already some amazing techniques that bring up the frame rate; I would love to see it all combined.
NeRF can't produce things in higher quality than your input; its strength is novel view creation, i.e. "interpolating" between different views of the same scene. What they do in these papers is use tricks to interpret time/camera/head movement as new viewing angles/parameters for the NeRF. You need a scene and multiple views of it to train a NeRF to represent that scene, but a representation can't really be better than what it is trying to represent.
I really wish more people could witness this, and how amazing, incredible and unbelievable things are becoming each day. But instead, we have to deal with flat-earthers, people who believe we've never been to space or the Moon. How could they believe we'll soon be able to travel around a photograph taken decades ago? That's mindblowing!
6:22 Really hoping to see this get implemented in end user products like premiere pro.
VR would be a practical application as well. If NeRF scientists and VR people talked, it would be productive
Man, 2022 will be the explosion of VR. Better save some cash, bro, because you're gonna need it to experience this!
Imagine what this could do for google street view!
120 Hz wide variable-focus displays in VR (so you can shift focus like IRL and not have everything fixed at infinity) with this technique for Street View!! Yes please
@@bl4ck1911 And moving down the street could be almost seamless. No more weird disorienting warping effects
Wow, can't wait to see this method used with historical footage!
Going full Cyberpunk 2077 this episode I love it!
2:37 is the reason I binge watch these videos hahaha
This is exactly what I was thinking about yesterday, but with stereo video!
It is amazing... just one observation: in the freeze-time examples of the dog and one of the jumping guys, the shadow moves with the camera as if you were moving the light.
I still think there are too many pieces of information per slide (I decried the blurring in a previous comment). This time it was hard for me to appreciate the 3D video stabilisation with the other dancing frame next to it. Guess I have to take a piece of cardboard or move half of the window outside of visible screen space. What a time to be alive!
5:30 Very good, but it needs a shadow detector; look at the jumping guy's shadow
Question (may be dumb): if the algorithm can predict the picture from angle (x + e) out of the picture from angle x, is it possible to use the prediction for angle (x + e) as an input, and therefore predict (x + 2e)? What would be the limit of that? Thanks
The NeRFs use images from many angles as inputs (in videos it's simply the angles that result from the camera movement), and in a way tries to "interpolate" between those. I'm not sure how familiar you are with machine learning, but it learns a representation of the scene as a "radiance field" which is nothing more than a function that takes as input a position in space (or in this case time as well) and a viewing angle and outputs a color and a density.
It is in general good at predicting anything within the "minimum/maximum" angle/time but very bad at anything beyond that. If you have input images from angles 0-180, it won't be good at predicting things from an angle greater than 180, because there's no information in the inputs about that. If the object is symmetric, it might output something reasonable, but this is not because the model has any understanding, more of a lucky accident.
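A toy illustration of the interpolation-vs-extrapolation point above, with ordinary 1D interpolation standing in for a trained NeRF; `observed_color` is a made-up stand-in for a real scene, not anything from these papers.

```python
import math

def observed_color(angle_deg):
    """Ground truth: a scene property that varies with viewing angle."""
    return 0.5 + 0.5 * math.sin(math.radians(angle_deg))

# "Train" on views from 0-180 degrees: just store the observed samples.
samples = {a: observed_color(a) for a in range(0, 181, 10)}

def predict(angle_deg):
    """Interpolate between the two nearest training angles. Outside the
    training range there is no information, so we can only clamp."""
    lo = max((a for a in samples if a <= angle_deg), default=min(samples))
    hi = min((a for a in samples if a >= angle_deg), default=max(samples))
    if lo == hi:
        return samples[lo]
    w = (angle_deg - lo) / (hi - lo)
    return (1 - w) * samples[lo] + w * samples[hi]

print(abs(predict(95) - observed_color(95)) < 0.01)    # True: inside the range
print(abs(predict(270) - observed_color(270)) > 0.4)   # True: extrapolation fails
```

Feeding the model's own prediction back in as input, as the question proposes, compounds this: each step adds error, so the (x + 2e) prediction degrades even faster than the (x + e) one.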
Now imagine that this method could also detect common things and apply a simulation to them, like a bone structure to humans or a mesh model to geometric shapes. Then you could interact with the things and reshape them on the go, like on-the-spot CGI. You could also drastically reduce the information (and bandwidth) needed for a set scene, like a videoconference, since it's fully defined by the parameters of that simulation and could be rendered with a graphics engine.
Could this possibly be used to upload a lot of old family footage and generate a map of their home from nearly 100 years ago? I think it would be an amazing gift.
Seems we are getting closer to Blade Runner's enhanced photos where Deckard almost navigates the picture like if it was a 3D environment.
Two papers down the line, AI is going to be manipulating the space time fabric itself.
DIO!!!
Truly looking interesting. The first AI use that doesn't seem useless.
2022: AI learns how to reverse time like the Flash
Cue epic JunkieXL Flash theme
I am so glad this wasn't posted yesterday!
Ok I know it’s not what you want the program to do but 2:10 looks really cool
Kool. Needs to be hybridized with a GPT-3 image-to-language abstractor, to create poetic image continuations that encapsulate the thematic elements, like some dreamy recollection of the past lol...
That is INCREDIBLE progress. My papers have blown up at this point.
This is great, I hope it gets implemented in video editing Software next to persistence tracking and stabilization
Company: **Makes AI**
AI: _You fool! You shall soon know the true power of the world! As its name suggests, its power can rule the whole world!_
It seems you can use AI for all sorts of things, just because it's such a powerful tool. Because of that, I'm wondering if you think it's possible to make an AI programmed to make you a better person.
First the Ai must know what humans consider "a better person"
@@Felhek Facebook was able to use an AI to purposefully make people angry (true story), so they were able to define that correctly. Maybe it's harder to define what makes someone a good person, but AIs are clearly capable of it.
5:34 Well, the shadow of the jumping man shouldn't really be moving, it is not dependent on our position. Next paper maybe?
now, we will have a wonderful time to live in... *forever.*
2 Minute Papers is becoming 10 Minute Papers
What a time to be alive!
The NHL does some pretty neat tricks with that in their replays. Coupled with multiple camera angles, this is very impressive
5:35 Apparently the shadow is a physical object attached to the wall, instead of being projected onto the ground...
Oh no! Pretty soon we won't be able to make fun of "can you enhance that?" scenes in CSI anymore!
Technically all of this is very cool.
Practically I really am not sure if I like the concept of augmenting video footage. Like, do I really want my doggo to have fake feet?
ZA WARUDO!
You thought it was a Robot but it was me DIO!
This can cause a revolution in video compression.
man that's amazing
we can get meme videos from 6 years ago and remake memes through that..
This is mindblowing
I don't understand the use of this. Maybe it can create PBR textures just from video? Or smooth a jittery video? But I cannot see intense AI usage like GPT or GANs, because all the depth and 3D perception and 3D scene building are already done super fast with the help of the iPhone's LiDAR sensor and other handheld sensors
There are vast applications of NeRF in general, time is just the first step. You can add any dimension to the input, such as expression parameters for faces. NeRFs are the next big step in "deep fakes", expression transfer, etc.
wow, this is almost like the photo analysis we see in the movie Blade Runner.
Always thought how ridiculous that was back then, not anymore
This is what exponential progress looks like
Training for two days on two V100 GPUs to get these results (per 15 s clip). I dunno, but at some point the training time becomes relevant when it's this extensive.
I don't know why, but when the "hold on to your papers" symbol comes up it gives me a sense of joy 😂😂🥺
2:45 Thought I had dropped my phone in the dark.
CSI 2030:
- we got photo of possible suspect's strand of hair
Ok, zoom out and pan left.
- got him!
Nono, wait, zoom out and pan up
- home address, nice, sending a swat team.
Now zoom in on the 2nd window on the upper floor, remove the wall and enhance
- Jerry, not everyone wants to look at someone's spouse on the toilet...
- Classic Jerry..
- goddamn Jerry
So... The AI is Dio eh?
I wish that D-NERF and NERF-W could combine and be shared publicly... I have a somewhat perfect scenario I want to test it on.
In 50 years Two Minute Papers will upload a 3D virtual neurofeed with the same title, about an AI that can for real stop time
Any idea on when these technologies will be in consumer software? This is awesome!