Structure from Motion Octocopter - Computerphile

  • Published 24 Jan 2025

COMMENTS • 139

  • @mogami4869
    @mogami4869 9 years ago

    I like that you are actually recommending a book at the end each time, as compared to many other channels who are sponsored by Audible and just tell me to follow their link... thank you!

  • @TheVladBlog
    @TheVladBlog 9 years ago

    This is so good! We are slowly moving towards computer visual understanding. These guys have already developed a "basic" system for the AI to differentiate between different kinds of objects in its sight.

  • @hcblue
    @hcblue 9 years ago

    Such a cool device / technology. Thanks for covering both the theoretical / low-level subjects, e.g., algorithms, and more practical applications / projects, Sean!

  • @IDontDoDrumCovers
    @IDontDoDrumCovers 9 years ago +37

    the next Google Earth is going to be crazy haha, just imagine like 10,000 of these flying around cities taking pictures of everything in swarms

    • @GtaRockt
      @GtaRockt 9 years ago +9

      +Social Experiment and when they play the Ride of the Valkyries all the time

    • @zoranhacker
      @zoranhacker 9 years ago

      that could happen lol

    • @BoJaN4464
      @BoJaN4464 9 years ago +1

      +Social Experiment Google already does this with satellite images, no drones required!

    • @jmac217x
      @jmac217x 9 years ago

      +Social Experiment That's a scary cyberpunk future you just made me imagine. I'm kind of wishing it to happen...

    • @frollard
      @frollard 9 years ago +1

      +Social Experiment Not sure if you've seen this with current-gen Google Maps/Earth; in many cities it's all scanned in 3D already. It's creepy since it can show side views of houses that would normally be obstructed (i.e. by trees).

  • @NikiDaDude
    @NikiDaDude 9 years ago +4

    On a related topic I'd really like to see a video on the software Google use to generate the detailed topography in Maps and Earth.
    If you look at most major cities you'll see that all the buildings, trees and even small objects like cars have been turned into a polygon mesh and textured.

    • @klaxoncow
      @klaxoncow 9 years ago +1

      +Nick Yes, I'd love to know exactly how they've managed to do that.
      It covers the entire planet (correction: planets), so it had to be an automated process.
      But the textures show us the sides of objects - so it can't be coming from the satellite data alone, as satellites can only see things top down, they don't see the sides of things.
      My secondary thought was that maybe they've cross-referenced it with their Street View images. The satellites see from above and Street View sees it from the ground.
      But the thing is that Street View doesn't have full coverage of everywhere. Most of it is obtained by driving a camera car around, but that means it only covers what can be seen from the road. And not all roads are covered.
      As you can see when you try to drop the little yellow man on the map to trigger Street View, which highlights which roads have Street View coverage, there's plenty of places where the Google car never goes - because there are no roads, or the roads are private roads, or whatever.
      I thought "ah, they've cross-referenced it all with Street View and altitude data to automate a 3D map" - although, fair play, that's still one hell of a computational challenge unto itself not to be sniffed at, isn't it? - but it seems to have coverage well beyond what a satellite could ever see or Street View has coverage of.
      So, how the hell did they actually do it?
      And how did they do it so well? As I've not yet spotted a single mistake going around looking at this all over the planet. So it must be very good quality source data, as the automated process just ain't getting it wrong anywhere.
      So, yeah, if Computerphile could find a Google Maps engineer to explain that one to us all, I'd definitely be interested to know, as they have seemingly done the impossible there!

    • @manhaxor
      @manhaxor 9 years ago

      +Nick Yasutaka Furukawa is a Google programmer who created an MVS (multi-view stereo) algorithm that's used to make the reconstructions in Maps and Earth. I would like to see a video that explains a few of the different methods of 3d reconstruction.

    • @unaliveeveryonenow
      @unaliveeveryonenow 9 years ago

      +KlaxonCow Wrong, satellites can see things at a slight angle, but they have fixed orbits. So at least you would need images from multiple satellites. But how would they solve the problem of tall buildings covering small objects at their bases?

    • @stheil
      @stheil 9 years ago +1

      +cyberconsumer I seem to recall that they don't only use satellite images (both straight down and at an angle) but also aerial photography from planes (and/or helicopters). And those can cover a much shallower angle, obviously.

  • @jmac217x
    @jmac217x 9 years ago +7

    Hey Sean would you consider using a monopod stand for your videos? I know it's like the trademark to have that shaky cam, but it's a bit unsettling at times, especially when the camera is pointing a single direction most of the time anyway. I don't want to interrupt your seemingly quick workflow, but something like that could be easily maneuvered to adjust for those close up shots you get, and something with only a single leg wouldn't be too cumbersome to rotate or quickly move. Just a suggestion cause your videos rock.

    • @Computerphile
      @Computerphile  9 years ago +7

      Will consider, I try to use a tripod most of the time these days, only go handheld when the situation calls for it or if I haven't got access to my tripod (it happens) >Sean

    • @jmac217x
      @jmac217x 9 years ago +3

      Awesome. I knew that you must have used something for those videos with Professor Brailsford, but thought I'd mention it anyway. I really love those videos, for more than just their video quality :3

  • @JohnDoe-pn9ml
    @JohnDoe-pn9ml 9 years ago +2

    Videos like this make me wish there was an engineerphile.

  • @yomaze2009
    @yomaze2009 9 years ago

    It would be interesting to see different forks in the learning algorithm that focus on detecting properties of "challenging objects" and see how quickly it "decides" to use alternate techniques to get full coverage of the object. I imagine crowdsourcing the determination of what areas of a 3D model "need more work" by offering the 3D model alongside the video taken by the device online. Also include the ability for the "crowd" to slice into the 3D model to isolate the "least defined" component of the feature. The system could then attempt a solution and provide it back to the crowd for further analysis. Could be made to be fun as a game of sorts!

  • @GroovingPict
    @GroovingPict 9 years ago +29

    could you maybe film in a place with even more background noise next time? Thanks. I was almost able to focus on what he was saying in this one, and we can't have that now, can we.

    • @Computerphile
      @Computerphile  9 years ago +53

      Yeah if the world was perfect I'd film all interviews on a sound stage with a full camera crew and separate sound crew and pay the contributors so they have the chance to set up their equipment...

    • @GroovingPict
      @GroovingPict 9 years ago +4

      +Computerphile Yep, cause perfect studio conditions are definitely what I meant.

    • @unverifiedbiotic
      @unverifiedbiotic 9 years ago +11

      +GroovingPict Have you any idea how hard it is to find a quiet spot in a public space when doing an interview? I think that the microphone they gave the guy was doing a pretty good job all things considered. You can edit out some of the noise that stands out a lot and doesn't overlap with the audio you want to keep, but this is almost impossible if the entire audio is polluted and takes A LOT of time in each case and the end result may be even more distracting than the noise itself (audio artifacts). In the end, recording an interview and editing it is often more difficult than the viewer can imagine and you can really screw up your upload schedule if you don't teach yourself to ignore such minor issues.

    • @mindfulmike8612
      @mindfulmike8612 9 years ago +6

      +GroovingPict People have to work, and if they're going to interview in the place where work is actually being done there is going to be background noise. Quit being such a whiny little brat and appreciate the FREE CONTENT they're giving you.

    • @seanski44
      @seanski44 9 years ago +6

      +GroovingPict ah so you weren't talking about a perfect world? Do you think I chose this location? Get real!

  • @kensmith5694
    @kensmith5694 9 years ago

    Add some really good magnetic sensors and in a lot of cases you could image what is hidden by the outer surface.

  • @yomaze2009
    @yomaze2009 9 years ago

    Also, on the hardware/software side, this could benefit greatly from the work being done on the analysis of the polarization of reflected light to determine the absolute color, texture, reflectivity, etc. of the individual objects.

  • @Adamantium9001
    @Adamantium9001 9 years ago +2

    How does the vehicle keep track of its own position in order to report where each picture was taken from?

    • @xell2k
      @xell2k 9 years ago

      +Adamantium9001 It uses GPS, basically. The 3D reconstruction and camera motion are then automatically estimated from the images using the structure-from-motion pipeline (without GPS), and then simply "moved" to real-world coordinates using the rough GPS coordinates or ground control points. While flying, the vehicle localizes itself using GPS only.

    • @aritakalo8011
      @aritakalo8011 8 years ago

      +Manuel Hofer GPS or reference markers. You can see the markers on the table or in the field video. GPS is probably a little bit inaccurate for this job. GPS-only UAVs have a tendency to drift around somewhat while trying to hover precisely, since GPS is not really pinpoint accurate.
      Also, since they fly under overhangs etc., that might block GPS, so they probably use those geo-reference markers to get a reliable local reference.
      Some UAVs do this by themselves to some extent. They have downward-looking optical and/or laser scanners to scan the ground below to get an instant local reference for holding a hover. Of course it is relative and not absolute, hence the marker plates. From those you get an absolute reference with the optical scanner (a.k.a. camera).
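
A minimal sketch of the georeferencing step described in the replies above, assuming NumPy and entirely made-up coordinates: the SfM reconstruction lives in an arbitrary local frame, and a similarity transform (scale, rotation, translation) estimated from matching camera centers or ground control points moves it into real-world coordinates.

    import numpy as np

    def umeyama_alignment(src, dst):
        """Similarity transform (scale s, rotation R, translation t) that best maps src onto dst."""
        mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
        src_c, dst_c = src - mu_src, dst - mu_dst
        cov = dst_c.T @ src_c / len(src)              # 3x3 cross-covariance matrix
        U, D, Vt = np.linalg.svd(cov)
        S = np.eye(3)
        if np.linalg.det(U) * np.linalg.det(Vt) < 0:  # keep R a proper rotation (det = +1)
            S[2, 2] = -1.0
        R = U @ S @ Vt
        s = np.trace(np.diag(D) @ S) * len(src) / (src_c ** 2).sum()
        t = mu_dst - s * R @ mu_src
        return s, R, t

    # Hypothetical camera centers in the SfM frame and matching GPS fixes,
    # already converted to a local metric frame (e.g. ENU).
    sfm_cams = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.1], [1.0, 1.0, 0.0], [0.0, 1.0, 0.2]])
    gps_cams = np.array([[12.0, 7.0, 50.0], [17.1, 7.2, 50.4], [17.0, 12.1, 50.1], [11.9, 12.0, 51.0]])

    s, R, t = umeyama_alignment(sfm_cams, gps_cams)
    georeferenced = (s * (R @ sfm_cams.T)).T + t      # model points moved into world coordinates
    print(np.round(georeferenced - gps_cams, 2))      # residuals after alignment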

  • @EtrielDevyt
    @EtrielDevyt 9 years ago +1

    This is gonna be great for location building for games!

    • @manhaxor
      @manhaxor 9 years ago +1

      +EtrielDevyt there are more efficient ways, like a 3D scanner that takes color data as well as position data. Structure from motion is the best method for making a somewhat decently detailed reconstruction of a large area, but with flaws and distortions included.

  • @titaniumdiveknife
    @titaniumdiveknife 9 years ago +1

    Genius! Wish my German and programming were half as good as his English.

  • @Aragorn450
    @Aragorn450 9 years ago

    This sounds a lot like what senseFly does, but at higher resolution and with more autonomy. Their system is used for mapping cities some, plus keeping track of agriculture growth and all sorts of other things. I could see them approaching these guys to buy their technology for sure.

  • @JamesJansson
    @JamesJansson 9 years ago

    You should totally join Computerphile and Numberphile to cover 1) computer algebra systems and then 2) computational theorem provers.

  • @ZadakLeader
    @ZadakLeader 9 years ago +23

    The sounds of forks and metal things in the background...

    • @seanski44
      @seanski44 9 years ago +12

      Actually a lot of the noise was people doing demonstrations of rock carving...

    • @linkVIII
      @linkVIII 9 years ago

      +Sean Riley sounds very restaurant-like

    • @seanski44
      @seanski44 9 years ago +2

      +linkviii yeah people were clearing lunch but the tap tap tap is someone chipping away at a piece of rock

    • @NevaranUniverse
      @NevaranUniverse 9 years ago +1

      +Vlad Ţepeş hungry developers are hungry!

  • @jopaki
    @jopaki 9 years ago

    effin exciting stuff here man! wow.

  • @Systox25
    @Systox25 9 years ago

    TU Graz? nice!

  • @NeilRoy
    @NeilRoy 9 years ago

    Fascinating idea. I wonder if such a system could be used on planetary exploration, like say Mars. Or even underwater etc... kewl stuff anyhow.

  • @devjock
    @devjock 9 years ago

    So this octocopter is basically doing what people with complete blindness in one eye are doing? Rocking back and forth to see what background areas of a picture are being exposed / obscured behind objects in the foreground? Yeah, that takes a lot of computing power. I'd imagine the algorithms used to reconstruct 3D geometry are modeled on the way human brains work to accomplish the same task (in the case of humans, mostly based on mapping out the walking surface, obstacle avoidance, and impact threat assessment). Is it done with dense neural networks? How would those be trained? Or is it a self-learning network? Something completely different?
    Also, given the fact that it would be trivial to have that octocopter carry one more camera (for stereoscopic image acquisition), what was the reasoning for that not getting implemented? I'd imagine the acquisition phase would be way more streamlined if the octocopter had 3D imaging in place to begin with...
    So many questions!

  • @Encypruon
    @Encypruon 9 years ago

    What about moving things? Like leaves, trees moving in the wind, doors, cars, animals, humans, wind turbines in the background, water and changing light conditions (clouds, the sun moving during the process, flickering synthetic light...). And what about reflective surfaces and refraction? Some surfaces don't look the same from different angles...
    Can any of these things be handled reliably? I imagine it to be very hard to construct meaningful models with things like these in the scene.

  • @illusivec
    @illusivec 9 years ago +2

    So photogrammetry and Agisoft PhotoScan?

    • @manhaxor
      @manhaxor 9 years ago +9

      +flanker It seems like he's more selling the semi-autonomous process of the UAV deciding to take more pictures of harder to reconstruct areas.

    • @brummii
      @brummii 9 years ago

      +flanker I think that the important function would be the UAVs sharing information between each other and "crowdsourcing" the creation of a 3d map, which all of them can use to navigate simultaneously.

    • @manhaxor
      @manhaxor 9 years ago

      brummii That's an interesting idea. I've only ever seen real-time 3D reconstruction for single devices, like Google's self-driving car, and a few pieces of mining equipment. I'm sure there's more, but I have yet to see multiple devices use the same reconstruction.

  • @Durakken
    @Durakken 9 years ago

    Is the reason it has to be a static object due to processing power or some algorithmic problem? I don't see why a motion prediction algorithm couldn't be incorporated into that, considering that a lot of animation is based on prediction algorithms now, other than the render time for those things; but usually that is due to the quality of the render and not the animation, I think.

    • @xell2k
      @xell2k 9 years ago

      +Durakken Basic structure-from-motion only works for static scenes. You can only solve the optimization problem behind it when the 3D points "do not move"; otherwise it gets much more complicated. However, if most of the scene is static and a few objects are moving, it usually works too. The moving things are then usually just cancelled out automatically. There is of course software to reconstruct a dynamic scene, but there you usually have the assumption of a static camera, and if not, the algorithms are not ready to be used outside of the safe lab environment (as far as I know).
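
A minimal sketch of the basic two-view structure-from-motion step under discussion, assuming OpenCV (cv2), made-up intrinsics, and two hypothetical image files; it shows why the static-scene assumption matters: the RANSAC essential-matrix fit only keeps matches consistent with a single rigid scene, so points on moving objects tend to be rejected as outliers rather than reconstructed.

    import cv2
    import numpy as np

    # Assumed pinhole intrinsics for a 1280x720 camera (not calibrated values).
    K = np.array([[1000.0, 0.0, 640.0],
                  [0.0, 1000.0, 360.0],
                  [0.0, 0.0, 1.0]])

    img1 = cv2.imread("view1.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical input images
    img2 = cv2.imread("view2.jpg", cv2.IMREAD_GRAYSCALE)

    orb = cv2.ORB_create(4000)                             # detect and describe features
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

    # RANSAC keeps only matches consistent with one rigid scene seen from two
    # poses; matches on moving objects usually end up as outliers here.
    E, inlier_mask = cv2.findEssentialMat(pts1, pts2, K, cv2.RANSAC, 0.999, 1.0)
    _, R, t, pose_mask = cv2.recoverPose(E, pts1, pts2, K, mask=inlier_mask)

    # Triangulate the surviving (static) points into a sparse 3D point cloud.
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t])
    good = pose_mask.ravel() > 0
    pts4d = cv2.triangulatePoints(P1, P2, pts1[good].T, pts2[good].T)
    pts3d = (pts4d[:3] / pts4d[3]).T
    print(good.sum(), "static inlier points triangulated")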

  • @ozdergekko
    @ozdergekko 9 years ago +1

    Yeah, fellow Austrian *and* Austrian institute of technology (Technische Universität Graz)

  • @hanniffydinn6019
    @hanniffydinn6019 9 years ago +1

    How does this compare to a laser scanner attached to a drone ??????

  • @MrSabba81
    @MrSabba81 8 years ago

    Hi, thanks for sharing this. I am wondering whether, by changing the perspective, it would also be possible to use it specifically for vegetation instead of excluding it: do you think we could estimate the biomass (volume) of trees, shrubs, etc. with some adaptations? Thanks
    Simone

  • @yourfilmindustry
    @yourfilmindustry 9 years ago

    They're not rocks, they're minerals!

  • @RAHUDAS
    @RAHUDAS 4 years ago

    Can anyone tell me what ML algorithm is used along with the photogrammetry in this demonstration?

  • @NizarElZarif
    @NizarElZarif 9 years ago

    I was wondering, does anybody know the name of the algorithm used? Like, is there a paper to read or a tutorial to watch?

    • @leopoldarkham7017
      @leopoldarkham7017 9 years ago +1

      +Nizar El-Zarif Searching for point clouds and Poisson surface reconstruction will get you going in the right direction

    • @NizarElZarif
      @NizarElZarif 9 years ago

      Leopold Arkham Thanks
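
A minimal sketch of the pointer above (point cloud in, Poisson surface reconstruction out), assuming the Open3D library and a hypothetical dense point cloud file "scan.ply", such as the output of an SfM / multi-view stereo pipeline.

    import open3d as o3d

    pcd = o3d.io.read_point_cloud("scan.ply")   # hypothetical dense point cloud

    # Poisson reconstruction needs oriented normals on the input points.
    pcd.estimate_normals(
        search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=0.1, max_nn=30))
    pcd.orient_normals_consistent_tangent_plane(20)

    # Fit an implicit surface to the points and extract a triangle mesh from it.
    mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(pcd, depth=9)

    o3d.io.write_triangle_mesh("mesh.ply", mesh)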

  • @unaimb
    @unaimb 9 years ago

    That's photogrammetry with dense point cloud Poisson mesh reconstruction; it's been used for matchmoving in the film and TV industry for about 6-7 years now. It seems interesting that they chose to make the whole software from scratch instead of just the drone-driving bit… I guess most matchmoving software is not really open when it comes to expanding its capabilities in such a way.

    • @xell2k
      @xell2k 9 years ago +1

      +Unai Martínez Barredo Exactly, there are a lot of SfM pipelines out there (freeware and non-freeware) that work in basically the same way. However, many of them are closed-source, which makes it uncomfortable to extend them. Since we are a research institution, we aim at developing new tools, such as the view planning you see in the video. Our basic 3D reconstruction software has been in development since about 2006 or 2007, and we are basically maintaining it and extending it with further apps and functionality. And since we have the full source code, this is ideal for research.

    • @unaimb
      @unaimb 9 years ago

      Nice :)

  • @LastofAvari
    @LastofAvari 9 years ago

    Cool stuff :)

  • @unvergebeneid
    @unvergebeneid 9 years ago

    Automatic photogrammetry ... this might be great for indie game developers! Of course it might also save lives by predicting avalanches and landslides and boring stuff like that.

  • @0MVR_0
    @0MVR_0 9 years ago

    Who were the first to introduce photogrammetry to machine learning?

  • @rockosigafredi
    @rockosigafredi 9 years ago +25

    Who here is also from Austria? :-D

  • @THEMATT222
    @THEMATT222 1 year ago

    Noice 👌

  • @TheKirkster96
    @TheKirkster96 9 years ago

    What if the intelligence could identify the position and dimensions of some vegetation (like a tree) and then just generate a model to fill in that space and give the viewers a representation of "hey, there is some tree there, but we can't scan each branch and every leaf into an accurate model."

  • @frigeragmady9625
    @frigeragmady9625 9 years ago +1

    Blaming background noise for distorting the vocals of an intellectual while he speaks intelligent stuff (I am dumbing myself down to level with some of these commenters) is kinda like not wanting to blame yourself for not understanding the intelligent stuff

  • @kilésengati
    @kilésengati 9 years ago +1

    Photogrammetry is great for game development, too.

  • @Hwyadylaw
    @Hwyadylaw 9 years ago +1

    This is just me being a language geek, and not relevant to the video, but I feel like the ones with four rotors should be called Tetracopter instead of Quadrocopter.

    • @Ptolemusa
      @Ptolemusa 9 years ago

      +McDucky Indeed. ^^

    • @TheAllardP
      @TheAllardP 9 years ago +2

      +McDucky
      It's the same thing. Quad is Latin and tetra is Greek.

    • @Hwyadylaw
      @Hwyadylaw 9 years ago +1

      Philippe Allard
      I wouldn't call myself a "language geek" if I didn't know that.
      It's Téttares (τέτταρες) or Téssares (τέσσαρες) in Ancient Greek. It's Quattuor in Latin.

    • @jazzpi
      @jazzpi 9 years ago

      +McDucky Why?

    • @antivanti
      @antivanti 9 years ago +3

      +McDucky So your preference for Tetracopter stems not from knowledge of languages but from an arbitrary preference for Greek over Latin? =)

  • @gen157
    @gen157 9 years ago

    Early, but not first. Not that I care, just wanted to tell others about it.
    About the video: a non-English speaker speaking English makes it a little hard when he isn't fluent enough. I understood enough, but he needs to work on sentence structure a little bit.

    • @cavalrycome
      @cavalrycome 9 years ago +7

      +Gen15
      I can see why some viewers might have trouble with the speaker's accent, but his grammar is very close to perfect.

    • @JapTut
      @JapTut 9 years ago +2

      +Gen15 and the background noise doesn't help either.

  • @NeatNit
    @NeatNit 9 years ago +5

    Is it really impossible for you to drop the Audible sponsoring? It's getting really annoying.

    • @quakquak6141
      @quakquak6141 9 years ago +20

      +NeatNit It's at the end of the video; less annoying than this is impossible. I don't see the problem

    • @NeatNit
      @NeatNit 9 years ago +2

      +quak quak I guess you're right... It just kinda sickens me that they have to pretend to be excited and genuinely interested in Audible when it's obvious that they're paid to say that.

    • @trucid2
      @trucid2 9 years ago +11

      +NeatNit It bothers you when others make money from their work? Should they be working for free for your sake?

    • @NeatNit
      @NeatNit 9 years ago +2

      +trucid2 not what I meant, see my previous reply

    • @AustrianAnarchy
      @AustrianAnarchy 9 years ago +3

      +NeatNit Maybe they are genuinely excited about Audible anyway and the sponsorship made them positively giddy?

  • @chris24hdez
    @chris24hdez 9 years ago

    It's called 3D Photogrammetry

    • @serkantan2951
      @serkantan2951 5 years ago

      3D reconstruction, I believe, would be a broader term. Aside from mere photographs, there are other ways to measure depth in an image; even though they don't really discuss those methods, they are probably taking them into account, such as shading, illumination, defocus, and texture.

  • @GroovingPict
    @GroovingPict 9 years ago +1

    "take this images"...

    • @IstasPumaNevada
      @IstasPumaNevada 9 years ago +1

      +GroovingPict This comment coupled with your other one is pinging my troll-detection algorithms.

    • @Anvilshock
      @Anvilshock 9 years ago

      +GroovingPict If you do the equivalent of squinting with your ears really hard, you can almost hear he actually says "these" really fast ...

    • @Anvilshock
      @Anvilshock 9 years ago +1

      GroovingPict Yep, it's the dumbest thing, and - Congratulations - YOU got it! Here's your Golden Dunce Hat prize!

  • @espalorp3286
    @espalorp3286 9 years ago +2

    enemy UAV spotted

    • @iseslc
      @iseslc 9 years ago

      +Proteus Battlefield player spotted

    • @tisimo123
      @tisimo123 9 years ago +1

      +iseslc or any Call of Duty game after 4

    • @iseslc
      @iseslc 9 years ago

      tisimo123
      That's right! I haven't played those, though... only BF games.

  • @TomMinnick
    @TomMinnick 9 years ago

    I've heard it called "Photogrammetry" before, but this is the first time I've heard it called "3d reconstruction"

  • @Diggnuts
    @Diggnuts 9 years ago +9

    I hate the term "structure from motion". It makes no sense.

    • @manhaxor
      @manhaxor 9 years ago

      +Diggnuts Well it's usually used to reconstruct a scene from video. I agree that it's strange to hear it used when the source material is still images.

    • @antivanti
      @antivanti 9 years ago

      +Diggnuts "Parallax 3D reconstruction" might be a more correct term? Or just photogrammetry.

    • @Diggnuts
      @Diggnuts 9 years ago +3

      Anders Öhlund I prefer photogrammetry as that is quite simply precisely what it is!

  • @ivesennightfall6779
    @ivesennightfall6779 9 years ago

    I saw Ubuntu /o/

  • @jmac217x
    @jmac217x 9 years ago

    As soon as he said _intelligence_ I cringed a little. I don't want that to become the next buzzword. An algorithm like that does not equate to intelligence in my opinion. Everything about this guy seems off to me.