Ah, the Kinect. Such a massive failure as a gaming peripheral, but pivotal in so much computer vision research and so many DIY projects.
And even freelance production projects! As part of a team I've created one game with Kinect v1, and another one with Kinect v2. What a great piece of hardware.
@@MINDoSOFT Which ones?
@@glass1098 Hi! Unfortunately I don't have a portfolio page. But the first one was an air-hockey-style game where the player held a broom with an IR LED, which was detected via the Kinect, and the players had to put the trash in the correct recycling bins. The other game was a penalty shootout game which detected the player's kick. :)
I have three of the v2, and two of the v1.
Kinect technology became Face ID in the iPhone and iPad. Not a failure at all; it provides very secure and just about invisible biometric authentication to about a billion people on a daily basis.
I just LOVE that he talks to the cameraman and not us, it makes it so much more candid and easier to watch as a viewer!
4:27 "I should put an artwork up or something." Take a depth-field picture of that wall, print it out and hang it back onto the wall. Now it's a piece of art!
@yefdafad I think you might have forgotten to switch Windows
Mike always has something exciting.
- "Probably have to give it back"
- "Oh no, it fell off... my car"
That comment at 2:41 was magic. Caught me red handed!
Always great to learn from Mike Pound!
He's a lot better than Mike Pence
Entering poundland
A sidenote to Kinect: the Kinect v2 uses time of flight, which some people like and some people hate. What I find most fascinating is that the Kinect lives on, both as the Azure Kinect and in the depth-sensing tech the HoloLens has. While not successful as a motion control method, it's still really useful when used with a PC.
Why do people hate it?
I wonder what this might do with a mirror. I expect it would see the mirror as a "window" where there's a lot more depth, but I wonder how it would handle the weird reflections of the IR dots.
Wow 🤯
Interesting thought!
Now I’m curious too!
Someone needs to do this please!
It doesn't do well, really. Same thing with transparent objects.
The dots necessarily spread out from the projector, so even if a mirror was placed perfectly perpendicular to their flight path barely any would reflect back in the right way to generate a coherent depth image.
Alas, Intel discontinued the RealSense line of products.
The librealsense library will still be maintained (if I'm correct), but no new hardware is going to be released.
I wish they'd maintain the L515 a little better. The 400 series seem to be well supported, but the 500 series is a vastly superior sensor.
There's still the Kinect, which actually works better in pretty much every way, AFAIK.
@@arcmchair_roboticist There is some truth to this. Kinect v2 and v1 are both excellent. I think it's mostly down to a decade of software refinement, though. From a hardware perspective the RealSense L515 should mop the floor with everything. It's a shame it was dropped.
Intel has discontinued some of the products, but the stereo cameras will continue to be sold (D415, D435i, D455) for sure.
The librealsense library is still maintained (new release today).
Wrong
I wish he would've done a more in-depth explanation of the device. Like, what data type is used for the depth field? Is it a 2D array of floating-point values, since depth can technically be infinite? Is it calibrated to only detect so far? Or does it use a variable-depth rate with a finite-sized data type (like an integer, as in the other RGB fields) that adjusts the value according to the furthest object it senses?
So, thinking about it, it's likely that the RGB aspect is an integer or a fraction between 0 and 1. That's pretty common, and for RGB, those two are going to be functionally identical, since a computer is likely only going to be able to display in 24-bit color anyway. So, for the color, it probably doesn't matter, and it could go either way.
The depth is probably a fraction between zero and one. That would allow you to map between the visible colors pretty accurately, and display a fine-grained depth map, which we see in the video. After all, you only need 32 million values, and the resolution of a 32-bit floating point between 0 and 1 gives you that reliably.
Re: 2d array, I wouldn't be surprised if it's indexable as a 2d array in the API, but it's probably stored as a 1d array, since translating from coordinates to an index (and vice versa) is trivial.
I don't know if that's actually what's going on, mind you, just making some assumptions based on similar technologies.
I'll bet it's just an extra byte, just like R, G, and B are each 1 byte. 256 integers, maybe in a logarithmic scale so there's more precision for near values than far ones.
Keep in mind, you don't have to store colors as 24-bit (three byte) colors, that's just a convention because that's what most monitors support.
If you're working with optical data, you may or may not be limited to a 24-bit color.
For the depth, only having 256 "depth steps" seems _really, really_ restrictive.
@@b4ux1t3-tech Yeah, I just meant the common 24-bit RGB format. 8 bits for depth could be too little, though I thought it might be enough to give the extra boost a neural net needs. You could easily do more bits. I was wondering if, instead of inventing a new format, they actually just produce a separate file that's a grayscale image for the depth. Then you can combine them yourself or just use the standard RGB image when you don't need depth.
I imagine if you look up a manual it'll tell you.
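For the RealSense specifically, the manual settles it: the depth stream is neither a float nor a single byte. librealsense delivers depth as 16-bit unsigned integers (its Z16 format), and the device reports a depth scale you multiply by to get metres. A minimal sketch, assuming a camera is plugged in and the pyrealsense2 bindings are installed:
```python
import numpy as np
import pyrealsense2 as rs

pipe = rs.pipeline()
profile = pipe.start()

# Depth pixels are 16-bit unsigned integers (the Z16 format); the
# device reports a scale (commonly 0.001) that converts them to metres.
scale = profile.get_device().first_depth_sensor().get_depth_scale()

frames = pipe.wait_for_frames()
raw = np.asanyarray(frames.get_depth_frame().get_data())  # dtype: uint16
depth_m = raw.astype(np.float32) * scale                  # metres; 0 means "no reading"
print(raw.dtype, scale, depth_m.max())
pipe.stop()
```
So you get 65,535 usable depth steps rather than 256, with zero reserved for invalid pixels, and the scale sets the trade-off between range and precision.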
I still use my Kinect - mostly to just log into Windows with my face, but also as a night camera to keep an eye on our new foster dog when he's home alone. It's amazing that a piece of hardware almost a decade old is still so good at what it does!
I would love to see this and the Intel RealSense LiDAR L515 side by side.
12:13 "THAT'S A QUANTUM BIT!!! SO IT'S NOT JUST ZERO OR ONE..."
This device, the Intel RealSense D435, and its peers are so underappreciated. The hardware is brilliant, and at the same time the wide range of support its packages offer is amazing. There's regular support for ROS, for edge computation platforms like the Jetson Nano, and as a standalone RealSense SDK. If more people knew about this and used it, Intel would not have dared to think of shutting it down. There are other similar cameras, like the Zed for example, but the wide array of support RealSense offers has no competition.
You don’t even need an SDK. If you have a network card, there are devices that run driverless and are compatible with industrial and FOSS software.
We are building one of these.
Mike Pound always has the stuff to really get you going! More Mike Pound!
This is really insightful. We are using stereo mapping, similar to the techniques used by the Landsat and WorldView satellites, for my Master's thesis! This technology is super cool; glad you are showing folks how it works, because there are so many applications beyond the Kinect!
Thank you this was exactly the breakdown I was hunting for last week!
This channel is insane. Never stop uploading
Another great video from Mike, love computerphile!
lol I put my finger on my face at the exact instant before the screen said I was looking at my finger
Image Depth is a quantification of the camera's ability to take a picture that makes a deep philosophical statement! 🤣
Hit the like button so that Mike can get to keep the camera.
I'm reminded of my sadly unsupported Lytro Illum camera, a "lightfield" device. Being able to share "live" images was fun, and it's a shame they didn't release that back-end code as open source so something like flickr or instagram could support it. You can still make movies of it, but the fun of the live images was that the viewer could control the focus view of your photograph.
Mike needs to have his own channel. Do a vlog!
For everyone looking for a RealSense alternative, Occipital are still shipping their Structure Sensors and Structure Cores. They work on similar principles.
I loved your video and hopefully you’ll get to hang on to the hardware so you can keep working with it.
MIKE IS BACK
3:12 The camera knows where it is because it knows where it isn't
So it calculates where it should be... :-)
I have that laptop. Thank you for validating my purchase. ;-)
I believe this is also how Face ID works. It uses the dot projector and IR camera to get a 3D image of the face and do the authentication.
indeed, TrueDepth
Oh, thank you! 🙏 This is exactly the video I've been looking for.
Now if we can get IRGBUD (adding (near-)Infrared and Ultraviolet), that'd be cool. (Even cooler would be FIRGBUD, but far-IR tends to require sufficiently different optics that I definitely won't be holding my breath for that one.)
Would be cool to see Mike talk about 'event cameras' (aka artificial retinas). They're really on the rise in machine vision.
Agreed. Hoping to work with those soon
It's been so long 😭 finally
Mike's the best
You got a like just when you predicted I'd looked at my finger. Amazing video.
I was searching for this yesterday and now you've put up a video 😀
Mike Pound the legend 🙌
I like the 'Gingham/Oxford shirt with blue sweater' energy Mike projects in almost every video.
I'm a simple man. I see Mike Pound, I click.
Exceptional once again, Mike, congratulations!
Genius idea. A camera with multiple image sensors can feed all sorts of algorithms, especially heat signatures that can see through doors.
A beautiful beautiful beautiful video
I am literally enlightened! Thanks ever so much!
We need more cameos of Sean!
Very informative!! Will this camera work far better at night in a car than in the morning?
Mr. Pound i love you
Ah, just got a couple of 435s for the lab this year. The funniest bit so far is how it sometimes does a perspective distortion of featureless walls much more realistically than Photoshop does :D
I'd like to hear a video with Mike Pound talking about the Oculus Quest 2; I bet that uses a similar method. What a brilliant machine!
Great explanation
I mean, the Kinect 2 did time-of-flight, not structured light like the first one. And it was still pretty cheap, being a mass-market device.
Ohh yes! Mike videos are the best
I wish I had a teacher like him!
Excellent video!
Is there already a video conferencing tool which takes advantage of this? This seems huge for being able to eliminate background and focus on the face.
I think FaceTime could already be doing that, since iPhones have one of these depth sensors in each of the cameras.
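Once depth and colour are aligned, background removal really is just thresholding. A toy sketch, with random arrays standing in for a real aligned RGBD frame and a made-up 1 m cutoff:
```python
import numpy as np

# Stand-ins for an aligned RGBD frame; in a real tool these would come
# from the camera, with depth reprojected into the colour view.
h, w = 480, 640
color = np.random.randint(0, 256, (h, w, 3), dtype=np.uint8)
depth_m = np.random.uniform(0.3, 4.0, (h, w)).astype(np.float32)

# Keep pixels within ~1 m of the camera (hypothetical threshold);
# everything further away is treated as background and blanked.
foreground = depth_m < 1.0
color[~foreground] = 0
print(f"kept {foreground.mean():.0%} of pixels as foreground")
```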
Interesting: the Ultraleap Leap Motion camera uses three cameras to try to resolve depth and position, all of which are near-IR.
Made it early.
Computerphile is the Best
MORE MIKE POUND!
But image depth could also lead to poor performance if it captures more noise, leading to a general data shift. I guess the processing step should be done carefully.
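For what it's worth, librealsense ships post-processing blocks aimed at exactly this noise problem. A minimal sketch, assuming a RealSense camera is attached and the pyrealsense2 bindings are installed:
```python
import pyrealsense2 as rs

pipe = rs.pipeline()
pipe.start()

# librealsense post-processing filters, chained in the usual order.
spatial = rs.spatial_filter()      # edge-preserving spatial smoothing
temporal = rs.temporal_filter()    # averages depth over time
holes = rs.hole_filling_filter()   # patches small invalid regions

frames = pipe.wait_for_frames()
depth = frames.get_depth_frame()
for f in (spatial, temporal, holes):
    depth = f.process(depth)       # each stage returns a filtered frame
pipe.stop()
```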
Great video, thank you for sharing
Wouldn’t LiDAR be more accurate/achieve the same thing concerning depth perception for machines?
Yes, LiDAR would be way better, but it's going to cost you ten or twenty times more than this device. This is geared more for prosumer tinkering, while LiDAR is more for autonomous driving, or other situations where human lives hang in the balance.
I imagine it would also be more useful in time-based situations, because LiDAR has to time each signal's return trip to do its calculations, while the infrared emitter can get the depth information a little faster: you're only waiting for the image to come back once, and you get information across the whole sensor at once, based on the pattern in the image.
You could probably get even more accurate depth perception if you combined lidar with this.
@@ZT1ST Also, unless the LiDAR laser changes direction for each pixel, which would have to happen extremely quickly, you'd have to use a number of LiDAR units that probably can't move, and you'd get a much lower-resolution depth channel.
Maybe it could supplement the stereo information or help calibrate the camera, but overall it's not super useful.
Nice explanation. Thanks!
my professor literally delivered a lecture today regarding image depth, and i see it on Computerphile XD
Could you take one of these, attach it to a mirror setup which separates each lens's view by far more distance, and then use it for longer-distance range finding (like a WW2 stereoscopic rangefinder)?
@Pedro Abreu I just meant having the views of the two cameras be really far apart from each other so that there's more parallax.
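The geometry backs this up: stereo depth comes from disparity as Z = f * B / d, so for the same smallest resolvable disparity d, widening the baseline B extends the usable range proportionally. A quick back-of-the-envelope in Python, with all numbers made up for illustration:
```python
# Z = f * B / d: depth from focal length (pixels), baseline (m),
# and disparity (pixels). Larger B means larger Z at the same d.
f_px = 600.0     # hypothetical focal length in pixels
min_disp = 0.5   # smallest disparity we can reliably measure, in pixels

for baseline_m in (0.05, 0.5, 2.0):  # bare camera, tripod rig, rangefinder
    z_max = f_px * baseline_m / min_disp
    print(f"B = {baseline_m:4.2f} m -> usable range up to ~{z_max:6.0f} m")
```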
Question: *How* is the depth stored? RGB uses values between 0 and 255 to store the intensity, and you can work out the percentage of that colour in that pixel. How about depth? Does it also get one byte? What does it mean? Can you calculate the actual distance from the camera?
I mostly worked with depthimage, which is essentially a greyscale image where lighter pixels are closer and darker pixels are further away.
On the other hand there is pointcloud, which is an array of 3D points. Typically that can be structured or unstructured, e.g. a 1000x1000 array of points, or a vector of 1000000 points.
Perhaps this isn't as detailed as you'd have liked, but this is as in-depth as I've gone.
The handy thing about depthimage is you can compress it like any other image, which is great for saving bandwidth in a distributed system
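And going from a depth image to an unstructured point cloud is just the pinhole camera model run backwards. A sketch with hypothetical intrinsics (a real camera reports its own calibration):
```python
import numpy as np

# Toy depth image in metres; fx, fy, cx, cy are hypothetical pinhole
# intrinsics standing in for the camera's real calibration.
h, w = 480, 640
depth_m = np.random.uniform(0.3, 4.0, (h, w)).astype(np.float32)
fx = fy = 600.0
cx, cy = w / 2.0, h / 2.0

# Back-project each pixel (u, v) with depth Z to a 3D point:
#   X = (u - cx) * Z / fx,  Y = (v - cy) * Z / fy
u, v = np.meshgrid(np.arange(w), np.arange(h))
X = (u - cx) * depth_m / fx
Y = (v - cy) * depth_m / fy
points = np.stack([X, Y, depth_m], axis=-1).reshape(-1, 3)
print(points.shape)  # (307200, 3): an unstructured cloud
```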
Heh, Mike is always fun.
More Mike videos, please!!
Huh, I thought this was more solved than it is.
Even with dedicated hardware, you can only get sub-30 fps directly from the camera.
I suppose "directly from the camera" and "cheaply" are the key words.
You can get 30FPS of depth-aligned RGBD Images from the realsense camera with a resolution of 1280x720. Higher than that and it drops to 15, afaik.
You can also get 60 Hz at lower res and 6 Hz at higher res IIRC
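With pyrealsense2 you request the resolution/framerate combination explicitly, which makes that trade-off concrete: an unsupported mode simply makes the pipeline fail to start. A minimal sketch, assuming a D400-series camera is attached:
```python
import pyrealsense2 as rs

# Request streams explicitly; an unsupported resolution/framerate
# combination raises when the pipeline starts.
cfg = rs.config()
cfg.enable_stream(rs.stream.depth, 1280, 720, rs.format.z16, 30)
cfg.enable_stream(rs.stream.color, 1280, 720, rs.format.bgr8, 30)

pipe = rs.pipeline()
pipe.start(cfg)
frames = pipe.wait_for_frames()
print(frames.get_depth_frame().get_width())  # 1280
pipe.stop()
```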
Mike is too smart!
Computerphile, a very, very earnest request: every video you post sparks a hunger for knowledge on that topic. Could you please attach some links or anything from which we can actually learn that stuff? I, and I think many others like me, will be very grateful if you could do such a thing.
2:42 Yeah 🤦♂️ now that's why this video deserves a like.
When will they make camera modules which can simultaneously capture RGB and IR through the same lens? That way, we'd have no parallax error between the depth and colour data.
Pretty sure this is possible. The only problem is that you need two IR cameras to do the stereo matching.
Lenses for RGB cams usually have an IR filter built in.
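Until such a single-lens module exists, the usual workaround is software alignment: librealsense can reproject the depth image into the colour camera's viewpoint via rs.align. A minimal sketch, assuming a RealSense is attached:
```python
import pyrealsense2 as rs

pipe = rs.pipeline()
pipe.start()

# rs.align reprojects depth into the colour camera's viewpoint,
# the standard software fix for the parallax between the two
# physically separate sensors.
align = rs.align(rs.stream.color)
frames = align.process(pipe.wait_for_frames())
depth = frames.get_depth_frame()  # now pixel-aligned with the colour frame
color = frames.get_color_frame()
pipe.stop()
```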
What happened to that time-of-flight RGBD webcam Microsoft bought just a little before they released the Kinect? Did they just buy it out to try to stifle competition and left the technology to rot?
Looks like the Intel RealSense Depth Camera D435. Only 337 GBP (in Denmark). Let's send a couple to Mike. ;-)
Dr Pound is the Richard Feynman of computer science
Coincidentally, Richard Feynman had so many affairs he was known as Dr Pound 😂
Hmm, so the camera can be used to detect colour imperfections on supposedly single-coloured flat surfaces. Could that be used to detect fungus that's just starting to grow?
2:41 Of course I had just looked at my finger. Sean et al. clearly know their audience.
I thought the Kinect, when announced, was promised not to use any processing power of the console, but in the end, because of cuts, it actually did? Am I misremembering?
At work the computer-vision team uses an Intel D435 to segment parcels on a conveyor belt. And they DON'T USE THE DEPTH for that. Only the RGB. They use the depth for other things, but not for that.
Also I'm pretty sure that they DON'T POST-PROCESS the depth image.
So much better when I only saw image death
Please make the next one on the time of flight camera
Definitely looked at my own finger when you mentioned it, haha. Great info, thank you.
How does this guy know so much different stuff??!
How do you get depth for a single RGB image with AI?
Check megadepth
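MegaDepth is one option; MiDaS is another widely used single-image depth network that's easy to try. A sketch following the intel-isl/MiDaS torch.hub docs ("photo.jpg" is a placeholder for your own image, and the output is relative, not metric, depth):
```python
import cv2
import torch

# Model and transform names follow the intel-isl/MiDaS torch.hub docs.
midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
transform = torch.hub.load("intel-isl/MiDaS", "transforms").small_transform
midas.eval()

img = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2RGB)
with torch.no_grad():
    pred = midas(transform(img))  # relative (not metric) depth map

# Resize the prediction back to the input image's resolution.
depth = torch.nn.functional.interpolate(
    pred.unsqueeze(1), size=img.shape[:2],
    mode="bicubic", align_corners=False,
).squeeze()
print(depth.shape)
```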
I was hoping to learn how the time-of-flight depth sensors work.
What does it do if you hold a stereogram (SIRDS) picture in front of it?
I looked at my finger.
7:16 But why does Face ID work under sunlight? Is the laser just stronger in Face ID?
How do I combine this depth camera with Gazebo? I've got ROS 1 Noetic; is there any plugin available?
Is the depth calculated on the hardware itself or on software running on the computer?
At this point he can literally explain anything and I'd understand.
P.S. he looks like Hugh Grant
What's the link to the video where the stuff on the whiteboard was written and discussed?
So, did they let him keep it? That's the real burning question.
Why not take four sensors instead of two and fill in all the gaps for the two sensors in between the outer ones? That would IMHO absolutely complete the stereoscopy and furthermore increase the quality of all the interpolated/corresponding pixels of the subject... or am I not getting something here? Why does it have to be two? Are we anthropomorphising a bit too much here?
2:41 Me: Looking at my finger and something distant. Hey, there was some text there. Must be something about looking at finger... (my thought when rewinding the video). :D
I guess I'm missing why he expected it to crash if he covered up the lens. It's not like you're getting null values, just extreme values.
Many possible reasons:
In math, 3-7=-4, but in computing, if you're not careful, 3-7=4294967292. And, since you're already not being careful, that super-maximum-mega-extreme number might cause something to be very out of bounds.
When trying to match patterns, you gotta make sure that your code doesn't go completely wonky when nothing in the image can possibly match. If you're lazy (as all programmers are at some time), then you might write the program to keep looking until it finds a match. That might just keep searching the image over and over, or it might get to the end of the image and start poking around at every memory location on the computer, even areas marked as "forbidden, trespassers will be shot" by the OS.
The part of code that gets called to handle the extreme case may not have been properly written, possibly having errors in basic syntax. It's not ever supposed to get called, so it's not checked as well or as often.
There are others, but those are the big three.
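The first point is easy to demo. Plain Python integers are arbitrary precision and never wrap, but fixed-width unsigned arithmetic wraps modulo 2^32, which is where that huge number comes from:
```python
import numpy as np

# Plain Python integers are arbitrary precision, so nothing wraps:
print(3 - 7)                 # -4

# Fixed-width unsigned arithmetic wraps modulo 2**32 instead:
a = np.array([3], dtype=np.uint32)
b = np.array([7], dtype=np.uint32)
print(a - b)                 # [4294967292]
print((3 - 7) & 0xFFFFFFFF)  # 4294967292: the same wrap via masking
```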
where do they get the dot matrix printer paper from... ?
Too bad these are discontinued
Would adding a UV emitter help?