Depth Camera - Computerphile

  • Published 4 Oct 2024

COMMENTS • 264

  • @nikanj
    @nikanj 2 years ago +575

    Ah, the Kinect. Such a massive failure as a gaming peripheral, but pivotal in so much computer vision research and so many DIY projects.

    • @MINDoSOFT
      @MINDoSOFT 2 years ago +22

      And even freelance production projects! As part of a team I've created one game with Kinect v1, and another one with Kinect v2. What a great piece of hardware.

    • @glass1098
      @glass1098 2 years ago

      @@MINDoSOFT Which ones?

    • @MINDoSOFT
      @MINDoSOFT 2 years ago +9

      @@glass1098 Hi! Unfortunately I don't have a portfolio page. But the first one was an air-hockey-style game where the player held a broom with an IR LED, which was detected via the Kinect, and the players had to put the trash in the correct recycling bins. The other game was a penalty-shootout game which detected the player's kick. :)

    • @xeonthemechdragon
      @xeonthemechdragon 2 years ago +2

      I have three of the v2 and two of the v1.

    • @JulesStoop
      @JulesStoop 2 years ago +19

      Kinect technology became Face ID in the iPhone and iPad. Not a failure at all: it provides very secure and just about invisible biometric authentication to about a billion people on a daily basis.

  • @cussyplays
    @cussyplays 2 years ago +82

    I just LOVE that he talks to the cameraman and not us; it makes it so much more candid and easier to watch as a viewer!

  • @Pystro
    @Pystro 2 years ago +145

    4:27 "I should put an artwork up or something." Take a depth-field picture of that wall, print it out and hang it back onto the wall. Now it's a piece of art!

    • @Checkedbox
      @Checkedbox 2 years ago +4

      @yefdafad I think you might have forgotten to switch Windows

  • @Yupppi
    @Yupppi 2 years ago +190

    Mike always has something exciting.

  • @oskrm
    @oskrm 2 years ago +138

    - "Probably have to give it back"
    - "Oh no, it fell off... my car"

  • @smoothmarx
    @smoothmarx 2 years ago +14

    That comment at 2:41 was magic. Caught me red handed!

  • @stef9019
    @stef9019 2 years ago +38

    Always great to learn from Mike Pound!

  • @araghon007
    @araghon007 2 years ago +11

    A sidenote to Kinect: The Kinect v2 uses time of flight, which some people like, some people hate. What I find most fascinating is that the Kinect lives on, both as Kinect for Azure, and the depth sensing tech the Hololens has. While not successful as a motion control method, it's still really useful when used with a PC.

  • @SoonRaccoon
    @SoonRaccoon 2 years ago +95

    I wonder what this might do with a mirror. I expect it would see the mirror as a "window" where there's a lot more depth, but I wonder how it would handle the weird reflections of the IR dots.

    • @meispi9457
      @meispi9457 2 years ago +7

      Wow 🤯
      Interesting thought!

    • @FlexTCWin
      @FlexTCWin 2 years ago

      Now I’m curious too!

    • @260Xander
      @260Xander 2 years ago

      Someone needs to do this please!

    • @hulavux8145
      @hulavux8145 2 years ago +5

      It doesn't do well, really. Same thing with transparent objects.

    • @zybch
      @zybch 2 years ago +3

      The dots necessarily spread out from the projector, so even if a mirror was placed perfectly perpendicular to their flight path barely any would reflect back in the right way to generate a coherent depth image.

  • @maciekdziubinski
    @maciekdziubinski 2 years ago +74

    Alas, Intel discontinued the RealSense line of products.
    The librealsense library will still be maintained (if I'm correct), but no new hardware is going to be released.

    • @joels7605
      @joels7605 2 years ago +7

      I wish they'd maintain the L515 a little better. The 400 series seem to be well supported, but the 500 series is a vastly superior sensor.

    • @arcmchair_roboticist
      @arcmchair_roboticist 2 years ago +6

      There's still the Kinect, which actually works better in pretty much every way AFAIK.

    • @joels7605
      @joels7605 2 years ago +12

      @@arcmchair_roboticist There is some truth to this. KinectV2 and V1 are both excellent. I think it's mostly down to a decade of software refinement though. From a hardware perspective the RealSense L515 should mop the floor with everything. It's a shame it was dropped.

    • @paci4416
      @paci4416 2 years ago +5

      Intel has discontinued some of the products, but the stereo cameras (D415, D435i, D455) will continue to be sold for sure.
      The librealsense library is still maintained (new release today).

    • @CrazyDaneOne
      @CrazyDaneOne 2 years ago

      Wrong

  • @ajv35
    @ajv35 2 years ago +40

    I wish he would've done a more in-depth explanation of the device. Like, what data type is used for the depth field? Is it a 2D array of floating-point values, since depth can technically be infinite? Is it calibrated to only detect so far? Or does it use a variable depth rate with a finite-sized data type (like an integer, as in the other RGB fields) that adjusts the value according to the furthest object it senses?

    • @b4ux1t3-tech
      @b4ux1t3-tech 2 years ago +2

      So, thinking about it, it's likely that the RGB aspect is an integer or a fraction between 0 and 1. That's pretty common, and for RGB, those two are going to be functionally identical, since a computer is likely only going to be able to display in 24-bit color anyway. So, for the color, it probably doesn't matter, and it could go either way.
      The depth is probably a fraction between zero and one. That would allow you to map between the visible colors pretty accurately, and display a fine-grained depth map, which we see in the video. After all, you only need 32 million values, and the resolution of a 32-bit floating point between 0 and 1 gives you that reliably.
      Re: 2d array, I wouldn't be surprised if it's indexable as a 2d array in the API, but it's probably stored as a 1d array, since translating from coordinates to an index (and vice versa) is trivial.
      I don't know if that's actually what's going on, mind you, just making some assumptions based on similar technologies.
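
      (To illustrate that coordinate-to-index translation: it's just row-major arithmetic. A minimal Python sketch, with the frame width made up for illustration:)

        # Row-major mapping between (row, col) coordinates and a flat index,
        # for a hypothetical 640-pixel-wide depth frame stored as a 1D array.
        WIDTH = 640

        def to_index(row, col):
            # Each full row before ours contributes WIDTH entries.
            return row * WIDTH + col

        def to_coords(index):
            # Inverse: integer-divide out the rows; the remainder is the column.
            return divmod(index, WIDTH)

        assert to_coords(to_index(123, 456)) == (123, 456)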

    • @Norsilca
      @Norsilca 2 years ago +7

      I'll bet it's just an extra byte, just like R, G, and B are each 1 byte. 256 integers, maybe in a logarithmic scale so there's more precision for near values than far ones.

    • @b4ux1t3-tech
      @b4ux1t3-tech 2 years ago +10

      Keep in mind, you don't have to store colors as 24-bit (three byte) colors, that's just a convention because that's what most monitors support.
      If you're working with optical data, you may or may not be limited to a 24-bit color.
      For the depth, only having 256 "depth steps" seems _really, really_ restrictive.

    • @Norsilca
      @Norsilca 2 years ago +1

      @@b4ux1t3-tech Yeah, I just meant the common 24-bit RGB format. 8 bits for depth could be too little, though I thought it might be enough to give the extra boost a neural net needs. You could easily do more bits. I was wondering if, instead of inventing a new format, they actually just produce a separate file that's a grayscale image for the depth. Then you can combine them yourself or just use the standard RGB image when you don't need depth.
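
      (That separate-grayscale-file idea is used in practice: 16-bit PNGs are a common container for depth maps. A rough sketch with OpenCV, with the file name and values made up:)

        import cv2
        import numpy as np

        # Hypothetical depth map in millimetres, as 16-bit unsigned integers.
        depth_mm = np.random.randint(300, 5000, size=(480, 640), dtype=np.uint16)

        # PNG supports 16-bit grayscale, so no precision is lost on disk.
        cv2.imwrite("depth.png", depth_mm)

        # IMREAD_UNCHANGED keeps the 16-bit values instead of squashing to 8 bits.
        restored = cv2.imread("depth.png", cv2.IMREAD_UNCHANGED)
        assert restored.dtype == np.uint16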

    • @danieljensen2626
      @danieljensen2626 2 years ago +2

      I imagine if you look up a manual it'll tell you.
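
      (It does, as it happens: RealSense depth frames are 16-bit unsigned integers that you multiply by a per-device depth scale to get metres. A sketch from memory of the pyrealsense2 examples, so treat the details as an assumption:)

        import pyrealsense2 as rs

        pipeline = rs.pipeline()
        profile = pipeline.start()

        # One depth unit is this many metres (typically 0.001 on a D435).
        scale = profile.get_device().first_depth_sensor().get_depth_scale()

        frames = pipeline.wait_for_frames()
        depth = frames.get_depth_frame()

        # get_distance() applies the scale for you and returns metres.
        w, h = depth.get_width(), depth.get_height()
        print(f"centre pixel: {depth.get_distance(w // 2, h // 2):.3f} m "
              f"(1 unit = {scale} m)")

        pipeline.stop()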

  • @jenesuispasbavard
    @jenesuispasbavard 2 years ago

    I still use my Kinect - mostly to just log into Windows with my face, but also as a night camera to keep an eye on our new foster dog when he's home alone. It's amazing that a piece of hardware almost a decade old is still so good at what it does!

  • @Sth0r
    @Sth0r 2 years ago +35

    I would love to see this and the Intel RealSense LiDAR L515 side by side.

  • @ZandarKoad
    @ZandarKoad 2 years ago +3

    12:13 "THAT'S A QUANTUM BIT!!! SO IT'S NOT JUST ZERO OR ONE..."

  • @Snair1591
    @Snair1591 2 years ago +12

    This device, the Intel RealSense D435, and its peers are so underappreciated. The hardware is brilliant, and at the same time the wide range of support its packages offer is amazing. They have regular support for ROS, edge computation platforms like the Jetson Nano, and a standalone RealSense SDK. If more people knew about this and used it, Intel would not have dared to think of shutting it down. There are other cameras similar to this, like the Zed for example, but the wide array of support RealSense offers has no competition.

    • @thecheshirecat5564
      @thecheshirecat5564 2 years ago +1

      You don’t even need an SDK. If you have a network card, there are devices that run driverless and are compatible with industrial and FOSS software.
      We are building one of these.

  • @daltonbrady2492
    @daltonbrady2492 2 years ago +1

    Mike Pound always has the stuff to really get you going! More Mike Pound!

  • @jerrykomas1248
    @jerrykomas1248 2 years ago +1

    This is really insightful. We are using stereo mapping, similar to the techniques used by the Landsat and WorldView satellites, for my Master's thesis! This technology is super cool; glad you are showing folks how it works, because there are so many applications beyond the Kinect!

  • @rachel_rexxx
    @rachel_rexxx 2 years ago +3

    Thank you, this was exactly the breakdown I was hunting for last week!

  • @ianbdb7686
    @ianbdb7686 2 years ago

    This channel is insane. Never stop uploading

  • @MattGriffin1
    @MattGriffin1 2 years ago +3

    Another great video from Mike, love computerphile!

  • @astropgn
    @astropgn 2 years ago +6

    lol I put my finger on my face at the exact instant before the screen said I was looking at my finger

  • @bluegizmo1983
    @bluegizmo1983 2 years ago +14

    Image Depth is a quantification of the camera's ability to take a picture that makes a deep philosophical statement! 🤣

  • @anujpartihar
    @anujpartihar 2 years ago +1

    Hit the like button so that Mike can get to keep the camera.

  • @delusionnnnn
    @delusionnnnn 2 years ago +6

    I'm reminded of my sadly unsupported Lytro Illum camera, a "lightfield" device. Being able to share "live" images was fun, and it's a shame they didn't release that back-end code as open source so something like flickr or instagram could support it. You can still make movies of it, but the fun of the live images was that the viewer could control the focus view of your photograph.

  • @elmin2323
    @elmin2323 2 years ago +6

    Mike needs to have his own channel. Do a vlog!

  • @AaronHilton
    @AaronHilton 2 years ago

    For everyone looking for a RealSense alternative, Occipital are still shipping their Structure Sensors and Structure Cores. They work on similar principles.

  • @utp216
    @utp216 2 years ago +1

    I loved your video and hopefully you’ll get to hang on to the hardware so you can keep working with it.

  • @TheStrolch
    @TheStrolch 2 years ago

    MIKE IS BACK

  • @GameNOWRoom
    @GameNOWRoom 2 years ago +3

    3:12 The camera knows where it is because it knows where it isn't

  • @asnothe
    @asnothe 2 years ago

    I have that laptop. Thank you for validating my purchase. ;-)

  • @sikachukuning2473
    @sikachukuning2473 2 years ago +1

    I believe this is also how Face ID works. It uses the dot projector and IR camera to get a 3D image of the face and do the authentication.

    • @arkemal
      @arkemal 1 year ago

      indeed, TrueDepth

  • @jonva13
    @jonva13 2 years ago

    Oh, thank you! 🙏 This is exactly the video I've been looking for.

  • @DavidLindes
    @DavidLindes 2 years ago +2

    Now if we can get IRGBUD (adding (near-)Infrared and Ultraviolet), that'd be cool. (Even cooler would be FIRGBUD, but far-IR tends to require sufficiently different optics that I definitely won't be holding my breath for that one.)

  • @lopzag
    @lopzag 2 years ago +2

    Would be cool to see Mike talk about 'event cameras' (aka artificial retinas). They're really on the rise in machine vision.

    • @ciarfah
      @ciarfah 2 years ago

      Agreed. Hoping to work with those soon

  • @maxmusterman3371
    @maxmusterman3371 2 years ago +1

    It's been so long 😭 Finally!

  • @troeteimarsch
    @troeteimarsch 2 years ago +1

    Mike's the best

  • @1endell
    @1endell 2 years ago

    You got a like just when you predicted I'd looked at my finger. Amazing video.

  • @CineGeeks001
    @CineGeeks001 2 years ago

    I was searching for this yesterday, and now you put up a video 😀

  • @Hacktheplanet_
    @Hacktheplanet_ 2 years ago +1

    Mike Pound the legend 🙌

  • @acegh0st
    @acegh0st 2 years ago +2

    I like the 'Gingham/Oxford shirt with blue sweater' energy Mike projects in almost every video.

  • @StuartSouter
    @StuartSouter 2 years ago

    I'm a simple man. I see Mike Pound, I click.

  • @marioh9926
    @marioh9926 2 years ago

    Exceptional once again, Mike, congratulations!

  • @flopthis
    @flopthis 2 years ago

    Genius idea. A multiple-image sensor like this could feed various algorithms, especially heat signatures that can see through doors.

  • @adekunleafolabi1040
    @adekunleafolabi1040 2 years ago

    A beautiful beautiful beautiful video

  • @OFP27
    @OFP27 1 year ago

    I am literally enlightened! Thanks ever so much!

  • @nonyafletcher601
    @nonyafletcher601 2 years ago

    We need more cameos of Sean!

  • @Athens1992
    @Athens1992 1 year ago

    Very informative!! Would this camera work far better at night in a car than in the morning?

  • @TheTobias7733
    @TheTobias7733 2 years ago

    Mr. Pound, I love you

  • @Lodinn
    @Lodinn 2 years ago +1

    Ah, just got a couple of 435s for the lab this year. The funniest bit so far is how it sometimes does a perspective distortion of featureless walls much more realistically than Photoshop does :D

  • @Hacktheplanet_
    @Hacktheplanet_ 2 years ago

    I'd like to hear a video with Mike Pound talking about the Oculus Quest 2; I bet that uses a similar method. What a brilliant machine!

  • @suryavaraprasadalla8511
    @suryavaraprasadalla8511 2 years ago

    Great explanation

  • @unvergebeneid
    @unvergebeneid 2 years ago +1

    I mean, the Kinect 2 did time-of-flight, not structured light like the first one. And it was still pretty cheap, being a mass-market device.

  • @soejrd24978
    @soejrd24978 2 years ago

    Ohh yes! Mike videos are the best

  • @quanta8382
    @quanta8382 2 years ago

    I wish I had a teacher like him!

  • @stefanguiton
    @stefanguiton 2 years ago

    Excellent video!

  •  2 years ago +2

    Is there already a video conferencing tool which takes advantage of this? This seems huge for being able to eliminate background and focus on the face.

    • @Garvm
      @Garvm 2 years ago +1

      I think FaceTime could be already doing that since iPhones have one of these depth sensors in each of the cameras

  • @JadeNeoma
    @JadeNeoma 2 years ago +1

    Interestingly, the Ultraleap Leap Motion camera uses three cameras to try and resolve depth and position, all of which are near-IR.

  • @functionxstudios1674
    @functionxstudios1674 2 years ago +1

    Made it early.
    Computerphile is the Best

  • @bryan69087
    @bryan69087 2 years ago

    MORE MIKE POUND!

  • @hexenkingTV
    @hexenkingTV 2 years ago +1

    But image depth could also lead to poor performance if it captures more noise, leading to a general data shift. I guess the processing step should be done carefully.

  • @sermadreda399
    @sermadreda399 1 year ago

    Great video, thank you for sharing

  • @srry198
    @srry198 2 years ago +3

    Wouldn’t LiDAR be more accurate/achieve the same thing concerning depth perception for machines?

    • @Phroggster
      @Phroggster 2 years ago +1

      Yes, LiDAR would be way better, but it's going to cost you ten or twenty times more than this device. This is geared more for prosumer tinkering, while LiDAR is more for autonomous driving, or other situations where human lives hang in the balance.

    • @ZT1ST
      @ZT1ST 2 years ago

      I imagine it would also be more useful in time-based solutions, because LiDAR has to count the time for the signal to return before it can do its calculations, while the infrared emitter can get the depth information a little faster: you're only waiting for the image to come back once, and you get more information across the whole frame at once, based on the pattern in the image.
      You could probably get even more accurate depth perception if you combined LiDAR with this.

    • @niccy266
      @niccy266 2 years ago +1

      @@ZT1ST Also, unless the LiDAR laser is changing direction for each pixel, which would have to happen extremely quickly, you would have to use a number of LiDARs that probably can't move, and you'd get a much lower-resolution depth channel.
      Maybe it could supplement the stereo information or help calibrate the camera, but overall it's not super useful.

  • @arash_mehrabi
    @arash_mehrabi 2 years ago

    Nice explanation. Thanks!

  • @Laétudiante
    @Laétudiante 2 years ago

    My professor literally delivered a lecture today regarding image depth, and now I see it on Computerphile XD

  • @Jacob-yg7lz
    @Jacob-yg7lz 2 years ago +1

    Could you take one of these, attach it to a mirror setup which separates each lens's view by far more distance, and then use it for longer-distance range finding (like a WW2 stereoscopic rangefinder)?

    • @Jacob-yg7lz
      @Jacob-yg7lz 2 years ago

      @Pedro Abreu I just meant having the view of each camera be really far away from each other so that there's more parallax

  • @thisisthefoxe
    @thisisthefoxe 2 years ago +1

    Question: *How* is the depth stored? RGB uses values between 0-255 to store the intensity, and you can work out the proportion of that colour in the pixel. How about depth? Does it also have 1 byte? What does it mean? Can you calculate the actual distance from the camera?

    • @ciarfah
      @ciarfah 2 years ago +1

      I mostly worked with depthimage, which is essentially a greyscale image where lighter pixels are closer and darker pixels are further away.
      On the other hand there is pointcloud, which is an array of 3D points. Typically that can be structured or unstructured, e.g. a 1000x1000 array of points, or a vector of 1000000 points.
      Perhaps this isn't as detailed as you'd have liked but this is as in depth as I've gone
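
      (The conversion between the two is just the pinhole camera model run backwards; a rough numpy sketch, with the intrinsics fx, fy, cx, cy made up for illustration:)

        import numpy as np

        # Hypothetical intrinsics: focal lengths and principal point, in pixels.
        fx, fy, cx, cy = 600.0, 600.0, 320.0, 240.0

        def depth_to_points(depth_m):
            """Turn an HxW depth image (metres) into an HxWx3 structured point cloud."""
            h, w = depth_m.shape
            u, v = np.meshgrid(np.arange(w), np.arange(h))
            z = depth_m
            # Invert the pinhole projection: pixel (u, v) at depth z -> (x, y, z).
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            return np.stack([x, y, z], axis=-1)

        cloud = depth_to_points(np.full((480, 640), 2.0))  # a flat wall 2 m away
        print(cloud.shape)  # (480, 640, 3)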

    • @ciarfah
      @ciarfah 2 years ago

      The handy thing about depthimage is you can compress it like any other image, which is great for saving bandwidth in a distributed system

  • @AcornElectron
    @AcornElectron 2 years ago +4

    Heh, Mike is always fun.

  • @katymapsa
    @katymapsa 2 years ago

    More Mike videos, please!!

  • @6kwecky6
    @6kwecky6 2 years ago +3

    Huh, I thought this was more solved than it is.
    Even with dedicated hardware, you can only get sub-30 fps directly from the camera.
    I suppose "directly from the camera" and "cheaply" are the key words.

    • @_yonas
      @_yonas 2 years ago +1

      You can get 30 FPS of depth-aligned RGBD images from the RealSense camera at a resolution of 1280x720. Higher than that and it drops to 15, AFAIK.
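
      (For reference, that's roughly how you'd request those streams through pyrealsense2; a sketch from memory, so treat the exact modes as an assumption:)

        import pyrealsense2 as rs

        pipeline = rs.pipeline()
        config = rs.config()
        # Ask for 1280x720 depth and colour at 30 FPS; unsupported modes are rejected.
        config.enable_stream(rs.stream.depth, 1280, 720, rs.format.z16, 30)
        config.enable_stream(rs.stream.color, 1280, 720, rs.format.bgr8, 30)
        pipeline.start(config)

        # Align depth onto the colour frame so the two images share a viewpoint.
        align = rs.align(rs.stream.color)
        frames = align.process(pipeline.wait_for_frames())
        depth, color = frames.get_depth_frame(), frames.get_color_frame()

        pipeline.stop()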

    • @ciarfah
      @ciarfah 2 years ago

      You can also get 60 Hz at lower res and 6 Hz at higher res IIRC

  • @silakanveli
    @silakanveli 2 years ago +1

    Mike is too smart!

  • @kaustabhchakraborty4721
    @kaustabhchakraborty4721 2 years ago

    Computerphile, a very earnest request: every video you post sparks a hunger for knowledge on that topic. Could you please attach some links or anything from which we can actually learn that stuff? I, and I think many others like me, will be very grateful if you could do such a thing.

  • @ssshukla26
    @ssshukla26 2 years ago

    2:42 Yeah 🤦‍♂️ now that's why this video deserves a like.

  • @carmatic
    @carmatic 2 years ago +1

    When will they make camera modules which can simultaneously capture RGB and IR from the same lens? That way, we'd have no parallax error between the depth and colour data.

    • @erikbrendel3217
      @erikbrendel3217 2 years ago

      Pretty sure that this is possible. Only problem is that you need two IR cameras to do the stereo matching

    • @marc_frank
      @marc_frank 2 years ago

      Lenses for RGB cams usually have an IR filter built in.

  • @tiagotiagot
    @tiagotiagot 2 years ago

    What happened to that time-of-flight RGBD webcam Microsoft bought just a little before they released the Kinect? Did they just buy it out to try to stifle competition and left the technology to rot?

  • @bsvenss2
    @bsvenss2 2 years ago +1

    Looks like the Intel RealSense Depth Camera D435. Only 337 GBP (in Denmark). Let's send a couple to Mike. ;-)

  • @thuokagiri5550
    @thuokagiri5550 2 years ago +1

    Dr Pound is the Richard Feynman of computer science

    • @SimonCoates
      @SimonCoates 2 years ago +2

      Coincidentally, Richard Feynman had so many affairs he was known as Dr Pound 😂

  • @CrystalblueMage
    @CrystalblueMage 2 years ago

    Hmm, so the camera can be used to detect colour imperfections on supposedly single-coloured flat surfaces. Could that be used to detect fungus just as it starts to grow?

  • @valisjan95
    @valisjan95 2 years ago

    2:41 Of course I had just looked at my finger. Sean et al. clearly know their audience.

  • @blenderpanzi
    @blenderpanzi 2 years ago

    I thought the Kinect, when announced, promised not to use any processing power of the console, but in the end, because of cuts, it actually did? Am I misremembering?

  • @castortoutnu
    @castortoutnu 2 years ago

    At work, the computer-vision team uses an Intel D435 to segment parcels on a conveyor belt. And they DON'T USE THE DEPTH for that, only the RGB. They use the depth for other things, but not for that.
    Also, I'm pretty sure they DON'T POST-PROCESS the depth image.

  • @jms019
    @jms019 2 years ago

    So much better when I only saw image death

  • @sanveersookdawe
    @sanveersookdawe 2 years ago

    Please make the next one on the time-of-flight camera.

  • @joechacon8874
    @joechacon8874 2 years ago

    Definitely looked at my own finger when you mentioned it, haha. Great info, thank you.

  • @PopeLando
    @PopeLando 2 years ago +1

    How does this guy know so much different stuff??!

  • @levmatta
    @levmatta 2 years ago +1

    How do you get depth for a single RGB image with AI?

  • @JohnDlugosz
    @JohnDlugosz 2 years ago

    I was hoping to learn how the time-of-flight depth sensors work.

  • @threeMetreJim
    @threeMetreJim 2 years ago

    What does it do if you hold a stereogram (SIRDS) picture in front of it?

  • @cogwheel42
    @cogwheel42 2 years ago +8

    I looked at my finger.

  • @tsunghan_yu
    @tsunghan_yu 2 years ago

    7:16 But why does Face ID work under sunlight? Is the laser just stronger in Face ID?

  • @nanjiang1177
    @nanjiang1177 3 months ago

    How can I use this depth camera within Gazebo? I've got ROS 1 Noetic; is there any plugin available?

  • @tiagotiagot
    @tiagotiagot 2 years ago

    Is the depth calculated on the hardware itself or on software running on the computer?

  • @patrickjane419
    @patrickjane419 9 months ago

    At this point he can literally explain anything and I'd understand.
    P.S. he looks like Hugh Grant

  • @johanhendriks
    @johanhendriks 2 years ago

    What's the link to the video where the stuff on the whiteboard was written and discussed?

  • @shanematthews1985
    @shanematthews1985 2 years ago +1

    So, did they let him keep it? That is the real burning question.

  • @Noobinski
    @Noobinski 2 years ago

    Why not take four sensors instead of two, and fill in all the gaps for the two sensors in between the outer ones? That would, IMHO, absolutely complete the stereoscopy and furthermore increase the quality of all the interpolated/corresponding pixels of the subject... or am I not getting something here? Why does it have to be two? Are we anthropomorphising a bit too much?

  • @AhmedAzhad
    @AhmedAzhad 2 years ago

    2:41 Me: looking at my finger and at something distant. Hey, there was some text there. Must be something about looking at your finger... (my thought when rewinding the video). :D

  • @nathanwiles2719
    @nathanwiles2719 2 years ago +4

    I guess I'm missing why he expected it to crash if he covered up the lens. It's not like you're getting null values, just extreme values.

    • @michaelcartmell7428
      @michaelcartmell7428 2 years ago

      Many possible reasons:
      In math, 3-7=-4, but in computing, if you're not careful, 3-7=4294967292. And, as you're already not being careful, that super-maximum-mega-extreme number might cause something to be very out of bounds.
      When trying to match patterns, you've got to make sure that your code doesn't go completely wonky when nothing in the image can possibly match. If you're lazy (as all programmers are at some time), then you might write the program to keep looking until it finds a match. That might just keep searching the image over and over, or it might get to the end of the image and start poking around at every memory location on the computer, even areas marked "forbidden, trespassers will be shot" by the OS.
      The part of the code that gets called to handle the extreme case may not have been properly written, possibly having errors in basic syntax. It's never supposed to get called, so it's not checked as well or as often.
      There are others, but those are the big three.
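
      (That first failure mode is easy to demonstrate with numpy's fixed-width integers; a minimal sketch:)

        import numpy as np

        # Python's own ints never wrap, but fixed-width unsigned integers do:
        a = np.array([3], dtype=np.uint32)
        b = np.array([7], dtype=np.uint32)
        print(a - b)  # [4294967292], i.e. -4 wrapped around modulo 2**32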

  • @billconiston8091
    @billconiston8091 2 years ago

    Where do they get the dot-matrix printer paper from...?

  • @pabloortiz8007
    @pabloortiz8007 2 years ago +2

    Too bad these are discontinued

  • @GameOfThePlanets
    @GameOfThePlanets 2 years ago

    Would adding a UV emitter help?