Visual Odometry with Monocular Camera For Beginners: A Project in OpenCV

  • Published 10 Feb 2025

COMMENTS • 90

  • @NicolaiAI
    @NicolaiAI  1 year ago +4

    Join My AI Career Program
    www.nicolai-nielsen.com/aicareer
    Enroll in My School and Technical Courses
    www.nicos-school.com

  • @Elku
    @Elku 2 years ago +11

    Great video, I used your package and modified it a bit to my liking.
    You do have one correction to make, though: transf = vo.get_pose(q1, q2) can return an infinite value in [0,3] and [1,3], especially when a video shows something stopping.
    Adding transf = np.nan_to_num(transf, neginf=0, posinf=0) fixes the issue.
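
    A minimal sketch of where that guard fits, with vo.get_pose, q1, and q2 following the commenter's naming (assumed to return the 4x4 relative transform between two frames):

        import numpy as np

        def safe_update(cur_pose, vo, q1, q2):
            transf = vo.get_pose(q1, q2)
            # Replace +/-inf translation entries (seen when the scene
            # stops moving and the epipolar geometry degenerates)
            # before composing the pose.
            transf = np.nan_to_num(transf, posinf=0, neginf=0)
            return cur_pose @ np.linalg.inv(transf)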

  • @alessandrotorresani5915
    @alessandrotorresani5915 2 years ago +5

    Thank you for this amazing material! Can't wait to see the next steps with bundle adjustment 🙂

  • @thomassouza5853
    @thomassouza5853 2 years ago +4

    This is amazing, I will definitely use this in my final course work.

  • @stevennovotny9568
    @stevennovotny9568 1 year ago +9

    Nice overview of a VO process. Well done. However, I think using a unit translation to get your second set of homogeneous coordinates, and then using those points to ultimately calculate scale, will always give you something close to 1. This issue is hidden by the fact that the ground truth in your dataset changes by approximately one unit per frame. You can check this by skipping several frames: the displacement should be greater, but your algorithm will still give a scale of about one.
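
    For reference, the relative-scale computation being critiqued (as described in the Scaramuzza/Fraundorfer VO tutorial) compares distances between pairs of triangulated 3D points across frames. A minimal sketch, assuming Q1 and Q2 are (N, 3) arrays of the same points triangulated from the previous and the current frame pair:

        import numpy as np

        def relative_scale(Q1, Q2):
            # Distances between consecutive points within each cloud;
            # the ratio of corresponding distances estimates how the
            # reconstruction scale changed between triangulations.
            d1 = np.linalg.norm(Q1[:-1] - Q1[1:], axis=-1)
            d2 = np.linalg.norm(Q2[:-1] - Q2[1:], axis=-1)
            return np.mean(d1 / d2)

    As the commenter notes, because each triangulation uses a unit-norm translation, this ratio hovers near 1, which only looks correct because KITTI's ground truth moves roughly one unit per frame.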

  • @rajithamdesilva
    @rajithamdesilva 2 years ago +3

    Thanks! Found your channel today and it's an absolute gem for beginner computer vision researchers.

    • @NicolaiAI
      @NicolaiAI  2 years ago +1

      Thanks a lot for the nice words!

  • @gildrondavid3229
    @gildrondavid3229 2 years ago +1

    Really helpful vid! Might have some questions later on though. Thanks man and keep up the amazing work!

    • @NicolaiAI
      @NicolaiAI  2 years ago

      Thank you so much! Really appreciate it and feel free to ask whatever questions u have

  • @teetanrobotics5363
    @teetanrobotics5363 2 years ago +1

    Amazing tutorial bro!! Keep it going.

  • @TheEdmaster87
    @TheEdmaster87 1 year ago

    I have started my master's thesis project in VIO, so this is quite interesting. I will use sensor fusion though, and not just VO.

  • @zaidarif218
    @zaidarif218 2 years ago +1

    Thanks man, this is awesome. Keep up the good work ;)

  • @ConsultingjoeOnline
    @ConsultingjoeOnline 8 months ago +1

    Very cool!

  • @sattarmonjezi4396
    @sattarmonjezi4396 2 years ago +1

    Thank you for this video.

    • @NicolaiAI
      @NicolaiAI  2 years ago

      Thanks for watching! Hope that it can help u

  • @19_dharshak87
    @19_dharshak87 1 year ago +1

    Hey! Awesome video. Just a tiny question: I was not able to find the image_r folder in the KITTI dataset. Could anyone help with that?

  • @JoseAlejandroDonate
    @JoseAlejandroDonate 2 years ago +2

    Thanks for this excellent tutorial. I have a question. Basically, I took a look at your approach to see if I could get my pose estimation's relative scale working correctly. It didn't. The relative scale computed is, most of the time, a number close to 1. Any recommendations on this?

  • @MajorKlods
    @MajorKlods 2 months ago

    So, it makes sense to me how you calculate the relative scale, but not how you apply it. Normally, I would say that once you have calculated the relative scale r, you scale the translation vector t between the two camera frames by r*t in order to get the correct scale between the frames. But here you use it only for the chirality check instead? What is the reason for adding the relative scale to the number of points that have positive depth? I fail to see the reason behind using the relative scale like this.
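
    For context, a plain chirality check (without the relative-scale term the commenter is asking about) just counts, for each of the four (R, t) candidates from the essential-matrix decomposition, how many triangulated points land in front of both cameras. A minimal sketch, assuming K is the 3x3 intrinsic matrix and q1, q2 are (N, 2) matched pixel coordinates:

        import numpy as np
        import cv2

        def count_positive_depth(R, t, K, q1, q2):
            t = t.reshape(3, 1)
            P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])  # reference camera
            P2 = K @ np.hstack([R, t])                         # candidate camera
            Q = cv2.triangulatePoints(P1, P2,
                                      np.asarray(q1, float).T,
                                      np.asarray(q2, float).T)
            Q = Q[:3] / Q[3]           # dehomogenize to 3xN
            z1 = Q[2]                  # depth in the first camera
            z2 = (R @ Q + t)[2]        # depth in the second camera
            return np.sum((z1 > 0) & (z2 > 0))

    The candidate with the most positive-depth points wins; folding the relative scale into that score, as the commenter describes the video doing, is a heuristic tie-breaker rather than a standard part of the check.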

  • @hofitroy1
    @hofitroy1 2 years ago +2

    So what is the difference between optical flow and visual odometry? Which one is better to use for real-time "location" estimation and navigation, for drones for example?

    • @NicolaiAI
      @NicolaiAI  2 years ago +2

      There are some similarities. Optical flow can also be used to track features from frame to frame. What to use depends on your feature extractor and system and so on. I'm gonna make a stereo visual odometry video too where I use optical flow to track feature points

  • @bociek125
    @bociek125 2 years ago +1

    Great work! Would love to see more videos on the topic of computer vision where the camera is moving.

    • @NicolaiAI
      @NicolaiAI  2 years ago

      Thanks for watching! Will definitely do more of those

  • @santos4027
    @santos4027 2 years ago +1

    THANK YOU!!!

  • @serialsensor2756
    @serialsensor2756 7 months ago

    You can recover scale by, e.g., introducing the assumption that the points you use are from the street in front of you. I did that to stabilize a self-balancing robot with VO (there is a video on my account).

    • @mertipolati
      @mertipolati 1 month ago

      How did you implement this assumption? I'm trying to recover the scale and got stuck with this problem.

    • @serialsensor2756
      @serialsensor2756 1 month ago

      @mertipolati With monocular odometry you need to know something, e.g. the distance from your camera to the surface. There is no other way. So basically as done here: people.inf.ethz.ch/pomarc/pubs/SaurerVICOMOR12.pdf
      but you do know d in Eq. (1).
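
      A minimal sketch of that idea, assuming a known camera height above a flat road and that Q_ground contains triangulated points selected as lying on the road (both assumptions, not from the video):

          import numpy as np

          def scale_from_camera_height(Q_ground, h_known):
              # Q_ground: (N, 3) points in the camera frame (KITTI
              # convention: y points down). A unit-norm translation
              # makes the reconstruction up-to-scale, so the apparent
              # camera height is unscaled too; the ratio fixes it.
              h_estimated = np.median(Q_ground[:, 1])
              return h_known / h_estimated

          # e.g. t_metric = scale_from_camera_height(Q_ground, 1.65) * t_unit
          # (1.65 m is roughly the KITTI camera mounting height)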

  • @poproduction3994
    @poproduction3994 2 years ago +1

    Hey Nicolai, can you please share any resource from which I can learn to integrate bundle adjustment into this code (basically get vSLAM working)? Thanks for the tutorial.

  • @azmyin
    @azmyin 2 years ago +1

    In decompose_essential_mat, is the technique you used to find the correct [R, t] pair when decomposing the essential matrix a heuristic method that you implemented from scratch, or is there a published paper explaining it?

  • @shehanhere
    @shehanhere 10 months ago

    I'm interested in learning about the possibility of applying visual odometry as an initial step to camera matchmoving. Any thoughts?

  • @Zap12348
    @Zap12348 1 year ago +1

    Why are we taking the 0th and 2nd index of the translation vector in "gt_path.append((gt_pose[0, 3], gt_pose[2, 3]))"?
    Isn't that (tx, tz), whereas we need (tx, ty)?
    Or is it because, in the 2D world with respect to the camera, the camera's z is the real 2D world's y?
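
    For reference, the last reading matches the KITTI camera convention (x right, y down, z forward): the ground plane is spanned by x and z, so a top-down trajectory plot uses those two axes. Annotated:

        # gt_pose[0, 3] = t_x (right), gt_pose[1, 3] = t_y (down, nearly
        # constant height), gt_pose[2, 3] = t_z (forward). A bird's-eye
        # plot therefore takes the two ground-plane components:
        gt_path.append((gt_pose[0, 3], gt_pose[2, 3]))  # (t_x, t_z)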

  • @amineaitallala3420
    @amineaitallala3420 2 years ago +1

    Thank you. Do you have any idea how to implement this with another feature detector? I tried with SIFT and it didn't work so well.

    • @NicolaiAI
      @NicolaiAI  2 years ago

      Thanks for watching! Yeah u can use all the feature detectors from opencv

  • @SweAwesome
    @SweAwesome 2 years ago +1

    Thank you so much for the great video! Just one question:
    If we ended up not using the KITTI dataset, how would we go about creating the first projection matrix used at 32:08? (self.P)

    • @NicolaiAI
      @NicolaiAI  2 years ago +1

      Camera calibration. Thanks a lot for watching!

    • @ifedayoolusanya5202
      @ifedayoolusanya5202 1 year ago

      Hi, so we can convert our camera calibration values into a 3 by 4 matrix to get self.P, right? @NicolaiAI
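
      A minimal sketch of that conversion, assuming the cv2.calibrateCamera inputs (objpoints, imgpoints, image_size) have been prepared beforehand; the variable names are illustrative, not from the video:

          import numpy as np
          import cv2

          # objpoints: list of (M, 3) chessboard corner coordinates;
          # imgpoints: list of matching (M, 1, 2) detected image corners;
          # image_size: (width, height) of the calibration images.
          ret, K, dist, rvecs, tvecs = cv2.calibrateCamera(
              objpoints, imgpoints, image_size, None, None)

          # With the first camera taken as the world origin, a
          # KITTI-style 3x4 projection matrix is simply K [I | 0]:
          P = K @ np.hstack([np.eye(3), np.zeros((3, 1))])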

  • @karanbirchahal3268
    @karanbirchahal3268 2 years ago

    Amazing video, really great job. Will you implement SLAM as well?

  • @anshXR
    @anshXR 2 years ago

    Instead of decomposing the essential matrix and then finding the correct pose by triangulation of points, can I just use the cv2.recoverPose() method? It does the same thing by itself.
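
    For what it's worth, cv2.recoverPose does wrap that decompose-and-check step. A minimal sketch, assuming q1 and q2 are (N, 2) float arrays of matched points and K is the intrinsic matrix:

        import cv2

        # Essential matrix from matched, calibrated image points.
        E, mask = cv2.findEssentialMat(q1, q2, K, method=cv2.RANSAC,
                                       prob=0.999, threshold=1.0)
        # recoverPose decomposes E into the four (R, t) candidates and
        # keeps the one with the most points in front of both cameras.
        retval, R, t, mask_pose = cv2.recoverPose(E, q1, q2, K, mask=mask)
        # Note: t comes back unit-norm, so scale must still be supplied.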

  • @CuongNguyen-kq7cx
    @CuongNguyen-kq7cx 1 year ago

    Wow, great!!! Can I use a Raspberry Pi cam for that?

  • @saadazhar4175
    @saadazhar4175 2 years ago

    Great video. Have you done the video on optimizations as well?

  • @azmyin
    @azmyin 2 years ago +1

    Great video. However, why is your algorithm not calculating the Z direction of the pose?

    • @NicolaiAI
      @NicolaiAI  2 years ago +1

      It is, but only x and y are visualized

    • @azmyin
      @azmyin 2 years ago

      Oh great.

  • @podcastBenaa
    @podcastBenaa 2 years ago

    Thank you for your video!
    How can we create VO for 360 cameras like insta360 x3 if at all possible? Also, is calibrating such a camera possible (equirectangular images)?

  • @rubenponsaers9124
    @rubenponsaers9124 11 months ago

    Nice video; trying to do it on my own data. Aren't the extrinsic parameters different for each image, so how is it possible that you can use them for your whole image sequence?

    • @NicolaiAI
      @NicolaiAI  11 months ago

      I have it running in another video with live cameras

  • @nomuchohan
    @nomuchohan 1 year ago

    Hi Nicolai! I've been following your channel for a long time and have learned quite a lot from you since I started. I am writing because the project I am stuck on this time is by far the hardest one I've ever come across. It's a freelance project I got from somewhere, and it seems like I have exhausted all my options on how to actually get it done. I need your help with the project, or at the very least suggestions on how I can approach the problem statement or solve it. So, I'll give a brief summary of the project:
    I have to come up with a system that can map football players from the video frame to a 2D field image and get their velocities, acceleration, etc. I have used YOLOv7 for detection of the players in the video frame, and I am using Euclidean distance to keep track of the centroid of the selected player. Now, I want to design a system to map this player onto a 2D field image and get the player's acceleration and velocity. I tried a perspective transform, but it does not seem feasible, as I would have to click on four separate corners every frame in order to map. I want this process to be automated. Is there any way you can help me? Note: throughout the video the camera angle will not stay constant; it will keep changing. It's a PTZ camera. Please help me with the above.
    Thank you.

    • @VikasRajpurohit-t2s
      @VikasRajpurohit-t2s 1 year ago

      Here's a step-by-step approach to achieve this:
      1. Camera calibration: perform camera calibration to obtain the camera's intrinsic matrix and distortion coefficients. You can use a chessboard pattern and OpenCV's cv2.calibrateCamera function for this. For a PTZ camera with a variable field of view, you may need to calibrate the camera multiple times as the camera angle changes.
      2. Object detection and tracking: use YOLOv7 or any other object detection algorithm to detect football players in video frames and extract their bounding boxes. Implement an object tracker (e.g., Kalman filter, CentroidTracker) to track players between frames based on their bounding boxes.
      3. Perspective transform (bird's-eye view): obtain the 2D field image that you want to map the players onto. Define four points on the field image corresponding to the four corners of the field. Implement an automatic method (e.g., feature matching) to estimate the perspective transformation between the field image and the camera view in each frame (see the sketch below).
      4. Optical flow: use optical flow algorithms (e.g., Lucas-Kanade, Farneback) to estimate the motion vectors of players between consecutive frames. Based on the motion vectors and the camera's frame rate, calculate the players' velocities and accelerations in the 2D field coordinate system.
      5. Combine data: using the perspective transform, map the players' positions from the camera view to the 2D field image. Combine the positional data with the calculated velocities and accelerations to obtain the desired player tracking information.
      Might help!
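
      A minimal sketch of the automatic perspective transform in step 3, assuming matched reference points (e.g. field-line intersections) are available between the broadcast frame and the 2D field template; all names are illustrative:

          import numpy as np
          import cv2

          def map_players_to_field(frame_pts, field_pts, player_feet):
              # frame_pts/field_pts: (N, 2) matched reference points in
              # the camera frame and the field template (N >= 4).
              H, inliers = cv2.findHomography(frame_pts, field_pts,
                                              cv2.RANSAC, 3.0)
              # player_feet: (M, 2) bottom-center points of player boxes.
              pts = player_feet.reshape(-1, 1, 2).astype(np.float32)
              return cv2.perspectiveTransform(pts, H).reshape(-1, 2)

      Velocities and accelerations then come from frame-to-frame differences of the mapped positions divided by the frame interval.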

  • @maaitrayodas5630
    @maaitrayodas5630 2 years ago +1

    Hey, can you please share the article or paper from which the theory is taken?

    • @NicolaiAI
      @NicolaiAI  2 years ago

      Have not used a specific article or paper

    • @maaitrayodas5630
      @maaitrayodas5630 2 years ago

      @NicolaiAI I was intrigued by the scale calculation using triangulation, and then estimating R and t, without using the initial R and t from the built-in cv2.recoverPose.

  • @goroyeh1898
    @goroyeh1898 2 years ago

    Great tutorials! May I ask what the full pipeline at 8:43 is? The part after local optimization is occluded by your handsome face 😆

  • @ChaitanyaKrishnabodduluri
    @ChaitanyaKrishnabodduluri 1 year ago +1

    What's the use of getting a pose without scale?

    • @NicolaiAI
      @NicolaiAI  1 year ago

      Actually we do. We take the relative scale into account. I go over that in the code

  • @poproduction3994
    @poproduction3994 2 years ago

    Great tutorial, seriously, it's great. Thanks for putting this out!!! I have one question:
    when i==0 of the pose estimation we are using
    cur_pose = gt_pose
    and after that, for i==1, we are using
    cur_pose = np.matmul(cur_pose, np.linalg.inv(transf))
    so in the second iteration we are using the pose we have from ground truth and multiplying it with the pose we have calculated.
    What if we don't have ground truth? How will we calculate the cur_pose for i=1 then?
    Thanks in advance

    • @poproduction3994
      @poproduction3994 2 years ago

      Never mind, I got the answer in your live camera trajectory video. Thanks a bunch.

    • @akilarsath6499
      @akilarsath6499 1 year ago

      @poproduction3994 Can u tell the solution?
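
      For anyone else looking: without ground truth you simply start the chain at the identity, so the trajectory is expressed relative to the first camera pose. A minimal sketch, with vo.get_pose and frame_pairs assumed from the video's code:

          import numpy as np

          cur_pose = np.eye(4)  # no ground truth: start at the origin
          for q1, q2 in frame_pairs:         # matched keypoints per pair
              transf = vo.get_pose(q1, q2)   # relative 4x4 transform
              cur_pose = cur_pose @ np.linalg.inv(transf)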

  • @ChaitanyaKrishnabodduluri
    @ChaitanyaKrishnabodduluri 1 year ago +1

    Monocular camera odometry suffers from scale drift, right? The pose (R, t) doesn't have any units here, right?

    • @NicolaiAI
      @NicolaiAI  1 year ago

      It does. We take the relative scale into account; I go over that in the code. But it will be another accumulating error for the odometry

  • @ashishgarg4965
    @ashishgarg4965 1 year ago

    Can you make a video on visual SLAM?

  • @alirezasoltani3049
    @alirezasoltani3049 2 years ago +1

    Thanks

    • @NicolaiAI
      @NicolaiAI  2 years ago

      Thanks for watching! Hope that u can use it

  • @lucaperrin
    @lucaperrin 1 month ago

    Great stuff! Very good to start with the theory part to actually understand what's happening in the code.
    Question: could the cv2 function cv2.recoverPose(E, q1, q2, K) be used to get the R and t matrices directly from the essential matrix and K?
    Thanks!

  • @jaskiratsingh9710
    @jaskiratsingh9710 2 years ago +1

    How did you create the calibration and poses txt files? Is there any code for that? Please share if there is.

    • @NicolaiAI
      @NicolaiAI  2 years ago

      That's from the KITTI dataset

  • @dynamicgecko1213
    @dynamicgecko1213 2 years ago

    Thank you for these videos man. I really appreciate it.
    I forked the repo to replicate results. Where can we get the "lib" module?

    • @LoayAltal
      @LoayAltal 2 years ago

      It's a folder next to the Python script, in his GitHub.

  • @tselin7611
    @tselin7611 2 years ago +1

    Hello,
    I tried to feed live frames into the code. However, it yielded very high constant bias and noise. Is there any way to reduce the constant bias and noise?
    Thanks!

    • @NicolaiAI
      @NicolaiAI  2 years ago

      U can use different filters. Try a low-pass filter to start with
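
      A minimal sketch of such a filter, here a simple exponential moving average over the translation part of the pose (the smoothing factor alpha is a tuning assumption):

          import numpy as np

          class TranslationLowPass:
              def __init__(self, alpha=0.2):
                  self.alpha = alpha   # smaller = smoother but laggier
                  self.state = None

              def update(self, t):
                  # t: 3-vector translation from the latest pose estimate.
                  t = np.asarray(t, dtype=float)
                  if self.state is None:
                      self.state = t
                  else:
                      self.state = self.alpha * t + (1 - self.alpha) * self.state
                  return self.state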

  • @TheWeibing
    @TheWeibing 2 years ago +1

    Thanks! But I do have a question: how do we obtain our own pose data without referring to the KITTI dataset?

    • @NicolaiAI
      @NicolaiAI  2 years ago +1

      Thanks for watching! If u want to find the poses of ur own data u can just replace the images with ur own. But then u won't have the ground truth poses

    • @TheWeibing
      @TheWeibing 2 years ago +1

      Is the ground truth pose mandatory? Does the code work without the ground truth text file? I read about it the other day and it seemed to be important for obtaining scale information.

    • @NicolaiAI
      @NicolaiAI  2 years ago +1

      @TheWeibing It's not mandatory, but then u kinda don't know how ur system performs

    • @TheWeibing
      @TheWeibing 2 years ago

      @NicolaiAI Hey, just wondering: what does the 2nd row, 4th column term in the output transformation matrix represent? [x, ?, y]

  • @gjgb8836
    @gjgb8836 9 months ago +2

    What is the name of the GitHub repo?

  • @rajparikh7730
    @rajparikh7730 1 year ago

    I have used this code with my own 1350 input images.
    The only problem I'm facing is that the model is not able to run through the images sequentially.
    What I've noticed is that it first runs a few images sequentially (say 30-50) and then goes back to the start (0-10).
    I don't know what to do.
    Please help.

  • @jealouseggs5619
    @jealouseggs5619 2 years ago

    Can I run this on ROS for a NAO robot?

  • @Theo-cn2cy
    @Theo-cn2cy 11 months ago

    Where can I get the link to his Discord server?

  • @gbo10001
    @gbo10001 1 year ago

    Can it also work with object tracking?

  • @AnkitVashisht
    @AnkitVashisht 2 years ago +1

    Bro didn't use PnP?

  • @nicolasnicolas-iz5ke
    @nicolasnicolas-iz5ke 2 years ago +1

    Something is strange: your method does not estimate the magnitude of the translation (only its direction), and yet somehow it is pretty close to the ground truth.

    • @NicolaiAI
      @NicolaiAI  2 years ago

      Nope, the translation is the magnitude. The whole transformations of the camera poses are estimated

  • @iminaboroberts8516
    @iminaboroberts8516 11 months ago

    @NicolaiAI How can I get the dataset?