Thank you so much for this! I'd been confused about which one I need for a project I've been tasked with; this cleared things up. And extra shoutout to Hank 😄
This tracking camera is really interesting. I am trying to do 3D mapping with open source software called FastFusion. That software needs both RGBD inputs and pose inputs. It was developed back in 2013 or something but is still the best I have seen that is publicly available. It is fairly old, so much so that compiling it takes a bit of effort (modifying a few files, etc.), but it is able to do real-time textured meshing on only a CPU!
They recommend using DVO SLAM for 6-DOF poses. That software is also very old and needs an old version of ROS to run; I haven't tried it yet. Something like this which takes care of all that for you and does all the processing on the camera without using the CPU is really nice!
Thanks for info. I need to learn about their variable resolution voxel approach. Looks cool.
Their GitHub link is github.com/tum-vision/fastfusion
I'm still working on collecting and texturing the point clouds using the RealSense SDK 2.11. Intel built a nearly full set of LabVIEW examples with the 2.11 SDK release. However, the color camera doesn't work in the examples. I have it working, but it uses the NI Vision Development Module to get the color image. From there I estimated a homography transform to map the color image into the size and shape of the greyscale image. That way the intrinsic matrix of the left greyscale camera can be used to pick points from the color image. The depth image is directly registered to the left greyscale image. Each x pixel in the greyscale image maps to an X world coordinate using this formula: (x_pixel - ImageCenterX) * depth value for this pixel / focal length. The Y world coordinate is: (y_pixel - ImageCenterY) * depth / focal length. Using the left greyscale camera's intrinsic values ppx, ppy for the image center, and fx, fy for the focal length in x and y, the world coordinates end up being:
XWorld = (x - ppx) * DepthValue / fx
YWorld = (y - ppy) * DepthValue / fy
ZWorld = DepthValue.
Those world values, XYZ, and the RGB values of the transformed color image for each pixel can be written to a CSV file (one point per line) to create a textured point cloud.
If you use the LabVIEW 3D picture control, RGB needs to be normalized from three (0 to 255) values to three (0.0 to 1.0) values [just divide the R, G, and B values by 255 separately]. You also need to add alpha = 0.0 for each element in the RGB array.
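For anyone who wants the same idea outside LabVIEW, here is a minimal Python/numpy sketch of the steps above (depth + aligned color to a CSV point cloud). The inputs `depth`, `color`, and the intrinsics are hypothetical placeholders, not the actual LabVIEW code.
```python
# Minimal sketch (not the LabVIEW code): turn a depth image plus an aligned
# color image into a textured point cloud CSV. Assumes `depth` is an HxW
# uint16 array in millimeters, `color` is an HxWx3 uint8 array already warped
# to the greyscale/depth image geometry, and ppx/ppy/fx/fy are the left
# greyscale camera intrinsics -- all hypothetical inputs.
import numpy as np

def write_textured_cloud_csv(depth, color, ppx, ppy, fx, fy, path="cloud.csv"):
    h, w = depth.shape
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))   # pixel coordinates
    z = depth.astype(np.float64)                       # ZWorld = depth value (mm)
    x = (xs - ppx) * z / fx                            # XWorld = (x - ppx) * Z / fx
    y = (ys - ppy) * z / fy                            # YWorld = (y - ppy) * Z / fy
    valid = z > 0                                      # drop pixels with no depth
    rows = np.column_stack([x[valid], y[valid], z[valid], color[valid]])
    # One point per line: X, Y, Z, R, G, B
    np.savetxt(path, rows, delimiter=",", fmt="%.3f")

# For the LabVIEW 3D picture control, divide R, G, B by 255.0 and append alpha = 0.0.
```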
7:34 best part of video
Would be useful for blind people.
Is there any way to import skeletal tracking libraries from MATLAB or Microsoft Kinect? I've been working with a Kinect recently, but that form factor is pretty amazing.
I don't have any experience with either of those libraries. Labview does have a way of wrapping DLL functions, but it works best if the data is passed as primitives. There is an NI explanation here: zone.ni.com/reference/en-XX/help/371361J-01/glang/call_library_function/
The official GitHub page shows OpenNI wrappers... so skeletal tracking is possible.
I tried using the T265 with a depth sensor for Augmented reality. The concept works, but the issue is that you need the relative positions of the cameras calibrated exactly. The viewpoint of the depth camera is not 100% exactly at the point of the pose the T265 is reporting, and you need to account for that. I was using Panda3D as the game engine for AR. It looks great at close ranges, but at greater distances the virtual objects move way off where they are supposed to be.
I am sure it is possible, but the math behind it is something beyond me so far. Camera calibration is one of those things that I just find too intimidating.
Take a look at this post about finding the rotation and translation matrix between two sets of corresponding points.
nghiaho.com/?page_id=671
I've been learning matrix algebra a little at a time for the last 2 years. When I saw this the first time I didn't know how to use it. The Rotation|Translation matrix can be used to convert points between coordinate systems. Here's an outline of the process (a rough numpy sketch of the whole procedure follows at the end of this outline):
1) You need a set of corresponding points. Point 0 in Set A and point 0 in Set B are the same physical point, but they are "seen" from different coordinate systems. Collect a bunch of these corresponding points.
2) Find the centroid of each set. This is the average of X, average of Y, and average of Z for each set. That is, sum all the X's and divide by the number of points. repeat for Y and Z.
3) Recenter all the points in each set around the origin (0,0,0) by subtracting the Centroid from each point in the set. (use centroid A on set A and Centroid B on set B)
4) Create the H matrix [this is the part I didn't get for a long time, but it was simpler than I thought]. The H matrix is a 3x3 matrix created by multiplying a 3x1 matrix and a 1x3 matrix. The values of the 3x1 matrix are the values of point 0 in the re-centered set A, and the 1x3 matrix is the value of point 0 in the re-centered set B. Perform this multiplication for all of the re-centered corresponding points, then sum the results. So each re-centered corresponding point produces a 3x3 matrix, and you are just adding all the 3x3 matrices together.
5) Perform a Singular Value Decomposition (SVD) on the H matrix. It will output 3 matrices; we want U and V. [Sorry, I can't really explain what SVD does, I have a weak understanding.]
6) The rotation matrix R (a 3x3 matrix) is R = VU^T. The "^T" means that U must be transposed before being multiplied by V. The rotation matrix contains the values needed to rotate re-centered Set A to align with re-centered Set B.
7) The rotation matrix (R) might be reflected (inverted in the Z axis). To check, get the determinant of R. If it is less than 0, multiply the third column of V by -1 and recompute R = VU^T. This fixes the reflection.
8) The translation vector (t) moves Set A to the location of the original Set B. To find the translation, multiply Centroid A by the rotation matrix R, negate the result, and add the centroid of Set B: t = -R x CentroidA + CentroidB
9) The complete R|t matrix should be arranged in the form
R R R t
R R R t
R R R t
0 0 0 1
10) Using the R|t matrix: Any time you want to convert a point from the "Set A" camera to the "Set B" world (or camera), convert the point to a 4 x 1 matrix where the last element is set to 1. (Like below)
X
Y
Z
1
Let's call that Pa.
Multiply this by the R|t matrix.
You will get a 4 x 1 homogeneous result (let's call it Pb).
Pb = R|t Pa
Divide the first three values by the fourth value to get the point converted to Set B.
Xb = Pb1 / Pb4
Yb = Pb2 / Pb4
Zb = Pb3 / Pb4
More about homogeneous coordinates: ua-cam.com/video/Q2uItHa7GFQ/v-deo.html
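And here is the numpy sketch mentioned above, covering steps 1-9. It's a rough implementation under the assumption that `points_a` and `points_b` are Nx3 arrays of corresponding points (hypothetical names), not anyone's production code.
```python
# Rough numpy sketch of steps 1-9 above. points_a and points_b are
# hypothetical Nx3 arrays of corresponding points (row i in A matches row i in B).
import numpy as np

def rigid_transform(points_a, points_b):
    centroid_a = points_a.mean(axis=0)                 # step 2: centroids
    centroid_b = points_b.mean(axis=0)
    a = points_a - centroid_a                          # step 3: re-center
    b = points_b - centroid_b
    H = a.T @ b                                        # step 4: sum of (3x1)(1x3) products
    U, S, Vt = np.linalg.svd(H)                        # step 5: SVD
    R = Vt.T @ U.T                                     # step 6: R = V U^T
    if np.linalg.det(R) < 0:                           # step 7: fix reflection
        Vt[2, :] *= -1                                 # flip 3rd column of V (3rd row of Vt)
        R = Vt.T @ U.T
    t = -R @ centroid_a + centroid_b                   # step 8: translation
    Rt = np.eye(4)                                     # step 9: 4x4 [R|t]
    Rt[:3, :3] = R
    Rt[:3, 3] = t
    return Rt

# Step 10: convert a point pa = (x, y, z) from frame A to frame B.
# pb = Rt @ np.append(pa, 1.0); then pb[:3] / pb[3] is the point in frame B.
```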
Is it possible to calculate a point cloud with only the T265 camera? I own both of them, but my boss told me to obtain a point cloud using only the tracking camera.
Unfortunately, no, the tracking camera only outputs translation and rotation, not the image points. Maybe your boss wants the translation points, but that isn't a point cloud, it's a series of linear points. Maybe there was a firmware update that does provide a point cloud; my camera was an early version. The latest RealSense Viewer can upgrade the firmware. For my applications, I stopped at v _.31 because I use the LabVIEW wrapper of the .dll for the D415 and D435.
I would be interested in outdoor applications with many loop closures. I wonder whether the relatively short stereo baseline would permit visual odometry for instance in a park.
I don't know. Intel says there is a 1% error. I saw a video of a long hallway loop that did not close, but it was off only by that 1%. This is just an idea... Maybe there are two landmarks that you could get feature points from at the start of the loop. When you return, the landmarks should align. If they don't, look for the translation between the start feature points and the end feature points in a homography matrix.
Lowtech I have ordered one and will give it a try. I would be happy to deploy clear targets for loop closures.
Well, I have faced problems using the D435i in outdoor applications because the camera gets noisy when facing sunlight. Do you have any solution for that?
Thanks for the video. If I got it right, I could use the combination of the Intel sensors to create a 3D model of an object without placing any markers for orientation?
Yes, BUT...
1) The .PLY file was not rotated like the point cloud in realsense viewer.
2) The tracking camera has 1% error.
To remedy this I transform the point cloud using a translation and rotation matrix derived from the T265 quaternion output.
Rotation matrices can be confusing, and so can quaternions. The math to rotate the point cloud using quaternions directly is faster, but programs like CloudCompare use matrix rotations. After you fix the rotation you must add the translation.
Here is a link to my notes about the t265 and rotating point clouds.
Rotating a point using quaternions lowtechllcblog.blogspot.com/2020/11/rotating-point-using-quaternions.html
Pose considerations of realsense cameras lowtechllcblog.blogspot.com/2020/02/
Getting the t265 camera pose data using .net lowtechllcblog.blogspot.com/2019/12/getting-camera-pose-data-from-intel.html
To be clear, it is not a streamlined, out-of-the-box solution.
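If it helps, this is the basic idea in Python/numpy: build a rotation matrix from the T265 quaternion and apply rotation plus translation to the cloud. The variable names (`qx, qy, qz, qw`, `tx, ty, tz`, `cloud`) are placeholders for the pose values and an Nx3 point array; see the blog links above for the full details.
```python
# Minimal sketch: build a rotation matrix from the T265 quaternion and apply
# rotation + translation to a point cloud. qx, qy, qz, qw and tx, ty, tz are
# the pose values from the SDK; `cloud` is a hypothetical Nx3 numpy array.
import numpy as np

def quat_to_matrix(qx, qy, qz, qw):
    # Standard unit-quaternion to 3x3 rotation matrix conversion.
    return np.array([
        [1 - 2*(qy*qy + qz*qz), 2*(qx*qy - qz*qw),     2*(qx*qz + qy*qw)],
        [2*(qx*qy + qz*qw),     1 - 2*(qx*qx + qz*qz), 2*(qy*qz - qx*qw)],
        [2*(qx*qz - qy*qw),     2*(qy*qz + qx*qw),     1 - 2*(qx*qx + qy*qy)],
    ])

def transform_cloud(cloud, qx, qy, qz, qw, tx, ty, tz):
    R = quat_to_matrix(qx, qy, qz, qw)
    return cloud @ R.T + np.array([tx, ty, tz])   # rotate first, then translate
```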
@@LowtechLLC Thank you very much for your answer. On the other hand, could I also use the Intel RealSense D435i for the same result? Or does it make more sense in this case to combine the D435 with the T265 as well? Or even the D435i with the T265?
@@MMMM-bb6cj I don't have a D435i or D455 to check the IMU output. The D435i spec says 6 DOF, but I cannot find a definitive answer whether that means 3 DOF for translation x,y,z and 3 DOF for rotation x,y,z. Before you asked, I assumed the addition of the IMU was just for rotation [only 3 DOF]. This questionable press release indicates the 6 DOF is translation and rotation: newsroom.intel.com/news/new-intel-realsense-d435i-stereo-depth-camera-adds-6-degrees-freedom-tracking/
However this contradicts that by saying the CAMERA POSE 6DOF is only output from the T265:
www.intelrealsense.com/how-to-getting-imu-data-from-d435i-and-t265/
If the latter is true, then the D435i 6 DOF is not the camera pose, but 3 DOF acceleration in m/s^2 and 3 DOF gyroscope in radians/s.
The T265 uses a sensor fusion algorithm to turn the IMU and video into a camera pose. The pose is needed to transform the point cloud samples and "stitch" them together. This link indicates the D435i does not perform a sensor fusion algorithm to calculate the camera pose:
www.intelrealsense.com/which-device-is-right-for-you/
So if you can run a visual SLAM algorithm yourself, the D435i data can be used to generate the camera pose. Otherwise, whichever D4xx depth camera you use, pairing it with the tracking camera will give you the pose needed to translate and rotate the collected point clouds. In any case, it requires some programming to get what you want.
Wonderful.
If you are interested in using LabView with the T265, I recommend the .NET wrapper. I've got an outline on my blog, lowtechllcblog.blogspot.com/2019/12/getting-camera-pose-data-from-intel.html
Can you do mapping using the T265 only? Then load that map again to localize the camera in it?
I have not tried that. The Intel product page (www.intelrealsense.com/tracking-camera-t265/) says the camera will re-localize after "kidnapping". Again, I have not tried it, but I think the internal feature map is only retained while it has power. If it recognizes features it sees in the map, it will re-localize. You might try using AprilTags or ArUco markers and OpenCV to re-localize at start-up: docs.opencv.org/3.1.0/d5/dae/tutorial_aruco_detection.html
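A rough Python sketch of the marker idea, using the pre-4.7 cv2.aruco API from opencv-contrib-python (OpenCV 4.7+ moved to an ArucoDetector class). The camera matrix, distortion coefficients, and 5 cm marker size are placeholder values; substitute your own calibration.
```python
# Rough sketch of ArUco marker detection for re-localization. Uses the
# pre-4.7 cv2.aruco API (opencv-contrib-python); newer OpenCV uses ArucoDetector.
# camera_matrix, dist_coeffs and the 0.05 m marker size are placeholders.
import cv2
import numpy as np

aruco_dict = cv2.aruco.Dictionary_get(cv2.aruco.DICT_4X4_50)
params = cv2.aruco.DetectorParameters_create()

camera_matrix = np.array([[615.0, 0.0, 320.0],
                          [0.0, 615.0, 240.0],
                          [0.0, 0.0, 1.0]])      # placeholder intrinsics
dist_coeffs = np.zeros(5)

frame = cv2.imread("frame.png")                  # or a frame grabbed from the camera
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
corners, ids, _ = cv2.aruco.detectMarkers(gray, aruco_dict, parameters=params)
if ids is not None:
    # Pose of each 5 cm marker relative to the camera; use the markers as
    # fixed landmarks to re-anchor the T265 pose after a restart.
    rvecs, tvecs, _ = cv2.aruco.estimatePoseSingleMarkers(
        corners, 0.05, camera_matrix, dist_coeffs)
    print(ids.flatten(), tvecs.reshape(-1, 3))
```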
Let me know what you find out.
Does the tracking camera T265 work with Unity? I want to use the SLAM capabilities to track an AR HMD. The depth sounds like a good idea too, but looks heavier.
I have not used Unity, but there is a Unity and a C# wrapper for the RealSense cameras on GitHub. It's in the librealsense project.
github.com/IntelRealSense/librealsense/tree/master/wrappers
If you get it working leave a comment that links to how you did it.
So could you hook up multiple pairs to build a 360° model with texture? I.e., 3 pairs around an object?
I have not done that, but here's a link from intel showing three D435's scanning a person at the same time.
ua-cam.com/video/jhNoGyKl928/v-deo.html
I believe each camera's pose needs to be turned into an R|t matrix (the camera's extrinsic matrix). Then the point cloud from each D435 is multiplied by its R|t matrix for alignment to world coordinates.
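In numpy that multiplication looks roughly like this; `cloud` (Nx3) and `Rt` (4x4) are hypothetical inputs, one pair per camera.
```python
# Minimal sketch: apply a 4x4 R|t (extrinsic) matrix to an Nx3 point cloud
# to bring it into world coordinates. `cloud` and `Rt` are hypothetical inputs.
import numpy as np

def to_world(cloud, Rt):
    homog = np.hstack([cloud, np.ones((cloud.shape[0], 1))])  # Nx4 homogeneous points
    out = homog @ Rt.T                                        # apply R|t to every point
    return out[:, :3] / out[:, 3:4]                           # back to Nx3
```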
Hi buddy. I'm a person with limitations due to amyotrophic lateral sclerosis. Sorry, what is the accuracy of the eye tracking of this device?
The Intel Realsense web site doesn't have any specs for eye tracking, but they do have a link to a software company that does eye tracking. The link is
eyeware.tech/gazesense/
The site says the Intel Realsense D415 is compatible with their software. But there isn't any accuracy data in their spec.
" Eye tracking range: 0.3 - 1.0m
Head pose tracking range: 0.3 - 1.5m
Field of view: 65° x 40° x 72° (Horizontal × Vertical × Diagonal)"
The resolution of the camera is 1920 x 1080, so that will be 0.2 mm / pixel at 0.3 meter working distance, and 0.66 mm/pixel at 1.0 meter working distance. I would expect the true resolvable object size to be 3 times larger than these numbers (0.6 ~ 2 mm object size). I don't know how to turn that into angular accuracy. This is a guess, using 100 mm as the radius of a skull, and 2 mm as the minimum object resolution, the arctan of 2/100 = 1.14 degrees. just a guess.
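For what it's worth, here is the same back-of-the-envelope arithmetic in Python, assuming the 65° horizontal FOV and 1920-pixel width quoted from the GazeSense spec above. It's only a sanity check of the guess, not an accuracy spec.
```python
# Quick check of the back-of-the-envelope numbers above, assuming the 65 deg
# horizontal FOV and 1920-pixel width from the quoted spec.
import math

def mm_per_pixel(distance_m, hfov_deg=65.0, width_px=1920):
    field_width_mm = 2 * distance_m * 1000 * math.tan(math.radians(hfov_deg / 2))
    return field_width_mm / width_px

print(mm_per_pixel(0.3))                    # ~0.20 mm/pixel at 0.3 m
print(mm_per_pixel(1.0))                    # ~0.66 mm/pixel at 1.0 m
print(math.degrees(math.atan2(2, 100)))     # ~1.15 deg for a 2 mm feature at 100 mm radius
```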
@@LowtechLLC Thanks for the information. I see that project is quite old and they haven't released more products since 2017. Why is that?
@@migueldelahoz4740 If you are using Windows 10, they released an accessibility update for eye tracking in 2017. This video demos it with a Tobii Eye Tracker 4C. There is a list of compatible devices, but I'm not familiar with them.
Tobii
• Tobii Eye Tracker 4C
• Tobii EyeX
• Tobii Dynavox PCEye Plus
• Tobii Dynavox EyeMobile Mini
• Tobii Dynavox EyeMobile Plus
• Tobii Dynavox PCEye Mini
• Tobii Dynavox PCEye Explore
• Tobii Dynavox I-Series+
• Selected laptops and monitors that include eye tracking integrations
EyeTech
• TM5 Mini
I think the difficulty is in writing device drivers for the hardware so that it can be used by the operating system.
There could be a workaround, at least for Windows. The operating system includes a library called User32.dll. It contains all the function calls needed to make a program act like a keyboard and mouse. I'm sure I'm oversimplifying it, but software that combines the D415 depth data, face pose, eye tracking, and User32.dll could act as an eye-tracking mouse without needing to write a device driver. That said, I don't know of anyone who does that.
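A very rough, Windows-only sketch of that idea in Python with ctypes: once some gaze-tracking code produces a screen coordinate, user32.dll can move and click the cursor without a custom driver. The gaze estimation itself is the hard part and is only a placeholder here.
```python
# Very rough Windows-only sketch of the idea above: once gaze-tracking code
# produces a screen coordinate, user32.dll can drive the cursor without a
# custom device driver. gaze_to_screen() is a placeholder for the hard part.
import ctypes

user32 = ctypes.windll.user32
MOUSEEVENTF_LEFTDOWN, MOUSEEVENTF_LEFTUP = 0x0002, 0x0004

def move_cursor(x, y):
    user32.SetCursorPos(int(x), int(y))                     # Win32 SetCursorPos

def click():
    user32.mouse_event(MOUSEEVENTF_LEFTDOWN, 0, 0, 0, 0)    # Win32 mouse_event
    user32.mouse_event(MOUSEEVENTF_LEFTUP, 0, 0, 0, 0)

# def gaze_to_screen(depth_frame, color_frame): ...         # placeholder: estimate gaze point
```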
@@LowtechLLC Thx buddy. Which is better, the D415 or the Tobii Eye Tracker 5?
@@LowtechLLC So that I can use my personal computer just with my eyes.
Can you get x,y,z coordinates from your T265 with respect to its origin?
Yes, it is the translation vector. The viewer will show the value, and you can get it programmatically with the DLL. I used the C# wrapper of the DLL with LabVIEW to receive the translation. The camera resets to 0,0,0 when it powers up. The rotation is in quaternion form. That is good if you want to do interpolation of the camera's orientation. If you need to convert to a rotation matrix, check the second link below.
lowtechllcblog.blogspot.com/2019/12/getting-camera-pose-data-from-intel.html
lowtechllcblog.blogspot.com/2020/02/intel-realsense-t265-quaternion.html
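For anyone working in Python instead of the C#/LabVIEW route I use, a minimal pyrealsense2 sketch of reading the translation vector and quaternion looks roughly like this (same idea, different wrapper):
```python
# Minimal pyrealsense2 sketch of reading the T265 translation and quaternion.
# (I use the C# wrapper from LabVIEW; this is just the same idea in Python.)
import pyrealsense2 as rs

pipe = rs.pipeline()
cfg = rs.config()
cfg.enable_stream(rs.stream.pose)
pipe.start(cfg)
try:
    frames = pipe.wait_for_frames()
    pose = frames.get_pose_frame()
    if pose:
        data = pose.get_pose_data()
        # Translation relative to where the camera powered up (0,0,0).
        print("t:", data.translation.x, data.translation.y, data.translation.z)
        # Orientation as a quaternion (x, y, z, w).
        print("q:", data.rotation.x, data.rotation.y, data.rotation.z, data.rotation.w)
finally:
    pipe.stop()
```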
@@LowtechLLC thank you very much for your prompt response. May I please also know if it is possible to get a timestamp of all measurements taken by the camera?
@@jesuszamora4679 The T265 streams the pose data at ~25 poses per second. I can't remember if there is a time stamp. If you receive the stream in its own thread you can create the time stamp there. I would hook up the camera and test it, but my back was fractured in a head-on car wreck Monday, and I can't get to the camera. The SDK download has some demo programs that run from the command line. I think one of them just gets pose data, and it might have a time stamp.
[dash cam of the wreck ua-cam.com/video/EhcXnhqEjOA/v-deo.html]
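If memory of the SDK is right, each frame object carries its own timestamp (in milliseconds) you can read; otherwise, stamping the frame yourself when it arrives works fine. A short, untested Python sketch:
```python
# Untested sketch: read the SDK's per-frame timestamp from a T265 pose frame,
# or fall back to stamping it yourself when the frame arrives.
import time
import pyrealsense2 as rs

pipe = rs.pipeline()
cfg = rs.config()
cfg.enable_stream(rs.stream.pose)
pipe.start(cfg)
try:
    frames = pipe.wait_for_frames()
    pose = frames.get_pose_frame()
    if pose:
        print("sdk timestamp (ms):", pose.get_timestamp())  # frame timestamp from the SDK
        print("host timestamp (s):", time.time())           # your own stamp as a fallback
finally:
    pipe.stop()
```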
@@LowtechLLC Sorry to hear about your accident. Thank you for answering so promptly. I think you can get timestamps, but I am still researching.
Interesting work.
Can I save the point cloud model? Thanks!
The RealSense Viewer will save .ply files, which are polygons. I was thinking you could just pull the vertices of the polys, but the file is binary, not ASCII. Maybe MeshLab (www.meshlab.net) has an export for point clouds. The way I create point clouds in LabVIEW uses the depth image and the camera focal length to calculate the x,y,z location for each pixel in the depth image. ----
Just checked, yes MeshLab does export as .xyz point cloud. The first 3 columns are x y z.
@@LowtechLLC Thank you so much. My aim is to create an economical SLAM system to survey dynamically.
Thank you for this presentation! Very interesting. Is the presented 3D/2D viewing software available anywhere?
It is the Intel.RealSense.Viewer.exe available at
github.com/IntelRealSense/librealsense/releases
That link will take you to the latest RealSense SDK 2.0. There should be a link to download the latest viewer on that page.
The viewer I used can be downloaded separately at the bottom of this link
github.com/IntelRealSense/librealsense/releases/tag/v2.19.1
@@LowtechLLC Thank you very much!
Will the RealSense D series work with Faceshift Studio?
I can't find any information about Faceshift Studio. Do you have a link to the software page?
Faceshift will need to connect to the RealSense as a 16-bit depth camera. On my PC, the DirectShow driver won't let that happen; it thinks the pixel format is 32-bit RGBA. Errors occur when it tries to decode the pixels, likely because the frame contains half the data it expects. [The RealSense SDK handles the depth frames without using DirectShow.]
The D435 depth data starts about 10 inches, 254 mm, from the camera. As long as you have a way of decoding the depth video frames into 16-bit distance measurements, it could work.
@@LowtechLLC Apple bought Faceshift years ago, so it's no longer being updated by the team. It uses the Kinect SDK v1.8 for the Kinect and the OpenNI SDK for PrimeSense cameras to capture facial motion tracking for motion capture. So, do the D-series RealSense cameras support Faceshift?
The official RealSense documentation shows it supports OpenNI wrappers.
For more information on converting the quaternion based camera pose to R|t matrix camera pose have a look at my blog page:
lowtechllcblog.blogspot.com/2020/02/intel-realsense-t265-quaternion.html
and lowtechllcblog.blogspot.com/2020/11/rotating-point-using-quaternions.html
where did you buy these?
store.intelrealsense.com/
Has anyone got this to work with Ubuntu? So far, trying to follow the directions, I've got nothing.
Hopefully an Ubuntu user can help you out; here are a couple of the Windows problems I've run into...
1) When the PC is powered up the Realsense camera will not be recognized. It has to be unplugged and plugged back in.
2) Sometimes Device Manager reports the RealSense as a Myriad 2 USB hub.
3) The pose data pipeline must be started using the C++ example before the Realsense2.dll recognizes the T265 camera.
There are some GitHub Issue discussions about the RealSense firmware race condition at power up. Sometimes it loads the Myriad 2 boot loader and reports as a USB 1.1 device.
github.com/IntelRealSense/librealsense/issues/4331
I'm not clear on the solution, but one guy said he used a PCIe USB 3.0 gen 2 card (instead of the motherboard USB ports) to solve the problem.
Resetting USB device via software might also be the way to go (for Ubuntu)
github.com/ralight/usb-reset
I had better reliability using an older version of the RealSense SDK 2.0. For me, on windows, 2.25.0 was better than 2.28.1. One of the issue discussions suggested 2.18 for Ubuntu. Note that the latest is 2.29.0
Yes, I never had any issues with it in Ubuntu at all. I had it working in Unity, C++, and Python. No problem at all. You need to make sure you run the script that enables the udev rules so that it's detected when it is plugged in though.
I am having issues with the recorded file. The file size (.bag) is too large, say 300 MB for 10 seconds recording. I don't know how to deal with it. Am I doing something wrong? Can you please suggest to me how to work on it? Thanks.
The recording file is filled with the data that is streaming from the camera. If you are streaming Depth and RGB, both are in the bag. You can reduce the size of the .bag file by lowering the frame rate and by selecting a lower image size. Try changing the depth camera resolution to 640x360 at 6fps. If you don't need the color image or the IR image turn them off.
Since you are recording the stream I am assuming that the camera is moving. If the temporal filter is on you will see some blur in the depth image. That is because the frames are being averaged. Faster frame rate allows for a smaller time sample when averaging, but fills the bag faster.
The way I use the camera doesn't use the bag file. I save single frames at the highest resolution when the camera is not moving. Then I process the frames into a point cloud using the camera's intrinsic parameters. The tracking camera provides the extrinsic parameters.
You may not be using point clouds, but I'm including the info below for anyone else who reads this comment.
Convert the depth image to a point cloud by solving this set of equations for every pixel in the depth image:
Zworld = the value of the depth pixel in millimeters
Xworld = ( (xdepth - ppx) * Zworld) / fx
Yworld = ( ( ydepth - ppy) * Zworld) / fy
where
xdepth is the x value of the pixel in the depth image where top left = 0 and bottom right = image width -1
ydepth is the y value of the pixel in the depth image where top left = 0 and bottom right = image height -1
ppx is the principal point in x. This is often the center of the image but depends on sensor and lens alignment.
ppy is the principal point in y, again often the center y of the image.
fx is the focal length of the lens but its units are in pixels, not in millimeters.
fy is the focal length of the lens in pixels, and is pretty much the same as fx.
Note that the value of the pixel in the depth image is the distance in millimeters that the object is from the camera.
To find the camera intrinsic parameters run the
rs-sensor-control.exe
that is part of the RealSense SDK
C:\Program Files (x86)\Intel RealSense SDK 2.0\tools
For a 1280x720 depth image from the D435 you can use these values (which are close, but won't match your camera perfectly)
fx 645.912
fy 645.912
ppx 640
ppy 360
For a 640 x 360 depth image you can use these intrinsics (again, not exact)
fx 322.956
fy 322.956
ppx 320
ppy 180
Notice that the intrinsics for the 640x360 image are 1/2 of those for the 1280x720 image.
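If you'd rather grab the intrinsics programmatically instead of from rs-sensor-control.exe, here is a hedged pyrealsense2 sketch that reads them from the SDK and runs the same equations on one depth frame (assumes a connected D4xx camera and the Python wrapper installed):
```python
# Sketch of the same equations in Python: read the depth intrinsics from the
# SDK (instead of rs-sensor-control.exe) and convert one depth frame into an
# Nx3 point cloud in millimeters. Assumes pyrealsense2 and a connected D4xx.
import numpy as np
import pyrealsense2 as rs

pipe = rs.pipeline()
cfg = rs.config()
cfg.enable_stream(rs.stream.depth, 640, 360, rs.format.z16, 6)   # 640x360 @ 6 fps
profile = pipe.start(cfg)
try:
    intr = profile.get_stream(rs.stream.depth).as_video_stream_profile().get_intrinsics()
    depth_scale = profile.get_device().first_depth_sensor().get_depth_scale()  # meters/unit

    frames = pipe.wait_for_frames()
    depth = np.asanyarray(frames.get_depth_frame().get_data())   # HxW uint16

    xs, ys = np.meshgrid(np.arange(intr.width), np.arange(intr.height))
    z = depth * depth_scale * 1000.0                # Zworld in millimeters
    x = (xs - intr.ppx) * z / intr.fx               # Xworld = (xdepth - ppx) * Z / fx
    y = (ys - intr.ppy) * z / intr.fy               # Yworld = (ydepth - ppy) * Z / fy
    cloud = np.column_stack([x[z > 0], y[z > 0], z[z > 0]])
    print(cloud.shape)
finally:
    pipe.stop()
```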
I have the T265. I will sell it for 450 or lower. If you or anyone else is interested, let me know.
Thanks for the offer. I'll ask around and let you know if I get any leads.