I hope this video helps you somehow. I’m currently working on point cloud object detection and classification in my research. Share your thoughts if you are also working on point cloud processing or interested in this topic.
Great video!!
But how do you classify it end to end? Is there any implementation of that with code?
Thank you! Appreciate it.
Both the PyTorch and TensorFlow GitHub links I shared in the video description are actually end-to-end learning implementations. In the video, I only discussed the main Python model script that contains the architecture, but you'll find the training script and instructions on the GitHub page to classify point clouds end to end.
Let me know if you have any questions.
@@LightsCameraVision Thanks for the comment.
I have a question:
My goal is custom object detection.
Taking a sweater as a sample/example: I have a .ply file of it, so should I annotate several .ply files, or should I convert them into an HDF5 file for custom object detection?
Thanks
@@myproject7762 No, you don't need to convert the PLY files. You can work with them directly; just use an appropriate Python library like 'plyfile' to read them. The PyTorch implementation of PointNet that I linked has code for reading PLY files (here: github.com/fxia22/pointnet.pytorch/blob/f0c2430b0b1529e3f76fb5d6cd6ca14be763d975/pointnet/dataset.py#L171). Follow their process to read the files; a minimal sketch is below.
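If it helps, here is a minimal sketch (not code from the repo above) of reading the points out of a PLY file with the 'plyfile' library; 'sweater.ply' is a hypothetical placeholder path:

```python
# Minimal sketch: read a PLY file into an (N, 3) NumPy array with 'plyfile'.
# 'sweater.ply' is a hypothetical placeholder path.
import numpy as np
from plyfile import PlyData

ply = PlyData.read("sweater.ply")
vertex = ply["vertex"]

# Stack the x/y/z properties into an (N, 3) float array of points.
points = np.stack([vertex["x"], vertex["y"], vertex["z"]], axis=-1).astype(np.float32)
print(points.shape)  # (N, 3)
```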
I guess you already have annotations for your custom data; if not, you definitely need to do that first.
Once you have created the dataloader this way, you can use PointNet, VoxelNet, or any architecture of your choice for object detection. Maybe this repo helps: github.com/jediofgever/PointNet_Custom_Object_Detection. It shows an example of what you are looking for. It does not look very clean, but it should help a little.
Point cloud processing code is not as readily available as it is for 2D vision, so you need to write your own code in some cases. Hope this helps.
@@myproject7762 Reply to your comment: "Hi, so should I use 2D images to annotate in the KITTI file format and then predict the same on 3D point cloud data?" I accidentally deleted the comment, I guess.
If you already have annotations for 2D images then you can propagate that to 3D labels.
If not, then you can directly annotate the 3D point cloud. There are not many tools available for this, but this tool, supervise.ly/lidar-3d-cloud/, looks like a good fit for your case. It supports the KITTI format. Since the KITTI data/format is used as a benchmark in many algorithms, generating annotations for your custom data in the KITTI format is a good idea. The tool is well documented. You can also read this Medium post about it: medium.com/deep-systems/releasing-first-online-3d-point-cloud-labeling-tool-in-supervisely-4faca42b5d6e
Nice video. Thank you for the clear and easy-to-understand explanation of PointNet. It was super helpful to see the side-by-side comparison of the code and network block. It helped 👍
I have seen many videos, but yours are the best. Straightforward.
Thank you for the kind words. I really appreciate it. 🙂 Thanks for watching. ✌️
Hi, I recently reviewed PointNet too as part of my research. There are three main takeaways: permutation invariance, canonical space transformation, and local-global knowledge. For permutation invariance, the authors use a symmetric function, the max() function. For canonical space transformation, a T-Net is used. And for local-global knowledge, the per-point features learned by the second MLP (N, 64) are concatenated with the 1024-dim global feature and passed to another MLP network. This combines the local knowledge with the original global knowledge.
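For anyone curious, a minimal PyTorch sketch of that symmetric max-pooling idea; the shapes are illustrative, not taken from the paper's code:

```python
# Sketch of PointNet's symmetric function: max-pooling over the point
# dimension makes the global feature invariant to the order of the points.
import torch

batch, n_points, feat_dim = 2, 1024, 1024  # illustrative shapes
per_point_features = torch.randn(batch, n_points, feat_dim)

global_feature, _ = per_point_features.max(dim=1)  # (batch, feat_dim)

# Shuffling the points leaves the global feature unchanged.
perm = torch.randperm(n_points)
shuffled_feature, _ = per_point_features[:, perm, :].max(dim=1)
assert torch.allclose(global_feature, shuffled_feature)
```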
Really liked your concise and clear explanation. Perhaps a more detailed (~20 min) video just about the theory, plus another implementation video, would be awesome. Regardless, if you plan to cover more networks, I would like more videos on PointNet++, GradSLAM, and DeepGMR. Awesome! Thanks!
Thanks for your kind words. It's great to see that you are also passionate about point cloud processing. Thanks for the suggestions, I'll keep them in mind. ✌️
I found this video helpful ;) so I liked it 👍
But I was wondering how the information flow works, like an interpretation of why it is able to see shapes. I did some research and made this summary of my findings:
ua-cam.com/video/di5g4Nb4hKs/v-deo.html
Which datasets can you use for this?
The paper used ModelNet40 and ShapeNet data. But you definitely can use other datasets like ScanNet, SceneNet, and many more like these. You can find all these datasets in the link below. Hope it helps.
paperswithcode.com/datasets?mod=3d&page=1
Great video! But can we apply this to video as 3D data?
You are awesome
Appreciate the kind words. 🙂✌️
Thanks for this valuable video explanation! I have a suggestion for some video ideas. What do you think of explaining how to use a semantic segmentation network (e.g., RandLA-Net) with Open3D-ML: a general explanation followed by a quick overview of the paper behind it? They (Open3D-ML) have several nets for semantic segmentation, and the application is interesting.
This is just an idea.
Thanks for the valuable content 😊
Thank you, Alex. I really appreciate it. 😊 I have added your suggestion to my video idea list.
How can I make my own point cloud dataset? What tools can I use to classify parts of the point clouds?
If you mean how to generate point cloud data: you need a LiDAR device to capture point clouds, or you can use a simulator to augment real point clouds with synthetic obstacles and environments. There are simulators like CARLA you can check out.
For data annotation, you can try this tool (supervise.ly/lidar-3d-cloud/). It supports the KITTI format. Since KITTI data/format is used as a benchmark in many algorithms, generating annotations for custom data in KITTI format is a good idea. This tool is well documented. You can also read this Medium post (medium.com/deep-systems/releasing-first-online-3d-point-cloud-labeling-tool-in-supervisely-4faca42b5d6e) about this tool.
If you are looking for segmentation models, you can look into PointNet, PointSIFT, SqueezeSegV3, and many more. Hope it helps. ✌️
Thank you so much !
Does this work well for large point cloud files, say in the GBs? Can it detect all the occurrences of each class?
It should work, assuming you have enough computational resources. The original PointNet architecture takes 1024 points as input, which is definitely not much. You can also modify it for your project so that it can handle more points. Depending on what you are trying to classify, you may not need many points or a big point cloud for each object. You can always downsample the points, as in the sketch below.
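For illustration, a minimal NumPy sketch of random downsampling; the large array here is a random stand-in for a real scan:

```python
# Randomly downsample a large point cloud to the 1024 points the original
# PointNet takes as input. 'points' is a random stand-in for a real scan.
import numpy as np

rng = np.random.default_rng(0)
points = rng.random((200_000, 3))  # stand-in for a large (N, 3) point cloud

n_samples = 1024
idx = rng.choice(points.shape[0], size=n_samples, replace=False)
sampled = points[idx]  # (1024, 3)
```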
Great work, man!
I need some info: I have annotated my PLY file, and the annotation file is in KITTI format. So how should I train on it next?
Could you please help me out with this?
Thanks man. Appreciate it.
I think your final goal is to classify multiple objects from a scene. Let me know if it’s not the case.
So you need to start with object localization. You have quite a few options for this task. VoxelNet (CVPR'18) is a slightly older method, but it's intuitive and it uses the KITTI dataset, so you can start with it for object detection. If you are looking for newer methods, check out Voxel-RCNN, SE-SSD, or maybe VoTr-TSD. They all use the KITTI dataset.
Now, the issue with the classification task is that all the methods (like PointNet) I know of have done single-shape classification (using ShapeNet or ModelNet data, say), but you need a multi-label classifier.
So maybe you can start with a detection/proposal network to propose regions, then use those final detections/regions for classification with another network like PointNet, as in the sketch below. You can get some inspiration from 2D vision for your case.
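As a rough sketch of that two-stage idea (the detector output 'boxes' and the commented-out classifier call are hypothetical stand-ins, not code from any specific repo):

```python
# Two-stage sketch: crop the points inside each detected 3D box, then feed
# each cropped region to a classifier such as PointNet.
import numpy as np

def crop_box(points, center, size):
    """Return the points inside an axis-aligned 3D box."""
    lo, hi = center - size / 2.0, center + size / 2.0
    mask = np.all((points >= lo) & (points <= hi), axis=1)
    return points[mask]

points = np.random.rand(50_000, 3) * 20.0  # stand-in for a scene point cloud
boxes = [(np.array([5.0, 5.0, 1.0]), np.array([2.0, 2.0, 2.0]))]  # hypothetical (center, size) detections

for center, size in boxes:
    region = crop_box(points, center, size)
    # label = pointnet_classifier(region)  # hypothetical second-stage classifier
```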
@@LightsCameraVision
Hi, I have annotated my data in PLY files, and it is in KITTI format.
In some cases, I have seen images along with the PLY files.
So my question is: is a .PLY file enough for custom object detection, or should I also have images corresponding to each .PLY file?
@@nikhilbharadwaj6692 Your point cloud in PLY format is enough for object detection. Most methods take only the point cloud as input, for example VoxelNet, PointRCNN, Point-GNN, SA-SSD+EBM, etc. So you're good.
There are some methods that use 2D images along with the point cloud, but you don't have to use them.
@@LightsCameraVision Thank you so much, man!
So is there any ready-to-use code you have written for training custom object detection?
@@nikhilbharadwaj6692 Happy to help, bro.
No, I only work with existing datasets that are already annotated and used in other research, nothing new.
But I think, for your case, you just have to change the data loader script in any popular method's code on GitHub so that it can read your PLY files, then follow its instructions for training. Some repositories actually use PLY files too. All the methods I mentioned before have code available on GitHub.
Also, I have never tried the following repo, but maybe it helps you somehow.
github.com/jediofgever/PointNet_Custom_Object_Detection
Thank you for the amazing video! I am working on a project that is exploring the estimation of age using 3D facial depth maps. Is it possible for this PointNet implementation to be used for regression instead of classification?
Appreciate it. You definitely can do it; you just need to tweak the end of the network a little. I remember this paper using PointNet for regression. Check it out: arxiv.org/pdf/2010.04865.pdf ✌️
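As an illustration of that tweak, a minimal PyTorch sketch of replacing the classification head with a regression head; the 1024-dim global feature is assumed to come from a PointNet backbone defined elsewhere:

```python
# Sketch: swap PointNet's class logits for a single continuous output
# (e.g., age). The 1024-dim global feature is assumed to come from a
# PointNet backbone defined elsewhere.
import torch
import torch.nn as nn

regression_head = nn.Sequential(
    nn.Linear(1024, 512), nn.ReLU(),
    nn.Linear(512, 256), nn.ReLU(),
    nn.Linear(256, 1),  # one continuous output instead of class logits
)

global_feature = torch.randn(4, 1024)  # batch of global features
age_pred = regression_head(global_feature)  # (4, 1)
# Train with a regression loss such as nn.MSELoss() instead of cross-entropy.
```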
@@LightsCameraVision Thanks so much for taking the time to point me in the right direction! Will check it out
Happy to help.
I work with this, and I don't know whether it has limitations in an industrial environment, with many different things to classify.
It's no longer the most accurate model; there are newer models with better accuracy. With enough data, it should be able to handle a significant number of classes. However, it doesn't work well on complex scenes. Still, this is one of the first models that started it all. Some recently published models still use PointNet as a backbone for feature extraction or for other reasons. Since it's so easy to use, I'm sure many people in industry start with it and then build on it according to their needs.