Thank you so much for the video. This is the best explanation of the RCNN model family so far on YouTube. You should continue this series with topics like Mask RCNN etc.
Thank You. Yes, will definitely cover that. I might do a different series on Instance Segmentation covering MaskRCNN, YOLACT etc., but will indeed cover MaskRCNN in one of the videos.
@@Explaining-AI Thank you😊
@@Explaining-AI When will the Mask RCNN and YOLO videos drop?
@@amirhosseinizadi7125 Currently I am working on the ControlNet video (should finish that in 3-4 days), and after that will do one on YOLO.
MaskRCNN would take some time as that will be part of a different series which I haven't started working on yet.
@@Explaining-AI Thank you so much. Your videos made detection models much clearer for me. If you cover Mask R-CNN in a video, it would be a great help!
Thank you very much for the video, it is very interesting. Though, I have one question, at the 15:40 timestamp. You mention that there may be a situation where the ground truth box doesn't have a big IoU with any of the anchor boxes. How do we pick these anchor boxes (I just can't figure out which methodology we have to follow when picking the dimensions for anchor boxes)?
Thank You! For Faster RCNN, that is the reason why we add low-overlap anchor boxes as well (if they are indeed the best anchor box available). Here the authors did not tune anchor box selection for a dataset at all; they just pick a set which captures large enough variation in terms of scale and aspect ratio. In models like YOLOv2, they use the anchor box strategy but use k-means to pick the best anchor boxes. So once you run k-means on your ground truth box dimensions, you end up with cluster centres that are good representatives of the box dimensions in your dataset. These cluster centres then become a good choice for your anchor box widths and heights.
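In case it helps, here is a minimal sketch of that k-means idea, clustering ground-truth (width, height) pairs to get anchor dimensions. This is illustrative code of my own, not from the video, and it uses plain Euclidean distance, whereas YOLOv2 actually clusters with a 1 - IoU distance:

```python
import numpy as np

def kmeans_anchors(box_wh, k=9, iters=100, seed=0):
    """Cluster ground-truth box (width, height) pairs with plain k-means.

    box_wh: (N, 2) array of box widths and heights, with N >= k.
    Returns a (k, 2) array of cluster centres to use as anchor dimensions.
    """
    box_wh = np.asarray(box_wh, dtype=np.float64)
    rng = np.random.default_rng(seed)
    centres = box_wh[rng.choice(len(box_wh), size=k, replace=False)].copy()
    for _ in range(iters):
        # assign each box to its nearest centre
        dists = np.linalg.norm(box_wh[:, None, :] - centres[None, :, :], axis=-1)
        assign = dists.argmin(axis=1)
        # move each centre to the mean of its assigned boxes
        for c in range(k):
            if (assign == c).any():
                centres[c] = box_wh[assign == c].mean(axis=0)
    return centres
```

The returned centres are then read off directly as the anchor widths and heights for the dataset.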
Love your work! This channel is really helpful.
Thank you for this
What do you mean by top-k proposals 2000? Is it that from a single image we take 2000 proposals?
Hello @raihanpahlevi6870, yes that's correct. 2000 proposals are taken from a single image.
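For anyone wondering what that looks like in code, a tiny PyTorch sketch (tensor names and sizes are made up for illustration; in the actual pipeline this happens per image on the RPN outputs, alongside NMS):

```python
import torch

fg_scores = torch.rand(20000)   # RPN foreground probability, one per anchor
boxes = torch.rand(20000, 4)    # regressed proposal boxes (x1, y1, x2, y2)

# keep the 2000 highest-scoring proposals for this image
top_scores, top_idx = fg_scores.topk(2000)
top_boxes = boxes[top_idx]
```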
Such good content, keep it up!
Hi I really like your videos. Thanks for all the effort.
I have a question: why do they use foreground and background classification, and what's the significance of classifying a proposed region into fg and bg? At the end of the day we're interested in classifying that box as a region proposal or not, right?
Hello,
Thank you! Regarding your question, the purpose of the RPN is to carry out the responsibility of proposing regions. Using anchor boxes we have divided the image into different regions; these are not yet proposals that potentially contain an object, they are just regions of the image. The RPN needs to learn foreground and background classification on these regions (whether a region contains an object or not) to be able to pass only high-scoring valid proposals (those which have high foreground class probability, out of all anchor boxes) to the detection layers.
Do let me know if I have somehow misunderstood your question.
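To make the fg/bg part concrete, here is a rough sketch of how RPN training labels are typically assigned from anchor-to-ground-truth IoUs (the function is my own illustration; the 0.7/0.3 thresholds are the ones used in the Faster RCNN paper):

```python
import torch

def label_anchors(ious, hi=0.7, lo=0.3):
    """Assign RPN labels from a (num_anchors, num_gt) IoU matrix.

    Returns labels: 1 = foreground, 0 = background, -1 = ignored.
    """
    labels = torch.full((ious.shape[0],), -1, dtype=torch.long)
    max_iou, _ = ious.max(dim=1)
    labels[max_iou < lo] = 0    # clearly background
    labels[max_iou >= hi] = 1   # clearly foreground
    # also mark the best anchor for each gt box as foreground, even if
    # its IoU is below the threshold (the low-overlap case from the video)
    labels[ious.argmax(dim=0)] = 1
    return labels
```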
Thanks for such a good video. Your explanation was more detailed and understandable than others'. But I'm a bit confused about how the second stage goes. I would be happy if you clarify or correct my understanding:
As far as I understand, in the second stage the model consists of a CNN, RPN and Fast R-CNN. Then we train the CNN and Fast R-CNN but we freeze the RPN. The RPN just generates regions but its weights are frozen in the second stage, am I right?
So what you mentioned is one of the ways of training. There are three training approaches depending on whether the backbone is shared or not.
1. Training Faster RCNN where the RPN and FastRCNN models have separate backbones (lower performance).
Here, we first train RPN_Backbone + RPN in Stage I to have a model capable of generating regions.
Then in Stage II we train FastRCNN_Backbone (different from Stage I) + FastRCNN using proposals from the already trained RPN (the RPN is now frozen and not trained in Stage II).
2. Joint training with a shared backbone, where Backbone + RPN + FastRCNN are all trained together.
RPN classification and localization loss is used to update the parameters of Backbone + RPN.
FastRCNN classification and localization loss is used to update the parameters of Backbone + FastRCNN.
3. 4-Step Alternate training with a shared backbone.
This is talked about in the video @25:23.
Stage I - RPN + Backbone trained
Stage II - Fast RCNN + Backbone trained (RPN frozen)
Stage III - RPN fine-tuned (Backbone frozen)
Stage IV - FastRCNN fine-tuned (RPN and Backbone frozen)
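In PyTorch terms, the freezing in each stage just amounts to toggling requires_grad on the relevant parts. A small sketch with placeholder modules (not from any actual implementation):

```python
import torch.nn as nn

def set_trainable(module: nn.Module, trainable: bool):
    """Freeze or unfreeze all parameters of a module."""
    for p in module.parameters():
        p.requires_grad = trainable

# placeholder modules standing in for the three parts of the model
backbone, rpn, fast_rcnn = nn.Conv2d(3, 8, 3), nn.Linear(8, 4), nn.Linear(8, 4)

# Stage II: train Fast RCNN + Backbone, RPN frozen
set_trainable(backbone, True); set_trainable(rpn, False); set_trainable(fast_rcnn, True)

# Stage III: fine-tune RPN, Backbone frozen
set_trainable(backbone, False); set_trainable(rpn, True); set_trainable(fast_rcnn, False)

# Stage IV: fine-tune FastRCNN, RPN and Backbone frozen
set_trainable(backbone, False); set_trainable(rpn, False); set_trainable(fast_rcnn, True)
```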
Do let me know if you have more questions around this.
@@Explaining-AI Yeah now it's more clear! Thank you for your answer!
Such a good explanation and illustration! Please make one about YOLO.
Thank You! Yes, as part of this series, will be making videos on different versions of YOLO as well.
Hello, will you implement it from scratch, especially in TensorFlow?
Plus I have been struggling a bit with the YOLO v2 implementation. Would also need help with that😅
Hello, actually I have limited experience (and by limited I mean zero) with TensorFlow, so the implementation will be in PyTorch. That video should be out in 3-4 days, and my hope is that it will be of some help to you, and that using it you could get a better understanding of a TensorFlow implementation as well.
And yes, this series will indeed have videos on the different YOLO versions. It will take some time to get through all of them, but it will have them.
@@Explaining-AI
Thanks for the information.
Keep up the great work 👍
Thank you!
Thanks for the videos. Very useful. Just a small request: if you try to simplify your speech by reducing "... of ... of ... of ... of", it'll be really helpful to understand in one go. It's a bit hard to follow without subtitles, even though I'm from close to your country. Nothing to put you down, just constructive feedback to help you improve your videos; we all have patterns in speech.
Thank you for this feedback. I will definitely try to take care of this in future videos.