YOLOv1 from Scratch

Aladdin Persson

Published 23 Dec 2024

COMMENTS • 305

@AladdinPersson 4 роки тому ⁺⁵⁸
Here's the outline for the video:
0:00 - Introduction
0:24 - Understanding YOLO
08:25 - Architecture and Implementation
32:00 - Loss Function and Implementation
58:53 - Dataset and Implementation
1:17:50 - Training setup & evaluation
1:40:58 - Thoughts and ending
@venkatesanr9455 4 роки тому
Highly helpful and awesome
@omarabubakr6524 2 роки тому
why didn't you explain the utils file?
@PaAGadirajuSanjayVarma 4 роки тому ⁺⁹²
Plz give this man a noble proze
@deeps-n5y 3 роки тому
*Nobel
@iiVEVO 3 роки тому ⁺⁴
A noble nobel prize*
@MohamedAli-dk6cb 2 роки тому ⁺¹⁴
One of the greatest deep learning videos I have ever seen online. You are amazing Aladdin, please keep going with the same style. The connections you make between the theory and the implementation is beyond PhD level. Wish I can give you more than one like.
@asiskumarroy4470 4 роки тому ⁺¹⁴
I don't know how to express my gratitude to you. Thanks a lot, brother.
@_adi_1900 4 роки тому ⁺⁹
This channel's going to blow up now. Great stuff!
@AladdinPersson 4 роки тому
🙏 🙏
@vijayabhaskar-j 4 роки тому ⁺⁹⁶
This series was super helpful, can you please continue this by making one for Yolo v3, v4, SSD, and RetinaNet? That will make this content more unique because no other channel explains all these architectures, and your explanations are great!
@jertdw3646 2 роки тому
I'm confused on how i'm supposed to load the images up for training. Did you get that part?
@Glitch40417 Рік тому
@@jertdw3646 Don't know if you got it or not, but there's actually a train.csv file.
Instead of 8examples.csv or 100examples.csv we can use that file.
@Anonymous-nz8wd 4 роки тому ⁺⁴
GOD DAMN! I was searching for this for a really long time but you did it, bro. Fantastic.
@keshavaggarwal5835 4 роки тому ⁺³
Best Channel ever. Cleared all doubts about YOLO. I was able to implement this in tensorflow by following your guide with ease. Thanks a lot bro.
@AladdinPersson 4 роки тому ⁺¹
Awesome to hear it! Leave a link to Github and people could use that if they are also doing it for TF?:)
@Skybender153 3 роки тому ⁺¹
Link for the tensorflow repo would be appreciated Keshav
@rampanda2361 3 роки тому ⁺¹
The savior. I've been looking at other people's code for a few days and could not understand it, as it was code only with no explanation whatsoever. Thank you very much.
@thetensordude 4 роки тому ⁺⁵⁵
Most underrated channel!!!
@vanglequy7844 3 роки тому
Let's look at it upside down then!
@thanhquocbaonguyen8379 3 роки тому ⁺⁷
massively thank you for implementing this in pytorch and explain every bits in detail. it was really helpful for my university project. i have watched your tutorials at least 3 times. thank you!
@abireo2285 2 роки тому
PhDs are 100% learning how to code here :)
@haldiramsharma4601 4 роки тому ⁺⁸
Best channel ever!! All because of you, I learned to implement everything from scratch!! Thank you very much
@caidexiao9839 2 роки тому ⁺³
Thanks a lot for your kindness in providing the YOLOv1 video. By the end of the video, you got mAP close to 1.0 with only 8 training images. I guess you used the weights of a well-trained model. With more than 10,000 images and more than 20 hours on Kaggle's free GPU, my mAP was about 0.7, but my validation mAP was less than 0.2. Nobody mentioned the overfitting issue of YOLOv1 model training.
@satvik4225 6 місяців тому ⁺²
mine is coming 0.0 always
@TornadoFilms_ 5 днів тому
@@satvik4225 Yeah, why is that? Did you get it fixed?
@nguyenthehoang9148 Рік тому ⁺¹
By far, your series is some of the best content about computer vision on YouTube. It's very helpful when people explain how things work under the hood, like the very well-known courses by Andrew Ng. If you make a paid course for this kind of content, I'll definitely buy it.
@pphuangyi Рік тому
Thanks!
@_nttai 4 роки тому ⁺³
I was lost somewhere in the loss but still watch the whole thing. Great video. Thank you
@sangrammishra4396 2 роки тому ⁺¹
I love the way he explained and always maintains simplicity in explaining the code, thanks Aladdin
@eminemhc5763 4 роки тому ⁺⁵
Only 3.5K subscribers??? One of the most underrated channels on YouTube.
Keep posting quality videos like this bro, soon you will reach 100K+ subs, congrats in advance.
Thanks for the quality content :)
@AladdinPersson 4 роки тому ⁺¹
Appreciate the kind words 🙏 🙏
@Тима-щ2ю 7 місяців тому
What an amount of work! I don't often see people in the internet that are so dedicated to deep learning!
@ai4popugai Рік тому
The most clear explanation that I have ever found, thank you!!
@sachavanweeren9578 2 роки тому ⁺²
I can imagine this video took a lot of time to prepare, the result is great and super helpful. Thank you very much. Respect!
@abireo2285 2 роки тому
This is the best deep learning coding video I have ever seen.
@WiktorJurek 3 роки тому ⁺³
This is insanely valuable. Thank you very much, dude.
@krzysztofmajchrzak1881 4 роки тому ⁺¹
I want to thank you so much! It is literally a life saver for me! Your channel is underrated!
@ИльяЯгупов-н4я Рік тому
Thank you so much for this video, it's so helpful! Especially the concept in the first 9 minutes. I read a lot of sources, but this is the only place where it is clearly explained, and more precisely the part where we look for the cell containing the midpoint of the bounding box! Thank you so much for a great explanation!
@crazynandu 4 роки тому ⁺¹⁴
Great video as usual. Looking forward to seeing RCNNs (Mask, Faster, Fast, ...) from scratch from you!! Similar to what you did with Transformers, you could do one from scratch and another using torchvision's implementation. Kudos!!
@nikolayandcards 4 роки тому ⁺³
So glad I came across your channel (Props to Python Engineer). Very valuable content. Thanks for sharing and you have gained a new loyal subscriber/fan lol.
@AladdinPersson 4 роки тому
Welcome 😁
@张子诚-z3b 3 роки тому
I'm a beginner in object detection. Your videos help me a lot. I really like your style of code.
@majtales 4 роки тому ⁺¹
@27:05 why flatten again? Isn't it already flattened in the forward method of the class?
Also, do we really need to flatten? @51:22 The MSELoss documentation says it sums over all dimensions by default. Also, how did you work around that division by zero? @1:33:15
@정래혁-c8y 3 роки тому ⁺²
This video was so helpful. Thank you!
@shantambajpai8064 4 роки тому ⁺²
Dude, this is AMAZING !
@zachhua7704 2 роки тому ⁺⁴
Hi Aladdin, thanks for the great tutorial. I got a question at 1:13:09, in the paper, authors say the width and height of each bounding box are relative to the whole image, while you say they are relative to the cell. Is that a mistake?
@TheDroidMate Рік тому
Amazing video series, thanks! Extra kudos for the OS you're using 💜
@poojanpanchal3721 4 роки тому
Great Video!! never seen anyone implementing a complete YOLO algorithm from scratch.
@AladdinPersson 4 роки тому ⁺¹
...and I understand why :3
@Epistemophilos 2 роки тому ⁺⁶
Is there a mistake in the network diagram in the paper? Surely the 64 7x7 filters in the first layer result in 64 channels, not 192? What am I missing? If it is a mistake (seems highly unlikely), then the question is if there are really 192 filters, or 64.
@chocorramo99 4 місяці тому ⁺¹
64 kernels and there are 3 channels, 192 resulting channels. lol kinda late.
@Epistemophilos 4 місяці тому ⁺²
@@chocorramo99 Linear algebra is timeless! Thanks :D
@ignaciofalchini8264 3 роки тому
you are awesome bro, really nice job, best YOLOv1 video in existence, thanks a lot
@vishalm2338 4 роки тому
Thanks a ton Aladdin for making this video. I truly loved it. Also, Would like to see Retinanet implementation . It would be really fun to watch too. Kudos to you!!
@francomozo6096 4 роки тому
Thank you man!!!! Great video! Gave me a really good understanding on Yolo, will subscribe
@1chimaruGin0_0 4 роки тому ⁺²
Great work as always!
This video help me a lot to understand my confusion about yolo loss.
Could you do some video on Anchors and Focal loss?
@AladdinPersson 4 роки тому ⁺²
I'll revisit object detection at some point and try to implement more state of the art architectures and will look into it :)
@bradleyadjileye1202 Рік тому
Absolutely wonderful, thank you very much for such a fantastic job !
@ilikeBrothers 3 роки тому ⁺¹
Just top-notch! Huge thanks for such a detailed explanation, and with code too.
@sumitbali9194 3 роки тому
Your videos are a great help to data science beginners. Keep up the good work 👍
@krishnasumanthmannala984 4 роки тому
At 03:42 the width and height of an object are relative to the image I think wrt YOLO 1.
@jeroenritmeester73 3 роки тому ⁺³
How does the very first layer of the DarkNet with out_channels = 64 produce 192 feature maps? I understand that 3*64 = 192 but I don't really see how that applies.
Similarly, the second step has a convolution of 3x3x192, but there are 256 feature maps afterwards.
@DanielPietsch-o6r Рік тому
I am also confused about that part. In my understanding it should be 7x7x3 and then 192 total kernels, right?
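A quick shape check may help with the 64 versus 192 question. In a standard convolution each filter spans all input channels and produces one output map, so 64 filters of size 7x7x3 give 64 output channels; the 192 in the paper's diagram most plausibly refers to the filter count of the following 3x3 layer. The sketch below only does the shape arithmetic; padding=3 is an assumption, not something stated in this thread.

```python
def conv_out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

in_channels, num_filters = 3, 64

# First layer: 7x7 kernel, stride 2 (padding=3 is assumed here).
after_conv = conv_out_size(448, kernel=7, stride=2, padding=3)
# 2x2 max pool with stride 2.
after_pool = conv_out_size(after_conv, kernel=2, stride=2)

# Each filter has 7*7*3 weights but still yields a single output map.
weights_per_filter = 7 * 7 * in_channels

print((after_conv, after_conv, num_filters))
print((after_pool, after_pool, num_filters))
print(weights_per_filter)
```

The output depth is set by the number of filters alone; the 3 input channels only change how many weights each filter carries.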
@adarshsingh936 3 роки тому ⁺²
Can someone explain the use of unsqueeze(3) at 43:55?
@haideralishuvo4781 4 роки тому
Finally, the most awaited video. Will have a look ASAP.
@NamNguyen-fn5td 3 роки тому ⁺¹
Hi. I have question at 1:12:29. Why "x_cell, y_cell = self.S * x - j, self.S * y - i" minus j and i ? What does this mean?
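For the question about subtracting j and i: multiplying an image-relative coordinate by S puts it on the grid scale, the integer part picks the cell, and subtracting that integer leaves the offset inside the cell. Below is a minimal sketch under the convention the comments attribute to the video (width and height also multiplied by S, so they can exceed 1); `encode_box` is a hypothetical name, not the author's code.

```python
def encode_box(x, y, w, h, S=7):
    """Map an image-relative midpoint box (all values in [0, 1])
    to its grid cell and cell-relative coordinates."""
    # Row i and column j of the cell containing the midpoint
    # (clamped so that x == 1.0 or y == 1.0 stays on the grid).
    i = min(int(S * y), S - 1)
    j = min(int(S * x), S - 1)
    # Offset of the midpoint inside that cell, in cell units.
    x_cell, y_cell = S * x - j, S * y - i
    # Size in cell units; larger than 1 when the box spans several cells.
    w_cell, h_cell = w * S, h * S
    return i, j, (x_cell, y_cell, w_cell, h_cell)

i, j, box = encode_box(0.5, 0.5, 0.2, 0.4)
print(i, j, box)
```

Note that the paper itself normalizes width and height by the image size, so the scaling of w and h here follows the video's convention as described in this thread rather than the paper.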
@NamNguyen-fn5td 3 роки тому
At 50:27, if you do not flatten box_predictions and box_target in MSELoss, you get the same result as with flatten.
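The observation above can be checked without PyTorch: a sum (or mean) of squared errors runs over every element, so it does not depend on how the elements are arranged. The sketch uses plain nested lists as stand-ins for tensors. As far as I know, `nn.MSELoss` defaults to `reduction="mean"`, and a sum has to be requested explicitly; either way the argument is the same.

```python
def flatten(rows):
    """Flatten a list of lists into a single list."""
    return [value for row in rows for value in row]

def sse_flat(pred, target):
    """Sum of squared errors over two flat lists."""
    return sum((p - t) ** 2 for p, t in zip(pred, target))

def sse_nested(pred, target):
    """Sum of squared errors over two lists of lists, without flattening."""
    return sum(
        (p - t) ** 2
        for pred_row, target_row in zip(pred, target)
        for p, t in zip(pred_row, target_row)
    )

pred = [[1.0, 2.0], [3.0, 5.0]]
target = [[1.0, 0.0], [4.0, 2.0]]

print(sse_nested(pred, target))
print(sse_flat(flatten(pred), flatten(target)))
```

Flattening is therefore a readability choice in this situation, not something the loss value depends on.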
@anierrn6935 3 роки тому
35:35 explanation about square roots for w,h
@jitmanewtyagi565 3 роки тому ⁺¹
Broooooo, thanks for this man.
@proxyme3628 Рік тому
Regarding the loss for the confidence (the FOR OBJECT LOSS part in loss.py), should the label Ci be the IoU? In the code it is torch.flatten(exists_box * target[..., 20:21]), but because exists_box is target[..., 20:21], isn't that just the square of target[..., 20:21]? The original v1 paper says: "Formally we define confidence as Pr(Object) * IOU^truth_pred. If no object exists in that cell, the confidence scores should be zero. Otherwise we want the confidence score to equal the intersection over union (IOU) between the predicted box and the ground truth", which suggests that Ci_hat is to be calculated from the IoU.
@horvathbalazs1480 4 роки тому ⁺³
Hi, I really appreciate your work and patience to make this video, however I would like to ask the following: The loss function is created based on the original paper, but the loss for bounding box midpoint coordinates (x,y) are not included because we calculate just the sqrt of width, height of boxes. Am I right?
@horvathbalazs1480 4 роки тому ⁺³
Okay, sorry for the silly question. I just noticed that we should not take the square root of x, y, so that's why we skip them here:
box_predictions[..., 2:4] = torch.sign(box_predictions[..., 2:4]) * torch.sqrt(
    torch.abs(box_predictions[..., 2:4] + 1e-6)
)
box_targets[..., 2:4] = torch.sqrt(box_targets[..., 2:4])
@heriun7268 3 роки тому
4:00 I think you are wrong. w, h are relative to the whole image. Check the paper, Section 2 (Unified Detection), 4th paragraph.
@santoshwaddi6201 3 роки тому
Very nicely explained in detail.... Great work
@SamtapesGamer Рік тому
Amazing!! Thank you very much for all these lessons! It would help me a lot if you could make videos implementing Kalman Filter and DeepSort from scratch, for object tracking
@changliu3367 3 роки тому
Awesome video. Pretty helpful! Thanks a lot.
@vikramsandu6054 3 роки тому
Your name is Aladdin but you are a genie to us. Thanks for this video.
@zukofire6424 Рік тому
Thanks! I don't understand the code regarding the bounding boxes though... Could you do a deep dive into the bounding boxes calculations AND show how to test on a new image?
@pixarlyVII 3 роки тому ⁺¹
I have a question. At 39:41 you import intersection_over_union from utils. I thought that dataset.py, loss.py, ..., utils.py were empty Python files. Why did you import a function from utils.py if we don't code anything in this file in the tutorial?
I've followed the tutorial and I'm stuck at 59:50 because my code can't import the name "intersection_over_union" from "utils".
@pixarlyVII 3 роки тому
Never mind, I'm an idiot. I copied the utils.py file from what you uploaded to GitHub and now it works.
It would be interesting to code that part (utils.py) too in the tutorial.
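For readers stuck at the same import, here is a minimal pure-Python sketch of intersection over union for two boxes in midpoint format (x, y, w, h). It is not the utils.py from the author's repository, which works on batched tensors; `iou_midpoint` and the `eps` value are illustrative choices.

```python
def to_corners(box):
    """Convert a midpoint box (x, y, w, h) to corners (x1, y1, x2, y2)."""
    x, y, w, h = box
    return x - w / 2, y - h / 2, x + w / 2, y + h / 2

def iou_midpoint(box_a, box_b, eps=1e-6):
    """Intersection over union of two midpoint-format boxes."""
    ax1, ay1, ax2, ay2 = to_corners(box_a)
    bx1, by1, bx2, by2 = to_corners(box_b)
    # Overlap along each axis; clamped at 0 when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    area_a = abs((ax2 - ax1) * (ay2 - ay1))
    area_b = abs((bx2 - bx1) * (by2 - by1))
    # eps avoids division by zero for degenerate (zero-area) boxes.
    return intersection / (area_a + area_b - intersection + eps)

print(iou_midpoint((0.5, 0.5, 1.0, 1.0), (1.0, 0.5, 1.0, 1.0)))
```

The two boxes in the usage line overlap on half of each box's area, so the result is close to one third.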
@Sky-nt1hy 3 роки тому
There's an error at 56:11, line 10 : it Should be target[..., 25:26] instead of target[..., 20:21] for the no-object-detected loss
@정현호-u9j 3 роки тому
I think target[...,20:21] is right. The target's index for last dimension ends at 24, I think
@Sky-nt1hy 3 роки тому
@@정현호-u9j Aha, I see! Thank you, haha.
@정현호-u9j 3 роки тому
As I understand it, the target data has shape 7x7x25. The first 20 of the 25 values are the one-hot encoded class (like 01000...), and the remaining 5 are [confidence score, x, y, height, width]. The reason it is 5 rather than 5*2 is that the paper assumes each grid cell contains only one true object (more precisely, the midpoint of the bounding box around the object). Please correct me if I'm wrong.
@buat_simple_saja 2 роки тому
Thank you man, your video help me a lot
@ZXCOLA-z7s 2 роки тому
That’s totally awesome!
@bhavyashah8674 2 роки тому ⁺¹
Hi @Aladdin Persson. Amazing video. I just have a doubt. While calculating IoU for true_label and pred_labels, should we not add back the cell offsets that we clipped when creating true_labels? That is, in the case of the example you gave of [0.95, 0.55, 0.5, 1.5], shouldn't we convert 0.95 to 0.95 (as the cell we chose is at index 0 along the width) and 0.55 to 1.55 (as the cell we chose is at index 1 along the height)? This is because we are doing geometric operations like converting x_centre and y_centre to xmin, ymin, xmax and ymax, and without the conversion I mentioned, instead of getting the xmin, ymin, xmax and ymax of the bounding box we get some other coordinates.
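The conversion described in this question can be sketched as follows. `decode_box` is a hypothetical helper that adds the cell indices back and divides by S, assuming the convention discussed in this thread where x, y, w and h are all stored in cell units. One point worth noting: when the predicted and target boxes belong to the same cell, both are shifted by the same offset and scaled by the same factor, and IoU is unchanged by a shared shift and uniform scale, so the offsets matter for plotting and for comparing boxes across cells rather than for the per-cell IoU.

```python
def decode_box(i, j, x_cell, y_cell, w_cell, h_cell, S=7):
    """Convert a cell-relative box in cell (row i, column j)
    back to image-relative midpoint format."""
    x = (j + x_cell) / S
    y = (i + y_cell) / S
    return x, y, w_cell / S, h_cell / S

def to_corners(x, y, w, h):
    """Midpoint box to (xmin, ymin, xmax, ymax)."""
    return x - w / 2, y - h / 2, x + w / 2, y + h / 2

# The example from the comment: column index 0, row index 1.
image_box = decode_box(1, 0, 0.95, 0.55, 0.5, 1.5)
print(image_box)
print(to_corners(*image_box))
```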
Also could you please create the same using Tensorflow?
@mahdiamrollahi8456 2 роки тому ⁺¹
Hello. Why the target and prediction are in different shapes?
@vijayabhaskar-j 4 роки тому
At 42:13, shouldn't that be [..., 25:29] and not [..., 26:30], as the first iou_b1 covers 21, 22, 23, 24 and the second should cover 25, 26, 27, 28? Or is the 25th the confidence score, with 26, 27, 28, 29 being the second bounding box?
@AladdinPersson 4 роки тому ⁺²
Yes you're correct, 25th is for the confidence score for the second bbox and 26:30 (remember it's non-including the 30th index) so I think what is shown is correct
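The index layout confirmed in this reply can be checked with plain Python slicing, which follows the same half-open convention as tensor slicing. The list below is a stand-in for one cell's 30 predictions, with each value equal to its own index.

```python
C, B = 20, 2  # number of classes and boxes per cell

# Stand-in for one cell's prediction vector: value == index.
cell = list(range(C + 5 * B))

class_scores = cell[0:20]  # indices 0..19
conf_1 = cell[20:21]       # index 20: confidence of the first box
box_1 = cell[21:25]        # indices 21..24: x, y, w, h of the first box
conf_2 = cell[25:26]       # index 25: confidence of the second box
box_2 = cell[26:30]        # indices 26..29: x, y, w, h of the second box

print(len(cell), conf_1, box_1, conf_2, box_2)
```

Because the end index is excluded, 26:30 selects exactly four values and never touches a nonexistent index 30.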
@saeeddamadi3823 3 роки тому
At 1:05:41 you mention your video of how to build a custom dataset. Please link it to the video to enhance your informative channel.
@Old_SDC Рік тому
Will be back, just need a quick break 35:30
Downloading 59:42
@josephherrera639 3 роки тому ⁺³
Do you mind showing how to plot the images with their bounding boxes (and how that can be applied to testing on new data)? Also, do all images have a maximum of 2 objects to localize?
@R0Ck50LiD-b5z 2 роки тому ⁺¹
Hi, do you have any details on how you prepared the dataset?
@vamsibalijepally3431 4 роки тому ⁺¹
def test(S=7, B=2, C=20):
    model = Yolov1(in_channels=3, split_size=S, num_boxes=B, num_classes=C)
    x = torch.randn((2, 3, 448, 448))
    print(model(x).shape)
This will help if you got the same error as me:
__init__() missing 1 required positional argument: 'kernel_size'
@pranavkushare6788 4 роки тому
Yeah i'm getting the same error.
Have you found any solution and reason ?
@chinmay996 3 роки тому
@@pranavkushare6788 if you still have not solved the problem, check your parameters in CNNBlock inside _create_conv_layers method.
@mizhou1409 3 роки тому
Great job, very helpful for a new beginner.
@soorkie 4 роки тому ⁺⁷
Hi, can you do a similar one with Graph Convolutional Networks? Your videos are very usefull ❤️
@canyi9103 Рік тому
4:24: in the paper, the width and height are predicted relative to the whole image, so they cannot be larger than 1, but in your video you said they can be larger than 1. That doesn't seem right.
@patloeber 4 роки тому
Amazing effort!
@AladdinPersson 4 роки тому
Thank you:)
@larafischer420 Рік тому ⁺¹
This video series is very good! Can you share the references you use to put together these notes? I have a hard time finding material to study.
@sb-tq3xw 4 роки тому
Amazing Work!!
@markgazol5404 4 роки тому ⁺²
Very clear and helpful! Thanks for the videos. I've got one question, though, Can you please explain what is the label for the images with no objects? During the training should it be like [0, 0, 0, 0, 0] or smth?
@GursewakSinghDhiman 3 роки тому
You are doing an amazing job. Thanks alot
@amartyabhattacharya Рік тому
One question that I have is, How can I get to know the coordinates of the grid cell of which the centers are a part of? Is it like (1,1) of the output prediction gives the prediction for grid cell having two endpoints as (0,0),(64,64) ? (448/7 = 64)
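The mapping asked about here is plain arithmetic once the image size and S are fixed. A minimal sketch, assuming a 448x448 input, S=7 and 0-based (row, column) indexing, so the first cell is (0, 0) rather than (1, 1); both function names are hypothetical.

```python
def cell_of_pixel(px, py, image_size=448, S=7):
    """Return (row, col) of the grid cell containing pixel (px, py)."""
    cell = image_size / S  # 64 pixels per cell for 448 / 7
    row = min(int(py // cell), S - 1)
    col = min(int(px // cell), S - 1)
    return row, col

def cell_bounds(row, col, image_size=448, S=7):
    """Return (xmin, ymin, xmax, ymax) in pixels for a grid cell."""
    cell = image_size / S
    return col * cell, row * cell, (col + 1) * cell, (row + 1) * cell

print(cell_bounds(0, 0))
print(cell_of_pixel(100, 300))
```

With these assumptions the cell spanning (0, 0) to (64, 64) is index (0, 0), and a box whose midpoint is at pixel (100, 300) belongs to row 4, column 1.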
@yantinghuang7491 4 роки тому ⁺¹
Great video! Will you make "from scratch" series video for Siamese network?
@AladdinPersson 4 роки тому
I'll look into it! Any specific paper?
@yantinghuang7491 4 роки тому
@@AladdinPersson Thanks Aladdin! This one should be a good reference: Hermans, Alexander, Lucas Beyer, and Bastian Leibe. "In defense of the triplet loss for person re-identification." arXiv preprint arXiv:1703.07737 (2017).
@leochang3915 4 роки тому
Thank you , you really help me a lot!
@danlan4132 2 роки тому
Thank you very much!!!! Excellent video!!!! By the way, do you have any tutorials for oriented bounding box detection?
@qichongxia2110 10 місяців тому
very helpful! thank you !
@jaylenzhang4198 Рік тому
My understanding of this λ_noobj-associated loss term is that it is used to penalize false positives. This term includes all grid cells that do not contain any object but have confidence scores larger than 0. Since there will be a lot of these false positives, the authors add the coefficient λ_noobj to lower their weight in the overall loss function.
@nova2577 4 роки тому
Appreciate your effort!!
@shenbin2930 2 роки тому ⁺¹
When I use the code, the detection accuracy of the training set is very good, but the detection accuracy of the test set is almost equal to 0, which is obviously overfitting.
In fact, the original code is to train an overfitting model, but I have modified some of the code. Why is it still overfitting?
I have made the following modifications:
nn.Dropout(0) -> nn.Dropout(0.5)
WEIGHT_DECAY = 0->WEIGHT_DECAY = 2e-4
This question has bothered me for a long time. I would appreciate it if you could answer it.
@FanFanlanguageworld1707 2 роки тому
How many images you trained with?
@m4gh3 Рік тому
I got the same results, I too am trying to understand what is going on
Also I can overfit with a way smaller network
@vaibhavsingh1049 3 роки тому ⁺¹
I think there's a mistake in how you rescale the width and height of the bounding box to be greater than 1 because in the paper it stated as follows:
"We normalize the bounding box
width and height by the image width and height so that they
fall between 0 and 1. We parametrize the bounding box x
and y coordinates to be offsets of a particular grid cell location so they are also bounded between 0 and 1".
See that all the values for the box lie between 0 and 1, x and y relative to the cell, and w and h relative to the entire image size.
If I'm wrong please correct me.
@정현호-u9j 3 роки тому
I agree with this dude
@NamNguyen-fn5td 2 роки тому
@@정현호-u9j I think in this video he does this because Yolo model make image to 7x7x30 . So does he have to do it so that the label fits the image size ?
@wuke4231 Рік тому
thank you for your video!😘
@Wh1teD 3 роки тому ⁺¹
Very informative video and I think I understood the algo but there is one doubt I have: the code you wrote would only work with this specific dataset? If I would want to use a different dataset, would I need to rewrite the bigger part of the code (i. e. the loss function, the training code)?
@PaAGadirajuSanjayVarma 4 роки тому
I am glad I found your channel
@apunbhagwan4473 3 роки тому ⁺¹
He is simply Great
@kayleescanlin4699 5 місяців тому
Can someone explain what the "conda activate dl" means in the terminal at 57:27? Is that a specific environment to download or is it something we create ourselves?
@SAnish-uj4jc 4 місяці тому
yo im not able to understand the code ?? am i missing something please help
@dominicyang-y8b 3 роки тому
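Hello, there seems to be an issue with the dataset handling: using resize directly may distort the image. I think padding the image with gray bars would be more reasonable.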
@NityaStriker 4 роки тому ⁺¹
Hi.
I’m unable to load the PascalVOC_YOLO dataset within a Colab notebook due to the dataset being private. Is there a way to use the dataset in a Colab notebook without downloading it on my laptop ?
@AladdinPersson 4 роки тому
I'm not sure, I think you need to download it. Isn't there a way to upload the dataset to Colab so you can run it?
@NityaStriker 4 роки тому
@@AladdinPersson There is, but my internet connection is not the fastest while having a small data cap which is why I usually use !wget within colab itself. In this case, both the !wget command and Kaggle’s command failed within Colab for the Kaggle file after which I wrote the above comment.
Later, I copied the code from the get_data file, pasted it onto a cell, added a few lines of code for creating 8examples.csv and 100examples.csv, and ran it for the code to work.
@hetalivekariya7415 2 роки тому
Why I did not come across your channel before!!. But anyways I am glad I found your channel. Thank you.
@nerdyguy7270 2 роки тому ⁺²
Hi, this is awesome and really helpful. I was going through the yolov1 paper and found that the height and the width are relative to the whole image and not to the cell. Is that correct?
@siddhantjain2591 4 роки тому ⁺²
Awesome as always!
Could you do some video on EfficientNets sometime, that would be great !
@КонстантинМогилевский-о2л 3 роки тому
Really great tutorial, but why do we do flatten in forward pass of the Yolov1 and also nn.Flatten() in the _create_fcs? Isn't it redundant?
@berkgur868 2 роки тому
Why are we multiplying the width, height loss with sign of the gradient? I did not get it.
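Going by the loss snippet quoted earlier in this thread, the sign being multiplied in is the sign of the predicted width and height, not of a gradient. The raw network outputs are unconstrained and can be negative, where a square root is undefined, so the code takes the square root of the absolute value and then restores the sign. A pure-Python sketch of the same expression (`signed_sqrt` is a hypothetical name):

```python
import math

def signed_sqrt(value, eps=1e-6):
    """Mirror of sign(v) * sqrt(abs(v + eps)) for a single number."""
    sign = (value > 0) - (value < 0)  # 1, 0 or -1
    return sign * math.sqrt(abs(value + eps))

for raw in (0.25, -0.25, 0.0):
    print(raw, signed_sqrt(raw))
```

The small eps presumably keeps the square root away from exactly zero, where its derivative is unbounded. The targets need no such treatment because ground-truth widths and heights are never negative.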
@venkateshvaddadi271 3 роки тому
great job brother
you are really awesome
@omarhesham7390 7 місяців тому
Fantastic Bro
