Are there cases where identity_downsample is actually None? Because at the end of every block (in each layer) we end up changing the number of channels. Could someone explain this?
I'm not entirely sure what you mean by test in this scenario, and training the model on the COCO dataset (which can be used for object detection, captioning, etc.) will depend on your use case. In the video we built the ResNet model for classification, and I didn't want to spend unnecessary time setting up a training loop; I have other videos if you want to learn more about that.
@@AladdinPersson What I mean is: you have written this model from scratch, but how do you train it on the COCO dataset? I am a beginner, so I am asking for the code for it...
It's actually not hard to follow. I think using PyTorch makes it even easier, since you get a better idea of what is going on. Btw, how did you manage to run PyTorch in Spyder? Whenever I simply do 'import torch', Spyder crashes for me, which is why I am using PyCharm with PyTorch.
PyTorch worked normally for me in PyCharm but not in other editors. Later I found out there were issues with my PyTorch installation; I still don't understand how PyCharm worked if the installation was broken. In my case, I had installed the wrong version of CUDA.
@@shambhaviaggarwal9977 Hey sorry for asking but do you get this error "RuntimeError: The size of tensor a (256) must match the size of tensor b (64) at non-singleton dimension 1" ?
@@doggydoggy578 You are trying to multiply 2 Matrices which are not compatible, i.e., the no of cols of A should be equal to number of rows of B. Check the dimensions.
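To make the reply concrete: the error comes from a channel mismatch when the skip connection is added. A minimal sketch reproducing it and the usual fix (a 1x1 convolution on the identity); the tensor sizes here are just illustrative:

```python
import torch
import torch.nn as nn

x = torch.randn(2, 256, 56, 56)        # block output: 64 * expansion(4) = 256 channels
identity = torch.randn(2, 64, 56, 56)  # saved input: still 64 channels

try:
    x + identity  # fails: dimension 1 is 256 vs 64
except RuntimeError as e:
    print(e)

# Fix: project the identity to the matching channel count with a 1x1 conv
downsample = nn.Sequential(
    nn.Conv2d(64, 256, kernel_size=1, stride=1, bias=False),
    nn.BatchNorm2d(256),
)
out = x + downsample(identity)
print(out.shape)  # torch.Size([2, 256, 56, 56])
```

If you hit this error with the video's code, it usually means the identity_downsample was not created (or not applied) for the first block of a layer.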
It's an interesting question; I'll try to give my answer in two parts. First, I believe the bottleneck in most cases isn't actually PyTorch but rather knowledge about machine learning / deep learning itself. To learn the concepts, I believe excellent resources are Machine Learning (a great introductory ML course on Coursera) by Andrew Ng and the Deep Learning Specialization, also by Andrew Ng. Following CS231n and CS224n through the online lectures and doing the assignments is, I think, a very efficient way to learn. After that, like I am currently doing, reading research papers, implementing those papers and doing projects are ways to develop further. Now, for learning PyTorch specifically, I think reading the PyTorch tutorials at pytorch.org/tutorials/ is great; reading other people's code and watching others code things (like I am doing in these videos) can be beneficial, and reading old posts on the PyTorch forums helps too. Most importantly, I think it's about getting started coding in PyTorch. Remember, I'm still learning a lot and don't consider myself to have "learned" PyTorch, but those are my current thoughts on your question. Hope that answers it at least somewhat :)
@@AladdinPersson Thanks for the detailed response. Do you think it's better for a beginner to stick with PyTorch than to implement things in TensorFlow/Keras as well? Which of them gives a good learning curve and strengthens the underlying concepts? And how important is implementing code from scratch vs transfer learning or using API calls?
I don't think it matters too much. Pick either and just stick with it; I wouldn't implement everything in both. It seems to be the case that PyTorch allows for faster iteration and researchers tend to prefer it, while TF is used more for production. I like the Pythonic way of coding, so PyTorch is a natural choice; it's a very natural extension of normal Python. I think it's useful to read papers, understand what they've done and implement it. This is more about practicing that mindset than the usefulness of implementing the model from scratch, if you understand what I mean.
@@AladdinPersson If you have time, consider making a video about how you started learning deep learning architectures and how you do it on a daily basis... and a few tips/suggestions for beginners. Because you explain things so beautifully ❤️
Hi, thanks a lot for this tutorial. This code is extremely helpful. If I use part of this code in my project and cite your GitHub link if my paper gets published, would that be okay? Please let me know. Thanks!
In the condition: if stride != 1 or self.in_channels != out_channels*4 shouldn't it instead be self.out_channels != in_channels*4 EDIT: Oh you clarified that out_channels is out_channels * expansion
Omg, I don't know what's happening, but no matter what I try, the code returns the same error:
in forward(self, x)
33 print(x.shape, identity.shape)
34 print('is identity_downsamples none ?', self.identity_downsamples == None)
---> 35 x += identity
36 x = self.relu(x)
RuntimeError: The size of tensor a (256) must match the size of tensor b (64) at non-singleton dimension 1
Help please, I have re-checked my code multiple times and made sure it is exactly like yours, but to no avail; I can't make it work. :( I run on Colab btw.
I had the same error. The shape of the identity is not the same as x. You probably made a typo in the __init__ function of the block class. Make sure all the parameters are the same. In my case, I accidentally put padding=1 instead of padding=0 in conv3, which caused the output size to be different.
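For reference, here is a minimal sketch of the bottleneck block with the paddings this reply mentions (0, 1, 0 on the three convs) and the ReLU after the addition. The class and argument names follow the video's convention, but this is an illustration, not the exact code from the repository:

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    expansion = 4

    def __init__(self, in_channels, intermediate_channels, identity_downsample=None, stride=1):
        super().__init__()
        # 1x1 reduce: kernel 1, stride 1, padding 0
        self.conv1 = nn.Conv2d(in_channels, intermediate_channels, 1, 1, 0, bias=False)
        self.bn1 = nn.BatchNorm2d(intermediate_channels)
        # 3x3: the stride carries the downsampling, padding 1 keeps the spatial size
        self.conv2 = nn.Conv2d(intermediate_channels, intermediate_channels, 3, stride, 1, bias=False)
        self.bn2 = nn.BatchNorm2d(intermediate_channels)
        # 1x1 expand: padding must be 0 here, or the output size won't match the identity
        self.conv3 = nn.Conv2d(intermediate_channels, intermediate_channels * self.expansion, 1, 1, 0, bias=False)
        self.bn3 = nn.BatchNorm2d(intermediate_channels * self.expansion)
        self.relu = nn.ReLU()
        self.identity_downsample = identity_downsample

    def forward(self, x):
        identity = x
        x = self.relu(self.bn1(self.conv1(x)))
        x = self.relu(self.bn2(self.conv2(x)))
        x = self.bn3(self.conv3(x))
        if self.identity_downsample is not None:
            identity = self.identity_downsample(identity)
        x += identity          # skip connection
        return self.relu(x)    # ReLU comes after the addition (paper, section 3.2)
```

With these parameters, an input of shape (N, 64, 56, 56) plus a 1x1 identity_downsample to 256 channels comes out as (N, 256, 56, 56).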
Honestly I've put this video aside for a while because it was 30 minutes long but it didn't even feel like 30 minutes now that I've watched it. I now understand the architecture really well. Thank you!
Hey sorry for asking but do you get this error "RuntimeError: The size of tensor a (256) must match the size of tensor b (64) at non-singleton dimension 1" ?
@@doggydoggy578 It probably means you have a dimension mismatch in some layer, maybe the identity mapping?
Thanks for the video. A minor change that improves your code is to implement the residual mapping inside the block class. If you look at Figure 2 from the paper, the definition of a block includes the mapping; here you have put the mapping as part of the design of the network. This suggests that you are very experienced with networks without skip connections and just changed them in your imagination rather than defining a block. :) It still works, I know.
Interesting viewpoint, am I understanding you correctly in that you would rather have the identity_downsample inside the init of the class block?
Hi, yes. So it moves from _make_layer to inside the __init__ of block, but carefully.
Very useful for beginning researchers who don't know how to implement the work from papers!
Thank you for tutorial. You're a real mad lad for this.
Thanks for this tutorial. We need a tutorial for EfficientNet!
Noted!
I love the idea of residual layers. Not taking math into account, on a higher level it intuitively seems useful, because with usual layers, the low-level information gets lost from layer to layer. But with skip-connections, we keep track of lower-level information, sort of. Unfortunately, I can't now remember the IRL-example to depict this, but in general it is the same: while constructing something high-level, we don't only need to see what we have at this high-level, but also need to keep track of some lower-level steps we're performing.
The best ResNet tutorial ever, thank you!!! If possible, please also make a tutorial about Siamese networks.
I have watched it only once, but you explained it really well. Right now I'm working on an assignment and hope this will help me. Thanks man, you lifted my hopes on this ResNet. Keep sharing knowledge!
Thank you for the tutorial series, it's been great so far. I gotta say, ResNet implementation is trickier than it looks haha
Thanks dougy, I like your style ;) Yeah, ResNet was definitely the hardest one. Initially I thought Inception would be the hardest because I felt it was conceptually more difficult, whereas the idea behind ResNet is super simple. But the implementation was totally the opposite.
@@AladdinPersson Haha! Looks like Aladdin was waiting for the moment to go all in on how much time he spent figuring out the implementation of ResNet. By the way, great video as always.
This is my favorite series
Like the previous comment said, please do an EfficientNet from scratch.
Will look into that!
Sir, please suggest how anyone can reach your coding level. The way you have coded the ResNet network is mindblowing!!!
Will give this a look.
Hello Aladdin,
Great videos! To appreciate your efforts and encourage you to make more, I joined your Community. I was implementing this video and got stuck on making sense of the identity_downsample. I would really appreciate it if you could share some information on what exactly the role of identity_downsample is.
My understanding is that in residual networks with skip connections the output is f(x) + x. We want f(x) and x to have the same dimensions, so we use an identity downsample on x to make sure they [f(x) and x] are the same size?
Appreciate the support 👊 Yeah, you're exactly right: when running x through f(x) the shapes might not match for the addition, so we might need to modify the identity, which we do through the identity_downsample function. I think coding ResNets could be done in a clearer way, and I might revisit this implementation if there's a better way of doing it.
@@AladdinPersson thanks for coming back! You deserve all the support. Sure looking forward to see a newer implementation if you are going for it!
Awesome! Can you make a video on ensembling, please?
Hello Aladdin, can you please make a video explaining the concept of the _make_layer function? It is really confusing.
I didn't understand: where did you implement the skip connection?
“x += identity” is the skip connection. “identity” is set to the input at the top of the function, then added to the output at the end, thus skipping all the calculations in the middle.
I was hoping for a full reimplementation that includes the dataset preprocessing, training code, visualization and so on; are there any videos like that?
Exactly
@Aladdin Persson Could you also make a hands-on coding video of EfficientNet?
For the Stride part for down-sampling in each layer, in the paper, it is written to down-sample at conv3_1, conv4_1 and conv5_1. If I understand your code correctly does it mean that there are conv3_0, conv4_0, and conv5_0 and hence stride of 2 is applied to the second block in each layer?
Thank you for your tutoring, Aladdin. Since block is not saved in the ResNet class, can we delete the block argument from the ResNet __init__ and _make_layer?
Thanks for this. Very helpful :)
Can you do an implementation of an ensemble model of ResNets and DenseNets for us?
Are you looking for how the training structure would look like when we are training an ensemble of models?
Thanks for the tutorial. Do you have any program to implement U-Net?
I have not heard of U-Net before, so I haven't, unfortunately. It seems like an interesting architecture from reading the paper abstract. I'll add it to the list and if I get time I can do it :)
@@AladdinPersson +1 for an implementation of U-Net.
+1 for UNet
Hi Aladdin! Thanks so much for the great content. I had a quick question at around 3:50 (calculating the padding). I'm looking at the formula [(W−K+2P)/S]+1 that people often use to calculate the output size, and tried letting W = 224, K = 7, S = 2 etc., but I just don't see how P = 3 would get us an output of 112. How can I calculate/estimate padding sizes from input and output sizes (plus kernel sizes, strides, etc.)?
The padding gets ceiling-rounded from 2.5 up to 3! This is the case with most odd-sized kernels :p I believe Caffe used to round down from 2.5 to 2 back in the day.
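To spell the arithmetic out (assuming the conv1 numbers from the architecture: 224x224 input, 7x7 kernel, stride 2), you can solve the output-size formula for P and round up:

```python
import math
import torch
import torch.nn as nn

W, K, S = 224, 7, 2  # ResNet conv1: 224x224 input, 7x7 kernel, stride 2

# Solve [(W - K + 2P)/S] + 1 = 112 for P:
P = ((112 - 1) * S - W + K) / 2
print(P)                           # 2.5
P = math.ceil(P)                   # round up to 3

# Conv2d floors the division, so P=3 gives floor((224 - 7 + 6)/2) + 1 = 112
out = (W - K + 2 * P) // S + 1
print(out)                         # 112

# Sanity check against an actual layer
conv1 = nn.Conv2d(3, 64, kernel_size=K, stride=S, padding=P)
print(conv1(torch.randn(1, 3, W, W)).shape)  # torch.Size([1, 64, 112, 112])
```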
Can you make a video on implementing Mask R-CNN from scratch? :)
Excellent work!! Super fun
Sorry, probably a stupid question, but don't we need to pass stride as a parameter in `block.conv1` and set its padding to 1, and give `block.conv2` stride=1 and padding 0 instead? Or am I missing something from the original paper?
Thanks sir you are so kind
Any chance you could create TensorFlow versions of these advanced network implementations? Many thanks, super useful videos.
Thank you for tutorial. I have some questions about it:
1) Why do you use "identity = x" in your code? Is it not dangerous, as identity and x in fact share the same memory after that? Is there any reason for not using "identity = x.clone()"?
2) Did you try using the shortcut "x += identity" after the non-linearity? I've read the article and can't tell exactly when the authors apply it, before or after, but to me it seems more reasonable to put it after the ReLU, following the equation H(x) = F(x) + x in the article. I've also read the PyTorch implementation of the ResNet model and understand that your implementation follows that scheme, so maybe you can explain why it is more proper to do it this way?
My English is far from fluent, so I want to say that I don't mean to be rude at any point.
Thanks for the comment and questions! I'll try my best to answer them.
For your first question, I do think you're correct to be cautious about these operations; dealing with references in general can be quite tricky, and in this case I was uncertain as well. I tried a few examples just to see what happens. If we have
a = torch.zeros(5)
b = a
c = a.clone()
a[0] = 1
print(a)
print(b)
print(c)
then it could cause issues if we believed that b is a copy of a rather than pointing to the same memory. But if we rebind a by changing its shape with something like
a = torch.zeros(5)
b = a
c = a.clone()
a[0] = 1
a = torch.cat((a, torch.tensor([10.])), 0)
print(a)
print(b)
print(c)
They will no longer point to the same storage, and I guess this is similar to our case because the conv layers etc. are changing the shape. When I train the network using x.clone() or simply x, I obtain the same results. I do think you bring up a good point, and it is clearer to use .clone(); in PyTorch's own implementation they use two different variables, x and out, to be explicit and avoid the issue you bring up, and I will change the implementation on GitHub to use x.clone() instead.
For your second question, in the paper they use the equation
y = F(x) + x
where F(x) is the residual mapping and x is the identity. After this, they say they apply the non-linearity on y, which is what we're doing in the code too. This is described in section 3.2 of the paper.
@@AladdinPersson Thank you for your answer.
I also checked three variants (assignment, clone() and copy_()) after I wrote the comment, and they really do seem to be equivalent in this case as far as memory sharing goes, but the differences in gradient calculation between these three approaches are not fully clear to me yet.
I'm very grateful for your reference to the section. I really don't understand how I missed it. I may have been too focused on the idea that the shortcut is used to help the non-linear function approximate the identity, so I thought we should add the identity after the final non-linearity, the ReLU, of the current "block".
@@КириллНикоров Here, x = self.conv1(x) creates a new tensor instead of modifying x in place, so x now points to a new object, while the value x had before stays where identity points. Taking care with reference assignment is a must, though here it is all fine.
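The rebinding described here is easy to check with plain tensors (the names below are just illustrative):

```python
import torch

x = torch.zeros(3)
identity = x            # identity and x point to the same storage

x = x + 1               # out-of-place op: x is REBOUND to a new tensor
print(identity)         # tensor([0., 0., 0.]) -- unchanged

x2 = torch.zeros(3)
identity2 = x2
x2 += 1                 # in-place op: mutates the shared storage
print(identity2)        # tensor([1., 1., 1.]) -- changed!
```

In the block's forward, x = self.conv1(x) is out-of-place, so identity keeps the original input; only genuinely in-place ops (like x += identity itself) mutate shared memory.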
Hi Aladdin Persson,
Can you please share a notebook showing how to implement ResNet from scratch using the full pre-activation bottleneck block?
Or please make a video about that. Thanks in advance.
Hi. Thank you for the video. Would you please help me to understand how I would adapt the implementation to create the ResNet34 and ResNet18 models? I tried but had no success.
Could someone explain, please:
*Why is the expansion hardcoded?*
Thanks for the tutorial. I just started learning DL, and only recently came across ResNet, particularly ResNet9. I wonder how to apply this ResNet50/101/152 in training. Sorry for my dumbness.
There are pretrained ResNet models available through PyTorch's torchvision library that I would recommend you use. You can read more about them here: pytorch.org/docs/stable/torchvision/models.html
Thanks a lot for the great tutorial. I've now understood how to program Resnet. Do you have any program to implement one of these architectures: Resnext, DenseNet, Mask R-CNN, YOLACT++
I've got some plans for the next videos, but I'll take a look at these in the future and can make a video if I find any of them interesting :) Thanks for the suggestion!
I did not understand: is identity_downsample the part where we skip connections ahead?
"x += identity" is the skip connection part. But downsampling is required if the channels of identity and x do not match. The identity is taken first ("identity = x"), but the output of the ResNet block, x, will have 4 times as many channels as the identity. So the downsample just equalizes the number of channels so that the skip connection is possible.
@@nomad1104 Thank you bro!
Is the final layer a softmax, or do we keep the fc layer and go forward?
I am trying to prune the residual blocks so that my ResNet has 3 residual blocks, but I keep getting an error about matrix dimensions.
Hello,
Your video was very interesting to me, as I am using ResNet for the first time. But I have a question about how we can use it for audio classification; I have to do boundary detection in music files. My mel spectrograms are not all the same shape: (80, 1789), (80, 3356), and so on; the second dimension changes for every song. So how can I use these mel spectrograms with ResNet?
Can you please make a video on audio classification using ResNet?
Hey! Aladdin, Nice tutorial!
I have a question: when I omit this condition >> if stride != 1 or self.in_channels != intermediate_channels*4:
it still works, so I really don't know why you add >> self.in_channels != intermediate_channels*4.
Please kindly reply to me, thanks!
If I remember correctly, this was needed for the ResNet models that we didn't implement. For ResNet50/101/152 we don't need this line, so it was a bit unnecessary that I included it in the video.
@@AladdinPersson Do you mean ResNet18 or 34 need this line?
@@陈俊杰-q4u If I remember correctly, since for the resnets expect those two the intermediate_channels always expand by 4: 64 -> 256, 128->512, 256->1024 etc. I'd need to reread the paper again and check though.
@@AladdinPersson Yeah! I see, but my quesition is whether you add this condition or not, these sentences inside will work.
@@陈俊杰-q4u You should add the second statement because in the first residual layer your in_channels = 64, and after the first residual block they get expanded by 4, to 256 channels. However, the residual still has 64 channels, therefore the shapes of the output and the residual mismatch. When you add the second statement it gets corrected, because the downsampler expands the residual channels by 4, i.e. 64*4 = 256.
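To illustrate that reply with plain numbers (variable names follow the video; this is just the condition in isolation):

```python
# First residual layer of ResNet50: stride is 1, so the stride check
# alone would not create a downsample...
in_channels, intermediate_channels, stride = 64, 64, 1
stride_only = stride != 1                                              # False
# ...but the channel check catches the 64 vs 64*4 = 256 mismatch:
full_check = stride != 1 or in_channels != intermediate_channels * 4   # True
```

Without the second condition, the 64-channel identity would be added to a 256-channel output, which is exactly the "size of tensor a (256) must match the size of tensor b (64)" RuntimeError several commenters here hit.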
Which environment are you using here?
Can you explain how the forward function can run without us calling it explicitly?
E.g. we define forward()
but never write forward() ourselves.
The call method inside the parent class nn.Module calls forward()
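A minimal illustration of that dispatch (a toy module, not the video's code):

```python
import torch
import torch.nn as nn

class Doubler(nn.Module):
    def forward(self, x):
        return x * 2

m = Doubler()
# m(input) invokes nn.Module.__call__, which (after running any hooks)
# dispatches to the forward() we defined -- no explicit m.forward(...) needed.
y = m(torch.tensor([1.0, 2.0]))
```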
Great tutorial. How can I use features from different layers of pretrained models in PyTorch for fine-tuning?
I actually think I've made a video to answer this question: ua-cam.com/video/qaDe0qQZ5AQ/v-deo.html. Maybe it helps you out. I think code would explain it for you better than I could in words so the code for the video can be found: github.com/AladdinPerzon/Machine-Learning-Collection/blob/804c45e83b27c59defb12f0ea5117de30fe25289/ML/Pytorch/Basics/pytorch_pretrain_finetune.py#L33-L54
Thank you, very effective tutorial. Please do Yolov3
Yolo v3 or v4?
I was not aware of v4, actually. Either would be good to learn, and we viewers can practice modifying it to other versions afterwards. But I guess a tutorial video on v4 will stay current longer than v3 :)
@@AladdinPersson V4 would be great, I guess!
Thx.
NotImplementedError: Module [ResNet] is missing the required "forward" function
Can anyone tell me about this error? I get it when I use:
def test():
net = ResNet152()
x = torch.randn(2, 3, 224, 224)
y = net(x).to(device)
print(y.shape)
test()
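Not the commenter's exact code, but one common cause of that NotImplementedError is forward being defined at the wrong indentation level, so nn.Module never finds it. A toy sketch of the correct shape:

```python
import torch
import torch.nn as nn

class Tiny(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):   # must be a method of the class, at this level
        return self.fc(x)

y = Tiny()(torch.randn(1, 4))  # works, no NotImplementedError
```

If forward ends up outside the class body (or nested inside __init__), nn.Module falls back to its stub and raises exactly that error.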
Thanks! Yours, a PyTorch rookie from China.
Thank you for your insightful explanation. But I'm confused by part of this condition: "if stride != 1 or self.in_channels != intermediate_channels * 4". Why is there in_channels != intermediate_channels * 4? Could you help me? Thank you.
how to code resnet 18 and 34 ?
He answered it at the end of the video. Watch carefully before commenting.
Are there cases where identity_downsample is actually None? Because at the end of every block (in each layer) we end up changing the number of channels. Could someone explain this?
Where you have given the test() function, how can I train this model on the COCO dataset at that point? Can you help me out?
I'm not entirely sure what you mean by test in this scenario, and training the model on the COCO dataset (which can be used for object detection, captioning, etc.) will depend on your use case. In the video we built the ResNet model for classification, and I didn't want to spend unnecessary time setting up a training loop etc. I have other videos if you want to learn more about that.
@@AladdinPersson What I mean is: you have written this model from scratch, but how do I train it on the COCO dataset? I am a beginner, so I am asking for the code for it...
Why is bias not True?
I learnt how you applied padding=0 and padding=1.
It's actually not hard to follow. I think using PyTorch makes it even easier since you get a better idea of what is going on. Btw, how did you manage to run PyTorch on Spyder? Whenever I do simply 'import torch', Spyder crashes for me, that is why I am using PyCharm with PyTorch.
PyTorch worked normally for me in PyCharm but not in other editors. Later I found out there were issues with my installation of PyTorch: I had installed the wrong version of CUDA. I still don't understand how PyCharm worked if the installation wasn't proper.
@@shambhaviaggarwal9977 Hey sorry for asking but do you get this error "RuntimeError: The size of tensor a (256) must match the size of tensor b (64) at non-singleton dimension 1" ?
@@doggydoggy578 That error comes from "x += identity": you are trying to add two tensors whose shapes don't match at dimension 1 (the channels), i.e. x has 256 channels but the identity still has 64. Check that identity_downsample is being created and applied.
I wish I understood a single thing you did in this video
Whats the best resources to learn pytorch ?
It's an interesting question, I'll try to give my answer in two parts.
First, I believe the bottleneck in most cases isn't actually PyTorch but rather knowledge about machine learning / deep learning itself. To learn the concepts, I believe excellent resources are Machine Learning (a great introductory ML course on Coursera) by Andrew Ng and the Deep Learning Specialization, also by Andrew Ng. Following CS231n and CS224n through the online lectures and doing the assignments is a very efficient way to learn. After that, like I am currently doing, reading research papers, implementing those papers, and doing projects are ways to develop further.
Now for the part about learning PyTorch specifically: I think reading the PyTorch tutorials at pytorch.org/tutorials/ is great, reading other people's code / watching others code things (like I am doing in these videos) can be beneficial, and reading old posts on the PyTorch forums also helps. Most importantly, I think it's about getting started coding in PyTorch. Remember that I'm still learning a lot and don't consider myself to have "learned" PyTorch, but those are my current thoughts on your question. Hope that answers it at least somewhat :)
@@AladdinPersson Thanks for the detailed response! Do you think for a beginner it's better to stick with PyTorch than to implement things in TensorFlow/Keras as well? Which of them gives a better learning curve and strengthens the underlying concepts?
How important is it to implement code from scratch vs transfer learning or using API calls
I don't think it matters too much. Pick either and just stick with it; I wouldn't implement everything in both. It seems to be the case that PyTorch allows for faster iterations and researchers tend to prefer it, while TensorFlow is used more for production. I like the Python way of coding, so PyTorch is a natural choice; it's a very natural extension of normal Python.
I think it's useful to read papers, understand what they've done, and implement it. This is more about practicing that mindset than about the usefulness of implementing the model from scratch per se, if you understand what I mean.
@@AladdinPersson If you have time, consider making a video about how you started learning deep learning architectures and how you do it on a daily basis...
And a few tips/suggestions for beginners. Because you explain things so beautifully ❤️
please implement ResNeSt... pleaseeee
Hi, thanks a lot for this tutorial. This code is extremely helpful. If I use part of this code in my project and cite your GitHub link if my paper gets published, would that be, okay? Please let me know. Thanks!
did u use manim??
Yeah for the intro! :)
@@AladdinPersson awsome, well thanks for the tutorial. It's pretty helpful. :)
I really appreciate the kind feedback
u can do super().__init__()
In the condition:
if stride != 1 or self.in_channels != out_channels*4
shouldn't it instead be self.out_channels != in_channels*4?
EDIT: Oh you clarified that out_channels is out_channels * expansion
Omg, I don't know what's happening, but no matter what I try, the code returns the same error:
in forward(self, x)
33 print(x.shape,identity.shape)
34 print('is identity_downsamples none ?', self.identity_downsamples==None)
---> 35 x += identity
36 x = self.relu(x)
37
RuntimeError: The size of tensor a (256) must match the size of tensor b (64) at non-singleton dimension 1
Help please
I have rechecked my code multiple times and made sure it is exactly like yours, but to no avail; I can't make it work. :( I run on Colab btw.
I am also getting same error
I had the same error. The shape of the identity is not the same as x.
You probably made a typo in the init function of the class block.
Make sure all the parameters are the same. In my case, I accidentally put padding=1 instead of padding=0 in conv3, which caused the output size to be different.
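For anyone comparing: a sketch of the three convs in the bottleneck block (the parameter values are my reading of the video's code, not a verbatim copy; the point is that conv3 is 1x1, so its padding must be 0):

```python
import torch
import torch.nn as nn

intermediate_channels = 64
conv1 = nn.Conv2d(64, intermediate_channels,
                  kernel_size=1, stride=1, padding=0)
conv2 = nn.Conv2d(intermediate_channels, intermediate_channels,
                  kernel_size=3, stride=1, padding=1)
# 1x1 conv: padding must be 0. With padding=1 here the spatial size grows
# by 2 on each side, so x and identity mismatch at the skip connection.
conv3 = nn.Conv2d(intermediate_channels, intermediate_channels * 4,
                  kernel_size=1, stride=1, padding=0)

x = torch.randn(2, 64, 56, 56)
out = conv3(conv2(conv1(x)))   # spatial size preserved: 56x56
```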
@@activision4170 thanks bro