Are there cases where identity_downsample is actually None? Because at the end of every block (in each layer) we end up changing the number of channels. Could someone explain this?
I'm not entirely sure what you mean by test in this scenario, and training the model on the COCO dataset (which can be used for object detection, captioning, etc.) will depend on your use case. In the video we built the ResNet model for classification, and I didn't want to spend unnecessary time setting up a training loop; I have other videos if you want to learn more about that.
@@AladdinPersson What I mean is: you have written this model from scratch, but how do you train it on the COCO dataset? I am a beginner, so I am asking for the code for it...
It's actually not hard to follow. I think using PyTorch makes it even easier, since you get a better idea of what is going on. Btw, how did you manage to run PyTorch in Spyder? Whenever I simply do 'import torch', Spyder crashes for me, which is why I am using PyCharm with PyTorch.
PyTorch worked normally for me in PyCharm but not in other editors. Later I found out there were issues with my PyTorch installation; I still don't understand how PyCharm worked if the installation was broken. In my case, I had installed the wrong version of CUDA.
@@shambhaviaggarwal9977 Hey sorry for asking but do you get this error "RuntimeError: The size of tensor a (256) must match the size of tensor b (64) at non-singleton dimension 1" ?
@@doggydoggy578 You are trying to multiply 2 Matrices which are not compatible, i.e., the no of cols of A should be equal to number of rows of B. Check the dimensions.
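To make the reply concrete: the error comes from a channel mismatch when the skip connection is added. A minimal sketch reproducing it and the usual fix (a 1x1 convolution on the identity); the tensor sizes here are just illustrative:

```python
import torch
import torch.nn as nn

x = torch.randn(2, 256, 56, 56)        # block output: 64 * expansion(4) = 256 channels
identity = torch.randn(2, 64, 56, 56)  # saved input: still 64 channels

try:
    x + identity  # fails: dimension 1 is 256 vs 64
except RuntimeError as e:
    print(e)

# Fix: project the identity to the matching channel count with a 1x1 conv
downsample = nn.Sequential(
    nn.Conv2d(64, 256, kernel_size=1, stride=1, bias=False),
    nn.BatchNorm2d(256),
)
out = x + downsample(identity)
print(out.shape)  # torch.Size([2, 256, 56, 56])
```

If you hit this error with the video's code, it usually means the identity_downsample was not created (or not applied) for the first block of a layer.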
It's an interesting question; I'll try to give my answer in two parts. First, I believe the bottleneck in most cases isn't actually PyTorch but rather knowledge about machine learning / deep learning itself. To learn the concepts, I believe excellent resources are Machine Learning (a great introductory ML course on Coursera) by Andrew Ng and the Deep Learning Specialization, also by Andrew Ng. Following CS231n and CS224n through the online lectures and doing the assignments is, I think, a very efficient way to learn. After that, like I am currently doing, reading research papers, implementing those papers and doing projects are ways to develop further. Now, for learning PyTorch specifically, I think reading the PyTorch tutorials at pytorch.org/tutorials/ is great; reading other people's code and watching others code things (like I am doing in these videos) can be beneficial, and reading old posts on the PyTorch forums helps too. Most importantly, I think it's about getting started coding in PyTorch. Remember, I'm still learning a lot and don't consider myself to have "learned" PyTorch, but those are my current thoughts on your question. Hope that answers it at least somewhat :)
@@AladdinPersson Thanks for the detailed response. Do you think it's better for a beginner to stick with PyTorch than to implement things in TensorFlow/Keras as well? Which of them gives a good learning curve and strengthens the underlying concepts? And how important is implementing code from scratch vs transfer learning or using API calls?
I don't think it matters too much. Pick either and just stick with it; I wouldn't implement everything in both. It seems to be the case that PyTorch allows for faster iteration and researchers tend to prefer it, while TF is used more for production. I like the Pythonic way of coding, so PyTorch is a natural choice; it's a very natural extension of normal Python. I think it's useful to read papers, understand what they've done and implement it. This is more about practicing that mindset than the usefulness of implementing the model from scratch, if you understand what I mean.
@@AladdinPersson If you have time, consider making a video about how you started learning deep learning architectures and how you do it on a daily basis... and a few tips/suggestions for beginners. Because you explain things so beautifully ❤️
Hi, thanks a lot for this tutorial. This code is extremely helpful. If I use part of this code in my project and cite your GitHub link if my paper gets published, would that be okay? Please let me know. Thanks!
In the condition: if stride != 1 or self.in_channels != out_channels*4 shouldn't it instead be self.out_channels != in_channels*4 EDIT: Oh you clarified that out_channels is out_channels * expansion
Omg, I don't know what's happening, but no matter what I try, the code returns the same error:
in forward(self, x)
33 print(x.shape, identity.shape)
34 print('is identity_downsamples none ?', self.identity_downsamples == None)
---> 35 x += identity
36 x = self.relu(x)
RuntimeError: The size of tensor a (256) must match the size of tensor b (64) at non-singleton dimension 1
Help please, I have re-checked my code multiple times and made sure it is exactly like yours, but to no avail; I can't make it work. :( I run on Colab btw.
I had the same error. The shape of the identity is not the same as x. You probably made a typo in the __init__ function of the block class. Make sure all the parameters are the same. In my case, I accidentally put padding=1 instead of padding=0 in conv3, which caused the output size to be different.
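For reference, here is a minimal sketch of the bottleneck block with the paddings this reply mentions (0, 1, 0 on the three convs) and the ReLU after the addition. The class and argument names follow the video's convention, but this is an illustration, not the exact code from the repository:

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    expansion = 4

    def __init__(self, in_channels, intermediate_channels, identity_downsample=None, stride=1):
        super().__init__()
        # 1x1 reduce: kernel 1, stride 1, padding 0
        self.conv1 = nn.Conv2d(in_channels, intermediate_channels, 1, 1, 0, bias=False)
        self.bn1 = nn.BatchNorm2d(intermediate_channels)
        # 3x3: the stride carries the downsampling, padding 1 keeps the spatial size
        self.conv2 = nn.Conv2d(intermediate_channels, intermediate_channels, 3, stride, 1, bias=False)
        self.bn2 = nn.BatchNorm2d(intermediate_channels)
        # 1x1 expand: padding must be 0 here, or the output size won't match the identity
        self.conv3 = nn.Conv2d(intermediate_channels, intermediate_channels * self.expansion, 1, 1, 0, bias=False)
        self.bn3 = nn.BatchNorm2d(intermediate_channels * self.expansion)
        self.relu = nn.ReLU()
        self.identity_downsample = identity_downsample

    def forward(self, x):
        identity = x
        x = self.relu(self.bn1(self.conv1(x)))
        x = self.relu(self.bn2(self.conv2(x)))
        x = self.bn3(self.conv3(x))
        if self.identity_downsample is not None:
            identity = self.identity_downsample(identity)
        x += identity          # skip connection
        return self.relu(x)    # ReLU comes after the addition (paper, section 3.2)
```

With these parameters, an input of shape (N, 64, 56, 56) plus a 1x1 identity_downsample to 256 channels comes out as (N, 256, 56, 56).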
Honestly I've put this video aside for a while because it was 30 minutes long but it didn't even feel like 30 minutes now that I've watched it. I now understand the architecture really well. Thank you!
Hey sorry for asking but do you get this error "RuntimeError: The size of tensor a (256) must match the size of tensor b (64) at non-singleton dimension 1" ?
@@doggydoggy578 It probably means you have a dimension mismatch in some layer, maybe the identity mapping?
Thanks for the video. A minor change that improves your code is to implement the residual mapping inside the block class. If you look at Figure 2 from the paper, the definition of a block includes the mapping; here you have put the mapping as part of the design of the network. This suggests that you are very experienced with networks without skip connections and just changed them in your imagination rather than defining a block. :) It still works, I know.
Interesting viewpoint, am I understanding you correctly in that you would rather have the identity_downsample inside the init of the class block?
Hi, yes. So it moves from _make_layer to inside the __init__ of block, but carefully.
Very useful for beginning researchers who don't know how to implement the work from papers!
Thank you for tutorial. You're a real mad lad for this.
Thanks for this tutorial. We need a tutorial for EfficientNet!
Noted!
I love the idea of residual layers. Not taking math into account, on a higher level it intuitively seems useful, because with usual layers, the low-level information gets lost from layer to layer. But with skip-connections, we keep track of lower-level information, sort of. Unfortunately, I can't now remember the IRL-example to depict this, but in general it is the same: while constructing something high-level, we don't only need to see what we have at this high-level, but also need to keep track of some lower-level steps we're performing.
The best ResNet tutorial ever, thank you!!! If possible, please also make a tutorial about Siamese networks.
I have watched it only once, but you explained it really well. Right now I'm working on an assignment and hope this will help me. Thanks man, you lifted my hopes on this ResNet. Keep sharing knowledge!
Thank you for the tutorial series, it's been great so far. I gotta say, ResNet implementation is trickier than it looks haha
Thanks dougy, I like your style ;) Yeah, ResNet was definitely the hardest one. Initially I thought Inception would be the hardest because I felt it was conceptually more difficult, whereas the idea behind ResNet is super simple. But the implementation was totally the opposite.
@@AladdinPersson Haha! Looks like Aladdin was waiting for the moment to go all in on how much time he spent figuring out the implementation of ResNet. By the way, great video as always.
This is my favorite series
Like the previous comment said, please do an EfficientNet from scratch.
Will look into that!
Sir, please suggest how anyone can reach your coding level. The way you have coded the ResNet network is mindblowing!!!
Will give this a look.
Hello Aladdin,
Great videos! To appreciate your efforts and encourage you to make more, I joined your Community. I was implementing this video and got stuck on making sense of the identity_downsample. I would really appreciate it if you could share some information on what exactly the role of identity_downsample is.
My understanding is that in residual networks with skip connections the output is f(x) + x. We want f(x) and x to have the same dimensions, so we use an identity downsample on x to make sure they [f(x) and x] are the same size?
Appreciate the support 👊 Yeah, you're exactly right: when running x through f(x) the shapes might not match for the addition, so we might need to modify the identity, which we do through the identity_downsample function. I think coding ResNets could be done in a clearer way, and I might revisit this implementation if there's a better way of doing it.
@@AladdinPersson thanks for coming back! You deserve all the support. Sure looking forward to see a newer implementation if you are going for it!
Awesome! Can you make a video on ensembling, please?
Hello Aladdin, can you please make a video explaining the concept of the _make_layer function? It is really confusing.
I didn't understand: where did you implement the skip connection?
“x += identity” is the skip connection. “identity” is set to the input at the top of the function, then added to the output at the end, thus skipping all the calculations in the middle.
I was hoping for a full reimplementation that includes the dataset preprocessing, training code, visualization and so on; are there any videos like that?
Exactly
@Aladdin Persson Could you also make a hands-on coding video of EfficientNet?
For the Stride part for down-sampling in each layer, in the paper, it is written to down-sample at conv3_1, conv4_1 and conv5_1. If I understand your code correctly does it mean that there are conv3_0, conv4_0, and conv5_0 and hence stride of 2 is applied to the second block in each layer?
Thank you for your tutoring, Aladdin. Since block is not saved in the ResNet class, can we delete the block argument from the ResNet __init__ and _make_layer?
Thanks for this. Very helpful :)
Can you do an implementation of an ensemble model of ResNets and DenseNets for us?
Are you looking for how the training structure would look like when we are training an ensemble of models?
Thanks for the tutorial. Do you have any program to implement U-Net?
I have not heard of U-Net before, so I haven't, unfortunately. It seems like an interesting architecture from reading the paper abstract. I'll add it to the list and if I get time I can do it :)
@@AladdinPersson +1 for an implementation of U-Net.
+1 for UNet
Hi Aladdin! Thanks so much for the great content. I had a quick question at around 3:50 (calculating the padding). I'm looking at the formula [(W−K+2P)/S]+1 that people often use to calculate the output size, and tried letting W = 224, K = 7, S = 2 etc., but I just don't see how P = 3 would get us an output of 112. How can I calculate/estimate padding sizes from input and output sizes (plus kernel sizes, strides, etc.)?
The padding gets ceiling-rounded from 2.5 up to 3! This is the case with most odd-sized kernels :p I believe Caffe used to round down from 2.5 to 2 back in the day.
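To spell the arithmetic out (assuming the conv1 numbers from the architecture: 224x224 input, 7x7 kernel, stride 2), you can solve the output-size formula for P and round up:

```python
import math
import torch
import torch.nn as nn

W, K, S = 224, 7, 2  # ResNet conv1: 224x224 input, 7x7 kernel, stride 2

# Solve [(W - K + 2P)/S] + 1 = 112 for P:
P = ((112 - 1) * S - W + K) / 2
print(P)                           # 2.5
P = math.ceil(P)                   # round up to 3

# Conv2d floors the division, so P=3 gives floor((224 - 7 + 6)/2) + 1 = 112
out = (W - K + 2 * P) // S + 1
print(out)                         # 112

# Sanity check against an actual layer
conv1 = nn.Conv2d(3, 64, kernel_size=K, stride=S, padding=P)
print(conv1(torch.randn(1, 3, W, W)).shape)  # torch.Size([1, 64, 112, 112])
```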
Can you make a video on implementing Mask R-CNN from scratch? :)
Excellent work!! Super fun
Sorry, probably a stupid question, but don't we need to pass stride as a parameter in `block.conv1` and set its padding to 1, and give `block.conv2` stride=1 and padding 0 instead? Or am I missing something from the original paper?
Thanks sir you are so kind
Any chance you could create TensorFlow versions of these advanced network implementations? Many thanks, super useful videos.
Thank you for tutorial. I have some questions about it:
1) Why do you use "identity = x" in your code? Is it not dangerous, as identity and x in fact share the same memory after that? Is there any reason for not using "identity = x.clone()"?
2) Did you try using the shortcut "x += identity" after the non-linearity? I've read the article and can't tell exactly when the authors apply it, before or after, but to me it seems more reasonable to put it after the ReLU, following the equation H(x) = F(x) + x in the article. I've also read the PyTorch implementation of the ResNet model and understand that your implementation follows that scheme, so maybe you can explain why it is more proper to do it this way?
My English is far from fluent, so I want to say that I don't mean to be rude at any point.
Thanks for the comment and questions! I'll try my best to answer them.
For your first question, I do think you're correct to be cautious about these operations; dealing with references in general can be quite tricky, and in this case I was uncertain as well. I tried a few examples just to see what happens. If we have
a = torch.zeros(5)
b = a
c = a.clone()
a[0] = 1
print(a)
print(b)
print(c)
then it could cause issues if we believed that b is a copy of a rather than pointing to the same memory. But if we rebind a by changing its shape with something like
a = torch.zeros(5)
b = a
c = a.clone()
a[0] = 1
a = torch.cat((a, torch.tensor([10.])), 0)
print(a)
print(b)
print(c)
They will no longer point to the same storage, and I guess this is similar to our case because the conv layers etc. are changing the shape. When I train the network using x.clone() or simply x, I obtain the same results. I do think you bring up a good point, and it is clearer to use .clone(); in PyTorch's own implementation they use two different variables, x and out, to be explicit and avoid the issue you bring up, and I will change the implementation on GitHub to use x.clone() instead.
For your second question, in the paper they use the equation
y = F(x) + x
where F(x) is the residual mapping and x is the identity. After this, they say they apply the non-linearity on y, which is what we're doing in the code too. This is described in section 3.2 of the paper.
@@AladdinPersson Thank you for your answer.
I also checked three variants (assignment, clone() and copy_()) after I wrote the comment, and they really do seem to be equivalent in this case as far as memory sharing goes, but the differences in gradient calculation between these three approaches are not fully clear to me yet.
I'm very grateful for your reference to the section. I really don't understand how I missed it. I may have been too focused on the idea that the shortcut is used to help the non-linear function approximate the identity, so I thought we should add the identity after the final non-linearity, the ReLU, of the current "block".
@@КириллНикоров Here, x = self.conv1(x) creates a new tensor instead of modifying x in place, so x now points to a new object, while the value x had before stays where identity points. Taking care with reference assignment is a must, though here it is all fine.
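The rebinding described here is easy to check with plain tensors (the names below are just illustrative):

```python
import torch

x = torch.zeros(3)
identity = x            # identity and x point to the same storage

x = x + 1               # out-of-place op: x is REBOUND to a new tensor
print(identity)         # tensor([0., 0., 0.]) -- unchanged

x2 = torch.zeros(3)
identity2 = x2
x2 += 1                 # in-place op: mutates the shared storage
print(identity2)        # tensor([1., 1., 1.]) -- changed!
```

In the block's forward, x = self.conv1(x) is out-of-place, so identity keeps the original input; only genuinely in-place ops (like x += identity itself) mutate shared memory.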
Hi Aladdin Persson,
Can you please share a notebook showing how to implement ResNet from scratch using the full pre-activation bottleneck block?
Or please make a video about that. Thanks in advance.
Hi. Thank you for the video. Would you please help me to understand how I would adapt the implementation to create the ResNet34 and ResNet18 models? I tried but had no success.
Could someone explain, please:
*Why is the expansion hardcoded?*
Thanks for the tutorial. I just started learning DL, and only recently came across ResNet, particularly ResNet9. I wonder how to apply this ResNet50/101/152 in training. Sorry for my dumbness.
There are pretrained ResNet models available through PyTorch's torchvision library that I would recommend you use. You can read more about them here: pytorch.org/docs/stable/torchvision/models.html
Thanks a lot for the great tutorial. I've now understood how to program Resnet. Do you have any program to implement one of these architectures: Resnext, DenseNet, Mask R-CNN, YOLACT++
I've got some plans for the next videos, but I'll take a look at these in the future and can make a video if I find any of them interesting :) Thanks for the suggestion!
I did not understand: is identity_downsample the part where we skip connections ahead?
"x += identity" is the skip connection part. But downsampling is required if the channels of identity and x do not match. The identity is taken first ("identity = x"), but the output of the ResNet block, x, will have 4 times as many channels as the identity. So the downsample just equalizes the number of channels so that the skip connection is possible.
@@nomad1104 Thank you bro!
Is the final layer a softmax, or do we keep the fc layer and go forward?
I am trying to prune the residual blocks so that my ResNet has 3 residual blocks, but I keep getting an error about matrix dimensions.
Hello,
Your video was very interesting to me, as I am using ResNet for the first time. But I have a question about how we can use it for audio classification; I have to do boundary detection in music files. My mel spectrograms are not all the same shape: (80, 1789), (80, 3356), and so on; the second dimension changes for every song. So how can I use these mel spectrograms with ResNet?
Can you please make a video on audio classification using ResNet?
Hey! Aladdin, Nice tutorial!
I have a question: when I omit this condition >> if stride != 1 or self.in_channels != intermediate_channels*4:
it still works, so I really don't know why you add >> self.in_channels != intermediate_channels*4.
Please kindly reply to me, thanks!
If I remember correctly, this was needed for the ResNet models that we didn't implement. For ResNet50/101/152 we don't need this line, so it was a bit unnecessary that I included it in the video.
@@AladdinPersson Do you mean ResNet18 or 34 need this line?
@@陈俊杰-q4u If I remember correctly, since for the resnets expect those two the intermediate_channels always expand by 4: 64 -> 256, 128->512, 256->1024 etc. I'd need to reread the paper again and check though.
@@AladdinPersson Yeah! I see, but my quesition is whether you add this condition or not, these sentences inside will work.
@@陈俊杰-q4u You should add the second statement because in the first residual layer your in_channels = 64, and after the first residual block they get expanded by 4, to 256 channels. However, the residual still has 64 channels, therefore the shapes of the output and the residual mismatch. When you add the second statement it gets corrected, because the downsampler expands the residual channels by 4, i.e. 64*4 = 256.
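To illustrate that reply with plain numbers (variable names follow the video; this is just the condition in isolation):

```python
# First residual layer of ResNet50: stride is 1, so the stride check
# alone would not create a downsample...
in_channels, intermediate_channels, stride = 64, 64, 1
stride_only = stride != 1                                              # False
# ...but the channel check catches the 64 vs 64*4 = 256 mismatch:
full_check = stride != 1 or in_channels != intermediate_channels * 4   # True
```

Without the second condition, the 64-channel identity would be added to a 256-channel output, which is exactly the "size of tensor a (256) must match the size of tensor b (64)" RuntimeError several commenters here hit.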
Which environment are you using here?
Can you explain how the forward function can run without us calling it explicitly?
E.g. we define forward()
but never write forward() ourselves.
The call method inside the parent class nn.Module calls forward()
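A minimal illustration of that dispatch (a toy module, not the video's code):

```python
import torch
import torch.nn as nn

class Doubler(nn.Module):
    def forward(self, x):
        return x * 2

m = Doubler()
# m(input) invokes nn.Module.__call__, which (after running any hooks)
# dispatches to the forward() we defined -- no explicit m.forward(...) needed.
y = m(torch.tensor([1.0, 2.0]))
```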
Great tutorial. How can I use features from different layers of pretrained models in PyTorch for fine-tuning?
I actually think I've made a video to answer this question: ua-cam.com/video/qaDe0qQZ5AQ/v-deo.html. Maybe it helps you out. I think code would explain it for you better than I could in words so the code for the video can be found: github.com/AladdinPerzon/Machine-Learning-Collection/blob/804c45e83b27c59defb12f0ea5117de30fe25289/ML/Pytorch/Basics/pytorch_pretrain_finetune.py#L33-L54
Thank you, very effective tutorial. Please do Yolov3
Yolo v3 or v4?
I was not aware of v4, actually. Either would be good to learn, and we viewers can practice modifying it to other versions afterwards. But I guess a tutorial video on v4 will stay current longer than v3 :)
@@AladdinPersson V4 would be great, I guess!
Thx.
NotImplementedError: Module [ResNet] is missing the required "forward" function
Can anyone tell me about this error? I get it when I use:
def test():
net = ResNet152()
x = torch.randn(2, 3, 224, 224)
y = net(x).to(device)
print(y.shape)
test()
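Not the commenter's exact code, but one common cause of that NotImplementedError is forward being defined at the wrong indentation level, so nn.Module never finds it. A toy sketch of the correct shape:

```python
import torch
import torch.nn as nn

class Tiny(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):   # must be a method of the class, at this level
        return self.fc(x)

y = Tiny()(torch.randn(1, 4))  # works, no NotImplementedError
```

If forward ends up outside the class body (or nested inside __init__), nn.Module falls back to its stub and raises exactly that error.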
Thanks! Yours, a PyTorch rookie from China.
Thank you for your insightful explanation. But I'm confused by part of this condition: "if stride != 1 or self.in_channels != intermediate_channels * 4". Why is there in_channels != intermediate_channels * 4? Could you help me? Thank you.
how to code resnet 18 and 34 ?
He answered it at the end of the video. Watch carefully before commenting.
Are there cases where identity_downsample is actually None? Because at the end of every block (in each layer) we end up changing the number of channels. Could someone explain this?
Where you have given the test() function, how can I train this model on the COCO dataset at that point? Can you help me out?
I'm not entirely sure what you mean by test in this scenario, and training the model on the COCO dataset (which can be used for object detection, captioning, etc.) will depend on your use case. In the video we built the ResNet model for classification, and I didn't want to spend unnecessary time setting up a training loop etc. I have other videos if you want to learn more about that.
@@AladdinPersson What I mean is: you have written this model from scratch, but how do I train it on the COCO dataset? I am a beginner, so I am asking for the code for it...
Why is bias not True?
I learnt how you applied padding=0 and padding=1.
It's actually not hard to follow. I think using PyTorch makes it even easier since you get a better idea of what is going on. Btw, how did you manage to run PyTorch on Spyder? Whenever I do simply 'import torch', Spyder crashes for me, that is why I am using PyCharm with PyTorch.
PyTorch worked normally for me in PyCharm but not in other editors. Later I found out there were issues with my installation of PyTorch: I had installed the wrong version of CUDA. I still don't understand how PyCharm worked if the installation wasn't proper.
@@shambhaviaggarwal9977 Hey sorry for asking but do you get this error "RuntimeError: The size of tensor a (256) must match the size of tensor b (64) at non-singleton dimension 1" ?
@@doggydoggy578 That error comes from "x += identity": you are trying to add two tensors whose shapes don't match at dimension 1 (the channels), i.e. x has 256 channels but the identity still has 64. Check that identity_downsample is being created and applied.
I wish I understood a single thing you did in this video
Whats the best resources to learn pytorch ?
It's an interesting question, I'll try to give my answer in two parts.
First, I believe the bottleneck in most cases isn't actually PyTorch but rather knowledge about machine learning / deep learning itself. To learn the concepts, I believe excellent resources are Machine Learning (a great introductory ML course on Coursera) by Andrew Ng and the Deep Learning Specialization, also by Andrew Ng. Following CS231n and CS224n through the online lectures and doing the assignments is a very efficient way to learn. After that, like I am currently doing, reading research papers, implementing those papers, and doing projects are ways to develop further.
Now for the part about learning PyTorch specifically: I think reading the PyTorch tutorials at pytorch.org/tutorials/ is great, reading other people's code / watching others code things (like I am doing in these videos) can be beneficial, and reading old posts on the PyTorch forums also helps. Most importantly, I think it's about getting started coding in PyTorch. Remember that I'm still learning a lot and don't consider myself to have "learned" PyTorch, but those are my current thoughts on your question. Hope that answers it at least somewhat :)
@@AladdinPersson Thanks for the detailed response! Do you think for a beginner it's better to stick with PyTorch than to implement things in TensorFlow/Keras as well? Which of them gives a better learning curve and strengthens the underlying concepts?
How important is it to implement code from scratch vs transfer learning or using API calls
I don't think it matters too much. Pick either and just stick with it; I wouldn't implement everything in both. It seems to be the case that PyTorch allows for faster iterations and researchers tend to prefer it, while TensorFlow is used more for production. I like the Python way of coding, so PyTorch is a natural choice; it's a very natural extension of normal Python.
I think it's useful to read papers, understand what they've done, and implement it. This is more about practicing that mindset than about the usefulness of implementing the model from scratch per se, if you understand what I mean.
@@AladdinPersson If you have time, consider making a video about how you started learning deep learning architectures and how you do it on a daily basis...
And a few tips/suggestions for beginners. Because you explain things so beautifully ❤️
please implement ResNeSt... pleaseeee
Hi, thanks a lot for this tutorial. This code is extremely helpful. If I use part of this code in my project and cite your GitHub link if my paper gets published, would that be, okay? Please let me know. Thanks!
did u use manim??
Yeah for the intro! :)
@@AladdinPersson awsome, well thanks for the tutorial. It's pretty helpful. :)
I really appreciate the kind feedback
u can do super().__init__()
In the condition:
if stride != 1 or self.in_channels != out_channels*4
shouldn't it instead be self.out_channels != in_channels*4?
EDIT: Oh you clarified that out_channels is out_channels * expansion
Omg, I don't know what's happening, but no matter what I try, the code returns the same error:
in forward(self, x)
33 print(x.shape,identity.shape)
34 print('is identity_downsamples none ?', self.identity_downsamples==None)
---> 35 x += identity
36 x = self.relu(x)
37
RuntimeError: The size of tensor a (256) must match the size of tensor b (64) at non-singleton dimension 1
Help please
I have rechecked my code multiple times and made sure it is exactly like yours, but to no avail; I can't make it work. :( I run on Colab btw.
I am also getting same error
I had the same error. The shape of the identity is not the same as x.
You probably made a typo in the init function of the class block.
Make sure all the parameters are the same. In my case, I accidentally put padding=1 instead of padding=0 in conv3, which caused the output size to be different.
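For anyone comparing: a sketch of the three convs in the bottleneck block (the parameter values are my reading of the video's code, not a verbatim copy; the point is that conv3 is 1x1, so its padding must be 0):

```python
import torch
import torch.nn as nn

intermediate_channels = 64
conv1 = nn.Conv2d(64, intermediate_channels,
                  kernel_size=1, stride=1, padding=0)
conv2 = nn.Conv2d(intermediate_channels, intermediate_channels,
                  kernel_size=3, stride=1, padding=1)
# 1x1 conv: padding must be 0. With padding=1 here the spatial size grows
# by 2 on each side, so x and identity mismatch at the skip connection.
conv3 = nn.Conv2d(intermediate_channels, intermediate_channels * 4,
                  kernel_size=1, stride=1, padding=0)

x = torch.randn(2, 64, 56, 56)
out = conv3(conv2(conv1(x)))   # spatial size preserved: 56x56
```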
@@activision4170 thanks bro