If you want more theory on how CNNs work (which I highly recommend), I think the following lecture is great: ua-cam.com/video/LxfUGhug-iQ/v-deo.html. This video assumes you want to know how to implement these things in code using PyTorch. For those of you who are beginners in machine learning/deep learning and want some direction on how to go about learning these things, two great resources that I started with are the ML course and the DL specialization, both by Andrew Ng.
Below you'll find both affiliate and non-affiliate links if you want to check them out. The price for you is the same, but a small commission goes back to the channel if you buy through the affiliate link.
ML Course (affiliate): bit.ly/3qq20Sx
DL Specialization (affiliate): bit.ly/30npNrw
ML Course (no affiliate): bit.ly/3t8JqA9
DL Specialization (no affiliate): bit.ly/3t8JqA9
Just a note: the ML course is free (you only pay for the certificate), and the DL specialization lectures are also available for free on the deeplearning.ai YouTube channel.
Hi, can you make a video on writing custom layers in PyTorch that can be integrated into a CNN model? The layers could have trainable parameters or no parameters at all. This would help a lot in building models with a different intuition.
Your tutorials are really helpful. Thanks dude.
This is really great! Thank you. Can you also do one for a regression problem using 1D CNN please? Keep it up!
Hey Aladdin, loved this tutorial. Doubt: in line 36 [7:00], why didn't you go for flatten instead of reshape? I mean, flatten does the same thing, right?
And line 56: shouldn't we set *shuffle=False* for test_loader?
Thanks for the kind words, I think you could use either reshape or flatten in that scenario. I very rarely see people use flatten in PyTorch, but it's very common in TensorFlow. And you're right regarding the shuffle! :)
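To make the reshape/flatten point concrete, here is a small sketch. The tensor is random stand-in data shaped like the conv output discussed in this thread, (batch_size, 16, 7, 7), not real activations. Both calls should produce the same (batch_size, 16*7*7) tensor:

```python
import torch

# Stand-in for the conv output: (batch_size, channels, height, width)
x = torch.randn(64, 16, 7, 7)

# Option 1: reshape, keeping the batch dimension and merging the rest
a = x.reshape(x.shape[0], -1)

# Option 2: flatten everything from dim 1 onward
b = torch.flatten(x, start_dim=1)

print(tuple(a.shape), tuple(b.shape))
print(torch.equal(a, b))
```

On the shuffle question: shuffling the test set does not change the accuracy, it is just unnecessary, so shuffle=False is the usual choice for a test loader.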
Nice observation Shashank. Thanks.
Great vid bud. Helps a ton.
If I increase the number of epochs to 5 for the fully connected neural net, it also reaches nearly 97% accuracy, so the CNN used here is not that much better than the FF network. But it was a really great video. The best for learning DL coding.
thank you, you're a life saver!!
Another great video
I didn't see this in the video (I may have skimmed over it), but I also had to remove the reshape() from my check_accuracy function before I could run it.
9:39
Can we create the last fully connected layer inside forward, once we know the shape of x dynamically? Why do we statically define it as 16*7*7?
A network is usually designed for a specific dataset, so in this case the output after the two max pooling layers is 7x7, and we have 16 channels of that.
Linear layers are usually statically defined, but a common trick is to put an adaptive average pool before the linear layer so that its input always has the same size, whatever the image size:
pytorch.org/docs/stable/generated/torch.nn.AdaptiveAvgPool2d.html
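A minimal sketch of that adaptive pooling idea, assuming 16 feature channels and 10 classes as in this thread. The input sizes 28, 32 and 50 are made up for illustration. Whatever the spatial size going in, the pooled map is 7x7, so a Linear layer with 16*7*7 inputs keeps working:

```python
import torch
import torch.nn as nn

# Pools any (H, W) feature map down to a fixed 7x7 grid
pool = nn.AdaptiveAvgPool2d((7, 7))
fc = nn.Linear(16 * 7 * 7, 10)

for size in (28, 32, 50):
    # Hypothetical feature maps with 16 channels and varying spatial size
    features = torch.randn(4, 16, size, size)
    pooled = pool(features)  # always (4, 16, 7, 7)
    scores = fc(pooled.reshape(pooled.shape[0], -1))
    print(size, tuple(pooled.shape), tuple(scores.shape))
```

The trade-off is that the pooling averages away some spatial detail when the incoming map is much larger than 7x7.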
Hey there! First, I just want to thank you for this consistent, helpful and high-quality content. Second, don't know if it's a noob question or not, but while implementing this with a custom dataset (thanks to your other tutorial), I get an error during data loading that says that some image file pointed to by my csv file does not exist. However, when I check the root dir, the file is definitely there... Plus, at different run times, the file that can't be found is never the same! I didn't get this error when feeding the same images and csv file to googlenet. Can you think of any reason why I'm getting this?
Thanks again!
You are reusing the pool layer, right? So you are visiting the same layer twice in each forward pass and updating the same weights. Don't you think they should be kept independent, so that two instances of pool are needed?
Also, shouldn't there be a softmax layer at the end, given that it's a classification task? Otherwise the output vector won't sum to 1.
MaxPool2d can be reused because it doesn't have any trainable parameters.
You can use several different pool instances
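A quick way to check this yourself is to count the parameters. The conv layer below is only there for contrast, and its sizes (1 input channel, 8 output channels, 3x3 kernel) are assumptions rather than the exact layers from the video:

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, stride=2)
conv = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3, padding=1)

# MaxPool2d has nothing to learn, so there is nothing to share or update
num_pool_params = sum(p.numel() for p in pool.parameters())
num_conv_params = sum(p.numel() for p in conv.parameters())
print(num_pool_params)  # 0
print(num_conv_params)  # 8*1*3*3 weights + 8 biases = 80

# Reusing the same pool module at two points in forward is therefore safe
x = torch.randn(2, 1, 28, 28)
x = pool(conv(x))  # spatial size 28 -> 14
x = pool(x)        # spatial size 14 -> 7
print(tuple(x.shape))
```

Reusing a layer that does have weights (a Conv2d or Linear) would be a different story, since both uses would then share and update the same weights.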
thank you for the best videos and I also read the comments below you are quite helpful. Bless you
Glad it was useful :)
Thanks for the video! Can I ask about the stride? You use (1,1) but won't that just move the kernel along the diagonal and ignore most of the image, leading to a 28x28 output that's mostly 0?
A question: in the forward function of class CNN, why is it "x = x.reshape(x.shape[0], -1)" instead of "x = x.reshape(-1)"? In my mind x.shape[0] is the batch size, but before fc1 it should be one image, not 64 images. Thanks in advance.
Is there a reason you don't apply a softmax at the end?
Yes, and good that you noticed this; I definitely should've expressed it better in the video. CrossEntropyLoss combines two components: a (log-)softmax followed by negative log likelihood (NLLLoss). So when we send the output to CrossEntropyLoss we want the raw logits rather than softmaxed outputs, otherwise we would do softmax on softmax :)
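The decomposition can be verified numerically. This is a sketch with made-up logits and labels (5 samples, 10 classes), not the model from the video:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(5, 10)           # made-up raw scores
targets = torch.randint(0, 10, (5,))  # made-up class labels

# Built-in loss applied directly to the logits
ce = nn.CrossEntropyLoss()(logits, targets)

# The same thing done by hand: log-softmax, then negative log likelihood
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)
print(torch.allclose(ce, manual))

# Softmaxing first and then calling the loss applies softmax twice
double = nn.CrossEntropyLoss()(F.softmax(logits, dim=1), targets)
print(ce.item(), double.item())
```

The last line prints two different loss values, which is the "softmax on softmax" problem: training still runs, but on a squashed signal.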
Does reshaping remove the image's spatial information? Thank you.
I didn't understand how the calculation was done to pick the dimensions of the linear layer, specifically the x7x7. It doesn't line up with the formula you gave, or at least I don't see how. Please elaborate on this or provide the additional formula specific to linear layers.
I understood how the n_out formula helps us understand how many "channels" there are after pooling is done, and how the convolution is a same convolution, and I followed the calculation.
The MaxPool2d layer halves the spatial dimensions, and it is applied twice. That's why: 28 -> 14 -> 7.
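Stepping through the usual output-size formula, n_out = floor((n_in + 2*padding - kernel_size) / stride) + 1, gives the same numbers. The sketch below assumes 3x3 convs with padding 1 and stride 1 (a same convolution), 2x2 max pooling with stride 2, and 16 channels out of the last conv. The helper name conv_out_size is made up for this example:

```python
def conv_out_size(n_in, kernel_size, stride=1, padding=0):
    """Output size along one spatial dimension for a conv or pooling layer."""
    return (n_in + 2 * padding - kernel_size) // stride + 1


size = 28                                             # MNIST input
size = conv_out_size(size, kernel_size=3, padding=1)  # conv1, same conv: 28
size = conv_out_size(size, kernel_size=2, stride=2)   # pool: 14
size = conv_out_size(size, kernel_size=3, padding=1)  # conv2, same conv: 14
size = conv_out_size(size, kernel_size=2, stride=2)   # pool: 7
print(size, 16 * size * size)  # 7 784
```

So the formula only covers the conv and pooling layers. The linear layer has no formula of its own: its input size is simply channels * height * width of whatever comes out of the last pooling step.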
Hey there, can you please tell me how you found the flattening layer dimensions? I had worked with ears, where there was a flatten function. I think PyTorch does not have a function for flattening conv layers. Please tell me how you calculated the dimension 16*7*7.
Sorry for the delayed response, what do you mean by 'ears'? :) You don't flatten the conv layers themselves, only the output they produce: if the output from a conv layer is, for example, (batch_size, 16, 7, 7), we can flatten it by reshaping the tensor to (batch_size, 16*7*7), which I believe is what I did in the video. How you calculate the dimension ultimately depends on the conv layers you use (there's a formula for calculating the output size of a conv layer) and the number of channels that you set as output from the last conv layer.
Calculating the output shape of a stack of conv layers can be tedious, but if you step through one conv layer at a time using the formula (google: formula conv layer) it's relatively straightforward. Using "same convolutions" and a maxpool that divides the input size by 2 (which I believe is what I used in the video) also simplifies these calculations.
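If working through the formula by hand feels error-prone, another option is to push a dummy input through the conv part of the network and read off the shape. The layer sizes below are assumptions chosen to match the 16*7*7 figure in this thread, not necessarily the exact code from the video:

```python
import torch
import torch.nn as nn

# Hypothetical conv stack; layer sizes are assumptions for illustration
features = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Conv2d(8, 16, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
)

# Push one fake MNIST-sized image through and read off the shape
with torch.no_grad():
    out = features(torch.zeros(1, 1, 28, 28))

flat_dim = out[0].numel()
print(tuple(out.shape), flat_dim)  # (1, 16, 7, 7) 784

# The first linear layer can then be sized from the measured value
fc1 = nn.Linear(flat_dim, 10)
```

This gives the same answer as the formula, and it keeps working if you later change a kernel size or add a layer.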
Thank you so much ....your politeness and urge to teach people is very good....thank you for everything on the channel...thank you so much.....will you provide your mail id ....and also what are your further plans?
Hello Aladdin, how do you know how many in_channels and out_channels are in your convolutional neural network?
That's a great question. The in channels are more or less decided by the dataset: for example, if you have colour images they have 3 channels (RGB), and for grayscale images it's 1 in channel. The out channels are a hyperparameter that you decide, and it really differs between architectures. But as a general rule, as you go deeper in the network and add more conv layers, the number of output filters tends to increase.
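In code, that looks roughly like the sketch below. The batch size, image sizes and the choice of 8 and 16 output channels are arbitrary values picked for illustration:

```python
import torch
import torch.nn as nn

gray = torch.randn(4, 1, 28, 28)  # e.g. MNIST-like images: 1 input channel
rgb = torch.randn(4, 3, 32, 32)   # e.g. colour images: 3 input channels

# in_channels has to match the data
conv_gray = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3, padding=1)
conv_rgb = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, padding=1)

# out_channels is free to choose; a common pattern is to grow it with depth.
# The next layer's in_channels must equal the previous layer's out_channels.
conv_deeper = nn.Conv2d(in_channels=8, out_channels=16, kernel_size=3, padding=1)

out_gray = conv_gray(gray)
out_rgb = conv_deeper(conv_rgb(rgb))
print(tuple(out_gray.shape))
print(tuple(out_rgb.shape))
```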
@@AladdinPersson Cool thanks for the tip... Will you do keras CNN in the future? I really enjoyed this video, but would enjoy seeing Keras being used...
@@joshlazor6208 Thanks for the suggestion, I'll think about it. Right now I feel there's a lot of things to explore in Pytorch still.
@@AladdinPersson One more thing: How do you know the number of classes and the number of input features? Is there a certain number?
And why are we reshaping x to be (x.shape[0], -1)?
Hi, could you please use some images to show the output? That would be very helpful. Thanks
I tried your code and I got an error:
"optimizer got an empty parameter list". Any help?
Thanks for the content. I'd like to know how we can implement a 1D CNN. Can you share some tips?
Don't have too much experience with 1D convs, don't think I can offer you much help there unfortunately
@Aladdin Persson first of all i wanted to say thank you for your great videos. Second, can you prepare a video for time series forecasting using Transformers?
I love your voice
Thanks Aladdin
Thank you for the video. Should I return x or F.softmax(x) as the CNN output? In this video you did not seem to care about the highest probability for the output x.
When you use the CrossEntropy cost function, softmax is already included, so you don't want to return F.softmax(x). Softmax is a monotone function (the argmax is the same before and after softmax), so if you want the class with the highest probability I would just do scores = model(input) and then scores.argmax(dim).
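A small check of the argmax claim, using random numbers as a stand-in for model(input) (4 samples, 10 classes, both made up):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
scores = torch.randn(4, 10)  # stand-in for model(input): raw logits

# Softmax is only needed if you want actual probabilities
probs = F.softmax(scores, dim=1)

pred_from_scores = scores.argmax(dim=1)
pred_from_probs = probs.argmax(dim=1)

print(pred_from_scores)
print(torch.equal(pred_from_scores, pred_from_probs))
```

So the model can keep returning raw scores for training, and softmax only needs to be applied afterwards if you want to report a probability next to the predicted class.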
@@AladdinPersson
Thanks for the brief explanation.
Hej, thanks for the video! Can you please explain the difference between Conv1d, Conv2d and Conv3d? Thanks! (Better with examples on different types of data.)
Thanks for your comment, sure, I can do my best. I actually don't have much experience working with Conv1d, but from my understanding it's used a lot for time series data. You can read more about Conv1d here: towardsdatascience.com/understanding-1d-and-3d-convolution-neural-network-keras-9d8f76e29610. Conv2d is obviously what we used, and it's mostly used when working with images. The shapes are (mini_batch_size, channels, height, width). If you add another dimension to the data, for example looking at several frames (video) instead of a single image at a time, you would use Conv3d, which in PyTorch expects (mini_batch_size, channels, number_frames, height, width).
How I see it (this is a simplification):
Conv1d: Time series data
Conv2d: Images
Conv3d: Videos
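The difference is easiest to see in the input shapes each layer expects. In this sketch all sizes are arbitrary; note that PyTorch's Conv3d wants the channel dimension before the frame (depth) dimension:

```python
import torch
import torch.nn as nn

# Conv1d: (batch, channels, length), e.g. a two-channel time series
series = torch.randn(8, 2, 100)
out1d = nn.Conv1d(2, 4, kernel_size=3, padding=1)(series)

# Conv2d: (batch, channels, height, width), e.g. RGB images
images = torch.randn(8, 3, 32, 32)
out2d = nn.Conv2d(3, 4, kernel_size=3, padding=1)(images)

# Conv3d: (batch, channels, frames, height, width), e.g. short video clips
clips = torch.randn(8, 3, 16, 32, 32)
out3d = nn.Conv3d(3, 4, kernel_size=3, padding=1)(clips)

print(tuple(out1d.shape))
print(tuple(out2d.shape))
print(tuple(out3d.shape))
```

In each case the kernel slides over every dimension after the channel dimension: one for Conv1d, two for Conv2d, three for Conv3d.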
@@AladdinPersson Thanks for article and brief explanation! Waiting for videos with Embedding layers!
Are you sure about same convolution? I think whether the input and output sizes are the same depends on the kernel size, and there is also no padding option called same.
Please correct me if I'm wrong.
In PyTorch there's no option of setting it to be a same convolution, but if the input height and width stay the same after we've sent it through a conv layer, we call that a same convolution. As you say, the input and output sizes definitely depend on the kernel size, but regardless we can set the padding such that the input shape is kept, and then it's called a same convolution. Sorry for the very late response!
@@AladdinPersson No issue, at least you did. Could you further explain how to set the padding for a same convolution? Right now I'm doing it by changing the kernel sizes, but of course that affects the model architecture.
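One way to pick the padding: with stride 1, the output size is n_in + 2*padding - kernel_size + 1, so setting that equal to n_in gives padding = (kernel_size - 1) / 2, which is a whole number for odd kernel sizes. The sketch below assumes stride 1 and odd kernels; the helper name same_padding is made up for this example:

```python
import torch
import torch.nn as nn


def same_padding(kernel_size):
    """Padding that keeps H and W unchanged for stride 1 and an odd kernel."""
    return (kernel_size - 1) // 2


x = torch.randn(1, 1, 28, 28)
shapes = {}
for k in (1, 3, 5, 7):
    conv = nn.Conv2d(1, 8, kernel_size=k, stride=1, padding=same_padding(k))
    shapes[k] = tuple(conv(x).shape)
    print(k, same_padding(k), shapes[k])
```

This way the kernel size can be changed freely and only the padding follows it, so the rest of the architecture stays untouched. As far as I know, newer PyTorch releases (1.9 and later) also accept padding="same" as a string for stride-1 convolutions, but the explicit number works everywhere.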
getting error TypeError: 'module' object is not callable