ResNets are tricky to conceptualise as there are many nuances to consider. Dr Bryce, you have done a great job here offering such a brilliant explanation that is both logical and easy to follow. You definitely have a gift of explaining complex ideas. Thank you!
I am writing a thesis on content-based image retrieval and I had to understand the ResNet architecture in-depth and by far this is the most transparent explanation ever!!
Very, very good explanation. Almost all explanations of this forget about the influence of random weights on the forward propagation and focus solely on the backward gradient multiplication, which is why I never understood why you needed to feed forward the input. Thanks a lot!
I have seen a lot of online lectures, but you are the best for two reasons: the way you speak is not monotonous, which gives time to comprehend and process what you are explaining, and the second is the effort put into video editing to speed up when writing things down on the board, which doesn't break the flow of the lecture. Liked your video. Thanks🙂!
This is the clearest video I've ever seen explaining ResNet for a layman, while at the same time conveying all the very important and relevant information related to ResNet - I couldn't understand the paper, but with this video I finally understood it - thanks a lot, Professor Bryce - hope you create more such videos on deep learning
Another example of a random YouTuber with very few subscribers explaining a complex topic so brilliantly...
Thank you so much sir
This tutorial is so clear that I can follow along as a non-native English speaker. Thanks a lot!
I am going to complete the entire playlist. Thanks, Bryce, you are a life saver
Brilliant explanation! Thank you so much, Professor Bryce!
Love your explanation, very easy to understand the concept and the flow of the ResNet in 17 mins! Really appreciate it
Awesome explanation. Got me through a learning hurdle that several others could not.
Every single second of this video conveys an invaluable amount of information to properly understand these topics. Thanks a lot!
Thank you Professor Bryce, ResNets were brilliantly explained by you. I am looking forward to new videos on more recent deep learning architectures!
So clear and well explained. Thank you!
Thank you Prof. Bryce for explaining this with minimal complicated technicality
your explanation is clear and concise! Thank you so much
Brilliant explanation, the 3D diagrams were excellent and I could understand some tricky concepts, thank you so much!
Amazing. Thanks a lot. Your explanation is so clear. Please keep making videos professor!🙏
you are brilliant!! Thank you for explaining this so well!!!!❤❤❤
You have my respect, Professor.
Awesome explanation!! Thank you for your effort :)
nice explanation, thank you very much Professor Bryce
Thank you for the clear, concise, yet comprehensive explanation!
Brilliant explanation. Thank you!
Great video on this, super informative.
Thank you so much Mr Bryce.
Your explanations are very clear and well structured. Please never stop teaching.
Excellent class! I watched many videos before I came to this video and none explained the concept of residual networks as clearly as you did.
Greetings from México!
Omg this is so helpful! Thank you so much !!!
Best explanation of ResNet I have come across so far.
Thank you so much for this video!
Awesome. Loved it, clear and concise!
this was fantastic - thank you
Best explanation of resnet on the internet
Thank you very much for putting the time and effort. This is one of the best explanations I've seen (including US uni. professors)
Brilliant explanation.
Wow, this explanation is amazing. So clear! I saw some videos about ResNets, but none of them describes what skip connections mean inside, what their internal structure and working logic are. But your explanation gives me much more. You explained the way of thinking, the internal structure, and the advantages. Wow!
great explanation, simple and straightforward.
Really Great explanation. Thanks Prof. ♥
Very nice video!
Thanks so much! Very informative, brief explanation
What an explanation
great explanation, thank you!
Brilliant explanation!!!
Until now, this is the best Residual Network tutorial I have found. As constructive feedback, I would like you to dive more deeply into how shape mismatches are handled, because that part is not on par with the rest of the highly intuitive explanations of the various things happening in a ResNet.
Great explanation, congrats.
thank you for the great explanation
It was clear and useful. Thanks a lot
Thanks for your video.
16 golden minutes.❤
Great Explanation !!!!
very nice! thank you!
You are a star!
Thank you!!!
Awesome explanation
Superb!
Amazing explanation. Thank you sir
Best Explanation
Thank you 👏👏
Got meaningful insights from this video
Loss landscape looking super smooth .....
Wow. Thank you
Prof. Bryce is the GOAT!
great!
Who is this teacher? Damn he is good. Thank you
This is such a clean and helpful video! Thank you very much! The only thing I still don't know: during backpropagation, we now have two sets of gradients for each block, one for going through the layers and one for going around the layers, so how do we know which one to use to update the weights and biases?
Good question. For any given weight (or bias), its partial derivative expresses how it affects the loss along *all* paths. That means we have to use both the around- and through-paths to calculate the gradient. Luckily, this is easy to compute because the way to combine those paths is just to add up their contributions!
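To make the path-summing concrete, here is a minimal numeric sketch (a hypothetical scalar toy, not from the video): a residual block computes y = x + f(x), where f(x) = w*x stands in for the block's layers. The gradient of y with respect to x is the sum of the around-path contribution (the identity, 1) and the through-path contribution (f'(x) = w), which a finite-difference check confirms.

```python
# Toy residual block: y = x + f(x), with f(x) = w * x standing in for the layers.
# dy/dx adds the around-path (identity -> 1) and the through-path (f'(x) = w).

def block(x, w):
    return x + w * x  # skip connection adds the input to the layer output

def grad_x_paths(x, w):
    around = 1.0      # contribution along the skip connection
    through = w       # contribution through the layer
    return around + through

def grad_x_numeric(x, w, eps=1e-6):
    # central finite difference as a sanity check
    return (block(x + eps, w) - block(x - eps, w)) / (2 * eps)

x, w = 2.0, 0.3
print(grad_x_paths(x, w))              # 1.3
print(round(grad_x_numeric(x, w), 6))  # 1.3
```

The same additivity is exactly what automatic differentiation does at the addition node of a skip connection: gradients arriving along both branches are summed before flowing further back.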
Couldn't understand how we can treat the shape mismatch 13:40
Great lecture nonetheless, thank you sir!! Understood what Residual Networks are 🙏
@csprof, By consistently including the original information alongside the features obtained from each residual block, are we inadvertently constraining our ResNet model to closely adhere to the input data, possibly leading to a form of over-memorization?
10:10
Concerns: shape mismatch
nervous sweating
Can you please talk about GANs and if possible stable diffusion
Thank you very much. I am not sure yet how a residual block leads to faster gradient passing when the gradient has to go through both paths. As I understand it, this adds more overhead to computing the gradient. Please correct me if I am wrong. Also, can you please explain more about how 1x1 convolution reduces the depth, or make a video if possible? For example, I am not sure how an entire depth of, say, 255 gives output to one neuron.
You're right that the residual connections mean more-complicated gradient calculations, which are therefore slower to compute for one pass. The sense in which it's faster is that it takes fewer training iterations for the network to learn something useful, because each update is more informative. Another way to think about it is that the function you're trying to learn with a residual architecture is simpler, so your random starting point is a lot more likely to be in a place where gradient descent can make rapid downhill progress.
For the second part of your question, whenever we have 2D convolutions applied to a 3D tensor (whether the third dimension is color channels in the initial image, or different outputs from a preceding convolutional layer) we generally have a connection from *every* input along that third dimension to each of the neurons. If you do 1x1 convolution, each neuron gets input from a 1x1 patch in the first two dimensions, so the *only* thing it's doing is computing some function over all the third-dimension inputs. And then by choosing how many output channels you want, you can change the size on that dimension. For example, say that you have a 20x20x3 image. If you use 1x1 convolution with 8 output channels, then each neuron will get input from a 1x1x3 sub-image, but you'll have 8 different functions computed on that same patch, resulting in a 20x20x8 output.
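The 20x20x3 → 20x20x8 example above can be sketched in plain Python (a toy illustration with made-up random values, no deep-learning library assumed): each output neuron at spatial position (i, j) reads only the 1x1x3 column of inputs at that position, and there is one weight vector plus a bias per output channel.

```python
import random

H, W, C_IN, C_OUT = 20, 20, 3, 8
random.seed(0)

# A 20x20x3 "image": one 3-channel pixel per spatial position.
image = [[[random.random() for _ in range(C_IN)]
          for _ in range(W)] for _ in range(H)]

# A 1x1 convolution needs only C_IN weights (and a bias) per output channel.
weights = [[random.random() for _ in range(C_IN)] for _ in range(C_OUT)]
biases = [random.random() for _ in range(C_OUT)]

# Each output value mixes the C_IN channel values at a single spatial
# position; nothing from neighboring pixels is touched.
out = [[[sum(image[i][j][c] * weights[k][c] for c in range(C_IN)) + biases[k]
         for k in range(C_OUT)]
        for j in range(W)] for i in range(H)]

print(len(out), len(out[0]), len(out[0][0]))  # 20 20 8
```

Setting C_OUT smaller than C_IN is exactly how a 1x1 convolution shrinks the channel dimension inside a bottleneck block.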
👍
Do you mean that ResNet is just a skip connection, not an individual network?
Brilliant explanation. Thank you!