Exactly what I need: a channel that focuses on the mathematics as well as the computer science side of things.
Please don't ever be afraid to get into the maths, keep it like this.
There are a lot of superficial tutorial channels out there; there's a void to be filled on the deeper mathematical side.
I hear you! I'll do my best.
Your videos are a godsend. I had so much trouble trying to build a CNN from scratch for my uni course, and nothing on the internet even comes close to how well you explain every single step. Thank you so much!
Thank you for the kind words. I'm really glad my videos helped :)
This helped me a lot!
Lots of projects out there just use the PyTorch API for the most common functionality, which makes things harder for people who want to modify the code to implement a research idea or just experiment in general.
Thank you so much for your effort, and for taking the time to make these videos!
Your channel is one of the best on all of UA-cam.
It's a pleasure to watch you and revisit the basics :)
If I ever decide to make a channel about DL, I will follow your example.
Literally, I've been looking for this exact video for months. It's exactly what I was looking for: a high-quality video that really goes in depth on all the maths and coding around neural networks. I feel like a lot of the videos out there either explain all this at a high level of abstraction or only go in depth on the maths or the coding part, but this one does it all well.
Honestly I can't believe this channel is below 100k subs.
Keep it up man!!
Happy New Year!!!
Thanks for creating this channel!
I already know all this, but your presentation is amazing.♥♥♥
This is one gem of a channel!
I miss you!
?
How do you have less than a thousand subscribers? This is better than most of the popular stuff out there. Please keep making videos, I love them.
Great work! Your videos are among the best at explaining the most complex topics!
Thank you so much for all of these videos - the explanations are genuinely incredible and make everything so simple to understand. Are you ever going to return and continue making these videos?
Alright, this deserves a sub ^^
What happens if we use batch gradient descent instead of stochastic GD? In the video we have M as an n x n matrix, and the output of softmax is n x 1. Now, in the batch GD case, the output of softmax is n x m, but what about the M matrix?
Can you also make a video on cross-entropy loss?
Please do the LSTM I beg you
Hello there mister, love your videos! If possible, please make a video on recurrent layers (from scratch).
Liked before watching, because I trust the content ❤️
Excellent video, thx for the content
I am your 754th subscriber.
Excellent video; liked the previous one as well. I was wondering if it was necessary to have the analytic formula for the gradient. Couldn't we just use a small variation of the input and use the output variation to get the local gradient, using the forward function when backpropagating?
It is an interesting question :) I've never tried it, but I think you will have to run the network as many times as you have parameters. If you have a function f(x,y) and you want the derivative with respect to x, then you have to evaluate (f(x+dx,y) - f(x,y))/dx, but then you will have to compute (f(x,y+dy) - f(x,y))/dy to get the derivative with respect to y. Unless I missed something, this seems unfeasible for a neural network given that it has thousands of parameters. I think this might also lead to bigger and bigger imprecision as you go back in the network.
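For anyone who wants to try it, here is a minimal numpy sketch of that idea (the function name and setup are illustrative, not code from the video). It estimates the gradient with respect to a parameter array by nudging one entry at a time, which indeed costs a couple of forward passes per parameter:

import numpy as np

def numerical_gradient(loss_fn, params, eps=1e-6):
    # loss_fn() runs a forward pass with the current params and returns a scalar loss
    grad = np.zeros_like(params)
    for idx in np.ndindex(params.shape):
        original = params[idx]
        params[idx] = original + eps
        loss_plus = loss_fn()
        params[idx] = original - eps
        loss_minus = loss_fn()
        params[idx] = original  # restore the parameter
        grad[idx] = (loss_plus - loss_minus) / (2 * eps)
    return grad

Far too slow for training, as noted above, but it is still handy as a gradient check against the analytic backward pass on a tiny network.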
Why did you stop doing videos :/ ?
I was wondering if the backward pass code will work for a batch of inputs.
Thank you sir for this great explanation, but I have one question! What about the loss? Do we need to define a new loss function? Must we use cross-entropy loss instead of binary cross-entropy?
Hi Anis. The loss that you will be using depends mainly on the use case of the neural network. We defined Mean Square Error in the first video, and Binary Cross Entropy in the second one.
@@independentcode I mean, we can use any loss function that we want with softmax? And for binary cross-entropy, I thought it's only for binary classification?
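For reference (standard definitions, not code from the videos): categorical cross-entropy for a one-hot label is E = -sum_i y_true_i * log(y_pred_i), and its gradient with respect to the prediction is -y_true_i / y_pred_i. A minimal numpy sketch, assuming (n, 1) vectors:

import numpy as np

def categorical_cross_entropy(y_true, y_pred):
    # assumes y_pred comes from a softmax, so every entry is in (0, 1)
    return -np.sum(y_true * np.log(y_pred))

def categorical_cross_entropy_prime(y_true, y_pred):
    return -y_true / y_pred

Binary cross-entropy is indeed aimed at binary targets; with a softmax over several classes, categorical cross-entropy is the usual pairing, although MSE still works.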
At 3:08, "if k=i then ..." - what is k, and what is i?
We were computing the derivative of E (the error) with respect to x_k (the k-th element of the input vector). That's the k. The formula is a sum over i.
Try expanding the sum and taking the derivative with respect to one of the input variables. You'll understand.
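To spell out one step (standard softmax calculus, assuming the usual y_i = e^(x_i) / sum_j e^(x_j)): the derivative of one output with respect to one input is dy_i/dx_k = y_i * (1 - y_i) if i = k, and dy_i/dx_k = -y_i * y_k if i != k. The chain rule then gives dE/dx_k = sum_i (dE/dy_i) * (dy_i/dx_k), which is the sum over i that appears at 3:08.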
Why didn't you inherit from the Activation class?
That's because the Activation class takes in a function that will be applied to each input element individually: y_i=f(x_i). In the case of Softmax, each output depends on all the inputs, so the backpropagation works out differently.
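For reference, a minimal sketch of what such a standalone layer can look like (assuming column-vector inputs of shape (n, 1) and a backward(output_gradient, learning_rate) interface like the other layers; this is an illustration, not a verbatim copy of the video's code):

import numpy as np

class Softmax:
    def forward(self, input):
        tmp = np.exp(input)
        self.output = tmp / np.sum(tmp)
        return self.output

    def backward(self, output_gradient, learning_rate):
        # Every output depends on every input, so we apply the full
        # Jacobian M * (I - M.T), where M repeats the output vector y
        # as n columns, instead of an elementwise derivative.
        n = np.size(self.output)
        M = np.tile(self.output, n)
        return np.dot(M * (np.identity(n) - M.T), output_gradient)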
Excellent explanation again. However, the implementation is inefficient. Simplifying the formulas is good for the explanation, but it leads to inefficiency. Normally you do not need to form the identity matrix, M, or the transpose of M explicitly. Form a vector y = (y1 y2 ...); then the Kronecker product of y with itself gives you all the elements contained in MM.T. And using numpy broadcasting you can do the subtraction. Not a full implementation, but it will look like reshape(kron(y, y)....) - y
Hey! I didn't know about the Kronecker product, thank you for mentioning it! I looked at it, and indeed we can compute the same formula as:
np.identity(n) * y - np.reshape(np.kron(y, y), (n, -1))
However, I feel like we still have to create the identity matrix, since we're subtracting elements on the diagonal.
@@independentcode Look at your last formula, M\hadamard(I - M.T) = M - M\hadamard M.T: you can obtain H = M\hadamard M.T by reshaping the Kronecker product, and M - H is computed by broadcasting.
@@independentcode Another way (less elegant): before reshaping, do the subtraction (without using the identity), then reshape.
@@independentcode Moreover, guess what? You do not even need to form M, so you do not need a matrix-vector multiplication. Hint: you only need elementwise vector multiplication. Try it. If you cannot come up with a solution, I will explain it to you.
@@tangomuzi I came up with this, using the identity: (np.identity(n) - y.T) * y
Is this what you were thinking of?
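For anyone following along, a quick numerical check (just a sanity test, not code from the video) that the simplified expression matches the original M-based Jacobian:

import numpy as np

n = 5
y = np.random.rand(n, 1)
y /= np.sum(y)                    # stand-in for a softmax output
g = np.random.rand(n, 1)          # incoming gradient dE/dy

M = np.tile(y, n)                 # n x n matrix whose columns are all y
with_M = np.dot(M * (np.identity(n) - M.T), g)
without_M = np.dot((np.identity(n) - y.T) * y, g)

print(np.allclose(with_M, without_M))   # True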
When transformer tutorials?
Please make a video on RNN implementation from scratch
-1 and +10
Bro stopped making videos at the worst time possible ‼️😭
First to comment
Why did you stop uploading videos? Isn't it your responsibility to complete what you've started? 🥲
Your content is what many of us are looking for, so keep it up bro 🤜🤛