At 8:40 I think it is supposed to say something like:
Step 2: weights x inputs + bias
Step 3: calculate loss (mean squared error)
Step 4: find the partial derivatives for all weights
Step 5: calculate the optimization direction on the computational graph
Step 6: take a step towards the minimum / optimized weights
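In code, those six steps boil down to roughly this (a minimal NumPy sketch of my own for a single linear neuron, not the actual code from the video):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                 # 100 samples, 3 features
true_w, true_b = np.array([1.5, -2.0, 0.5]), 0.3
y = X @ true_w + true_b                       # targets from a known linear rule

w, b = rng.normal(size=3), 0.0                # step 1: random initialization
lr = 0.1
for epoch in range(200):
    y_hat = X @ w + b                         # step 2: weights x inputs + bias
    loss = np.mean((y_hat - y) ** 2)          # step 3: mean squared error
    grad_w = 2 * X.T @ (y_hat - y) / len(y)   # step 4: partial derivatives
    grad_b = 2 * np.mean(y_hat - y)
    w -= lr * grad_w                          # steps 5-6: move against the gradient,
    b -= lr * grad_b                          #            towards the minimum
print(loss)                                   # should be close to 0
```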
really good, this is a big miss in the video
correct, so embarrassing. won't happen again
Random is very important :)
it was an animation error
Finally. It's 4 AM in India and I am out of bed waiting for this video.
It's 2 AM in the UK as I'm writing this comment. I wanted to go to bed so badly after work, but when Siraj drops a video, I have to watch it in the hope that the knowledge will optimize itself while I sleep.
It's 07:46 in Europe and I just woke up, so I'm posting this comment!
4:20 AM in Oklahoma, USA. 420 blaze it
bahaha
If you can’t explain it simply, you don’t understand it well enough. You, sir understand it way better than others 🔥🔥🔥
This was SO helpful: a concise video packed with so many concepts I needed to understand before I could understand backpropagation.
It's incredible to see how much effort you put into these videos. You deserve more subs, not just because you're hella funny, but also because I can feel that your goal is not to make views and money like the other youtubers. You're teaching me machine learning, but I'm learning something more from you. Respect from Italy 🇮🇹
I know, right? Siraj's passion for these incredible subjects is contagious! I love this guy. (Venezuela)
Dario Cardajoli he gets paid and he's professionally produced
8:35 I feel like this is a mistake in the video. Everything is Step 1 Random Initialization.
right, it should say:
Step 2: weights x inputs + bias
Step 3: calculate loss (mean squared error)
Step 4: find the partial derivatives for all weights
Step 5: calculate the optimization direction on the computational graph
Step 6: take a step towards the minimum / optimized weights
Maybe a quick mention of the chain rule would have also been nice, because it plays a big part in backpropagation.
Cool video though, it is sort of a quick recap of the course I did last month.
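Since the chain rule keeps coming up: here is a tiny worked example (my own, not from the video) of how backprop just strings derivatives together, for a 1-input, 1-hidden-unit, 1-output sigmoid network. The analytic gradient matches a finite-difference check:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x, target = 0.7, 1.0
w1, w2 = 0.4, -0.6                       # one weight per layer, biases omitted

h = sigmoid(w1 * x)                      # hidden activation
y = sigmoid(w2 * h)                      # output
loss = 0.5 * (y - target) ** 2

# Chain rule: dL/dw1 = dL/dy * dy/dz2 * dz2/dh * dh/dz1 * dz1/dw1
dL_dy   = y - target
dy_dz2  = y * (1 - y)                    # z2 = w2 * h
dz2_dh  = w2
dh_dz1  = h * (1 - h)                    # z1 = w1 * x
dz1_dw1 = x
dL_dw1 = dL_dy * dy_dz2 * dz2_dh * dh_dz1 * dz1_dw1

# Numerical sanity check with a finite difference
eps = 1e-6
loss_eps = 0.5 * (sigmoid(w2 * sigmoid((w1 + eps) * x)) - target) ** 2
print(dL_dw1, (loss_eps - loss) / eps)   # the two numbers should match closely
```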
Great video as always man! Been a fan since your early days, your videos got me into ML and now I have 2 papers on Quantum Machine Learning on the arXiv. Just wanted to let you know of your positive impact. Keep up the great work educating the world!
Rap album of ML in cheatsheet form would be a worthy theme track to overplay. As learning styles vary, sometimes going meta (which at the heart is to link use cases with concepts) can help the user differentiate the scope of tree and forest. Much love for the content and the wizard community!
I wrote that before seeing the intro, as my video was lagging. Trippy.
Would love to see a video about using docker. Keep it up siraj!
it really shaped my thoughts!!
Thanks for this
Hi! Great video, but maybe you could have spent more time on the backpropagation schema and explained the steps one by one. It seemed to me like the most interesting part, but I still haven't understood it, even after the video...
7:54 I don't think that backpropagation is a rename of gradient descent.
It's more like backprop followed by gradient descent.
Backprop finds the gradient, and then gradient descent applies "weight = weight - gradient*learning_rate".
So gradient descent only does the weight update. It doesn't need to find the gradient. The gradient is provided by backpropagation.
Gradient descent also includes the differentiation. See Andrew Ng's video on the topic, ua-cam.com/video/yFPLyDwVifc/v-deo.html The gradient is computed using calculus, specifically by taking partial derivatives with respect to the weight variables of whatever function is being used to model the data. This process is part of gradient descent across all models, and it's called back-propagation in the context of neural networks.
This is where it gets tricky. The definition of it is not so clear. I kind of see it like this:
In order for gradient descent to work, it needs to call backpropagation to get the gradient.
So you could say that the whole thing is gradient descent, that's right.
But as far as I know, backpropagation does not include the weight update part. It only computes gradients.
So gradient descent (in neural network context) is more like backprop plus weight update.
Gradient descent needs learning rate as a hyperparameter but backprop doesn't.
So I think they are not the same because of the weight update part.
Please verify your knowledge before posting half-truths. MSE or cross-entropy is the loss/cost function, not gradient descent; you use gradient descent on the cost function to update the weights and biases. The chain rule is the chain rule and is not called backpropagation; it is used to calculate the gradient and is an essential part of backpropagation.
You said that backpropagation is the chain-rule derivative, but that is not true. Backpropagation is an algorithm that depends on the chain rule, but it is not equivalent to the chain rule. You also said gradient descent/ascent is the cross-entropy or MSE; that is not true and is misleading. Cross-entropy and MSE are error functions.
Chanchana Sornsoontorn exactly, we can still use advanced optimizers like L-BFGS and use backprop for getting the gradients alone; backprop and gradient descent are two different things.
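To make the distinction in this thread concrete, here is a minimal sketch (my own hypothetical example, not the video's code) where backprop only produces gradients and the update rule is a separate, swappable step: plain gradient descent here, but it could be momentum, Adam, L-BFGS, etc.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop(w, x, target):
    """Forward pass + chain rule; returns the loss and dL/dw only."""
    y = sigmoid(w * x)
    loss = 0.5 * (y - target) ** 2
    grad = (y - target) * y * (1 - y) * x
    return loss, grad

def gradient_descent_step(w, grad, lr=0.5):
    """The part that is *not* backprop: the weight-update rule."""
    return w - lr * grad

w, x, target = 0.1, 1.0, 0.9
for _ in range(500):
    loss, grad = backprop(w, x, target)
    w = gradient_descent_step(w, grad)
print(loss)   # should have shrunk towards 0
```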
Such a great explanation! Could you give me a simple definition of backpropagation?
Finally your videos make sense, great work. Your progress is my ease of learning. Looking forward to the next one. Aren't those quantum computers of yours supposed to do optimization particularly well?
I love your videos and have learnt a lot from them. However, I think you shouldn't use Python 2 examples anymore, since the industry is (finally) migrating from it to modern Python. Great video!
Waking up to this
That Rap in the beginning though. Kind of summary of the whole video xD
That opening scene was beautiful❤️
This is an excelent video, well explained! Thanks!
Great video. I would have liked it more if there was actually a backprop example with a very simple neural net, but I guess there isn't enough space for that. Try to record the UCLA lecture if possible.
Like the new setup of the video, awesome!
Yay good video on Backpropagation!!!!
Excellent video on GD, but it is too general and covers GD itself rather than elaborating on applying the chain rule to derive the recursive formula for the gradient of a cost function with respect to the weights of a feedforward NN.
Siraj a wavyy dude!
Great video as always :)
It is a very interesting video, congratulations. Could you make an example of how to perform backpropagation with TensorFlow, or an example of the mathematical calculation, for instance to predict the cost of a house from parameters like area, location, etc.? Keep it up by sharing your knowledge. THANK YOU
I also think of gradient descent as finding, in the circuit (the mess of weights), which weight knobs give the most "bang for the buck" when tuned to change the outcome.
Hi... When do you come on live video? please mention the time.
Thank you for your excellent video and perfect english
Consistently impressed by your ability to collapse a half semesters worth of info into 10 minutes, as well as the great video production. Keep it up!!!
thanks Mason!
These videos are really great.
Great video. Love from India
Hi Siraj, could you please tell me how to do data augmentation for numerical data (I mean, not for images)?
I created an ANN with the backpropagation algorithm. Now I wish to do data augmentation.
ahhhhh dude! i knew you lived in LA! I had a feeling. You have the vibe!
Computing the error for the last layer of the neural network makes perfect sense, but calculating the error on earlier layers... nope. It would be really nice to see the values of (for example) an XOR gate being trained from random weights on the minimal network that can implement an XOR function, using backpropagation.
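For what it's worth, here is a rough sketch of that exact experiment (my own toy code, not from the video): a 2-2-1 sigmoid network trained on XOR from random weights, with the hidden-layer error obtained by propagating the output error backwards through the second weight matrix:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(42)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(size=(2, 2))                   # step 1: random initialization
b1 = np.zeros((1, 2))
W2 = rng.normal(size=(2, 1))
b2 = np.zeros((1, 1))
lr = 2.0

for epoch in range(20000):
    # forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # backward pass: error at the output, then propagated back to the hidden layer
    d_out = (out - y) * out * (1 - out)        # dL/d(output pre-activation)
    d_h = (d_out @ W2.T) * h * (1 - h)         # dL/d(hidden pre-activation)

    # gradient descent update
    W2 -= lr * h.T @ d_out / len(X)
    b2 -= lr * d_out.mean(axis=0, keepdims=True)
    W1 -= lr * X.T @ d_h / len(X)
    b1 -= lr * d_h.mean(axis=0, keepdims=True)

# Outputs should approach [[0], [1], [1], [0]]; note that with only 2 hidden
# units some random seeds can get stuck in a local minimum, so rerun with a
# different seed if the four outputs do not separate.
print(np.round(out, 3))
```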
You should make a video on how to convince VC and other investors to invest in our AI project.
Hey Siraj!!.. you have a spelling mistake in "Usain" ("Osain") in your YouTube about section 😊
Btw thanks for your videos... #lovefromIndia
Thank you for this awesome video
Did you get a chance to record the UCLA lecture?
Could you make a playlist of ML models built from scratch without using pre-trained libs.
That would be of great help.
4:00 AM in India and Siraj is uploading video..
I will pick a better time next time.
its okke :) we love that .. coders == owl :P
Your videos are awesome man, keep uploading more. You're great, man.
A video with a detailed explanation of the math behind popular ML algorithms...
Look up "The Elements of Statistical Learning". PDF book is available for free online.
Can we do generative design with it?
Ahh! So momentum is used to get over a smaller hill and then go to the global minimum instead of a local minimum, am I correct? (Preventing overtraining??)
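Roughly, yes on the hills: momentum keeps a running velocity of past gradients so a step can carry through small dips and plateaus; it is an optimization trick rather than a guard against overtraining. A minimal sketch of the classical momentum rule (my own 1-D toy, with a made-up bumpy loss):

```python
import numpy as np

def grad(w):
    # derivative of a bumpy toy loss: (w - 3)^2 + 0.5*sin(8w)
    return 2.0 * (w - 3.0) + 4.0 * np.cos(8.0 * w)

w, velocity = -2.0, 0.0
lr, beta = 0.01, 0.9              # beta is the momentum coefficient
for _ in range(2000):
    velocity = beta * velocity - lr * grad(w)
    w += velocity
print(w)                          # should land in one of the dips close to w = 3
```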
One video of Gradient Descent Optimization Algorithms??
Can you explain backpropagation through time please?
Fresh Explanation!
Bob, I'm your fan 💕... I'm a front-end developer, but I love AI, and can you help me please? What is the EXACT difference between machine learning and deep learning? (The cutting-edge tip that separates ML and DL, how do I make it out?) And can we learn deep learning without learning ML? What are the functions for ML and which other functions for DL?
Bharath kumar As per my knowledge, you only need the basics of machine learning and a bit of maths to start on deep learning. It is a subset, and the most popular part, of machine learning.
Hey, thanks bro. But I'm already halfway into neural networks,
like linear algebra, logic gates, vectors, matrices, inputs, weights, tensors, networks and all. But the problem is, I'm not getting how to figure out and apply which functions for what type of problems :( Like why linear regression, why sigmoid, why tanh, why ReLU, where to apply these and all. Full-on confusion bro :'(
Siraj remember a long time ago: 'Backpropagate to update weights'
Awesome ! Appreciate it!
At 8:35 it says 'step 1 random initialization' on every box. Was it supposed to be like this??
You've discussed gradient descent more here than backpropagation. To be fair, there's almost nothing about backpropagation.
cause he's a fraud
There’s a reason I’m telling all my friends to block him. He’s more interested in blocks than actually learning the concepts. He’s a fraud only interested in profits.
Yo siraj do you possess any information on the new AI algorithm "Augmented Random Search" (ARS)? If you do please make a video on it ;)
How is backpropagation different from gradient descent?
Editing in this video was a little bit sloppy, with a few grammatical and spelling mistakes. However, it was quite a well-made video overall.
Siraj, you are awesome :)
I'm going to watch this video back- propagating
Let's gooooo
Always good to review
6:48 "It's represented by this little squiggly character." Hahaha
F5 F5 F5 since the announcement on Twitter :-)
3:31 how many people caught that PPAP meme I wonder?
Great video. Cover backpropagation through time (context of RNN) in detail?
The maths is so difficult and I can't understand it! Can anyone point to an article with a step-by-step derivation of backpropagation?
Nice
Nice!
Though I know how backprop works, YAAAAY!!!!! Love your channel! #BeingCoolWhileProgramming
Using hashtags is not cool.
well done ;)
Did I hear you say "derive the meaning of life?"...
4:11 AM squad
Ajinkya Jumbad itachi is alert all the time!!!!
I thought it was supposed to be a backpropagation explanation, not a full machine learning course.
Not to be a weenie, but this NN topic on YouTube is much like the plethora of programming channels that talk as if they are regurgitating from a book, rather than "let me show you how you actually apply this in code so that you can use it in your own projects." And most people don't read scientific notation.
1st Bro Jai HInd #swachhcrypto
no programmer to full AI in xx days
Up to 4 minutes in, I didn't find any backpropagation.
I clicked the video to understand the concept; I am more confused now.
You're clearly suited to be a singer, yet you insist on doing tech.
You look like Mika Singh
Just thought I would leave this here, optimization doesn't always end well or as we might expect: ua-cam.com/video/tcdVC4e6EV4/v-deo.html
Machine learning is such a ripoff of linear programming.
This video lacked Siraj-ness...
Tell me
More
Why are u repeating it again and again?
just wait till next week, there is a bigger audience and I'm getting them up to speed
To those who think gradient descent and backprop are different things: guys, don't embarrass yourselves by writing that. Also, it's just goddamn partial derivatives, for Christ's sake.
Like the way he pretends
Haha pen apple pineapple pen😹😹 if you got it
Why do I think that he is actually clueless about all of this and is just there to present it?
Tell me why
I commented on the wrong video that popped up after the one I watched. In any case, that was the impression I got after watching it. Equations just popping up on the screen without an in-depth explanation. Is the purpose of the videos to teach people, or just to let them know that this stuff exists? :)
Edit: I was referring to the machine learning video, your latest.
You tell me if im wrong
First one here 😎