I like how you anthropomorphize optimizers. Makes me really empathize with their struggles.
with a profile pic like that- you would
@@erich_l4644 LMAO - your comment just won the internet. You'll soon receive an email by a Nigerian Prince with instructions on how to claim your winnings lol.
Died laughing like thrice. Witty joke.
This is what every lecture should be like. Funny but perfectly explained, and greatly visualized. Thanks!
The best explanation of optimizers in DL I HAVE EVER WATCHED!
Thank you! More of this to come!
great little overview! love how you get to the point quickly yet provide all the needed intuition
Thanks! That's exactly what I was going for :)
Hands down.. This is the best video on Optimizers.. !!! I've been trying to understand the complex math equations for the past few days and this one literally gave me the overall intuition in 7min 🙏🙏🙏
I'm so glad this helps :)
Such an underrated channel! Great explanations and visuals!
Much appreciated :)
Don't know why this video isn't spreading more; the explanation is great and the high-level summary helps me a lot.
Thank you! Mind fixing that by sharing this around? Would love to get more eyeballs here :)
Loving the sound effect
I am watching this at 2 am and that sudden sound effect at 00:13 cracked me up!
thanks! one of those rare videos that explain the intuition perfectly instead of hovering around the terms
I try. :)
This video was both informative and hilarious. I absolutely loved it!
That was the objective. Glad you liked it :)
That...... was........ one EXCELLENT VIDEO!!!!!
Thank you so much, I thought I would struggle with optimizers but now it's all clear to me
the only video that's ever made me laugh while explaining a concept. Love it, thank you!
You are very welcome :)
An absolutely fun way of learning. Thanks for such a funny and insightful video!
Absolutely love this iterative explanation.
Thank you. I'm experimenting with different teaching styles :)
like your video, and mostly i wanted to see the graph that most people dont show, thank you
This is one hell of a video to refresh on this stuff! kindly appreciated!!
Man, you gave the best explanation, one that even a noob like me in machine learning can understand. Keep it up man 👍.
Awesome! Glad you like it :)
Best video ever on optimizer. Thanks a lot.
Man, this video is slept on. Such a good explanation!
Very well explained and in a fun way.
love it! thank you! explained better than my professors. i finally get these now after so long
Words that are too kind. Thanks for the kind words!
Best explanation ever!! Thank you so much!!!
This is my new favorite video on the internet
Thanks so much for the compliments:) I try
Very clear, very well explained 10/10
Thank you!
What does the alpha in the SGD momentum equation do? I mean, alpha is the learning rate in the first two equations, but from then on you use η as the learning rate, so what is alpha after that?
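(A note for anyone else stuck on this, using the standard SGD-with-momentum formulation, which may differ slightly from the video's exact symbols: alpha is the momentum coefficient and η is still the learning rate.
v_t = α·v_{t−1} + ∇L(θ_{t−1})
θ_t = θ_{t−1} − η·v_t
So α, typically around 0.9, controls how much of the previous update direction is carried over, while η still scales the size of the step actually taken.)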
Thanks Ajay for giving this a shot. Loved it ❤️
Thanks for watching Gaurav (and the suggestion). Saw your comment on the last video too. And it was also in a line of videos I wanted to do. Probably not as "mathematical" as you'd like. I wanted to just explain why certain terms appear the way they do. Hopefully this helped that understanding. I might do a more mathematical video in the future though. But for now, this will do :)
@@CodeEmporium You did a pretty awesome job in just 7 minutes. It's both beginner-friendly and refreshing for intermediates.
Such a clear and simple explanation of complicated things. Great job.
Awesome work, easy to get a quick review before my interview. Keep going!
Wow, the most complex topic in under 7 minutes 😊 with pretty good visualizations.
Great explanation and fun, too. Thank you!
You are very welcome :)
m8, I was searching for a channel like this for a really long time
Never thought I would spit out my drink while watching a machine learning video
Great explanation. Thank you!
You are most welcome!
If I understood correctly, shouldn't 'acceleration' actually be called 'deceleration' in this particular case?
😮 I'm speechless at how you explain everything in detail
This video is so good, and it deserves 100X more attention!
I like how you explained this!
Thanks so much!
Why would the gradient at 1:21 be large? Isn't it just the average over every element in the dataset? Same for mini-batch, except there the gradient is the average over every element in the batch.
man you do great work!
The first scene is precisely what happened to my neural network 2 weeks ago.
Your explanation went deep into my brain!
Hey, thank you a lot for the explanations! Do you happen to know any heuristics with which to choose a specific optimizer? Right now I have a problem where every paper uses natural gradient descent, but when I use it, it barely ever converges, while Adam always gets it right (or at least comes close)...
Your implementation might not be ideal; maybe try a K-FAC preconditioning term?
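(For context on that suggestion, under the standard definitions rather than anything from the video: natural gradient descent uses the update θ_{t+1} = θ_t − η·F⁻¹·∇L(θ_t), where F is the Fisher information matrix, and K-FAC approximates F with Kronecker-factored blocks so that the inverse becomes tractable. If a straight natural-gradient implementation barely converges, the way F⁻¹ is approximated and damped is usually the first thing worth checking.)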
I think a critical point is missing here in the explanation:
You have forgotten to mention that the loss surface is different for each sample, so there DOES NOT EXIST any universal loss surface for a given dataset, and this is a problem in stochastic gradient descent.
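A minimal sketch of the point being debated here (toy 1-D least-squares data, made up for illustration): each sample defines its own loss surface, and plain SGD steps on whichever surface it happens to draw, only matching the dataset-average surface in expectation.

import numpy as np

# Hypothetical toy data: fit y ≈ w * x with a single weight w.
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 3.9, 6.1])

def per_sample_loss(w, i):
    # The surface "seen" when only sample i is used.
    return 0.5 * (w * x[i] - y[i]) ** 2

def full_loss(w):
    # The dataset-average surface, which SGD only follows in expectation.
    return np.mean(0.5 * (w * x - y) ** 2)

w = 0.0
for step in range(200):
    i = np.random.randint(len(x))      # draw one sample at random
    grad = (w * x[i] - y[i]) * x[i]    # gradient of that sample's own surface
    w -= 0.05 * grad                   # noisy step toward that sample's minimum

print(w, per_sample_loss(w, 0), full_loss(w))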
Amazing explanation
Nicely done! Thanks!
Thank youu
Nice channel! better than my professors lol
Super happy this is helpful. Thanks!
Your voice is like some cool anime main character's. I wish I had a voice like yours. Anyway, great explanation.
Please keep doing these kinds of videos. I'm in love with ML and with u
Haha thank you so much for the support
amazing vid you just earned a subscriber! looking forward to more content like this!
awesome explanation
very nice explanation and visualization.
Thanks homie
do Neural ODEs and self-supervised learning techniques pls,
great video btw
Thanks. I saw your comment on another video. I'll look into this a bit
So underrated 😭
Right?!
Thanks for the great clarification!
Of course. Anytime :)
❤️ the videos man. They're so clear
Always good
Hey. Great work on the video :D It was v clear and fascinating
What's NAG? I wonder how come Nadam isn't popular - seems like a better choice.
How would you describe RMSProp? You seem to have really great insight into DL concepts :D
Also why expectation in particular for Adam parameter updates?
Sorry for the question bombardment. Just pretty curious
NAG = Nesterov Accelerated Gradient
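(For reference, the standard Nesterov update, in notation assumed here rather than taken from the video, evaluates the gradient at a look-ahead point instead of at the current parameters:
v_t = α·v_{t−1} + η·∇L(θ_{t−1} − α·v_{t−1})
θ_t = θ_{t−1} − v_t
which is why it can correct the momentum direction a step earlier than plain momentum.)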
The first part is not correct: the fact that you use a mini-batch in each step, rather than the entire dataset, does not give you a higher chance to converge to the optimum. Because even when considering the entire dataset in each step, you're still taking the average gradient, so the expected magnitude of the gradient does not change. It all depends on the step size.
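(In symbols, using the standard setup rather than the video's exact notation: with the full loss L(θ) = (1/N)·Σ_i L_i(θ) and a mini-batch B sampled uniformly, E[(1/|B|)·Σ_{i∈B} ∇L_i(θ)] = ∇L(θ). The mini-batch gradient is an unbiased estimate of the full gradient; what the batch size changes is the variance of that estimate, not its expected value.)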
Well explained.
Thank youuuu
Amazing video.
Thanks so much!
Question: how does Newton's method play into this?
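(For anyone curious, the classic Newton step, which as far as I can tell isn't covered in the video, is θ_{t+1} = θ_t − H⁻¹·∇L(θ_t), where H is the Hessian of the loss. It rescales each direction by local curvature, and adaptive optimizers like RMSProp and Adam can loosely be read as cheap diagonal approximations of that kind of preconditioning.)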
this video was the bestttttt
That's fun!
really good overview
Thanks! Making more of this stuff on the channel
very helpful, thank you
Glad it was!
love this
Love this too!
0:43 this really cracks me up HAHA!
Nice explanation
Thanks!
I keep laughing at the initial 20 sec, I am watching it on loop
excellent
Great Video!
I subscribed to your channel.
I appreciate that subscription!
well done!
What a good video!
Awesome ❣️❣️
Thank youu
LMFAO the intro, I don't know why it's so funnyyyyyy to me
Dude, where did you study this? Understanding the maths makes the coding so much fun.
Not formal enough for me... the intuition is nice, but it needs to be a little more concrete about what is actually being done.
Fair. I did what I could in a short video like this. Thanks for watching!
@@CodeEmporium Thank you!
Bro, could you please label your equations? It'll be helpful.
W explanation
wonderfulllllllllllllll ...........!!!!
Thanks!
Thanks so much for the donation! Glad you liked this content!
Loved the start, watched 5 times.
Also my first comment on YouTube. =)
Yas! Thanks for this comment! Absolutely love it
Good job
I think I watched that intro like 7 times haha
hahahaha amazing
One critique: your notation is really weird and non-intuitive for beginners.
Imo you should have gone into more detail on the math of the optimizers. I did not understand how the terms relate to the behaviour the optimizers are supposed to have.
Thanks for the great explanation!