This is THE best video I’ve watched on this topic: clear, perfectly motivated, and insanely engaging
I like how you anthropomorphize optimizers. Makes me really empathize with their struggles.
with a profile pic like that- you would
@@erich_l4644 LMAO - your comment just won the internet. You'll soon receive an email from a Nigerian Prince with instructions on how to claim your winnings lol.
Died laughing like thrice. Witty joke.
This is what every lecture should be like. Funny but perfectly explained, and greatly visualized. Thanks!
The best explanation of optimizers in DL I HAVE EVER WATCHED!
Thank you! More of this to come!
Hands down.. This is the best video on Optimizers.. !!! I've been trying to understand the complex math equations for the past few days and this one literally gave me the overall intuition in 7min 🙏🙏🙏
I'm so glad this helps :)
great little overview! love how you get to the point quickly yet provide all the needed intuition
Thanks! That's exactly what I was going for :)
NGL this is one of the best videos on Optimizers
Such an underrated channel! Great explanations and visuals!
Much appreciated :)
Don't know why this video isn't spreading more; the explanation is great and the high-level summary helps me a lot.
Thank you! Mind fixing that by sharing this around? Would love to get more eyeballs here :)
Loving the sound effect
This video was both informative and hilarious. I absolutely loved it!
That was the objective. Glad you liked it :)
I am watching this at 2 am and that sudden effect at 00:13 cracked me up!
thanks! one of those rare videos that explain the intuition perfectly instead of hovering around the terms
I try. :)
the only video that's ever made me laugh while explaining a concept. Love it, thank you!
You are very welcome :)
That...... was........ one EXCELLENT VIDEO!!!!!
Thank you so much, I thought I would struggle with optimizers but now it's all clear to me
Best video ever on optimizer. Thanks a lot.
Man, this video is slept on. Such a good explanation!
An absolutely fun way of learning. Thanks for such a funny and insightful video
Absolutely love this iterative explanation.
Thank you. I'm experimenting with different teaching styles :)
Man you gave the best explanation, one that even a machine learning noob like me can understand. Keep it up man 👍.
Awesome! Glad you like it :)
This is one hell of a video to refresh on this stuff! kindly appreciated!!
This is my new favorite video on the internet
Thanks so much for the compliments:) I try
This video is so good, and it deserves 100X more attention!
Very clear, very well explained 10/10
Thank you;
Like your video, and mostly I wanted to see the graph that most people don't show. Thank you!
love it! thank you! explained better than my professors. i finally get these now after so long
Words that are too kind. Thank you for the kind words
Very well explained and in a fun way.
Such a clear and simple explanation of complicated things. Great job.
I think a critical point is missing here in the explanation:
You have forgotten to mention that the loss surface is different for each sample, so there DOES NOT EXIST a universal loss surface for a given dataset, and this is a problem in stochastic gradient descent
Thanks Ajay for giving this a shot. Loved it ❤️
Thanks for watching Gaurav (and the suggestion). Saw your comment on the last video too. And it was also in a line of videos I wanted to do. Probably not as "mathematical" as you'd like. I wanted to just explain why certain terms appear the way they do. Hopefully this helped that understanding. I might do a more mathematical video in the future though. But for now, this will do :)
@@CodeEmporium You did a pretty awesome job in just 7 minutes. It's both beginner-friendly and refreshing for intermediates.
Wow, the most complex topic in under 7 minutes 😊 with pretty good visualizations.
Your explanation went deep into my brain!
Awesome work, easy to get a quick review before my interview. Keep going!
Best explanation ever!! Thank you so much!!!
m8, I was searching for a channel like this for a really long time
What does the alpha in the SGD momentum equation do? I mean, alpha is the learning rate in the first two equations, but from then on you use η as the learning rate, so what is alpha after that?
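For anyone stuck on the same notation question: in the common formulation of SGD with momentum (assuming the video follows the usual convention, which I can't verify from the comment alone), α is no longer the learning rate but the momentum coefficient that decays the accumulated velocity, while η is the learning rate:

$$v_t = \alpha \, v_{t-1} + \eta \, \nabla_\theta L(\theta_{t-1})$$
$$\theta_t = \theta_{t-1} - v_t$$

With α around 0.9 (a typical, assumed value), most of the previous step's direction carries over into the next step, which is what gives momentum its name.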
The first scene is precisely what happened to my neural network 2 weeks ago.
man you do great work!
Never thought I would spit out drink while watching machine learning video
😮 I'm speechless at how you explain everything in detail
So underrated 😭
Right?!
Please keep making this kind of video, I'm in love with ML and with u
Haha thank you so much for the support
Great explanation and fun, too. Thank you!
You are very welcome :)
I like how you explained this!
Thanks so much!
❤️ the videos man. They're so clear
do Neural ODEs and self-supervised learning techniques pls,
great video btw
Thanks. I saw your comment on another video. I'll look into this a bit
Thanks for the great explanation!
Your voice is like some cool anime main character's. I wish I had a voice like yours. Anyway, great explanation.
amazing vid you just earned a subscriber! looking forward to more content like this!
Amazing explanation
Nice channel! better than my professors lol
Super happy this is helpful. Thanks!
Nicely done! Thanks!
Thank youu
Always good
Hey, thank you a lot for the explanations! Do you happen to know any heuristics with which to choose a specific optimizer? Right now I have a problem where every paper uses Natural Gradient descent, but when I use it it barely ever converges, while Adam always gets it right (or at least comes close)...
Your implementation might not be ideal; maybe try adding a KFAC preconditioning term?
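For readers following this exchange: as far as I know there is no KFAC optimizer in core PyTorch, so a pragmatic baseline is to get training working with Adam first and only then move to natural-gradient methods. A minimal sketch with made-up toy data (the model, data, and learning rate here are illustrative assumptions, not something from the thread):

import torch

# Toy regression setup purely for illustration (hypothetical data).
torch.manual_seed(0)
X = torch.randn(256, 10)
y = torch.randn(256, 1)

model = torch.nn.Linear(10, 1)        # stand-in for whatever model the papers use
loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # robust default

for step in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()                    # compute gradients
    optimizer.step()                   # Adam update

Swapping in a different optimizer is a one-line change, which makes it easy to compare a natural-gradient implementation against a plain baseline that already converges.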
Why would the gradient at 1:21 be large? Isn't it just the average over every element in the dataset? Same for mini-batch, only there the gradient is the average over every element in the batch.
awesome explanation
Thanks for the great clarification!
Of course. Anytime :)
very nice explanation and visualization.
Thanks homie
this video was the bestttttt
If I understood correctly, acceleration should not be called 'deceleration' in this particular case?
Hey. Great work on the video :D It was v clear and fascinating.
What's NAG? I wonder how come Nadam isn't popular - seems like a better choice.
How would you describe RMSProp? You seem to have really great insight into DL concepts :D
Also, why the expectation in particular for the Adam parameter updates?
Sorry for the question bombardment. Just pretty curious
NAG = Nesterov Accelerated Gradient
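For what it's worth, a sketch of the Nesterov update in the same α (momentum) / η (learning rate) notation as standard SGD with momentum: the only change is that the gradient is evaluated at a look-ahead point rather than at the current parameters:

$$v_t = \alpha \, v_{t-1} + \eta \, \nabla_\theta L(\theta_{t-1} - \alpha \, v_{t-1})$$
$$\theta_t = \theta_{t-1} - v_t$$

Nadam is then Adam with this Nesterov-style look-ahead folded into its momentum term.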
Amazing video.
Thanks so much!
really good overview
Thanks! Making more of this stuff on the channel
Well explained.
Thank youuuu
I keep laughing at the initial 20 sec, I am watching it on loop
Great Video!
What a good video!
love this
Love this too!
Loved the start, watched 5 times.
Also my first comment on YouTube. =)
Yas! Thanks for this comment! Absolutely love it
That's fun!
Nice explanation
Thanks!
very helpful, thank you
Glad it was!
0:43 this really cracks me up HAHA!
LMFAO the intro, I don't know why it's so funnyyyyyy to me
well done!
Awesome ❣️❣️
Thank youu
excellent
The first part is not correct: the fact that you use a mini-batch in each step, rather than the entire dataset, does not give you a higher chance to converge to the optimum. Because even when considering the entire dataset in each step, you're still taking the average gradient, so the expected magnitude of the gradient does not change. It all depends on the step size.
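To make the point above concrete, here is a minimal NumPy sketch (a made-up linear-regression example, nothing from the video): the full-batch gradient is the average of per-sample gradients, so a random mini-batch gradient matches it in expectation but adds variance - that extra noise, not a larger magnitude, is what the discussion is about.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))           # hypothetical inputs
w_true = rng.normal(size=5)
y = X @ w_true + 0.1 * rng.normal(size=1000)

w = np.zeros(5)                          # current parameters

def grad(X_part, y_part, w):
    # gradient of the mean squared error 0.5 * mean((X w - y)^2)
    residual = X_part @ w - y_part
    return X_part.T @ residual / len(y_part)

full = grad(X, y, w)                     # full-batch gradient
minis = [grad(X[idx], y[idx], w)
         for idx in (rng.choice(1000, 32, replace=False) for _ in range(500))]

print("full-batch gradient norm:     ", np.linalg.norm(full))
print("mean mini-batch vs full (~0): ", np.linalg.norm(np.mean(minis, axis=0) - full))
print("spread across mini-batches:   ", np.mean(np.std(minis, axis=0)))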
I subscribed to your channel.
I appreciate that subscription!
Question: how does Newton's method come into play here?????
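In case it helps: Newton's method is the classic second-order update, rescaling the gradient by the inverse Hessian. It is rarely used directly for deep networks because the Hessian is enormous in the number of parameters, which is why first-order methods like the optimizers discussed here are preferred:

$$\theta_t = \theta_{t-1} - H^{-1} \nabla_\theta L(\theta_{t-1}), \qquad H = \nabla^2_\theta L(\theta_{t-1})$$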
Bro, could you please label your equations? It'll be helpful
W explanation
Dude, where did you study this? Understanding the maths makes the coding so much fun
Not formal enough for me.. the intuition is nice but it needs to be a little more concrete about what is actually done
Fair. I did what I could in a short video like this. Thanks for watching!
@@CodeEmporium Thank you!
wonderfulllllllllllllll ...........!!!!
Thanks!
Thanks so much for the donation! Glad you liked this content!
I think I watched that intro like 7 times haha
Good job
hahahaha amazing
One critique: your notation is really weird and non-intuitive for beginners
Imo you should have gone into more detail on the math of the optimizers. I did not understand how the terms relate to the behaviour the optimizers are supposed to have.
Great explanation. Thank you!
You are most welcome!