Man I've discovered your channel and am watching your videos non-stop. No matter which topic, it is ALL as if a stream of light shines and makes it all understandable. You've got a gift.
Agreed! You've got a gift to shine the light over topics.
No
Non math person here and even i could understand this tutorial. Probably have to see it a couple more times because I'm a bit slow in my 40s now. But you really have a gift. Keep up the good work.
You always make your content so easy to understand. Just the right amount of math mixed with simple examples that clearly illustrate the main ideas of whatever topic you are talking about. Keep up the great work!
Hey thanks a lot, was literally just searching about Gradient Boosting today and your explanations have always been great. Good pacing and explanations even with some math involved.
This is a fantastic video. Thank you for sharing!
Glad you enjoyed it!
you are awesome man! I just love coming back to your videos every time. they are just the right length, and the perfect depth.. Kudos!
The last part of 'Why does it work?' made all the difference.
totally agree
Your videos on data science are awesome! They help me to prepare for my university exam a lot. Thank you very much!
Great video! A bonus for using squared error loss (which is commonly used) as the loss function for regression problems: the gradient of squared error loss is just the residual! So each weak learner is essentially trained on the previous residual, which makes sense intuitively. (I think that's why each gradient is called "r"?)
Yeah, squared error is easier to differentiate than alternatives like root squared error, and it does not depend on the number of observations the way mean squared error or root mean squared error does. If you want the negative gradient to be exactly equal to the residual, you can take (1/2)(squared error) as the loss function.
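A short derivation of the point in the two comments above, written as a sketch for a single observation and assuming the loss is half the squared error (the symbol r for the residual follows the comment above; the rest of the notation is an assumption):

```latex
\begin{aligned}
L(y, \hat{y}) &= \tfrac{1}{2}\,(y - \hat{y})^2 \\
\frac{\partial L}{\partial \hat{y}} &= -(y - \hat{y}) \\
r = -\frac{\partial L}{\partial \hat{y}} &= y - \hat{y}
\end{aligned}
```

Without the factor of 1/2 the negative gradient is 2(y - ŷ), which is the residual up to a constant factor.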
Great video as always! I would love If you could build on that video and talk about XGBoost and math behind it next!
holy shit. I've been tryna understand gradient boosting for a week and now it suddenly clicked. beautiful!!!
Man U r the 5th person, none has explained as simple and clear as you, thanks a ton
incredible video, you make understandable a really hard concept. Keep teaching like this and big things will come!
Completely agree, you are changing our lives! Cheers!
I worked on this 5(?) years ago, but needed a reminder - thanks!
Thanks for the effort u put in to help ur watchers understand, it really helped me understand the concept behind gradient descent!
your channel is criminally underrated. Just one question. You mentioned using linear weak learners, i.e. f(x) is a linear function of x. In this case how would you ever get anything other than a linear function after any number of iterations? at the end of the day, you are just adding multiple linear functions. it seems this whole procedure would only make sense, if you pick a nonlinear weak learner.
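A minimal sketch of the point raised above, assuming squared error loss and least-squares lines as the weak learners (the function names `fit_linear` and `boost_linear` are made up for this illustration). Summing lines only ever gives a line, so on curved data the boosted model just converges to the ordinary least-squares line:

```python
import numpy as np

def fit_linear(x, target):
    """Least-squares line through (x, target); returns (slope, intercept)."""
    slope, intercept = np.polyfit(x, target, deg=1)
    return slope, intercept

def boost_linear(x, y, n_rounds=500, learning_rate=0.1):
    """Boost with linear weak learners; the running sum is itself a line."""
    slope_total, intercept_total = 0.0, 0.0
    for _ in range(n_rounds):
        pred = slope_total * x + intercept_total
        residual = y - pred                 # negative gradient of 1/2 * squared error
        slope, intercept = fit_linear(x, residual)
        slope_total += learning_rate * slope
        intercept_total += learning_rate * intercept
    return slope_total, intercept_total

x = np.linspace(-2.0, 2.0, 41)
y = x ** 2                                  # clearly nonlinear target

boosted = boost_linear(x, y)
ols = fit_linear(x, y)
print("boosted line:", boosted)
print("single OLS line:", ols)
```

The two printed lines should agree to within numerical precision, which is why nonlinear weak learners such as shallow trees are the usual choice.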
Unbelievable variety of topics in this channel! What is your daily job? You have an amazing amount of knowledge
You're an amazing teacher, thanks a lot from Sweden!
Thanks for the video, also really like the whiteboard format
You are doing a great job, really enjoying your videos.
Waw thank you so much for this amazingly clear video explanation 🤗!!! Instantly subscribed :)
Wow. Fantastic video.
Finally understood it really well, thanks!
Pls don't stop making these videos
Phenomenal. Thank you again for making these videos
thanks man you explain it so much better than my uni professor :)
Glad to hear that!
Very awesome, thanks for the explanation 👍
that was very clear and useful, thank you
Thank you so much! You just blew my mind
You're very welcome!
Any chance you're interested in doing an intro video on the EM algorithm with a toy example? Love your videos, please keep them coming!
Best boosting definition yet.
Perfect, really well done!
Thanks!
Can mathematics behind ML be less dreadful and more fun? Well yes, if we have a tutor like him... amazing explanation ❤️
So so well explained
Thank you for this good explanation.
Great video brother.
well done - gee there is something to be said about a good explanation and a whiteboard. Fantastic explanation.
Thanks!
Thanks for sharing!
Amazing video. Thanks.
Thumbs up for the pen catch recovery at the start.
😂
3:24 I like that 🦾😅
Love the videos! Great topic
Excellently explained. I was just reviewing this and was very helpful to see how someone else think through this.
nice video series
Thanks! great videos.
You're the man. Thank you!
Man, you are amazing!
great vid!
let's talk about the first word in gradient boosting..... boosting :D Nice video as always
Just asking: is the concept of gradient boosting similar to a Taylor series? Each term on its own is not very good at approximating the function, but as you add more terms, the approximation to the function gets better.
great video
Thanks man
Hmmmm v interesting. Something to think about. Thx
Yeiii you are the best !!
I get it except for the differentiability of the L function, _over what range of values_? In L(y, y_hat), the domain of y_hat is a space of potential learners (?), so how do you characterise that space and differentiation over that space 😵💫
Great video! Never seen gradient descent used with the derivative of the loss function with respect to the prediction. Not sure if I understand it 100%, but if the gradient were, for example, -1 for ri, would the subsequent weak learner fit a model to -1? Or would the new weak learner fit a model to (old pred - (Learning Rate * gradient))? Would love to see a simple example worked out for 1 or 2 iterations if possible. Thank you! :)
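A small worked sketch of two iterations, under stated assumptions: loss is half the squared error, the weak learner is a one-split regression stump, and the data and learning rate are made up. In the standard formulation, each weak learner is fit to the negative gradient itself, and the learning rate only enters when that learner's output is added to the old prediction. The helper `fit_stump` is hypothetical, not from the video:

```python
import numpy as np

def fit_stump(x, target):
    """Best single-split regression stump under squared error (illustrative helper)."""
    best = None
    xs = np.unique(x)                               # sorted unique feature values
    for threshold in (xs[:-1] + xs[1:]) / 2.0:      # midpoints between neighbours
        left = x <= threshold
        left_value, right_value = target[left].mean(), target[~left].mean()
        sse = ((target[left] - left_value) ** 2).sum() \
            + ((target[~left] - right_value) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, threshold, left_value, right_value)
    _, threshold, left_value, right_value = best
    return lambda q: np.where(q <= threshold, left_value, right_value)

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 2.0, 6.0, 7.0])
learning_rate = 0.5

pred = np.full_like(y, y.mean())                    # start from a constant prediction
losses = [0.5 * ((y - pred) ** 2).sum()]

for step in range(2):
    gradient = pred - y                             # dL/dpred for L = 1/2 * (y - pred)^2
    target = -gradient                              # the weak learner is fit to this
    learner = fit_stump(x, target)
    pred = pred + learning_rate * learner(x)        # learning rate scales the update
    losses.append(0.5 * ((y - pred) ** 2).sum())
    print(f"step {step + 1}: target={target}, new pred={pred}")

print("losses:", losses)
```

So a gradient of -1 at a point means the next learner is trained toward +1 there, and the prediction moves by the learning rate times whatever the learner actually outputs.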
One question: in Step 3, is your target variable the gradient with respect to the previous prediction? If so, don't you think there is a possibility of it becoming infinite, so that we would be trying to fit something to infinity?
Learners together strong
move on so I can get screenshot 😂.
brilliant explanation ,well done
In words, is it correct to phrase Gradient Boosting as being multiple regression models combined, where each subsequent model aims to correct the error that the previous models couldn't account for?
Can you please make a video on XGBoost and its advantages by comparing. Thank you.
Hi! Why do we use f2(x) instead of the raw r1_hat? I mean, why predict the residuals and use those predictions if we already have the exact value of the gradient?
Hi, thank you for this informative video. I have some trouble understanding the graph at 5:27. How do you map out the curve on the graph if you have a single pair of prediction and loss function values? Do you create some mesh out of the given pair?
It's not super clear to me how or where the learning rate comes into play here, or what its relation to the scaling factor gamma is.
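One way to read the relation, as a sketch only: in one common formulation (which may differ from the notation in the video), gamma is chosen by a line search to scale the new weak learner as well as possible, and the learning rate is a separate fixed shrinkage applied on top. The snippet below assumes squared error loss, where the line search has a closed form; the weak learner output and the helper name `line_search_gamma` are hypothetical:

```python
import numpy as np

def line_search_gamma(residual, learner_output):
    """Gamma minimising sum((residual - gamma * learner_output)^2)."""
    return float(residual @ learner_output / (learner_output @ learner_output))

y = np.array([1.0, 2.0, 6.0, 7.0])
pred = np.full_like(y, 4.0)                          # current ensemble prediction
learner_output = np.array([-1.0, -1.0, 1.0, 1.0])    # hypothetical weak learner output

gamma = line_search_gamma(y - pred, learner_output)  # best step size for this learner
learning_rate = 0.1                                  # fixed shrinkage, chosen by the user
new_pred = pred + learning_rate * gamma * learner_output

print("gamma:", gamma)
print("new prediction:", new_pred)
```

Under this reading, gamma is computed from the data at each round, while the learning rate is a hyperparameter that deliberately takes only a fraction of that step.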
Isn't the gradient the partial derivative with respect to the feature (xi), not with respect to the prediction (y^)?
The first time I watched this video, I understood shit! Now the second time, I studied the subject and learn more :), it is much more clear now :)
after 4 hrs. of searching in vain, this has truly proven to be a savior!
Bro, it's late AF and I'm not gonna lie, I'm passing out now, but I'mma DEFINITELY catch this shit tomorrow. 👍
😂 come back anytime
@ritvikmath Well, it's been a year, but I came back! 😂
Come to think of it, concepts from gradient boosting apply perfectly to less mathematical aspects of life too. Just take a tiny step in the right direction and repeat!
yes love when math reflects life!
are the initial weak learners randomly selected? If so, can this initial random selection be optimized?
Very good content, but it would be great if you could stay in the corner so that we can see the board. Otherwise, great session.
Thanks for the suggestion !
Honestly, StatQuest has a much better way of explaining this. First he explains the logic by means of an example and then he explains the algebra afterwards. I'd recommend his videos on gradient boosting for anyone who didn't understand this. Without having seen his videos on it I would have been unable to understand the algebra.
Ripped...
Hello Ritvik, are you on LinkedIn? Would love to connect with you!
That was amazing. Thanks a lot.
Great video, Thank you
Glad you liked it!