Gradient Boosting: Data Science's Silver Bullet

  • Published Feb 8, 2025
  • A dive into the all-powerful gradient boosting method!
    My Patreon: www.patreon.co...

COMMENTS • 93

  • @ew6392
    @ew6392 2 years ago +55

    Man I've discovered your channel and am watching your videos non-stop. No matter which topic, it is ALL as if a stream of light shines and makes it all understandable. You've got a gift.

  • @KameshwarChoppella
    @KameshwarChoppella 8 months ago +6

    Non-math person here, and even I could understand this tutorial. I'll probably have to watch it a couple more times because I'm a bit slow in my 40s now, but you really have a gift. Keep up the good work.

  • @shnibbydwhale
    @shnibbydwhale 3 years ago +11

    You always make your content so easy to understand. Just the right amount of math mixed with simple examples that clearly illustrate the main ideas of whatever topic you are talking about. Keep up the great work!

  • @mrirror2277
    @mrirror2277 3 years ago +9

    Hey thanks a lot, was literally just searching about Gradient Boosting today and your explanations have always been great. Good pacing and explanations even with some math involved.

  • @hameddadgour
    @hameddadgour 10 months ago +1

    This is a fantastic video. Thank you for sharing!

    • @ritvikmath
      @ritvikmath 10 months ago

      Glad you enjoyed it!

  • @nikhildharap4514
    @nikhildharap4514 2 years ago +1

    You are awesome, man! I just love coming back to your videos every time. They are just the right length and the perfect depth. Kudos!

  • @soroushesfahanian5625
    @soroushesfahanian5625 8 months ago +6

    The last part of 'Why does it work?' made all the difference.

  • @АннаПомыткина-и8ш
    @АннаПомыткина-и8ш 3 years ago +1

    Your videos on data science are awesome! They help me to prepare for my university exam a lot. Thank you very much!

  • @jiangjason5432
    @jiangjason5432 1 year ago +5

    Great video! A bonus for using squared error loss (which is commonly used) as the loss function for regression problems: the gradient of squared error loss is just the residual! So each weak learner is essentially trained on the previous residual, which makes sense intuitively. (I think that's why each gradient is called "r"?)

    • @samirkhan6195
      @samirkhan6195 5 months ago

      Yeah, squared error is easy to differentiate compared to alternatives like root squared error, and it does not depend on the number of observations the way mean squared error or root mean squared error do. If you want the gradient to be exactly equal to the residual, you can take (1/2)(squared error) as the loss function.
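
A quick check of the point made in this thread, as a minimal sketch: with the loss L(y, y_hat) = (1/2)(y − y_hat)^2, the negative gradient with respect to the prediction y_hat is exactly the residual y − y_hat, which is why each weak learner ends up being fit to the residuals of the current model. The SymPy snippet below only verifies that derivative; the symbol names are illustrative.

    import sympy as sp

    y, y_hat = sp.symbols("y y_hat")
    loss = sp.Rational(1, 2) * (y - y_hat) ** 2   # the (1/2)(squared error) loss suggested above

    # Negative gradient of the loss with respect to the prediction y_hat
    neg_grad = -sp.diff(loss, y_hat)
    print(sp.simplify(neg_grad))                  # y - y_hat, i.e. exactly the residual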

  • @honeyBadger582
    @honeyBadger582 3 years ago +4

    Great video as always! I would love it if you could build on this video and talk about XGBoost and the math behind it next!

  • @ashkan.arabim
    @ashkan.arabim 1 month ago

    holy shit. I've been tryna understand gradient boosting for a week and now it suddenly clicked. beautiful!!!

  • @arjungoud3450
    @arjungoud3450 3 years ago

    Man, you are the 5th person; no one has explained it as simply and clearly as you. Thanks a ton.

  • @MiK98Mx
    @MiK98Mx 2 years ago +2

    Incredible video, you make a really hard concept understandable. Keep teaching like this and big things will come!

    • @alicedennieau5459
      @alicedennieau5459 2 years ago

      Completely agree, you are changing our lives! Cheers!

  • @pgbpro20
    @pgbpro20 3 years ago

    I worked on this 5(?) years ago, but needed a reminder - thanks!

  • @luismikalim2535
    @luismikalim2535 2 years ago

    Thanks for the effort you put in to help your watchers understand; it really helped me understand the concept behind gradient descent!

  • @marcosrodriguez2496
    @marcosrodriguez2496 3 years ago +5

    Your channel is criminally underrated. Just one question: you mentioned using linear weak learners, i.e. f(x) is a linear function of x. In this case, how would you ever get anything other than a linear function after any number of iterations? At the end of the day, you are just adding multiple linear functions. It seems this whole procedure would only make sense if you pick a nonlinear weak learner.
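
The concern in the comment above checks out: a weighted sum of linear functions is itself linear, so boosting with purely linear weak learners never leaves the linear family, which is why nonlinear weak learners (typically shallow decision trees) are the usual choice. A tiny numerical illustration with made-up coefficients:

    import numpy as np

    x = np.linspace(-1.0, 1.0, 5)

    # Two linear "weak learners" with arbitrary, made-up coefficients
    f1 = lambda v: 2.0 * v + 1.0
    f2 = lambda v: -0.5 * v + 3.0

    boosted = f1(x) + 0.1 * f2(x)                # additive update with a 0.1 learning rate
    print(np.allclose(boosted, 1.95 * x + 1.3))  # True: the combination is still a single line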

  • @grogu808
    @grogu808 1 year ago

    Unbelievable variety of topics in this channel! What is your daily job? You have an amazing amount of knowledge

  • @jakobforslin6301
    @jakobforslin6301 2 years ago

    You're an amazing teacher, thanks a lot from Sweden!

  • @Andres186000
    @Andres186000 3 years ago +1

    Thanks for the video, also really like the whiteboard format

  • @jonerikkemiwarghed7652
    @jonerikkemiwarghed7652 3 years ago

    You are doing a great job, really enjoying your videos.

  • @chau8719
    @chau8719 2 months ago

    Wow, thank you so much for this amazingly clear video explanation 🤗!!! Instantly subscribed :)

  • @carsonbring758
    @carsonbring758 1 month ago

    Wow. Fantastic video.

  • @adityamohan7372
    @adityamohan7372 1 year ago

    Finally understood it really well, thanks!

  • @Sanatos98
    @Sanatos98 2 years ago

    Pls don't stop making these videos

  • @Halo-uz9nd
    @Halo-uz9nd 3 years ago

    Phenomenal. Thank you again for making these videos

  • @joachimheirbrant1559
    @joachimheirbrant1559 1 year ago +1

    thanks man you explain it so much better than my uni professor :)

  • @Ranshin077
    @Ranshin077 3 years ago +1

    Very awesome, thanks for the explanation 👍

  • @MiladDana-b7h
    @MiladDana-b7h 7 months ago

    that was very clear and useful, thank you

  • @markus_park
    @markus_park 1 year ago

    Thank you so much! You just blew my mind

  • @Matt_Kumar
    @Matt_Kumar 3 years ago +2

    Any chance you're interested in doing an intro video on the EM algorithm with a toy example? Love your videos, please keep them coming!

  • @garrettosborne4364
    @garrettosborne4364 2 years ago

    Best boosting definition yet.

  • @benjaminwilson1345
    @benjaminwilson1345 1 year ago

    Perfect, really well done!

  • @rajrehman9812
    @rajrehman9812 2 years ago +4

    Can mathematics behind ML be less dreadful and more fun? Well yes, if we have a tutor like him... amazing explanation ❤️

  • @Sam-uy1db
    @Sam-uy1db 1 year ago

    So so well explained

  • @domr.2694
    @domr.2694 2 years ago

    Thank you for this good explanation.

  • @GodeyAmp
    @GodeyAmp 9 months ago

    Great video brother.

  • @dialup56k
    @dialup56k 1 year ago

    well done - gee there is something to be said about a good explanation and a whiteboard. Fantastic explanation.

  • @sohailhosseini2266
    @sohailhosseini2266 1 year ago

    Thanks for sharing!

  • @estebanortega3895
    @estebanortega3895 2 years ago

    Amazing video. Thanks.

  • @zAngus
    @zAngus 10 months ago

    Thumbs up for the pen catch recovery at the start.

  • @yashagrahari
    @yashagrahari 1 month ago

    3:24 I like that 🦾😅

  • @ianclark6730
    @ianclark6730 3 years ago

    Love the videos! Great topic

  • @bassoonatic777
    @bassoonatic777 3 years ago

    Excellently explained. I was just reviewing this, and it was very helpful to see how someone else thinks through this.

  • @ganzmit
    @ganzmit 5 months ago

    nice video series

  • @sophia17965
    @sophia17965 1 year ago

    Thanks! great videos.

  • @parthicle
    @parthicle 2 years ago

    You're the man. Thank you!

  • @ИльдарАлтынбаев-г1ь
    @ИльдарАлтынбаев-г1ь 9 months ago

    Man, you are amazing!

  • @7vrda7
    @7vrda7 2 years ago

    great vid!

  • @EW-mb1ih
    @EW-mb1ih 3 years ago

    let's talk about the first word in gradient boosting..... boosting :D Nice video as always

  • @kaustabhchakraborty4721
    @kaustabhchakraborty4721 1 year ago +1

    Just asking: is the concept of gradient boosting similar to Taylor series expansions? Each term on its own is not very good at approximating the function, but as you add more functions (terms), the approximation to the function gets better.
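
The analogy in the comment above can be made concrete: gradient boosting builds an additive model, and much like the partial sums of a Taylor series, each new term is a small correction that improves the overall approximation. In standard notation (not taken from the video):

    F_M(x) = f_0(x) + sum_{m=1..M} gamma_m * f_m(x)          (gradient boosting's additive model)
    g(x)  ≈ sum_{m=0..M} g^(m)(a) * (x - a)^m / m!           (partial sum of a Taylor series)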

  • @chocolateymenta
    @chocolateymenta 3 years ago

    great video

  • @chiemekachinaka5236
    @chiemekachinaka5236 5 months ago

    Thanks man

  • @rickharold7884
    @rickharold7884 3 years ago +1

    Hmmmm v interesting. Something to think about. Thx

  • @mitsuinormal
    @mitsuinormal 1 year ago

    Yeiii you are the best !!

  • @cboisvert2
    @cboisvert2 1 month ago

    I get it except for the differentiability of the L function, _over what range of values_? In L(y, y_hat), the domain of y_hat is a space of potential learners (?), so how do you characterise that space and differentiation over that space 😵‍💫

  • @user-xi5by4gr7k
    @user-xi5by4gr7k 3 years ago +1

    Great video! Never seen gradient descent used with the derivative of the loss function with respect to the prediction. Not sure if I understand it 100%, but if the gradient were, for example, -1 for ri, would the subsequent weak learner fit a model to -1? Or would the new weak learner fit a model to (old pred - (Learning Rate * gradient))? Would love to see a simple example worked out for 1 or 2 iterations if possible. Thank you! :)
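
On the question above: in the standard formulation, the next weak learner is fit directly to the gradient-based targets (the pseudo-residuals), not to (old prediction - learning rate * gradient); the learning-rate step happens afterwards, when the fitted learner's output is added to the current prediction. Below is a minimal sketch of two iterations, assuming squared error loss so the negative gradient is just the residual; the toy data, depth-1 trees, and learning rate are made up for illustration and are not from the video.

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    # Made-up toy data
    X = np.array([[1.0], [2.0], [3.0], [4.0]])
    y = np.array([1.2, 1.9, 3.2, 3.9])

    lr = 0.1
    pred = np.full_like(y, y.mean())            # start from a constant prediction

    for m in range(2):                          # two boosting iterations
        residual = y - pred                     # negative gradient of (1/2) squared error w.r.t. pred
        weak = DecisionTreeRegressor(max_depth=1).fit(X, residual)
        pred = pred + lr * weak.predict(X)      # step each prediction along the negative gradient
        print(f"iteration {m + 1}: {pred}")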

  • @KevinGodfreyVerpula
    @KevinGodfreyVerpula 3 months ago

    One question: in Step 3, is your target variable the gradient with respect to the previous prediction? If so, don't you think there is a possibility of it becoming infinity, and that we'd be trying to fit something to infinity?

  • @adinsolomon1626
    @adinsolomon1626 3 years ago

    Learners together strong

  • @m.badreddine9466
    @m.badreddine9466 1 year ago

    Move on so I can get a screenshot 😂.
    Brilliant explanation, well done.

  • @jeroenritmeester73
    @jeroenritmeester73 2 years ago +1

    In words, is it correct to phrase Gradient Boosting as being multiple regression models combined, where each subsequent model aims to correct the error that the previous models couldn't account for?

  • @arjungoud3450
    @arjungoud3450 3 years ago

    Can you please make a video on XGBoost and its advantages, with a comparison? Thank you.

  • @Artem_Vashina
    @Artem_Vashina 2 months ago

    Hi! Why do we use f2(x) instead of the raw r1_hat? I mean, why make predictions of the residuals and use them if we already have the exact value of the gradient?

  • @jamolbahromov4440
    @jamolbahromov4440 2 years ago +1

    Hi, thank you for this informative video. I have some trouble understanding the graph at 5:27. How do you map out the curve on the graph if you have a single pair of prediction and loss function values? Do you create some mesh out of the given pair?

  • @tehgankerer
    @tehgankerer 4 months ago

    It's not super clear to me how or where the learning rate comes into play here, and what its relation to the scaling factor gamma is.

  • @xmanxman1527
    @xmanxman1527 1 year ago

    Isn't the gradient the partial derivative with respect to the feature (xi), not with respect to the prediction (y^)?

  • @emirhandemir3872
    @emirhandemir3872 6 months ago

    The first time I watched this video, I understood shit! Now, the second time, I have studied the subject and learned more :), and it is much clearer now :)

  • @ashutoshpanigrahy7326
    @ashutoshpanigrahy7326 2 years ago

    after 4 hrs. of searching in vain, this has truly proven to be a savior!

  • @lashlarue7924
    @lashlarue7924 1 year ago

    Bro, it's late AF and I'm not gonna lie, I'm passing out now, but I'mma DEFINITELY catch this shit tomorrow. 👍

    • @ritvikmath
      @ritvikmath 1 year ago

      😂 come back anytime

    • @lashlarue7924
      @lashlarue7924 8 months ago

      @ritvikmath Well, it's been a year, but I came back! 😂

  • @gayathrigirishnair7405
    @gayathrigirishnair7405 1 year ago

    Come to think of it, concepts from gradient boosting apply perfectly to less mathematical aspects of life too. Just take a tiny step in the right direction and repeat!

    • @ritvikmath
      @ritvikmath 1 year ago

      yes love when math reflects life!

  • @VictorianoOchoa
    @VictorianoOchoa 1 year ago

    Are the initial weak learners randomly selected? If so, can this initial random selection be optimized?

  • @saravankumargowthamv9338
    @saravankumargowthamv9338 1 year ago

    Very good content, but it would be great if you could stay at the corner of the frame, allowing us to have a look at the board so we can follow. Otherwise, a great session.

  • @regularviewer1682
    @regularviewer1682 2 years ago

    Honestly, StatQuest has a much better way of explaining this. First he explains the logic by means of an example and then he explains the algebra afterwards. I'd recommend his videos on gradient boosting for anyone who didn't understand this. Without having seen his videos on it I would have been unable to understand the algebra.

  • @SimplyAndy
    @SimplyAndy 2 years ago

    Ripped...

  • @sharmakartikeya
    @sharmakartikeya 3 years ago

    Hello Ritvik, are you on LinkedIn? Would love to connect with you!

  • @NedaJalali-tz7vw
    @NedaJalali-tz7vw 2 years ago

    That was amazing. Thanks a lot.

  • @ahmedismailbinamrai1080
    @ahmedismailbinamrai1080 3 days ago

    Great video, Thank you