XGBoost Part 2 (of 4): Classification

  • Published 19 Jun 2024
  • In this video we pick up where we left off in part 1 and cover how XGBoost trees are built for Classification.
    NOTE: This StatQuest assumes that you are already familiar with...
    XGBoost Part 1: XGBoost Trees for Regression: • XGBoost Part 1 (of 4):...
    ...the main ideas behind Gradient Boost for Classification: • Gradient Boost Part 3 ...
    ...Odds and Log(odds): • Odds and Log(Odds), Cl...
    ...and how the Logistic Function works: • Logistic Regression De...
    Also note, this StatQuest is based on the following sources:
    The original XGBoost manuscript: arxiv.org/pdf/1603.02754.pdf
    The original XGBoost presentation: homes.cs.washington.edu/~tqch...
    And the XGBoost Documentation: xgboost.readthedocs.io/en/lat...
    For a complete index of all the StatQuest videos, check out:
    statquest.org/video-index/
    If you'd like to support StatQuest, please consider...
    Buying The StatQuest Illustrated Guide to Machine Learning!!!
    PDF - statquest.gumroad.com/l/wvtmc
    Paperback - www.amazon.com/dp/B09ZCKR4H6
    Kindle eBook - www.amazon.com/dp/B09ZG79HXC
    Patreon: / statquest
    ...or...
    UA-cam Membership: / @statquest
    ...a cool StatQuest t-shirt or sweatshirt:
    shop.spreadshirt.com/statques...
    ...buying one or two of my songs (or go large and get a whole album!)
    joshuastarmer.bandcamp.com/
    ...or just donating to StatQuest!
    www.paypal.me/statquest
    Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on twitter:
    / joshuastarmer
    Corrections:
    14:24 I meant to say "larger" instead of "lower".
    18:48 In the original XGBoost documents they use the epsilon symbol to refer to the learning rate, but in the actual implementation, this is controlled via the "eta" parameter. So, I guess to be consistent with the original documentation, I made the same mistake! :)
    #statquest #xgboost

COMMENTS • 405

  • @statquest
    @statquest  4 роки тому +29

    Corrections:
    14:24 I meant to say "larger" instead of "lower".
    18:48 In the original XGBoost documents they use the epsilon symbol to refer to the learning rate, but in the actual implementation, this is controlled via the "eta" parameter. So, I guess to be consistent with the original documentation, I made the same mistake! :)
    Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/

    • @parijatkumar6866
      @parijatkumar6866 3 роки тому +1

      Very nice videos. God bless you man!!

    • @rahul-qo3fi
      @rahul-qo3fi Рік тому

      15:27 The similarity equations are missing residual **2 (Thanks for the detailed explanations Love your content)

    • @statquest
      @statquest  Рік тому +1

      @@rahul-qo3fi At 15:27 we are calculating the output values for the leaf, not similarity scores, and the equation in the video at this time point is the correct equation for calculating output values.

    • @rahul-qo3fi
      @rahul-qo3fi Рік тому +1

      @@statquest aah got it, thanks:)

    • @pedromerrydelval7260
      @pedromerrydelval7260 4 місяці тому

      Hi Josh, I don't understand the mention of the parameter "min_child_weight" at 12:58. Is that a typo, or am I missing something? Thanks!

  • @TY-il7tf
    @TY-il7tf 4 роки тому +88

    How do I pass any interviews without these videos? I don't know how much I owe you Josh!

    • @statquest
      @statquest  4 роки тому +9

      Thanks and good luck with your interview. :)

    • @TheParijatgaur
      @TheParijatgaur 4 роки тому

      did you clear ?

    • @guneygpac6505
      @guneygpac6505 4 роки тому +15

      I got a few academic papers under review thanks to Josh. I watch his videos first before studying the other sources. Without his videos it would be Xhard to understand those sources. I put his name in the acknowledgements for helpful suggestions (he did actually reply to me several times here). I wish I could cite some of his papers but they are very unrelated to my area (economics). Unfortunately that's all I can do because the exchange rate would make any donations I can make look very stupid...

    • @arda8206
      @arda8206 3 роки тому

      @@guneygpac6505 I think you are from Turkey :D

    • @jiangtaoshuai1188
      @jiangtaoshuai1188 2 роки тому

      so you also yell BAM!! ?

  • @manuelagranda2932
    @manuelagranda2932 4 роки тому +53

    With this video I finished the whole playlist. I am from Colombia and it is hard to pay to learn about these concepts, so I am very grateful for your videos, and now my mom hates me when I say Double Bam for nothing!! jajaja

    • @statquest
      @statquest  4 роки тому +6

      That's awesome! I'm glad the videos are helpful. :)

  •  Рік тому +5

    From Vietnam, and hats off to your talent in explaining complicated things in a way that I feel so comfortable to continue watching.

  • @alihaghighat1244
    @alihaghighat1244 10 місяців тому +4

    When we use fit(X_train,y_train) and predict(X_test) without watching Josh's videos or studying the underlying concepts, nothing happens even if we get good results.
    Thank you Josh for simplifying these hard pieces of stuff for us and creating these perfect numerical examples. Please keep up this great work.

    • @statquest
      @statquest  10 місяців тому +1

      Thank you very much!
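
      As a concrete illustration of the fit/predict workflow mentioned in the comment above, here is a minimal sketch using xgboost's scikit-learn style wrapper (XGBClassifier); the dataset and parameter values are made up purely for illustration:

      ```python
      # Minimal fit/predict sketch with synthetic data.
      import numpy as np
      from sklearn.model_selection import train_test_split
      from xgboost import XGBClassifier

      rng = np.random.default_rng(0)
      X = rng.normal(size=(200, 3))                                    # 200 samples, 3 features
      y = (X[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(int)  # binary target

      X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

      model = XGBClassifier(
          n_estimators=100,   # number of trees
          learning_rate=0.3,  # eta, the learning rate from the video
          max_depth=6,
      )
      model.fit(X_train, y_train)
      print(model.predict(X_test)[:5])        # predicted class labels
      print(model.predict_proba(X_test)[:5])  # predicted probabilities
      ```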

  • @yukeshdatascientist7999
    @yukeshdatascientist7999 3 роки тому +1

    I have come across all the videos from gradient boosting till now, you clearly explain each and every step. Thanks for sharing the information with all. It helps a lot of people.

  • @wucaptian1155
    @wucaptian1155 4 роки тому +15

    You are a nice guy, absolutely! I can't wait for part 3. Although I have already studied XGBoost from the original paper, I can still get more interesting things from your video. Thank you :D

  • @prathamsinghal5261
    @prathamsinghal5261 4 роки тому +12

    Josh! You made machine learning a beautiful subject, and finally I'm in love with these Super BAM videos.

  • @shaelanderchauhan1963
    @shaelanderchauhan1963 2 роки тому +5

    Josh, On a scale of 5 you are a level 5 Teacher. I have learned so much from your videos. I owe so much to Andrew Ng and You. I will contribute to Patreon Once I get a Job. Thank you

  • @wongkitlongmarcus9310
    @wongkitlongmarcus9310 2 роки тому +2

    as a beginner of data science, I am super grateful for all of your tutorials. Helps a lot!

  • @joshisaiah2054
    @joshisaiah2054 3 роки тому +3

    Thanks Josh. You're a life saver and have made my Data Science transition a BAM experience. Thank You!

  • @amalsakr1381
    @amalsakr1381 4 роки тому

    Million Thanks Josh. I can not wait to watch other videos about XGBoost, lightBoost, CatBoost and deep learning. Your videos are the best.

    • @statquest
      @statquest  4 роки тому

      Part 3 on XGBoost should be out on Monday.

  • @saptarshisanyal4869
    @saptarshisanyal4869 2 роки тому +2

    All the boosting and bagging algorithms are complicated algorithms. In universities, I have hardly seen any professor who can make these algorithms understandable the way Joshua does. Hats off man !!

  • @seanmcalevey4566
    @seanmcalevey4566 4 роки тому +2

    Yo fr these are the best data science/ML explanatory vids on the web. Great work, Josh!

    • @statquest
      @statquest  4 роки тому

      Thank you very much! :)

  • @lakhanfree317
    @lakhanfree317 4 роки тому +1

    Finally yay waited for these video. For long but worth the wait. Thanks for everything.

  • @lambdamax
    @lambdamax 4 роки тому +2

    Thanks for boosting my confidence in understanding. There was this recent Kaggle tutorial that said LightGBM model "usually" does better performance than xgboost, but it didn't provide any context! I remember that xgboost was used as a gold standard-ish about 2-3 years ago(even CERN uses it if I'm not mistaken). Anyhoo, I hope I can keep up with all of this. I need to turn my boosters on.

    • @statquest
      @statquest  4 роки тому +3

      I'm happy to boost your confidence! Part 3 will explain the math if you are interested in those details - they are not required - and Part 4 will describe a lot of optimizations that XGBoost uses to be efficient (making it easier to find good hyper-parameters).

  • @changning2743
    @changning2743 3 роки тому +1

    I must have watched almost every video at least three times during this pandemic. Thank you so much for your effort!

    • @statquest
      @statquest  3 роки тому

      Wow!!! Thank you very much! :)

  • @madhur089
    @madhur089 3 роки тому +1

    Josh you are saviour...thanks a ton for making these fantastic videos...your video lectures are simple and crystal clear! Plus I love the sounds you make in between :)

  • @chelseagrinist
    @chelseagrinist 3 роки тому +3

    Thank you so much for making Machine Learning this easy for us . Grateful for your content . Love from India

  • @hassaang
    @hassaang 3 роки тому +1

    Bravo! Thanks for making life easy. Thanks and appreciation from Qatar.

    • @statquest
      @statquest  3 роки тому

      Hello Qatar!! Thank you very much!

  • @allen8376
    @allen8376 2 місяці тому +1

    The little calculation noises give me life

    • @statquest
      @statquest  2 місяці тому +1

      beep, boop, beep!

  • @parinitagupta6973
    @parinitagupta6973 4 роки тому +2

    All the videos are awesome and this is THE BAMMEST way to learn about ML and predictive modelling. Can we also have some videos about time series and the underlying concepts. That would be TRIPLE TRIPLE BAM!!!

    • @statquest
      @statquest  4 роки тому +1

      Thank you very much! :)

  • @furqonarozikin7157
    @furqonarozikin7157 3 роки тому +1

    Thanks buddy, it's hard for me to understand how xgboost works for classification, but this tutorial explained it well

  • @gerrard1661
    @gerrard1661 4 роки тому +1

    Thank you! Can’t wait for part 3.

    • @statquest
      @statquest  4 роки тому

      Thanks! Part 3 should be out soon.

    • @gerrard1661
      @gerrard1661 4 роки тому

      ​@@statquest Thanks for your reply. I am a stats PhD student. These days the industry prefer machine learning and deep learning. However, I feel like stats people are not strong at programming compared to CS people. We know lots of theory, but when solve real problems CS seems better? You have any idea on this? Thanks!

    • @statquest
      @statquest  4 роки тому

      @@gerrard1661 It really just boils down to the type of job you want to work on. There are tons of jobs in both statistics and cs and machine learning.

  • @globamia12
    @globamia12 4 роки тому +1

    Your videos are so funny and smart! Thank you

  • @paligonshik6205
    @paligonshik6205 4 роки тому +1

    Thanks a lot, keep doing an awesome job

    • @statquest
      @statquest  4 роки тому

      Thank you very much! :)

  • @yusufbalci4935
    @yusufbalci4935 4 роки тому +1

    Very well explained!! Awesome..

  • @mehdi5753
    @mehdi5753 4 роки тому +4

    Thanx for this simplification, can you do the same thing for LGBM and CatBoost?

  • @teetanrobotics5363
    @teetanrobotics5363 4 роки тому

    Best Professor on the planet. Could you please make a playlist for DL or RL ?

  • @abylayamanbayev8403
    @abylayamanbayev8403 2 роки тому

    Thank you very much professor! I would love to see your explanations of statistical learning theory covering following topics: concentration inequalities, rademacher complexity and so on

    • @statquest
      @statquest  2 роки тому

      I'll keep that in mind.

  • @zzygyx9119
    @zzygyx9119 11 місяців тому +1

    Awesome explanation! I bought your book "The StatQuest Illustrated Guide to Machine Learning" even though I have already understood all the concepts.

    • @statquest
      @statquest  11 місяців тому

      Thank you so much!!! I really appreciate your support.

  • @maruthiprasad8184
    @maruthiprasad8184 8 місяців тому +1

    hats off all my doubts clarified here, superb cooooooooooooool Big BAAAAAAAAMMMMMMMMMM!

  • @itisakash
    @itisakash 4 роки тому +1

    Hey thanks for the videos. Can't wait for the remaining parts in the XGboost series. When are you gonna release the next part?

    • @statquest
      @statquest  4 роки тому +1

      Since you are a member, you'll get early access to part 3 this coming monday (January 27). Part 4 will be available for early access 2 weeks later.

  • @jamemamjame
    @jamemamjame 2 роки тому +1

    Ty very much, will buy your song within tomorrow morning from Thailand :)

  • @osmanparlak1756
    @osmanparlak1756 3 роки тому

    Thanks a lot Josh for making ML algorithms understandable. I am learning a lot from your videos. Just one question on the order when splitting to create the trees. I think it doesn't matter whether you start from the last two or first two as we check all.

  • @thebearguym
    @thebearguym 8 місяців тому +1

    Enjoyed it! Cool explanation

  • @andrewwilliam2209
    @andrewwilliam2209 4 роки тому +1

    Hey Josh, you might not see this, but I really look up to you and your videos. I got sucked into machine learning last month, and you have made the journey easier thusfar. If I get an internship or something in the following months, I'll be sure to donate to you and hit you up on your social media to thank you :). Hopefully one day I will have enough knowledge to share it widely like you.
    Cheers

    • @statquest
      @statquest  4 роки тому

      Thank you very much! Good luck with your studies! :)

    • @andrewwilliam2209
      @andrewwilliam2209 4 роки тому +1

      @@statquest thanks Josh, will definitely update you in a year or two about the progress I've made😀

    • @statquest
      @statquest  4 роки тому

      Bam!

  • @nurdauletkemel8155
    @nurdauletkemel8155 2 роки тому +1

    Wow, I just discovered this channel and will use it to prep for my interview BAM! But the interview is in 2 hours Small BAM :ccccccc

  • @lfalfa8460
    @lfalfa8460 Рік тому +1

    Classification is not a vacation,
    it is not a sensation,
    but it's cooooool!
    🤣

  • @bharathjc4700
    @bharathjc4700 4 роки тому

    What statistical tests do we need to perform on the training data, and how do we validate the data?

  • @henkhbit5748
    @henkhbit5748 3 роки тому

    Love this series on xgboost. I read your answer about finding the best gamma parameter value using cross validation. According to this video, xgboost does not create new leaves when the gain < 0. When is extra pruning necessary? I suppose pruning can be done using lambda, and gamma can additionally be used to prevent overfitting...?

    • @statquest
      @statquest  3 роки тому +1

      Trees, in general, are notorious for over fitting the data. Random chance can easily result in a gain < 0 and adding an extra parameter for pruning will help prevent over fitting. For more details about the need for pruning trees in general, see: ua-cam.com/video/D0efHEJsfHo/v-deo.html

  • @kamaldeep8257
    @kamaldeep8257 3 роки тому

    Hi Josh, thank you for such a great explanation. Just want to clarify one thing: does this cover concept apply specifically to XGBoost trees, or is it a general method for all tree-based algorithms? Every tree-based algorithm seems to have this min_child_weight parameter in the sklearn library.

    • @statquest
      @statquest  3 роки тому

      Every tree-based method has a way of filtering out leaves that do not have enough samples going to them; however, the way XGBoost does it is unique.

  • @jingo6221
    @jingo6221 4 роки тому +1

    life saver, cannot thank more

    • @statquest
      @statquest  4 роки тому

      Thanks! Part 3 should be out soon.

  • @jamiescotcher1587
    @jamiescotcher1587 3 роки тому

    Hi Josh,
    Specifically, the gradient of the training loss is used to predict the target variables for each successive tree, right? Therefore, does a steeper gradient imply it is going to try harder to correctly predict a specific sample that has been mis-classified, or does it mean it will work harder to predict any member of a certain true class?
    Thanks!

    • @statquest
      @statquest  3 роки тому

      For details on how XGBoost treats misclassified samples and how, exactly, it tries harder to correctly classify them, see ua-cam.com/video/oRrKeUCEbq8/v-deo.html

  • @Brandy131991
    @Brandy131991 2 роки тому

    Hi Josh, thank you for your amazing videos. They are really helping me a lot.
    One thing I still don't get is how xgboost predicts multiple classes (e.g. "most likely drug to use" with drugs 1, 2 and 3).
    Does this work like in multinomial logistic regression, where each class is checked against a baseline class? Or is it something like a random forest when using xgboost?

    • @statquest
      @statquest  2 роки тому +1

      When there are multiple classes, XGBoost uses the softmax objective function. I explain softmax in my series on Neural Networks: ua-cam.com/video/CqOfi41LfDw/v-deo.html
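
      To make that concrete, here is a minimal sketch of requesting a softmax-based objective with xgboost's native API; the objective and parameter names are taken from the XGBoost documentation, and the data is synthetic:

      ```python
      # Sketch: multi-class classification with a softmax-based objective.
      import numpy as np
      import xgboost as xgb

      rng = np.random.default_rng(1)
      X = rng.normal(size=(300, 4))
      y = rng.integers(0, 3, size=300)   # three classes, e.g. drugs 1, 2 and 3

      dtrain = xgb.DMatrix(X, label=y)
      params = {
          "objective": "multi:softprob",  # softmax that returns per-class probabilities
          "num_class": 3,                 # "multi:softmax" would return class labels instead
          "eta": 0.3,                     # learning rate
          "max_depth": 6,
      }
      booster = xgb.train(params, dtrain, num_boost_round=20)
      print(booster.predict(dtrain)[:3])  # rows of 3 probabilities that sum to 1
      ```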

  • @manojbhardwaj27
    @manojbhardwaj27 4 роки тому +1

    @Josh Starmer: I would like to know about the PRUNING concept in XGB.
    Are Gamma and Cover used for Pre-Pruning or Post-Pruning? In sklearn, we generally use Pre-Pruning, which makes more sense to me.
    However, from your tutorial it seems like we are doing Post-Pruning (after the full tree is built).
    Can you please clarify, with a reason?

    • @statquest
      @statquest  4 роки тому +1

      These videos on XGBoost describe how XGBoost was designed from the ground up. Thus, the reason for anything in these videos is "that's the way they designed XGBoost."

  • @ducanhlee3467
    @ducanhlee3467 4 місяці тому +1

    Thank Josh for your knowledge and funny BAM!!!

  • @munnangimadhuri3334
    @munnangimadhuri3334 2 роки тому +1

    Excellent explanation Brother!

  • @FF4546
    @FF4546 2 роки тому

    Hello Josh, thank you for your video.
    How would this work with more than one variable? Does each variable end up with only one threshold?
    Thank you!

    • @statquest
      @statquest  2 роки тому

      You test every variable to find the optimal thresholds and use the one that does the best. However, XGBoost has some optimizations explained here: ua-cam.com/video/oRrKeUCEbq8/v-deo.html

  • @yulinliu850
    @yulinliu850 4 роки тому +1

    Awesome bang. Happy 2020

  • @user-kf7vg3bq8z
    @user-kf7vg3bq8z 2 місяці тому

    These videos are being truly helpful. Many thanks for sharing them! I do have a question RE XGBoost usage context. You mentioned that XGB is designed for large, complicated datasets; does this mean that it performs poorly with smaller datasets? Thanks in advance

    • @statquest
      @statquest  2 місяці тому

      I'm not sure - I just know that it has tons of optimizations for large datasets. To learn more about them, see: ua-cam.com/video/oRrKeUCEbq8/v-deo.html

  • @vijaykrish64
    @vijaykrish64 4 роки тому +1

    Must-watch videos. Just a small question: why do we need both cover and gamma for pruning?

    • @statquest
      @statquest  4 роки тому +2

      Although gamma is thoroughly discussed in the original manuscript, cover is never mentioned. So my best guess is that while both cover and gamma do similar things, there are still differences in how they do them and in the types of leaves they prune. For example, you could have a leaf with a lot of residuals in it (and thus a relatively high "cover", so cover would not prune), but if they are not very similar, you will have a low similarity score and a low gain (so gamma would prune).
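
      To make that distinction concrete, here is a toy calculation using the classification formulas from the video (previous probabilities of 0.5, lambda = 0; the residuals are made up):

      ```python
      # A leaf whose residuals largely cancel out: decent cover, tiny similarity.
      residuals = [0.5, -0.5, 0.4, -0.4]   # residuals that mostly cancel
      prev_prob = [0.5, 0.5, 0.5, 0.5]     # previously predicted probabilities
      lam = 0                              # lambda (regularization)

      cover = sum(p * (1 - p) for p in prev_prob)       # 4 * 0.25 = 1.0
      similarity = sum(residuals) ** 2 / (cover + lam)  # 0.0 ** 2 / 1.0 = 0.0

      print(cover)       # 1.0 -> not pruned by the default min_child_weight of 1
      print(similarity)  # 0.0 -> contributes nothing to Gain, so gamma can prune the split
      ```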

  • @omreekapon2465
    @omreekapon2465 Рік тому

    Great explanation as always! Just a small question: at 10:12 you mentioned that the cover is defined as the denominator of the similarity score minus lambda, but in the equation it looks like it is plus, so which is right? Thanks for such amazing explanations!

    • @statquest
      @statquest  Рік тому

      The denominator = [Sum(previous × (1 − previous))] + lambda. Cover = Sum(previous × (1 − previous)). Thus, cover = denominator − lambda = [Sum(previous × (1 − previous))] + lambda − lambda = Sum(previous × (1 − previous)).

  • @anggipermanaharianja6122
    @anggipermanaharianja6122 3 роки тому +1

    Awesome vid!

  • @pierrebedu
    @pierrebedu Рік тому

    great explanations! and how does this generalize to multiclass classification? Thanks (one vs all classif repeated n_classes times? )

    • @statquest
      @statquest  Рік тому +1

      That's one way to do it. I believe that you can also swap out the loss function and use cross entropy.

  • @superk9059
    @superk9059 2 роки тому +1

    Awesome!!!👍👍👍 Very, very, very, very good teacher!!!

  • @dikshantgupta5539
    @dikshantgupta5539 3 роки тому

    For pruning the tree, is gain − gamma the same as the cover value? You remove the leaf when you calculate the cover value and also when you calculate gain − gamma.

    • @statquest
      @statquest  3 роки тому

      For details on cover (and everything else in XGBoost), see: ua-cam.com/video/ZVFeW798-2I/v-deo.html

  • @asabhinavrock
    @asabhinavrock 4 роки тому

    Hey Josh. Your videos are really informative and easy to understand. I have joined your channel today and look forward to more exciting content coming up. I was also eager to see your third video in the XGBoost Series. When will that be live?

    • @statquest
      @statquest  4 роки тому

      If you go to the community page, you may be able to find a link to part 3 since you are a channel member. Here's the link to the community page: ua-cam.com/users/joshstarmercommunity

    • @asabhinavrock
      @asabhinavrock 4 роки тому +1

      @@statquest Finally. Made my day!!!

    • @statquest
      @statquest  4 роки тому +1

      @@asabhinavrock Awesome!!! Thank you very much.

  • @dhruvbishnoi8840
    @dhruvbishnoi8840 4 роки тому +1

    Hi Josh,
    What happens if after splitting the node, one leaf has cover lower than the set threshold and the other leaf has cover greater than the set threshold.
    Splitting would not be performed, right?

  • @Patrick881199
    @Patrick881199 3 роки тому

    Hi Josh, when building the trees, does xgboost, like a random forest, bootstrap the dataset and choose a random subset of features for each tree?

    • @statquest
      @statquest  3 роки тому

      You can do that with XGBoost, but it's not as fundamental to the algorithm as it is to Random Forests.

    • @Patrick881199
      @Patrick881199 3 роки тому +1

      @@statquest Thanks, Josh

  • @shalinirajanna4281
    @shalinirajanna4281 4 роки тому +1

    Thank you for such good videos. I see that XGBoost has both alpha and lambda parameters. You've explained lambda; where would alpha fit in?

    • @statquest
      @statquest  4 роки тому +2

      Alpha was added after the original publication, so I didn't cover it. Presumably alpha is just like lambda: it makes the trees shorter and shrinks the output values. And presumably it can shrink output values all the way to 0, just like lasso regression (and presumably lambda can not, just like ridge regression).
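
      For anyone looking for these knobs in code, here is a minimal sketch of where the two penalties are set in the scikit-learn wrapper (parameter names and defaults are from the XGBoost documentation; the values below are just the defaults):

      ```python
      # Sketch: lambda (L2) and alpha (L1) regularization in XGBClassifier.
      from xgboost import XGBClassifier

      model = XGBClassifier(
          reg_lambda=1.0,  # L2 penalty (the lambda discussed in the video), default 1
          reg_alpha=0.0,   # L1 penalty (alpha), default 0, i.e. switched off
          gamma=0.0,       # minimum gain required to keep a split
          n_estimators=100,
      )
      ```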

  • @muralikrishna9499
    @muralikrishna9499 4 роки тому +5

    After a long time..... BAMMM!

  • @gabrielpadilha8638
    @gabrielpadilha8638 2 роки тому

    Josh, good morning, let me ask you a question. You said that we can set the initial probability to a value different from 0.5 if, for example, the training dataset is unbalanced. Does that mean that xgboost can deal with unbalanced datasets without the need to balance the training dataset before submitting it to the model?

    • @statquest
      @statquest  2 роки тому

      I'm not really sure. It probably depends on how imbalanced the data are.

  • @khaikit1232
    @khaikit1232 Рік тому

    Hi Josh,
    At 19:20, it is written that:
    log(odds) Prediction = 0 + (0.3 x -2) = -0.6
    However I was just wondering since the tree is predicting the residuals, isn't the output of the XGBoost tree a probability? So shouldn't we convert the output from probabilities to log(odds) before we add it to the initial guess of 0?

    • @statquest
      @statquest  Рік тому +1

      The tree predicts residuals, but the output values from the leaves are not residuals; instead, they are calculated as shown at 14:58. Now, to be honest, I have no idea why that particular formula results in a log(odds), but it must, because that is what both XGBoost and Gradient Boost do, and neither of them does anything else before calculating the final log(odds).
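
      To tie the numbers together, here is a small worked version of the prediction update at 19:20 (initial log(odds) of 0, learning rate eta = 0.3, and a leaf output value of -2, as in the video):

      ```python
      # Worked version of the prediction update discussed above.
      import math

      initial_log_odds = 0.0  # initial prediction: probability 0.5 -> log(odds) = 0
      eta = 0.3               # learning rate
      leaf_output = -2.0      # output value of the leaf the sample lands in

      new_log_odds = initial_log_odds + eta * leaf_output  # 0 + 0.3 * -2 = -0.6
      new_prob = math.exp(new_log_odds) / (1 + math.exp(new_log_odds))

      print(new_log_odds)  # -0.6
      print(new_prob)      # about 0.35, the updated predicted probability
      ```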

  • @hubert1990s
    @hubert1990s 4 роки тому +1

    When cover makes a leaf insufficient to stay in the tree, is that also a kind of pruning?

    • @statquest
      @statquest  4 роки тому +1

      That is correct. Cover is a way to enforce pruning and avoid overfitting the training data.

  • @user-ng1hs4lx4u
    @user-ng1hs4lx4u 3 роки тому

    Thank you for the marvelous video!
    I have some questions regarding what's explained:
    1. Can the number of trees we make be controlled by what we call an 'Epoch' in ML?
    2. When the model runs through epochs, is there any chance some epochs move away from the answer value?
    - I understood that by setting the learning rate too high, the new prediction will overshoot the answer, causing the learning procedure to fluctuate a lot.
    3. The ways we can slow down the learning speed, I think, are
    1) larger cover, 2) larger gamma, 3) larger lambda
    Is that right? Or are there more ways to control the speed?
    Always thanks for all the effort you put into the materials!

    • @statquest
      @statquest  3 роки тому

      1) I think you can use that terminology if you want, but I don't know of anyone else who does. In xgboost, the parameter you set for the number of trees is "num_boost_round" (or "n_estimators" in the scikit-learn wrapper), and generally speaking, building trees is called "boosting".
      2) I don't know.
      3) Although not mentioned in the original paper, XGBoost contains a few other ways to slow down learning (add regularization). For full details, see the manual: xgboost.readthedocs.io/en/latest/parameter.html

    • @user-ng1hs4lx4u
      @user-ng1hs4lx4u 3 роки тому +1

      @@statquest
      Thanks for kind reply! :)
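
      For reference, here is a sketch of where the learning-slowing knobs mentioned in the reply above live in the scikit-learn wrapper (parameter names are from the XGBoost documentation; the values are only placeholders):

      ```python
      # Sketch: the main "slow down learning" / regularization parameters.
      from xgboost import XGBClassifier

      model = XGBClassifier(
          n_estimators=200,      # number of boosting rounds (trees)
          learning_rate=0.1,     # eta, scales each tree's contribution
          min_child_weight=1.0,  # minimum cover allowed in a leaf
          gamma=0.0,             # minimum gain required to keep a split
          reg_lambda=1.0,        # lambda (L2 regularization)
          subsample=1.0,         # fraction of rows used per tree
          colsample_bytree=1.0,  # fraction of columns used per tree
      )
      ```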

  • @yjj.7673
    @yjj.7673 4 роки тому +1

    That's great. BTW is there a video that only contains songs? ;)

  • @mrcharm767
    @mrcharm767 Рік тому +1

    concepts going straight to my head as if u shot arrows bam!!!!!

  • @somalkant6452
    @somalkant6452 4 роки тому

    Hi josh,
    Can you help and tell me whether similarity score and entropy are different things or same?

  • @dylangaldes7044
    @dylangaldes7044 3 роки тому

    I've been researching how to use XGBoost for image classification; unfortunately, I did not find a lot of research papers on this. Is it a good algorithm for this job? The classification has multiple classes that are either various types of diseases on plant leaves or a healthy leaf. Thank you

    • @statquest
      @statquest  3 роки тому

      I've never done that myself, but I've heard of people who have and been successful.

  • @LL-hj8yh
    @LL-hj8yh 9 місяців тому +1

    Hey Josh, how is the similarity score here related to the gini/entropy we use for XGBoost's classification?

    • @statquest
      @statquest  9 місяців тому +1

      I'm not sure I understand your question. Are you wanting to compare the similarity score for XGBoost to how classification is done (with GINI or entropy) for a normal decision tree? If so, they are not related. This similarity score is derived from loss function, whereas GINI and entropy are just used because they work. For details on the XGBoost similarity score, see: ua-cam.com/video/ZVFeW798-2I/v-deo.htmlsi=iv2nJpFE41ijE3zo

    • @LL-hj8yh
      @LL-hj8yh 9 місяців тому

      @@statquest thanks Josh! I was earlier under the impression that we need to specify gini or entropy in an xgboost classifier, which seems incorrect, as they are only for decision trees, not XGBoost's classifier. Yet is it true that the similarity score and gini/entropy serve the same purpose, that is, to calculate the similarity/purity and therefore determine the split?
      Thanks again and congrats on 1M subscribers, that says a lot!

    • @statquest
      @statquest  9 місяців тому +1

      @@LL-hj8yh Yes, the similarity score and GINI serve the same purpose, but we can't use them (Gini or entropy) here since we are fitting the tree to continuous values (even for classification). Thanks!

  • @lucaslai6782
    @lucaslai6782 4 роки тому

    Hello Josh, could you tell me how I can decide the value of the Tree Complexity Parameter (gamma, γ)?

  • @rajdipsur3617
    @rajdipsur3617 3 роки тому +1

    Infinite BAAAAAAAAAAAAAAAMMMMMMMMMM for these amazing videos bosss... :-)

  • @karangupta6402
    @karangupta6402 3 роки тому +2

    Awesome :)

  • @karangupta6402
    @karangupta6402 3 роки тому

    Hi Josh:
    Can it be possible to make some video on the scale_pos_weight feature of XGBoost and how it can help in solving imbalanced datasets problems?

    • @statquest
      @statquest  3 роки тому +1

      I'll keep that in mind.

  • @priyabratbishwal5149
    @priyabratbishwal5149 3 роки тому

    Hi Josh,
    How do you make a tree with multiple predictors using XGBoost? Here you showed only a single variable, called Dosage. How do you do it for multiple variables?
    Thanks

    • @statquest
      @statquest  3 роки тому

      For each variable in your dataset, you go through the process shown here. You then select the variable that results in the best similarity score.

  • @santoshkumar-bz9mg
    @santoshkumar-bz9mg 3 роки тому +2

    U r awesome
    Love from INDIA

  • @yuchenzhao6411
    @yuchenzhao6411 4 роки тому +1

    8:04 If two thresholds have the same 'Gain', why would we pick "Dosage < 15" rather than "Dosage < 5"? Does it matter for a larger dataset?
    13:23 Since in part 1 we set gamma=130 and in part 2 we set gamma=3, I'm wondering how we choose the value for gamma?

    • @statquest
      @statquest  4 роки тому +3

      1) If 2 or more thresholds have the same "best Gain", then just pick one; it doesn't matter. Since this is a greedy algorithm, it does not look ahead to see if one of those choices is better in the long run.
      2) When we use XGBoost for regression, the residuals can be relatively large, so gamma may need to be relatively large. When we use XGBoost for classification, the residuals are relatively small, so gamma may need to be relatively small. You can always just build a few trees to get a sense of what values for gamma make sense for pruning.

    • @yuchenzhao6411
      @yuchenzhao6411 4 роки тому

      @@statquest thank you very much Josh! Really enjoy your video!

  • @tudormanoleasa9439
    @tudormanoleasa9439 3 роки тому

    What do you do if the cover of a left leaf is less than 1, but the cover of a right leaf is greater than 1? Do you only remove the left leaf or the entire subtree made of root, left leaf, right leaf?

    • @statquest
      @statquest  3 роки тому

      If the cover value for one of the leaves is too small, we remove both leaves.

  • @rrrprogram8667
    @rrrprogram8667 4 роки тому +1

    Hit and like first... Then later i am gonna watch video... MEGAAAA BAMMMM

  • @gahbor
    @gahbor 9 місяців тому

    If my dataset has a binary target variable to predict, and most features are also binary, would it make sense to go with min_child_weight=0 ?

    • @statquest
      @statquest  9 місяців тому

      Probably not since that will result in the trees overfitting your data.

  • @zahrahsharif8431
    @zahrahsharif8431 4 роки тому

    Hi Josh, if there were outliers in the data, say a dosage of 1000, this wouldn't affect how the tree makes its split, so outliers do not affect it? Aren't tree methods robust to outliers?

    • @statquest
      @statquest  4 роки тому

      Trees can be less sensitive to outliers than other methods, however, it's always a good idea to remove them first.

  • @francescoperia9768
    @francescoperia9768 Рік тому

    Hi Josh, I cannot understand why at minute 08:15, after you created the first split (Dosage < 15) and the consequent similarity gain, you don't update the predicted probabilities of the residuals by using the formula e^log(ODDS) / (1 + e^log(ODDS)). In the video it seems that the "previous predicted probability" remains always the initial 0.5, so I'm asking if it should be changed after the first split instead. Thank you in advance

    • @statquest
      @statquest  Рік тому +1

      The predicted probabilities should not be changed until after we have created the entire tree and calculated the output values for the leaves.

    • @francescoperia9768
      @francescoperia9768 Рік тому +1

      Oh my mistake, you are totally right.. thank you very much. So basically like a standard Gradient Boosting Classifier I build the whole weak learner tree and once I obtain the output leaf values (which are log(ODDS) values calculated with the same formula as the standard GB Classifier apart from lambda) I compute the new prediction starting from the previous one. Then I convert the new log(ODDS) prediction into probability using the logistic function.

  • @mehuljain4920
    @mehuljain4920 3 роки тому

    Hi
    Hope it’s not too late to get a reply on this video from you.
    I just wanted to know how the tree will grow when there are more variables. In a decision tree it takes one variable at the root, followed by other variables at other nodes.
    How will XGBoost build its tree?
    Thanks

    • @statquest
      @statquest  3 роки тому

      XGBoost builds its trees just like any other tree algorithm, although it uses a different criteria for selecting the best way to split the data.

  • @tauseefnawaz8888
    @tauseefnawaz8888 3 роки тому

    That's a really great video, as are all the others; I want to thank you for them. So, my question is: you are solving this for binary classification. How can we build a model for multiclass classification? Thanks

    • @statquest
      @statquest  3 роки тому

      Typically people just combine multiple binary classifiers. However, you can also swap out the loss function for multi-classification with softmax.

  • @davidlo2247
    @davidlo2247 3 роки тому

    At around 11:00, could you explain further why the cover, meaning the minimum number of residuals in each leaf, is 0.25, and why it cannot allow a leaf with 1 residual? Isn't 1 > 0.25?

    • @statquest
      @statquest  3 роки тому

      I answer this question in the StatQuest that explains the math behind XGBoost: ua-cam.com/video/ZVFeW798-2I/v-deo.html

  • @neelamyadav533
    @neelamyadav533 4 роки тому

    Hi Josh, just wanted to ask: at time 9:18 it's showing that the similarity for the leaf is 2... will you kindly check and explain? I don't know where I am going wrong in the calculation; according to me it should be 0.5.

    • @statquest
      @statquest  4 роки тому +1

      It is possible that I'm doing the math wrong. So let's check.
      The two residuals in that leaf are 0.5 and 0.5, and the previous prediction for each residual is 0.5, so...
      Similarity = (0.5 + 0.5)^2 / ((0.5*0.5) + (0.5*0.5))
      = 1^2 / (0.25 + 0.25)
      = 1 / 0.5
      = 2
      ...so that's how I got 2. How did you get 0.5?

    • @neelamyadav533
      @neelamyadav533 4 роки тому +1

      @@statquest oh yeah .. thanks .. my mistake... thanks a lot for your explanation... and a big thanks for uploading these awesome videos

  • @ahindrilasaha5850
    @ahindrilasaha5850 3 роки тому

    When number of variables >1 , do we calculate gain for all the variables individually and select the one with greatest gain? Or , is there any other procedure?

  • @KUNALVERMAResScholarDeptofMath
    @KUNALVERMAResScholarDeptofMath 2 роки тому

    Hi Josh, Why are we taking the last two values at 6:04?

    • @statquest
      @statquest  2 роки тому

      I'm not sure I understand your question. At 6:04, we put 3 residuals in the leaf on the left and 1 residual in the leaf on the right.

  • @mimaaristide7151
    @mimaaristide7151 3 роки тому

    Thank you, sir, for this wonderful video.
    I have a question, please: once we've built all the classifiers, how do we obtain the final classification?

    • @statquest
      @statquest  3 роки тому +1

      You decide what the probability threshold should be and all predictions with probabilities > threshold are classified one way, and everything else is classified the other way.

    • @mimaaristide7151
      @mimaaristide7151 3 роки тому

      @@statquest will this decision be made considering all the classifiers, taking a majority vote as in the classic bagging method?

    • @statquest
      @statquest  3 роки тому

      @@mimaaristide7151 This is a "boosting" method, which is different from a "bagging" method. The gist of boosting is at 22:52. However, more details are here: ua-cam.com/video/3CC4N4z3GJc/v-deo.html

    • @mimaaristide7151
      @mimaaristide7151 3 роки тому +1

      @@statquest yes sir, you're right, sorry for this confusion. And once again thank you for your wonderful videos...
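
      As a code-level illustration of that answer, here is a minimal sketch of turning the final predicted probabilities into classifications with a threshold (the data and the 0.5 threshold are just examples):

      ```python
      # Sketch: thresholding predicted probabilities into final classes.
      import numpy as np
      from xgboost import XGBClassifier

      rng = np.random.default_rng(2)
      X = rng.normal(size=(100, 2))
      y = (X[:, 0] > 0).astype(int)

      model = XGBClassifier(n_estimators=20).fit(X, y)

      probs = model.predict_proba(X)[:, 1]           # probability of class 1
      threshold = 0.5                                # chosen by the user
      predictions = (probs > threshold).astype(int)  # above the threshold -> class 1
      print(predictions[:10])
      ```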

  • @Azuremastery
    @Azuremastery 3 роки тому +1

    Thanks!

  • @n0pe1101
    @n0pe1101 4 роки тому

    Can you do a video about Gaussian Process Regression/Classification?

  • @arpitsolanki5254
    @arpitsolanki5254 Рік тому

    How do we assume that the output value from the tree fitted to residuals can directly be added to the log odds of the initial prediction?

    • @statquest
      @statquest  Рік тому +1

      That's a good question and, to be honest, I don't know. I've tried to work it out and failed - so maybe someone else can weigh in.

    • @arpitsolanki5254
      @arpitsolanki5254 Рік тому +1

      @@statquest thanks so much Josh! This is hands down the best channel on UA-cam to learn about machine learning!

    • @statquest
      @statquest  Рік тому

      @@arpitsolanki5254 Thanks!

  • @lorenzodagostino5338
    @lorenzodagostino5338 Рік тому

    Hey Josh. First of all, I just want to thank you for your amazing work. I have a question: how do these concepts fit the multi-label classification problem?

    • @statquest
      @statquest  Рік тому +1

      You can swap out the loss function for one that works better with multiple classification, like cross entropy.

    • @lorenzodagostino5338
      @lorenzodagostino5338 Рік тому

      @@statquest Thank you for the reply. Ok good, but in that case how do you interpret the output of the leaves?

    • @statquest
      @statquest  Рік тому

      @@lorenzodagostino5338 Good question! I'm not sure, but I'd bet it's the exact same as what we are doing here - regardless of the class (binary in this case) the output of the leaves is the average of the residuals in that leaf.

    • @lorenzodagostino5338
      @lorenzodagostino5338 Рік тому

      @@statquest Clear. Thank you for the answer. One last thing. Have you ever dealt with online machine learning? It would be very interesting to understand more about Hoeffding trees, maybe there might be a possibility to adapt xgboost to online machine learning by implementing its trees in this way...

    • @statquest
      @statquest  Рік тому

      @@lorenzodagostino5338 I'll keep that in mind.

  • @zeus1082
    @zeus1082 Рік тому

    Thank you for the explanation. Why are we using different decision nodes for each new tree?
    Entropy is calculated independently of the residuals, right?

    • @statquest
      @statquest  Рік тому

      I'm not sure I understand your question, however, if you want to learn about the underlying details (i.e. see more of the math) of how XGBoost works, see: ua-cam.com/video/ZVFeW798-2I/v-deo.html

    • @zeus1082
      @zeus1082 Рік тому

      @@statquest for each new tree, the root node is different in the video, so I'm confused about why the root nodes are different, since we are using the same gini or entropy to decide the root node.

    • @statquest
      @statquest  Рік тому

      @@zeus1082 Every time we build a tree, we update the residuals. Different residuals = different trees.

    • @zeus1082
      @zeus1082 Рік тому

      @@statquest aren't we deciding the split nodes based on gini? Please point me to a video/timestamp where we decide the split node based on the residuals.

    • @statquest
      @statquest  Рік тому

      @@zeus1082 See: 3:20. That said, I appreciate your interest in these topics, but it would help me if you could watch the videos, all of them (including the 4 gradient boost videos), in order, and maybe watch them a few times before asking more questions. It is possible that my videos are not the best learning tool for you, so I would also consider seeing how other people teach this topic, or consider reading the original manuscript.

  • @junghyunlee781
    @junghyunlee781 2 роки тому

    Thanks for the video. 12:58 So you mean 'cover' is equal to the hyperparameter 'min_child_weight'?

  • @raj345to
    @raj345to 2 роки тому +1

    Which video-making tool do you use? ...It's so cool.

    • @statquest
      @statquest  2 роки тому +1

      I answer these questions in this video: ua-cam.com/video/crLXJG-EAhk/v-deo.html

  • @himanshumangoli6708
    @himanshumangoli6708 2 роки тому +1

    I wish you had been my teacher in my college days. Then, instead of watching your videos, I would be able to create them.

  • @deana00
    @deana00 9 місяців тому

    Hi, thanks for your great video! But I have a question here:
    why is the way to get the initial prediction in xgboost different from gradient boosting? In gradient boosting you were using the log(odds), but in xgboost you set it to 0.5; am I missing something?

    • @statquest
      @statquest  9 місяців тому

      Gradient Boosting and XGBoost start differently. In gradient boosting, we use the training data to make an initial estimate (log(odds) or probability) for the initial prediction. In contrast, with XGBoost, we just set the initial prediction to 0.5.

    • @deana00
      @deana00 9 місяців тому

      @@statquest I'm sorry, but I still dont get it, why is it so?

    • @statquest
      @statquest  9 місяців тому

      @@deana00 Because that is how they define it in the XGBoost manuscript.

    • @deana00
      @deana00 9 місяців тому

      @@statquest ahh, thank you for your answer. do you plan to make a video about lightgbm? or histogram-based algorithm?

    • @statquest
      @statquest  9 місяців тому

      @@deana00 At some point I'd like to make some videos about lightGBM. It's just a matter of finding the time to do them.
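
      For completeness, the 0.5 initial prediction discussed earlier in this thread corresponds to XGBoost's base_score parameter (per the XGBoost documentation), which has historically defaulted to 0.5 and can be changed if desired; a minimal sketch:

      ```python
      # Sketch: the initial prediction is exposed as base_score.
      from xgboost import XGBClassifier

      model = XGBClassifier(base_score=0.5, n_estimators=100)  # 0.5 matches the video
      ```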

  • @nishidutta3484
    @nishidutta3484 3 роки тому

    How is dosage selected as the first split and not any other variable? Is it on the basis of gini impurity?

    • @statquest
      @statquest  3 роки тому

      Dosage is selected because it is the only variable. If there were more variables, we would pick the variable (and associated threshold) that gave us the largest value for Gain.