XGBoost in Python (Hyperparameter Tuning)

  • Published 21 Dec 2024

COMMENTS • 105

  • @jaripeltola
    @jaripeltola 2 years ago +3

    This presentation is the best overall view of the most important XGBoost model parameters I have seen.

  • @kmnm9463
    @kmnm9463 3 years ago +5

    Hi Ashok,
    What I think is that there is no need to check training accuracy; it is a redundant approach. The reason is that the model is trained on the training data, so whatever hyperparameter tuning we do, the training accuracy is likely to be close to 1.0. The better approach is to focus on the test data: in a real-world scenario, we would feed unseen data to the model and then fine-tune the hyperparameters. Thanks for the tutorial.
    Thanks from KM

    • @DataMites
      @DataMites  3 years ago

      Thank you

    • @darrencr1987
      @darrencr1987 1 year ago +1

      Just for discussion… I think the purpose of calculating training performance is to compare it with test performance and see if there is any overfitting; otherwise, how would you know? Also, I don't think accuracy is a good measure here; AUC might be a better one. Just my 2 cents.
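
      For illustration, a minimal sketch of that train-vs-test AUC comparison (dataset and parameters are placeholders, not from the video):

      # Sketch: comparing train and test AUC to spot overfitting (illustrative)
      from sklearn.datasets import load_breast_cancer
      from sklearn.metrics import roc_auc_score
      from sklearn.model_selection import train_test_split
      from xgboost import XGBClassifier

      X, y = load_breast_cancer(return_X_y=True)
      X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

      model = XGBClassifier(max_depth=3).fit(X_train, y_train)
      # A large gap between the two scores suggests overfitting
      print('train AUC:', roc_auc_score(y_train, model.predict_proba(X_train)[:, 1]))
      print('test AUC: ', roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))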

  • @satishb9975
    @satishb9975 1 year ago

    Thank you, an excellent and detailed explanation of each parameter for hyperparameter tuning, explained very well. I finally got the concept of hyperparameter tuning.

    • @DataMites
      @DataMites  1 year ago

      Thank you, keep supporting!

  • @prakharbaheti4055
    @prakharbaheti4055 4 years ago +4

    Great tutorial, exact and to the point.

  • @sugandhchauhan1900
    @sugandhchauhan1900 2 years ago +1

    Great video. Could you help me fine-tune my model, please? I am getting really low training and testing accuracy.

  • @nasifosmanshuvra8607
    @nasifosmanshuvra8607 2 years ago +1

    Great explanation, Sir! How can I provide batches of images using a data generator so that an XGBClassifier model can fit on the images and labels?

    • @DataMites
      @DataMites  2 years ago

      Kindly refer to this: github.com/bnsreenu/python_for_microscopists/blob/master/195_xgboost_for_image_classification_using_VGG16.py
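
      A rough sketch of the approach in that link: use a pretrained CNN as a fixed feature extractor, then fit XGBoost on the extracted features (assumes TensorFlow is installed; the random arrays are placeholders for real images and labels):

      # Sketch: CNN features -> XGBoost (illustrative; real images would also need
      # tensorflow.keras.applications.vgg16.preprocess_input)
      import numpy as np
      from tensorflow.keras.applications import VGG16
      from xgboost import XGBClassifier

      # Pretrained VGG16 without its classification head, global-average-pooled
      extractor = VGG16(weights='imagenet', include_top=False, pooling='avg',
                        input_shape=(224, 224, 3))

      images = np.random.rand(16, 224, 224, 3).astype('float32')  # stand-in image batch
      labels = np.random.randint(0, 2, size=16)                   # stand-in labels

      features = extractor.predict(images)   # shape (16, 512)
      XGBClassifier().fit(features, labels)  # XGBoost trains on the extracted features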

  • @PhenomenalInitiations
    @PhenomenalInitiations 3 years ago +1

    Sir, I want to use softmax as the objective, and I have 4 classes in the dependent variable. How do I make XGBoost understand that there are 4 such classes? Please reply.

    • @DataMites
      @DataMites  3 years ago +1

      Hi Sai Akhil Katukam, thanks for your comment.
      If you want to use softmax and define the number of classes in XGBoost, you need to pass the following parameters while building the model:
      from xgboost.sklearn import XGBClassifier
      XGBClassifier(objective='multi:softmax', num_class=4, ...)
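
      A minimal runnable sketch of the above (the synthetic 4-class dataset is illustrative, not from the video):

      # Sketch: multi-class XGBoost with the softmax objective (illustrative)
      from sklearn.datasets import make_classification
      from sklearn.model_selection import train_test_split
      from xgboost import XGBClassifier

      # Synthetic 4-class dataset standing in for the commenter's data
      X, y = make_classification(n_samples=500, n_classes=4, n_informative=6,
                                 random_state=42)
      X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

      # 'multi:softmax' predicts class labels directly; num_class must match the data
      # (recent versions of the sklearn wrapper can also infer it from y)
      model = XGBClassifier(objective='multi:softmax', num_class=4)
      model.fit(X_train, y_train)
      print(model.score(X_test, y_test))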

  • @welcomethanks5192
    @welcomethanks5192 2 years ago

    Why does your digital pad have pressure sensitivity? My Wacom Intuos doesn't.

    • @DataMites
      @DataMites  2 years ago

      Can you rephrase your question?

    • @welcomethanks5192
      @welcomethanks5192 2 years ago

      @@DataMites I mean you are writing something with a digital pad and the strokes can have different thicknesses, but my digital pad only works like a marker pen (all the same thickness)...

    • @DataMites
      @DataMites  1 year ago

      @@welcomethanks5192 You will have an option to change the thickness.

  • @NextVersionOfYou
    @NextVersionOfYou 3 years ago

    Thank you. Is what's said regarding random state true for regression problems as well?

  • @rafsunahmad4855
    @rafsunahmad4855 3 years ago

    Is knowing the math behind an algorithm a must, or is knowing how the algorithm works enough? Please give a reply.

    • @DataMites
      @DataMites  3 years ago +1

      Hi Rafsun Ahmad, thanks for your comment.
      It is necessary to know the math and other background behind any algorithm so that you have a better idea of why and how that algorithm should be used.

    • @rafsunahmad4855
      @rafsunahmad4855 3 years ago

      Thank you very much

  • @kuox0005
    @kuox0005 4 years ago

    It appears that the target variable, y, is limited to an n×1 array for making predictions with XGBoost. Could the target variable, y, be an n×m array, where m > 1?

    • @DataMites
      @DataMites  4 years ago

      Yes, it is possible: you can use MultiOutputRegressor as a wrapper around XGBoost.
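
      A minimal sketch of that wrapper (the random data and shapes are illustrative):

      # Sketch: n x m targets via MultiOutputRegressor around XGBRegressor (illustrative)
      import numpy as np
      from sklearn.multioutput import MultiOutputRegressor
      from xgboost import XGBRegressor

      rng = np.random.default_rng(0)
      X = rng.normal(size=(200, 5))
      Y = np.column_stack([X @ rng.normal(size=5),
                           X @ rng.normal(size=5)])  # n x 2 target

      # One XGBRegressor is fitted per output column
      model = MultiOutputRegressor(XGBRegressor(n_estimators=100, max_depth=3))
      model.fit(X, Y)
      print(model.predict(X[:3]).shape)  # (3, 2)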

  • @praveenk302
    @praveenk302 3 years ago

    What is min_child_weight and what is its significance?

    • @DataMites
      @DataMites  3 years ago

      Hi, please refer to this documentation: xgboost.readthedocs.io/en/latest/parameter.html
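
      In short, min_child_weight is the minimum sum of instance weights (hessian) needed in a child node, so larger values make splits more conservative. A hedged sketch of tuning it (dataset and values are illustrative):

      # Sketch: grid-searching min_child_weight (illustrative)
      from sklearn.datasets import load_breast_cancer
      from sklearn.model_selection import GridSearchCV
      from xgboost import XGBClassifier

      X, y = load_breast_cancer(return_X_y=True)

      # Larger min_child_weight regularizes the trees by blocking small splits
      grid = GridSearchCV(XGBClassifier(), {'min_child_weight': [1, 5, 10]}, cv=5)
      grid.fit(X, y)
      print(grid.best_params_, grid.best_score_)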

  • @majorcemp3612
    @majorcemp3612 4 years ago +1

    Hi, what about gamma? Don't you use it? I think it's the only important parameter missing here.

    • @vikasrajput1957
      @vikasrajput1957 4 years ago +1

      I guess since he is already using a max_depth of just 2-3, he doesn't need much of a pruning parameter for the trees. Your thoughts?

    • @majorcemp3612
      @majorcemp3612 4 years ago

      @@vikasrajput1957 Surely, and you can tune parameters differently with gamma too; I just think that for educational purposes he should mention it 😅😊

    • @DataMites
      @DataMites  4 years ago +1

      Please refer to stats.stackexchange.com/questions/418687/gamma-parameter-in-xgboost
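
      For context, gamma is the minimum loss reduction required to make a further split, i.e. a pruning threshold. A hedged sketch of its effect (dataset and values are illustrative):

      # Sketch: effect of gamma on cross-validated accuracy (illustrative)
      from sklearn.datasets import load_breast_cancer
      from sklearn.model_selection import cross_val_score
      from xgboost import XGBClassifier

      X, y = load_breast_cancer(return_X_y=True)

      # Larger gamma demands a bigger loss reduction before a node is split
      for gamma in (0, 1, 5):
          scores = cross_val_score(XGBClassifier(gamma=gamma), X, y, cv=5)
          print(gamma, scores.mean())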

  • @carlmemes9763
    @carlmemes9763 3 years ago

    Sir, does this problem also occur in gradient boosting? Am I correct?
    If it does, can we handle it as you explained?
    If not, what should we do, Sir?
    Thank you, Sir, your videos are amazing ❤️

    • @DataMites
      @DataMites  3 years ago

      Hi, thank you for your comment. Can you clarify which problem you are trying to figure out?

    • @carlmemes9763
      @carlmemes9763 3 years ago

      @@DataMites overfitting sir....

  • @arjungoud3450
    @arjungoud3450 2 years ago

    There is an explanation of what they are; hoping you would make a video in more detail.

  • @estebanbraganza1067
    @estebanbraganza1067 4 years ago +1

    Amazing video. It would be better if you could use a different dataset so we can see the effects of the different parameters more clearly.

    • @DataMites
      @DataMites  4 years ago

      Sure, we will do that; this video is meant to explain the basic concept.

  • @carolinnerabbi965
    @carolinnerabbi965 4 years ago +2

    Very good explanation and test strategy, thanks!

  • @dehumanizer668
    @dehumanizer668 3 years ago

    Nice one 👍🏼

  • @gauravverma365
    @gauravverma365 3 years ago

    Such an informative video about tuning XGBoost hyperparameters. My question is: can we extract a mathematical equation relating the input and output parameters? For instance, I have successfully applied XGBoost regression to predict a y parameter using X1, X2, X3, X4 input parameters; now how can I get XGBoost's predicting equation between those input and output parameters? Please provide information on this.

    • @DataMites
      @DataMites  3 years ago

      No, we cannot extract a mathematical equation; the model is an ensemble of many decision trees rather than a single closed-form formula.

  • @souptikmukhopadhyay6531
    @souptikmukhopadhyay6531 2 years ago

    If your train accuracy is 1 and test accuracy is 0.97, how can you say that the model is overfitted? The model is clearly performing very well on the test data. What you can do is perform k-fold cross-validation to be more sure that it gives high accuracy on various test sets... But having high train and test accuracies is not overfitting; it means that the data is relatively simple for the model to learn.

    • @DataMites
      @DataMites  2 years ago +1

      Yes, it could be a simple dataset, but we can validate the model using cross-validation to see if it overfits.
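
      A minimal sketch of that cross-validation check (dataset and parameters are illustrative):

      # Sketch: k-fold cross-validation to see whether high accuracy holds up (illustrative)
      from sklearn.datasets import load_breast_cancer
      from sklearn.model_selection import cross_val_score
      from xgboost import XGBClassifier

      X, y = load_breast_cancer(return_X_y=True)

      # Consistently high fold scores suggest a simple dataset rather than overfitting
      scores = cross_val_score(XGBClassifier(max_depth=3), X, y, cv=5)
      print(scores.mean(), scores.std())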

  • @avaolsen1339
    @avaolsen1339 3 years ago

    Thank you, Mr. Veda! This is really helpful. I have a question: is there an efficient way to tune these parameters automatically?

    • @DataMites
      @DataMites  3 years ago +1

      Hi Ava Olsen, you can automate hyperparameter tuning using Python scripts, or you can have a look at AutoML.

    • @youmadvids
      @youmadvids 3 years ago

      @@DataMites Hi, what about GridCV?

    • @allalzaid1872
      @allalzaid1872 2 years ago +1

      grid search cv

    • @avaolsen1339
      @avaolsen1339 2 years ago

      But grid search is resource/time consuming. Is there a more efficient way to do it?
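
      One common, cheaper alternative (not mentioned in the thread) is RandomizedSearchCV, which samples a fixed number of parameter settings instead of trying the whole grid. A hedged sketch:

      # Sketch: RandomizedSearchCV as a cheaper alternative to a full grid (illustrative)
      from sklearn.datasets import load_breast_cancer
      from sklearn.model_selection import RandomizedSearchCV
      from xgboost import XGBClassifier

      X, y = load_breast_cancer(return_X_y=True)

      param_distributions = {
          'max_depth': [2, 3, 4, 5],
          'learning_rate': [0.01, 0.05, 0.1, 0.3],
          'n_estimators': [100, 200, 400],
      }
      # n_iter caps the number of settings tried, unlike an exhaustive grid search
      search = RandomizedSearchCV(XGBClassifier(), param_distributions,
                                  n_iter=10, cv=5, random_state=42)
      search.fit(X, y)
      print(search.best_params_)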

  • @gauravrajpal1994
    @gauravrajpal1994 4 years ago

    Very good explanation and test strategy, thank you so much, sir.

  • @wimavlogs6826
    @wimavlogs6826 4 years ago

    Can you do a full video on time series forecasting for future prediction using previous data (using XGBoost)?

    • @DataMites
      @DataMites  3 years ago

      We will definitely do one in the future. Thank you

  • @vikasrajput1957
    @vikasrajput1957 4 years ago

    Increasing the learning rate makes the algorithm learn faster, but at the cost of accuracy; it does not decrease the sensitivity contributed by a single point by a great amount, and thus does not generalize the model well and leads to overfitting in some cases.

    • @DataMites
      @DataMites  4 years ago +1

      That is what convergence of the algorithm means.
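
      A minimal sketch of that learning-rate trade-off (dataset and values are illustrative):

      # Sketch: test accuracy at different learning rates (illustrative)
      from sklearn.datasets import load_breast_cancer
      from sklearn.model_selection import train_test_split
      from xgboost import XGBClassifier

      X, y = load_breast_cancer(return_X_y=True)
      X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

      # With a fixed number of trees, very large steps tend to hurt generalization
      for lr in (0.01, 0.1, 1.0):
          model = XGBClassifier(learning_rate=lr, n_estimators=100)
          model.fit(X_train, y_train)
          print(lr, model.score(X_test, y_test))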

  • @davintjandra4226
    @davintjandra4226 4 years ago

    Hey, I've got a question: if I use a correlation matrix and manually deselect the features that are ambiguous (neutral), can I still set colsample to 1? Great tutorial, man.

    • @DataMites
      @DataMites  4 years ago

      Yes, you can, but check how your model performs.

  • @Nixterrex
    @Nixterrex 3 years ago

    Thank you! Are the parameters for XGBClassifier similar to those for XGBRegressor? I can look at the documentation on my own, but it's late at night for me and I can't sleep thinking about it, but I also don't want to get sucked back into my project (I fixate XD) and I need to sleep hahah…
    Thank you again though! The video really helped me. I'm only 3 months into learning data science with Python, so it feels good every time I finally piece things together.

    • @DataMites
      @DataMites  3 years ago

      Hi Niko Blanco, yes, you can find some similar parameters in XGBClassifier and XGBRegressor. Thank you

  • @analuciademoraislimalucial6039
    @analuciademoraislimalucial6039 3 years ago

    Thanks, Teacher. Love the explanation.

  • @qazdata-science4420
    @qazdata-science4420 4 years ago +1

    Amazing Tutorial!!!!

  • @xolanijozi8375
    @xolanijozi8375 3 years ago

    This is great.

  • @AkshayArbune
    @AkshayArbune 7 months ago

    Very helpful video

    • @DataMites
      @DataMites  6 months ago

      Glad it was helpful!

  • @madhur089
    @madhur089 3 years ago

    Thank you, this helped my understanding.

  • @prajothshetty6848
    @prajothshetty6848 4 years ago

    Great video, sir!
    Straight and to-the-point explanation.
    Sir, where is the link to the code or the repository?

    • @DataMites
      @DataMites  4 years ago

      We request you to pause the video and type the code; we will soon update the code in the description.

  • @prakashaiml8423
    @prakashaiml8423 4 years ago +1

    EXCELLENT EXPLANATION..

  • @pradeepsharma30
    @pradeepsharma30 4 years ago

    This is amazing stuff!!

  • @planetscore
    @planetscore 4 years ago +2

    What chaos!

  • @mdfahd1795
    @mdfahd1795 4 years ago

    Keep it up bro

  • @wangrichard2140
    @wangrichard2140 3 years ago

    Perfect!

  • @2broke2code
    @2broke2code 4 years ago +1

    Starts at 14:50

  • @aiinabox1260
    @aiinabox1260 2 years ago

    Training accuracy was 1; don't you think it's overfitting?

    • @DataMites
      @DataMites  2 years ago

      Yes. Hyperparameter tuning will help to overcome that, but as mentioned, this is a very small dataset.

  • @ltrahul1016
    @ltrahul1016 2 years ago

    nice

  • @johnmasalu8703
    @johnmasalu8703 3 years ago

    Fruitful and informative training. Please share your email for clarification on some of the issues.

    • @DataMites
      @DataMites  3 years ago

      Hi John Masalu, thanks for reaching out to us.
      You can share all your queries and doubts here in the comment section; we will reply in the comments.

  • @dineshpramanik2571
    @dineshpramanik2571 4 years ago

    Please keep your microphone near your mouth... can't hear properly.

  • @shashankgpt94
    @shashankgpt94 3 years ago

    You could have chosen a better dataset.

    • @DataMites
      @DataMites  3 years ago

      Hi Shashank Gupta, thank you for your suggestion, but this dataset works well for this task.

  • @nassimbouhaouita1697
    @nassimbouhaouita1697 2 years ago

    The data was too easy for the model.

    • @DataMites
      @DataMites  2 years ago

      Yes. This video focuses on the hyperparameters of XGBoost.

  • @lextor99
    @lextor99 5 years ago

    It would be better to do this on a real dataset; that's how this video could be improved.

    • @DataMites
      @DataMites  4 years ago +1

      Aleksei, do you mean a large dataset? The one used in this video is a real dataset, contributed by the University of Wisconsin in 1995. Ref: archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)

    • @lextor99
      @lextor99 4 years ago

      @@DataMites Yeah, I mean something more realistic and more challenging.

    • @DataMites
      @DataMites  4 years ago

      @@lextor99 Sure.

    • @nathan_falkon36
      @nathan_falkon36 4 years ago

      It's enough for its teaching purpose, I think.
