Machine Learning in R using caret: GBM (Gradient Boosting Machine) vs. Random Forest

  • Published 17 Sep 2024

COMMENTS • 7

  • @pedroojedamay2640
    @pedroojedamay2640 5 years ago

    Thanks for the video, the concepts are explained clearly. Do you have the entire R script in a repository somewhere?

  • @MontereyBayJack
    @MontereyBayJack 6 years ago

    Nice work and good presentation. Two questions: The Random Forest provides an OOB (out-of-bag) error rate, obviating the need for cross validation. Did you look at that rate? And did your colleague try to tune the Random Forest model? This is often unnecessary, but it would provide an apples-to-apples comparison of methods in this case.

    • @StatistikinDD
      @StatistikinDD  6 years ago +2

      Hi Jack,
      good questions!
      OOB and cross validation are different resampling methods. Repeated cross validation can reduce variance, so I believe it makes sense to use repeated cv rather than relying on OOB alone. I chose to compare repeated cv results. In caret's trainControl object, you can specify the resampling method, which can be OOB, cv, or repeated cv, among others (see the sketch after this reply).
      Well noted: the Random Forest model was not tuned beyond caret's default train parameter choices here. I chose to focus on tuning the GBM after comparing the models as tuned by caret's defaults. I found tuning the GBM more interesting (and challenging), as caret offers four tuning parameters for it. For Random Forests, caret only tunes the mtry parameter (the number of randomly selected predictors to choose from at each split). Caret tried 2, 7, and 13 and selected mtry = 7. We can specify a finer grid (e.g. all possible mtry values from 1 to 13) and have caret compare all of them (a grid sketch also follows below). I got mtry = 6 and a slightly better model (though still with a marginally higher RMSE than the GBM model trained via custom grid parameter search).
      My intention was not to say that GBM is a better algorithm than RF. My point was rather to show how a model can be improved beyond the default tuning parameters.
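
      As a minimal illustration of the trainControl options mentioned in this reply (the resampling choices come from the reply; the specific fold counts are assumptions, not from the video):

      library(caret)

      # Repeated 10-fold cross validation, repeated 3 times
      ctrl_cv <- trainControl(method = "repeatedcv", number = 10, repeats = 3)

      # Out-of-bag error instead (available for bagged models such as Random Forest)
      ctrl_oob <- trainControl(method = "oob")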
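
      And a hedged sketch of the finer mtry grid described above, assuming 13 predictors; the formula and data frame names are placeholders:

      rf_grid <- expand.grid(mtry = 1:13)   # all candidate mtry values

      set.seed(42)
      rf_fit <- train(
        y ~ .,                  # placeholder formula
        data = train_data,      # hypothetical training data frame
        method = "rf",
        trControl = ctrl_cv,    # repeated cv control from the sketch above
        tuneGrid = rf_grid
      )
      rf_fit$bestTune           # the reply above reports mtry = 6 on a finer grid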

  • @craft.for.everyone
    @craft.for.everyone 4 years ago

    Superb explanation! Can you extend this and explain how eXtreme Gradient Boosting improves on Gradient Boosting in R? I am eager to see all your other videos, but they are not in English.
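
    For context, caret can fit XGBoost through the same interface, so the comparison in the video could be extended to it. A minimal sketch, reusing the same hypothetical training data as above:

    library(caret)

    set.seed(42)
    xgb_fit <- train(
      y ~ .,                # placeholder formula
      data = train_data,    # hypothetical training data frame
      method = "xgbTree",   # XGBoost via caret (requires the xgboost package)
      trControl = trainControl(method = "repeatedcv", number = 10, repeats = 3)
    )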

  • @teamseokbaez7478
    @teamseokbaez7478 5 years ago

    How do you look at the overfitting issue at the end of the comparison? With RMSE it obviously looks like the custom GBM was the most accurate model, but in terms of overfitting, how do you evaluate across all these methods?

    • @stanislavmartsenyuk
      @stanislavmartsenyuk 4 years ago

      He is doing this in caret with cross validation; see the 'trainControl' params. The optimal parameters were found via cross validation, hence such a model is fairly robust to overfitting. But I should note that CV cannot magically remove the problem of overfitting entirely.
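
      One way to put the models on a common footing is caret's resamples(), which collects the cross-validated metrics of models fitted with the same trainControl. A hedged sketch; the fitted model objects gbm_fit and rf_fit are assumptions:

      results <- resamples(list(GBM = gbm_fit, RF = rf_fit))
      summary(results)   # RMSE, MAE and Rsquared across resamples
      bwplot(results)    # boxplots of the resampled performance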

    • @StatistikinDD
      @StatistikinDD  4 years ago

      Agreed, but I have found that a single test data set can also be misleading, i.e. repeating the evaluation on different test data can lead to quite different results.
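
      To illustrate that point, a hypothetical sketch of repeated random train/test splits; the data frame df and outcome y are placeholders, not from the video:

      library(caret)

      rmses <- sapply(1:10, function(i) {
        set.seed(i)
        idx   <- createDataPartition(df$y, p = 0.8, list = FALSE)
        fit   <- train(y ~ ., data = df[idx, ], method = "gbm", verbose = FALSE)
        preds <- predict(fit, newdata = df[-idx, ])
        RMSE(preds, df[-idx, "y"])
      })
      summary(rmses)   # the spread shows how much a single split can mislead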