Gradient Boosting Method and Random Forest - Mark Landry

  • Published 27 Jan 2025

COMMENTS • 12

  • @pbharadwaj3
    @pbharadwaj3 7 years ago +4

    The coding starts at 16:38

  • @geoffreyanderson4719
    @geoffreyanderson4719 8 years ago +1

    Mr. Landry reviewed accuracy, @27:20, on a validation dataset that was used during training to tune the model. That is not a realistic error estimate -- it is too optimistic -- which matters when the hit_ratio_table is examined. It is better to estimate error on new data rather than on data that was used to tune the model.

    • @marklandry2140
      @marklandry2140 8 years ago

      Hi @Geoffrey Anderson. A final/new test set is used, actually. This is introduced at about 18:00, and discussed at more length at 32:00, where it is scored for the first time. It trains on 60%, uses 20% for an internal validation set (early stopping), and the final 20% to evaluate when all tuning is complete.
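
      The 60/20/20 scheme Mark describes (train / early-stopping validation / held-out test) can be sketched in plain Python. This is only an illustration of the partitioning idea with made-up row counts; in the talk the split is done on an H2O frame, not like this:

      ```python
      import random

      def split_indices(n, fracs=(0.6, 0.2, 0.2), seed=42):
          """Shuffle row indices and partition them into train/valid/test."""
          rng = random.Random(seed)
          idx = list(range(n))
          rng.shuffle(idx)
          n_train = int(n * fracs[0])
          n_valid = int(n * fracs[1])
          train = idx[:n_train]
          valid = idx[n_train:n_train + n_valid]
          test = idx[n_train + n_valid:]
          return train, valid, test

      # 600 rows to train on, 200 for early stopping, 200 held out
      # and scored exactly once, after all tuning is complete.
      train, valid, test = split_indices(1000)
      ```

      The point of the third partition is the one Mark makes above: it is never seen during tuning, so scoring it once gives an honest error estimate.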

  • @poojawalavalkar355
    @poojawalavalkar355 7 years ago

    Beautifully explained. Thanks Mark!

  • @sarthakyadav371
    @sarthakyadav371 4 years ago

    You are awesome Mark!

  • @kojikitagawa7333
    @kojikitagawa7333 6 years ago

    Could someone please elaborate a little more on the hit ratio table starting at 23:45? I am a little confused on what the score represents at k >= 2
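
    For readers with the same question: in H2O's multiclass output, the hit ratio at k is the fraction of rows whose true class appears among the model's k highest-probability predicted classes, so k=1 is ordinary accuracy and the ratio can only grow as k increases. A minimal sketch of that calculation, with made-up labels and probabilities (not the model from the talk):

    ```python
    def hit_ratio(y_true, class_probs, k):
        """Fraction of rows whose true label is among the k most probable classes."""
        hits = 0
        for label, probs in zip(y_true, class_probs):
            top_k = sorted(probs, key=probs.get, reverse=True)[:k]
            hits += label in top_k
        return hits / len(y_true)

    y_true = ["a", "b", "c"]
    probs = [
        {"a": 0.5, "b": 0.3, "c": 0.2},  # true class is the top guess: hit at k=1
        {"a": 0.6, "b": 0.3, "c": 0.1},  # true class "b" only enters the top 2
        {"a": 0.5, "b": 0.3, "c": 0.2},  # true class "c" only enters the top 3
    ]
    # hit_ratio rises from 1/3 at k=1 to 2/3 at k=2 to 1.0 at k=3
    ```

    So a score at k=2 answers "how often was the right class in the model's top two guesses", which is useful when the classes are easily confused.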

  • @nomadjoy
    @nomadjoy 6 years ago +1

    Hi Mark, this was extremely helpful. Could you please share the GitHub link for it? Thanks.

  • @anitamishra04
    @anitamishra04 6 years ago

    The best explanation

  • @chsuresh009
    @chsuresh009 8 years ago

    Hi Mark, I could not find anywhere how to determine the optimal number of rounds for GBM. With xgboost's cv we learn at which iteration the model reached optimal loss, but in H2O, even when I supply a validation set, a stopping metric (logloss), stopping rounds (150), and a stopping tolerance of 0.0001, it does not seem to stop; the number of trees is always whatever is set in ntrees.

    • @marklandry2140
      @marklandry2140 8 years ago +1

      Hi @Suresh Chinta.
      A stopping_rounds of 150 is quite high. It may be valid in your case, but H2O will wait until the average score over 150 consecutive rounds is within the stopping tolerance (0.0001, it seems, in your case) of the average over the prior 150 consecutive rounds. And a round's size comes from score_tree_interval, which sets how many trees are added between scoring events (the default varies based on scoring-time estimation).
      For reference, I typically use 2 for stopping_rounds. I usually set ntrees to a nearly unattainable number (e.g. 2,000 or 10,000), drop the tolerance to 0, and set score_tree_interval to somewhere between 2 and 5. Those models typically stop well before the ntrees limit.
      In case it helps -- since the demo is intended to be fast for a live audience, which makes it a little less indicative of typical modeling -- this is the latest model I've run this week:
      gbm
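
      The stopping rule Mark describes can be sketched as a window comparison: stop once the mean of the last stopping_rounds validation scores no longer beats the mean of the preceding window by more than the tolerance. This is a simplified illustration of that convergence check, not H2O's actual implementation (in particular, each "score" here stands for one scoring event, i.e. one score_tree_interval's worth of trees):

      ```python
      def should_stop(scores, stopping_rounds, tolerance):
          """Stop when the mean of the last `stopping_rounds` scores (lower is
          better, e.g. logloss) has not improved on the mean of the preceding
          window of the same size by more than `tolerance`."""
          if len(scores) < 2 * stopping_rounds:
              return False  # not enough history for two full windows
          recent = sum(scores[-stopping_rounds:]) / stopping_rounds
          previous = sum(scores[-2 * stopping_rounds:-stopping_rounds]) / stopping_rounds
          return previous - recent <= tolerance

      # Hypothetical logloss history, measured once per scoring event;
      # improvement stalls in the last few entries.
      history = [0.9, 0.7, 0.55, 0.50, 0.4999, 0.4999, 0.4998, 0.4999]
      ```

      With stopping_rounds=2 and tolerance=0.0001 this history triggers a stop, while the early part of it does not. It also shows why stopping_rounds=150 rarely fires: the model must first accumulate 300 scoring events before the comparison can even be made.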

  • @hleljihen2007
    @hleljihen2007 5 years ago

    Thank you for the video, but could you please talk more slowly?

    • @Coral_dude
      @Coral_dude 4 years ago +1

      You can control the playback speed yourself with YouTube's controls.