How to Use Random Forest in R

  • Published Dec 3, 2024

COMMENTS • 11

  • @osamamohamed88
    @osamamohamed88 4 years ago

    Thanks a lot for your explanation.

  • @jackkay3388
    @jackkay3388 5 years ago

    Thank you thatRnerd! You are going to help me so much with my Graduate Thesis

  • @mohammedomor1458
    @mohammedomor1458 5 years ago +1

    Do you do tutorials on predictive models, and is the code similar to Matlab?

    • @thatrnerd4265
      @thatrnerd4265  5 years ago

      I have another video on decision trees, and I plan on doing more predictive models in the future. I haven't worked with Matlab, so I'm not sure how close the code would be.

  • @saikatkar547
    @saikatkar547 4 years ago

    Well explained.

  • @rynokleinhans3689
    @rynokleinhans3689 4 years ago

    I am busy with a project where I have 65 000 entries and 70 variables. What method would you propose for me to use to identify the most important predictors? Will I be able to use randomForest?

    • @thatrnerd4265
      @thatrnerd4265  4 years ago +1

      Yes, you can use randomForest for that. I would do what is done in the video with varImpPlot(rf), then keep the variables that are positive on the left graph (the mean decrease in accuracy). A value greater than zero means that shuffling the variable lowers accuracy, so the variable is helping the model. Depending on how well the top few variables predict, I would also consider a model with maybe only the top ten variables.
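
      The selection step described above can be sketched roughly as follows. This is only an illustration on the built-in iris data, not the code from the video: the seed, the variable names (imp, positive_vars, top_vars, rf_top), and the cutoff of ten are assumptions. It also assumes the forest is fitted with importance = TRUE, which is what makes the mean decrease in accuracy available.

```r
# Minimal sketch: rank predictors by permutation importance, then refit on a subset.
library(randomForest)

set.seed(42)  # arbitrary seed, only so the run is repeatable

# importance = TRUE asks randomForest to compute the mean decrease in accuracy
rf <- randomForest(Species ~ ., data = iris, importance = TRUE)

# Two-panel plot; the left panel is MeanDecreaseAccuracy
varImpPlot(rf)

# type = 1 selects the mean decrease in accuracy; [, 1] gives a named vector
imp <- importance(rf, type = 1)[, 1]
imp <- sort(imp, decreasing = TRUE)

# Variables with a positive score (permuting them hurts accuracy)
positive_vars <- names(imp)[imp > 0]

# At most the ten highest-ranked variables (iris only has four predictors)
top_vars <- head(names(imp), 10)

# Refit using only the selected predictors and compare the OOB error
rf_top <- randomForest(reformulate(top_vars, response = "Species"), data = iris)
print(rf_top)
```

      With 70 predictors, comparing the out-of-bag error printed for rf and rf_top is one way to judge whether the smaller model gives up much accuracy.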

    • @rynokleinhans3689
      @rynokleinhans3689 4 years ago

      @thatrnerd4265 Thank you for your response. I appreciate it :)

  • @PeterKingnz
    @PeterKingnz 4 years ago

    thatRnerd, you invite the viewer to follow along but don't make the code available. Straight away RStudio says "Error in tbl_df(iris) : could not find function "tbl_df"". So which library is that? I've never seen tbl_df() before.

    • @thatrnerd4265
      @thatrnerd4265  4 years ago

      Thank you for letting me know! The library you want for that function is dplyr. I use the tidyverse package, which loads dplyr along with a bunch of other packages that are very helpful for data science. That should definitely have been at the top with the other libraries. Note that tbl_df() is not saving anything that the random forest uses; it just gives a clean overview of the data.
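
      For anyone following along, the overview step might look something like the sketch below. One assumption to flag: in newer dplyr releases tbl_df() is deprecated, so this uses as_tibble() from the tibble package (installed with the tidyverse) rather than the exact call from the video. The name iris_tbl is just illustrative.

```r
# Minimal sketch: get a compact overview of iris before modelling.
library(tibble)  # also attached by library(tidyverse)

# as_tibble() plays the role tbl_df() had in the video
iris_tbl <- as_tibble(iris)

# Printing a tibble shows only the first rows plus each column's type
print(iris_tbl)
```

      Nothing later depends on iris_tbl; randomForest works on the plain iris data frame as well.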