Predict water availability in Sierra Leone with random forests

  • Published Jan 23, 2025

COMMENTS • 28

  • @bbvv8004 3 years ago +15

    The best R-based channel on YouTube! Is it possible to have a video with time series data?

    • @PeterHontaru 3 years ago +4

      Would also love a time series video :)

  • @Insipidityy 3 years ago +1

    The visualizations used are simple, elegant and impactful - absolutely love them. Also my first time learning of the "space" argument in facet_grid - definitely going to be using that more. Thank you Julia!

  • @cheninitayeb6590 3 years ago

    Very nice exploration of both the dataset and applied modeling. It is awesome to see your video, Julia.

  • @hugosantarem8314 3 years ago

    I really dig your videos. I learn a lot from you and your videos. You're an amazing expositoR!

  • @trevorschrotz_ncgcare 3 years ago

    Fantastic, Julia, thank you! Very helpful.

  • @PeterHontaru 3 years ago

    Really enjoying these videos, Julia, and learning a lot. Thank you!

  • @alelust7170 3 years ago +1

    I love your videos! Tks a lot!!!

  • @jferris 3 years ago

    A neat trick could be to predict on the unknown sources; in theory, when they are checked in the future, you would be able to compare.

    • @JuliaSilge 3 years ago

      Ah, I totally should have done that!

  • @mrsnakesss 3 years ago +1

    Why did you use bake(new_data = NULL) and not juice() at 36:23? (I thought it was a shortcut)

    • @JuliaSilge 3 years ago +2

      Either one works! We are trying to focus on using `bake()` in documentation, because feedback from people indicated it was confusing to have yet another function to do the same thing.

    • @mrsnakesss 3 years ago

      @JuliaSilge Okay, thanks! (And a big thank-you for your videos too!)
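
A minimal sketch of the equivalence discussed in this thread, assuming the recipes package is installed. The `mtcars` data, the object names, and the single `step_normalize()` step are stand-ins chosen for illustration; they are not the recipe from the video.

```r
library(recipes)

# Hypothetical stand-in recipe; the video's recipe uses different data and steps.
car_rec <- recipe(mpg ~ ., data = mtcars) %>%
  step_normalize(all_numeric_predictors()) %>%
  prep()  # estimate the normalization statistics from the training data

# Both calls return the processed training data stored in the prepped recipe.
baked  <- bake(car_rec, new_data = NULL)
juiced <- juice(car_rec)

# Compare the two results; TRUE means they match.
same_result <- isTRUE(all.equal(baked, juiced))
```

Because `bake()` also handles new data through the same argument, preferring it means there is only one function to remember.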

  • @christelleleitzingerphd7491 3 years ago

    Awesome video, thanks! I was wondering why you changed your set.seed().

    • @JuliaSilge 3 years ago +1

      I may call set.seed() more than strictly necessary for reproducible results, but I tend to call it every time I know that I'll be doing something that involves randomness. So for example, the initial split involves the RNG, and then creating the folds involves the RNG. I have used enough functions in R that change the RNG in ways I don't expect that I have some habits that may be overkill.

    • @christelleleitzingerphd7491 3 years ago

      @JuliaSilge I see! And you are not using the same set.seed() number (123, then 234). Is there a reason for that?

    • @JuliaSilge 3 years ago

      @christelleleitzingerphd7491 The value for the seed is not a special number; we just need to set it to something. It feels weird to me to continually set the seed to the same value within the same script, but again, that may be a habit from bad experiences in the past.

    • @christelleleitzingerphd7491 3 years ago

      @JuliaSilge Haha! Thank you so much for the answer! I don't know what the best approach is. I will do some research on that! Thanks! And again, I love your video!
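
The habit described in this thread can be sketched in base R. This is an illustration under assumptions: `sample()` stands in for the split and fold functions used in the video, and `make_split_and_folds()` and the `mtcars` data are hypothetical names chosen for the example. Only the seed values 123 and 234 come from the discussion.

```r
# Set the seed immediately before each step that uses the random number generator.
make_split_and_folds <- function(data) {
  set.seed(123)  # seed for the initial train/test split
  train_rows <- sample(nrow(data), size = floor(0.75 * nrow(data)))

  set.seed(234)  # a different, equally arbitrary seed for the fold assignment
  fold_id <- sample(rep(1:5, length.out = length(train_rows)))

  list(train_rows = train_rows, fold_id = fold_id)
}

first_run  <- make_split_and_folds(mtcars)
second_run <- make_split_and_folds(mtcars)

# Re-running the whole procedure reproduces the same split and the same folds.
reproducible <- identical(first_run, second_run)
```

Seeding each step separately also means the folds stay the same even if code that consumes random numbers is later added between the two steps.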

  • @alelust7170 3 years ago

    Julia, could you make a video using LIME on the workflow? Thanks

  • @ecbytes 3 years ago

    At 14:00, what do the two periods in `aes(y = ..density..)` mean?

    • @JuliaSilge 3 years ago +1

      It is saying to put density on the y axis instead of counts. It looks like a newer way to do this is to use `after_stat()`:
      ggplot2.tidyverse.org/reference/geom_histogram.html

    • @ecbytes 3 years ago

      @JuliaSilge I had never seen that notation before. Thank you, Julia! I learn so much from these videos!
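
A small sketch of the `after_stat()` form mentioned in this thread, assuming ggplot2 3.3.0 or later. The built-in `faithful` data and the object names are illustrative choices, not the plot from the video.

```r
library(ggplot2)

# Older notation, equivalent in intent:
#   geom_histogram(aes(y = ..density..))
# after_stat(density) maps y to the density computed by the histogram stat
# instead of the default count.
p_density <- ggplot(faithful, aes(x = waiting)) +
  geom_histogram(aes(y = after_stat(density)), bins = 30)

# Inspect the computed values: on the density scale the bar areas sum to 1.
hist_data  <- layer_data(p_density)
total_area <- sum(hist_data$density * (hist_data$xmax - hist_data$xmin))
```

Printing `p_density` draws the histogram; `total_area` is only there to check that the y axis is on a density scale rather than a count scale.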

  • @mynonaanonym8177 3 years ago

    Great channel, thanks a lot for your work; you have helped me a lot. However, there is something I don't really understand. You are preprocessing the data on the training data, but the recipe is never applied to the test data. Or does the `last_fit()` command automatically apply the recipe to the test data as well?

    • @JuliaSilge 3 years ago +1

      Notice that the **workflow** contains both a model and a recipe; when we fit or tune a workflow, all the components (preprocessor + model) are estimated from training data and then applied to new data. You can read a bit more about this in Ch 8 of our book, especially this section: www.tmwr.org/workflows.html#workflows-and-recipes

    • @mynonaanonym8177 3 years ago

      @JuliaSilge Thank you so much, Julia!
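
A minimal sketch of the behavior described in this thread, assuming the tidymodels packages are installed. To keep it dependency-light, it swaps in a linear regression on `mtcars` for the video's random forest; the data, the object names, and the single preprocessing step are illustrative assumptions.

```r
library(tidymodels)

set.seed(123)
car_split <- initial_split(mtcars, prop = 0.75)

# The recipe is only a specification here; nothing has been estimated yet.
car_rec <- recipe(mpg ~ ., data = training(car_split)) %>%
  step_normalize(all_numeric_predictors())

# A workflow bundles the preprocessor and the model together.
car_wf <- workflow() %>%
  add_recipe(car_rec) %>%
  add_model(linear_reg())  # stand-in model, not the video's random forest

# last_fit() estimates recipe + model on the training set, then applies
# both to the test set to compute predictions and metrics.
final_res  <- last_fit(car_wf, split = car_split)
test_preds <- collect_predictions(final_res)
test_metrics <- collect_metrics(final_res)
```

There is no separate `bake()` call on the test data: `test_preds` has one row per test-set observation because the fitted workflow applied the trained recipe before predicting.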

  • @victormandela1810 3 years ago

    Very nice video. Which software did you use to add the subtitles to the video?

    • @JuliaSilge 3 years ago

      These are just the auto-generated subtitles from YouTube, which appear after the video is online for a while. They are frankly pretty darn good! It would of course be better to create them from scratch and upload them with the video, but I haven't started doing that yet.