Predict injuries for Chicago traffic crashes with tidymodels

  • Published Dec 1, 2024

COMMENTS • 21

  • @avnavcgm
    @avnavcgm 3 years ago +5

    Thank you yet again for this exceptional material Ms. Silge.

  • @sidharthadaggubati438
    @sidharthadaggubati438 3 years ago +1

    This channel deserves more views. High quality content. Thank you Julia

  • @hesamseraj
    @hesamseraj 3 years ago +1

    Once again, thank you very much, Julia. I have watched all your videos and worked through the code, and I will keep following you as long as you share these fantastic videos.

  • @HamJeong
    @HamJeong 3 years ago +3

    Incredibly useful, thanks so much, I really learn a lot from you sharing like this!

  • @brendenmorley2643
    @brendenmorley2643 3 years ago +1

    Once again your tutorial is sooo insightful. My R knowledge continues to explode thanks to your time and work.

  • @ochiwar
    @ochiwar 3 years ago +3

    Another excellent tutorial! I love your plot theme/aesthetics. Would it be possible for you to share your ggplot theme? Thanks!

    • @JuliaSilge
      @JuliaSilge 3 years ago +4

      I have it in a little personal package here -- theme_plex(): github.com/juliasilge/silgelib
      But there are some very similar themes in the hrbrthemes package (the one that uses IBM Plex):
      cinc.rud.is/web/packages/hrbrthemes/

  • @mattm9069
    @mattm9069 3 years ago +2

    thanks Julia!!!

  • @datasciencenerd3263
    @datasciencenerd3263 3 years ago +1

    I learn a lot from you, thank you.

  • @prod.kashkari3075
    @prod.kashkari3075 3 years ago +2

    Hello Julia! Thanks so much for these tutorials and your book on tidymodels. I’m an undergrad who wanted to learn machine learning in R, and you had great resources to help me get started. I wanted to ask you about a few things I’ve noticed recently when working with tidymodels.
    1. I’ve been getting errors when calling the tune_grid() function. I have my workflows and my recipe set up (I even prep and bake the recipe to check that it’s good), and I create my cross-validation folds and tuning grids. Yet when I call tune_grid() and pass in my workflow, resamples, and grid, it says that my models have failed. It says on every fold that something failed. Do you know what the source of this could be? It is also very slow and tends to freeze.
    2. When I try to fit my workflow object by calling fit(), I get a message that says “error could not find fit function from workflow”. I solved the problem by prefixing the call with parsnip:: and it worked fine, but the error appeared out of nowhere one day; I had never seen it the day before.
    I’m sure these issues are because tidymodels is so new and still in development.
    Also, as a request, could you make more videos on the stacks package and on building ensemble learners in tidymodels?
    Thanks!

    • @JuliaSilge
      @JuliaSilge 3 years ago +1

      In general, I'd recommend making sure your packages are up to date with the latest CRAN versions. If you can create a reprex with your problem and post on RStudio Community, we are happy to help find the solution:
      rstd.io/tidymodels-community

  • @UndecidedFellow
    @UndecidedFellow 3 years ago +1

    Thank you for the video, Dr. Silge! Quick question: how do the `bag_tree()` and `vfold_cv()` functions account for the time series nature of the data? I'm reading the documentation, and it looks like your current pipeline treats the dates as non-ordinal and categorical, using the dates as factors via the line `step_date(crash_date) %>%`. Is my reading correct? In short, why did you choose `vfold_cv()` over `rolling_origin()`, and how is seasonality/autocorrelation modeled in your pipeline?

    • @JuliaSilge
      @JuliaSilge 3 years ago +2

      So this isn't time series in the sense that I want to predict the next crash(es). Instead it is a classification model where some of the predictors are date features. You can look at another example of this kind of model here: www.tidymodels.org/start/recipes/
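To make the "date features as predictors" idea concrete, here is a minimal sketch of what `step_date()` produces. The toy data frame and its values are invented for illustration; only the column names `crash_date` and `injuries` are borrowed from the thread, and this is not the recipe from the video.

```r
library(recipes)

# Hypothetical toy data: two weeks of daily records with a made-up outcome
toy <- data.frame(
  crash_date = as.Date("2020-01-01") + 0:13,
  injuries   = factor(rep(c("none", "injuries"), 7))
)

# step_date() derives new predictor columns from the date;
# here we ask for day of week and month only
rec <- recipe(injuries ~ crash_date, data = toy) %>%
  step_date(crash_date, features = c("dow", "month"))

baked <- bake(prep(rec), new_data = NULL)

# New columns are named <variable>_<feature>
names(baked)
```

The derived columns are ordinary predictors, so each row can be resampled independently with `vfold_cv()`; nothing in this setup forecasts future observations from past ones.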

  • @terrencerussell1999
    @terrencerussell1999 3 years ago +1

    Hey Julia! Great stuff here, as always. I look forward to each one of your posts and follow along in R.
    When I do this one with my own Canadian lat/longs, I don't get a map like the one you produced for Chicago. Is that a limitation of the function for Canadian coordinates, or am I missing something?

    • @JuliaSilge
      @JuliaSilge 3 years ago +1

      Hmmmmm, I haven't looked at data from Canada so I can't say for sure. If you can put together a small, self-contained reprex demonstrating the issue and post on RStudio Community, I bet folks will be eager to help. There is even a spatial tag where you can get interested folks to see: community.rstudio.com/tag/spatial

    • @terrencerussell1999
      @terrencerussell1999 3 years ago

      @JuliaSilge OK, great, will do! Thanks again

  • @mattm9069
    @mattm9069 3 years ago

    Julia, can you please elaborate on what step_downsample() does once we get to the resampling steps? I wanted to see what I would get out of this code:
    train_preprocessed %
    prep(crash_train) %>%
    juice()
    I get a dataset that is balanced on the outcome variable, and it has ~45,000 rows. Yet one cross-validation fold has 138,000 rows for analysis. So, I want to understand what's happening conceptually. I've seen other people build the recipe from the original dataset, but we use the training set
    (i.e. recipe(injuries ~ ., data = crash_train)).

    • @JuliaSilge
      @JuliaSilge 3 years ago +1

      Reading this section might help clear some things up for you: www.tmwr.org/recipes.html#skip-equals-true
      As well as the section a little bit further about row sampling steps like downsampling.
      A subsampling step like `step_downsample()` will downsample the analysis set of a CV fold but not the assessment set.
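The behavior described in this reply can be seen on a small example. The sketch below uses an invented, imbalanced toy data frame (not the Chicago crash data) and relies on `step_downsample()` from the themis package, which sets `skip = TRUE` by default: the step runs when the recipe is prepped on its training data but is skipped when the recipe is baked on other data.

```r
library(recipes)
library(themis)

set.seed(123)

# Hypothetical toy data: 90 "none" rows and 10 "injuries" rows
toy <- data.frame(
  injuries = factor(c(rep("none", 90), rep("injuries", 10))),
  x        = rnorm(100)
)

rec <- recipe(injuries ~ ., data = toy) %>%
  step_downsample(injuries)

prepped <- prep(rec, training = toy)

# new_data = NULL returns the processed training data: downsampling was applied
balanced <- bake(prepped, new_data = NULL)
table(balanced$injuries)

# Supplying new_data skips the downsampling step: all rows are kept
untouched <- bake(prepped, new_data = toy)
table(untouched$injuries)
```

This mirrors what happens during resampling: the analysis set of each fold plays the role of the training data and is downsampled when the recipe is estimated, while the assessment set is processed like new data and keeps its original class balance. The fold's reported analysis size is the size before the recipe runs, which is why it can be much larger than the balanced data returned by `juice()`.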