The best R based channel on yt! Is it possible to have a video with time series data?
Would also love a time series video :)
The visualizations used are simple, elegant and impactful - absolutely love them. Also my first time learning of the "space" argument in facet_grid - definitely going to be using that more. Thank you Julia!
Very nice exploration of both the dataset and the applied modeling. It is awesome to see your video, Julia.
I really dig your videos. I learn a lot from you and your videos. You're an amazing expositoR!
Fantastic, Julia, thank you! Very helpful.
Really enjoying these videos, Julia, and learning a lot. Thank you!
I love your videos! Tks a lot!!!
A neat trick could be to predict on the unknown sources and in theory when they were checked in the future you would be able to compare.
Ah, I totally should have done that!
Why did you use bake(new_data = NULL) and not juice() at 36:23? (I thought it was a shortcut)
Either one works! We are trying to focus on using `bake()` in documentation, because feedback from people indicated it was confusing to have yet another function to do the same thing.
@JuliaSilge Okay, thanks! (and a big thank you for your videos too!)
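For readers following along, here is a minimal sketch of the equivalence described in the reply above. The recipe and the use of the built-in `mtcars` data are assumptions for illustration only; they are not the recipe from the video.

```r
library(recipes)

# Hypothetical recipe on a built-in dataset, for illustration only
rec <- recipe(mpg ~ ., data = mtcars) %>%
  step_normalize(all_numeric_predictors())

# prep() estimates the normalization statistics from the training data
prepped <- prep(rec)

# Both calls return the processed training data
baked  <- bake(prepped, new_data = NULL)
juiced <- juice(prepped)

# Compare the two results
all.equal(baked, juiced)
```

`juice()` still exists in recipes but is marked as superseded, which matches the preference for `bake(new_data = NULL)` in the documentation.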
Awesome video, thanks! I was wondering why you changed your set.seed().
I may call set.seed() more than strictly necessary for reproducible results, but I tend to call it every time I know that I'll be doing something that involves randomness. So for example, the initial split involves the RNG, and then creating the folds involves the RNG. I have used enough functions in R that change the RNG in ways I don't expect that I have some habits that may be overkill.
@JuliaSilge I see! And you are not using the same set.seed number (123 then 234). Is there a reason for it?
@christelleleitzingerphd7491 The value for the seed is not a special number; we just need to set it to something. It feels weird to me to continually set the seed to the same value within the same script, but again, that may be a habit from bad experiences in the past.
@JuliaSilge Haha! Thank you so much for the answer! I don't know what is the best to do. I will do some research on that! Thanks! And again, I love your video!
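A small sketch of the habit discussed in this thread: setting a seed before each step that uses the RNG. The `mtcars` data and the helper `make_resamples()` are hypothetical, chosen only to keep the example self-contained; the seed values 123 and 234 are the ones mentioned above and have no special meaning.

```r
library(rsample)

# Hypothetical helper: set a seed before each step that draws random numbers
make_resamples <- function() {
  set.seed(123)                               # seed for the initial split
  split <- initial_split(mtcars)
  set.seed(234)                               # separate seed for the folds
  folds <- vfold_cv(training(split), v = 5)
  list(split = split, folds = folds)
}

first  <- make_resamples()
second <- make_resamples()

# Running the same steps with the same seeds gives the same training set
identical(training(first$split), training(second$split))
```

Setting the seed once at the top of a script is often enough for reproducibility; seeding each random step separately, as here, just makes each step reproducible even if code is later inserted between them.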
Julia, could you make a video using LIME on the workflow? Thanks
At 14:00, what do the two periods in `aes(y = ..density..)` mean?
It is saying to put density on the y axis instead of counts. It looks like a newer way to do this is to use `after_stat()`:
ggplot2.tidyverse.org/reference/geom_histogram.html
@JuliaSilge I had never seen that notation before. Thank you, Julia! I learn so much from these videos!
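As a rough illustration of the `after_stat()` form mentioned in the reply, the sketch below plots a density-scaled histogram. The `mtcars` data and the choice of `bins = 10` are assumptions for the example, not the data from the video.

```r
library(ggplot2)

# after_stat(density) maps the computed density, rather than the raw count,
# to the y axis; it is the newer spelling of ..density..
p <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(aes(y = after_stat(density)), bins = 10)

# layer_data() exposes the values the stat computed for each bar
hist_data <- layer_data(p)

# On the density scale the bar areas (height times width) add up to 1
sum(hist_data$density * (hist_data$xmax - hist_data$xmin))
```

Density scaling is mostly useful when comparing groups of different sizes, since each histogram then has the same total area.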
Great channel, thanks a lot for your work, you helped me a lot. However, there is something I don't really understand. You are preprocessing the data on the training data, but the recipe is never applied to the test data. Or does the "last_fit()" command automatically apply the recipe to the test data as well?
Notice that the **workflow** contains both a model and a recipe; when we fit or tune a workflow, all the components (preprocessor + model) are estimated from training data and then applied to new data. You can read a bit more about this in Ch 8 of our book, especially this section: www.tmwr.org/workflows.html#workflows-and-recipes
@JuliaSilge Thank you so much, Julia
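To make the answer above concrete, here is a minimal sketch of a workflow passed to `last_fit()`. The `mtcars` data, the normalization step, and the linear regression model are assumptions for illustration and are not the model from the video.

```r
library(tidymodels)

set.seed(123)
car_split <- initial_split(mtcars)

# The recipe is only defined here; nothing is estimated yet
car_rec <- recipe(mpg ~ ., data = training(car_split)) %>%
  step_normalize(all_numeric_predictors())

# The workflow bundles the preprocessor and the model together
car_wf <- workflow() %>%
  add_recipe(car_rec) %>%
  add_model(linear_reg() %>% set_engine("lm"))

# last_fit() estimates recipe + model on the training set, then applies
# the trained recipe and model to the test set
final_fit <- last_fit(car_wf, car_split)

collect_metrics(final_fit)      # metrics computed on the test set
collect_predictions(final_fit)  # one prediction per test-set row
```

Because the recipe lives inside the workflow, there is no separate `bake()` call on the test data; the preprocessing statistics come from the training set only.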
Very nice video. Which software did you use to add the sub-titles on the video?
These are just the auto-generated subtitles from YouTube, which appear after the video is online for a while. They are frankly pretty darn good! It would of course be better to create them from scratch and upload them with the video but I haven't started doing that yet.