Thanks, Julia, for the video. Really interesting how you approached the cleaning and modeling compared to David. It's great that you keep making these videos; they are super helpful.
Can't agree with you more!
Every time I watch one of your videos I learn something new and become more confident in my modeling. Thank you so much for them!
Love it! Finding this channel has made my day.
Great video Julia.
It was a refresher for add_count and geom_col because I stopped using them for some reason.
Oh wow, it's so amazing. I got to know you via the Text Mining with R book; finding David's and your channels was a memorable milestone in my R learning process :D
This was fantastic! It got me really excited about tidymodels =)
Thanks a lot, excellent material. I'm getting a different result from the fitted workflow (@ 27:00): I'm receiving a tibble of 31 x 3 with only one intercept, while yours is a tibble of 1,563 x 5 with many intercepts. I copied/pasted the code as in the blog post.
Ah, I believe there has been a change in parsnip since this video was published: you now get only the lambda you actually specified, not the whole path of lambdas: github.com/tidymodels/parsnip/blob/master/NEWS.md#parsnip-013
Thanks so much, Julia, for the valuable videos. I'm trying to evaluate LDA topic modelling on tweets using NPMI — do you have an idea how to implement that in R? Thanks, Sam
Thanks Julia, this is great. Just got one question at 4:20.
The other day I realised I can put pipes inside a mutate to get something like below... do you reckon using this is a good idea (I don't see it much but it feels really efficient)?
transmute(
  episode_name = title %>%
    str_to_lower() %>%
    str_remove_all(remove_regex) %>%
    str_trim(),
  imdb_rating
)
Julia, this is great!! It's so well explained (: ... Do you know by any chance how to do exactly this for spatial (polygon) data?
You might check out the spatialsample package:
spatialsample.tidymodels.org/
And here is a blog post where I walk through how to use it:
juliasilge.com/blog/drought-in-tx/
@@JuliaSilge oh, my god! This is GREATTTT!!! many many thanks!!
Thanks for the nice tutorial. At 22:30, office_prep was created. What was that about? It was never used downstream. In general, I don't get the use of prep and bake.
I think it *is* useful to know how to use `prep()` and `bake()` if you are going to be a tidymodels user, in order to debug and problem solve when things don't go right with your recipes. It's a way to check out how your recipe will preprocess your data for modeling. You can read about what the two functions do here: www.tmwr.org/recipes.html#using-recipes
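For readers who want to experiment with that debugging workflow, here is a minimal sketch of the prep()/bake() cycle; mtcars and the normalization step are stand-ins, not the recipe from the video:

```r
library(recipes)

# A toy recipe: predict mpg from everything else, normalizing predictors
rec <- recipe(mpg ~ ., data = mtcars) %>%
  step_normalize(all_numeric_predictors())

# prep() estimates the preprocessing parameters (here: means and sds)
rec_prep <- prep(rec)

# bake() applies the trained recipe; new_data = NULL returns the
# processed training data so you can inspect what the model will see
baked <- bake(rec_prep, new_data = NULL)
head(baked)
```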
I am reviewing all the videos and adding the three episode names as some sort of homework for myself.
Awesome and informative video as always! I have a question and hope you can help clarify - I noticed when you did the bootstrap resampling you used office_train as the dataset, which is the unmodified training data. In another video (the hotel bookings one) you used the juiced recipe as the dataset when creating the monte carlo cross validation resamples. Is there a best practice on which dataset to use when resampling with tidymodels - the un-processed training data vs the pre-processed & juiced recipe data? Thanks!
oh! wait is it because here you're using a workflow() and in the hotel bookings video you weren't? and if so, is the workflow applying the recipe, prepping and juicing in the resampling step for you?
@@brendang8610 Yes, that's basically it! A workflow that includes a recipe will apply that recipe. Generally it is probably better practice to do resampling on the unmodified training set, because otherwise you can get LEAKAGE from your preprocessing steps and then overly optimistic results from resampling.
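To make that concrete, here is a small sketch (mtcars and the steps are illustrative): the recipe lives inside the workflow, and fit_resamples() preps and applies it within each resample of the unmodified data:

```r
library(tidymodels)

set.seed(123)
# Resample the *unmodified* data; no preprocessing has happened yet
folds <- bootstraps(mtcars, times = 10)

rec <- recipe(mpg ~ ., data = mtcars) %>%
  step_normalize(all_numeric_predictors())

wf <- workflow() %>%
  add_recipe(rec) %>%
  add_model(linear_reg())

# The recipe is prepped and baked inside each resample, avoiding leakage
res <- fit_resamples(wf, resamples = folds)
collect_metrics(res)
```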
Hi @julia - great video!
Funny - I tried tuning hyperparameters with two different values of trees. When I tune the model with trees = 100 and with trees = 1000, the order of variable importance changes. With trees = 100 the most important variable is mhi_2018, followed by one_race_a, while with trees = 1000 the most important variable is one_race_a (followed by mhi_2018).
How is this possible? Where could this be coming from?
I think you may be asking about a different video in this comment?
But yes, maybe I should have been more clear that the variable importance I show is for *that model specifically*. The hyperparameters you choose for your algorithm often have an impact on variable importance. (And if you use variable importance to do feature selection, then that will change the hyperparameters you choose!) There is some related discussion here:
stats.stackexchange.com/questions/264533/how-should-feature-selection-and-hyperparameter-optimization-be-ordered-in-the-m
@@JuliaSilge OMG, how embarrassing :) — indeed, it is related to another video of yours.
The question was about this video: ua-cam.com/video/OMn1WCNufo8/v-deo.html (Predict Childcare Costs), but UA-cam kept rolling to the next video while I was waiting for my model to be trained :).
However, I was surprised that a hyperparameter such as the number of trees could impact the order of variable importance. I guess my intuition was wrong.
Hi Julia,
Love the video! I was wondering how you would compare the accuracy of the model to the testing data? I need to submit a report with both the predicted and actual values and cannot seem to find it.
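One common tidymodels pattern for this (sketched here on mtcars, not the video's data) is last_fit(), which fits on the training set and evaluates on the test set; collect_predictions() then returns the predicted and actual values side by side:

```r
library(tidymodels)

set.seed(123)
split <- initial_split(mtcars)

wf <- workflow() %>%
  add_formula(mpg ~ .) %>%
  add_model(linear_reg())

# last_fit() trains on the training portion and evaluates on the test set
final_res <- last_fit(wf, split)

collect_metrics(final_res)      # test-set metrics (rmse and rsq here)
collect_predictions(final_res)  # .pred alongside the actual mpg values
```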
This is super interesting. I would love to do this analysis with Doctor Who (specifically New Who!)
What does the value used to indicate "importance" on the x-axis mean? is that R^2?
In the vip package, what "importance" is varies from model to model. You can look more at the documentation but for a linear model like a lasso regularized model, it is just literally the coefficients from the model itself (similar to coefficients from `lm()`). You can check out documentation for vip here:
koalaverse.github.io/vip/
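A minimal sketch of what that looks like in code (mtcars and penalty = 0.1 are illustrative choices, not values from the video):

```r
library(tidymodels)
library(vip)

# Fit a small lasso model; penalty = 0.1 is an arbitrary example value
lasso_fit <- linear_reg(penalty = 0.1, mixture = 1) %>%
  set_engine("glmnet") %>%
  fit(mpg ~ ., data = mtcars)

# For glmnet, vi() reports the coefficients at the given lambda
# as the "importance" scores
lasso_fit %>%
  vi(lambda = 0.1)
```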
Great video! Love it!
Thanks for the great video, Julia! I learned a lot. If we use a GLM, we might want to use a univariate filter to keep only relevant variables in the model, since GLMs don't have built-in variable selection. Is there a way to do this with tidymodels? Maybe with recipes?
Not currently, but we're interested in recipes supporting feature selection like that in the future!
That's great! Thank you.
Hi Julia, thanks for the video! I am getting the error: "All models failed in tune_grid(). See the `.notes` column." when running tune_grid(). My code is identical to yours and I'm also using a mac. Any ideas?
all of the .notes say "model 1/1 (predictions): Error in cbind2(1, newx) %*% nbeta: invalid class 'NA' to dup_mMatrix_as_dgeMatrix"
@@travisknoche5639 Is this using the same code/data as in my blog post? juliasilge.com/blog/lasso-the-office/ Or different data?
@@JuliaSilge Yep!
@@travisknoche5639 Does the first fit work, when you are not tuning?
you are the best, I should put your name in my PhD thesis
What RStudio theme are you using? I could not find that in the default appearances.
It's one of the themes from the rsthemes package: github.com/gadenbuie/rsthemes
@@JuliaSilge Thank you for the quick reply! Watched and now reading through the blog explanation for further understanding.
Amazing, thank you very much.
Very nice video! Well explained and above all: 30:18 :-)