finally! waited for that video long time! thanks!
Glad I did it! Took me a long time to make. Hopefully it’s useful!
What a great video, wow! Even the small section on the ROC curve taught me more than all the other videos out there! Would love a video in which you break down the metrics of the curve in more detail. Thank you so much!!!
Glad you enjoyed it, Elias! I am working on a ROC curve and optimal cutpoint video right now. I hope it will deliver the things you are interested in. Stay tuned. Kind regards from my holidays in Australia
man your presentation is staggering. keep doing your thing, do not lose an inch
Thanks a ton, Marco 🙏 I’ll do my best to keep the content going 😉 hope you like other videos too. Kind regards
Like how you explain everything. And a clear, easy to understand voice (some non-first-language English speakers are SO MUCH WORK to understand - way too much cognitive load for me). You're easy to parse...and thanks for the non-YT-generated caption text 💙💙💙
Awesome! I’m really stoked you find it easy to understand! Makes all the work worth it! The subtitles were also suggested by a permanent viewer of mine. So don't hesitate to suggest any improvements I can make to increase the quality of the content. Thanks for watching! 🙌
Really like your videos and want to follow along with them with my own data. You have great expertise and know how to code one's own data in the best way so that you can do everything that you taught on your channel. I think this is the only hurdle left for me. I want to apply what you taught on my own data. Your way of teaching, and your videos being oriented more towards real research and article writing, makes me ask for a video on coding data the right way in R, so that it goes through the tools you teach, such as flextable, gtsummary, sjPlot, etc., without any issues, and that covers some common pitfalls. The main problem I am facing is coding the levels and labels of factors and ordering them. In SPSS we give it a number and a label. Well, I think most of us are trying to come from SPSS to R, so this would also be a good video idea if it is contrasted with SPSS. I really can't find a video on YouTube that teaches it in a research-oriented way. Love your content. The best channel for teaching what you need to know in R.
Thank you very much, Muhammad, for such nice feedback! Sure, in the beginning we all had difficulties switching to R. I came from Matlab and NCSS to R, and also needed to box my way through the error messages. The good news is - the error messages are finite. There are only a few (20-50) error messages, and you quickly learn how to deal with them. After that, error messages become a help. Levels are easy, you can determine the order yourself:
library(dplyr)
library(forcats) # install the packages, if they don't load
df
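The code in the reply above is cut off after `df`, so what follows is not the author's snippet. It is only a minimal base-R sketch of the SPSS-style "number plus label" idea from the question, using a made-up data frame and made-up labels; forcats offers helpers such as fct_relevel() for the same reordering job.

```r
# Hypothetical SPSS-style data: numeric codes that stand for categories
df <- data.frame(education = c(2, 1, 3, 2, 1))

# levels = the stored codes in the order you want, labels = the text to show
df$education <- factor(df$education,
                       levels = c(1, 2, 3),
                       labels = c("school", "bachelor", "master"))
levels(df$education)

# Move "master" to the front so that it becomes the reference level in models
df$education <- relevel(df$education, ref = "master")
levels(df$education)
```

The order of the levels is what tables, plots and regression outputs follow, so setting it once on the factor usually carries through to the reporting packages.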
@@yuzaR-Data-Science Thank you very much. Will also be looking forward to more videos.
Great video. Could you please record mixed (random) effects models as well. I know it’s a big ask but at least linear and logistic would be great!
Of course! The mixed-models content will come in the near future. I just need to cover some other basic models and topics, which I hope will also be useful to you, and then I plan to cover most of the spectrum of mixed models beyond linear and logistic and present the best (in my opinion) packages for mixed models. So, please, stay tuned! Kind regards! Yury
Another great video! My R output has become 1000% easier and better following your videos. Will you include interpretation and visualization of interactions also?
hey man, thanks a lot for such a nice feedback! yes, I plan a video on interactions in logistic regression. stay tuned and if you think my content could be also helpful for someone you know, please share my videos with them :) cheers
Great video❤
Please make some videos on Survival Analysis as well.
thank you very much for the feedback! will do survival analysis with R similar to this one! I have two very old, not very good, not-in-R but somewhat theoretical videos on survival analysis on this channel. I don't think they are helpful, but if you want, you could check them out.
Thank you so much very informative
You are welcome 🙏
That's an amazing summary - thank you so much and yes please to a mixed effects and a linear model video
Sure thing! They were on my radar anyway, but now I am getting serious about them! The content will come in the near future. I just need to cover some other basic models and topics, which I hope will also be useful to you, and then I plan to cover most of the spectrum of mixed models beyond linear and logistic and present the best (in my opinion) packages for mixed models. So, please, stay tuned! Kind regards! Yury
Hi Greig, if you mean a usual linear regression (not mixed-effects linear), then I have recently done 4 videos on it. Besides, I have content on quantile, robust and bootstrapped regressions, since I use them too in my everyday work life. Hope the other videos will also resonate with you. And hope you'll stick around until I create a mixed-effects series ;) Kind regards! Yury
Nice lecture delivered, and the best explanation of multivariable logistic regression through an example. Thanks
Glad it was helpful! Thanks for watching!
The best explanation I have ever seen!
Thank you so much for the kind words! Your support really motivates me to keep creating! If it helped you, please, share it with somebody, who also might benefit from it! That would mean the world to me! Cheers, Yury
Excellent work
thanks
Thanks a lot for this piece of work 👌
You are very welcome!
Your video is amazing and so explanatory!!!
Thanks for posting!!!
Could I ask something please, as I see conflicting information- if you have several independent variables(predictors) and you want to assess which ones are more important for your logistic regression (as in univariate analysis), is it appropriate to check each one with logistic regression?
What would you recommend? I read that it is an outdated approach? But in medicine I have seen several authors using it?
no, you can sort them out via p-values, e.g.
@@yuzaR-Data-Science thanks for replying! Just to clarify would you put all of the available predictors in a multivariate model and then based on p-values
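Both messages above are cut off, so the author's actual recommendation is not visible here. Purely as an illustration of what "all predictors in one model, then look at p-values" can look like in R, here is a sketch on the built-in mtcars data; the outcome and the predictors are arbitrary choices for the example, not taken from the video.

```r
# Hypothetical example on built-in data: am (0/1) as outcome, two predictors
m <- glm(am ~ wt + hp, data = mtcars, family = binomial)

# Wald p-values: each predictor is tested while adjusting for the other one
coefs <- summary(m)$coefficients
coefs[, "Pr(>|z|)"]

# Likelihood-ratio test for dropping each predictor from the full model
lrt <- drop1(m, test = "Chisq")
lrt
```

In contrast to checking each predictor in its own univariable model, both tables here describe a predictor's contribution given the other predictors in the model.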
Thank you so much. Love your content. This is incredibly helpful. I hope you do linear too :)
thanks Robert! If you mean linear regression, then I have recently done 4 videos on it. Besides, I have content on quantile, robust and bootstrapped regressions, since I use them too in my everyday work life. Hope the other videos will also resonate with you. Kind regards! Yury
Great video. Looking forward to a separate video on ROC curve and confusion matrix.
Coming soon! Thanks for watching! :)
Please do a mixed effects model (random) video! Your videos are the best on YouTube so far.
thank you so much! I'll definitely do videos on mixed models in R! Stay tuned ;)
Great video! Please, can you make one on variable selection for multivariate models? Excellent content!
:) I actually already did ;) check out my video on {glmulti} package and let me know whether it's what you wanted. Thanks for feedback and for watching!
Terrific video, very detailed yet clear. I don't know if you covered it already, but if you plan to cover cross-tabulation analysis, would you consider giving my 'chisquare' package a try?
Hi Nike, thanks for the positive feedback. And I am interested in your 'chisquare' package. Unfortunately I did not find much info online on it. I have actually already made one video on the chi-squared test. If you have seen that one, what does your package do better and differently? If you send me the code for what your package can do and explanations of why it is useful and why it is better than the usual chi-square function or ggbarstats, I would love to make a video on your package!
@@yuzaR-Data-Science Hello, and thanks for your reply. The package is on CRAN, and it's currently at version 1.1.1 (it started from version 0.1 in 2022). In a few words, the package is meant to provide a one-stop shop for chi-square analysis of cross-tabs, and provides a number of facilities that are not coherently integrated in existing packages (to the best of my knowledge). For example, it provides (in just one simple line of code) different types of chi-sq residuals (with adjustments for multiple comparisons, and color coded for easy visual interpretation) and an extensive suite of association coefficients (for both 2x2 and larger tables), some of which are not currently implemented elsewhere (maximum-corrected versions of the phi and Cramer's V coefficients, corrected version of Goodman-Kruskal's lambda, both asymmetric and symmetric). Also, it provides different versions of the chi-sq test itself, like the N-1-corrected version, which (again) is not currently provided elsewhere. As for post-hoc analysis, it provides measures not currently available elsewhere, like the so-called Quetelet index and the IJ association factor. Further, it computes independent odds ratios for tables larger than 2x2, while for 2xK tables it can optionally produce a plot of pair-wise odds ratios (plus confidence intervals). Also, it provides suggestions as to a 'viable' chi-sq test given the input table characteristics. A verbal articulation of the effect size for the relevant association coefficients (both chi-square-based and margin-free) is also reported. Finally, all the outputs are nicely formatted via the 'gt' table package. I think that should be pretty much all. Everything can be obtained by just running: chisquare(mytable). Cheers.
hey, your package is impressive, I found the visualization of odds ratios good. I have two questions:
- first, do you have more info, like an article or so, on post hoc pairwise tests with all the significances, like when we have a 4x4 or 3x5 table, so that all categories (percentages) are checked automatically? until now I have used a pairwise_fishers_test() function, which is cool, but it's extra code. It would be amazing if we could just use your function and get all we need - an ORs plot with significances and all the pairwise 2x2 tests from a bigger contingency table in some form of a table.
- second, maybe more important: I could not get the chisquare() function to work with a simple table() call:
> chisquare(table(mtcars$cyl, mtcars$am) )
Error in `gt::tab_style()`:
! Failed to style the body of the table.
Caused by error in `cells_body()`:
! Can't select columns that don't exist.
✖ Column `0` doesn't exist.
Run `rlang::last_trace()` to see where the error occurred.
so, if this could be made to work and we could do bigger tables, like this one: chisquare(table(ISLR::Wage$jobclass, ISLR::Wage$education) ), that would be awesome!
@@yuzaR-Data-Science Hello. Thanks for taking the time to check that and for replying. I do not want to hijack your comments section here. If you want to contact me at the email you find in the package documentation, I will be more than happy to discuss things further. Looking forward. Cheers.
hey mate, no worries, you don't hijack the comments section! :) I am actually glad to read and answer the comments. the next weeks I'll be on holidays, but we can talk about your package next month. generally, as I said before, I would love to be able to apply your chisquare function to a simple cross table, like that "chisquare(table(mtcars$cyl, mtcars$am) )". do you think it's possible?
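For readers following this thread, the "all pairwise 2x2 tests from a bigger table" idea can be sketched in base R. This is neither the chisquare package nor the pairwise_fishers_test() function mentioned above, just an assumed stand-in that uses fisher.test() and p.adjust() on the same mtcars cross table.

```r
tab <- table(cyl = mtcars$cyl, am = mtcars$am)

# Exact test for the whole 3x2 table (the cell counts are small here)
p_overall <- fisher.test(tab)$p.value

# Pairwise 2x2 Fisher tests between the rows, with Holm adjustment
pairs <- combn(rownames(tab), 2)
p_raw <- apply(pairs, 2, function(r) fisher.test(tab[r, ])$p.value)
posthoc <- data.frame(group1 = pairs[1, ],
                      group2 = pairs[2, ],
                      p_raw  = p_raw,
                      p_holm = p.adjust(p_raw, method = "holm"))
posthoc
```

Holm is only one possible adjustment; p.adjust() accepts other methods through the same argument.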
Fantastic intro to a whole analysis pipeline for logistic regression. Do you have something similar for survival regression? ❤
Unfortunately not. Only two older theoretical videos on survival, but they are low quality and have no programming. I plan to do a similar one in the future. So, please, stay tuned.
@ looking forward to that. Thanks for this great vid nonetheless!
welcome!
Excellent video! I came here from the recommendation of the video on simple linear regression, and it's great. I have a question that I haven't been able to resolve. When using performance, I understand that categorical variables are analyzed by creating dummies, but I don't know how the VIF is calculated. Is there a formula, or how could we check multicollinearity for non-quantitative variables?
sure, vif works for both numeric and categorical variables. how it's calculated - I don't know exactly, just the superficial formula, something like 1/(1 - summary(model)$r.squared), where the model regresses one predictor on all the others - but I treat it like a car: I don't know how the engine works, but I know how to drive. so, if your vif is below 5, or in some cases below 10, you can accept the results. when vif is above 10 you'll find some multicollinear variables (both numeric and categorical)
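To make the formula from the reply concrete, here is a small base-R sketch with arbitrarily chosen mtcars variables. The R-squared in the formula comes from an auxiliary regression of one predictor on the other predictors, not from the outcome model itself.

```r
# VIF of one predictor = 1 / (1 - R^2), where R^2 comes from regressing
# that predictor on all the other predictors of the model
aux_wt <- lm(wt ~ hp + disp, data = mtcars)    # auxiliary regression for wt
vif_wt <- 1 / (1 - summary(aux_wt)$r.squared)
vif_wt
```

Because the outcome does not enter this calculation, the value is the same whether wt, hp and disp are later used in a linear or a logistic model. A categorical predictor spans several dummy columns, which is where the generalized VIF comes in.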
Thank you for the video
You are very welcome 🙏
I have a short contribution: If any categorical variable exists, classical VIF values are not appropriate. Then, it would be best to use generalized VIF values.
Nice contribution! Yeah, the gtsummary package uses GVIF by default:
tbl_regression(model) |> add_vif()
That is another great video, thank you so much! For the ROC curve, the performance package provides a function which produces a similar result: performance_roc(x = m) %>% plot() . Is there a difference with pROC::roc() ?
Glad it was helpful! Sure, there are several functions for ROC curves in R. Several packages provide good results, but I like two of them more than the rest:
Epi::ROC(form = survived ~ predicted_glm, data = d, plot = "ROC", grid = FALSE, MX = TRUE, MI = FALSE, lwd = 3)
cutpointr() - I am working on a whole video about this one, it's just amazing
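On the question of whether the ROC functions differ: they all start from the same ingredients, the predicted probabilities and the observed outcome. A minimal base-R sketch of those ingredients follows, with a made-up model on mtcars; it is not what any of the named packages run internally.

```r
# Hypothetical model: transmission type (am, 0/1) predicted by weight
m <- glm(am ~ wt, data = mtcars, family = binomial)
p <- fitted(m)      # predicted probabilities
y <- mtcars$am      # observed outcome

# Sensitivity and specificity at every observed cutpoint
cuts <- sort(unique(p), decreasing = TRUE)
sens <- sapply(cuts, function(k) mean(p[y == 1] >= k))
spec <- sapply(cuts, function(k) mean(p[y == 0] <  k))
roc  <- data.frame(cutpoint = cuts, sensitivity = sens, specificity = spec)

# AUC = probability that a random positive is ranked above a random negative
# (ties count as one half)
auc <- mean(outer(p[y == 1], p[y == 0], ">")) +
  0.5 * mean(outer(p[y == 1], p[y == 0], "=="))
auc
```

Plotting sensitivity against 1 - specificity from `roc` gives the curve itself; packages then mostly differ in extras such as confidence intervals, smoothing, and how an "optimal" cutpoint is picked.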
amazing!!
Thank you! Cheers!
Where (or when) could be the "Multivariate" Linear Regression one, since you covered the Multivariable Linear (this time, logistic) Regression?
By multivariate, do you mean several outcomes? The term is used often, but people define it differently.
@@yuzaR-Data-Science And I don't like that. It should be defined consistently. That way, more literature can be produced and reproduced.
Oh man, the more I do science the more I see its imperfections. Different definitions of the same thing are the norm. Unfortunately. But still, I think science is the best thing people can do.
You are a magician!
Really appreciate your feedback 🙏 Thanks for watching!
I tried using the performance package on various models and unfortunately it seems a bit limited to lm and glm. Doesn't work with glmnet, for example. Doesn't work with KNN or RandomForest. I'm assuming it's because it checks for linear assumptions only... Bit of a shame, I had hoped it could be a go to tool for all model types.
For now, I find that the parsnip package has more standardized functions like collect_metrics. But they're not as visually cool as check_model...
well, yes, the performance package doesn't work well with machine learning models, but it works with almost all "important" statistical models, from frequentist to Bayesian. I use more stats than ML, so I can't suggest an alternative better than collect_metrics at the moment. But I'll get into ML one day and will see what I find. In the meantime, I hope you enjoy the rest of the videos :) cheers
Do you have a website where you share your code?
Of course, when you join my channel, I send you the pdf with code and explanations (transcripts) of any video. But, please, don't feel like you have to join! You can just pause the video and type up the code, it's free and not much code. Please, only join if you want to support my work, and you'll get the benefit of getting the transcripts. Kind regards, Yury
I really like how you are patient and make the interpretations so understandable! I also love the memes 😂
Please do you have a website where you share the codes?
Please can you make a video explaining the basic assumptions, visualisations and interpretations of the outcomes from the nearest neighbour matching outcome?
Thank you so much for such nice feedback! :) I am never sure whether people like my memes, but I always find similar memes in other videos good :) the nearest neighbour matching outcome is actually new to me, I checked it out and find it totally interesting. I'll put it on the list ;) Thanks for watching!