You are the best one can find on YouTube! Thank you so much!
Thanks 🙏 glad you enjoyed my content!
Brilliant, thank you so much!
Thanks 🙏 If you liked this one, you might also like the package reviews; gtsummary, for example, is one of the most useful.
Thank you so much! That was a really helpful and accurate explanation.
Glad it was helpful!
Very well explained, thanks!
Glad you enjoyed it! 😊
Great video! Thank you very much!
You are welcome! I am glad you enjoyed it
Dear Yuri,
Thanks for your great videos, which I have been following and recommending to my fellow physicians. They are so great!
Please consider making some tutorials on univariable and multivariable analyses in oncology, with independent variables like age, cancer stage, treatment, baseline lab values, ECOG scores, etc., and outcomes like time to event or death vs. no death. That would be great!
Dear so4ragb, thank you very much for your feedback! And thanks for the suggestion. Interestingly, I am already in the process of making a video about a cool package for quick uni- and multivariable analyses in the medical field... although statistics is truly domain-agnostic. So, please, stay tuned ;)
@@yuzaR-Data-Science That's fantastic news. Very much looking forward to watching it. Hoping for more clinical stats 😉. Thanks for all your efforts.
You are very welcome!
Thanks for the wonderful explanation. As I said before, you set the bar high. By the way, I want to move into data science. I have a bachelor's degree in Chemistry and a Master's degree in Environmental Science from Addis Ababa University, and I have been learning data science with R for the last 6 months. What advice can you give me? I live in Ethiopia, so I can't take online courses because we don't have access to international payment systems; I depend on YouTube and freely available books. Data science really excites me a lot. Thanks.
Hi man, glad to hear that you are excited about data science! The good news is, with the internet you can learn anything! There are more than enough books and free resources to learn data science and R or Python! Please, don't pay for courses, they are usually crap. YouTube, blogs and free books will be enough. If you really want to learn R, here are some free books: R4DS, Tidy Modeling with R and ISLR. If you focus on those (+ some practice and real work + learning from other resources), in a year you'll be a better data scientist than 90% of those who graduated from a fancy university. So, keep up the learning energy, and I hope my YouTube channel helps you on the way there! Cheers
Thanks for your prompt reply. I will do as you said. I'm starting with R, so I will stick with it for a while to master it. I will also keep following your channel. I am on LinkedIn, so we can connect there too. Thanks.
sure, just send me the invite ;)
Amazing
Thanks! Glad you liked it!
Very well explained!
I have a question: in your example, the two groups have the same number of observations (15). Can I use groups of different sizes with the same parameters as in the video? Thanks
Sure, since the Mann-Whitney U (MWU) test is for independent samples, it does not matter how many observations each sample has. For dependent (paired) samples, i.e. the Wilcoxon signed-rank test, it does matter, because every observation needs a partner in the other sample. Thanks for the feedback and thanks for watching!
Thanks for the video! I have a question: ranking makes it unnecessary to know the distribution. However, how would you approach the same problem using Generalized Linear Models (GLMs) as the basis? As I understand it, all of the classic hypothesis tests can be done with GLMs or Linear Mixed Models. For GLMs I would need a link function, but how do I decide which one? I am also not sure what the advantage of ranking is, apart from getting around the assumption of normality.
Dear Luisa, answering which link function to choose would need a whole new video, and I am planning to make one in the future. While ranking resolves non-normality and heterogeneity of variances, I am not a big fan of it, because it throws away the real values we have measured. It was just important to describe it, so that people don't think they are comparing medians. The median, by the way, is a better choice for addressing many problems in the data, so I would recommend diving into quantile regression first, before getting to link functions. I have two videos on quantile regression on the channel, so feel free to check them out. Cheers!
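A minimal base-R sketch of the point above, using small made-up vectors (not the data from the video): wilcox.test() runs the rank-sum (Mann-Whitney) test on two groups of different sizes without complaint, while the paired (signed-rank) version refuses vectors of unequal length.

```r
# Made-up example values; the two groups deliberately have different sizes
x <- c(1.1, 2.3, 3.8, 4.0, 5.6)        # n = 5
y <- c(2.9, 4.4, 6.1, 7.3, 8.0, 9.2)   # n = 6

# Independent samples: Mann-Whitney U / Wilcoxon rank-sum test, unequal n is fine
mwu <- wilcox.test(x, y)
print(mwu$p.value)

# Dependent samples: the signed-rank test needs one partner per observation,
# so unequal lengths raise an error; capture the message instead of stopping
paired_msg <- tryCatch(
  wilcox.test(x, y, paired = TRUE),
  error = function(e) conditionMessage(e)
)
print(paired_msg)
```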
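To make the "it is about ranks, not medians" point concrete, here is a small sketch with made-up numbers. It assumes R's convention for the rank-sum statistic: the W reported by wilcox.test() is the rank sum of the first group minus n1(n1 + 1)/2, so the measured values only enter through their order, and any monotone transformation (here log) gives the same W.

```r
x <- c(1.1, 2.3, 3.8, 4.0, 5.6)
y <- c(2.9, 4.4, 6.1, 7.3, 8.0, 9.2)

# Rank the pooled observations; from here on only the ranks are used
r  <- rank(c(x, y))
n1 <- length(x)

# W = (sum of ranks of x) - n1 * (n1 + 1) / 2, here 19 - 15 = 4
W_manual <- sum(r[seq_len(n1)]) - n1 * (n1 + 1) / 2

# Statistic as reported by wilcox.test()
W_r <- unname(wilcox.test(x, y)$statistic)

# log() preserves the order of positive values, so the ranks and W do not change
W_log <- unname(wilcox.test(log(x), log(y))$statistic)
```

Because the original magnitudes are discarded, two very different data sets can produce the same test result, which is the price of ranking mentioned in the reply.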
Hello @Yuzar, thank you for sharing all this knowledge. I'm working on some data about changes in soil organic carbon after the conversion of forest into agriculture.
The data were collected at different depths (five depths), and I have already plotted forest vs. agriculture at each depth separately. Is there any way to use ggbetweenstats (from the ggstatsplot package) to plot all the depths in the same plot and see the changes among the groups?
Sure, you can either use grouped_ggbetweenstats() to produce subplots for the different depths, or put all the depths into one column, set the order of the categories on the x-axis via factor() and its levels argument, and then put that variable on the x-axis. Then you'll be able to get post-hoc tests.
@@yuzaR-Data-Science Thank you. I'm gonna do that.
@@ednacossa8863 you are welcome!
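A sketch of the second option from the reply (one combined column whose order is fixed with factor() and levels), on hypothetical toy data: the depth labels, column names and random values are assumptions for illustration, not Edna's measurements. The ggstatsplot calls are left as comments so the block runs with base R alone.

```r
set.seed(1)
# Hypothetical toy data: random numbers, not real soil measurements
depth_levels <- c("0-5", "5-10", "10-20", "20-30", "30-50")   # assumed depth classes
soil <- data.frame(
  land_use = rep(c("forest", "agriculture"), each = 10),
  depth    = rep(depth_levels, times = 4),
  soc      = round(runif(20, min = 5, max = 40), 1)
)

# The default (alphabetical) order would put "5-10" after "30-50",
# so set the order of the depths explicitly
soil$depth <- factor(soil$depth, levels = depth_levels)

# One combined grouping column, ordered depth by depth: forest, then agriculture
group_levels <- paste(rep(depth_levels, each = 2),
                      rep(c("forest", "agriculture"), times = length(depth_levels)))
soil$group <- factor(paste(soil$depth, soil$land_use), levels = group_levels)

# Overall rank-based test across all 10 groups
kw <- kruskal.test(soc ~ group, data = soil)

# With ggstatsplot installed, either option from the reply could then be plotted:
# ggstatsplot::ggbetweenstats(soil, x = group, y = soc)
# ggstatsplot::grouped_ggbetweenstats(soil, x = land_use, y = soc, grouping.var = depth)
```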
Thank you so much! Can you please tell me: I have 4 animal groups with 5 animals in each group. The groups are: 1) intact animals, 2) animals exposed to a first factor, 3) animals exposed to a second, different factor, and 4) a control group without exposure to the second factor. I'm interested in comparing the 3rd and 4th groups, and at the same time I want to compare the 4th group with the 1st group. In your opinion, which test should I choose: Mann-Whitney, first comparing the 4th group with the 1st and then the 3rd with the 4th, or Kruskal-Wallis to compare all the groups together? I tried both tests: Kruskal-Wallis shows no differences, while Mann-Whitney does. I guess the Mann-Whitney results are more trustworthy, but I am not sure, so I decided to ask you as a statistician.
P.S. I didn't apply any correction method for Mann-Whitney.
Hi Alex, the short answer: ggbetweenstats(mtcars, x = cyl, y = mpg, p.adjust.method = "none", pairwise.display = "all"). The longer answer is: you have to correct for multiple comparisons! Or at least state explicitly in your paper that you did not. I have a video on Kruskal-Wallis on my channel, in case you haven't discovered it yet. Hope that helps!
Great explanation!! However, I am getting an error when using a ggstatsplot function. Can you please suggest an alternative or a way to solve this error?
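A base-R sketch of "correct for multiple comparisons" for a design like Alex's (4 groups of 5 animals, two planned pairwise comparisons). The group labels are hypothetical and the values are simulated with rnorm(), not real data; Holm's method is just one common choice, and p.adjust() also offers "bonferroni", "BH" and others.

```r
set.seed(123)
# Simulated outcome for 4 groups x 5 animals (placeholder numbers, not real data)
grp_names <- c("intact", "factor1", "factor2", "control")   # hypothetical labels
animals <- data.frame(
  group = factor(rep(grp_names, each = 5), levels = grp_names),
  value = rnorm(20, mean = 10, sd = 2)
)

# Global rank-based test across all four groups
kw <- kruskal.test(value ~ group, data = animals)

# The two planned comparisons: control vs intact, factor2 vs control
pick <- function(g) animals$value[animals$group == g]
raw_p <- c(
  control_vs_intact  = wilcox.test(pick("control"), pick("intact"))$p.value,
  factor2_vs_control = wilcox.test(pick("factor2"), pick("control"))$p.value
)

# Adjust for the number of comparisons actually made (here 2)
adj_p <- p.adjust(raw_p, method = "holm")

# Alternatively, all pairwise Mann-Whitney tests with the same correction
all_pairs <- pairwise.wilcox.test(animals$value, animals$group,
                                  p.adjust.method = "holm")
```

Adjusted p-values are never smaller than the raw ones, which is why a "significant" unadjusted Mann-Whitney result can disappear after correction.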
Sure. Since ggstatsplot is built on top of many other packages, there might be incompatibilities between package versions. So, update R, then update RStudio, then update all the packages.
If you still get the error message after that, read it carefully: maybe a package is missing. Also check whether your data is in the right format, or just google the error message; lots of folks have had the same error before, and most of those cases are already solved. Cheers
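For the "maybe a package is missing" step, a small helper can list which packages from a given set cannot be loaded. missing_packages() is a hypothetical name (not part of ggstatsplot), the package names in the call are only examples, and the install and update lines are left as comments because they need internet access.

```r
# Return the packages from `pkgs` whose namespace cannot be loaded
missing_packages <- function(pkgs) {
  ok <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!ok]
}

needed  <- c("ggstatsplot", "statsExpressions", "ggplot2")
missing <- missing_packages(needed)
if (length(missing) > 0) {
  message("Not installed: ", paste(missing, collapse = ", "))
  # install.packages(missing)
}

# Afterwards, bring the remaining packages up to date (requires internet):
# update.packages(ask = FALSE, checkBuilt = TRUE)
```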
The lowest p-value I got, in one of the groups I want to test, is 0.06. Is that low enough to say the data are not normally distributed?
I would still go with a normal distribution. If you are not sure, you can use plot_density() or ggqqplot() for this group and check normality visually. When it is approximately normal (nobody knows what "approximately" means ;) everyone decides for themselves), use a parametric test.
@@yuzaR-Data-Science Okay, so I used ggqqplot, and the data points lie within the grey area. So, they're normally distributed?
Yes
@@yuzaR-Data-Science Okay, I guess I'll use the Welch t-test, since the variances are not equal.
Yes, and this one is certain, no guessing ;) The two tests (Shapiro-Wilk and Levene's) are useful because they help you decide which final test to use.
Thank you. When I run ggbetweenstats, I get the following error message. Any idea where the problem lies?
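The workflow discussed in this sub-thread (normality check, variance check, then Welch) can be sketched in base R. The data are simulated, not the commenter's; the Levene-type test is computed by hand as an ANOVA on absolute deviations from the group medians (the Brown-Forsythe variant, which is what car::leveneTest() uses by default), so no extra package is needed.

```r
set.seed(42)
# Simulated data (not from the thread): different means and clearly different spreads
a <- rnorm(30, mean = 10, sd = 1)
b <- rnorm(30, mean = 11, sd = 3)

# 1) Shapiro-Wilk per group. p > 0.05 (e.g. 0.06) only means normality is not
#    rejected at alpha = 0.05; it does not prove the data are normal.
sw_p <- sapply(list(a = a, b = b), function(v) shapiro.test(v)$p.value)
# Visual check in base R (ggpubr::ggqqplot() draws a similar plot): qqnorm(a); qqline(a)

# 2) Levene-type test centred on the median (Brown-Forsythe variant)
y <- c(a, b)
g <- factor(rep(c("a", "b"), times = c(length(a), length(b))))
abs_dev  <- abs(y - ave(y, g, FUN = median))
levene_p <- anova(lm(abs_dev ~ g))[["Pr(>F)"]][1]

# 3) Welch t-test: t.test() uses var.equal = FALSE by default
welch <- t.test(a, b)
```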
Error in `mutate()`:
! Problem while computing `n_label = paste0(one_drug1, "
(n = ", .prettyNum(n), ")")`.
Caused by error in `vapply()`:
! values must be length 1,
but FUN(X[[1]]) result is length 3
> rlang::last_error()
Error in `mutate()`:
! Problem while computing `n_label = paste0(one_drug1, "
(n = ", .prettyNum(n), ")")`.
Caused by error in `vapply()`:
! values must be length 1,
but FUN(X[[1]]) result is length 3
---
Backtrace:
1. ggstatsplot::ggbetweenstats(...)
15. statsExpressions:::.prettyNum(n)
16. base::prettyNum(x, big.mark = ",", scientific = FALSE)
17. base::vapply(...)
Run `rlang::last_trace()` to see the full context.
> rlang::last_trace()
Error in `mutate()`:
! Problem while computing `n_label = paste0(one_drug1, "
(n = ", .prettyNum(n), ")")`.
Caused by error in `vapply()`:
! values must be length 1,
but FUN(X[[1]]) result is length 3
Hey, try updating all packages 📦, that should solve it.
This is a wonderful and interesting channel. I found it very useful. Worth subscribing, and liked! A fellow creator
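The backtrace above runs from ggstatsplot::ggbetweenstats() into an internal function of statsExpressions, so a mismatch between the installed versions of those two packages is one plausible (not confirmed) cause. A quick check before and after updating is to print both versions; pkg_versions() is a hypothetical helper name that returns NA for packages that are not installed instead of failing.

```r
# Installed version of each package, or NA if it is not installed
pkg_versions <- function(pkgs) {
  vapply(pkgs, function(p) {
    if (requireNamespace(p, quietly = TRUE)) {
      as.character(utils::packageVersion(p))
    } else {
      NA_character_
    }
  }, character(1))
}

print(pkg_versions(c("ggstatsplot", "statsExpressions")))

# If they look outdated, update everything and restart R before re-running:
# update.packages(ask = FALSE, checkBuilt = TRUE)
```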
Thanks, Yasin! Appreciate your feedback!