R demo | Correlation | Pearson, Spearman, Robust, Bayesian | How to conduct, visualise and interpret
- Published 2 Jan 2022
- Having two numeric variables, we often want to know whether they are correlated and how. One simple command can answer both questions by visualizing the data and conducting frequentist and Bayesian correlation analyses at the same time. So, let’s learn how to do that, how to interpret all these results and how to choose the right correlation method in the first place.
Here is a quick R code:
install.packages("ggstatsplot")
library(ggstatsplot)
ggscatterstats(
  data = mtcars,
  x = mpg,
  y = hp,
  type = "p") # "p" = parametric (Pearson); or "np" (nonparametric), "r" (robust), "bf" (Bayesian)
?ggscatterstats
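To make the choice between methods a bit more concrete, here is a minimal base-R sketch (an illustration only, not what ggscatterstats runs internally) that computes Pearson and Spearman correlations for the same pair of variables. The object names are just placeholders.

```r
# Base-R sketch (not ggstatsplot): same pair of variables, two methods.
x <- mtcars$mpg
y <- mtcars$hp

# Pearson: measures linear association; sensitive to outliers and skew.
pearson <- cor.test(x, y, method = "pearson")

# Spearman: rank-based, measures monotonic association.
# exact = FALSE avoids the warning about tied values in the data.
spearman <- cor.test(x, y, method = "spearman", exact = FALSE)

print(pearson$estimate)
print(spearman$estimate)
```

If the two estimates differ a lot, that is usually a hint to look at the scatter plot for non-linearity or outliers before picking a method.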
If you want more code (or want to support me), consider joining the channel (the Join button is below every video), because I provide the code on members' request.
Enjoy! 🥳
Wow, this is golden. I truly appreciate the awesome editing and the reasons and explanations behind the interpretations. Thank you, and I look forward to watching more!
You're very welcome, Rumil! :) I am glad it is useful not only to me :)
Amazing, Yury, always clear. Love your tuts!
Thanks, Ousmane! Glad you like them! 😊 More to come!
You always have the best and most clearly understandable tuts. Always eagerly waiting for the next one. 1000x thanks
1000x thanks for the feedback! 😊 More to come!
top content, very concise and to the point! thanks!
Wow, thanks for such generous feedback!
Subscribed! I am very excited to explore your video playlists. Thank you!
Awesome, thank you! :) hope you like the rest! Cheers
REALLY AWESOME . Very clear tut.
Glad you think so! Thanks 🙏 You might also like the rest
This is good. Thank you
Thanks a lot 🙏
Thank you soo much!!
You're welcome!
This video has been awesome to watch, Sir.
I do have 2 small questions, though. Where could I find the preceding step of a Shapiro-Wilk or Kolmogorov-Smirnov test for normality? I'm new to R, by the way. And a small question about the aesthetic appearance of the correlation graph: is it possible to change the colours within this ggstatsplot function? For example, to use just one simple colour but in different shades for the variables x and y. Is that possible?
I thank you so much for the answers in advance, Sir.
Hi man, "normality" is the function which conducts the Shapiro-Wilk test. I do not recommend Kolmogorov-Smirnov. You can change the colors: just type ?ggscatterstats in the RStudio console, hit Enter and explore the possibilities. Cheers
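For readers new to R, the Shapiro-Wilk step can also be run with base R's shapiro.test(). This is a swapped-in minimal sketch, not the "normality" function mentioned in the reply; the data frame name normality_check is just an illustrative choice.

```r
# Shapiro-Wilk test from base R's stats package (no extra install needed).
# Null hypothesis: the sample comes from a normal distribution.
sw_mpg <- shapiro.test(mtcars$mpg)
sw_hp  <- shapiro.test(mtcars$hp)

# Collect the W statistics and p-values in one small table.
normality_check <- data.frame(
  variable = c("mpg", "hp"),
  W        = c(unname(sw_mpg$statistic), unname(sw_hp$statistic)),
  p_value  = c(sw_mpg$p.value, sw_hp$p.value)
)
print(normality_check)
```

A small p-value suggests a departure from normality, which is one common reason to prefer a nonparametric or robust correlation over Pearson.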
Hey there! I'm wondering where you get the information about the conventional thresholds for interpretation (like for p-values, Bayes, etc). There are so many different versions from different authors out there, which one should we trust? I'm really struggling to make up my mind! I already know about the effectsize package, but should we trust their frames of reference? Thanks in advance sir.
Oh man, I totally get it! It also drove me crazy when there were two different interpretations of the same effect size. Where is the truth? Not even statisticians know; they can only defend their opinion. So I decided to take the one that makes the most sense to me, together with its reference. The reference is important, because then you have a source you trust, and others can reproduce and build on your knowledge. When you ask RStudio like this: "?interpret_eta_squared()", you'll get all the references you need. Hope that helps! Cheers
@@yuzaR-Data-Science Thank you so much for the response. I know that feeling too man. I am going to check that right away!
Wow, super clear explanation. If my variables x and y are non-linearly related and I use Spearman's instead of Pearson's, how can I justify that graphically? I mean, using ggscatterstats I see a blue lm line; how can I replace it with a monotonic curve that better describes my association?
Thanks. In case of non-linearity, the best way is to fit a GAM model. But be careful with the interpretation of the coefficient: it's not a linear slope anymore. What you can also do, after you have seen the pattern, is split the predictor into several categories and run an ANOVA or Kruskal-Wallis test on them.
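One possible way to act on the GAM suggestion is sketched below with the mgcv package and base graphics. It is an illustration under assumptions (mtcars data, default smooth settings), not a feature of ggscatterstats, and note that a GAM smooth is flexible rather than guaranteed to be monotonic.

```r
library(mgcv)  # ships with standard R installations as a recommended package

# Spearman's rho quantifies a monotonic (not necessarily linear) association.
rho <- cor(mtcars$mpg, mtcars$hp, method = "spearman")

# Linear fit vs. smooth fit of the same relationship.
fit_lm  <- lm(hp ~ mpg, data = mtcars)
fit_gam <- gam(hp ~ s(mpg), data = mtcars, method = "REML")

# Predict on a grid to draw the smooth curve over the scatter plot.
grid <- data.frame(mpg = seq(min(mtcars$mpg), max(mtcars$mpg), length.out = 100))
grid$hp_gam <- as.numeric(predict(fit_gam, newdata = grid))

plot(hp ~ mpg, data = mtcars)
abline(fit_lm, lty = 2)        # dashed straight line from the linear model
lines(grid$mpg, grid$hp_gam)   # solid GAM smooth

# Effective degrees of freedom of s(mpg): values near 1 mean nearly linear.
print(summary(fit_gam)$edf)
```

If the smooth clearly bends away from the dashed line while still moving in one direction, that picture supports reporting a rank-based correlation instead of Pearson's.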
@@yuzaR-Data-Science Thanks for the explanation :)
you are very welcome!
Can't get started after installing. I get 'no package called dplyr' when running line 2 and line 4. I installed it successfully, but I'm not sure if I missed you mentioning another package to install.
Yes, it seems to be a package problem. Generally, keep R, RStudio and most of your packages up to date. Especially install or update {tidyverse}, {easystats} and {ggstatsplot}. If the error message says that some other package is missing, install that too. Hope that helps!
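A quick way to see which of these packages are missing before running the example is sketched below. The helper find_missing() is hypothetical (written for this illustration, not part of any package), and the install and update lines are commented out because they need an internet connection.

```r
# Packages named in the reply; adjust the vector to whatever your script loads.
needed <- c("tidyverse", "easystats", "ggstatsplot")

# Hypothetical helper: return the packages that cannot be loaded.
find_missing <- function(pkgs) {
  is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!is_installed]
}

missing_pkgs <- find_missing(needed)
print(missing_pkgs)

# Uncomment to install what is missing and update the rest (needs internet):
# install.packages(missing_pkgs)
# update.packages(ask = FALSE)
```

An empty result means all listed packages can be loaded; anything printed is a candidate for install.packages().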
cool!
Thanks!
Hi! I'm trying to replicate the analysis you showed but the package no longer exists. Can you share where this function can now be found?
It works perfectly on my PC. Have you installed and loaded the package?
install.packages("ggstatsplot")
library(ggstatsplot)
ggscatterstats(mtcars, mpg, wt)
Hi, nice video. How about stats for 3 variables?
Thanks, Samihah! It really depends on these 3 variables. If you mean a correlation, you can check out my video on the correlation matrix. If not, you can check out my very first video (please don't expect good quality there), where I showed a small table of 4 variables and explained what kind of analysis you can do with them, starting with a categorical goodness-of-fit test and finishing with linear and logistic regression.
@@yuzaR-Data-Science Thanks for the suggestions. I'll check it out!
Your videos are very clear for beginners! Thank you! Could you recommend online courses or a specialization in biostatistics?
Thank you! Nice to hear! The courses on Stepik helped me, especially the ones by Anatoliy Karpov. It looks like he already has his own site (CEO of KarpovCourses). He explains things very well.
@@yuzaR-Data-Science Yes, I took Anatoliy's statistics course. And where did you learn R?
Mostly on my own from books, many of which are online and free. But if I were starting over, I would advise myself to concentrate on one book: R4DS, r4ds.had.co.nz/ . You can also have a look at my blog > yuzar-blog.netlify.app/ . These two resources are more than enough to start with.
@@yuzaR-Data-Science Thank you!
Can we do multiple correlations using this, sir?
Of course: use the "grouped_ggscatterstats" function. Moreover, I make a 4-minute video on correlation matrix in R. I think it's exactly what you need.
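For a rough idea of what "multiple correlations" can mean, here is a base-R sketch (not grouped_ggscatterstats itself, and not the code from the correlation matrix video): a correlation matrix across several variables, and one correlation per group. The variable choices are assumptions for illustration.

```r
# Correlation matrix for three numeric variables (base R, Pearson by default).
vars <- c("mpg", "hp", "wt")
cor_matrix <- cor(mtcars[, vars])
print(round(cor_matrix, 2))

# Same pair of variables, one correlation per group (here: number of cylinders).
by_cyl <- sapply(split(mtcars, mtcars$cyl), function(d) cor(d$mpg, d$hp))
print(round(by_cyl, 2))
```

The first result answers "how do several variables correlate with each other", the second "how does one correlation change across groups", which is the kind of question a grouped plot addresses.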
@@yuzaR-Data-Science yes plzz sir, I would really appreciate it. Also, I like your videos a lot.
@@mayurwabhitkar2041 sorry, misspelled, I wanted to say "I made", so the video has already been online for ages ;)
@@yuzaR-Data-Science Which one is it, sir? Can you share the link? I am unable to find it.
Hey sir, just for information, it seems like the package is under maintenance or remission because the feature no longer works. I tried several datasets and variables, even copied your example character by character, but it just always shows the same error:
`stat_xsidebin()` with `bins = 30`. Choose a better value with `binwidth`.
`stat_ysidebin()` with `bins = 30`. Choose a better value with `binwidth`.
Error in `plot_theme()`:
! The `ggside.axis.minor.ticks.length` theme element must be a object.
I've tried to troubleshoot it, but with no success yet. I know it's out of your control, but I just wanted to give you a heads-up.
PS: I noticed one drawback to this feature: it only has 4 types of correlation, and you cannot use e.g. Kendall's, Gaussian or Shepherd's correlation. That is not bad in itself, but it would be great to test these other types of correlation as well.
PPS: I found a sort of alternative in the easystats correlation package (easystats.github.io/correlation/), which offers a large number of methods and a very similar plot (like plot(cor_test(iris, "Sepal.Width", "Sepal.Length"))), but it only shows the frequentist calculation at once (as far as I know). Would you consider doing a review of this package or even the other easystats packages (you have already done some 😉)?
As always, thank you for your work and fast responses.
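For anyone who only needs Kendall's correlation, base R already covers it through cor.test(). The sketch below is a swapped-in base-R alternative (it uses neither ggstatsplot nor the correlation package) on the same iris variables as the example in the comment.

```r
# Kendall's tau via base R; exact = FALSE avoids the warning about tied values.
kendall <- cor.test(iris$Sepal.Width, iris$Sepal.Length,
                    method = "kendall", exact = FALSE)
print(kendall$estimate)
print(kendall$p.value)
```

Unlike ggscatterstats, this gives only the frequentist test and no plot, so it is a complement rather than a replacement.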
Hey man, thanks for the update. Interestingly, ggstatsplot works perfectly on my computer. So it might be some dependency which is not updated. Try to update all the packages you have (especially ggside) and your R version. Sure, I also wanted to suggest the "correlation" package as I was reading your message. I love the whole easystats environment and was thinking about doing further package reviews, but decided to wait and do modelling first, which is what I am working on right now. I might do those packages eventually in the future :) Cheers
@@yuzaR-Data-Science That solved my problem: ggside was not up to date. Rookie mistake, hahahaha. Thank you so much. I am looking forward to seeing the modeling reviews. Tidymodels is a marvelous but overwhelming world. Thanks, sir.
@@WilForDataScience I've been there ;) one update and all the troubles are gone.