R demo | Correlation | Pearson, Spearman, Robust, Bayesian | How to conduct, visualise and interpret

  • Published 2 Jan 2022
  • Having two numeric variables, we often want to know whether they are correlated and how. One simple command can answer both questions by visualizing the data and conducting frequentist and Bayesian correlation analyses at the same time. So, let’s learn how to do that, how to interpret all these results, and how to choose the right correlation method in the first place.
    Here is a quick R code:
    install.packages("ggstatsplot")
    library(ggstatsplot)
    ggscatterstats(
    data = mtcars,
    x = mpg,
    y = hp,
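    type = "p") # "p" = parametric (Pearson); or "np" (non-parametric, Spearman) or "r" (robust)
    ?ggscatterstats # opens the help page with all arguments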
    If you want more code (or want to support me), consider joining the channel (join button below any of the videos), because I provide the code upon members' requests.
    Enjoy! 🥳
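
For readers who want to cross-check the numbers that ggscatterstats prints, here is a minimal sketch using only base R's cor.test() on the same mtcars columns. It is not the code from the video; the object names pearson and spearman are chosen here purely for illustration.

```r
# Cross-check with base R (no extra packages needed).
# Pearson: measures linear association, assumes roughly normal data.
pearson <- cor.test(mtcars$mpg, mtcars$hp, method = "pearson")

# Spearman: rank-based, suited to monotonic but non-normal data.
# exact = FALSE avoids the warning about exact p-values with tied ranks.
spearman <- cor.test(mtcars$mpg, mtcars$hp, method = "spearman", exact = FALSE)

pearson$estimate   # Pearson's r
spearman$estimate  # Spearman's rho
```

Both estimates should agree in sign with the coefficient shown in the plot subtitle for the matching type.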

COMMENTS • 50

  • @Rumil_ 2 years ago +2

    Wow this is golden. I truly appreciate the awesome editing and reasons and explanations behind the interpretations. Thank you and look forward to watching more!

    • @yuzaR-Data-Science 2 years ago

      You're very welcome, Rumil! :) I am glad it is useful not only to me :)

  • @oousmane 2 years ago

    Amazing Yury, always clear. Love your tuts !

  • @so4ragb 2 years ago +2

    you always have the best and very clearly understandable tuts. Always eagerly waiting for the next. 1000x thanks

  • @hikeaway1596 1 month ago

    top content, very concise and to the point! thanks!

  • @joeyoviedo5202 7 months ago

    Subscribed! I am very excited to explore your video playlists. Thank you!

    • @yuzaR-Data-Science 7 months ago

      Awesome, thank you! :) hope you like the rest! Cheers

  • @user-sm1se3sq5x 7 months ago

    REALLY AWESOME . Very clear tut.

    • @yuzaR-Data-Science 7 months ago +1

      Glad you think so! Thanks 🙏 You might also like the rest

  • @ErickAmkoa 1 year ago

    This is good. Thank you

  • @Dewisd2002 4 months ago

    Thank you soo much!!

  • @Dergicetea 1 month ago

    This video has been awesome to watch, Sir.
    I do have two small questions, though. Where could I find the preceding step, a Shapiro-Wilk or Kolmogorov-Smirnov test for normality? I'm new to R, by the way. And a small question about the aesthetic appearance of the correlation graph: is it possible to change the colours within this ggstatsplot function? I mean, for example, a single colour but in different shades for the x and y variables. Is that possible?
    Thank you very much in advance for the answers, Sir.

    • @yuzaR-Data-Science 1 month ago +1

      Hi man, "normality" is the function which conducts Shapiro-Wilk. I do not recommend Kolmogorov-Smirnov. You can change colors. For that, just write ?ggscatterstats in the console of RStudio, hit Enter and explore the possibilities. Cheers
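
As a rough sketch of the normality check discussed in this thread, base R's shapiro.test() runs the Shapiro-Wilk test on a single numeric vector. The "normality" function named in the reply comes from an add-on package that the reply does not identify, so this base R call is a stand-in and not necessarily the author's exact code; the object names sw_mpg and sw_hp are illustrative.

```r
# Shapiro-Wilk normality test for each variable (base R, stats package).
sw_mpg <- shapiro.test(mtcars$mpg)
sw_hp  <- shapiro.test(mtcars$hp)

# A small p-value suggests a departure from normality, in which case a
# non-parametric correlation (type = "np" in ggscatterstats) may be safer.
c(mpg = sw_mpg$p.value, hp = sw_hp$p.value)
```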

  • @WilForDataScience 10 days ago

    Hey there! I'm wondering where you get the information about the conventional thresholds for interpretation (like for p-values, Bayes, etc). There are so many different versions from different authors out there, which one should we trust? I'm really struggling to make up my mind! I already know about the effectsize package, but should we trust their frames of reference? Thanks in advance sir.

    • @yuzaR-Data-Science 8 days ago +1

      Oh man, I totally get it! It also drove me crazy when there were two different interpretations of the same effect size. Where is the truth? Not even statisticians know, they can only defend their opinion. Thus, I decided to take the one which makes the most sense to me, together with its reference. The reference is important, because then you have a source you trust, and others can reproduce and build on your knowledge. When you ask RStudio in this way "?interpret_eta_squared()" you'll get all the references you need. Hope that helps! Cheers

    • @WilForDataScience 8 days ago

      @@yuzaR-Data-Science Thank you so much for the response. I know that feeling too man. I am going to check that right away!

  • @paoloemiliobartolucci9844 3 months ago

    Wow, super clear explanation. If the relationship between my variables x and y is non-linear and I use Spearman's instead of Pearson's, how can I justify that graphically? I mean, using scatterstats I see a blue lm model line; how can I replace it with a monotonic curve that better describes my association?

    • @yuzaR-Data-Science 3 months ago +1

      Thanks. In case of non-linearity the best way is to fit a GAM model. But be careful with the interpretation of the coefficient, it's not a linear slope anymore. What you can also do, after you have seen the pattern, is split the predictor into several categories and do ANOVA or Kruskal-Wallis with this.

    • @paoloemiliobartolucci9844 3 months ago

      Thanks for the explanation :)@@yuzaR-Data-Science

    • @yuzaR-Data-Science 3 months ago

      you are very welcome!
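
The GAM suggestion in this thread could be sketched roughly as follows, assuming the mgcv package (which ships with standard R installations) and the mtcars data from the video description. The choice of hp as predictor and the names gam_fit and hp_grid are illustrative assumptions, not taken from the video.

```r
library(mgcv)  # recommended package bundled with standard R installations

# Fit a smooth (possibly non-linear) relationship between hp and mpg.
gam_fit <- gam(mpg ~ s(hp), data = mtcars)

# Predict over an evenly spaced grid of hp values to draw the curve.
hp_grid <- data.frame(hp = seq(min(mtcars$hp), max(mtcars$hp), length.out = 100))
hp_grid$mpg_hat <- as.numeric(predict(gam_fit, newdata = hp_grid))

# Scatter plot of the raw data with the fitted smooth overlaid.
plot(mtcars$hp, mtcars$mpg, xlab = "hp", ylab = "mpg")
lines(hp_grid$hp, hp_grid$mpg_hat, lwd = 2)
```

Note that a GAM smooth is flexible but not constrained to be monotonic, so it shows the shape of the association rather than guaranteeing a monotonic curve.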

  • @motomarx 9 months ago

    Can't get started after installing. I'm returned 'no package called dplyr' on command of line 2 and at line 4. I installed it successfully but not sure if I missed you mentioning another package to install

    • @yuzaR-Data-Science 9 months ago

      Yes, it seems to be a package problem. Generally, keep R, RStudio and most of the packages up to date. Especially install or update {tidyverse}, {easystats} and {ggstatsplot}. If the error message says that some other package is missing, install that too. Hope that helps!
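
One way to find out which packages are missing before running the example is sketched below. The helper missing_pkgs is hypothetical (written for this illustration, not part of any package), and the list in needed is only an assumption based on the error message quoted in this thread.

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed.
missing_pkgs <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

needed <- c("dplyr", "ggstatsplot")  # assumed list, adjust to your error message
to_install <- missing_pkgs(needed)
to_install

# Uncomment to install whatever is missing (needs an internet connection):
# if (length(to_install) > 0) install.packages(to_install)
```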

  • @andredasilvapereira150 2 years ago

    cool!

  • @jtwest8 3 months ago

    Hi! I'm trying to replicate the analysis you showed but the package no longer exists. Can you share where this function can now be found?

    • @yuzaR-Data-Science 3 months ago

      It works perfectly on my PC. Have you installed and loaded the package?
      install.packages("ggstatsplot")
      library(ggstatsplot)
      ggscatterstats(mtcars, mpg, wt)

  • @samihahzura4735 2 years ago

    Hi, nice video. How about stats for 3 variables?

    • @yuzaR-Data-Science 2 years ago +3

      Thanks, Samihah! It really depends on these 3 variables. If you mean a correlation, you can check out my video on the correlation matrix. If not, you can check out my very first video (please don't expect good quality there), where I showed a small table of 4 variables and explained what kind of analysis you can do with them, starting with a categorical goodness-of-fit test and finishing up with linear and logistic regression.

    • @samihahzura4735 2 years ago

      @@yuzaR-Data-Science Thanks for the suggestions. I'll check it out!

  • @ekaterinanikitina1092 2 years ago

    Your videos are very clear for beginners! Thank you! Could you recommend online courses or a specialization in biostatistics?

    • @yuzaR-Data-Science 2 years ago +1

      Thank you! Nice to hear! The courses on Stepik helped me, especially Anatoliy Karpov's courses. It looks like he already has his own site (CEO KarpovCourses). He explains things very well.

    • @ekaterinanikitina1092 2 years ago

      @@yuzaR-Data-Science Yes, I took the statistics course with Anatoliy. And where did you learn R?

    • @yuzaR-Data-Science 2 years ago +1

      Mostly on my own from books, many of which are online and free. But if I were starting over, I would advise myself to concentrate on one book - R4DS r4ds.had.co.nz/ . Besides that, you can have a look at my blog > yuzar-blog.netlify.app/ These two resources are more than enough to get started.

    • @ekaterinanikitina1092 2 years ago

      @@yuzaR-Data-Science Thank you!

  • @mayurwabhitkar2041 11 months ago

    can we do multiple correlation using this sir ?

    • @yuzaR-Data-Science 11 months ago

      Of course: use the "grouped_ggscatterstats" function. Moreover, I make a 4-minute video on the correlation matrix in R. I think it's exactly what you need.

    • @mayurwabhitkar2041 10 months ago

      @@yuzaR-Data-Science yes plzz sir, would really appreciate it and also, i like your videos a lot,.

    • @yuzaR-Data-Science 10 months ago

      @@mayurwabhitkar2041 sorry, misspelled, I wanted to say "I made", so the video has already been online for ages ;)

    • @mayurwabhitkar2041 10 months ago

      @@yuzaR-Data-Science Which one is it, sir? Can you share the link? I am unable to find it.

  • @WilForDataScience 7 days ago

    Hey sir, just for information, it seems like the package is under maintenance or remission because the feature no longer works. I tried several datasets and variables, even copied your example character by character, but it always shows the same error:
    `stat_xsidebin()` with `bins = 30`. Choose a better value with `binwidth`.
    `stat_ysidebin()` with `bins = 30`. Choose a better value with `binwidth`.
    Error in `plot_theme()`:
    ! The `ggside.axis.minor.ticks.length` theme element must be a object.
    I've tried to troubleshoot it but no success yet, and I know it's out of your control, but I just wanted to give you a heads up.
    PS: I noticed one drawback to this feature: it only has 4 types of correlations, and you cannot use e.g. Kendall's, Gaussian's or Shepherd's correlation, which is not bad in itself, but it would be great to test these other types of correlations as well.
    PS2: I found a sort of alternative in the easystats correlation package (easystats.github.io/correlation/), which offers a large number of methods and a very similar plot (like plot(cor_test(iris, "Sepal.Width", "Sepal.Length"))), but it only shows the frequentist calculation at once (as far as I know). Would you consider doing a review of this package or even the other easystats packages (you have already done some 😉)?
    As always, thank you for your work and fast responses.

    • @yuzaR-Data-Science 7 days ago +1

      Hey man, thanks for the update. Interestingly, ggstatsplot works perfectly on my computer. So, it might be some dependency which is not updated. Try to update all the packages you have (especially ggside) and your R version. Sure, I also wanted to suggest the "correlation" package as I was reading your message. I love the whole easystats environment, and was thinking about doing further package reviews, but decided to wait and do modelling first, which is what I am working on right now. I might do those packages eventually in the future :) cheers

    • @WilForDataScience 7 days ago

      @@yuzaR-Data-Science It solved my problem: ggside was not up to date. Rookie mistake, hahahaha. Thank you so much. I am looking forward to seeing the modeling reviews. Tidymodels is a marvelous but overwhelming world. Thanks sir

    • @yuzaR-Data-Science 7 days ago +1

      @@WilForDataScience I've been there ;) one update and all the troubles are gone.
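
On the Kendall's correlation mentioned earlier in this thread: as a minimal sketch, base R's cor.test() already supports it through method = "kendall", without any extra package. The mtcars columns and the object name kendall are chosen here for illustration only.

```r
# Kendall's rank correlation with base R (stats package).
# exact = FALSE avoids the warning about exact p-values with tied values.
kendall <- cor.test(mtcars$mpg, mtcars$hp, method = "kendall", exact = FALSE)

kendall$estimate  # Kendall's tau
kendall$p.value   # p-value of the test
```

This gives the frequentist result only; it does not reproduce the combined plot and Bayesian output of ggscatterstats.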