StatQuest: MDS and PCoA

  • Published 17 Nov 2024
  • MDS (multi-dimensional scaling) and PCoA (principal coordinate analysis) are very, very similar to PCA (principal component analysis). There is really only one small difference, but that difference means you need to know what you're doing if you're going to use MDS effectively. This video makes sure you learn what you need to know to use MDS and PCoA.
    There is a minor error at 4:14: the difference for gene 3 should be (2.2 - 1)². Instead, the distance for gene 2 was repeated.
    For a complete index of all the StatQuest videos, check out:
    statquest.org/...
    If you'd like to support StatQuest, please consider...
    Buying The StatQuest Illustrated Guide to Machine Learning!!!
    PDF - statquest.gumr...
    Paperback - www.amazon.com...
    Kindle eBook - www.amazon.com...
    Patreon: / statquest
    ...or...
    UA-cam Membership: / @statquest
    ...a cool StatQuest t-shirt or sweatshirt:
    shop.spreadshi...
    ...buying one or two of my songs (or go large and get a whole album!)
    joshuastarmer....
    ...or just donating to StatQuest!
    www.paypal.me/...
    Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on twitter:
    / joshuastarmer
    #statquest #MDS #PCoA
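The squared-difference arithmetic behind the 4:14 correction can be sketched in Python. The gene values below are hypothetical; only the gene 3 pair (2.2 vs 1) comes from the correction note.

```python
import math

# Hypothetical expression values for three genes; only the gene 3 pair
# (2.2 vs 1.0) comes from the correction note above.
cell1 = [1.8, 2.0, 2.2]
cell2 = [1.1, 0.9, 1.0]

# Euclidean distance: each gene contributes its own squared difference.
# The video's typo repeated gene 2's term instead of gene 3's (2.2 - 1)^2.
squared_diffs = [(a - b) ** 2 for a, b in zip(cell1, cell2)]
distance = math.sqrt(sum(squared_diffs))
```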

COMMENTS • 195

  • @statquest
    @statquest  2 years ago +3

    Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/

  • @taoyang563
    @taoyang563 4 years ago +20

    This is such a great video.
    To answer a student's question in one sentence demonstrates the teacher's complete understanding of the knowledge.
    The more the teacher talks to answer, the less the teacher knows what you are asking and the more confused you become.

    • @statquest
      @statquest  4 years ago +2

      Thank you very much! :)

  • @lade_edal
    @lade_edal 10 months ago +1

    Ran around all over the internet none the wiser, then came across this channel and bam! It all fits together so easily. Why do some people overcomplicate such simple things? Thanks Josh!

  • @dsagman
    @dsagman 1 year ago +13

    Honestly the best machine learning and stats videos available. How did we live before Statquest?

  • @son681
    @son681 4 years ago +7

    Thank you so much for such an easy and bite-size content that I can understand to the fullest. It's way much better visualized and informative compared with other videos I've seen !!!

    • @statquest
      @statquest  4 years ago

      Thank you very much! :)

    • @MSuriyaPrakaashJL
      @MSuriyaPrakaashJL 4 years ago

      @@statquest This is a great video, but where can I find the maths behind it?

    • @statquest
      @statquest  4 years ago

      @@MSuriyaPrakaashJL Start here: en.wikipedia.org/wiki/Multidimensional_scaling

  • @Ivaniushina
    @Ivaniushina 6 years ago +5

    Brilliant! so clear. Now I understand (at last!) the relations between PCA and MDS.

  • @초롱초록
    @초롱초록 4 years ago +3

    Thank you so much! I was confused with the concept of difference about PCA and MDS. Thanks to your explanation, I could understand.

  • @AlonKedem1000
    @AlonKedem1000 9 months ago +1

    I love your videos. Just want to mention that at 4:18 you calculated the Euclidean distances for gene 2 twice while saying it's gene number 3. :)

  • @nikhiljoyappa687
    @nikhiljoyappa687 2 years ago +1

    very helpful in the world of people who are always helpfool.

  • @nittygritty8161
    @nittygritty8161 1 month ago

    When I searched, there were many explanations insisting that PCoA and MDS are different, but I couldn't make sense of them. In this video you said the two are exactly identical; can you explain the reason in more detail, please?

    • @statquest
      @statquest  1 month ago

      I talk about the differences very early in the video, at 0:42

  • @takethegaussian7548
    @takethegaussian7548 4 years ago +3

    Thank you very much! This is a really really good explanation.

  • @ahmetlacin5748
    @ahmetlacin5748 2 years ago +1

    I just have no idea how to thank you. Viva Josh!

  • @alejandrotenorio2327
    @alejandrotenorio2327 4 years ago +3

    In MDS where does the minimizing of the Raw Stress go? I'm not getting how you can do that while performing EVD to reduce the dimensions

  • @Stephanbitterwolf
    @Stephanbitterwolf 6 years ago +1

    Very helpful. Not sure if this has been pointed out yet, but at around 4:17 you talk about the distance for gene 3 and the numbers aren't accurate for that gene's difference.

  • @poojakunte6865
    @poojakunte6865 6 years ago +7

    The difference for gene 3 should be (2.2 - 1)^2, right?

    • @statquest
      @statquest  6 years ago +4

      Yes! That's just a typo in the video.

  • @simonhunter-barnett6616
    @simonhunter-barnett6616 4 years ago

    If MDS and PCA have the same outputs, why would you choose one over the other? What's the importance of correlation vs distance? P.S. I've been trying to understand PCA and MDS for months now and this was so much easier than reading articles and books :D

    • @statquest
      @statquest  4 years ago +1

      Starting at 4:48 I give examples of using MDS with different distance metrics, which result in outputs that are different from PCA.

  • @sofiagreen9742
    @sofiagreen9742 5 years ago +2

    Hello Josh and thank you for your videos, they are really helpful. Would you mind making a video on Canonical Correlations please?

  • @MrZanvine
    @MrZanvine 7 years ago +2

    Brilliant video, you're awesome! Thanks for taking the time to make these :)

  • @rlh4648
    @rlh4648 2 years ago +1

    Thanks Josh
    You're feckin awesome.

  • @madihamariamahmed8727
    @madihamariamahmed8727 2 years ago

    Please make videos on Deep clustering methods!

    • @statquest
      @statquest  2 years ago

      I'll keep that in mind.

  • @medazzouzi2649
    @medazzouzi2649 1 year ago +1

    Hey Josh, I'm confused by the PCA statement "correlations among samples". Isn't it supposed to be correlation among variables? Since we are reducing the dimension of the variables (the genes in this case), not the samples?

    • @statquest
      @statquest  1 year ago +1

      The goal of the plot is to show correlations among the samples - so each sample has a lot of gene measurements, and correlations among sample would mean that a lot of those measurements are similar (or the exact opposite of similar) and we want to preserve those relationships. We want things that are highly correlated to appear close to each other in a graph.

    • @medazzouzi2649
      @medazzouzi2649 1 year ago +2

      @@statquest ahhh okayyyy i gettt itt 😍😍😍

    • @medazzouzi2649
      @medazzouzi2649 1 year ago +2

      @@statquest thanks josh

  • @abcd123456789zxc
    @abcd123456789zxc 3 years ago +2

    Thanks so much for your video, but I still have a question; I really don't understand what is the difference between PCoA and MDS.
    It would be a great help if anyone could explain the difference between PCoA and MDS.

    • @statquest
      @statquest  3 years ago +2

      MDS has two versions: "Classical" and "Non-Metric". This video shows how "Classical" MDS works. Classical MDS is the exact same thing as PCoA. There is no difference. However, there is a difference between PCoA and "Non-Metric" MDS. Maybe one day I'll make a video on "Non-Metric" MDS.
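The classical MDS / PCoA recipe described here can be sketched with NumPy on toy random data, assuming Euclidean distances: double-center the squared distance matrix, eigendecompose it, and keep the top axes. This is a minimal sketch, not a full implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))            # toy data: 6 samples, 4 features

# Pairwise squared Euclidean distances between samples.
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)

# Double-center: B = -1/2 * J * D^2 * J, with J the centering matrix.
n = sq.shape[0]
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ sq @ J

# Eigendecompose and sort eigenvalues from largest to smallest.
vals, vecs = np.linalg.eigh(B)
order = np.argsort(vals)[::-1]
vals, vecs = vals[order], vecs[:, order]

# 2-D PCoA coordinates: top eigenvectors scaled by sqrt(eigenvalue).
coords = vecs[:, :2] * np.sqrt(vals[:2])
```

Keeping all the positive axes (here four) reproduces the original distances exactly; keeping the top two gives the usual 2-D plot.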

    • @abcd123456789zxc
      @abcd123456789zxc 3 years ago +1

      @@statquest Thank you so much for your time and consideration.

  • @HOMESTUDY247
    @HOMESTUDY247 3 years ago +1

    Great video

  • @alexlee3511
    @alexlee3511 7 months ago

    Thank you for the effort! but i am wondering if we are going to reduce the dimension of genomic data, do people prefer PCA or PCoA?

    • @statquest
      @statquest  7 months ago

      MDS with log fold change is the default for DESeq2 and possibly other programs. However, I feel like PCA is more commonly used.

  • @marchino1981
    @marchino1981 6 years ago +1

    Very nice and clear! Thank you!

  • @dist321
    @dist321 5 years ago +1

    Hi Josh! I've been here many times and love your channel. I have a question about the axes. I understand that each one accounts for "x" percentage of the variation in the dataset, with axis one having the highest percentage. However, if I look at samples along PC1, can I assume any biological meaning for those samples far to the right or far to the left?

  • @trinh123456
    @trinh123456 4 years ago +1

    Your videos are amazing!

    • @statquest
      @statquest  4 years ago +1

      Thank you very much! :)

  • @DungPham-ai
    @DungPham-ai 7 years ago

    best video. Can you make video explain Non-negative matrix factorization (NMF) ?

  • @Mako0123
    @Mako0123 7 years ago

    Nice explanation as always!

  • @liranzaidman1610
    @liranzaidman1610 4 years ago +1

    Hi Josh,
    have you ever encountered a clustering model where there were more than 3-4 clusters? I've done it many times, and it looks like the number of optimal clusters (3-4) is "natural".

    • @statquest
      @statquest  4 years ago

      Very interesting. I'll try to remember to keep track of these things in the future to see if I get similar results.

  • @kaynkayn9870
    @kaynkayn9870 1 year ago

    I like to learn using videos (mainly from your channel) and GPT for the maths equations. I checked Wikipedia just to be sure, but it looks like you skipped the step about "Double Centering and Matrix Transformation" entirely.

    • @statquest
      @statquest  1 year ago

      I talk about that in my PCA videos: ua-cam.com/video/FgakZw6K1QQ/v-deo.html and ua-cam.com/video/oRvgq966yZg/v-deo.html

    • @kaynkayn9870
      @kaynkayn9870 1 year ago

      @@statquest I must have missed it, ill review it again. Thank you.

    • @statquest
      @statquest  1 year ago +1

      @@kaynkayn9870 Those videos specifically talk about the centering of the data - how and why we need to do that. I don't talk about matrix transformations explicitly because those are just one of several ways to specifically perform PCA.

  • @manueltiburtini6528
    @manueltiburtini6528 3 years ago +1

    Hi Josh from Italy! Are the assumptions of these methods always the same (normality, independence, homoscedasticity, linearity)?

    • @statquest
      @statquest  3 years ago +1

      The same as PCA? I'm not sure. However, I do know that whatever assumptions there are are often ignored and people just try PCA or MDS and see what happens.

    • @manueltiburtini6528
      @manueltiburtini6528 3 years ago

      @@statquest This could lead to false interpretations, couldn't it? I'm using this technique and LDA to analyze taxonomic data, and I'm worried that my dataset is not independent due to common phylogenetic origin.

    • @statquest
      @statquest  3 years ago +1

      @@manueltiburtini6528 I don't really think that's a big problem for MDS or PCA. These methods are just designed to reduce dimensionality for drawing graphs or to plug into some other analysis (like regression).

  • @adelutzaification
    @adelutzaification 7 years ago

    Wow. PCA and MDS really are very similar, just like the videos describing them (clearly explained and overall awesome ;) It seems to me that PCA is just a particular case of MDS, since with MDS one can adjust the distance metric to get various outputs, including the one given by PCA. If that is the case, why don't people use MDS more? It seems under-utilized. Is it trickier to implement?

  • @malteneumeier3274
    @malteneumeier3274 5 years ago +1

    @Josh Starmer: at minute 4:14 there is a tiny mistake in the formula: the difference for gene 3 should be (2.2 - 1)². Instead, the distance for gene 2 was repeated.

    • @statquest
      @statquest  5 years ago

      Thanks a lot for pointing that out. I've added this to the "Errata" page that I maintain so that one day, when I create new editions of these videos, I can correct all the little mistakes.

  • @khajariazuddinnawazmohamme3092
    @khajariazuddinnawazmohamme3092 6 years ago +2

    Hi Josh, I really like your videos and they are very intuitive. Could you do a StatQuest video on Partial Least Squares if possible? Thanks in Advance :)

    • @statquest
      @statquest  6 years ago +3

      Partial Least Squares is on the to-do list, so, with your vote, I'll bump it up a notch so that it is closer to the top.

    • @khajariazuddinnawazmohamme3092
      @khajariazuddinnawazmohamme3092 6 years ago

      @@statquest thank you so much Josh 😊

    • @melaniee467
      @melaniee467 5 years ago +1

      @@statquest cant wait for your Partial Least Square explanation!

    • @statquest
      @statquest  5 years ago

      @@melaniee467 Sounds good! I'll bump it up another notch!

  • @rrrprogram8667
    @rrrprogram8667 6 years ago

    Great video.... Actually, I am elevating myself from Excel data analysis to machine learning. Right now I am at the stage of grabbing everything I can. What advice do you have for Excel users who are machine learning enthusiasts?

  • @YooToobins
    @YooToobins 5 years ago +9

    Recommend speeding this up to 1.25x while viewing

  • @Emily-Bo
    @Emily-Bo 2 years ago

    Hi Josh, how do you choose among PCA, LDA and MDS methods?

    • @statquest
      @statquest  2 years ago +1

      LDA is supervised, so you can only use it when you know what groups you want to supervise. MDS is useful when you want to change the distance metric. And if you don't want to change the distance metric, MDS and PCA are the same.

    • @Emily-Bo
      @Emily-Bo 2 years ago +1

      @@statquest Thank you, Josh! very helpful!

  • @jxaskcijiaxhsic9943
    @jxaskcijiaxhsic9943 3 months ago

    How do you exactly find the axis of MDS? What do you do after you calculate the distances?

    • @statquest
      @statquest  3 months ago

      To get a sense of how it works, see: ua-cam.com/video/FgakZw6K1QQ/v-deo.html

    • @jxaskcijiaxhsic9943
      @jxaskcijiaxhsic9943 3 months ago

      @@statquest Is it the same process as calculating the PCs when calculating the axes of MDS, like finding the best fitted line by minimizing the SSR? If it is, what role does calculating the distances between points play?

    • @statquest
      @statquest  3 months ago

      @@jxaskcijiaxhsic9943 It's a related technique. It's not the same, but related. Based on the distances we can calculate variances and covariances and from those we can find the directions that there is the most variation in the data.

    • @jxaskcijiaxhsic9943
      @jxaskcijiaxhsic9943 3 months ago

      @@statquest Okay, so it is still finding the best fitted line, but the distances between the points stay the same after dimension reduction.

  • @ranitchatterjee5552
    @ranitchatterjee5552 3 years ago

    To plot the data, do we select the cells with maximum distances? For example, if cells 1 & 2 and cells 3 & 4 have maximum distances, do we plot with respect to them?

    • @statquest
      @statquest  3 years ago

      To get a better understanding of how it works, check out the StatQuest on PCA: ua-cam.com/video/FgakZw6K1QQ/v-deo.html

  • @bitsajmer
    @bitsajmer 3 years ago

    Hi Josh,
    1. How do we plot the values of MDS on a graph? With distances we only have a single value.
    Do we plot it on a number line? But you showed a graph with two axes.

    • @statquest
      @statquest  3 years ago

      MDS converts a matrix of distances into different axes in much the same way that we do it for PCA. For details, see: ua-cam.com/video/_UVHneBUBW0/v-deo.html

  • @ketalesto
    @ketalesto 3 years ago +1

    Day 40 of #66DaysOfData
    Yeah baby! Let's go!

  • @KayYesYouTuber
    @KayYesYouTuber 5 years ago

    Are you saying we compute eigenvalues and eigenvectors on the distance matrix instead of the covariance matrix? Is that the only difference between PCA and MDS?

    • @statquest
      @statquest  5 years ago +1

      And you get your choice of distance metrics.

  • @CWunderA
    @CWunderA 6 years ago +2

    Good video, but it was not very clear to me why you would choose one over the other (MDS vs PCA)

    • @statquest
      @statquest  6 years ago

      If you're working with distances, then MDS is the way to go.

    • @CWunderA
      @CWunderA 6 years ago +1

      My question was more why would someone choose to cluster/reduce dimensionally using distances over correlations?

    • @statquest
      @statquest  6 years ago +2

      At 6:20 in the video I mention that a Biologist might choose to use MDS to show clustering using log-fold changes because, traditionally, gene measurements are analyzed in terms of log-fold changes.
      Alternatively, it could be you want to cluster locations in a city based on how far they are away via taxi (so blocks and one-way streets are a factor) - MDS can do this.
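As a sketch of the first idea, here is one hypothetical log-fold-change metric: the average absolute log2 fold change per gene. The counts are made up, and real pipelines such as DESeq2 do considerably more than this.

```python
import math

def lfc_distance(sample_a, sample_b):
    """Average |log2 fold change| across genes (values must be > 0)."""
    lfcs = [abs(math.log2(a / b)) for a, b in zip(sample_a, sample_b)]
    return sum(lfcs) / len(lfcs)

# Made-up read counts for 3 genes in two cells.
cell1 = [8.0, 2.0, 4.0]
cell2 = [2.0, 2.0, 1.0]
d = lfc_distance(cell1, cell2)   # genes differ by 2, 0, and 2 doublings
```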

    • @CWunderA
      @CWunderA 6 years ago +2

      Ah I see, so it's more that MDS allows you to cluster via any distance metric of interest, whereas PCA limits you to correlation/Euclidean distance. Thanks for taking the time to help me out!

    • @statquest
      @statquest  6 years ago +2

      You are correct - MDS lets you cluster stuff using any distance metric. The coolest thing about that, which I forgot to mention, is that, via Random Forests, you can use MDS to cluster any data, regardless of type. Check it out in "Random Forests Part 2:" ua-cam.com/video/nyxTdL_4Q-Q/v-deo.html

  • @whatyouwantyouare
    @whatyouwantyouare 3 years ago

    Hi Josh, thanks so much ... Confusion: the new table with distances will have columns d12, d13, d14, ..., d23, d24, ..., so when we plot stuff, why would we still have clusters corresponding to cell 1, cell 2...? Wouldn't the colours correspond to d12, d13, etc.?

    • @statquest
      @statquest  3 years ago

      The first column in the distance matrix will be cell1, the second will be cell2, etc, the first row in the distance matrix cell1 and the second will be cell2 etc. The distances are then the values in the matrix. The distance between cell1 and cell1 (in the upper left hand corner of the matrix) is 0, etc.
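That layout can be sketched with NumPy (toy two-gene measurements; `D` ends up symmetric with zeros down the diagonal):

```python
import numpy as np

# Toy measurements: one row per cell, one column per gene.
cells = np.array([[1.0, 2.0],    # cell1
                  [2.0, 4.0],    # cell2
                  [8.0, 8.0]])   # cell3

# D[i, j] = Euclidean distance between cell i and cell j.
D = np.sqrt(((cells[:, None, :] - cells[None, :, :]) ** 2).sum(axis=2))
```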

  • @darkredrose7683
    @darkredrose7683 2 years ago

    Thank you! And how about the CAP analysis? I'm so confused >< Thank you in advance!

    • @statquest
      @statquest  2 years ago

      I'll keep that topic in mind.

  • @mohsenvazirizade6334
    @mohsenvazirizade6334 5 years ago +1

    Hi, thank you so much for such a good explanation. Do you mind if I ask for the reference book/paper for the terminology? I am a little bit confused since I assume the same methods are slightly different in various reference books. Thank you

    • @statquest
      @statquest  5 years ago +2

      To be honest, I can't remember what my original sources are for this video. More recently I've been putting the sources in the description below the video, but this video is too old for that.

  • @swarnimkoteshwar
    @swarnimkoteshwar 2 years ago +1

    Thank you!

  • @oliseh2285
    @oliseh2285 5 years ago

    Hi Josh, thanks a lot for your amazing videos!!!
    I have a question: with molecular markers (SSRs or SNPs), what would you personally choose?
    PCA or PCoA?

    • @statquest
      @statquest  5 years ago +1

      If you use the euclidian distance, then they are the same.

    • @oliseh2285
      @oliseh2285 5 years ago

      Yes, I got it from seeing the video. But I'm not sure which kind of distance I should use in case I want to perform a PCoA with microsatellites in R, and also whether PCoA is better than PCA when you use a specific distance for microsatellites.
      It's weird because when I used the Adegenet function [dudi.pca()] for my df of 5 SSRs with 23 alleles, the function, instead of considering 5 variables (the 5 SSRs), took 23 variables (the 23 alleles), and for this reason the explained variance of PC1 and PC2 is quite low.
      Hope you can suggest something based on your experience as a geneticist.
      Thanks a lot.

    • @statquest
      @statquest  5 years ago +1

      PCA is the most commonly used method in genetics.

    • @oliseh2285
      @oliseh2285 5 years ago +1

      Thanks a lot for the answer and for making statistics accessible to all and funny. Please continue your terrific job. We love you!!!

  • @drzun
    @drzun 5 years ago

    Hi Josh, thanks for the video. I'm a bit confused: when you said "PCA starts by calculating the correlation among samples", did you mean the plotting of each sample in multiple dimensions like in your previous PCA video? If so, how about PCoA? Do we also "plot" the distances among samples first and then try to get the top 2 PCs as well? If that's true, then how is the number of dimensions determined in the case of PCoA? I watched all of your PCA videos and I can understand how to get a PCA, but somehow I still don't know how a PCoA is done... thank you!

    • @statquest
      @statquest  5 years ago +1

      There are two ways to do PCA - an old method that is based on covariances and correlations (described in this ua-cam.com/video/HMOI_lkzW08/v-deo.html and this ua-cam.com/video/_UVHneBUBW0/v-deo.html ) and a new method that uses Singular Value Decomposition (described in this ua-cam.com/video/FgakZw6K1QQ/v-deo.html ) . This video on PCoA/MDS references the older method (using covariances and correlations). To calculate the covariances and correlations among the samples, you follow the steps outlined in these videos on covariance statquest.org/2019/10/08/covariance-and-correlation-part-1-covariance/ and correlation statquest.org/2019/10/08/covariance-and-correlation-part-2-pearsons-correlation/ . That gives you a single number for every pair of samples. We then do Eigen Decomposition of those numbers to get the PCs. With PCoA, we calculate distances (using the euclidian distance or some other metric) between each pair of samples and do Eigen Decomposition of those numbers to get the PCs.
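The two routes described here can be checked numerically on toy data, assuming the Euclidean metric: eigendecomposition of the feature covariance matrix vs. eigendecomposition of the double-centered distances. The resulting 2-D coordinates agree up to a sign flip per axis.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(7, 3))            # toy data: 7 samples, 3 features
Xc = X - X.mean(axis=0)                # center each feature

# Route 1: PCA scores via eigendecomposition of the covariance matrix.
cov = np.cov(Xc, rowvar=False)
w, V = np.linalg.eigh(cov)
V = V[:, np.argsort(w)[::-1]]          # eigenvectors, largest eigenvalue first
pca_scores = Xc @ V[:, :2]

# Route 2: PCoA via double-centered pairwise squared Euclidean distances.
sq = ((Xc[:, None, :] - Xc[None, :, :]) ** 2).sum(axis=2)
n = sq.shape[0]
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ sq @ J
lam, U = np.linalg.eigh(B)
order = np.argsort(lam)[::-1]
pcoa_coords = U[:, order[:2]] * np.sqrt(lam[order[:2]])
```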

  • @shahbazsiddiqi74
    @shahbazsiddiqi74 5 years ago +4

    Unlike PCA, where we compared gene variation in order to assign weights, calculate the value for each cell, and then map the cells onto PC1 and PC2, here we are calculating the distance between cells with reference to each gene. What is the calculation for MDS1 and MDS2? I am confused because we are taking 2 cells at a time instead of one; are we plotting the difference of each gene with respect to cell 1 along the x axis and cell 2 along the y axis? Could you please explain what to consider for MDS1 and MDS2? Thanks a ton

    • @chrisjfox8715
      @chrisjfox8715 4 years ago

      If this is in reference to the log fold change graph, then I too agree that it isn't explained what those two axes distinctly represent. I get how the LFC was calculated before then (between every single pair of data points), but those axes could theoretically be anything at the discretion of the investigator... and what they are here hasn't been made clear.

  • @YasmineNazmy
    @YasmineNazmy 3 years ago +1

    Brilliant thank you

    • @statquest
      @statquest  3 years ago

      Wow! You're going through them all! BAM! :)

  • @sathsarawijerathna9325
    @sathsarawijerathna9325 5 years ago

    Hi Josh. Do you have any videos for NMDS?

    • @statquest
      @statquest  5 years ago

      Not yet. You can find an organized listing of all of my videos here: statquest.org/video-index/

  • @Retko85
    @Retko85 3 years ago

    Hi Josh, I am a little confused regarding features and samples. For example, at 6:56 you say that PCA creates plots based on correlations among samples. The only concept of correlation I know is between features, where the correlation is high when two features change together. I tried to search for sample correlations, and what I found was correlations of samples as part of a population, but here samples should be rows/instances/observations. Your computation of the Euclidean distance also confused me, since you have features as rows (gene 1, gene 2) and samples as columns (cell 1, cell 2). Can you please confirm my understanding: does PCA create the plot based on correlations among FEATURES (like a person's age, weight, etc., where each person is a sample)? Thank you :)

    • @statquest
      @statquest  3 years ago

      To get a better sense of how PCA works, see: ua-cam.com/video/FgakZw6K1QQ/v-deo.html

  • @mahdimohammadalipour3077
    @mahdimohammadalipour3077 2 years ago

    Where can I find a numerical example ? I googled but couldn't find anything :(

    • @statquest
      @statquest  2 years ago

      See: ua-cam.com/video/pGAUHhLYp5Q/v-deo.html

  • @jihadrachid9044
    @jihadrachid9044 4 years ago

    Thank you for this great video, but I want to understand: for an nMDS graph, should I transform my values from percentages to square roots?
    I have about 28 species. Your help will be highly appreciated.

    • @statquest
      @statquest  4 years ago

      Unfortunately this video only covers classical MDS.

    • @jihadrachid9044
      @jihadrachid9044 4 years ago

      @@statquest Can I contact you by email to discuss my case in more detail?

  • @yudiherdiana4979
    @yudiherdiana4979 3 years ago +1

    Thank you!!

  • @SophieLemire
    @SophieLemire 1 year ago +1

    Thanks!

    • @statquest
      @statquest  1 year ago

      Hooray! Thank you so much for supporting StatQuest! TRIPLE BAM! :)

  • @hannahnelson4569
    @hannahnelson4569 5 months ago

    Ok, I'm going to admit it: I don't understand what this video is saying. It says to just replace the dot product with other distance metrics, and that sounds fine? But it doesn't make sense that we are using the same computations mathematically for a distance matrix and a correlation matrix. The correlation matrix (dot product distance) makes sense because its special properties allow it to have a decomposition with a diagonal component, which we can sort and then reduce in dimension to produce our PCA plot. It is not at all clear to me why an arbitrary distance matrix of the predictors will be diagonalizable in the same way. So the rest of the mathematical interpretation breaks down from there.
    Basically, the math and the interpretation feel a bit off to me. I'll have to do more research on the topic.

  • @jamesayukayuk1151
    @jamesayukayuk1151 6 years ago

    Hey Joshua. I have not found anything on the non-metric version of MDS. Any videos, please?

    • @jamesayukayuk1151
      @jamesayukayuk1151 6 years ago

      Thank you. Will keep an eye out for it when done. Thanks for the good work.

  • @trinh123456
    @trinh123456 4 years ago

    Hi Josh, it is me again. Thanks for the great video! I am wondering if you have a video on nMDS, because I see it quite often in biological studies, but it's still quite a blur..

    • @statquest
      @statquest  4 years ago

      Unfortunately I don't have a video on non-metric MDS.

    • @trinh123456
      @trinh123456 4 years ago

      No worries. Are you going to do it any time soon? I am quite looking forward to it because it is quite common in Biology. Thanks Josh!

    • @statquest
      @statquest  4 years ago +2

      @@trinh123456 Unfortunately, I do not have plans to do it anytime soon. My to-do list is huge (it has 100s of items on it) and I can only make a few videos each month. I work as fast as I can, and I work all the time, but it's not enough to keep up with the requests.

    • @datenfritz9860
      @datenfritz9860 4 years ago

      Hi Tien, maybe I can provide some help for nMDS based on Josh's triple BAM video! (As always, amazing job Josh!) To my knowledge NMDS is a rank-based approach. Like MDS, you start by computing the distances between samples. These distance values then get ranked. After the ranking you perform the "fancy math" thing to get the coordinates for a graph. Be aware that you lose quantitative information when clustering on ranks.
      You can check this website for more details: mb3is.megx.net/gustame/dissimilarity-based-methods/nmds

  • @siddheshb.kukade4685
    @siddheshb.kukade4685 1 year ago +1

    Thanks😊

  • @bibinkalirakath
    @bibinkalirakath 4 years ago

    I have seen PCoA graphs with 3 dimensions; is there any video explaining them?

    • @statquest
      @statquest  4 years ago

      That would be a lot like seeing a 3-dimensional PCA plot. For more details, see: ua-cam.com/video/FgakZw6K1QQ/v-deo.html

    • @bibinkalirakath
      @bibinkalirakath 4 years ago +1

      @@statquest Thank you very much. This helped me a lot.

  • @thourayaaouledmessaoud9223
    @thourayaaouledmessaoud9223 6 years ago

    Thanks for this video. I just have one question: does MDS only accept a symmetric (square) matrix as input?

  • @chrischoir3594
    @chrischoir3594 5 years ago

    Hi, what software do you use here?
    Thanks

  • @糜家睿
    @糜家睿 7 years ago

    Hi, Joshua. I noticed that you mentioned "the data is not linear" in a reply to the comments. I have been confused about this concept for some time. What does non-linear data mean? (I guess it is not the same concept as a linear model, right? Haha.) A bioinformatician told me that single-cell data is non-linear and we'd better use tSNE rather than PCA. How can we say bulk RNA-seq data is linear and single-cell RNA-seq data is non-linear? I really hope you can answer my question because it has confused me for quite a long time.

    • @糜家睿
      @糜家睿 7 years ago

      Haha, thank you Joshua. The spiral pattern is the so-called "Swiss roll" model, I think. Some say that linear dimensionality reduction focuses more on global patterns (like distance), while non-linear dimensionality reduction methods focus more on local patterns.
      Why not talk about zero-inflation in single-cell data next time, and the normalization methods used in single-cell data analysis?

  • @kartikmalladi1918
    @kartikmalladi1918 1 year ago

    What value is plotted exactly on MDS?

    • @statquest
      @statquest  1 year ago

      It depends on what metric you use.

    • @kartikmalladi1918
      @kartikmalladi1918 1 year ago

      @@statquest If MDS is plotted between 2 genes, then the distance itself becomes a single variable, and any combination and its distance can be placed on a number scale. So if this is the x coordinate of the plot, what is the y coordinate for a point?

  • @DaisyKB123
    @DaisyKB123 5 years ago

    What does it mean by the "percentage of variation each axis accounts for"?

    • @杨明-r6i
      @杨明-r6i 5 years ago

      The explained-variance rate of principal component axes 1, 2, 3, 4, 5... in the PCA plot.

  • @rekhasharma4962
    @rekhasharma4962 1 year ago

    How do I adjust overlapping labels in a PCA biplot?

  • @jcb0trashmail
    @jcb0trashmail 4 years ago

    I still don't get why you would choose MDS over PCA or the other way around...

    • @statquest
      @statquest  4 years ago

      MDS can work with any distance metric, not just euclidian. Here's a great example: ua-cam.com/video/sQ870aTKqiM/v-deo.html

  • @yulinliu850
    @yulinliu850 6 years ago +1

    Excellent!

  • @urjaswitayadav3188
    @urjaswitayadav3188 7 years ago

    Great video!

  • @ninakoch1799
    @ninakoch1799 1 year ago +1

    THANK YOUU❤️

  • @rncg0331
    @rncg0331 5 years ago

    do you have a python version for mds?

  • @doremekarma3873
    @doremekarma3873 8 months ago

    Can someone please explain how we calculate MDS1 and MDS2 after obtaining the distance between each pair of cells?

    • @statquest
      @statquest  8 months ago

      You use eigendecomposition.

  • @psyferinc.3573
    @psyferinc.3573 2 months ago +1

    The ukulele at the beginning

  • @kathik595
    @kathik595 5 years ago +1

    Do complete statistical predictive modeling using python & R

  • @nutzanut9817
    @nutzanut9817 5 years ago

    How can we draw a 2D graph after calculating the distance for every pair?
    We've got nC2 values for n features.
    Thanks.

    • @statquest
      @statquest  5 years ago

      You do it just like PCA. For more details on how PCA does it, check out this video: ua-cam.com/video/FgakZw6K1QQ/v-deo.html

  • @raquelpurpleboxes
    @raquelpurpleboxes 5 years ago +1

    You're amazing!!!

  • @BeateSukray
    @BeateSukray 5 years ago +2

    I love you, man

  • @Diegocbaima
    @Diegocbaima 5 years ago +1

    Great, dude!

  • @jameelahharbi2714
    @jameelahharbi2714 1 year ago

    I need more details on PCA

    • @statquest
      @statquest  1 year ago

      For more details about PCA, see: ua-cam.com/video/FgakZw6K1QQ/v-deo.html

  • @lalala90348
    @lalala90348 6 years ago +1

    “Reduce them to a 2-D graph”? How exactly?

  • @noobshady
    @noobshady 6 years ago +1

    Where can we read about the fancy math involved?

    • @statquest
      @statquest  6 years ago +3

      Wikipedia is always a great place to start: en.wikipedia.org/wiki/Multidimensional_scaling

  • @adelutzaification
    @adelutzaification 7 years ago

    One more comment: the fact that MDS uses a precomputed distance reminds me of hierarchical clustering. Does that mean MDS is a 2D representation of hierarchical clustering?

    • @adelutzaification
      @adelutzaification 7 years ago

      That would be cool. I am brewing something; I might have an idea. Not sure how good it is at this moment. I need to write it up. I'll keep you posted to see if it is worth anything. Ta ta

    • @adelutzaification
      @adelutzaification 7 years ago

      I went down in flames :) It turns out I was thinking of re-inventing the wheel :) My inclination was to further dissect the PCA results/"clouds" and see the relationships between the comprising datapoints. I was deflated to see that this problem was solved many years ago by clustering (either k-means or hierarchical). ;(
      On the good side, I found a few useful things. A paper that confirms the relatedness between PCA and k-means, as you were anticipating: ranger.uta.edu/~chqding/papers/KmeansPCA1.pdf
      I also found out about the HCPC package in R, which can do hierarchical clustering after factor analysis. It seems kinda cool, as on the graphical side it does pseudo-3D hc. Imagine the first 2 PCs as a horizontal plane and the cluster roots coming from the top... www.r-project.org/conferences/useR-2009/slides/LeRay+Molto+Husson.pdf . In the usual 1D hc, I don't like the fact that some less related points are adjacent. This HCPC plotting is not perfect either, as it obscures some datapoints.
      I was thinking that 2D density could be used to further "cluster" the PC plot; for example, geom_density_2d()/stat_density_2d() in ggplot2, with the right arguments and aesthetics, might be able to pick up some "clusters", but not the relationships between the points inside a contour. Maybe adding relatedness by connecting the dots somehow on a zoomed-in plot (by adjusting the axes) may help to show further details...
      What other ways of showing relatedness, besides hc and correlation matrices, do people use?

  • @neckar6006
    @neckar6006 1 year ago

    4:15, maybe the distance for gene 3 is wrong

  • @marahakermi-nt7lc
    @marahakermi-nt7lc 1 year ago

    Hmmm, I guess the covariance matrix in this case is a matrix with 0 distances on the diagonal

    • @statquest
      @statquest  1 year ago +1

      That would mean the variance was 0.

    • @marahakermi-nt7lc
      @marahakermi-nt7lc 1 year ago

      @@statquest yessss, since subtracting the same distance = 0

  • @alecvan7143
    @alecvan7143 4 years ago +1

    awesome :)

  • @fatihbaltac1482
    @fatihbaltac1482 6 years ago +1

    BAAAM !!

  • @Cuicui229
    @Cuicui229 3 years ago

    Hi Josh! Thanks for the video! I still didn't get how we can do the same thing on the distance matrix as we do in PCA (ua-cam.com/video/FgakZw6K1QQ/v-deo.html). I watched that video, and thanks for your wonderful explanation: I can imagine that for several samples with 2 genes, we can draw the dots on a 2D plot (gene1 vs gene2), then find the best-fit line, which is PC1, and a line perpendicular to PC1 as PC2, both maximizing the distances of the projected points from the origin. But when it comes to the distance matrix, how can we draw the dots? There is no gene, only sample1, sample2... et al. I'm really confused. Truly thankful!

    • @statquest
      @statquest  3 years ago +1

      There are two methods for doing PCA. The one I present in that video is called "Singular Value Decomposition", and it works the way I presented there. Alternatively, we can do something called "Eigenvalue Decomposition", which is based on the covariance or correlation matrix of the data. It is through this second way that PCA ends up giving results similar to MDS. Unfortunately, I don't have a good video explaining how this second way works. :(
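
The covariance-matrix route can be sketched for the 2-variable case, where a symmetric 2x2 matrix has closed-form eigenvalues (the data values below are made up for illustration; this is not from the video):

```python
# Hypothetical 2-variable data (made-up numbers) to illustrate PCA via
# eigendecomposition of the covariance matrix.
x = [2.5, 0.5, 2.2, 1.9, 3.1]
y = [2.4, 0.7, 2.9, 2.2, 3.0]
n = len(x)
mx, my = sum(x) / n, sum(y) / n

# Sample covariance matrix [[sxx, sxy], [sxy, syy]].
sxx = sum((a - mx) ** 2 for a in x) / (n - 1)
syy = sum((b - my) ** 2 for b in y) / (n - 1)
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

# Closed-form eigenvalues of a symmetric 2x2 matrix: roots of
# lambda^2 - trace*lambda + det = 0.
tr, det = sxx + syy, sxx * syy - sxy * sxy
disc = (tr * tr / 4 - det) ** 0.5
lam1, lam2 = tr / 2 + disc, tr / 2 - disc

# lam1 / (lam1 + lam2) is the "percentage of variation" PC1 accounts for,
# since the eigenvalues always sum to the total variance (the trace).
pc1_share = lam1 / (lam1 + lam2)
```

The eigenvectors of the same covariance matrix give the PC directions; SVD on the centered data arrives at the same answer by a different numerical route.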