Euclidean distance and the Mahalanobis distance (and the error ellipse)

  • Published 24 Nov 2024

COMMENTS • 66

  • @tilestats
    @tilestats  3 years ago +6

    Note that the covariance matrix shown at 6:10 should be
    [0.724 0.687
    0.687 1.046] for more accurate calculations.

    • @azibatorbanigo4043
      @azibatorbanigo4043 2 years ago

      How did you compute the covariance matrix from the green data points?

  • @youngzproduction7498
    @youngzproduction7498 3 years ago +12

    I love the way you explicitly explain every step of the calculations. It helps me, someone who is not a math expert, understand the concept with ease. Thanks.

  • @lba7238
    @lba7238 1 year ago +1

    Excellent video. I'm currently studying how to break up a single model into sub-models, and I'm trying to use the Mahalanobis distance.

  • @kyle9697
    @kyle9697 4 months ago +1

    Very comprehensive explanation. Thank you.

  • @startupeco2257
    @startupeco2257 11 months ago +1

    Very well explained! Even for a non-mathematician.

  • @szymonk.7237
    @szymonk.7237 2 years ago +3

    So clearly explained! 😮
    Thank you for it ❤️

  • @alecmunnur5918
    @alecmunnur5918 3 years ago +4

    That was a heck of a good explanation. Thanks very much 👍

  • @tilestats
    @tilestats  3 years ago +7

    I got this comment: "Are you sure the inverse of the covariance matrix is correct? This is what I get when I put it into symbolab. [4.1 -2.82 -2.82 2.95]."
    The difference arises because the covariance matrix shown in the video has been rounded. Here is the covariance matrix with more decimals:
          x          y
    x  0.7241053  0.6869474
    y  0.6869474  1.0462105
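
The effect of rounding on the inverse can be checked with a few lines of code. This is a minimal sketch in plain Python, not the calculation used in the video: `inverse_2x2` is a hypothetical helper written for this illustration, the two-decimal matrix is the one quoted in a later comment in this thread, and the seven-decimal matrix is the one given above.

```python
def inverse_2x2(m):
    """Invert a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

# Two-decimal matrix as quoted by a commenter in this thread.
rounded = [[0.72, 0.69], [0.69, 1.00]]
# Matrix with more decimals, as given in the comment above.
precise = [[0.7241053, 0.6869474], [0.6869474, 1.0462105]]

inv_rounded = inverse_2x2(rounded)
inv_precise = inverse_2x2(precise)

for name, inv in (("rounded", inv_rounded), ("precise", inv_precise)):
    print(name, [[round(v, 3) for v in row] for row in inv])
```

The determinant of this matrix is small (about 0.24 to 0.29), so a change in the second decimal of the inputs shifts the entries of the inverse by several tenths.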

    • @compsci91
      @compsci91 3 years ago +1

      Got it! Thank you for clearing that up!

  • @merythegirl
    @merythegirl 2 years ago +1

    This video helped a lot, thank you for this!

  • @forrestoakley4882
    @forrestoakley4882 2 years ago +1

    Thank you! Very clear explanation

  • @tabyonyt8091
    @tabyonyt8091 2 years ago +1

    this was enlightening, thanks a lot

  • @guidenote771
    @guidenote771 3 years ago +2

    Thank you sir for another great video!

  • @ricardpunsola
    @ricardpunsola 2 years ago +1

    Very helpful, thanks 👍🏻

  • @ya00278
    @ya00278 3 years ago +1

    Super clear. Thank you!!

  • @mathematicswithmushtaqkhan8647
    @mathematicswithmushtaqkhan8647 3 months ago +1

    Excellent

  • @Nada-yc8uo
    @Nada-yc8uo 3 years ago +3

    Thank you sir

  • @amankushwaha8927
    @amankushwaha8927 2 years ago +1

    Thanks. It was really informative

  • @TM-vg4mx
    @TM-vg4mx 2 years ago +1

    great video, thanks

  • @shivamsharma6255
    @shivamsharma6255 1 year ago +1

    Really enjoyed it, brother

  • @jacksonchen8679
    @jacksonchen8679 2 years ago

    Thank you

  • @juhoke
    @juhoke 2 years ago

    I wish I had seen this video during my clustering methods course. I had to drop it because I did not understand, for example, the meaning of centroids.

    • @tilestats
      @tilestats  2 years ago

      I have two videos on clustering if you would like to catch up:
      ua-cam.com/video/uWf__KIKzPQ/v-deo.html
      ua-cam.com/video/4E_DFMt60rc/v-deo.html

  • @wagon19
    @wagon19 2 years ago +1

    Can you tell me how you built the ellipse?
    Preferably in the program scilab

    • @tilestats
      @tilestats  2 years ago

      I answered a similar question below. Hope that helps.

  • @tone5875
    @tone5875 2 years ago +1

    Hi, can you elaborate more on generating the 95% error ellipse? Do we use a random number generator with a normal distribution to create it? Is there a simple example of generating random numbers with an intended distribution? I read a long time ago, in the context of Monte Carlo, that we can use a Cholesky decomposition to create data from a correlation matrix. Curious to know the mechanics behind them.

    • @tilestats
      @tilestats  2 years ago

      You simply draw the ellipse based on the eigenvectors and eigenvalues of the covariance matrix. I used the package ellipse in R to draw the ellipse, but if you would like to know the details, I suggest this page:
      www.visiondummy.com/2014/04/draw-error-ellipse-representing-covariance-matrix/#google_vignette
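
The eigenvector and eigenvalue construction described above can be sketched in plain Python for the 2D case. This is an illustration under stated assumptions, not the code of the R package ellipse: `error_ellipse` is a hypothetical helper, the covariance matrix is the one with more decimals given earlier in this thread, the center (3.1, 3.0) is taken from a worked calculation in another comment, and the chi-square critical value uses a closed form that only holds for 2 degrees of freedom.

```python
import math

def error_ellipse(cov, center, p_outside=0.05, n_points=100):
    """Return boundary points of the (1 - p_outside) error ellipse in 2D."""
    (a, b), (_, d) = cov
    # Eigenvalues of a symmetric 2x2 matrix in closed form.
    mid = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    lam_major, lam_minor = mid + radius, mid - radius
    # Angle of the major axis (eigenvector of the larger eigenvalue).
    theta = 0.5 * math.atan2(2.0 * b, a - d)
    # Chi-square critical value; this closed form is valid for 2 df only.
    crit = -2.0 * math.log(p_outside)
    semi_major = math.sqrt(crit * lam_major)
    semi_minor = math.sqrt(crit * lam_minor)
    points = []
    for i in range(n_points):
        t = 2.0 * math.pi * i / n_points
        u, v = semi_major * math.cos(t), semi_minor * math.sin(t)
        # Rotate the axis-aligned ellipse by theta and shift it to the center.
        px = center[0] + u * math.cos(theta) - v * math.sin(theta)
        py = center[1] + u * math.sin(theta) + v * math.cos(theta)
        points.append((px, py))
    return points, (semi_major, semi_minor, theta)

cov = [[0.7241053, 0.6869474], [0.6869474, 1.0462105]]
center = (3.1, 3.0)  # assumed mean, taken from a calculation in this thread
points, (semi_major, semi_minor, theta) = error_ellipse(cov, center)
print(round(semi_major, 3), round(semi_minor, 3), round(math.degrees(theta), 1))
```

Every point returned lies at the same Mahalanobis distance from the center, which is what makes the ellipse a contour of constant distance. No random numbers are involved in drawing it.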

    • @tone5875
      @tone5875 2 years ago

      @@tilestats thx a lot.

  • @eyupyondem4818
    @eyupyondem4818 2 years ago

    Hi sir; this is a really nice and clear explanation. However, there may be an incorrect covariance matrix inversion since when I compute the values in R, it gave me another result.
    > X
         [,1] [,2]
    [1,] 0.72 0.69
    [2,] 0.69 1.00
    > solve(X)
              [,1]      [,2]
    [1,]  4.100041 -2.829028
    [2,] -2.829028  2.952030
    > X %*% solve(X)
         [,1] [,2]
    [1,]    1    0
    [2,]    0    1

    • @tilestats
      @tilestats  2 years ago

      That is because I show rounded values in the covariance matrix. In the first comment below the video, I show the covariance matrix with more decimals.

  • @yd3130
    @yd3130 1 year ago

    Is it the centroid that has to be computed, or the mean? I think they aren't always the same, right?

    • @tilestats
      @tilestats  1 year ago +1

      I would say the overall mean in the multivariate space. As you point out, a centroid might have different meanings in different fields.

  • @Unaimend
    @Unaimend 1 year ago

    Hi Andreas, could you explain why I should expect a chi-square distribution at 8:26? As always, a nice video :)

    • @tilestats
      @tilestats  1 year ago +1

      If you square values drawn from a standard normal distribution, the squared values follow a chi-square distribution with 1 df. So calculations that involve squaring normally distributed quantities usually lead us to the chi-square distribution.
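
A quick simulation can make this relationship concrete. The sketch below is only an illustration: the sample size and the seed are arbitrary choices, and 3.841 is the standard 95th percentile of a chi-square distribution with 1 df (the square of 1.96).

```python
import random

random.seed(0)  # fixed seed so the run is repeatable
n = 100_000

# Square draws from a standard normal distribution.
squares = [random.gauss(0.0, 1.0) ** 2 for _ in range(n)]

# A chi-square distribution with 1 df has mean 1,
# and about 5% of its mass lies above 3.841.
mean_sq = sum(squares) / n
share_above = sum(s > 3.841 for s in squares) / n
print(round(mean_sq, 3), round(share_above, 4))
```

With two variables, the squared Mahalanobis distance behaves like a sum of two such squared terms, which is why 2 degrees of freedom are used for the 2D ellipse.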

    • @Unaimend
      @Unaimend 1 year ago

      Thanks for the explanation @@tilestats

  • @lorenzotagliari6699
    @lorenzotagliari6699 5 months ago

    I did not understand why the cutoff of 0.001 would not be appropriate in cases when we have many data points. Could you clear this up for me?

    • @tilestats
      @tilestats  5 months ago +1

      Because 0.1% of the data points are expected to fall outside the ellipse by chance alone. If you have, for example, 1 million data points, you should expect about 1000 of them to be outside the ellipse, right? It would then not be appropriate to define all of these as outliers.
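
The expected number of chance exceedances is simply n times the cutoff, and a small simulation shows the same thing. This sketch assumes uncorrelated standard normal data in 2D, so the squared Mahalanobis distance reduces to z1² + z2². It uses 100,000 points instead of 1 million only to keep the run short, and the critical value comes from a closed form that is valid for 2 df only.

```python
import math
import random

random.seed(1)  # fixed seed so the run is repeatable
alpha = 0.001
crit = -2.0 * math.log(alpha)  # about 13.82 for 2 degrees of freedom
n = 100_000

outside = 0
for _ in range(n):
    z1, z2 = random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)
    # Squared Mahalanobis distance for uncorrelated, unit-variance data.
    if z1 * z1 + z2 * z2 > crit:
        outside += 1

expected = n * alpha
print(expected, outside)
```

None of the simulated points is a real outlier, yet roughly one in a thousand is flagged, so the count of flagged points grows in proportion to the sample size.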

  • @cmindaaa
    @cmindaaa 3 years ago +1

    How do you get 6.45 as the MD for point 2? When I calculate it using the same method as for point 1, I get the same MD as for point 1.

    • @tilestats
      @tilestats  3 years ago

      Go to minute 6:32, and replace vector [5 5] by [5 1] for data point 2. Try and do the math again and let me know if it works.

    • @cmindaaa
      @cmindaaa 3 years ago

      @@tilestats Yep, I have tried and I still do not get it. My working: [1.9 -2] * matrix * [1.9 -2]. Eventually, I get sqrt(5.080360804). I took 5 - 3.1 = 1.9 and 1 - 3 = -2.

    • @tilestats
      @tilestats  3 years ago +1

      @@cmindaaa If you multiply the row vector [1.9 -2 ] by the matrix, you should get the row vector [11.83 -9.56]. If you multiply this row vector by the column vector [1.9 -2.0], you should get the number 41.597. The square root of this number is about 6.45.
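
The two-step multiplication described above can be written out as a short sketch. The assumptions are the mean (3.1, 3.0) implied by the differences in this exchange, the points [5 5] and [5 1], and the covariance matrix with more decimals from the pinned comment. `mahalanobis` is a hypothetical helper, and because it works from the unrounded covariance matrix, its intermediate values may differ slightly from the ones quoted above.

```python
import math

def mahalanobis(point, mean, cov):
    """Mahalanobis distance in 2D via the explicit 2x2 inverse."""
    (a, b), (_, d) = cov
    det = a * d - b * b
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    # Step 1: row vector [dx dy] times the inverse covariance matrix.
    row = ((d * dx - b * dy) / det, (-b * dx + a * dy) / det)
    # Step 2: that row vector times the column vector [dx dy], then the root.
    return math.sqrt(row[0] * dx + row[1] * dy)

cov = [[0.7241053, 0.6869474], [0.6869474, 1.0462105]]
mean = (3.1, 3.0)  # assumed from the differences used in this thread

md_point1 = mahalanobis((5.0, 5.0), mean, cov)
md_point2 = mahalanobis((5.0, 1.0), mean, cov)
print(round(md_point1, 2), round(md_point2, 2))
```

The sign of the second difference matters: with [1.9 2] the cross term is subtracted, with [1.9 -2] it is added. Dropping the minus sign therefore reproduces the distance of point 1.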

    • @cmindaaa
      @cmindaaa 3 years ago

      @@tilestats omg i got it! thank you so much!!

  • @rambisneves2077
    @rambisneves2077 3 years ago +1

    Hi Tile, could you share these points in an Excel file?

    • @tilestats
      @tilestats  3 years ago +1

      I do not have the original data since that was randomly generated. However, the data below should work to reproduce the calculations:
      x=[4.6, 4.4, 3.9, 3.9, 3.8, 3.5, 3.8, 3.4, 3.0, 2.7, 3.7, 3.0, 2.5, 2.2, 2.9, 2.5, 2.3, 2.1, 2.1, 1.5]
      y=[4.6, 4.1, 4.5, 3.9, 3.5, 4.0, 3.3, 3.2, 3.7, 3.5, 2.1, 2.7, 3.1, 3.2, 2.3, 2.0, 2.3, 1.8, 1.4, 1.0]
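
As a check, the mean and the sample covariance matrix can be computed directly from these 20 points. The sketch below uses plain Python, with `sample_covariance` as a hypothetical helper that divides by n - 1. It also speaks to the earlier question in this thread about how the covariance matrix is obtained from the data points.

```python
# Data as posted in the comment above (20 points).
x = [4.6, 4.4, 3.9, 3.9, 3.8, 3.5, 3.8, 3.4, 3.0, 2.7,
     3.7, 3.0, 2.5, 2.2, 2.9, 2.5, 2.3, 2.1, 2.1, 1.5]
y = [4.6, 4.1, 4.5, 3.9, 3.5, 4.0, 3.3, 3.2, 3.7, 3.5,
     2.1, 2.7, 3.1, 3.2, 2.3, 2.0, 2.3, 1.8, 1.4, 1.0]

def sample_covariance(u, v):
    """Sample covariance with the n - 1 denominator."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v)) / (n - 1)

mean = (sum(x) / len(x), sum(y) / len(y))
cov = [[sample_covariance(x, x), sample_covariance(x, y)],
       [sample_covariance(y, x), sample_covariance(y, y)]]

print([round(m, 2) for m in mean])
for row in cov:
    print([round(v, 7) for v in row])
```

The diagonal entries are the variances of x and y, and the off-diagonal entry is their covariance. The result agrees with the matrix with more decimals given in the pinned comment.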

    • @rambisneves2077
      @rambisneves2077 3 years ago

      @@tilestats Thanks. What do you think about drawing the ellipse in the Excel file?

  • @Jonathan_wow
    @Jonathan_wow 3 years ago

    How did you arrive at the corresponding critical value of 13.82 at 9:50 in the video if the cutoff is 0.001? Could you kindly explain it?

    • @tilestats
      @tilestats  3 years ago +2

      If you want a cutoff of 0.001, you extract the corresponding critical value from a chi-square distribution, that is, the value that cuts off 0.001 of the upper tail. In this example, the area to the right of 13.82 in a chi-square distribution with 2 degrees of freedom is 0.001. Use statistical software or a chi-square table to get this value. The cutoff 0.001 is an arbitrary, but common, choice for detecting outliers.
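
For the special case of 2 degrees of freedom, the value can also be obtained without a table, because the upper-tail area of that chi-square distribution is exp(-x/2). The sketch below relies on that closed form, so it does not apply to other degrees of freedom. Both function names are hypothetical.

```python
import math

def chi2_upper_critical_df2(alpha):
    """Critical value with upper-tail area alpha, for 2 df only."""
    return -2.0 * math.log(alpha)

def chi2_upper_tail_df2(x):
    """Area to the right of x under a chi-square density with 2 df."""
    return math.exp(-x / 2.0)

critical = chi2_upper_critical_df2(0.001)
print(round(critical, 2))                    # 13.82
print(round(chi2_upper_tail_df2(13.82), 5))  # close to 0.001
```

Since the Mahalanobis distance itself is the square root of the chi-square quantity, the same cutoff expressed as a distance is the square root of 13.82, about 3.72.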

  • @MrTOCSY
    @MrTOCSY 3 years ago

    Is it correct to calculate the error ellipse for the autoscaled data for PCA calculation?

    • @tilestats
      @tilestats  3 years ago +1

      Not sure I understand. It would be OK to calculate the error ellipse based on the scores in 2D (if that is what you mean).

    • @MrTOCSY
      @MrTOCSY 3 years ago

      @@tilestats, yes, I meant the 2D score plot. "It would be OK to calculate the error ellipse based on the scores in 2D." But why? The data were previously autoscaled, i.e. divided by the standard deviation. Is it correct to calculate the error ellipse for the scores, since scores and autoscaled data are DIFFERENT in their own nature?

    • @MrTOCSY
      @MrTOCSY 3 years ago

      @@tilestats Sorry to bother you, but you explain things transparently and simply. A rare phenomenon when it comes to statistics )

    • @tilestats
      @tilestats  3 years ago +1

      Yes, since scaling does not affect the relative distances between the points. If you create an error ellipse for unscaled data and, for example, identify 2 points outside that ellipse, the same points will be outside the ellipse if you scale the data, provided, of course, that you calculate the ellipse on the scaled data. Try this on a simple data set, which will help you understand.
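
The suggested experiment can be sketched as follows. As assumptions, it uses the 20 reproduction points posted elsewhere in this thread and the test point [5 1], and it takes autoscaling to mean centering and dividing by the sample standard deviation. The helper names are hypothetical.

```python
import math

x = [4.6, 4.4, 3.9, 3.9, 3.8, 3.5, 3.8, 3.4, 3.0, 2.7,
     3.7, 3.0, 2.5, 2.2, 2.9, 2.5, 2.3, 2.1, 2.1, 1.5]
y = [4.6, 4.1, 4.5, 3.9, 3.5, 4.0, 3.3, 3.2, 3.7, 3.5,
     2.1, 2.7, 3.1, 3.2, 2.3, 2.0, 2.3, 1.8, 1.4, 1.0]

def mean_and_cov(u, v):
    """Mean vector and sample covariance matrix (n - 1 denominator)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suu = sum((a - mu) ** 2 for a in u) / (n - 1)
    svv = sum((b - mv) ** 2 for b in v) / (n - 1)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (n - 1)
    return (mu, mv), [[suu, suv], [suv, svv]]

def mahalanobis(point, mean, cov):
    """Mahalanobis distance in 2D via the explicit 2x2 inverse."""
    (a, b), (_, d) = cov
    det = a * d - b * b
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    return math.sqrt((d * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det)

point = (5.0, 1.0)
mean, cov = mean_and_cov(x, y)
md_raw = mahalanobis(point, mean, cov)

# Autoscale the data and the test point with the same mean and SD.
sd_x, sd_y = math.sqrt(cov[0][0]), math.sqrt(cov[1][1])
zx = [(a - mean[0]) / sd_x for a in x]
zy = [(b - mean[1]) / sd_y for b in y]
z_point = ((point[0] - mean[0]) / sd_x, (point[1] - mean[1]) / sd_y)
z_mean, z_cov = mean_and_cov(zx, zy)
md_scaled = mahalanobis(z_point, z_mean, z_cov)

print(round(md_raw, 4), round(md_scaled, 4))
```

The two distances agree up to floating-point error, so a point that falls outside the ellipse before scaling also falls outside it afterwards, as long as the covariance matrix is recomputed on the scaled data.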

  • @MrTOCSY
    @MrTOCSY 3 years ago

    Is it correct to calculate the MD using the correlation matrix instead of the covariance matrix?

    • @tilestats
      @tilestats  3 years ago

      No, you will then not get the correct value, unless you have standardized data, in which case the covariance and correlation matrices are identical. Have a look at my video about this:
      ua-cam.com/video/2bcmklvrXTQ/v-deo.html
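
The statement about standardized data can be verified numerically. The sketch below assumes the 20 reproduction points posted elsewhere in this thread and standardization by the sample standard deviation. It compares the covariance matrix of the standardized data with the correlation of the raw data. The helper names are hypothetical.

```python
import math

x = [4.6, 4.4, 3.9, 3.9, 3.8, 3.5, 3.8, 3.4, 3.0, 2.7,
     3.7, 3.0, 2.5, 2.2, 2.9, 2.5, 2.3, 2.1, 2.1, 1.5]
y = [4.6, 4.1, 4.5, 3.9, 3.5, 4.0, 3.3, 3.2, 3.7, 3.5,
     2.1, 2.7, 3.1, 3.2, 2.3, 2.0, 2.3, 1.8, 1.4, 1.0]

def sample_cov(u, v):
    """Sample covariance with the n - 1 denominator."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (n - 1)

def standardize(u):
    """Center a variable and divide it by its sample standard deviation."""
    m = sum(u) / len(u)
    sd = math.sqrt(sample_cov(u, u))
    return [(a - m) / sd for a in u]

# Pearson correlation of the raw data.
r = sample_cov(x, y) / math.sqrt(sample_cov(x, x) * sample_cov(y, y))

# Covariance matrix of the standardized data.
zx, zy = standardize(x), standardize(y)
cov_z = [[sample_cov(zx, zx), sample_cov(zx, zy)],
         [sample_cov(zy, zx), sample_cov(zy, zy)]]

print(round(r, 4))
for row in cov_z:
    print([round(v, 4) for v in row])
```

After standardization the diagonal of the covariance matrix is 1 and the off-diagonal entry equals the correlation, which is why either matrix gives the same MD in that case. On raw data the two matrices differ, and so do the distances.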

    • @MrTOCSY
      @MrTOCSY 3 years ago

      The data are autoscaled. Numerical values of elements of correlation matrix and covariance matrix are equal.

    • @MrTOCSY
      @MrTOCSY 3 years ago

      And one more question, if I may. If we want to find an outlier on a 2D score plot of principal components, should we use the covariance matrix of the SCORES?

    • @tilestats
      @tilestats  3 years ago

      Yes, but note that PC1 and PC2 are uncorrelated.
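
Because the scores are uncorrelated, their covariance matrix is diagonal and the squared MD reduces to a sum of squared scores, each divided by its own variance. The sketch below illustrates this under stated assumptions: it runs a PCA on the unscaled covariance matrix of the 20 reproduction points posted elsewhere in this thread, uses a closed-form 2x2 eigendecomposition, and takes [5 1] as the test point. The helper names are hypothetical.

```python
import math

x = [4.6, 4.4, 3.9, 3.9, 3.8, 3.5, 3.8, 3.4, 3.0, 2.7,
     3.7, 3.0, 2.5, 2.2, 2.9, 2.5, 2.3, 2.1, 2.1, 1.5]
y = [4.6, 4.1, 4.5, 3.9, 3.5, 4.0, 3.3, 3.2, 3.7, 3.5,
     2.1, 2.7, 3.1, 3.2, 2.3, 2.0, 2.3, 1.8, 1.4, 1.0]

def sample_cov(u, v):
    """Sample covariance with the n - 1 denominator."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((p - mu) * (q - mv) for p, q in zip(u, v)) / (n - 1)

mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
a, b, d = sample_cov(x, x), sample_cov(x, y), sample_cov(y, y)

# Closed-form eigendecomposition of the symmetric 2x2 covariance matrix.
mid, radius = (a + d) / 2.0, math.hypot((a - d) / 2.0, b)
lam1, lam2 = mid + radius, mid - radius
theta = 0.5 * math.atan2(2.0 * b, a - d)
c, s = math.cos(theta), math.sin(theta)

def scores(px, py):
    """Project a centered point onto PC1 and PC2."""
    dx, dy = px - mean_x, py - mean_y
    return (dx * c + dy * s, -dx * s + dy * c)

pc1, pc2 = zip(*(scores(px, py) for px, py in zip(x, y)))
cov_pc1_pc2 = sample_cov(pc1, pc2)  # expected to be zero up to rounding
var_pc1, var_pc2 = sample_cov(pc1, pc1), sample_cov(pc2, pc2)

# MD of the test point from its scores, and from the full formula.
s1, s2 = scores(5.0, 1.0)
md_scores = math.sqrt(s1 ** 2 / lam1 + s2 ** 2 / lam2)
dx, dy = 5.0 - mean_x, 1.0 - mean_y
md_full = math.sqrt((d * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / (a * d - b * b))

print(round(cov_pc1_pc2, 10), round(md_scores, 4), round(md_full, 4))
```

The score variances equal the eigenvalues, so an error ellipse drawn on the score plot is axis-aligned, with semi-axes proportional to the square roots of those eigenvalues.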