Quantile-Quantile Plots (QQ plots), Clearly Explained!!!

  • Published 6 Sep 2024
  • Quantile-Quantile (QQ) plots are used to determine if data can be approximated by a statistical distribution. For example, you might collect some data and wonder if it is normally distributed. A QQ plot will help you answer that question. You can also use QQ plots to compare two different datasets that you collected to determine if their distributions are comparable. This video shows you how to do both things.
    NOTE: The data in this video are measures of gene expression. If "gene expression" doesn't mean anything to you, just imagine that the data represents how tall a bunch of people are, or how much they weigh. Then consider the y-axis to be the height or weight of the people, and the x-axis just represents all of the data you collected on a single day. In this case, all of the data were collected on the same day, so they form a single column.
    For a complete index of all the StatQuest videos, check out:
    statquest.org/...
    If you'd like to support StatQuest, please consider...
    Buying The StatQuest Illustrated Guide to Machine Learning!!!
    PDF - statquest.gumr...
    Paperback - www.amazon.com...
    Kindle eBook - www.amazon.com...
    Patreon: / statquest
    ...or...
    YouTube Membership: / @statquest
    ...a cool StatQuest t-shirt or sweatshirt:
    shop.spreadshi...
    ...buying one or two of my songs (or go large and get a whole album!)
    joshuastarmer....
    ...or just donating to StatQuest!
    www.paypal.me/...
    Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on twitter:
    / joshuastarmer
    Corrections:
    4:35 The Uniform Distribution has one extra quantile
    5:30 I should have said that Quartiles divide the data into 4 parts.
    #statquest #quantile #qqplot

COMMENTS • 476

  • @statquest
    @statquest  2 years ago +5

    Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/

  • @maindepth8830
    @maindepth8830 2 years ago +58

    that intro alone, made me forget my hate for statistics and instantly fall in love with it

  • @kittyxing
    @kittyxing 4 years ago +73

    Thanks sooooooo much! This is the only video I found that explains the details of generating a QQ plot and also makes the concept so clear and easy to understand!

    • @statquest
      @statquest  4 years ago

      Thank you very much! :)

  • @timonveurink6335
    @timonveurink6335 5 years ago +146

    Haven't seen the video yet, but that intro earned you a subscription

    • @angiemycine6509
      @angiemycine6509 4 years ago +3

      It made me think that the whole video was going to be a song lol. Very interesting nonetheless

    • @marioadiez
      @marioadiez 4 years ago +2

      I'm not subscribed for the plots, but for the music!

    • @setsu2221
      @setsu2221 4 years ago +1

      That intro hit me hard xD

  • @robertopizziol7459
    @robertopizziol7459 4 years ago +36

    I was waiting for the "BAAM" all video long, got just a couple of great "HOORAY!".
    Thank you for the awesome channel Josh!

    • @statquest
      @statquest  4 years ago +10

      You made me laugh. :)

  • @aashishshrivastav9531
    @aashishshrivastav9531 6 years ago +11

    🤔🤔🤔🤔🤔 well I thought that q-q plot was difficult but thanks to you I got it now. thanks and keep it up!!!

  • @jameswhitaker4357
    @jameswhitaker4357 11 months ago +1

    Not gonna lie, stats is my super weak spot. You've helped me a lot in my Data Models course and interpreting my results. +1

    • @statquest
      @statquest  11 months ago +1

      Happy to help!

    • @jameswhitaker4357
      @jameswhitaker4357 11 months ago +1

      @@statquest Thank you! I'm just kicking myself for not taking more stats courses at this point!

    • @statquest
      @statquest  11 months ago +1

      @@jameswhitaker4357 My stats courses were all pretty terrible, so you never really know what you're going to get. I had to teach myself statistics, and these videos are how I taught myself.

    • @jameswhitaker4357
      @jameswhitaker4357 11 months ago +1

      @@statquest That's what I'm going through right now! I've been using your videos and a "Intro to Statistical Learning with Applications in R" textbook which has helped a lot. I think when I saw terms like "heteroscedasticity" or the crazy formulas I would get scared and put off the studying, until I took a course that required knowing it LOL. And luckily most of these statistical tests and concepts are now pretty easy to perform in programming. Cheers!

    • @statquest
      @statquest  11 months ago +2

      @@jameswhitaker4357 I actually wrote a little about heteroscedasticity. Maybe I should record it.

  • @kevon217
    @kevon217 2 years ago +4

    Couldn’t have asked for more clear explanation, thanks!

  • @navatagames
    @navatagames 2 years ago +3

    Nice video. Explained everything in just under 7 mins. Awesome.
    😄👍👍

  • @jorenmaes498
    @jorenmaes498 3 months ago +1

    I just noticed when you said "please subscribe" at the end of the video, the subscribe button lit up:)

  • @dominicj7977
    @dominicj7977 5 years ago +10

    Can you do a video on normality tests like shapiro wilk and anderson darling? If not anytime soon, can you share link to some good materials?

  • @alisalehi4980
    @alisalehi4980 6 years ago +1

    I really appreciate your very easy explanation.
    I had run into such difficult and rough terminology that I could not even understand what it meant.

  • @gianlucalepiscopia3123
    @gianlucalepiscopia3123 2 years ago +1

    This is very very cool, more likely to learn on YouTube than in a classroom. Grazie

  • @asmaulhosnanisha4657
    @asmaulhosnanisha4657 3 years ago +8

    I could have better grades if i had faculties like you...thank you Josh!!

  • @Clarin3t1
    @Clarin3t1 2 years ago +1

    You had my like at the beginning with the jingle. Thanks for explaining this so well!!

  • @josevaldes7493
    @josevaldes7493 2 years ago +1

    Triple BAMM! Serious man your channel is pure art. Thanks

  • @biancafeitoza4030
    @biancafeitoza4030 2 months ago +1

    Thank you for your help! Greetings from Brazil.

    • @statquest
      @statquest  2 months ago +1

      Muito obrigado! :)

  • @kusocm
    @kusocm 4 years ago +4

    Best intro song, it can be used as a 'mnemonic' for what QQ plots are used for =)

  • @lashlarue7924
    @lashlarue7924 2 years ago +1

    It's a party time with Josh Starmer and his quantiles! 😆🤘 Party on, Wayne!

  • @heplaysguitar1090
    @heplaysguitar1090 3 years ago +1

    Explained like a pro.
    Triple BAM!!!

  • @Dekike2
    @Dekike2 5 years ago +6

    Hi!!! Great video!!!! It was very helpful to understand Q-Q Plots!!!! But just one question, how do you calculate the quantiles for your dataset?? I mean, the first observation of your dataset is 0.6, but I don't understand why, since the first observation leaves 0 observations on one of its sides. Should the quantile be 0? In the video where you explain how to calculate quantiles, you explained that the quantile for each observation is calculated by dividing the number of observations below this value by the total number of observations... So, for the first point... 0/15 = 0. Why 0.6??

    • @statquest
      @statquest  5 years ago +4

      I think I see the confusion here. The x and y-axes on the QQ-plot (on the right side) are labeled "Normal Quantiles" and "Data Quantiles". This is a little misleading - what we are plotting are the values at each quantile, not the quantile name itself. So if the first quantile is called "quantile 0", but it represents -1.5 in the normal distribution and 0.6 in the data, then we draw a dot at -1.5, 0.6 to represent the first quantile. Does that make sense?

    • @Dekike2
      @Dekike2 5 years ago

      @@statquest Perfectly. I understood this after watching some more videos. I would suggest clarifying this if you make a new version!! As I already told you, congratulations on your videos and, of course, your quick reply!! You explain really well, and the videos are perfect (easy to follow and to understand). I'm doing my Ph.D. and people like you are really helpful. Thanks a lot.

    • @Fan-fb4tz
      @Fan-fb4tz 2 years ago

      @@statquest Thank you very much for all your videos! They help me a lot. Just a follow-up question on this: how do we decide where the smallest quantile value falls in the theoretical distribution? Like you mentioned, the "quantile 0" value in the sample distribution is 0.6, but how can it correspond to -1.5 in the normal distribution? My confusion is that the normal distribution doesn't technically have a "quantile 0" value because it extends to infinity on both tails.

    • @statquest
      @statquest  2 years ago +1

      @@Fan-fb4tz On the left side the first quantile is defined for the first point of 15 data points, meaning that 1/15 of the data is equal to or less than that point. Thus, we find the corresponding point on the normal curve such that 1/15th of the area under the curve is to the left of it.

    • @yangyu5525
      @yangyu5525 8 months ago

      @@statquest Strictly speaking, the 15 lines (15 data points) divide the whole data into 16 equal groups or parts, so the corresponding normal distribution should be divided into 16 bins so that every bin has the same probability of 1/16, right?
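
A minimal sketch of what this thread describes: each dot on the QQ plot pairs a sorted data value (y-axis) with the standard normal value at the same quantile (x-axis). The data below are made up, and the k/(n + 1) spacing is an assumption based on the 1/16, 2/16, ... convention mentioned elsewhere in these comments; other conventions exist.

```python
from statistics import NormalDist

# Hypothetical measurements; NOT the data from the video.
data = [2.9, 0.6, 4.1, 1.9, 3.3, 5.2, 2.4]


def qq_points(values):
    """Pair each sorted value with the standard normal value at the same quantile."""
    n = len(values)
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    # Assumption: the k-th smallest value sits at probability k / (n + 1).
    return [
        (std_normal.inv_cdf(k / (n + 1)), y)
        for k, y in enumerate(sorted(values), start=1)
    ]


points = qq_points(data)
for x, y in points:
    print(f"normal value {x:+.3f}  data value {y:.1f}")
```

Note that the smallest data value (0.6 here) is plotted as a value, not as "quantile 0", which is the distinction drawn in the reply above.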

  • @MasterMan2015
    @MasterMan2015 6 years ago +8

    Step 3 is not very clear. How do you put the lines on the normal distribution. How do you start putting the lines ? and How about the distance between each two lines ?

    • @statquest
      @statquest  6 years ago +2

      The quantiles for the normal distribution divide it so that the area under the curve between two lines is equal for all of the divisions. Since the normal distribution isn't as tall on the edges, there is more space between lines there than in the middle, where the distribution is tall. Thus, the spacing between lines makes the area under the curve between the middle two lines the same as the area under the curve between lines on the edges.

    • @MasterMan2015
      @MasterMan2015 6 years ago +1

      Thanks! It is easy to see that in the case of the Uniform distribution. How about the starting point? I think you started at -1.5 arbitrarily, but I could start from -2 or -1 or ..

    • @statquest
      @statquest  6 years ago +5

      The starting point is defined by the need for each unit between lines to have the same exact area under the curve. To understand what this means, imagine you had to divide a normal distribution with a single line so that 50% of the area under the curve was on the left side of the line and 50% of the area under the curve was on the right side of the line. Where would you draw that line? Well, there is only one choice - right down the middle of the normal curve. If you drew it anywhere else, there would either be more area under the curve on the left side or the right side. Now imagine you had to divide the area under the curve into 4 equal amounts. Again, there is only one option - you put a line in the middle, and then you put another line so that the area under the curve on the left side is divided in half and then a third line so that the area under the curve on the right side is divided in half. Any other locations for those lines will result in the areas under the curve not being equal to each other. Thus, in this example, we have no choice about where to put the lines - they have to be put in the one configuration that makes the area under the curve between every pair of lines equal.

    • @MasterMan2015
      @MasterMan2015 6 years ago +2

      Perfect! got it!

    • @statquest
      @statquest  6 years ago +3

      Hooray!!! :)
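
The equal-area argument in this thread can be checked numerically. This is a sketch using Python's standard library, assuming 15 lines placed so that k/16 of the area lies to the left of line k (an assumption about the exact convention, not a statement of how the video's figure was drawn).

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1
n_lines = 15               # one line per data point, as in the video

# Assumption: line k is placed where k / (n_lines + 1) of the area lies to its left.
lines = [std_normal.inv_cdf(k / (n_lines + 1)) for k in range(1, n_lines + 1)]

# Area under the curve between each pair of neighbouring lines.
areas = [std_normal.cdf(b) - std_normal.cdf(a) for a, b in zip(lines, lines[1:])]
# Horizontal distance between the same neighbouring lines.
gaps = [b - a for a, b in zip(lines, lines[1:])]

print("areas:", [round(a, 4) for a in areas])
print("gap at the left edge:", round(gaps[0], 3))
print("gap in the middle:   ", round(gaps[len(gaps) // 2], 3))
```

Every area comes out the same, while the gap at the edge is wider than the gap in the middle, so there is no freedom in choosing the starting point once the number of lines is fixed.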

  • @sumayyakamal8857
    @sumayyakamal8857 3 years ago +3

    THANK YOU!!!!!!

  • @user-or7ji5hv8y
    @user-or7ji5hv8y 4 years ago +2

    great video. a video on the intuition on why q-q plot works might be interesting.

    • @statquest
      @statquest  4 years ago

      I'll keep that in mind.

  • @robertocannella1881
    @robertocannella1881 2 years ago +1

    Thanks for all the videos! Great music BTW. Also I'm looking forward to rockin' my new SQ hoodie!

    • @statquest
      @statquest  2 years ago

      TRIPLE BAM! Thank you for your support!

  • @florence2523
    @florence2523 1 month ago

    Thank you very much for this video. Please, I have a few questions to ask.
    1. From your previous video on quantile and percentile the first line was 0% quantile, why does it have a value of 0.6 in this video?
    2. How are you getting the values for the x- axis, and why did it have to range from -2 to 2? Thank you

    • @statquest
      @statquest  1 month ago

      1) In this video we are plotting the actual values on the y-axis, rather than their quantiles.
      2) The values come from a standard normal distribution (a "standard" normal distribution is a normal distribution with mean = 0 and standard deviation = 1). There are excel functions that will generate the x-axis coordinates from a standard normal distribution for you.

  • @vlakrunn
    @vlakrunn 1 year ago +1

    You simply saved my life

  • @sarrae100
    @sarrae100 2 years ago +1

    How beautiful and simple is that explanation 🥳

  • @matavalamuttej841
    @matavalamuttej841 3 years ago +2

    You made it very clear man !!! Great doing

  • @sirisudweeks9334
    @sirisudweeks9334 5 years ago +1

    very nicely explained. it was a tricky concept until this video! thanks!

    • @statquest
      @statquest  5 years ago

      Hooray! I'm glad the video helped. :)

  • @julesd3115
    @julesd3115 2 years ago +1

    Awesome video - thank you SO much for saving my sanity.

  • @pradiptithakur3655
    @pradiptithakur3655 4 years ago +4

    Awesome video. Explained so clearly. Really helped me a lot!

  • @Danielbassist13
    @Danielbassist13 1 year ago +1

    phenomenal explanation and really cool intro music man!

  • @ThuyPham-yu7cw
    @ThuyPham-yu7cw 4 years ago +4

    wow, now I can clearly understand it! thanks a lot!

  • @joerich10
    @joerich10 6 years ago +6

    Is there a statistical test we can do to determine how far away the dots are allowed to deviate, rather than just eyeballing it? Or is eyeballing good enough? I.e. a stat test that could say 'the chance of these 2 distributions being the same is less than X%'?

    • @statquest
      @statquest  6 years ago +5

      The "K-S Test" is what you want. However, it is very strict and tends to reject the null too easily. It's one of the few statistical tests where a large p-value (suggesting no difference) is more convincing than a small one. en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test

    • @kissapeles
      @kissapeles 2 months ago

      @@statquest How were the lines drawn? Least Squares? Maybe doing R^2 calculations can provide an idea?
      Still trying to grasp my statistics a bit better :( :)
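
For readers curious what the K-S test mentioned above actually measures, here is a minimal sketch of the one-sample K-S statistic using only Python's standard library. The data are made up, and the sketch stops at the statistic D; turning D into a p-value is left out because the right reference distribution depends on how the comparison curve was chosen.

```python
from statistics import NormalDist, fmean, stdev


def ks_statistic(values, dist):
    """Largest vertical gap between the data's step-function CDF and dist's CDF."""
    xs = sorted(values)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = dist.cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x; check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


# Hypothetical measurements; NOT the data from the video.
data = [0.6, 1.9, 2.4, 2.9, 3.3, 4.1, 5.2]

# Assumption: compare against a normal curve with the sample's own mean and sd.
# Estimating these from the data changes the null distribution of D
# (the Lilliefors variant of the test addresses that).
fitted = NormalDist(fmean(data), stdev(data))
print(f"D = {ks_statistic(data, fitted):.3f}")
```

D is the biggest vertical distance between the two cumulative curves, so it complements the QQ plot rather than replacing it.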

  • @user-hv9wx5kd9u
    @user-hv9wx5kd9u 11 months ago +1

    Best Explanation ever!!! 🎉🎉🎉

  • @bin4ry_d3struct0r
    @bin4ry_d3struct0r 1 year ago +2

    I always wondered how statisticians choose a distribution to which to fit the data when eyeballing it is insufficient. Now I know the answer: QQ-plots. Thank you for this!

  • @anlerkul2988
    @anlerkul2988 2 years ago +1

    Dear Josh, thank you for this informative video. I have one question: we are dividing the gene expression data into fifteen and also dividing the normal curve into fifteen. So, for example, let's take the 3rd data point: 1.9. It is at the 3/15 quantile, which is 0.2, so when we look it up in the z-table for the normal curve it should have been -0.84. In your calculations, I am observing that it is -0.89, which actually corresponds to 0.1867 in the z-table. Am I missing something?

    • @statquest
      @statquest  2 years ago +2

      If we want 15 equal sized portions of the normal curve, then, we actually need 16 slices, the extra slice is at positive infinity. This makes it so that the area under the curve to the left of the first vertical bar in the graph is equal to the area under the curve between the 1st and 2nd vertical bar which is equal to the area under the curve between the 2nd and 3rd vertical bar (etc. etc. etc.) So when we do 3/16, we get 0.1875, and the normal quantile for that is -0.8871

    • @anlerkul2988
      @anlerkul2988 2 years ago +1

      @@statquest thank you very much! Now it seems clear
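
The arithmetic in this exchange can be reproduced with a few lines of standard-library Python. The variable names are illustrative only.

```python
from statistics import NormalDist

std_normal = NormalDist()
k, n = 3, 15  # third smallest of 15 data points

z_over_n = std_normal.inv_cdf(k / n)               # 3/15 = 0.2
z_over_n_plus_1 = std_normal.inv_cdf(k / (n + 1))  # 3/16 = 0.1875

print(f"k/n     -> {z_over_n:.4f}")
print(f"k/(n+1) -> {z_over_n_plus_1:.4f}")
```

The first value matches the -0.84 from the z-table lookup in the question, and the second matches the -0.8871 given in the reply, so the difference comes entirely from dividing by 16 instead of 15.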

  • @vineetkaur1667
    @vineetkaur1667 5 months ago +1

    Very well explained !

  • @xoda345
    @xoda345 2 years ago

    Two questions:
    1. How does having more points at the middle make the histograms narrower and the opposite at the ends? If the histograms have more width doesn't that mean that more data points can be placed in that histogram ?
    2. the last example you took 4 data points and said that the distribution has 4 quartiles. Did you say such because there are 4 points ?

    • @statquest
      @statquest  2 years ago

      1. You might want to review the concept of the normal distribution. ua-cam.com/video/rzFX5NWojp0/v-deo.html The height of the curve indicates the likelihood of observing a point there. So the shorter parts have lower likelihoods of observing points and the taller parts have higher likelihoods. To make each region have an equal probability of observing a point, we have to have wider regions where the likelihoods are lower and narrower regions where the likelihoods are higher.
      2. yes.

  • @bingxinyan8103
    @bingxinyan8103 2 years ago +1

    Helpful and easy to understand.

  • @alecvan7143
    @alecvan7143 4 years ago +1

    Best intro by far so far

  • @robertb-l5422
    @robertb-l5422 5 years ago +3

    Very well explained, thanks so much

  • @ehsans2135
    @ehsans2135 3 years ago +1

    so clear, so good, so nice, thank you, Josh

  • @tymothylim6550
    @tymothylim6550 4 years ago +2

    Thank you for the video! It was short and easy to understand :)

  • @alanpdrv
    @alanpdrv 2 years ago +1

    Thanks for this! Finally I understand

  • @danielberkowitz3524
    @danielberkowitz3524 8 months ago +1

    How do you know that the smallest quartile on the normal distribution is at -1.5?

    • @statquest
      @statquest  8 months ago

      You can get a computer program to figure this out for you. For example, the R programming language has a function that will give you all the quantiles. Presumably MS Excel has a similar function.

  • @fkhan4504
    @fkhan4504 6 years ago +4

    Crystal clear explanation

  • @brayanmurillo4427
    @brayanmurillo4427 2 years ago +1

    Thanks for the explanation, can you clarify this please? If we have 15 quantiles, then I thought you should plot 14 red lines in the normal distribution and the 15th line should reside at +infinity. And a little question: is the straight line generated by linear regression?

    • @statquest
      @statquest  2 years ago +1

      Plotting a line at infinity would be hard to do and you can fit the line with regression.

  • @padraiggluck5633
    @padraiggluck5633 4 years ago +1

    Really excellent presentation, Josh. ⭐️

  • @response2u
    @response2u 2 years ago +1

    Legendary explanation! Fantastic!

  • @geogeo14000
    @geogeo14000 3 years ago +1

    And again, thank you for another amazing video! A little question: most of the points have to fit on the straight line for the data to be considered normally distributed, and at 4:15 you said that is not the case. Although the intersection points are really close to the line, it does not matter, most of the points have to be strictly ON the line, right? The fact that other intersection points are close to or far from the line does not give any relevant information?

    • @statquest
      @statquest  3 years ago +1

      I'm not sure I understand your question. For more details on how to interpret QQ-plots, see: stats.stackexchange.com/questions/101274/how-to-interpret-a-qq-plot

    • @geogeo14000
      @geogeo14000 3 years ago +1

      @@statquest ok thank you !

  • @rionaalmeida7376
    @rionaalmeida7376 3 years ago

    The name of the channel should be "Dumbing Down Probability for Dummies". I don't know whether I like the intro song better or that simple explanation.

    • @statquest
      @statquest  3 years ago +1

      Personally, I like to think that rather than dumbing down the material, I bring people up so that they can understand the tools and techniques in data analysis. I think "dumbing down" suggests watering down the content, as if I am presenting a simplified version of how statistics really works. That's not what happens in my videos. This is the real deal. It's just explained in a way that is relatively easy to understand and that brings people up.

    • @rionaalmeida7376
      @rionaalmeida7376 3 years ago +1

      @@statquest Understood. The way you teach not only makes it comprehensible but also ensures it sticks to the head!

  • @tawkameyu
    @tawkameyu 4 years ago +4

    It just saved me, the person who did this => you're the best

  • @ananyaagarwal7108
    @ananyaagarwal7108 2 years ago +1

    Hi Josh, Amazing Video there :) Just want to understand the intuition behind the working of QQ plots ? Is it the fact that quantiles of every normal distribution are just scaled up values of a standard normal distribution and that is why we expect a straight line ?

    • @statquest
      @statquest  2 years ago +2

      Pretty much

    • @ananyaagarwal7108
      @ananyaagarwal7108 2 years ago

      @@statquest Thanks for the response ;) Would really appreciate if you could make something on the same or share some content that could explain the intuition behind QQ Plots.
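
The intuition raised in this thread can be checked directly: every quantile of a normal distribution with mean mu and standard deviation sigma equals mu + sigma times the matching standard normal quantile. The sketch below uses a hypothetical N(170, 8) as the data source; the numbers are for illustration only.

```python
from statistics import NormalDist

std_normal = NormalDist()                  # mean 0, sd 1
heights = NormalDist(mu=170.0, sigma=8.0)  # hypothetical normal data source

probs = [k / 16 for k in range(1, 16)]
for p in probs:
    z = std_normal.inv_cdf(p)
    q = heights.inv_cdf(p)
    # The points (z, q) fall on a line with intercept mu and slope sigma.
    assert abs(q - (170.0 + 8.0 * z)) < 1e-9

print("all 15 quantile pairs lie on q = 170 + 8 * z")
```

This is also why, for normal data, the slope of the QQ line estimates the standard deviation and the intercept estimates the mean.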

  • @Atomflinga
    @Atomflinga 5 years ago +1

    What's the approach for determining which distribution has the best fit for the data? Would the r-squared of the data against the straight line be a suitable measure for how well the distribution describes the data?

    • @statquest
      @statquest  5 years ago

      This is a good question, and, to be honest, I'm not sure what the answer is. I like your idea, but it may oversimplify the problem, i.e. you could get a high R-squared value, but still have some really obvious problems if you looked at it visually.
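
A sketch of the idea discussed here: fit a least-squares line through the QQ points and report its R-squared. The data are made up, the k/(n + 1) spacing is an assumption, and this is only one of several ways such a line could be drawn.

```python
from statistics import NormalDist, fmean


def qq_line_fit(values):
    """Least-squares line through the QQ points, plus its R^2."""
    ys = sorted(values)
    n = len(ys)
    # Assumption: the k-th smallest value sits at probability k / (n + 1).
    xs = [NormalDist().inv_cdf(k / (n + 1)) for k in range(1, n + 1)]
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical, strongly right-skewed values; NOT the data from the video.
skewed = [1, 1, 2, 2, 3, 3, 4, 5, 7, 10, 15, 25, 40, 80, 200]
slope, intercept, r2 = qq_line_fit(skewed)
print(f"slope={slope:.2f} intercept={intercept:.2f} R^2={r2:.3f}")
```

Because both coordinates are sorted, the points always trend upward, so a single R-squared number can hide systematic curvature that is obvious on the plot. That is consistent with the caution in the reply above.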

  • @jalbertomendivil
    @jalbertomendivil 2 years ago

    I know it may sound dumb but i just got it when i understood that theoretical quantiles were the quantiles of a normal standard distribution or Z-value.

  • @urjaswitayadav3188
    @urjaswitayadav3188 6 years ago +1

    Thanks for the great explanation as always! So QQ is just a way to plot and visualize the similarity of two distributions? Are there any other scenarios when these can be used? Thanks!!

    • @TheAbhimait
      @TheAbhimait 4 years ago

      QQ is mostly used to check tail conditions. Density plots and cumulative plots are the best way to check distribution symmetry.

  • @MeWatchingYouTubeVideos
    @MeWatchingYouTubeVideos 1 year ago +1

    How helpful! Thanks a lot for your amazing videos

  • @RamanGatekeeper
    @RamanGatekeeper 5 years ago +1

    Hi, thanks for the video, but could you please add in the description wtf gene expression is and what x and y mean on the graph at 0:37? For the time being I see 15 data points and have no idea why they are shown in this way, thank you. Maybe some simpler example instead of gene expression?

    • @statquest
      @statquest  5 years ago +1

      OK. I added this to the description:
      NOTE: The data in this video are measures of gene expression. If "gene expression" doesn't mean anything to you, just imagine that the data represents how tall a bunch of people are, or how much they weigh. Then consider the y-axis to be the height or weight of the people, and the x-axis just represents all of the data you collected on a single day. In this case, all of the data were collected on the same day, so they form a single column.

    • @RamanGatekeeper
      @RamanGatekeeper 5 years ago

      @@statquest thanks for your effort!

  • @guillemvia6813
    @guillemvia6813 5 years ago +1

    Awesomely explained! Good job!

  • @pfever
    @pfever 4 years ago +1

    4:35 I think there is an extra quantile drawn on the Uniform distribution

    • @statquest
      @statquest  4 years ago +1

      You are correct. Thanks for spotting that.

  • @Cozmaus
    @Cozmaus 11 months ago

    Actuary studies is something else bro

  • @khanghuy7384
    @khanghuy7384 3 months ago +1

    u saved my life

  • @thechickendiet
    @thechickendiet 4 years ago +1

    very clear with great examples!

  • @pratapseshachalam2859
    @pratapseshachalam2859 5 years ago +1

    Very nice video. What's the difference between the normal and uniform distributions? I thought both were the same

    • @statquest
      @statquest  5 years ago

      Normal Distribution: en.wikipedia.org/wiki/Normal_distribution
      Uniform Distribution: en.wikipedia.org/wiki/Uniform_distribution_(continuous)

  • @Arriyad1
    @Arriyad1 4 days ago

    If, by visual inspection, we suspect that the distribution is D, how do we get the quantiles of D? I can imagine that, if the CDF of D is invertible (like the CDF of the exponential distribution, or a sigmoid), then we can compute quantiles. But what about quite arbitrary distributions?

    • @statquest
      @statquest  4 days ago

      I'm not sure. Pretty much all of the standard distributions can be inverted.
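
One possible approach when no closed-form inverse is available, sketched here as an assumption rather than an answer from the video: since a CDF never decreases, a quantile can be found numerically by bisection. The exponential CDF is used below only because its exact inverse is known, which makes the result easy to check; the function names are hypothetical.

```python
import math


def quantile_by_bisection(cdf, p, lo, hi, tol=1e-10):
    """Find x in [lo, hi] with cdf(x) close to p, assuming cdf never decreases."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exp_cdf(x, rate=1.0):
    # Exponential CDF, used here only because its inverse is known exactly.
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0


p = 0.25
numeric = quantile_by_bisection(exp_cdf, p, lo=0.0, hi=50.0)
exact = -math.log(1.0 - p)  # closed-form inverse for rate = 1
print(numeric, exact)
```

The same helper works for any CDF that can be evaluated, as long as the search interval [lo, hi] brackets the quantile.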

  • @Brockdorf
    @Brockdorf 3 years ago +1

    the song alone is worth it

  • @user-wx4vf5gj2f
    @user-wx4vf5gj2f 5 years ago +2

    Thanks for saving my life

  • @joaovasconcelos5360
    @joaovasconcelos5360 2 years ago +1

    Your videos are awesome, thank you so much!

  • @Shuffellove
    @Shuffellove 5 years ago +2

    i love statquest!

  • @piotrszocik7775
    @piotrszocik7775 4 years ago +1

    Great explanation, have a nice day :)

  • @Shred427
    @Shred427 1 year ago +1

    such an awesome video, thanks!

  • @fantube7511
    @fantube7511 2 years ago +1

    Best intro 🔥

  • @alexgimeno170
    @alexgimeno170 4 years ago +1

    Understand it now - thank you!

  • @user-ii5ch8nw6s
    @user-ii5ch8nw6s 6 years ago

    It's so clear! Thanks a lot for your video.

  • @aj_actuarial_ca
    @aj_actuarial_ca 1 year ago

    Thanks a lot for the wonderful explanation!

  • @gayathrikurada3315
    @gayathrikurada3315 4 years ago +1

    Hi Josh, can we use percentiles in place of quantiles to plot a QQ plot? If so, with percentiles we can only have up to a hundred percentiles no matter how big our data is, so how can we get a definitive answer about whether or not the 2 datasets have similar distributions, as mentioned in the video at 6:30?

    • @statquest
      @statquest  4 years ago

      The terms "quantiles" and "percentiles" are often used interchangeably, and in this case you can swap out quantiles for percentiles. And you can have as many percentiles as you want - however, the largest percentile is always 100. For example, you could have the 0.5 percentile, or the 1.23 percentile.

    • @gayathrikurada3315
      @gayathrikurada3315 4 years ago +1

      @@statquest Thanks Josh.

  • @Yambaization
    @Yambaization 5 years ago +1

    5:30
    I am confused... I thought that quartiles are three (not four) values, which divide the dataset into four equal numbers of data points.
    In your example you say that four data points are quartiles? 🤔

    • @statquest
      @statquest  5 years ago +3

      Oops. That's a mistake. Quartiles divide the data into 4 parts.
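
The point of this correction, three cut points that split the data into four parts, can be seen with Python's standard library. The data are made up, and `statistics.quantiles` is just one of several quartile conventions.

```python
from statistics import quantiles

# Hypothetical measurements; NOT the data from the video.
data = [0.6, 1.9, 2.4, 2.9, 3.3, 4.1, 5.2, 6.0]

cuts = quantiles(data, n=4)  # n=4 asks for quartiles
print(cuts)                  # three cut points

# Count how many values land in each of the four parts.
parts = [
    [x for x in data if x <= cuts[0]],
    [x for x in data if cuts[0] < x <= cuts[1]],
    [x for x in data if cuts[1] < x <= cuts[2]],
    [x for x in data if x > cuts[2]],
]
print([len(part) for part in parts])
```

With eight values, each of the four parts holds two of them.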

  • @schiu867
    @schiu867 2 years ago +1

    It helps a lot. Thanks!

  • @dioic13
    @dioic13 5 years ago +1

    Nice lecture, but how do u identify boundary conditions, like - 1.5, for normal distribution?

    • @statquest
      @statquest  5 years ago +2

      This is explained at the very start of the video at 0:38. We have 15 data points, so our data have 15 quantiles. We then divide the normal distribution into 15 quantiles. Each quantile should have an equal probability - thus, with the normal distribution, the quantiles on the edge are relatively far apart, to compensate for the relatively low probability of observing a value out there. In the middle of the curve, the quantiles are close together since there is a higher probability of observing a value there. Since each quantile has to have the same probability, then there is only one way to configure the 15 lines that we draw. If that last part doesn't make sense, then just imagine we only had one quantile - so we needed to divide the normal distribution into two equal parts. Where would we put the line? Well, there's no choice involved here because there is only one location for that line - right in the middle. Similarly, when we have to divide the normal distribution into 15 equal parts, there isn't a choice about where to put the lines, there is only one option.

  • @FlopMeister71
    @FlopMeister71 6 years ago +1

    Hi, I understand how the quantile points are plotted wrt observed vs theoretical distributions, what I don't understand is what determines the slope of the straight line. While this is fairly intuitive for a normal distribution, for say the Weibull distribution I am unclear how the slope of the straight line is used to determine whether the observed vs theoretical quantiles are a good fit for a given distribution. Any ideas?

    • @ngocnguyen9517
      @ngocnguyen9517 2 years ago

      I came here for the same question and left with no answer LOL

  • @tallwaters9708
    @tallwaters9708 6 years ago

    Thanks for the videos, if you're still looking for ideas how about k-l divergence?

  • @hebaebrahem7893
    @hebaebrahem7893 5 years ago

    Your videos are cool and concise , thank you .

  • @km2052
    @km2052 6 years ago

    thanks, awesome, this is useful in measuring gene expression effect

  • @paulpaschert6215
    @paulpaschert6215 5 years ago +2

    QQ is really handy

  • @arneoosterlinck7590
    @arneoosterlinck7590 5 years ago +1

    Great explanation, thanks!

  • @miakheirkhah373
    @miakheirkhah373 2 years ago

    Thanks for explanation :))))) , one question , how we determine the 4 quartiles in first dataset for last case?

    • @statquest
      @statquest  2 years ago +1

      We divide the data into 4 equal sized pieces.

    • @miakheirkhah373
      @miakheirkhah373 2 years ago +1

      @@statquest oo Tnx...
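
One common way to compare two datasets of different sizes, sketched here with made-up numbers: take the sorted values of the smaller dataset and pair them with the larger dataset's values at the same quantiles. The k/(m + 1) spacing and the use of `statistics.quantiles` are assumptions; the video may have computed the quantiles differently.

```python
from statistics import quantiles

# Hypothetical datasets of different sizes; NOT the data from the video.
large = [0.6, 1.9, 2.4, 2.9, 3.3, 3.5, 3.8, 4.1, 4.4, 4.9, 5.2, 5.6, 6.0, 6.7, 7.5]
small = [1.2, 3.4, 4.8, 6.9]


def two_sample_qq(big, little):
    """Pair each sorted value of the smaller sample with the matching quantile of the larger one."""
    m = len(little)
    # Assumption: the k-th smallest of m values sits at probability k / (m + 1).
    # quantiles(big, n=m + 1) returns the larger sample's values at those same
    # m probabilities, interpolating between neighbours when needed.
    big_q = quantiles(big, n=m + 1)
    return list(zip(big_q, sorted(little)))


for x, y in two_sample_qq(large, small):
    print(f"large-sample quantile {x:.2f}  small-sample value {y:.1f}")
```

If the resulting points fall near a straight line, the two datasets plausibly share the same shape of distribution.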

  • @thehuman5214
    @thehuman5214 2 years ago

    It's hard to understand Step 3. How should I add same number of quantiles? Where exactly to draw the lines?

    • @statquest
      @statquest  2 years ago

      In R, you simply call the qnorm() function and it tells you where to draw the lines. For example, for this video, I used the call: qnorm(1:16/16, mean=0, sd=1) to see where to draw the lines. In MS Excel, I believe you can use the NORM.S.INV function. For details, see: www.statology.org/q-q-plot-excel/
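
For readers without R, here is a rough Python counterpart of the `qnorm(1:16/16, mean=0, sd=1)` call quoted above, using only the standard library. The function name `normal_lines` is hypothetical.

```python
import math
from statistics import NormalDist


def normal_lines(n_slices=16, mean=0.0, sd=1.0):
    """Rough Python counterpart of R's qnorm(1:n_slices/n_slices, mean, sd)."""
    dist = NormalDist(mu=mean, sigma=sd)
    lines = [dist.inv_cdf(k / n_slices) for k in range(1, n_slices)]
    # inv_cdf() only accepts probabilities strictly between 0 and 1, so the
    # final value (probability 1) is added by hand; R returns Inf there.
    lines.append(math.inf)
    return lines


lines = normal_lines()
print([round(x, 2) for x in lines[:-1]])
```

The first 15 values are where the lines go; the 16th is infinite and cannot be drawn, which matches the "extra slice at positive infinity" described elsewhere in these comments.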

  • @Han-ve8uh
    @Han-ve8uh 3 years ago

    This video explained how to compare a discrete sample to a continuous normal, and to a discrete but smaller sample. What if we wanted to compare continuous sample vs continuous theoretical, is there an analogous qqplot for that? Or the term "continuous sample" is an oxymoron since "sample" means discrete already no matter sampled from theoretically discrete or continuous distribution?

    • @statquest
      @statquest  3 years ago +1

      In the example, the "discrete" values come from a continuous distribution, so there is no need to do anything special that is not described in this video.

  • @richardanderton
    @richardanderton 5 years ago +1

    Josh, Great video... very helpful. It looks like you might have a slight error when comparing the 2 dataset distributions however. I could be wrong but I think your second plot is incorrect on the chart.

    • @statquest
      @statquest  5 years ago

      I think I know what you're talking about. The second point should be at 5.1 on the x-axis but is only at 4.1. Is that it? That's a typo.

    • @statquest
      @statquest  5 years ago +1

      By the way, the long term plan is to correct/update these videos. Just like textbooks have "new and revised editions", I'd like to have new and revised editions of these videos, so your feedback is helpful and appreciated. I hope that once the channel grows, YouTube will give me some options for how to release revised videos. Right now I have no options, but I'm also relatively small potatoes. So I can't fix the video right now, but one day I will.

    • @richardanderton
      @richardanderton 5 years ago +1

      @@statquest Yes that's it.

    • @richardanderton
      @richardanderton 5 years ago +1

      @@statquest No problem. Your viewers would certainly appreciate your easy to understand videos. I just wanted to check I understood and maybe other users will find the note in the comments useful.

  • @wenweipeng7056
    @wenweipeng7056 5 years ago +1

    Does the exact size or probability matter when dividing the distribution? Or do I only need to make sure the sizes are equal?

  • @cococnk388
    @cococnk388 2 years ago

    Your video explains the concept well, thanks.
    But I can't make a plot from your video... I would not know how to start with a data set at hand.
    Can you explain, step by step, how to generate a QQ plot?

    • @statquest
      @statquest  2 years ago

      To actually draw one, see: www.r-bloggers.com/2021/06/qq-plots-in-r-quantile-quantile-plots-quick-start-guide/
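
As a complement to the R guide linked above, here is a minimal step-by-step sketch in Python that computes the points of a normal QQ plot. The function name `qq_points` and the sample values are made up for illustration. The probabilities i/(n+1) are an assumption, chosen so that n data points get n quantile lines; other plotting-position conventions exist.

```python
from statistics import NormalDist

def qq_points(values):
    """Pair each sorted data value with the matching standard-normal quantile."""
    ordered = sorted(values)                      # Step 1: sort the data
    n = len(ordered)
    std_normal = NormalDist()                     # mean 0, sd 1
    # Step 2: one theoretical quantile per data point, at probability i/(n+1)
    theoretical = [std_normal.inv_cdf(i / (n + 1)) for i in range(1, n + 1)]
    # Step 3: x = theoretical quantile, y = observed value
    return list(zip(theoretical, ordered))

# Hypothetical measurements, not the data from the video.
sample = [4.1, 5.3, 2.2, 6.8, 3.9, 5.0, 4.6]
points = qq_points(sample)
for x, y in points:
    print(round(x, 3), y)
```

Plotting the x values against the y values as a scatter plot gives the QQ plot. Points that fall close to a straight line suggest the data are approximately normal.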

  • @snehasampath3486
    @snehasampath3486 1 year ago

    Hello, the video was amazing and I was able to get an idea of QQ plots. I do have a question though. How do we draw the normal distribution and uniform distribution? Is it just random?

    • @statquest
      @statquest  1 year ago

      The normal and uniform distributions are well defined by equations. So we just plug numbers into them to get the values out.
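
To make "plug numbers into the equations" concrete, here is a short Python sketch of the standard density formulas for the two distributions. The function names and default parameters are chosen for this example only.

```python
import math

def normal_pdf(x, mean=0.0, sd=1.0):
    """Height of the normal curve at x: exp(-z^2 / 2) / (sd * sqrt(2 * pi))."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def uniform_pdf(x, low=0.0, high=1.0):
    """Height of the uniform distribution: constant between low and high, 0 elsewhere."""
    return 1.0 / (high - low) if low <= x <= high else 0.0

# Evaluate each curve at a few x values to trace its shape.
for x in [-2.0, -1.0, 0.0, 1.0, 2.0]:
    print(x, round(normal_pdf(x), 4), uniform_pdf(x, low=-1.0, high=1.0))
```

Evaluating the functions on a fine grid of x values and connecting the results draws the curves. Nothing random is involved.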

  • @mahdimohammadalipour3077
    @mahdimohammadalipour3077 2 years ago

    In the last example, shouldn't we have drawn a straight line that goes through the origin?

    • @statquest
      @statquest  2 years ago +1

      No, because the scale of the x and y-axes is based on the original scale of the data.
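
A small Python sketch of why the line need not pass through the origin when two datasets are compared. Both datasets are invented for this example: the small one is built to equal 2 × (quantile of the large one) + 100, so the two have the same shape but different scales.

```python
import statistics

# Hypothetical data: a larger dataset and a smaller one on a different scale.
large = list(range(1, 20))   # 19 values: 1, 2, ..., 19
small = [110, 120, 130]      # 3 values

# Cut the larger dataset into as many quantiles as the smaller one has points.
large_quantiles = statistics.quantiles(large, n=len(small) + 1)

# Each QQ point pairs a quantile of the large data with a sorted small value.
points = list(zip(large_quantiles, sorted(small)))

# Slope and intercept of the line through the first and last points.
(x0, y0), (x1, y1) = points[0], points[-1]
slope = (y1 - y0) / (x1 - x0)
intercept = y0 - slope * x0
print(points, slope, intercept)
```

The points fall on a perfectly straight line, but its intercept is 100 rather than 0, because each axis keeps the original units of its own dataset.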

  • @wanhope3660
    @wanhope3660 6 years ago

    Sweet, it's not that difficult to grasp anymore! Thanks

  • @jvjjjvvv9157
    @jvjjjvvv9157 3 years ago

    In the example with the normal distribution you have 15 lines dividing the probability, but in the example with the uniform distribution you have 16 lines dividing the probability. Or, in other words, in the first example the last line leaves a probability of 14/15 to its left side, but in the second example the last line leaves a probability of 1 to its left side.
    ¿?

    • @statquest
      @statquest  3 years ago

      That last line on the uniform distribution might be a typo. The idea is that you divide the distribution into the same number of equally sized pieces (where size is defined by probability) as you have data.

  • @alifia276
    @alifia276 3 years ago +1

    Thank you! Awesome explanation

    • @statquest
      @statquest  3 years ago

      Thank you! :)

    • @bharathkumar5870
      @bharathkumar5870 3 years ago

      I have a doubt... why use this method instead of just plotting the points and seeing if they form a bell curve? Correct me if I'm wrong.

    • @statquest
      @statquest  3 years ago +1

      @@bharathkumar5870 I'm not sure I understand your question. Are you asking, "why don't we just create a histogram with the data and see if the histogram looks like a normal distribution"? If so, histograms can be very tricky in terms of selecting the correct bin size. In contrast, with a q-q plot we don't have to worry about optimizing a bin size or anything else.

    • @bharathkumar5870
      @bharathkumar5870 3 years ago +1

      @@statquest Thank you sir, you cleared my doubt. Different bins give different distributions😀
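
A rough Python sketch of the bin-size issue discussed in this thread: the same data counted with two different numbers of bins. The data values and the helper `histogram_counts` are invented for illustration.

```python
def histogram_counts(values, n_bins):
    """Count how many values fall into each of n_bins equal-width bins."""
    low, high = min(values), max(values)
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value goes into the last bin instead of a new one.
        index = min(int((v - low) / width), n_bins - 1)
        counts[index] += 1
    return counts

# Hypothetical measurements, not the data from the video.
sample = [1.0, 1.2, 1.9, 2.1, 2.2, 2.8, 3.0, 3.1, 3.9, 4.0, 4.4, 5.0]
coarse = histogram_counts(sample, 2)
fine = histogram_counts(sample, 8)
print(coarse)
print(fine)
```

The two lists of counts describe the same data but can suggest quite different shapes, which is the ambiguity a QQ plot avoids.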

  • @IGragon
    @IGragon 2 years ago +1

    What a great song :)

  • @vamshikrishna298
    @vamshikrishna298 4 years ago

    Excellent comparison of the sample with the normal distribution... Subscribed :) I have a doubt: can we take any normal distribution, e.g. N(0,1) or N(0,4)?

    • @statquest
      @statquest  4 years ago

      Any normal distribution will do, no matter what the standard deviation is, because the quantiles represent the same thing and the x-axis will be scaled accordingly.
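
A quick Python check of that claim, using `statistics.NormalDist` from the standard library. It assumes the second distribution has a standard deviation of 2 (reading N(0,4) as variance 4); the point holds for any positive standard deviation.

```python
from statistics import NormalDist

# Same 15 probabilities for both distributions: 1/16, 2/16, ..., 15/16.
probabilities = [i / 16 for i in range(1, 16)]

standard = [NormalDist(0, 1).inv_cdf(p) for p in probabilities]
wider = [NormalDist(0, 2).inv_cdf(p) for p in probabilities]

# Ratio of matching quantiles (skipping the middle one, which is 0 for both).
ratios = [w / s for w, s in zip(wider, standard) if s != 0]
print(ratios)
```

Every quantile of the wider distribution is the same multiple of the matching standard-normal quantile, so switching distributions only rescales the x-axis and a straight line in the QQ plot stays straight.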

    • @vamshikrishna298
      @vamshikrishna298 4 years ago

      @@statquest I have a sample question related to percentile calculations... can you share an email address so I can share my understanding?