@@leonandorfi5191 my two cents. when we are talking about the histogram, then the area is not equal to 1 but when we are talking about the bell shaped probability distribution curve, then the area under the curve is 1. moreover in histogram there is no curve, so we cant say technically "area under the CURVE" for histogram. am i correct? i dont know LOL
@ thank u but is it possible to present raw data from a real life example and take us through them to the point of probability distribution so we could appreciate the usefulness of probability distribution? This would be of great help. Thx
@@kowtharhassan882 This is actually a topic I cover in my book. We talk about the limitations of just using a histogram and why we would want to switch to a probability distribution.
The formula for the normal curve is kind of complicated. However, if you want to learn how to fit it to data, see: ua-cam.com/video/XepXtl9YKwc/v-deo.html and ua-cam.com/video/Dn6b9fCIUpM/v-deo.html
I doubt it would make any sense to calculate missing values using calculus. For 6" people calculus will give us ~1.5. What if in reality there are 20 people of that height?
It depends. Of course you can make mistakes, but, believe it or not, height is normally distributed, so we really can use a normal distribution ( ua-cam.com/video/rzFX5NWojp0/v-deo.html ) to impute missing values.
hi Josh, I'm enjoying your videos, but I think it's only the US who uses inches/feet units of measurement. The rest of the world (which is the majority of the world's population) is on centimeters/meters. 😅
This is one of the first videos I ever made, and back then, no one watched my videos other than a few friends based in the US. Since then I've changed to using more universal metrics.
@@statquest thank you Josh, I placed an order in Lulu for a copy of your book (I believe it's on its way). You're a good teacher. I like to learn visually, and your diagrams are the best I've found so far.
@@statquest Thanks Josh, got the book within a week of placing the order. It is excellent. It reads as easily as a comic book, and the images are the best way to explain this topic. I'm also running some examples with TensorFlow, image classification with Fashion MNIST, and learning to use the OpenAI API. I'm trying to understand when to use the different activation and loss functions, and architectures for Neural Networks. I didn't have a background in Statistics, so your book helps me with that. But I knew Linear Algebra and Gradients, and was happy to see this again.
@@DiegoGuillen-p3z Awesome! For activation functions, unless you want to do something very specific, people just use the ReLU. To see examples of doing something specific, see my video on LSTMs: ua-cam.com/video/YCzL96nL7j0/v-deo.html
Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/
Never understood the relationship between histograms and distributions until now and I did stats throughout my undergrad. This is incredible.
bam! :)
@@statquest Bam Bam :)
Three suggestions:
1) A dedicated play list about distributions, which also talks about the relationships between them (i.e. How taking the limit of Bernoulli yields Poisson or conditioning Poisson yields Bernoulli etc...)
2) Another playlist for stochastic processes
3) In each playlist, tell also the relationships between the elements you are talking about, this shows us the big picture.
As much as I would love for this to happen, these must already be taking a phenomenal amount of time to make. The videos by themselves are gold enough. We can crowdsource this effort and tag the playlists here
This is great. My stats teacher in high school didn't teach from first principles and I didn't care that he didn't but this is such a simple explanation. Thanks!
Thanks!
clearly you are one of the best statisticians in the world!! You are definitely providing world class education that too for free!
Thanks!
Imagine you're at a wild and crazy dance party and you overhear someone talking about statistics. Epic 😂👌
Bam! :)
Dag!!😮
I have talked about recursion in programming at a party. The person walked away from me while I was still talking
Best channel for statistics among the ones I have come across so far.
Thank you! :)
@@statquest You are welcome
A step closer to being a data scientist.
How far has the journey come?
😂😂😂
Awesome
Are you a Data scientist now?
how is it going?
You've an amazing way of breaking down things and I can't believe how entertaining you made it.
Thank you very much!
Thanks a lot sir I learned a lot from these videos for my M.Sc.Psychology Exam from India
Thank you!
What a great video,the bins explanation solved a question i had but didnt know how to explain,you deserve a nobel prize
Thank you!
You literally open my brain to put stuff in there! Grateful I am alive in your era
BAM! :)
I read about it on wikipedia and many other websites but didn't understand anything but this video made the concept clear.
Hooray! I'm glad to hear that the video was helpful. :)
Hi Josh! thank you for doing all these videos in your channel. I also hope to see a playlist of distributions and their relationships. that is, the big map of distributions.
Keep up the good work man. You're the saviour of many college students
Thanks, will do!
You don't know how much this helped me...
Awesome! :)
Crystal clear! Thank you for making this video short!
I'm glad to hear you liked the video! :)
I am doing my PhD now, but still have to study this lolll
it's the basics that is hard to grasp .. anyone can regurgitate the complex terms by mugging it up but knowing the why is very hard
@@mathavraj9378 Agreed. Reviewing the basics after completing an MSc Statistics course.
Thought I was the only one lacking basic stats after doing a masters😂
you are so amazing Josh, thanks for creating this UA-cam channel!!!
Thanks!
Wow is this one of the OG statquests
Totally. It's a classic.
Finally I understand. LOL At the end of the semester where it almost doesn't matter anymore.
Thank you. lol
awesome explanation, one of the best i have ever come across :)
Thank you! :)
This is one of the best videos.
Thanks!
Awesome Work Josh. It would be great to have a video about the KERNEL DENSITY clearly explained. Thanks
I'll keep that in mind.
I'd really love to see a video on skewness and kurtosis of a curve and how the two are interrelated from you!! ❤
I'll keep those topics in mind.
Made my concepts clear with ease, And Now I have subscribed SQ.❤
Thank you!
Please make another book and explain all of these statistics fundamentals at one place. the ML book doesn't cover all of the concepts. It's gonna save me a lot of time for note taking.
I'd love to do that one day.
Thanks!!!! I FINALLY UNDERSTAND THIS!!!
Hooray! :)
Extraordinary explanation ...
Thanks!
Is distribution same as probability distribution?
I mean what is the difference between a frequency distribution(histogram of all achieved observations) and probability distribution?
This point has been vague in my head since when I was studying in School
If you normalize your histogram so that the sum of all of the columns is 1, then it is also a probability distribution.
OK so just to confirm frequency distribution when normalized gives probability distribution. Right?
@@hetanshthakore5886 yep
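To make the normalization step concrete, here is a minimal sketch using only Python's standard library. The heights and the 10 cm bins are made-up values for illustration, not data from the video.

```python
from collections import Counter

# Hypothetical heights in cm, each already rounded to its 10 cm bin.
heights = [150, 160, 160, 170, 170, 170, 170, 180, 180, 190]

# Frequency distribution: how many observations landed in each bin.
counts = Counter(heights)

# Normalize: divide each count by the total so the columns sum to 1.
total = sum(counts.values())
proportions = {b: n / total for b, n in sorted(counts.items())}

for b, p in proportions.items():
    print(f"{b} cm: count={counts[b]}  proportion={p:.1f}")
print("sum of columns:", round(sum(proportions.values()), 10))
```

The shape of the histogram does not change; only the y-axis is rescaled from counts to proportions.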
Hi Josh, I just realized you don't have any videos related to PDF (or PMF) and CDF, or at least I couldn't find them so far. I would really appreciate it if you can make a video about them as well!
Great ideas!
Lots of love sir.... Can you please do this same things to explain calculus ... Or just suggest any channel who deals good with mathematics... 😊
Check out 3 blue 1 brown: ua-cam.com/channels/YO_jab_esuFRV4b17AJtAw.html
Will you be doing Markov Chain Monte Carlo type of video? Although it's just for my own, but that would be great for others to learn from it if there will be one =)
It's on the to-do list, but I'll move it up a little bit.
@@minhtoto1542 It's on the to-do list, but I am slow and have a lot of ground to cover before I get there.
Hi Sir
You are the only person on UA-cam I have ever seen who resolved their subscribers or non subscribers issues. Whenever I ask you reply to me Thank you for that.
I have one question which is bothering me a lot.
During probability distribution examples we have given parameters of distributions in our examples.
So my questions are:
1. In the real world we aren't given the parameters; we only ever have sample data. So how do we get those parameters in a real-world scenario?
2. As I said in question 1, we always have sample data, not the entire population. But in the probability distribution function we pass parameters like mu and sigma (in the case of the normal distribution), so are those population parameters, or population parameters estimated from sample data?
Thank you in advance
The answers to your questions are in these videos: ua-cam.com/video/vikkiwjQqfU/v-deo.html and ua-cam.com/video/SzZ6GpcfoQY/v-deo.html
Statquest = Noice 👍
Thanks!
@@statquest You're welcome :)
Bam! that was a great explanation. I have one silly question: Is there any difference between a statistical distribution and probability distribution?
Thanks in advance!
They are one and the same.
StatQuest with Josh Starmer Thank you for the response.
thank you sir, great explanation.
Thanks!
@@statquest no sir, i am the thankful one here, yours's and many other teacher's videos on UA-cam really help me understand the subject. 😊👍
Dear Josh, Thank you for the clear explanation of probability distributions; it helped me recall sampling distributions. May I ask a question? When referring to the normal distribution as an example, does 'sampling distribution' specifically mean SD and MEAN, or does it refer to SE and mean of the mean? Or are those both the sampling distribution?
Here's an example. Say we collected 8 measurements from a normal distribution and calculated the mean. Then we repeated the process (collected another 8 measurements and calculated the mean) over and over. The collection of means would be the "sampling distribution of the means". The standard deviation of that sampling distribution would be the standard error of the mean.
Every time your reply makes me more confident in statistics, thank you@@statquest
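The sampling-distribution example in this thread can be simulated in a few lines. This is only a sketch: the population mean and standard deviation (170 and 10) and the number of repeats are assumptions picked for illustration; the sample size of 8 comes from the reply.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

MU, SIGMA = 170.0, 10.0   # hypothetical population mean and SD
N = 8                     # measurements per sample, as in the example
REPEATS = 5000            # how many times the experiment is repeated

# Each entry is the mean of one sample of 8 measurements.
sample_means = [
    statistics.mean(random.gauss(MU, SIGMA) for _ in range(N))
    for _ in range(REPEATS)
]

# The SD of the collection of means is the standard error of the mean.
observed_se = statistics.stdev(sample_means)
theoretical_se = SIGMA / N ** 0.5

print(f"SD of the sample means: {observed_se:.2f}")
print(f"sigma / sqrt(n):        {theoretical_se:.2f}")
```

The two printed numbers should land close to each other, which is the sense in which the standard error describes the spread of the means rather than the spread of the raw measurements.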
Awesome tutorial thanks
Love the explanation and even the intro of the video 🤣 Thanks a lot Friendly Folks! 👍🏽👍🏽
Thank you! :)
You are simply awesome
Thank you! :)
Your my cuppy cake, love you soo, Thank you, Sir
:)
great learning spot for statistics
Thanks!
Very good
Thanks!
Thank you!
:)
Wish you had explained all the fundamentals in a sequence. Like for us as beginners we are unable to identify which video should be watched first? Can you make a sequence or guide us from where to start?
Sure, you can find all of my videos, in sequence, here: statquest.org/video-index/
We need a dedicated playlist on distributions and the different types
That's a great idea. Some of what you want can be found here: app.learney.me/maps/StatQuest but it could be better.
How much smaller can we go with the bins? There has to be a limit somewhere between using 1 bin and using as many bins as there are samples.
This is a good question, since choosing the right bin size has a large effect on what the histogram will look like. The strange thing, however, is you can have more bins than you have data. For example, imagine you wanted to draw a histogram where values on the x-axis could be anywhere between 1 and 100. Now imagine each bin was one unit wide, so we had 100 bins. Now imagine we got 50 samples, but all of the values were between 45 and 55. The histogram would have a mound of data in the middle, but nothing on the edges. Does that make sense?
Thank you Sir
bam! :)
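The 100-bins, 50-samples scenario from the reply above can be reproduced directly. A rough sketch, assuming the samples are spread uniformly between 45 and 55 (the reply only says they fall in that range):

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

# 100 bins, each one unit wide, starting at 1:
# index 0 holds [1, 2), index 1 holds [2, 3), and so on.
NUM_BINS = 100
counts = [0] * NUM_BINS

# 50 hypothetical samples, all of them between 45 and 55.
samples = [random.uniform(45, 55) for _ in range(50)]
for x in samples:
    counts[int(x) - 1] += 1

empty = sum(1 for c in counts if c == 0)
print(f"{NUM_BINS} bins, {len(samples)} samples, {empty} empty bins")

# Show only the occupied part of the histogram as text bars.
for i, c in enumerate(counts):
    if c:
        print(f"{i + 1:3d}-{i + 2:<3d} {'#' * c}")
```

Nothing breaks when there are more bins than data points; most bins are simply empty and the mound sits in the middle.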
By chance, do you have any material on Bayesian Inference? I'm trying to understand Expectation Maximization (EM) but it's wrecking my mind... all your videos are incredible btw :)
EM algorithm
· Initialize (randomly)
· Iterate until convergence
  - Expectation step: estimate the membership z_ji using the θ of the last iteration
  - Maximization step: update the parameters of the distributions θ, π using z_ji
Soft version of the k-means algorithm
· Cluster probabilities z_ji instead of choosing the next centroid
· Use all data points (weighted by cluster probabilities) to re-estimate the centroids (means, and in addition the covariance matrix)
This is from my lecture slides...
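For anyone who wants to see the E-step and M-step from that outline as code, here is a minimal sketch for the simplest case I could think of: a mixture of two 1-D Gaussians. Several things are my own assumptions rather than anything from the slides: the function names, the made-up data, starting the means at the smallest and largest data values instead of a random initialization, and running a fixed number of iterations instead of checking for convergence.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    """Height of the normal curve at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def em_two_gaussians(data, iterations=50):
    # Initialize theta (means, SDs) and pi (mixing weights) with crude guesses.
    mu = [min(data), max(data)]
    sigma = [1.0, 1.0]
    pi = [0.5, 0.5]
    for _ in range(iterations):
        # Expectation step: soft membership z[j][i] of point i in cluster j,
        # computed from the parameters of the last iteration.
        z = [[], []]
        for x in data:
            w = [pi[j] * normal_pdf(x, mu[j], sigma[j]) for j in range(2)]
            total = sum(w)
            for j in range(2):
                z[j].append(w[j] / total)
        # Maximization step: re-estimate pi, mu and sigma from ALL points,
        # each weighted by its membership probability.
        for j in range(2):
            nj = sum(z[j])
            pi[j] = nj / len(data)
            mu[j] = sum(zi * x for zi, x in zip(z[j], data)) / nj
            var = sum(zi * (x - mu[j]) ** 2 for zi, x in zip(z[j], data)) / nj
            sigma[j] = max(math.sqrt(var), 1e-3)  # keep the SD away from zero
    return mu, sigma, pi

random.seed(2)
# Made-up data: two well separated clusters centered at 0 and 10.
data = [random.gauss(0, 1) for _ in range(200)] + [random.gauss(10, 1) for _ in range(200)]
mu, sigma, pi = em_two_gaussians(data)
print("means:", [round(m, 2) for m in mu])
print("SDs:  ", [round(s, 2) for s in sigma])
print("pi:   ", [round(p, 2) for p in pi])
```

The "soft k-means" remark in the slides shows up in the M-step: every point contributes to both means, just with different weights, instead of being assigned to a single centroid.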
Great video as usual, please keep them coming. But how can we make sure that the data follows a particular type of distribution, or is well described by this curve, before making any inference about the population from the sample, when we don't have enough data in our sample?
This is the million dollar question!!!! How do you pick the right distribution? Sometimes it's just known. For example, so many people have flipped so many coins over the years that we know that flipping a coin a bunch of times follows a binomial distribution. Likewise, sometimes we can make a basic assumption, that a coin should land heads 50% of the time on average, and then just work out the math and essentially derive a binomial distribution from scratch. However, sometimes it's not so obvious or easy - as a result, people can use the "wrong" distribution. In my field (genetics), people used a Poisson distribution to model something (RNA-seq data) for years before discovering that the Poisson distribution didn't allow for enough variation in the measurements, so they switched to a negative-binomial distribution.
So, here's my advice for selecting the "correct" distribution: 1) See if other people have looked at this type of data before, if so, see what distribution they used. 2) If it's new, you can collect tons of data and that will tell you, or you can think really hard about the data and what's generating it and that might give you a clue about what sort of distribution you should use. When all else fails, there are always "non-parametric" methods that are just statistics methods that do not assume you know what the distribution is.
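To illustrate what "didn't allow for enough variation" means for the Poisson versus the negative binomial, here is a small simulation sketch. All numbers are made up and have nothing to do with real RNA-seq data. It relies on two standard facts: a Poisson distribution has variance equal to its mean, and drawing the Poisson rate from a gamma distribution yields negative-binomial counts. The helper `poisson` is my own; Python's `random` module has no Poisson sampler.

```python
import math
import random
import statistics

random.seed(3)  # fixed seed so the run is repeatable

def poisson(lam):
    """Draw one Poisson count (Knuth's method; fine for moderate lam)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= limit:
            return k
        k += 1

MEAN = 10.0    # hypothetical average count
SHAPE = 2.0    # smaller shape means more extra variation
N = 5000

# Plain Poisson: the rate is the same for every draw.
pois = [poisson(MEAN) for _ in range(N)]

# Negative binomial as a gamma-Poisson mixture: the rate itself varies.
negbin = [poisson(random.gammavariate(SHAPE, MEAN / SHAPE)) for _ in range(N)]

for name, xs in (("Poisson", pois), ("negative binomial", negbin)):
    m, v = statistics.mean(xs), statistics.variance(xs)
    print(f"{name:18s} mean={m:6.2f}  variance={v:7.2f}  variance/mean={v / m:5.2f}")
```

If the variance of your counts is much larger than their mean, that is a hint that a Poisson model is too tight and something like the negative binomial may fit better.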
StatQuest with Josh Starmer Hey Josh, thank you so much for clarification. Just want to know a little more about the answer you have given -
● As you mentioned in the 2nd step of the process, we should try to collect more data, which helps us learn about the distribution of the data, right? What if I don't have access to tons of data?
● One more question, related to the normal distribution: when we say the data is normally distributed, it means that our data follows the bell-shaped curve. But what does the bell-shaped curve represent in this case: the intuitive curve we visualize when we draw the histogram of that data, or the curve we get when we draw the `PDF` of that data?
● One request from my side - if possible please try to make videos on non parametric tests.
Thank you
I'm not afraid of Statistics any more. Thanks.
bam!
@@statquest XD
So as I looked at this I tried to figure out why someone just wouldn't use a line graph or such. Then I realized that maybe an important thing to mention is that this histogram is only measuring one characteristic and not 2. So we are not measuring age and height as having some sort of relationship where as you get older your height goes up. But we are just finding how often a certain range of one particular characteristic occurs.
If my assumption is correct would this be a good thing to mention in the video?
Also would you ever attach some questions along with each video to help provoke thought?
3:25 since the bin is empty, shouldn't the curve go down to 0 for that particular bin and again go up?
That's the magic of the curve - it gives us a sense of what the histogram would look like if we had the time and money to measure everyone on the planet. If we did, there wouldn't be a gap there.
How do we draw a curve? By joining the mid points of the bars of histogram?
It depends on the curve you want to draw. If you want to draw a normal distribution, then you can plug in the mean and standard deviation of your data into the equation for a normal curve and... bam! you'll get a curve.
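Here is a minimal sketch of "plug in the mean and standard deviation". The heights are invented for illustration and `normal_curve` is just my name for the usual normal density formula.

```python
import math
import statistics

def normal_curve(x, mean, sd):
    """Equation for the normal curve: its height at x."""
    return math.exp(-((x - mean) ** 2) / (2 * sd ** 2)) / (sd * math.sqrt(2 * math.pi))

# Hypothetical height measurements in cm.
heights = [158, 162, 165, 167, 169, 170, 171, 173, 175, 178, 182]

# Estimate the two parameters from the data...
mean = statistics.mean(heights)
sd = statistics.stdev(heights)

# ...and plug them into the equation to trace out the curve.
for x in range(150, 195, 5):
    y = normal_curve(x, mean, sd)
    print(f"{x} cm  {y:.4f}  {'#' * round(y * 1000)}")
```

Note that the curve comes from the two estimated numbers, not from connecting the tops of the histogram bars.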
If I'd had this when I was at school and university, I might have done some more statistics and probability, instead of running a mile because it seemed so strange compared with the rest of maths
It is strange relative to the rest of math!
Sir your way of teaching is very nice. But your lectures on all distributions, like the Poisson distribution, are not available😔
Noted
finished watching
nice!
Sir, Can you make videos related to Linear Algebra and Calculus,
and add long detailed videos with questions and solutions of stats/algebra/calculus
I'll keep that in mind.
@@statquest you can start long videos live classes, we are ready to pay for that
Do you have a video dedicated to calculating the number of bins in a distribution?
Not yet. However, there's no specific way to do it. You just try a bunch of values and see what looks best.
Thank you
:)
Excellent
:)
Is statistical distribution same as the histogram or the curve to approximate a histogram?
If you normalize your histogram so that the sum of all of the columns is 1, then it is also a probability distribution.
thanks a lot!
Any time! :)
BAMM !!! CRYSTAL CLEAR
Hooray!!!! :)
DAMN !!! That was Awesome
Hooray! :)
I will be able to become a data scientist thanks to you :)
Bam! :)
great, thanks
Nice!
For example, I have some data, and in Minitab I find the distribution of my data, say normal, Poisson, F or t. I want to know what I can do, or what I should understand, once I have found the distribution of my data.
What do you do with values that fall exactly between 2 bins?? like if the bins are 4.5 to 5.5 and 5.5 to 6.5, where do you put the value of 5.5? Which bin? please help
Sort of like rounding, you just have to decide in advance which way things like that will go - there is no "right" answer. You could decide to always round down, so 5.5 would go into the 5 bin, or you could decide to round up, so 5.5 would go into the 6 bin. It's up to you.
Another thing that you have to fiddle around with is how wide the bins should be - different widths can result in different looking distributions - so it's a good idea to try a few and see which one makes the most sense.
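The "decide in advance" rule for boundary values can be written down explicitly. A small sketch using the bin edges from the question; the function name `bin_index` and the `ties_go_up` flag are hypothetical names of mine.

```python
import bisect

# Bin edges from the question: bin 0 is 4.5 to 5.5, bin 1 is 5.5 to 6.5.
edges = [4.5, 5.5, 6.5]

def bin_index(value, edges, ties_go_up=True):
    """Return the index of the bin that `value` falls into.

    ties_go_up=True  -> a value sitting exactly on an edge joins the bin above it.
    ties_go_up=False -> a value sitting exactly on an edge joins the bin below it.
    (Values exactly on the outermost edges can fall outside the bins under
    one of the two rules; this sketch does not handle that case.)
    """
    if ties_go_up:
        return bisect.bisect_right(edges, value) - 1
    return bisect.bisect_left(edges, value) - 1

print("5.5 with ties going up:  ", bin_index(5.5, edges, ties_go_up=True))
print("5.5 with ties going down:", bin_index(5.5, edges, ties_go_up=False))
print("5.0 under either rule:   ", bin_index(5.0, edges), bin_index(5.0, edges, False))
```

Either rule is fine as long as it is applied consistently to every value; only values that sit exactly on an edge are affected.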
Please explain Time-series forecasting.
Sir will you please explain standard deviation a little bit
Sure, see: ua-cam.com/video/vikkiwjQqfU/v-deo.html and ua-cam.com/video/SzZ6GpcfoQY/v-deo.html and ua-cam.com/video/sHRBg6BhKjI/v-deo.html
love the intro
:)
@@statquest I didn't expect a reply either (glad you did).
First time (today) watching your videos and I love the homemade feel of the intro.
Also, I have a question about standard deviation.
All the videos I searched focus on how to apply it,
but I don't yet have a clear picture of what it is.
What I understood so far: standard deviation is the measure of spread (in a normal distribution)
and there is a formula to calculate it from a given mean.
My question is,
how do we derive the formula?
I like to think of the mean as the midpoint, so sum/n makes sense (as division is the opposite of repeated sum).
Similar to that, when I look at the formula
for the std, I see we find the average of the squared distances between the mean and x, then we take the square root of the result.
Is it just a guess to square and then take the root? Or could we have cubed and taken the cube root?
Thanks for your time.
@@thebestedits3845 See: ua-cam.com/video/vikkiwjQqfU/v-deo.html and ua-cam.com/video/SzZ6GpcfoQY/v-deo.html
Great work! Me and colleagues love your videos. Hey, you think sometime you can tackle explaining the Cauchy distribution?
I can put that on the To-Do list, but that list is long and it might be a while before I get to it.
boom. that was the ultimate in simple POWer!
:)
Hi Joshua ...what tool do u use for making diagrams in your videos
I draw the pictures in Keynote.
Hey Josh, do you have a good link to explain how to integrate the Gaussian curve to arrive at probabilities? I would like to learn how to do it by hand. I understand calculus. Thanks again for the great content!
I don't have one on hand. However, I remember that it has a trick - you have to do a substitution to make it work. To quote Roger Berger (the guy that taught me this): If you know the trick, integrating the normal curve is easy. If you don't know the trick, you'll never figure it out.
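For a finite interval, the area under the normal curve has no formula in terms of elementary functions, so in practice it is computed numerically or with the error function. The sketch below is not the by-hand trick mentioned above; it only checks a simple trapezoid-rule integration against `math.erf` for a standard normal curve. The helper names are hypothetical.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Height of the normal curve at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Area under the normal curve to the left of x, via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def trapezoid_area(f, a, b, steps=10_000):
    """Approximate the area under f between a and b with the trapezoid rule."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h

# Probability of landing within one standard deviation of the mean.
area_numeric = trapezoid_area(normal_pdf, -1.0, 1.0)
area_erf = normal_cdf(1.0) - normal_cdf(-1.0)
print(area_numeric, area_erf)
```

Both numbers should come out close to the familiar "about 68%" for one standard deviation.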
How do we approximate that curve? I am curious to know how that curve comes from the histogram.
For details on this, consider watching the StatQuest on the normal distribution: ua-cam.com/video/rzFX5NWojp0/v-deo.html
Love it! Thx!
You are a God for us 🥺,
I need some help: will you please provide the PPT that you are explaining from? Please 🥺
I have a book coming out in a few weeks and it contains information from this and a lot of other StatQuest videos.
The parameters we supply to a PDF or PMF are population parameters, but we don't actually collect data for the whole population. So how are those parameters given in the question?
We can estimate them. For details, see: ua-cam.com/video/vikkiwjQqfU/v-deo.html and ua-cam.com/video/SzZ6GpcfoQY/v-deo.html
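As a rough sketch of what "estimate them" means: take a sample, then use the sample mean and sample standard deviation as stand-ins for the population parameters. The "true" values below are assumptions invented for this example so the estimates have something to be compared against.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Assumption for illustration: pretend the population is normal with
# mean 66 and standard deviation 3. In practice these are unknown.
TRUE_MU, TRUE_SIGMA = 66.0, 3.0
sample = [random.gauss(TRUE_MU, TRUE_SIGMA) for _ in range(500)]

# Estimates of the population parameters, computed from the sample alone.
mu_hat = statistics.mean(sample)
sigma_hat = statistics.stdev(sample)  # sample version, divides by n - 1

print(f"estimated mean = {mu_hat:.2f}, estimated sd = {sigma_hat:.2f}")
```

The estimates will not match the population values exactly, but they usually get closer as the sample grows.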
Hi Josh!
Will you make a StatQuest on time series analysis anytime in the future?
I'm going to try to work on time series in the spring of 2020.
Hi Josh, sorry for asking a perhaps obvious question, but I've been struggling to wrap my head around the area under the curve part. Shouldn't the probability that a particular value is in the given interval be equal to the area under that part divided by the area under the entire curve? I've seen people explain this with a heads/tails uniform distribution, where the two events (heads, tails) are on the x-axis and the probability of each event happening (0.5) is on the y-axis. However, how does this all translate to literal values, such as the number of people with a certain height...
The total area under the curve = 1. So, technically, you can either divide by 1 or not, and you will get the same answer.
@@statquest Yeah, thanks, I had a few things mixed up. When you had shown the histogram, I thought of the typical ones where the y axis represented the number of things that fall into each range, not the percentage of things. I was confused as to how the area could be equal to one.
Thanks for the answer nonetheless!
@@leonandorfi5191 My two cents: when we are talking about the histogram, the area is not equal to 1, but when we are talking about the bell-shaped probability distribution curve, the area under the curve is 1. Moreover, in a histogram there is no curve, so technically we can't say "area under the CURVE" for a histogram. Am I correct? I don't know LOL
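A small sketch may help with the counts-versus-area confusion in this thread. With made-up measurements, a histogram whose y-axis is counts has a total bar area of n times the bin width, and dividing every bar height by that number rescales the total area to 1.

```python
# Hypothetical measurements (heights in inches); any list of numbers works.
heights = [61, 63, 64, 65, 65, 66, 66, 66, 67, 67, 68, 69, 70, 72]
bin_width = 2
n = len(heights)

# Count histogram: the y-axis is the number of measurements in each bin.
counts = {}
for h in heights:
    left_edge = (h // bin_width) * bin_width
    counts[left_edge] = counts.get(left_edge, 0) + 1

# Total bar area of the count histogram = n * bin_width.
count_area = sum(c * bin_width for c in counts.values())

# Density histogram: divide each bar height by (n * bin_width).
densities = {edge: c / (n * bin_width) for edge, c in counts.items()}
density_area = sum(d * bin_width for d in densities.values())

print(count_area)    # n * bin_width
print(density_area)  # 1, up to floating-point rounding
```

So a histogram can have a total area of 1, but only after it is rescaled this way.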
Is there a difference between data distribution and probability distribution?
Yes. The histogram represents the data distribution.
@ Thank you, but is it possible to present raw data from a real-life example and take us through it to the point of a probability distribution, so we could appreciate the usefulness of probability distributions? This would be of great help. Thanks.
@@kowtharhassan882 This is actually a topic I cover in my book. We talk about the limitations of just using a histogram and why we would want to switch to a probability distribution.
@ I see. Thank you very much.
How do you get the curve? Even if I have a lot of measurements, what is the formula to draw such a curve?
The formula for the normal curve is kind of complicated. However, if you want to learn how to fit it to data, see: ua-cam.com/video/XepXtl9YKwc/v-deo.html and ua-cam.com/video/Dn6b9fCIUpM/v-deo.html
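For reference, the standard formula for the normal curve is written out below. Here μ is the mean and σ is the standard deviation, and in practice both would be estimated from the measurements.

```latex
f(x \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
```

Plugging each x value into this formula gives the height of the curve at that point.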
In the meantime I’m looking for a simple explanation of Weibull distributions that is as easy to understand as this one 🥺
bam!
@@statquest indeed, eloquently spoken
THANKS!!!!!!!!!!
:)
My sincere thanks to whoever translated this into Korean! (Could you possibly translate the remaining videos, 20-43, as well?! Please.)
If someone can do that for me, it would be awesome.
@@statquest Someone already did that for you. Episodes from 1 - 20 were perfectly translated into Korean.
@@dsd1610 bam!
Only subscribed for the theme song.
bam!
Thanks
Triple bam! :)
I doubt it would make any sense to calculate missing values using calculus. For 6" people, calculus will give us ~1.5. What if in reality there are 20 people of that height?
It depends. Of course you can make mistakes, but, believe it or not, height is normally distributed, so we really can use a normal distribution ( ua-cam.com/video/rzFX5NWojp0/v-deo.html ) to impute missing values.
Do you tutor for free? I really like how you teach 😁
Unfortunately I don't tutor for free. I still have to pay for my rent and food etc.
@@statquest lol.. But you can always make an exception.. Hehe
@@NidhiSinha4U :)
Amazing
Hi Josh, I'm enjoying your videos, but I think it's only the US that uses inches/feet units of measurement. The rest of the world (which is the majority of the world's population) is on centimeters/meters. 😅
This is one of the first videos I ever made, and back then, no one watched my videos other than a few friends based in the US. Since then I've changed to using more universal metrics.
@@statquest thank you Josh, I placed an order in Lulu for a copy of your book (I believe it's on its way). You're a good teacher. I like to learn visually, and your diagrams are the best I've found so far.
@@DiegoGuillen-p3z Thank you very much! I really hope you enjoy the book. I believe all the units used in the book are metric... :)
@@statquest Thanks Josh, got the book within a week of placing the order. It is excellent. It reads as easily as a comic book, and the images are the best way to explain this topic. I'm also running some examples with TensorFlow, image classification with Fashion MNIST, and learning to use the OpenAI API. I'm trying to understand when to use the different activation and loss functions, and architectures for Neural Networks. I didn't have a background in Statistics, so your book helps me with that. But I knew Linear Algebra and Gradients, and was happy to see this again.
@@DiegoGuillen-p3z Awesome! For activation functions, unless you want to do something very specific, people just use the ReLU. To see examples of doing something specific, see my video on LSTMs: ua-cam.com/video/YCzL96nL7j0/v-deo.html
Why is the normal distribution necessary for implementing ML algorithms?
It is not necessary for a lot of ML algorithms. For example, Neural Networks do not use the normal distribution. Neither do Support Vector Machines.
I’m a student at UNC, are you a researcher here?
I used to be, but not any more. However, I still live in Chapel Hill.
@@statquest Cool, do you think you will be involved with the new Data Science school UNC is currently launching?
@@alexandergeorgiev2631 I keep hoping they will invite me to participate in some way, but so far I haven't heard from them.
You're a legend :BAAM :)
Thanks!
You are god!!
:)
Before there was BAM, there was DAG.
Ha! So true!
BAM!