I am stunned. This video is about a 1000X clearer than the explanation my professor gave on all this. You are SO clear. It's a life-saver! Thank you!
You don't know how many people you're helping. Kudos! I really appreciate your great effort to help; in its own way it contributes to our success.
#GODBLESSYOU
The explanation of the MLE, score function, information, etc. here is unbelievably simple and effective! This alternative perspective really helped my understanding. Thank you.
Thank you very much for sharing this. There is a possible confusion at 33:03. The equation shows the likelihood depending on (mu, sigma^2), but the plot shows it depending on (mu, sigma), i.e. without the square. This is not an error, because the maximum likelihood estimator works for the (mu, sigma^2) vector just as it does for (mu, sigma). It does not change much of the graphical meaning of the figure, but it introduces some confusion about the intent of the figure. I guess a clarification might be helpful on this topic. Anyway, your video was very helpful: thanks again for it.
Why can't my textbooks explain it like this. Zed, you are a legend!
Thank you, so helpful. I appreciate that you touched on MLE with multiple parameters.
I'm an engineer in the manufacturing sector. Your videos have been essential in understanding the statistics I use to justify designed experiments for process improvement.
Awesome video. Much better than the disorganized lecture by my prof lol.
I would be so fucked in my Math Stats class rn without these videos. Thank u
Thank you for this amazing video! It is very informative, and it could be even better if, whenever you use a vector of parameters such as "X", you wrote it in bold. Then the notation would be less confusing.
Very nicely explained. A BIGGG GOD BLESS to you!
The progression graph at the beginning of each video might seem to some people like a minor aspect of the whole video, but it's very significant for me. It lets me know what to expect, and that feels good. :)
You have put a great deal of work into explaining that. Thank you very much.
Thanks!
Thank you, it was very helpful.
42 minutes? yuck, no thanks. Oh wait, he said Saddle Up. I'm IN! LETS GO
You are crazy good at this
You saved my life! Thank you SO much!
Really well done - the examples following the theoretical discussion are especially useful. Thank you so much for uploading this!
Thank you SO much! This really helped me a lot
This helps me understand how the likelihood helps to estimate the model, where the maximum is obtained from the score equation. But I might need your help to understand, at ~15:40, how setting the derivative to 0 is transformed.
Life saver...❤
17:06 where did this expectation formula come from?
@18:26 So you postulate that θ is normally distributed with mean obtained from MLE and variance being 1/I(θ) ?
18:48 you say that the square root of the variance is the standard error (which is then used to find upper and lower limits of confidence interval). I thought the square root of variance is the standard deviation? And therefore, you would need an extra 1/sqrt(n) factor to take the standard deviation to the standard error which can then be used to find the limits? Why in this case is the square root of the variance = standard error and not standard deviation?
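A possible answer, in case it helps anyone: if the information is computed from the whole sample of n observations (not per observation), it already contains the factor of n, so sqrt(1/I) is the standard error directly and no extra 1/sqrt(n) is needed. A minimal Bernoulli sketch of this idea, assuming the textbook formula I_n(theta) = n / (theta(1 - theta)):

```python
import math

# For n iid Bernoulli(theta) draws, the whole-sample Fisher information is
# I_n(theta) = n / (theta * (1 - theta)) -- the factor of n is already inside.
theta, n = 0.2, 100
I_n = n / (theta * (1 - theta))

se_from_information = math.sqrt(1 / I_n)          # sqrt of inverse information
se_familiar = math.sqrt(theta * (1 - theta) / n)  # usual SE of a proportion

print(se_from_information, se_familiar)  # both 0.04
```

So the inverse-information route and the familiar "standard deviation over sqrt(n)" route land on the same number; the 1/sqrt(n) is baked into the sample information.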
Thank you so much! This is so helpful! Can you please make more videos with more proofs and algebra? For example, the proof that the MLE is asymptotically normal, the calculation of the variance estimate, etc.?
Thanks for the video. How about the confidence interval in your multivariable example?
I'm going over my notes... and this tutorial is very clear and I enjoy verifying the math... but I got stuck at around 15:24 trying to understand the estimator mathematically... intuitively it totally makes sense that the estimate should be 20/100, but I am not understanding how it comes from the derivative of l(theta)... when I isolate for theta I get theta/(1-theta) on one side... but that is not the same as reducing to a single theta variable.
Finally got the math right... even though I couldn't isolate theta as a single variable! I got down to n/(y-n) = theta/(1-theta). Substituting, I get 20/(100-20) = theta/(1-theta). Dividing the left side by 100 (top and bottom), I get 0.20/(1-0.20) = theta/(1-theta), so by visual analogy theta is 0.20 (the estimate). You can reduce to a single variable by cross-multiplying the denominators, expanding, and simplifying, but it's a lot of tedious work: 0.20(1-theta) = theta(1-0.20), and so on.
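For anyone else verifying this numerically, a tiny grid search over the log-likelihood confirms the algebra (a sketch, assuming the log-likelihood from the video's example is l(theta) = n*log(theta) + (y - n)*log(1 - theta) with n = 20 successes in y = 100 trials):

```python
import math

n, y = 20, 100  # 20 successes in 100 trials, as in the example

def log_lik(theta):
    # l(theta) = n*log(theta) + (y - n)*log(1 - theta)
    return n * math.log(theta) + (y - n) * math.log(1 - theta)

# Evaluate on a fine grid of theta values in (0, 1) and take the argmax.
grid = [i / 10000 for i in range(1, 10000)]
theta_hat = max(grid, key=log_lik)
print(theta_hat)  # 0.2, i.e. n / y
```

The argmax lands exactly on n/y = 0.2, matching the closed-form estimator.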
Nice lecture, sir. Kindly make a video on MLE for multiple parameters in implicit form, with R code.
Hi, this video is incredible, as are all of yours, but I'm very confused about why the second derivative at 16:39 has both terms negative. I've worked it multiple ways and plugged it into Wolfram Alpha and get (y-n)/(1-theta)^2 - n/theta^2.
This seems right to me too: the derivative of (y-n)/(1-theta) swapped signs on the first derivative, and there's no reason it wouldn't swap back on the second. You still have to apply the chain rule to d(1-theta)/d(theta), which is -1, right?
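If it helps settle this thread, a quick numerical check (a sketch, assuming the same log-likelihood l(theta) = n*log(theta) + (y - n)*log(1 - theta)): differentiating -(y-n)/(1-theta) applies the chain rule to (1-theta) a second time, so the sign flips twice and both second-derivative terms come out negative, agreeing with the video:

```python
import math

n, y = 20, 100

def l(theta):
    return n * math.log(theta) + (y - n) * math.log(1 - theta)

theta = 0.2
h = 1e-5

# Central finite difference for the second derivative of l at theta.
numeric = (l(theta + h) - 2 * l(theta) + l(theta - h)) / h**2

# Analytic second derivative: both terms negative.
analytic = -n / theta**2 - (y - n) / (1 - theta)**2

print(numeric, analytic)  # both approximately -625
```

The finite-difference value matches the all-negative analytic form, not the mixed-sign version.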
Thank you, could you please share the sources that you mentioned could help with calculus & differentiation?
Hi, could anyone help me with reading the notation L(theta; y) in the context of the pregnancy example he gave in the video?
If I graph the likelihood function at 10:28, it doesn't look anything like the graph in the video. I get really small values for 0.2 rather than really large ones.
Thank you very much. But could you tell me why the standard errors of ML estimators come from the inverse of the Fisher information matrix?
I love your videos. You explain the concepts so clearly. I have one question. In the first example, why would the probability of getting pregnant on the second attempt depend on the first event? Aren't the different attempts independent? Shouldn't the probability of getting pregnant be 0.15 for all individual attempts?
This part I think I can answer. The probability of getting pregnant on the second attempt must exclude the probability of success on the first attempt so, success on the 2nd attempt means failure on 1st AND success on 2nd. Prob of success on 1 = 0.15 so prob of failure on 1 = 1 - 0.15 = 0.85. Therefore prob of failure on 1st AND success on 2nd = 0.85 * 0.15.
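A quick numeric version of this reply (a sketch, assuming the video's per-attempt success probability of 0.15 and independent attempts):

```python
p = 0.15  # per-attempt probability of success, from the example

# "Pregnant on the second attempt" means failure on attempt 1 AND success on 2,
# so the probability is (1 - p) * p, not p on its own.
p_first = p
p_second = (1 - p) * p

print(p_first, p_second)  # 0.15 and roughly 0.1275
```

The attempts are independent, but "first success happens on attempt 2" is a different event from "attempt 2 succeeds", which is why the 0.85 factor appears.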
superb lecture
Thanks for the course, it's clearly explained. May I know what software or application you use for the course (Beamer? PowerPoint?)
I'm learning tons from your content Zed, thank you
Can anyone tell me, at 36:06, why mu is not negative? The log-likelihood function (after removing the constant and the component with log sigma squared) starts with a negative, so shouldn't it be negative?
If x = 0 then -x = 0 as well. That mu at 36:06 comes from setting the numerator to zero.
@13:45 In order to speak about the "expected" value you MUST have a random variable. Where are they??
@13:59 WHY ??
convinced again
great video mate.
If there is a god, I want it to be you.
Which book is being referred to in this series, or any other book for this topic? If anyone knows, please tell.
Hi, can I get your assistance in solving a problem using the maximum likelihood method?
Mr. Justin Z--It would have been helpful if you had gone over the intermediary math steps. Thank you. WhetstoneGuy
Does anyone know why, in a score test, we divide by the information at the null parameter values? I know that the information at the MLE represents the "sharpness" of the likelihood function, but what does information represent at a different parameter value that is not the maximum of the likelihood function?
I once heard that OLS and MLE yield the same result under a normal distribution. If that's the case, the pros and cons (especially the pros) just seem negligible, don't they?
You are so f good
I don't understand why E(Y) is equal to n/theta.
I have the same problem
@@k.sladkina872 I found out it is simply related to the distribution you use. Google different distributions (normal, binomial, etc.) and if you look at the wikipedia page, on the right, it states what the mean E(X) and variance V(X) are equal to
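On the E(Y) = n/theta question above: if each success takes a Geometric(theta) number of attempts, the total number of attempts Y needed for n successes has mean n/theta (the negative binomial mean, the kind of fact the Wikipedia sidebars state). A small simulation sketch, using the example's numbers:

```python
import random

random.seed(0)
theta, n = 0.2, 20  # per-attempt success probability and number of successes

def attempts_for_n_successes(theta, n):
    # Count Bernoulli(theta) attempts until the n-th success arrives.
    successes = attempts = 0
    while successes < n:
        attempts += 1
        if random.random() < theta:
            successes += 1
    return attempts

reps = 20000
mean_y = sum(attempts_for_n_successes(theta, n) for _ in range(reps)) / reps
print(mean_y)  # close to n / theta = 100
```

The simulated average of Y sits very near n/theta, which is where the n/theta in the expectation comes from.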
Var(X) = E(X^2) - (E(X))^2 is an easy method. So where is the need to do partial differentiation for two simultaneous equations and set them to zero, when effectively the same result for the variance is thrown up?
best best best
Why can't uni lectures be like this? I pay so much money for an inferior education.
Where is the sample data though?? Aren't we supposed to be fitting the distribution to a sample? Isn't that the whole point? Why do you just say, oh, 15%??
Your content is amazing, but the sound quality is really bad.
Why does this stuff matter? I'm taking math stats for the second time and I understand zero. I can do the basic stuff described in videos, but the problems are never just "multiply all the pdfs together, take the log, differentiate, and set to zero"... there are always wrinkles. Like one problem where I have to deal with an absolute value and they start talking about the median in the solution... Ai-yi-yi. I dislike math stats and really want to know how this will help me predict stocks or in any future job.
pretty sure it's what god created on the 3rd day. He created the heaven and earth, the land and the waters, and then differential calculus.
@@zedstatistics The calculus isn't that bad. I love it, although I question it. It's a language to explain something very complex. It seems like there could be flaws, but these things work time and time again? Crazy. More particularly, I just don't know how all this MLE, Bayes' theorem, sufficient statistics, data reduction, and improving an estimator relates to real-life problems. I'm a data science major. I like sentdex's videos on YouTube. All these advanced stats classes I am taking just don't make sense. Or at least the book and my teachers don't relate it to the real world, so it doesn't make sense. Any suggestions, tips, or playlists you could point me to that would help my statistical data science career and understanding? I like math, I like stocks. Not sure how to combine them outside sentdex's videos.
Any playlist that would help me solve problems like this? -- Suppose that 21 observations are taken at random from an exponential distribution for which the mean μ is unknown (μ > 0), the average of 20 of these observations is 6, and although the exact value of the other observation could not be determined, it was known to be greater than 15. Determine the M.L.E. of μ. -- My book is Probability and Statistics, 4th edition, by DeGroot; there is a free pdf available online.
On the subject of predicting stocks: I guess you want to build a robot that takes today's stock market data and spits out a distribution of actions you can take that would make you the most money. Let's call this robot π(θ), because it's just a function parameterised by θ. And you want the maximum likelihood of θ that will make you the most money (let's call that Q*, where Q(a|s) is the reward of taking action a at step s).
Since you're a data major you can probably see where this is going. You want a neural net that models π(θ), and you want to train it to solve for -∇log(π(θ))Q (notice the score function here), where Q is the reward of your trading actions (and in practice is simulated by another neural net). Notice you want to find the set of θ for π(θ) that maximises Q (i.e. reaches Q*), using maximum likelihood and past stock data, possibly flattened by some RNN. Furthermore, you want to incrementally improve π within a confidence interval, so you don't make too big a step that would collapse your convergence... and you'll see the Fisher information matrix come up in this calculation if you dig further.
So yeah, it probably helps in your future job in stock market prediction, if that's where you're headed.
@@lzl4226 except, stock prices are not produced by a stationary process.
Fellow 'nerds'?! That's very abusive. You should be imprisoned for that.
perhaps fellow 'sailors' aye captain?