Maximum Likelihood Estimation (MLE) | Score equation | Information | Invariance

  • Published 24 Dec 2024

COMMENTS • 70

  • @Manny123-y3j
    @Manny123-y3j 2 years ago +20

    I am stunned. This video is about 1000X clearer than the explanation my professor gave on all this. You are SO clear. It's a life-saver! Thank you!

  • @davidbanahene307
    @davidbanahene307 4 years ago +21

    You don't know the number of people you are helping every now and then. Kudos! I do appreciate your great effort to help and, in a way, contribute to our success.
    #GODBLESSYOU

  • @markwilson9490
    @markwilson9490 3 years ago +2

    The explanation of the MLE, score function, information etc. here is unbelievably simple and effective! This alternative perspective really helped my understanding. Thank you.

  • @michaelbaudin
    @michaelbaudin 4 years ago +2

    Thank you very much for sharing this. There is a possible confusion at 33:03. The equation shows the likelihood depending on (mu, sigma^2) but the plot shows it depending on (mu, sigma) i.e. without square. This is not an error, because the maximum likelihood estimator is for the (mu, sigma^2) vector as well as for (mu, sigma). It does not change much of the graphical meaning of the figure, but introduces a confusion on the intent of this figure. I guess that a clarification might be helpful on this topic. Anyway, your video was very helpful: thanks again for it.
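The invariance point in this comment can be checked numerically. The following is a minimal sketch, assuming a normal model and a small hypothetical sample (the data values are made up for illustration and do not come from the video): it computes the MLE of sigma^2, takes its square root, and compares that with a direct grid maximisation of the likelihood over sigma.

```python
import math

# Hypothetical sample, used only for illustration.
data = [4.1, 5.3, 6.0, 4.7, 5.9]
n = len(data)

mu_hat = sum(data) / n
# MLE of sigma^2 for a normal model (divides by n, not n - 1).
sigma2_hat = sum((x - mu_hat) ** 2 for x in data) / n
# Invariance: the MLE of sigma is the square root of the MLE of sigma^2.
sigma_hat = math.sqrt(sigma2_hat)

def loglik_sigma(sigma):
    """Normal log-likelihood in sigma at mu = mu_hat, additive constant dropped."""
    return -n * math.log(sigma) - sum((x - mu_hat) ** 2 for x in data) / (2 * sigma ** 2)

# Direct maximisation over sigma on a fine grid from 0.1 to 3.0.
grid = [i / 10000 for i in range(1000, 30001)]
sigma_grid = max(grid, key=loglik_sigma)

print(sigma_hat, sigma_grid)
```

The two printed values should agree up to the grid spacing, which is why a plot against sigma and a plot against sigma^2 peak at corresponding points.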

  • @craighennessy3183
    @craighennessy3183 1 year ago

    Why can't my textbooks explain it like this? Zed, you are a legend!

  • @Elizabeth_Lynch
    @Elizabeth_Lynch 6 years ago +7

    Thank you, so helpful. I appreciate that you touched on MLE with multiple parameters.

  • @fightwithbiomechanix
    @fightwithbiomechanix 4 years ago +2

    I'm an engineer in the manufacturing sector. Your videos have been essential in understanding the statistics I use to justify process-improvement designed experiments.

  • @アメリカンドクター
    @アメリカンドクター 3 years ago +1

    Awesome video. Much better than the disorganized lecture by my prof lol.

  • @ciaranmahon7415
    @ciaranmahon7415 4 years ago +5

    I would be so fucked in my Math Stats class rn without these videos. Thank u

  • @BabakFiFoo
    @BabakFiFoo 5 years ago +6

    Thank you for this amazing video! It is very informative and it could be even better if whenever you are using a vector of parameters as "X", use "X bold". Then the notation will become less confusing.

  • @cdr.dr.shishirsahay9184
    @cdr.dr.shishirsahay9184 1 year ago +1

    Very nicely explained. A BIGGG GOD BLESS to you!

  • @davidradulovic9034
    @davidradulovic9034 5 years ago +8

    The progression graph at the beginning of each video might seem to some people as a minor aspect of the whole video, but it's very significant for me. Lets me know what to expect and that feels good. :)

  • @harikrishnareddygali6244
    @harikrishnareddygali6244 1 year ago

    You have put a great deal of work into explaining that. Thank you very much.

  • @joeekstein9174
    @joeekstein9174 2 years ago

    Thanks!

  • @srishtigupta9534
    @srishtigupta9534 3 years ago +2

    Thank you, it was very helpful.

  • @erich_l4644
    @erich_l4644 4 years ago +12

    42 minutes? yuck, no thanks. Oh wait, he said Saddle Up. I'm IN! LETS GO

  • @mikelmendibeabarrategi1102
    @mikelmendibeabarrategi1102 1 year ago

    You are crazy good at this

  • @lucaslopesf
    @lucaslopesf 6 years ago +2

    You saved my life! Thank you SO much!

  • @lucarampoldi7743
    @lucarampoldi7743 2 years ago

    Really well done - the examples following the theoretical discussion are especially useful. Thank you so much for uploading this!

  • @jlz5907
    @jlz5907 5 years ago +2

    Thank you SO much! This really helped me a lot

  • @wtsg1982
    @wtsg1982 2 years ago

    This helps me understand how the likelihood is used to estimate a model, with the maximum obtained from the score equation. But I might need your help to understand, at ~15:40, how the derivative set to 0 is transformed?

  • @BilalTaskin-om6il
    @BilalTaskin-om6il 1 year ago

    Life saver...❤

  • @youssefdirani
    @youssefdirani 2 years ago +1

    17:06 where did this expectation formula come from?

  • @adamkolany1668
    @adamkolany1668 1 year ago

    @18:26 So you postulate that θ is normally distributed, with the mean obtained from the MLE and the variance being 1/I(θ)?

  • @jimjohnson357
    @jimjohnson357 2 years ago

    18:48 you say that the square root of the variance is the standard error (which is then used to find upper and lower limits of confidence interval). I thought the square root of variance is the standard deviation? And therefore, you would need an extra 1/sqrt(n) factor to take the standard deviation to the standard error which can then be used to find the limits? Why in this case is the square root of the variance = standard error and not standard deviation?
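One way to see where the factor of n goes is to compute the numbers. This is a sketch under assumptions: it takes the log-likelihood to be l(θ) = n log θ + (y - n) log(1 - θ) with n = 20 and y = 100, which is what other comments in this thread suggest for the video's first example. A standard error is the standard deviation of an estimator's sampling distribution, and because the information is built from the whole-sample log-likelihood it already grows with n, so no separate 1/sqrt(n) is applied.

```python
import math

# Assumed values for the video's first example (n successes, y total attempts).
n, y = 20, 100
theta_hat = n / y

# Information evaluated at the MLE: minus the second derivative of
# l(theta) = n*log(theta) + (y - n)*log(1 - theta).
# Both terms scale with the amount of data, so n is already "inside".
info = n / theta_hat ** 2 + (y - n) / (1 - theta_hat) ** 2

# Approximate variance of theta_hat is 1 / info; its square root is the
# standard deviation of the estimator, i.e. the standard error.
se = 1 / math.sqrt(info)

# Approximate 95% confidence interval from the normal approximation.
z = 1.96
ci = (theta_hat - z * se, theta_hat + z * se)

print(info, se, ci)
```

Doubling both n and y in this sketch doubles the information and shrinks the standard error by a factor of sqrt(2), which is the same behaviour as the familiar sigma/sqrt(n) for a sample mean.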

  • @kjyfhjjj
    @kjyfhjjj 4 years ago +1

    Thank you so much! This is so helpful! Can you please make more videos with more proof and algebra? For example, the proof that MLE being asymptotically normal, the calculation of variance estimate, etc?

  • @abcpsc
    @abcpsc 5 years ago +2

    Thanks for the video. How about the confidence interval in your multivariable example?

  • @sdsa007
    @sdsa007 1 year ago

    I'm going over my notes...and this tutorial is very clear and I enjoy verifying the math... but I got stuck at around 15:24 trying to understand the estimator mathematically... intuitively it totally makes sense that the estimate should be 20/100, but I am not understanding how it comes from the derivative of l(theta).... when I isolate for theta I get theta/(1-theta) on one side... but that is not the same as reducing to a single theta variable....

    • @sdsa007
      @sdsa007 1 year ago

      finally got the math right... even though I couldn't isolate theta as a single variable! I got down to n/(y-n) = theta/(1-theta)..... substituting I get 20/(100-20) = theta/(1-theta).... dividing the left side by 100 (top and bottom), I get 0.20/(1-0.20) = theta/(1-theta)... therefore by visual analogy, theta is 0.20 (estimate). You can reduce to a single variable by cross-multiplying the denominators, expanding and reducing, but it is a lot of tedious work... 0.20(1-theta)=theta(1-0.20)... blah blah blah....
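The algebra in this thread can be written out in a few lines. This is a sketch that assumes the log-likelihood in the video is l(θ) = n log θ + (y - n) log(1 - θ), with n = 20 and y = 100, which is consistent with the n/(y-n) = theta/(1-theta) step quoted above. Cross-multiplying cancels the nθ terms, so θ isolates directly.

```latex
\begin{aligned}
\ell(\theta) &= n\log\theta + (y-n)\log(1-\theta) \\
\frac{d\ell}{d\theta} &= \frac{n}{\theta} - \frac{y-n}{1-\theta} = 0 \\
n(1-\theta) &= (y-n)\,\theta \\
n - n\theta &= y\theta - n\theta \\
\hat\theta &= \frac{n}{y} = \frac{20}{100} = 0.2
\end{aligned}
```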

  • @mohdirfan-pu8fc
    @mohdirfan-pu8fc 2 years ago

    Nice lecture sir. Sir, kindly make a video on MLE for multiple parameters in implicit form, with R code.

  • @DJMoSheckles
    @DJMoSheckles 1 year ago

    Hi this video is incredible as are all of yours, but I'm very confused why the second derivative at 16:39 has both values negative. I've taken it multiple ways and plugged it into Wolfram Alpha and receive (y-n)/(1-theta)^2 - n/theta^2

    • @aschiffer
      @aschiffer 6 months ago

      This seems right to me too, the derivative of (y-n)/(1-theta) swapped signs on the first derivative and there's no reason it wouldn't swap back on the second. You still have to chain rule d(theta) which is -1, right?
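The sign question in this thread can be checked numerically rather than by hand. The sketch below assumes the log-likelihood is l(θ) = n log θ + (y - n) log(1 - θ) with n = 20 and y = 100 (an assumption based on other comments here, not a quote from the video). Under that assumption, differentiating -(y - n)/(1 - θ) picks up one factor of -1 from the power rule and another from the chain rule, so the leading minus sign survives and both terms of the second derivative are negative.

```python
import math

def loglik(theta, n=20, y=100):
    """Assumed log-likelihood: n*log(theta) + (y - n)*log(1 - theta)."""
    return n * math.log(theta) + (y - n) * math.log(1 - theta)

def second_derivative_formula(theta, n=20, y=100):
    """Candidate formula with both terms negative."""
    return -n / theta ** 2 - (y - n) / (1 - theta) ** 2

def second_derivative_numeric(theta, h=1e-4):
    """Central finite-difference approximation of the second derivative."""
    return (loglik(theta + h) - 2 * loglik(theta) + loglik(theta - h)) / h ** 2

theta_hat = 0.2
analytic = second_derivative_formula(theta_hat)
numeric = second_derivative_numeric(theta_hat)
print(analytic, numeric)
```

If the two printed values agree, the both-negative form is the one that matches the assumed log-likelihood; a negative second derivative is also what a maximum requires.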

  • @Maymona93
    @Maymona93 3 years ago +1

    Thank you, could you please share the sources that you mentioned could help with calculus & differentiation?

  • @arpitanand4693
    @arpitanand4693 1 year ago +1

    Hi could anyone help me with reading the notation L(theta ; y) in the context of the pregnancy example which he gave in the video?

  • @Kogsworth
    @Kogsworth 2 years ago

    If I graph the likelihood function at 10:28, it doesn't look anything like the graph in the video. I get really small values for 0.2 rather than really large ones.

  • @xuyang2776
    @xuyang2776 4 months ago

    Thank you very much. But could you tell me why standard errors of ML estimators are inverse of Fisher information matrix?

  • @yelshadaygebreselassie3163
    @yelshadaygebreselassie3163 3 years ago

    I love your videos. You explain the concepts so clearly. I have one question. In the first example, why would the probability of getting pregnant on the second attempt depend on the first event? Aren't the different attempts independent? Shouldn't the probability of getting pregnant be 0.15 for all individual attempts?

    • @coolblue5929
      @coolblue5929 2 years ago

      This part I think I can answer. The probability of getting pregnant on the second attempt must exclude the probability of success on the first attempt so, success on the 2nd attempt means failure on 1st AND success on 2nd. Prob of success on 1 = 0.15 so prob of failure on 1 = 1 - 0.15 = 0.85. Therefore prob of failure on 1st AND success on 2nd = 0.85 * 0.15.
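The reasoning in this reply is the geometric distribution: each attempt is independent with the same success probability, but "first success on attempt k" also requires k - 1 failures before it. A minimal sketch, using the 0.15 per-attempt probability from the question (the function name is made up for illustration):

```python
def prob_first_success_on(k, p=0.15):
    """P(first success happens on attempt k) = (1 - p)**(k - 1) * p."""
    return (1 - p) ** (k - 1) * p

# Per-attempt probability stays at 0.15; what changes is the chance of
# still being "in play" when attempt k comes around.
for k in range(1, 5):
    print(k, prob_first_success_on(k))
```

For k = 2 this is 0.85 * 0.15 = 0.1275, matching the calculation in the reply above.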

  • @kprao9949
    @kprao9949 4 years ago

    superb lecture

  • @ouafaeouaali4676
    @ouafaeouaali4676 1 year ago

    Thanks for the course, it's clearly explained.... May I know what software or application you use for the course (Beamer? PowerPoint?)

  • @enjoying-the-ride1295
    @enjoying-the-ride1295 2 years ago

    I'm learning tons from your content Zed, thank you
    Can anyone tell me why, at 36:06, mu is not negative? The log-likelihood function (after removing the constant and the component with log sigma squared) starts with a negative, so shouldn't it be negative?

    • @steffenmuhle6517
      @steffenmuhle6517 2 years ago

      If x=0 then -x=0 as well. That mu at 36:06 comes from setting the numerator to zero.
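The step this reply describes can be written out. This sketch assumes the standard normal log-likelihood for an i.i.d. sample x_1, ..., x_n, which may be laid out differently from the slide at 36:06. The leading minus sign is cancelled by the -1 that the chain rule produces when differentiating (x_i - μ)^2 with respect to μ, and after setting the score to zero the positive factor 1/σ^2 drops out.

```latex
\begin{aligned}
\ell(\mu,\sigma^2) &= -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2 \\
\frac{\partial \ell}{\partial \mu} &= -\frac{1}{2\sigma^2}\sum_{i=1}^{n} 2\,(x_i-\mu)\,(-1)
  = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i-\mu) = 0 \\
\sum_{i=1}^{n} x_i - n\mu &= 0 \\
\hat\mu &= \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x}
\end{aligned}
```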

  • @adamkolany1668
    @adamkolany1668 1 year ago

    @13:45 In order to speak about the "expected" value you MUST have a random variable. Where are they??
    @13:59 WHY??

  • @mightbin
    @mightbin 2 years ago

    convinced again

  • @ProfessionalTycoons
    @ProfessionalTycoons 6 years ago

    great video mate.

  • @minma02262
    @minma02262 4 years ago +2

    If there is a god, I want it to be you.

  • @ruchikalalit1304
    @ruchikalalit1304 8 months ago

    Which book is being referred to in this series, or is there any other book for this topic? Anyone who knows, please tell.

  • @angelzash4u2
    @angelzash4u2 4 years ago

    Hi. Can I get your assistance in solving a problem using the maximum likelihood method?

  • @whetstoneguy6717
    @whetstoneguy6717 4 years ago

    Mr. Justin Z--It would have been helpful if you had gone over the intermediary math steps. Thank you. WhetstoneGuy

  • @sherlocksilver9392
    @sherlocksilver9392 3 years ago

    Does anyone know why in a score test we divide by the information at the null parameter values? I know that the information at the MLE represents the "sharpness" of the likelihood function, but what does information represent at a different parameter value that is not the maxima of the likelihood function?

  • @backerlifan
    @backerlifan 3 years ago

    I once heard OLS and MLE yield the same result under a normal distribution. If that's the case, the pros and cons (especially the pros) just seem negligible, don't they?

  • @Pier_Py
    @Pier_Py 3 years ago +1

    You are so f good

  • @Catwomen4512
    @Catwomen4512 6 years ago +5

    I don't understand why the E(Y) is equal to n/theta

    • @k.sladkina872
      @k.sladkina872 6 years ago +1

      I have the same problem

    • @Catwomen4512
      @Catwomen4512 6 years ago

      @@k.sladkina872 I found out it is simply related to the distribution you use. Google different distributions (normal, binomial, etc.) and if you look at the wikipedia page, on the right, it states what the mean E(X) and variance V(X) are equal to
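For the n/theta result specifically, a short derivation is possible. This is a sketch that assumes Y is the total number of attempts made by n independent individuals, each of whom keeps trying until a first success that has probability θ per attempt (a sum of n geometric variables); that reading fits the example discussed in this thread but is an assumption about the video.

```latex
\begin{aligned}
E(Y_i) &= \sum_{k=1}^{\infty} k\,\theta\,(1-\theta)^{k-1}
        = \theta \cdot \frac{1}{\bigl(1-(1-\theta)\bigr)^{2}}
        = \frac{1}{\theta} \\
E(Y) &= E\!\left(\sum_{i=1}^{n} Y_i\right) = \sum_{i=1}^{n} E(Y_i) = \frac{n}{\theta}
\end{aligned}
```

The first line uses the series identity \sum_{k \ge 1} k r^{k-1} = 1/(1-r)^2 for |r| < 1, with r = 1 - θ.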

  • @anindadatta164
    @anindadatta164 2 years ago

    Var(RV) = E(RV^2) - (mean of RV)^2 is an easy method. So where is the need to do partial differentiation for two simultaneous equations and set them to zero, when effectively the same result for the variance comes out?

  • @joeyquiet4020
    @joeyquiet4020 1 year ago

    best best best

  • @snackbob100
    @snackbob100 4 years ago +1

    why cant uni lectures be like this. i pay so much money for an inferior education

  • @coolblue5929
    @coolblue5929 2 years ago

    Where is the sample data though?? Aren’t we supposed to be fitting the distribution to a sample? Isn’t that the whole point? Why do you just say, oh, 15%??

  • @jhanvitanna4541
    @jhanvitanna4541 2 years ago

    Your content is amazing but sound quality is really bad

  • @johnmook135
    @johnmook135 4 years ago

    Why does this stuff matter? I'm taking math stats for the second time and I understand zero. I can do the basic stuff described in videos but the problems are never just multiply all the pdfs together, take log, derive, and then set to zero... There are always wrinkles. Like one problem where I have to deal with an absolute value and they start talking about the median in the solution... Iye-yi-yi. I dislike math stats and really want to know how this will help me predict stocks or in any future job.

    • @zedstatistics
      @zedstatistics  4 years ago +5

      pretty sure it's what god created on the 3rd day. He created the heaven and earth, the land and the waters, and then differential calculus.

    • @johnmook135
      @johnmook135 4 years ago

      @@zedstatistics The calculus isn't that bad. I love it. Although I question it. It's a language to explain something, something very complex. Seems like there could be flaws. But these things work time and time again? Crazy. More particularly, I just don't know how all this MLE and Bayes' theorem, sufficient statistics, data reduction, improving an estimator relates to real-life problems. I'm a data science major. I like sentdex's videos on YouTube. All these advanced stats classes I am taking just don't make sense. Or at least, reading from the book, my teachers just don't relate it to the real world and it doesn't make sense. Any suggestions/tips or playlists you could point me to that would help my statistical data science career and understanding? I like math, I like stocks. Not sure how to combine them outside sentdex's videos.

    • @johnmook135
      @johnmook135 4 years ago

      any playlist that would help me solve problems like this -- Suppose that 21 observations are taken at random
      from an exponential distribution for which the mean μ is
      unknown (μ > 0), the average of 20 of these observations
      is 6, and although the exact value of the other observation
      could not be determined, it was known to be greater than
      15. Determine the M.L.E. of μ. -- my book is Probability and Statistics 4th edition by DeGroot, there is free pdf available online.
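The exercise quoted in this comment is a censored-data likelihood, and it can be set up in a few lines. The sketch below is one standard way to approach it, not the textbook's own solution: the 20 fully observed values contribute exponential densities (1/μ)exp(-x/μ), and the 21st contributes only the survival probability P(X > 15) = exp(-15/μ). Setting the derivative of the resulting log-likelihood to zero gives a closed form, which the code cross-checks against a grid search.

```python
import math

n_observed = 20        # observations with known values
mean_observed = 6.0    # their average, so their sum is 120
censor_point = 15.0    # the 21st observation is only known to exceed this

def loglik(mu):
    """Log-likelihood: 20 density terms plus one survival term exp(-15/mu)."""
    total = n_observed * mean_observed + censor_point
    return -n_observed * math.log(mu) - total / mu

# Closed form from d(loglik)/d(mu) = -20/mu + 135/mu**2 = 0.
mu_closed_form = (n_observed * mean_observed + censor_point) / n_observed

# Numerical cross-check on a grid from 1.000 to 20.000 in steps of 0.001.
grid = [i / 1000 for i in range(1000, 20001)]
mu_grid = max(grid, key=loglik)

print(mu_closed_form, mu_grid)
```

The censored observation adds to the numerator (total exposure) but not to the denominator (number of exactly observed values), which is why the estimate is larger than the average of the 20 known observations.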

    • @lzl4226
      @lzl4226 4 years ago +1

      On the subject of predicting stocks, I guess you want to build a robot that takes today's stock market data and spits out a distribution of actions you can take that would make you the most money. Let's call this robot π(θ), because it's just a function parameterised by θ. And you want the maximum likelihood of θ that will make you the most money (let's call that Q*, where Q(a|s) is the reward of taking action a at step s).
      Since you're a data major you probably can see where this is going. You want a neural net that models π(θ) and you want to train it to solve for -∇log(π(θ))Q (notice the score function here), where Q is the reward of your trading actions (and in practice simulated by another neural net). Notice you want to find the set of θ for π(θ) that maximises Q(Q*) (using maximum likelihood and past stock data possibly flattened by some RNN). Furthermore you want to incrementally improve π within a confidence interval, so you don't make too big of a step that will collapse your convergence.... and you'll see the Fisher information matrix come up in this calculation if you dig further.
      So yeah it prob helps in your future job in stock market prediction, if that's where you're headed.

    • @coolblue5929
      @coolblue5929 2 years ago

      @@lzl4226 except, stock prices are not produced by a stationary process.

  • @CaptZdq1
    @CaptZdq1 5 years ago

    Fellow 'nerds'?! That's very abusive. You should be imprisoned for that.

    • @zedstatistics
      @zedstatistics  4 years ago +9

      perhaps fellow 'sailors' aye captain?