4. Parametric Inference (cont.) and Maximum Likelihood Estimation

  • Published Feb 8, 2025
  • MIT 18.650 Statistics for Applications, Fall 2016
    View the complete course: ocw.mit.edu/18-...
    Instructor: Philippe Rigollet
    In this lecture, Prof. Rigollet talked about confidence intervals, total variation distance, and Kullback-Leibler divergence.
    License: Creative Commons BY-NC-SA
    More information at ocw.mit.edu/terms
    More courses at ocw.mit.edu

COMMENTS • 47

  • @sw-qn4bb · 6 years ago · +80

    33:46 Maximum Likelihood Estimation

  • @emirtarik999 · 3 years ago · +2

    this guy is my theta^ for the distribution of cool statisticians

  • @waynevanilla · 2 years ago · +3

    The audio is so low even when I turn the volume all the way up; the prof needs to put his mic anywhere but his chest. I'm specifically talking about the parts around 33:18, where he's practically whispering, and you can't even hear what the students are asking. But I guess since this is OpenCourseWare, I can't complain much about free college lectures.

  • @MrCraber · 7 years ago · +19

    OMG, best ever explanation of maximum likelihood estimator!

  • @williamwolfe8708 · 6 years ago · +7

    Excellent lecture -- very difficult material -- expert presentation. Suggestion: let the students work out the mathematical manipulations, and emphasize the concepts in the lecture. Yes, state the theorems, but then present lots of real-world examples, such as weather data, demographic data, economic data, especially cases where the data might be modeled in a variety of ways. The point: mathematical manipulations are relatively easy once you know the rules, but the ideas are subtle and (probably) can only be grasped and conveyed via examples. Yes, the lecture had lots of simple examples expertly woven into the theory, but the emphasis was on the mathematical manipulations and not enough on the concepts. Bottom line: superior lecture in any case. Two thumbs up!

  • @Yseerv · 2 months ago

    for the complements it's better to have the bar rather than ^c imo

  • @agarwalarti · 4 years ago · +17

    Why do all the videos start from the middle of the lecture rather than the beginning? Can this please be fixed somehow? :( It starts right off the bat and we spend like 15 minutes trying to understand what the professor is discussing.

  • @qiaohuizhou6960 · 3 years ago · +1

    44:40 the probability of X falling into an interval A is the integral of the density over that set (see the sketch after this thread)

    • @NoRa-ws8fo · 3 years ago

      Did you follow any books along with this course?

    • @qiaohuizhou6960 · 3 years ago

      @NoRa-ws8fo No, I didn't... do you have any recommendations?

    • @NoRa-ws8fo · 3 years ago

      @qiaohuizhou6960 Check out "Mathematical Statistics and Data Analysis". The first 5 chapters are related to probability and cover the same topics taught in 6.041; the rest are related to 18.650.

    • @qiaohuizhou6960 · 3 years ago · +1

      @NoRa-ws8fo Oh yes, it seems from the syllabus that the course follows the textbook you suggested! Thank you so much!
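
A minimal sketch of the point at 44:40, assuming a standard Gaussian density and the interval [-1, 2] purely as an example (neither is from the lecture): the probability that X lands in the interval is the integral of the density over it.

from scipy import stats
from scipy.integrate import quad

# Example density: standard Gaussian (an assumption for illustration only).
f = stats.norm(loc=0.0, scale=1.0).pdf
a, b = -1.0, 2.0

# P(X in [a, b]) is the integral of the density over [a, b].
prob_by_integration, _ = quad(f, a, b)

# The same probability computed from the CDF, as a check.
prob_by_cdf = stats.norm.cdf(b) - stats.norm.cdf(a)

print(prob_by_integration, prob_by_cdf)  # both ~0.8186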

  • @tlijaniraed1599 · 1 year ago · +2

    At 1:03:04 the professor said that we build an estimator that captures theta and theta star for all possible theta in capital Theta. My question: isn't theta the unknown parameter that we are trying to estimate, and equal to theta star? I did not get that statement.

    • @xuchuan6401 · 4 months ago

      I think here theta is the estimate

  • @haejinsong1835 · 5 years ago · +3

    Note that the slides on the OCW website are missing the absolute values in the second TV formula

  • @anarionzuo1425 · 4 years ago

    Damn! I wish I had seen this last year

  • @abdolreza82 · 3 years ago · +1

    @45:36 If you don't remember that, please take immediate remedy :))))

  • @not_amanullah · 1 month ago

    thanks ♥️🤍

  • @maroctech761 · 5 years ago · +4

    I'm doing a master's in this

  • @Bmmhable · 5 years ago · +3

    Typo at 39:00? The max shouldn't be there? It's by definition the TV, so we want to say the TV is larger than or equal to that difference of probabilities for any particular A.
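
For reference, the definition being discussed, written with the max over events A of the sample space E as in the lecture; the inequality the comment has in mind then holds automatically for any fixed A:

\[
\mathrm{TV}(P_\theta, P_{\theta'})
  \;=\; \max_{A \subseteq E} \bigl|P_\theta(A) - P_{\theta'}(A)\bigr|
  \quad\Longrightarrow\quad
  \bigl|P_\theta(A) - P_{\theta'}(A)\bigr| \;\le\; \mathrm{TV}(P_\theta, P_{\theta'})
  \ \text{ for every event } A.
\]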

  • @johnlysons3176 · 3 years ago · +1

    In the KL divergence near the end, there's a step that was mentioned quite quickly which went from the expectation of the logs to the average of the logs by the law of large numbers, and n seems to have been introduced. Am I right in saying that the average is over n samples, sampled according to the theta star distribution? (A sketch of this step follows after this thread.)

    • @formlessspace3560 · 1 year ago · +1

      I think the law of large numbers makes statements about X_1, X_2, ..., a sequence of random variables on a probability space. In this example it is applied to h(X_1), h(X_2), ..., which live in the probability space with measure P_theta*. But actually the X_i's might come from a different space with measure P_theta. However, they are connected by h. So you can think about taking the average of log p_theta(X_i), where the X_i come from a probability space Omega.

    • @johnlysons3176 · 1 year ago

      @formlessspace3560 Thanks. Yes, for a discrete space we have X_1, X_2, ..., X_n and n is the number of samples. For continuous spaces the KL divergence is defined as an integral. I was wondering if n came from some sort of sampling of the continuous space. Maybe at this stage we are just considering discrete spaces.
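
A minimal sketch of that step, under an illustrative Bernoulli model (the parameter values, sample size, and helper function below are my own assumptions, not from the lecture): the X_i are drawn from P_theta*, and by the law of large numbers the average over n samples of log(p_theta*(X_i)/p_theta(X_i)) converges to its expectation, which is KL(P_theta*, P_theta).

import numpy as np

rng = np.random.default_rng(0)

theta_star, theta = 0.6, 0.4   # true and candidate Bernoulli parameters (illustrative)
n = 200_000                    # number of samples over which we average

# X_1, ..., X_n drawn from P_theta*: the averaging is over samples from the true distribution.
x = rng.binomial(1, theta_star, size=n)

def log_pmf(x, p):
    # log of the Bernoulli pmf p^x * (1 - p)^(1 - x)
    return x * np.log(p) + (1 - x) * np.log(1 - p)

# LLN step: (1/n) * sum_i log(p_theta*(X_i) / p_theta(X_i))  -->  E_theta*[log(p_theta*/p_theta)] = KL
kl_estimate = np.mean(log_pmf(x, theta_star) - log_pmf(x, theta))

# Closed-form KL between the two Bernoullis, for comparison.
kl_exact = (theta_star * np.log(theta_star / theta)
            + (1 - theta_star) * np.log((1 - theta_star) / (1 - theta)))

print(kl_estimate, kl_exact)   # both ~0.081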

  • @DimLightPoetries · 1 year ago

    Why is the volume so low?

  • @raitup00 · 5 years ago

    Why is the Total Variation distance between P_theta and P_theta' one half of the sum? Where does this one half come from? (Min 42:32; see the numerical check after this thread.)

    • @juandiegodonneys4665 · 5 years ago · +1

      It is to normalize the Total Variation, so that the Total Variation takes values in the interval [0, 1].

    • @yasmineguemouria9099 · 5 years ago · +1

      If you look at the Gaussian example that he gave, I think it'll be a little clearer. Geometrically, the TV corresponds to only one of the two areas he colored, while the sum covers both, hence the 1/2.

    • @ravikmoreiradarocha427 · 4 years ago · +1

      He started to prove this fact for the continuous case at 52:00. For the discrete case you should look elsewhere.

    • @raitup00 · 4 years ago · +1

      @ravikmoreiradarocha427 Thanks!
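
A small numerical check of where the 1/2 comes from, using a toy pair of pmfs on {0, 1, 2} chosen purely for illustration (not from the lecture): the largest achievable difference |P_theta(A) - P_theta'(A)| over all events A equals half of the sum of |p_theta(x) - p_theta'(x)|, and without the 1/2 the quantity could exceed 1.

from itertools import chain, combinations

# Toy pmfs on {0, 1, 2} (illustrative values only).
p = {0: 0.5, 1: 0.3, 2: 0.2}
q = {0: 0.2, 1: 0.2, 2: 0.6}
support = list(p)

# Brute force: max over all events A of |P(A) - Q(A)|.
events = chain.from_iterable(combinations(support, r) for r in range(len(support) + 1))
tv_max = max(abs(sum(p[x] for x in A) - sum(q[x] for x in A)) for A in events)

# Formula with the 1/2: half of the sum of pointwise absolute differences.
tv_half_sum = 0.5 * sum(abs(p[x] - q[x]) for x in support)

print(tv_max, tv_half_sum)   # both 0.4; the raw sum would be 0.8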

  • @3dartlivesketching738 · 5 years ago

    thank you sir

  • @crunchycho · 9 months ago

    killed me at 1:00:00 with those facts and proof!

  • @saharak90 · 6 years ago · +2

    I have a question about the bias and variance of the estimator theta_hat = X_1 mentioned at 11m00s.
    - Why isn't the bias = (X_1 - theta)^2? Because bias = (E[X_1] - theta)^2 = (X_1 - theta)^2.
    - Why is the variance = 0? Because variance = E[(X_1 - E[X_1])^2] = E[(X_1 - X_1)^2] = 0.

    • @kratos1537 · 6 years ago

      bias = E[X_1] - theta, and E[X_1] = theta, therefore bias = 0.
      variance = E[(X_1 - E[X_1])^2] = E[(X_1 - theta)^2] = E[X_1^2] - theta^2 = theta - theta^2 = theta(1 - theta).
      (A quick Monte Carlo check of these values follows below.)
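
A quick Monte Carlo check of that answer, with an arbitrary Bernoulli parameter chosen for illustration: the estimator theta_hat = X_1 has bias approximately 0 and variance approximately theta(1 - theta). The point is that X_1 is random, not a constant, so its variance is not 0.

import numpy as np

rng = np.random.default_rng(1)

theta = 0.3        # arbitrary true Bernoulli parameter (illustrative)
reps = 1_000_000   # number of simulated experiments

# theta_hat = X_1: each replication uses a single Bernoulli draw as the estimate.
theta_hat = rng.binomial(1, theta, size=reps)

bias = theta_hat.mean() - theta            # ~ 0
variance = theta_hat.var()                 # ~ theta * (1 - theta) = 0.21
risk = np.mean((theta_hat - theta) ** 2)   # quadratic risk ~ bias^2 + variance

print(bias, variance, risk)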

  • @paulhowrang · 2 years ago

    I have a small doubt: if we assume the population parameter itself is a random variable, not a constant in the frequentist sense, can the quadratic risk still be decomposed into bias and variance like this?
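
For reference, the frequentist decomposition the question refers to, in which theta is treated as a fixed constant (whether and how it carries over when theta is itself random is exactly what is being asked):

\[
\mathbb{E}\bigl[(\hat\theta - \theta)^2\bigr]
  = \operatorname{Var}(\hat\theta) + \bigl(\mathbb{E}[\hat\theta] - \theta\bigr)^2,
\]
which follows by expanding \((\hat\theta - \mathbb{E}[\hat\theta] + \mathbb{E}[\hat\theta] - \theta)^2\) and noting that the cross term has expectation zero.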

  • @cheriew4385 · 3 years ago

    if you don't remember that please take an immediate remedy hahahhahaha valid tho

  • @何雪凝 · 4 years ago · +3

    OMG, I never knew that MLE had a theoretical basis. I always thought it was just based on common sense :-)

  • @not_amanullah · 1 month ago

    this is helpful ♥️🤍

  • @safayassin2189 · 2 years ago · +3

    33:39 Maximum Likelihood Estimation