4. Parametric Inference (cont.) and Maximum Likelihood Estimation
- Published Feb 8, 2025
- MIT 18.650 Statistics for Applications, Fall 2016
View the complete course: ocw.mit.edu/18-...
Instructor: Philippe Rigollet
In this lecture, Prof. Rigollet talked about confidence intervals, total variation distance, and Kullback-Leibler divergence.
License: Creative Commons BY-NC-SA
More information at ocw.mit.edu/terms
More courses at ocw.mit.edu
33:46 Maximum Likelihood Estimation
absolute madlad, you saved 34 minutes of my life
what an angel
life saver.
Love you!
this guy is my theta^ for the distribution of cool statisticians
The audio is so low even when I turn the volume all the way up. The prof needs to put his mic anywhere but his chest, please; specifically the parts around 33:18, where he's practically whispering, and you can't even hear what the students are asking. But I guess since this is OpenCourseWare, I can't complain much about free college lectures.
OMG, best ever explanation of maximum likelihood estimator!
No it's not. Not even close.
Excellent lecture -- very difficult material, expert presentation. Suggestion: let the students work out the mathematical manipulations, and emphasize the concepts in the lecture. Yes, state the theorems, but then present lots of real-world examples, such as weather data, demographic data, economic data, especially cases where the data might be modeled in a variety of ways, etc. The point: mathematical manipulations are relatively easy once you know the rules, but the ideas are subtle and (probably) can only be grasped/conveyed via examples. Yes, the lecture had lots of simple examples expertly woven into the theory, but the emphasis was on the mathematical manipulations and not enough on the concepts. Bottom line: superior lecture in any case. Two thumbs up!
for the complements it's better to have the bar rather than ^c imo
Why do all the videos start from the middle of the lecture, rather than the beginning? Can this please be fixed somehow? :( It starts right off the bat, and we spend like 15 mins trying to understand what the professor is discussing.
44:40 The probability of X falling into the interval A is the integral of the density over this set.
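For reference, the fact being quoted is the standard definition of a density: for a continuous random variable X with density f_theta (the notation f_theta is just shorthand here),

```latex
\mathbb{P}_\theta(X \in A) = \int_A f_\theta(x)\,dx .
```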
Did you follow any books along with this course?
@@NoRa-ws8fo No I didn't... do you have any recommendations?
@@qiaohuizhou6960 Check out Mathematical Statistics and Data Analysis. The first 5 chapters are related to probability and cover the same topics taught in 6.041; the rest are related to 18.650.
@@NoRa-ws8fo Oh yes, it seems the syllabus says the course follows the textbook you suggested! Thank you so much!
At 1:03:04 the professor said that we build an estimator that captures theta and theta star for all possible theta in capital Theta. My question: isn't theta the unknown parameter that we try to estimate, and equal to theta star? I did not get that statement.
I think here theta is the estimate
Note that the slides on the OCW website are missing the absolute values in the second TV formula.
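For anyone checking against the board: with the absolute values in place, the second TV formula reads (discrete case; replace the sum by an integral when working with densities)

```latex
\mathrm{TV}(P_\theta, P_{\theta'}) = \frac{1}{2}\sum_{x}\bigl|\,p_\theta(x) - p_{\theta'}(x)\,\bigr| .
```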
Damn! I wish I had seen this last year.
@45:36 If you don't remember that, please take immediate remedy :))))
thanks ♥️🤍
I'm doing a master's in this
Typo at 39:00? The max shouldn't be there? It's by definition the TV, so we want to say the TV is larger than or equal to any other difference of probabilities of A.
It felt very awkward..... true
Exactly. The sign should be an equals sign there, by the definition of the TV.
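To spell out the two readings in this thread: the total variation distance is defined as a maximum over all events, so with the max it is an identity, while for any single fixed event A only an inequality holds:

```latex
\mathrm{TV}(P_\theta, P_{\theta'}) = \max_{A}\bigl|P_\theta(A) - P_{\theta'}(A)\bigr|,
\qquad
\bigl|P_\theta(A) - P_{\theta'}(A)\bigr| \le \mathrm{TV}(P_\theta, P_{\theta'}) \ \text{ for each fixed } A .
```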
In the KL divergence near the end, there's a step that was mentioned quite quickly which went from the expectation of the logs to the average of the logs by the law of large numbers, and n seems to have been introduced. Am I right in saying that the average is after taking n samples, sampled according to the theta star distribution?
I think the law of large numbers makes statements about X_1, X_2, ..., a sequence of r.v.'s on a probability space. In this example it makes statements about h(X_1), h(X_2), ..., which live in the probability space with measure P_theta*. But actually the X_i's might come from a different space with measure P_theta. However, they are connected by h. So you can think about taking the average of log p_theta(X_i) where the X_i come from a probability space Omega.
@@formlessspace3560 Thanks. Yes, for a discrete space we have X_1, X_2, ..., X_n and n is the number of samples. For continuous spaces the KL divergence is defined as an integral. I was wondering if n came from some sort of sampling of the continuous space. Maybe at this stage we are just considering discrete spaces.
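A minimal numerical sketch of the step being discussed, using a Bernoulli model rather than the lecture's exact example: the expectation under P_theta* is replaced by an average over n i.i.d. samples drawn from P_theta*, which converges by the law of large numbers. So yes, n is the sample size and the samples come from the theta* distribution. The parameter values below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

theta_star, theta = 0.7, 0.4   # "true" and candidate parameters (illustrative values)
n = 100_000                    # number of i.i.d. samples drawn from P_theta_star

def pmf(x, t):
    """Bernoulli pmf p_t(x) for x in {0, 1}."""
    return t**x * (1 - t)**(1 - x)

# Exact KL(P_theta_star || P_theta) for two Bernoulli distributions
kl_exact = (theta_star * np.log(theta_star / theta)
            + (1 - theta_star) * np.log((1 - theta_star) / (1 - theta)))

# LLN estimate: average of log(p_theta_star(X_i) / p_theta(X_i)) with X_i ~ P_theta_star
x = rng.binomial(1, theta_star, size=n)
kl_estimate = np.mean(np.log(pmf(x, theta_star) / pmf(x, theta)))

print(kl_exact, kl_estimate)   # the two numbers should agree for large n
```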
Why is the volume so low?
Why is the Total Variation distance between P_theta and P_theta' one half of the sum? Where does this one half come from? (Min 42:32)
It is to normalize the Total Variation, so that the Total Variation takes values in the interval [0, 1].
If you look at the Gaussian example that he gave, I think it'll be a little clearer. Geometrically, the positive part of p_theta - p_theta' is only one of those two areas he colored, hence the 1/2.
He started to prove this fact for the continuous case at 52:00. For the discrete case you should look elsewhere.
@@ravikmoreiradarocha427 Thanks!
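For the discrete case, the 1/2 comes from the fact that both pmfs sum to 1, so the positive and negative parts of p_theta - p_theta' carry equal mass; the maximizing event keeps only the positive part, which is exactly the "one of the two colored areas" picture mentioned above:

```latex
\sum_x \bigl(p_\theta(x) - p_{\theta'}(x)\bigr) = 1 - 1 = 0
\;\Longrightarrow\;
\max_A \bigl(P_\theta(A) - P_{\theta'}(A)\bigr)
= \!\!\sum_{x:\,p_\theta(x) > p_{\theta'}(x)}\!\! \bigl(p_\theta(x) - p_{\theta'}(x)\bigr)
= \frac{1}{2}\sum_x \bigl|p_\theta(x) - p_{\theta'}(x)\bigr| .
```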
thank you sir
killed me at 1:00:00 with those facts and proof!
I have a question about the bias and variance of the estimator theta_hat = X_1 mentioned at 11:00.
- Why isn't the bias = (X_1 - theta)^2? Because bias = (E[X_1] - theta)^2 = (X_1 - theta)^2.
- Why is the variance = 0? Because variance = E[(X_1 - E[X_1])^2] = E[(X_1 - X_1)^2] = 0.
bias = (E[X_1] - theta) and (E[X_1] = theta), therefore bias = 0
variance = E[(X_1 - E[X_1])^2] = E[(X_1 - theta)^2] = E[X_1^2] - theta^2 = theta - theta^2 = theta(1-theta)
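A quick simulation of the estimator theta_hat = X_1 under a Bernoulli(theta) model, just to see the numbers in the reply above (bias close to 0, variance close to theta(1 - theta)); the parameter value is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)

theta = 0.3        # arbitrary "true" parameter
trials = 200_000   # number of repeated experiments

# In each experiment the estimator is just the first observation X_1 ~ Bernoulli(theta)
estimates = rng.binomial(1, theta, size=trials).astype(float)

bias = estimates.mean() - theta                       # ~ 0
variance = estimates.var()                            # ~ theta * (1 - theta)
quadratic_risk = np.mean((estimates - theta) ** 2)    # = bias^2 + variance

print(bias, variance, theta * (1 - theta), quadratic_risk)
```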
I have a small doubt: what if we assume that the population parameter itself is a random variable, not a constant in the frequentist sense? Can the quadratic risk still be decomposed into bias and variance like this?
if you don't remember that please take an immediate remedy hahahhahaha valid tho
OMG, I never knew that MLE has a theoretical basis. I always thought it was based on common sense :-)
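The theoretical basis the lecture builds is that maximizing the likelihood amounts to minimizing the estimated KL divergence to the true distribution. A small sketch with a Bernoulli model and made-up numbers, maximizing the average log-likelihood over a grid of candidate parameters:

```python
import numpy as np

rng = np.random.default_rng(2)

theta_star = 0.6                         # "true" parameter, used only to simulate data
x = rng.binomial(1, theta_star, 500)     # observed sample

# Average log-likelihood of each candidate theta on a grid
grid = np.linspace(0.01, 0.99, 99)
avg_loglik = np.array([np.mean(x * np.log(t) + (1 - x) * np.log(1 - t)) for t in grid])

theta_mle = grid[np.argmax(avg_loglik)]
print(theta_mle, x.mean())   # the grid maximizer is close to the sample mean, the Bernoulli MLE
```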
this is helpful ♥️🤍
33:39 Maximum Likelihood Estimation