Blasphemous negligence of the chain rule when performing that first derivative to find its roots for the maximum estimation on the mean of the random variables in the second example. Good thing the difference is symmetric!
Mistakes happen😅
Thank you for this amazing video!! But I have a quick question: why are you trying to MINIMIZE the negative of that function instead of directly MAXIMIZING the function itself?
I think this is because many optimization algorithms are built to minimize functions rather than maximize them. Thus, it's better to put our equations in that form so they are more easily adapted to an optimization routine.
Just note that maximizing -f(x) is equivalent to minimizing f(x). When you apply the logarithm to the likelihood function, you get an expression with only negative terms, so the professor multiplies the whole log-likelihood by (-1) so that the negative log-likelihood is an expression with only positive terms. That's why he minimizes this function (all positive terms) rather than maximizing the original (all negative terms). Hope this helps.
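A quick numeric sketch of the point above (my own illustration, not from the lecture, using the hypothetical 3-heads-in-5-tosses case): maximizing the log-likelihood and minimizing its negative land on the same theta.

```python
import numpy as np

# Bernoulli log-likelihood for k=3 heads in n=5 tosses (illustrative numbers).
thetas = np.linspace(0.01, 0.99, 99)
log_lik = 3 * np.log(thetas) + 2 * np.log(1 - thetas)

theta_max = thetas[np.argmax(log_lik)]    # maximize the log-likelihood
theta_min = thetas[np.argmin(-log_lik)]   # minimize the negative log-likelihood

print(theta_max, theta_min)  # both pick theta = 0.6 = 3/5
```

Whether your optimizer is a maximizer or a minimizer, the estimate is identical; flipping the sign only changes which routine you call.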
Hi, the real reason has to do with the Kullback-Leibler divergence. More details here:
wiseodd.github.io/techblog/2017/01/26/kl-mle/
@@carlospinzoncarrera9246 Very helpful, cheers!
Thank you very much, sir, for this lesson. You made everything so simple🙏🙏
OMFG this guy is amazing
After watching the video, it's not clear to me why the optimal value turns out to maximize the likelihood function.
Because you can think of the Pk expression at the very top as a curve. If you differentiate it and set the derivative to 0, you get the max or minimum. Now if you rearrange that to solve for theta, you know what value of theta gives you the max or min.
For the binomial case, why is it that we don't take a product for the likelihood function?
(Is it because there's only one observation?)
Yep, that's why.
Can someone expand the formula to show how he got rid of the exponential while taking the log of exp{-(x - u)^2 / 2v}?
Since it's a natural log, when you take the log of the exponential of "something", it just becomes that "something".
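A tiny check of that cancellation (my own sketch with arbitrary example values, not the lecture's numbers): ln(exp(z)) = z, so the Gaussian exponent comes straight out of the log.

```python
import math

# ln(exp(z)) = z: the exponent -(x - mu)**2 / (2*v) survives the log unchanged.
x, mu, v = 1.7, 0.5, 2.0   # arbitrary illustrative values
z = -(x - mu) ** 2 / (2 * v)
print(math.log(math.exp(z)), z)  # identical up to floating point
```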
Why can we directly use the PMF of the binomial to calculate the ML estimate, instead of using the likelihood function as in the second example?
Hi, we can use the same approach for the first one, and we end up finding the same thing. Consider the case of heads and tails in 5 tosses, where we know the outcome is HHTTH (given), and we want to use the ML concept to estimate theta. Since all tosses are independent events, we can write the resulting probability as the product of the probabilities at each toss, so we get theta^3 * (1 - theta)^2. Differentiating that and setting it equal to 0 gives a probability of heads of 3/5, which is obviously right (it matches the usual method). The same thing happens with the first problem, but there order doesn't matter, so accounting for permutations we could have nCk in front (and we can drop it, since it doesn't affect the maximizer).
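The comparison above can be sketched numerically (my own illustration, not from the lecture): the likelihood of the specific sequence HHTTH, built as a product of per-toss probabilities, and the binomial PMF that adds the constant C(5,3) in front, are maximized at the same theta.

```python
import numpy as np
from math import comb

tosses = [1, 1, 0, 0, 1]  # HHTTH, with H=1 and T=0
thetas = np.linspace(0.01, 0.99, 99)

# Likelihood of the exact sequence: product of per-toss probabilities.
seq_lik = np.ones_like(thetas)
for t in tosses:
    seq_lik *= np.where(t == 1, thetas, 1 - thetas)

# Binomial PMF for "3 heads in 5 tosses": same thing times C(5,3) = 10.
binom_lik = comb(5, 3) * seq_lik

# The constant in front doesn't move the maximizer.
print(thetas[np.argmax(seq_lik)], thetas[np.argmax(binom_lik)])  # 0.6 and 0.6
```

This is why dropping nCk is harmless: a positive constant factor rescales the likelihood but leaves its argmax untouched.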
@@mallakbasheersyed1859 I read the theory of the likelihood function; I think at that time I had misunderstood something about it. But thanks for the reply anyway.
Why is the variance under the root? Shouldn't it be outside of root(2pi)?
Maybe a bit late, but I think here they define v = sigma^2, so it's the same as putting sigma outside the root. Here the variance v is the thing you get after you compute E[x^2] - E[x]^2, i.e. sigma^2.
9:09 How come you can cancel out the 2 and a v? They're both denominators.
That is why they can be cancelled out…
Yo this guy is awesome, he sounds like Junior from Kim Possible
Hello, thanks for the effort. I think you made a mistake when minimizing w.r.t. v: the denominator of the sum term must be 4v^2,
but it doesn't matter; the answer you reach will be the same.
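For the minimization w.r.t. v being discussed here, a numeric sketch (my own, under the assumption that the negative log-likelihood in v takes the standard Gaussian form, with mu set to the sample mean): the minimizer is v = S/n, the average squared deviation.

```python
import numpy as np

# Assumed form: NLL(v) = (n/2)*log(2*pi*v) + S/(2*v), with S = sum((x - mu)**2).
# Its stationary point is v = S/n; a constant factor on the whole expression
# would not move that minimizer.
rng = np.random.default_rng(0)
x = rng.normal(2.0, 1.5, size=10_000)   # synthetic data, true variance 2.25
mu = x.mean()
S = ((x - mu) ** 2).sum()
n = len(x)

vs = np.linspace(0.5, 5.0, 2000)
nll = (n / 2) * np.log(2 * np.pi * vs) + S / (2 * vs)
v_hat = vs[np.argmin(nll)]

print(v_hat, S / n)  # grid minimizer matches the closed-form S/n
```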
thank you
What does that letter k mean?
The number of heads obtained in our sample. The probability of having exactly k successes (our number of heads) is given by the probability function shown in the slide.
niiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiice I got it
Giannis is great
Micromasters lfg
If you dislike this video, I'm sorry, you are just weak at advanced maths/statistics. It's not that MIT professor's fault; this isn't for everybody.
No need to brag about superiority. That said, the video truly is as good as it gets.
oh wow you must be so smort
Yeah, it is not for arrogant pricks like you.
thank you