This could be by far the best explanation I have seen of the EM algorithm. The way you have connected the intuition to the mathematical explanation is so commendable! Thank you so much for your efforts.
Glad it was helpful!
@ritvikmath I confirm, thank you for helping lost students
Thank you Ritvik for simplifying EM algorithm like this. This is the best video I have seen so far.
The world needs to see this. Thanks Ritvik, I honestly have utmost respect and love for the amount of hard work you put in your videos. Cheers :)
It would take a lot of time to develop these intuitions on your own.
thank you ritvik the best videos are in this channel.
Very interesting way of teaching, thank you from Tunisia
Most welcome!
Your channel and way of teaching is so amazing!! Very inviting, inclusive, and friendly. Thank you so much for such good vibes 💕
I have an exam tomorrow and this video was the thing I needed. I can't thank you enough dude.
It would take me two more lives to be able to explain it this well to someone, kudos! Great job buddy!
Wow, thanks!
YES! I have quiz on this NEXT WEEK!
Awesome explanation. I'd like to extend yours with my intuition regarding the E-Step: the first term p(x|m0) shows the probability of x happening for the chosen m0, and the second term LogLikelihood shows the probability of x happening for the computed m, and we want to maximize both. Because we want a choice with high probability from every aspect. That's why we multiply them together. Because the multiplication can weight between them. If one of them is small then the result will be small. It can be high only if both are high.
thanks for the additional inputs!
Brilliant explanation. I especially appreciate you first providing the intuition of the method in the verbal explanation of the E and M steps. I struggled with seeing the math first in other lectures until seeing your video. Thanks for posting this.
That's a really great way to look at EM. I'm an engineering graduate but new to ML, and the workup explanation before dropping into the maths is excellent. Thanks
Incredible explanation! Was trying to understand the intuition behind EM for a long time! Thanks for the video! Keep Going!!
Glad it helped!
your understanding and explanation of such a complicated concept is impeccable
It's 4am and I saw this video and had to watch... really great explanation bro... you're a natural teacher... thanks for this... subscribed
Your videos are unreal, simple explanations of complex problems, it's insane.
Thanks! You explained such a complicated subject so clearly!
thanks!
Thank you for the high-quality content that you have produced over the past few years. Most of the time, it really did help me get the intuition and understanding of what was going on with the theoretical concepts I was seeing in my courses.
Once again, thank you !
Holy, i can't believe how good this video was :) thank you so much
Ngl my favorite rapper-turned-algorithm
By far the best explanation, amazing.
The only explanation you need for understanding EM algorithm, proper chad explanation!
I had a jolt of excitement when I saw you had decided to do a video on this topic. It's something I've had to revisit time and time again, always understanding the intuition, but always getting lost in the formulas. Your post did a great job at helping to explain the intuition. I did struggle a bit with your non-conventional likelihood notation, though. That did throw me off a little bit but I understand why you had to have it that way and quickly adjusted. The care you took in explaining why there is mu and mu0 just shows why you are a fantastic teacher.
this is the best lecture for em algo
Wow, thank you for your work here. I finally feel confident in a subject in my masters classes and that means the world
Very well explained
Thank you! By far the best channel for providing clear explanations to fairly complex problems.
The best Thanks man
Broke down the most complicated algorithm in the simplest terms. Wow!
you are just amazing! What would be super useful would be an EM video based on your "Maximum likelihood" one.
Great video !
Can't express how happy I am after seeing your videos. Thanks a lot!
Much better explanation than what I normally see. I would also be interested in seeing you go through the proof.
What an amazing channel, honestly
Awesome! Best explanation of the EM algorithm for the beginner!
Glad it was helpful!
Yes, please go on with the proof, that will be an interesting topic. I went through Andrew Ng's video a couple of times, but I couldn't understand it better than here! You're a rock star at simplifying complex concepts!
Absolutely fantastic. I agree w/ other comments... The DS world needs to see this. Thank you.
Glad you enjoyed it!
Although there is more for fully understanding, I was able to gain the concept because of your video!
Thanks for the very clear explanation! A follow up video on how the EM algorithm can be used in gaussian mixture models or bayesian networks would be awesome!
Thanks for your explanation.
I think my main mental knot was wondering why you alter N instead of looking for the best guess for x to locate the value of the unknown value. To realize that x doesn't change and the power of the algorithm lies in finding the optimal solution for the learner without caring for the actual value of x was what I needed for it all to make sense.
Ritvik, you are doing a great job, thanks
i thank GOD i found your channel. A big thanks to youtube and to you!!
A real master can explain the most complex problem in an understandable way.
This explanation is amazing in order to get the concept
Genius man, genius , Wonderful explanation !!
THANK YOU. You're literally saving my ML undergraduate course
awesome!
Thanks a lot! That is a great explanation! I was struggling with EM for a long time! :))
I'd be grateful if you also talked about the proof of convergence!
Would love to see a proof video! Keep up the great work!
Can't wait to see the proof
yeah this guy is the fucking goat
Got this crystal clear. Thanks a lot!
Your explanations are soooo clear! really appreciate the effort you put into your videos. Thank youu!!
Loved it. Thanks for the efforts.
Thanks for the great lecture. One question if I may: at 2:20, why do you put the best guess 1 here instead of a random draw from your known distribution?
Lovely, that's very intuitive. Thank you so much.
Very cool! Thank you for teaching!
Amazing explanation!
Could you please do the derivation or intuition for EM for clustering? I observe that it is described in many textbooks, but not in such a cool way. 😅
Really nice explanation! Thank you!
Glad it was helpful!
Example with python coming anytime?
Thanks so much for this great explanation! I would definitely be interested in the proof. It would be great if you could do a video on Gaussian mixture models as well and how they are solved using the EM algorithm.
Excellent. Thank you so much! 👍
thanks!
Excellent explanation!!
Excellent explanations!
Very compelling ... Brilliant
I'm interested in the proof!
Thank you for all the work you put into your videos to make lives like mine easier. Cheers man!
Hi Ritvik, thank you very much for awesome videos. Could you please make some videos on SQL?
thanks! and please check out my full SQL playlist here:
ua-cam.com/play/PLvcbYUQ5t0UFAZGthysGOAtZqLl3otZ-k.html
@ritvikmath Awesome! Thanks a lot. Could you please add SQL window functions to the playlist, if possible?
great man, ultra great
You are a gem
Amazing, thank you for that !
Glad you liked it!
Thank you for explaining very clearly
Can you do the proof too please
Thank you so much for these videos!
One question: how do you estimate and maximize the integral in practice? That was the elephant for me...
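One hedged way to picture an answer to the integral question: when the expectation has no convenient closed form, it can be approximated by averaging the log-likelihood over samples of the missing value, and the maximization can be done numerically. The sketch below assumes the toy setup the comments refer to (observations 1 and 2, one missing value x) and additionally assumes unit variance; the function names and the grid search are illustrative choices, not the video's method.

```python
import math
import random

def log_likelihood(mu, points):
    """Log-likelihood of the points under N(mu, 1) (unit variance is an assumption)."""
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * (p - mu) ** 2 for p in points)

def monte_carlo_q(mu, samples, observed):
    """Approximate E_{x ~ N(mu0, 1)}[log p(observed, x | mu)] by a sample average."""
    total = sum(log_likelihood(mu, observed + [x]) for x in samples)
    return total / len(samples)

def m_step_grid(mu0, observed, n_samples=500, seed=0):
    """One E+M step: sample x from N(mu0, 1), then pick the best mu on a coarse grid."""
    rng = random.Random(seed)
    samples = [rng.gauss(mu0, 1.0) for _ in range(n_samples)]
    grid = [i / 100 for i in range(0, 301)]  # candidate mu values in [0, 3]
    return max(grid, key=lambda mu: monte_carlo_q(mu, samples, observed))

mu_next = m_step_grid(mu0=1.0, observed=[1.0, 2.0])
print(mu_next)  # should land near (1 + 2 + mu0) / 3, up to sampling and grid error
```

In this toy case the integral can be done by hand, so the sampling is only there to show the general pattern; any numerical optimizer could replace the grid.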
Wonderful. Definitely helps my understanding. When I find time I want to see what you're doing w/ the stock predictions. If I remember lectures from business school, you should not be able to "generate alpha" unless you possess information the market does not. In this case you could say you've found some new idea that has real predictive value, but either they will a) already have found this and put much more compute + their proximity to the actual place where the trades happen towards getting the answer first and beating you to the trades, or b) didn't know it before but will immediately steal it and then see a), haha. But hey, I'll still watch to see what you've got going on.
Thank you so much for your explanation, helps me a lot
So clear -- wow!
I’m very confused as to why we don't just maximize the log-likelihood of all the currently observed data given some mu?
I think it applies not only for estimation of mu, but any arbitrary parameter. Then it would not be as simple as taking average of all observed data. I could be wrong though :P
Love this question, thanks for asking. Indeed with this toy example the EM algorithm is overkill and it was mostly meant for instructive purposes. Of course, when we use things for instructive purposes we can miss the more interesting applications. Thinking about this, an interesting variation would be what if you have the data [1,2,x,y] drawn from a N(0,sigma) where now it’s the standard deviation sigma as well as two missing values you want to predict. This is interesting because it’s important to consider the values of the non missing data *and* the potential values of the missing data which are consistent with some estimate for sigma (since standard deviation is inherently a measure between data points)
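The variation described in the reply above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the data are [1, 2, x, y] with x and y missing, the mean is known to be 0, and sigma is the standard deviation. The function name `em_sigma` is hypothetical. With a mean-zero normal, the E-step only needs E[x^2] = sigma0^2 for each missing value, and setting the derivative of the expected log-likelihood to zero gives the update used in the loop.

```python
import math

def em_sigma(observed, n_missing, sigma0, tol=1e-12, max_iter=10_000):
    """EM for the standard deviation of N(0, sigma) with some values missing."""
    n = len(observed) + n_missing
    sum_sq = sum(v * v for v in observed)
    var = sigma0 ** 2
    for _ in range(max_iter):
        # E-step: under the current estimate, each missing value has E[x^2] = var.
        # M-step: maximizing the expected log-likelihood gives
        #         sigma^2 = (sum of observed squares + n_missing * var) / n.
        new_var = (sum_sq + n_missing * var) / n
        converged = abs(new_var - var) < tol
        var = new_var
        if converged:
            break
    return math.sqrt(var)

sigma_hat = em_sigma([1.0, 2.0], n_missing=2, sigma0=1.0)
print(sigma_hat)
```

Under these assumptions the iteration settles at sigma^2 = (1 + 4) / 2, which is also what the observed data alone would give, so the example stays instructive rather than practical.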
This is explained so well
very good explanation!
Glad you think so!
Thanks for the great video! One question: if you have (1+2+x)/3 = x, then you have a closed-form solution, so why do you still need a numerical approach?
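The closed-form observation in the question above can be checked against the iteration itself. The sketch below assumes the toy setup from the comments (observations 1 and 2, one missing value) plus unit variance, in which case each EM round reduces to replacing the missing value by the current mean and re-averaging. `em_missing_mean` is a hypothetical name used only for illustration.

```python
def em_missing_mean(observed, n_missing=1, mu0=1.0, tol=1e-10, max_iter=1000):
    """EM for the mean of N(mu, 1) when n_missing data points are unobserved."""
    n = len(observed) + n_missing
    mu = mu0
    for iteration in range(1, max_iter + 1):
        # E-step: under N(mu, 1), the expected value of each missing point is mu.
        # M-step: the expected log-likelihood is maximized by the average of the
        #         observed points and the expected missing points.
        new_mu = (sum(observed) + n_missing * mu) / n
        if abs(new_mu - mu) < tol:
            return new_mu, iteration
        mu = new_mu
    return mu, max_iter

mu_hat, n_rounds = em_missing_mean([1.0, 2.0], n_missing=1, mu0=1.0)
print(mu_hat, n_rounds)
```

Solving (1 + 2 + x) / 3 = x by hand gives x = 1.5, and the loop approaches the same value, which is consistent with the point made elsewhere in the thread that EM is overkill for this toy case.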
Thanks so much Ritvik! Your videos are amazing. Do you have a playlist for machine learning that connects the dots between ML concepts? I see a playlist for data science but not for machine learning.
Thanks
On step 2, what does the dx do at the end of that equation?
this helped so much, thank you a lot!!
Thanks for the video! What was not clear to me is whether we calculate E(LL|M) for all Ms so that we can then compute the argmax in step 3?
Great video! Thank you so much
Glad it was helpful!
Sir, I know that in the E-step we estimate the unknown x, but you are calculating a likelihood. How are these connected?
What if x is high dimensional? How would the integral change?
Great teacher❤
Nice to see the theorem guaranteeing convergence for sequences that are increasing and bounded being used to prove this. I do have a more pragmatic question which is how somebody would go about finding the argmax in the M step. Would gradient descent be used on the expectation of log-likelihood function (I would imagine in this case the expectation of log-likelihood would have to be convex for this to work) to find the argmax?
Yep, you can use any optimization method. For Gaussian mixture models there are explicit formulas for the M step which are obtained in the usual way by setting the gradient of the expected log-likelihood to zero.
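The explicit M-step formulas mentioned in the reply above look roughly like this for a one-dimensional mixture. This is a sketch, not a production implementation: the function names, the fixed iteration count, the variance floor, and the synthetic data are all assumptions made for illustration.

```python
import math
import random

def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def gmm_em_1d(data, means, variances, weights, n_iter=50):
    """EM for a 1-D Gaussian mixture; returns updated (means, variances, weights)."""
    means, variances, weights = list(means), list(variances), list(weights)
    k = len(means)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            p = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: closed-form updates from setting the gradient of the
        # expected log-likelihood to zero.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-6)  # floor to avoid a collapsing component
    return means, variances, weights

# Synthetic data: two well-separated clusters (an illustrative assumption).
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(200)] + [rng.gauss(10.0, 1.0) for _ in range(200)]
means, variances, weights = gmm_em_1d(data, [1.0, 8.0], [1.0, 1.0], [0.5, 0.5])
print(means, variances, weights)
```

No gradient steps are needed here because each update is a weighted average; for models without such formulas, a generic optimizer would take the place of the inner M-step loop.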
Great explanation. However, the way you have written it, there is no difference between the likelihood function and the probability function. I think for clarity you should swap x,1,2 and \mu. Also you should use ; instead of | so that the likelihood function is not confused with conditional probability.
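For readers who want to see the suggested notation written out, here is one way the E and M steps might look with a semicolon separating data from parameters. The toy data (1, 2, and a missing x) come from the comments; the symbols \ell, Q, and \mu_1 are labels chosen here for illustration and may not match the video.

```latex
% Log-likelihood as a function of the parameter, data after the semicolon
\ell(\mu;\, 1, 2, x) = \log p(1, 2, x;\, \mu)

% E-step: average the log-likelihood over the missing value x,
% weighted by its density under the current guess \mu_0
Q(\mu;\, \mu_0) = \int p(x;\, \mu_0)\, \ell(\mu;\, 1, 2, x)\, dx

% M-step: the next guess maximizes that average
\mu_1 = \arg\max_{\mu}\, Q(\mu;\, \mu_0)
```

The dx simply marks x as the variable being integrated over, so the result depends on \mu and \mu_0 but no longer on x.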
Is the EM algorithm the best algorithm to use in some specific problem (compared for example to the gradient descent algorithm)?
Thanks a lot for the easy-to-understand intuition of the EM algorithm. Would you like to explain the coin-flip example along with your formulation of steps ② and ③?
The expression for Expectation seems similar to Bayesian theory where we have prior belief (P(x|u)) and likelihood and we are multiplying both to get posterior. Is this the same concept?
oh my god. this was so helpful
Awesome!
amazing, thanks for such a clear explanation :)
Great videos. Got it in one go! Could you do Gaussian Mixture Models? Thanks.
How would the problem change if we didn't know the variance either?
Incredible
A worked example of the final process would be invaluable.
Thank you Ritvik for your explanation! Does it work only for the normal distribution, or can we apply it to other kinds of distributions?