this is legitimately such a great explanation. thanks!
You're very welcome! 😊
There was an error on the hand-written M-Step in the beginning of the video. For the first 3 minutes I was able to overlay it. Please refer to this as the correct expression for the M-Step.
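For reference, assuming the standard univariate Gaussian Mixture Model setting, the usual M-step updates (with responsibilities r_nk coming from the E-step) read:

$$
\pi_k = \frac{1}{N}\sum_{n=1}^{N} r_{nk}, \qquad
\mu_k = \frac{\sum_{n=1}^{N} r_{nk}\, x_n}{\sum_{n=1}^{N} r_{nk}}, \qquad
\sigma_k^2 = \frac{\sum_{n=1}^{N} r_{nk}\,(x_n - \mu_k)^2}{\sum_{n=1}^{N} r_{nk}}
$$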
How are you sure that the zero points of Q are maxima? Couldn't they be saddle points or minima as well? Or did you just skip the part where you have to check the second derivatives?
That's a great question! :)
There was no specific check for this in the video. For theoretical investigations, you can consult the following paper: ua-cam.com/video/Vj4b4xojPMw/v-deo.html
Pragmatically though, EM is often observed to behave robustly if well initialized. Since the runtime of an EM fit is usually quite fast (compared to other ML methods), it is reasonable to start from multiple initial conditions and select the model with the best score (or otherwise best properties). For instance, check out this follow-up video on sieving: ua-cam.com/video/Vj4b4xojPMw/v-deo.html
I can also recommend the documentation of scikit-learn: scikit-learn.org/stable/modules/mixture.html
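As a minimal sketch of this multiple-restart ("sieving") idea with scikit-learn's GaussianMixture (the synthetic data and parameter values here are just for illustration):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic 1-D data drawn from two Gaussians (illustrative only)
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(-2.0, 0.5, 300),
                    rng.normal(3.0, 1.0, 700)]).reshape(-1, 1)

# n_init runs EM from several random initializations and keeps the
# fit with the highest log-likelihood, which is the "sieving" idea.
gmm = GaussianMixture(n_components=2, n_init=10, random_state=0)
gmm.fit(X)

print(gmm.weights_, gmm.means_.ravel(), gmm.covariances_.ravel())
```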
Is it possible to have the Gaussian distributions be latent and have the class be non-latent? Basically the continuous variable is latent now? What would this look like?
That's a valid question, but it is rather uncommon to do it in practice. At least, I haven't seen it. What would be your application?
In my understanding, the EM algorithm works best for Mixture Distributions, which have a latent discrete part (the Categorical distribution) and a conditioned, observed continuous part (which could also be a distribution other than the Normal/Gaussian, but commonly the Gaussian Mixture Model is used).
However, generally speaking, you can build any DGM you like. It is just that many DGMs come with huge difficulties in training them. A more general way of training DGMs is Variational Inference (ua-cam.com/video/dxwVMeK988Y/v-deo.html ) or MCMC (no video yet), which can also handle scenarios that EM cannot. In fact, the EM algorithm is identical to Variational Inference if we can express the posterior analytically, which we can for GMMs.
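For reference, this closed-form posterior (the responsibility computed in the E-step of a GMM) is

$$
p(z_n = k \mid x_n) = \frac{\pi_k \, \mathcal{N}(x_n \mid \mu_k, \sigma_k^2)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_n \mid \mu_j, \sigma_j^2)}.
$$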
But again, regarding your proposal, I do not think it would make a lot of sense to have the latent variable be the leaf node in the DGM. The way I understand latent variables is that you use them to model an unobserved cause of something, not an unobserved effect.
11:30 Isn't it a lower bound of the marginal log-likelihood instead?
Hey, are you referring to the Q function?
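For context, in the standard EM derivation the Q function is the expected complete-data log-likelihood under the current posterior; it differs from the lower bound on the marginal log-likelihood only by an entropy term that does not depend on θ:

$$
\ln p(x \mid \theta) \;\ge\; \underbrace{\mathbb{E}_{p(z \mid x, \theta^{\text{old}})}\!\left[\ln p(x, z \mid \theta)\right]}_{Q(\theta, \theta^{\text{old}})} \;+\; \mathcal{H}\!\left[p(z \mid x, \theta^{\text{old}})\right]
$$

so maximizing Q over θ also pushes up that lower bound.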
Hi, what about the EM algorithm for one bivariate Gaussian with missing values?
Hey, I answered your similar comment on the other video. Was it referring to the same thing?
@MachineLearningSimulation Yes, thank you.
How about the syntax in R if we want to apply this to a survival mixture model?
Thanks for the comment. :)
Unfortunately, I am not familiar with survival mixture models.
Just wondering: could such an EM approach work well in cases where X is high-dimensional?
Yes, though that of course depends on what counts as "high-dimensional". But in a reasonable number of dimensions (2 to 100-ish) you can use EM for the Gaussian Mixture Model where the Gaussians are multivariate.
This introduces additional degrees of freedom, e.g. choosing a full covariance versus just a diagonal one (see the sketch below).
I will cover this in the future once I have also introduced the Multivariate Normal in my other playlist. Stay tuned for that ;)
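In the meantime, here is a minimal sketch (with made-up 2-D data) of how the covariance structure can be chosen for a multivariate GMM in scikit-learn:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Hypothetical 2-D data: two clusters with different covariance shapes
X = np.vstack([
    rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], 500),
    rng.multivariate_normal([4.0, 4.0], [[0.5, 0.0], [0.0, 2.0]], 500),
])

# 'full' estimates a complete covariance matrix per component,
# 'diag' restricts each component to a diagonal covariance,
# trading flexibility for fewer parameters in higher dimensions.
for cov_type in ["full", "diag"]:
    gmm = GaussianMixture(n_components=2, covariance_type=cov_type,
                          random_state=0).fit(X)
    print(cov_type, gmm.bic(X))
```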