Expectation Maximization for the Gaussian Mixture Model | Full Derivation

  • Published 15 Nov 2024

COMMENTS • 16

  • @agrawal.akash9702 · 8 months ago · +1

    this is legitimately such a great explanation. thanks!

  • @MachineLearningSimulation · 3 years ago · +1

    There was an error in the hand-written M-step at the beginning of the video. For the first 3 minutes, I was able to overlay a correction. Please refer to that as the correct expression for the M-step.
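
    For reference, the standard textbook form of these updates for a Gaussian Mixture Model, with responsibilities r_{nk} from the E-step, is written out below; the notation may differ slightly from the overlay shown in the video.

      r_{nk} = \frac{\pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)}, \qquad N_k = \sum_{n=1}^{N} r_{nk}

      \pi_k^{\text{new}} = \frac{N_k}{N}, \qquad \mu_k^{\text{new}} = \frac{1}{N_k} \sum_{n=1}^{N} r_{nk} \, x_n, \qquad \Sigma_k^{\text{new}} = \frac{1}{N_k} \sum_{n=1}^{N} r_{nk} \, (x_n - \mu_k^{\text{new}})(x_n - \mu_k^{\text{new}})^\top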

  • @patrickg.3602 · 1 year ago · +1

    How are you sure that the zero points of Q are maxima? Couldn't they be saddle points or minima as well? Or did you just skip the part where you have to check the second derivatives?

    • @MachineLearningSimulation · 1 year ago

      That's a great question! :)
      There was no specific check for this in the video. For theoretical investigations, you can consult the following paper: ua-cam.com/video/Vj4b4xojPMw/v-deo.html
      Pragmatically, though, EM is often observed to behave robustly if well initialized. Since the runtime of an EM fit is usually quite fast (compared to other ML methods), it is reasonable to start from multiple initial conditions and select the model with the best score (or otherwise best properties); a small scikit-learn sketch of this restart idea is included below this reply. For instance, check out this follow-up video on sieving: ua-cam.com/video/Vj4b4xojPMw/v-deo.html
      I can also recommend the documentation of scikit-learn: scikit-learn.org/stable/modules/mixture.html
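
      As a rough illustration of the multiple-initial-conditions idea mentioned above (not the exact sieving procedure from the follow-up video), scikit-learn's GaussianMixture can restart EM several times via the n_init parameter and keep the best-scoring fit; the toy data here are hypothetical:

        import numpy as np
        from sklearn.mixture import GaussianMixture

        # Toy data: two well-separated 2-D clusters (hypothetical example data).
        rng = np.random.default_rng(0)
        X = np.vstack([
            rng.normal(loc=[-3.0, 0.0], scale=0.7, size=(200, 2)),
            rng.normal(loc=[+3.0, 0.0], scale=0.7, size=(300, 2)),
        ])

        # n_init=10 runs EM from 10 different initializations and keeps
        # the model that achieves the best lower bound on the log-likelihood.
        gmm = GaussianMixture(n_components=2, n_init=10, random_state=0)
        gmm.fit(X)

        print(gmm.weights_)      # mixture weights pi_k
        print(gmm.means_)        # component means mu_k
        print(gmm.lower_bound_)  # lower bound achieved by the best fit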

  • @nickelandcopper5636 · 2 years ago · +1

    Is it possible to have the Gaussian distributions be latent and have the class be non-latent? Basically the continuous variable is latent now? What would this look like?

    • @MachineLearningSimulation · 2 years ago

      That's a valid question, but it is rather uncommon to do it in practice. At least, I haven't seen it. What would be your application?
      In my understanding, the EM algorithm works best for mixture distributions, which have a latent discrete part (the Categorical distribution) and a conditioned, observed continuous part (which could also be a distribution other than the Normal/Gaussian, but commonly it is the Gaussian Mixture Model); a minimal sketch of this structure is included below this reply.
      However, generally speaking, you can build any DGM you like. It is just that many DGMs come with huge difficulties in training them. A more general way of training DGMs is Variational Inference (ua-cam.com/video/dxwVMeK988Y/v-deo.html ) or MCMC (no video yet), which can also handle scenarios that EM cannot. In fact, the EM algorithm is identical to Variational Inference if we can analytically express the posterior, which we can for GMMs.
      But again, regarding your proposal, I do not think it would make a lot of sense to have the latent variable be the leaf node in the DGM. The way I understand latent variables is that you use them to model an unobserved cause of something, not an unobserved effect.
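
      To make the "latent discrete cause, observed continuous effect" structure concrete, here is a minimal NumPy sketch of EM for a 1-D Gaussian Mixture Model; it is illustrative only, and the variable names and toy data are assumptions of mine, not taken from the video:

        import numpy as np

        def em_gmm_1d(x, K=2, n_iter=50, seed=0):
            """Minimal EM for a 1-D GMM: latent class z_n (discrete), observed x_n (continuous)."""
            rng = np.random.default_rng(seed)
            N = x.shape[0]
            # Initialization
            pi = np.full(K, 1.0 / K)                   # mixture weights
            mu = rng.choice(x, size=K, replace=False)  # component means
            var = np.full(K, x.var())                  # component variances

            for _ in range(n_iter):
                # E-step: responsibilities r[n, k] = p(z_n = k | x_n)
                log_prob = (-0.5 * np.log(2 * np.pi * var)
                            - 0.5 * (x[:, None] - mu) ** 2 / var)
                log_r = np.log(pi) + log_prob
                log_r -= log_r.max(axis=1, keepdims=True)
                r = np.exp(log_r)
                r /= r.sum(axis=1, keepdims=True)

                # M-step: re-estimate parameters from responsibilities
                Nk = r.sum(axis=0)
                pi = Nk / N
                mu = (r * x[:, None]).sum(axis=0) / Nk
                var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / Nk

            return pi, mu, var

        # Toy usage: two overlapping 1-D clusters
        rng = np.random.default_rng(1)
        x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 0.5, 200)])
        print(em_gmm_1d(x, K=2))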

  • @vslaykovsky · 2 years ago · +1

    At 11:30, isn't it a lower bound on the marginal log-likelihood instead?

  • @bartosz5592 · 3 years ago · +1

    Hi, what about the EM algorithm for a single bivariate Gaussian with missing values?

    • @MachineLearningSimulation · 2 years ago

      Hey, I answered your similar comment on the other video. Was it referring to the same?

    • @bartosz5592 · 2 years ago

      @MachineLearningSimulation yes, thank you

  • @sulasrisuddin294 · 1 year ago

    How about the syntax in R if we want to apply this to a survival mixture model?

  • @user-or7ji5hv8y · 3 years ago

    Just wondering: could such an EM approach work well in cases where X is high-dimensional?

    • @MachineLearningSimulation · 3 years ago · +1

      Yes, though of course that depends on how "high-dimensional". For a reasonable number of dimensions (2 to 100-ish), you can use EM for the Gaussian Mixture Model where the Gaussians are multivariate.
      This introduces additional degrees of freedom, e.g. choosing a full covariance matrix or just a diagonal one; a small illustration follows below this reply.
      I will cover this in the future once I have also introduced the Multivariate Normal in my other playlist. Stay tuned for that ;)
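
      As a small illustration of that degree of freedom (again using scikit-learn rather than anything from the video, and with hypothetical toy data), the covariance structure is just a constructor argument:

        import numpy as np
        from sklearn.mixture import GaussianMixture

        # Hypothetical 10-dimensional data with 3 clusters.
        rng = np.random.default_rng(0)
        X = np.vstack([rng.normal(loc=c, scale=1.0, size=(200, 10)) for c in (-4.0, 0.0, 4.0)])

        # 'full' fits a complete covariance matrix per component;
        # 'diag' restricts each component to a diagonal covariance,
        # which scales much better as the dimension grows.
        for cov_type in ("full", "diag"):
            gmm = GaussianMixture(n_components=3, covariance_type=cov_type, random_state=0)
            gmm.fit(X)
            print(cov_type, "covariances shape:", gmm.covariances_.shape)
        # Expected shapes: full -> (3, 10, 10), diag -> (3, 10)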