Gaussian Mixture Model | Intuition & Introduction | TensorFlow Probability

  • Published 2 Jun 2024
  • GMMs are used for clustering data or as generative models. Let's start building an understanding by looking at a one-dimensional (1D) example. Here are the notes: raw.githubusercontent.com/Cey...
    If your (univariate) distribution has more than one mode (peak), there is a good chance you can model it with a Gaussian Mixture Model (GMM), a mixture distribution of Gaussians/Normals. That is helpful for a soft clustering of points in one dimension. For this, you select the number of modes you expect (= the number of peaks). This then corresponds to the number of (latent) classes as well as the number of Gaussians that have to be defined.
    In this video, I provide an intuition for this by looking at the grade distribution after an exam, with a first peak at 2.5 and a second peak at the grade corresponding to a fail. We will implement this model in TensorFlow Probability (a minimal code sketch follows the timestamps below).
    -------
    📝 : Check out the GitHub Repository of the channel, where I upload all the handwritten notes and source-code files (contributions are very welcome): github.com/Ceyron/machine-lea...
    📢 : Follow me on LinkedIn or Twitter for updates on the channel and other cool Machine Learning & Simulation stuff: / felix-koehler and / felix_m_koehler
    💸 : If you want to support my work on the channel, you can become a patron here: / mlsim
    -------
    ⚙️ My Gear:
    (Below are affiliate links to Amazon. If you decide to purchase the product or something else on Amazon through this link, I earn a small commission.)
    - 🎙️ Microphone: Blue Yeti: amzn.to/3NU7OAs
    - ⌨️ Logitech TKL Mechanical Keyboard: amzn.to/3JhEtwp
    - 🎨 Gaomon Drawing Tablet (similar to a WACOM Tablet, but cheaper, works flawlessly under Linux): amzn.to/37katmf
    - 🔌 Laptop Charger: amzn.to/3ja0imP
    - 💻 My Laptop (generally I like the Dell XPS series): amzn.to/38xrABL
    - 📱 My Phone: Fairphone 4 (I love the sustainability and repairability aspect of it): amzn.to/3Jr4ZmV
    If I had to purchase these items again, I would probably change the following:
    - 🎙️ Rode NT: amzn.to/3NUIGtw
    - 💻 Framework Laptop (I do not get a commission here, but I love the vision of Framework. It will definitely be my next Ultrabook): frame.work
    As an Amazon Associate I earn from qualifying purchases.
    -------
    Timestamps:
    00:00 Introduction
    00:38 A Multi-Modal Distribution
    01:10 Clustering of Points
    02:04 A Superposition of Gaussians?
    03:59 Using Mixture Coefficients
    05:05 A special case of Mixture Distributions
    05:33 The Directed Graphical Model
    07:52 Alternative Model with plates
    08:45 The joint
    10:28 TFP: Defining the Parameters
    11:27 TFP: The Categorical
    12:12 TFP: The batched Normal
    13:13 TFP: GMM in Principle
    14:13 TFP: Using the TFP Mixture Distribution
    15:15 TFP: Plotting the probability density
    17:05 Outro
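    -------
    A minimal TensorFlow Probability sketch of the two-component 1D GMM described above (my own illustration: the mixture weights, means, and standard deviations are assumed values rather than the ones from the video, and I use tfd.MixtureSameFamily, one of the ways TFP lets you build a mixture):

    import tensorflow_probability as tfp
    tfd = tfp.distributions

    gmm = tfd.MixtureSameFamily(
        mixture_distribution=tfd.Categorical(probs=[0.7, 0.3]),  # mixture coefficients pi_0, pi_1
        components_distribution=tfd.Normal(
            loc=[2.5, 5.0],    # one peak at grade 2.5, one at the failing grade (assumed)
            scale=[0.6, 0.4],  # spread of each component (assumed)
        ),
    )

    samples = gmm.sample(1000)  # draw synthetic grades
    density = gmm.prob(2.5)     # evaluate the mixture density at a grade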

COMMENTS • 15

  • @karanshah1698
    @karanshah1698 1 year ago +1

    Marginalizing the value conveyed by *your* playlist, to *my* understanding of this subject, is intractable. And if what I said is even remotely sensible, per the rules of probability, the whole credit goes to you.

  • @harshitjuneja7768
    @harshitjuneja7768 1 year ago +1

    Thanks thousands!!

  • @Breeze1208
    @Breeze1208 1 month ago

    Sorry, why is the categorical distribution P(Z) = Cat(pi) equal to the product of all the pi[0], pi[1], ...?

  • @saikeerthi5673
    @saikeerthi5673 2 years ago +2

    I've been following your channel for a while and you've really helped me understand complicated probability concepts, thank you! One question: I didn't understand how the z variable is a latent one. Why can't it just be a parameter?

    • @MachineLearningSimulation
      @MachineLearningSimulation  2 years ago +1

      First of all: thanks for the feedback :). I am super glad I could help!
      I think it wouldn't make sense to have it as a parameter here. A parameter, at least in my understanding, is an adjustable value that defines the distribution of a random variable. Each data-point you observe (and that you want to cluster) consists (at least under the assumptions of a Gaussian Mixture Model) of a class and a position. Both are random variables, meaning that a data-point does not deterministically belong to one class or one position. Instead, there is a probability associated with the potential classes and the potential positions in the observed space. The class variable is considered latent because, in the task of clustering, we do not know to which class a certain point belongs. Certainly, if we did a scatter plot, we could use our "eye-norm" to figure this out. But we would rather have a more probabilistic/mathematical treatment, because a point could also belong to a not-so-obvious class and just be an unlikely draw far from that cluster's center.
      I hope that points you in the right direction. Please ask a follow-up question if something remains unclear.

    • @saikeerthi5673
      @saikeerthi5673 2 years ago +1

      @@MachineLearningSimulation That makes sense, thank you! I got confused by the nature of the latent variable, because we infer its value from the data, similar to how we model the distribution.

  • @nickelandcopper5636
    @nickelandcopper5636 2 years ago +1

    Hey, another great video! Is the GMM PDF you show at the end normalized? Thanks!

    • @MachineLearningSimulation
      @MachineLearningSimulation  2 years ago

      Hey, thanks again :)
      Are you referring to what is shown in TensorFlow Probability? If so, then yes. Since it is implemented as a Mixture distribution in TFP, the density integrates to 1 over the domain of possible values, which is the condition for normalization (a small numerical check is sketched below).
      Is that what you were asking?
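      A rough numerical check (my own sketch, not from the video) that such a mixture density integrates to approximately 1; the stand-in parameters below are assumptions for illustration only:

      import numpy as np
      import tensorflow_probability as tfp
      tfd = tfp.distributions

      # Stand-in two-component mixture (assumed parameters)
      gmm = tfd.MixtureSameFamily(
          mixture_distribution=tfd.Categorical(probs=[0.7, 0.3]),
          components_distribution=tfd.Normal(loc=[2.5, 5.0], scale=[0.6, 0.4]),
      )

      xs = np.linspace(-5.0, 15.0, 10_001)           # grid covering the bulk of the probability mass
      pdf = gmm.prob(xs.astype(np.float32)).numpy()  # density evaluated on the grid
      print(np.trapz(pdf, xs))                       # should print a value close to 1.0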

  • @user-or7ji5hv8y
    @user-or7ji5hv8y 3 years ago +1

    Is there a connection between using mixture coefficients to take a linear combination of two Gaussians and the DGM approach, where the parameters of the Gaussian change conditioned on the category? They seem like two very different approaches to arrive at the probability. (Let me know if my question is not clear.) Thanks.

    • @MachineLearningSimulation
      @MachineLearningSimulation  3 years ago +2

      Thanks for the question :)
      I would view it from two perspectives:
      1) Calculating the likelihood/probability density of one sample. Imagine you have the GMM with two classes from the video, and you have a sample, let's say at X=3.0. Now, in order to get the probability of X, we have to marginalize over the latent class, because we don't know the class it belongs to. Hence
      p(X=3.0) = pi_0 * N(X=3.0; mu_0, sigma_0) + pi_1 * N(X=3.0; mu_1, sigma_1)
      In general, we would of course have the summation symbol, but since there are only two contributions in the sum, I wrote it out explicitly. This is of course a mixture, and you could also call it a linear combination of Normal distributions.
      2) Sampling the GMM: Here you would first sample a latent class, then use the corresponding Normal to sample a point, and then "throw away the latent class because it is not observed". Also take a look at this video after 14:10 ua-cam.com/video/kMGjXVb8OzM/v-deo.html where I first do this process manually and then use TensorFlow Probability's built-in Mixture Distribution. (A small code sketch of both views follows below.)
      A remark: If you have a special case of a Mixture Model in which you observe the class (i.e. it is not latent), then you would evaluate the joint p(Z, X) instead of the marginal. "If you have more information, then you should of course also use it." I think this could refer to the second case you mentioned. However, since we commonly use a GMM for clustering, in which we of course don't know the class, this is not seen much in practice.
      I hope that helped :) Let me know if something was unclear.
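      A small illustrative sketch of both views (the parameter values are my own assumptions, not the ones from the video):

      import numpy as np
      import tensorflow_probability as tfp
      tfd = tfp.distributions

      pi = np.array([0.7, 0.3], dtype=np.float32)     # mixture coefficients
      mu = np.array([2.5, 5.0], dtype=np.float32)     # component means
      sigma = np.array([0.6, 0.4], dtype=np.float32)  # component standard deviations

      # 1) Marginal density of one sample: sum over the latent class
      x = 3.0
      p_x = sum(
          pi[k] * tfd.Normal(loc=mu[k], scale=sigma[k]).prob(x).numpy()
          for k in range(2)
      )

      # 2) Ancestral sampling: first draw the latent class, then the position,
      #    then discard the class because it is not observed
      z = tfd.Categorical(probs=pi).sample().numpy()
      x_sample = tfd.Normal(loc=mu[z], scale=sigma[z]).sample().numpy()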

  • @engenglish610
    @engenglish610 3 years ago +1

    Thanks for this video. Can you make a video about the multivariate case?

    • @MachineLearningSimulation
      @MachineLearningSimulation  3 years ago +1

      Hey, thanks for the feedback ☺️
      Yes, that's already planned. I think it will go online in 3 to 4 weeks.

    • @MachineLearningSimulation
      @MachineLearningSimulation  2 years ago +2

      Unfortunately, I overestimated my video output :D So it took me a little longer, but here is the continuation for the Multivariate Case: ua-cam.com/video/iqCfZEsNehQ/v-deo.html
      The videos on the EM derivation and its implementation in Python will follow.