What is a latent variable?

  • Published 30 Apr 2024
  • What is the difference between random variables that you can observe and those that you cannot? The latter are also called latent or hidden. How do you represent them in Directed Graphical Models? Here are the notes: raw.githubusercontent.com/Cey...
    Latent nodes introduce the problem of missing data. This results in more complicated likelihood calculations, for which we have to find a remedy in another video (a minimal numeric sketch of the marginalization idea follows after the timestamps below).
    -------
    📝 : Check out the GitHub Repository of the channel, where I upload all the handwritten notes and source-code files (contributions are very welcome): github.com/Ceyron/machine-lea...
    📢 : Follow me on LinkedIn or Twitter for updates on the channel and other cool Machine Learning & Simulation stuff: / felix-koehler and / felix_m_koehler
    💸 : If you want to support my work on the channel, you can become a Patreon here: / mlsim
    -------
    ⚙️ My Gear:
    (Below are affiliate links to Amazon. If you decide to purchase the product or something else on Amazon through this link, I earn a small commission.)
    - 🎙️ Microphone: Blue Yeti: amzn.to/3NU7OAs
    - ⌨️ Logitech TKL Mechanical Keyboard: amzn.to/3JhEtwp
    - 🎨 Gaomon Drawing Tablet (similar to a WACOM Tablet, but cheaper, works flawlessly under Linux): amzn.to/37katmf
    - 🔌 Laptop Charger: amzn.to/3ja0imP
    - 💻 My Laptop (generally I like the Dell XPS series): amzn.to/38xrABL
    - 📱 My Phone: Fairphone 4 (I love the sustainability and repairability aspect of it): amzn.to/3Jr4ZmV
    If I had to purchase these items again, I would probably change the following:
    - 🎙️ Rode NT: amzn.to/3NUIGtw
    - 💻 Framework Laptop (I do not get a commission here, but I love the vision of Framework. It will definitely be my next Ultrabook): frame.work
    As an Amazon Associate I earn from qualifying purchases.
    -------
    Timestamps:
    0:00 Opening
    0:18 Observable Nodes
    2:27 Latent Nodes
    4:00 Problem with latent nodes
    5:24 Solution by Marginalization?
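    -------
    A minimal numeric sketch of the marginalization idea mentioned above and at 5:24 (the node names w and T follow the video's example; the probability tables are made up for illustration and are not code from the repository): the likelihood of the observed node alone is obtained by summing the joint over the latent node.

    ```python
    import numpy as np

    # Two-node model w -> T: w is latent (never observed), T is observed.
    # The numbers are hypothetical, purely for illustration.
    p_w = np.array([0.6, 0.4])                 # prior p(w), two hidden states
    p_t_given_w = np.array([[0.7, 0.2, 0.1],   # p(T | w=0)
                            [0.1, 0.3, 0.6]])  # p(T | w=1)

    # DGM factorization of the joint: p(w, T) = p(w) * p(T | w)
    p_joint = p_w[:, None] * p_t_given_w

    # Marginalization: p(T) = sum_w p(w, T), needed because w is missing data.
    p_t = p_joint.sum(axis=0)
    print(p_t)  # [0.46 0.24 0.3 ]
    ```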

COMMENTS • 11

  • @indiajackson5959
    @indiajackson5959 3 months ago +1

    This is really an excellent video! The best examples I've seen on latent variables!

  • @mohamadroghani1470
    @mohamadroghani1470 2 years ago +2

    Awesome videos...

  • @Omar-rc4li
    @Omar-rc4li 2 years ago +2

    New to this and I have a question: why do we have to calculate the joint P(T,w) in the first place?
    Is it because we need to infer something about 'w', but 'w' itself depends on 'T'? In that case we have a conditional probability to solve in order to infer, i.e. P(w|T).
    We know P(w|T) = P(w,T)/P(T), so here we have the joint P(w,T) which we need to solve. Is this reasoning correct for understanding why we need to solve the joint distribution P(w,T) in the first place?
    Somewhere I saw that it was the other way round, i.e. we need to calculate P(T|w), but that does not intuitively fit, because in the diagram you made the arrow goes from 'w' to 'T', so 'T' depends on 'w'...?

    • @MachineLearningSimulation
      @MachineLearningSimulation  2 years ago +1

      Thanks for the comment. I can understand the confusion. :)
      Your confusion is most certainly related to the differences between generative and discriminative models. See e.g. here: stackoverflow.com/questions/879432/what-is-the-difference-between-a-generative-and-a-discriminative-algorithm
      I will elaborate a bit on this, trying to frame it in the context of these videos. If I got you wrong and this was not the point of the comment, let me know :) I would be happy to help.
      Generally speaking, one creates Directed Graphical Models/Bayesian Networks with latent and observable nodes in order to model something that is either more abstract or pretty close to reality (an example of the latter is a model to predict Covid cases).
      The DGM does not necessarily dictate what you want to use it for later on. It is just a way to factorize a joint distribution and is helpful for querying the likelihood of a data point.
      A common application for DGMs is inference, which means predicting the probability distribution over the latent variables (or only one of the potentially many latent variables) given observed data. Here, you of course need the posterior (in our example p(W|T)), as you correctly noted. There are multiple ways to obtain the posterior from the joint: sometimes it is possible to find a closed-form solution, as in Mixture Models (e.g. GMM); sometimes you would prefer Variational Inference (also take a look at my video on that topic); or you just use MCMC to obtain statistics on the posterior distribution. For all of these, you use your knowledge of the joint distribution to obtain either the full posterior, a surrogate, or just some statistics on it.
      However, inference is not the only task in Machine Learning. You could use a generative model (i.e. the joint) to generate new data. Think for instance of a generative model of celebrity faces. You could use it to generate previously unseen faces.
      Additionally, a joint distribution offers way more insight and flexibility. Think of very complex models with multiple (groups of) latent variables. Sometimes, you might be interested in posteriors over a subset of them. You cannot do this (or at least not as easily) if you only model one particular posterior, as in a discriminative model.
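      Maybe a small numeric illustration of "obtaining the posterior from the joint" in a fully discrete case (same made-up tables as in the description's sketch, not the channel's code): the posterior over the latent node is just the slice of the joint at the observed value, re-normalized.

      ```python
      import numpy as np

      # Illustrative two-node model w -> T with made-up tables.
      p_w = np.array([0.6, 0.4])
      p_t_given_w = np.array([[0.7, 0.2, 0.1],
                              [0.1, 0.3, 0.6]])
      p_joint = p_w[:, None] * p_t_given_w  # p(w, T) = p(w) * p(T | w)

      # Bayes' rule from the joint: p(w | T=t) = p(w, T=t) / sum_w' p(w', T=t)
      t_observed = 2
      posterior = p_joint[:, t_observed] / p_joint[:, t_observed].sum()
      print(posterior)  # [0.2 0.8]
      ```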

    • @knowledgedistiller
      @knowledgedistiller 1 year ago +1

      @@MachineLearningSimulation So if I understand correctly, we want to compute the joint distribution rather than the conditional distribution p(z|x) (z is latent, x is a datapoint), since it is much more flexible: if you have p(x,z), then you can use Bayes' rule to get p(z|x), i.e. a discriminative model. Or you can generate new data as you said, I think by sampling points of high probability p(x, z=K), where K is a fixed value.
      Correct me if I'm wrong. In essence, DGMs represent a joint probability, the most general distribution over all random variables involved, which can be used to compute more specific distributions (posterior, likelihood). DGMs represent a joint probability because they tell us how to factor a joint distribution, for example, p(x,z) = p(x|z) * p(z) if x depends on z.
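      As a side note on generating new data from the joint: a common approach is ancestral sampling, i.e. drawing each node from its conditional distribution given the already-sampled parents. A rough sketch with made-up discrete tables (not from the video):

      ```python
      import numpy as np

      # Ancestral sampling from the joint p(x, z) = p(z) * p(x | z)
      # (hypothetical discrete tables, purely illustrative).
      rng = np.random.default_rng(0)

      p_z = np.array([0.6, 0.4])
      p_x_given_z = np.array([[0.7, 0.2, 0.1],
                              [0.1, 0.3, 0.6]])

      def sample_xz():
          z = rng.choice(len(p_z), p=p_z)                          # z ~ p(z)
          x = rng.choice(p_x_given_z.shape[1], p=p_x_given_z[z])   # x ~ p(x | z)
          return x, z

      print([sample_xz() for _ in range(5)])
      ```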

    • @MachineLearningSimulation
      @MachineLearningSimulation  1 year ago

      @@knowledgedistiller Yes, you are correct. DGMs are the most flexible ways to model probabilistic relations by providing a factorization of the joint distribution over all involved random variables.
      Maybe as a side note: despite joint distributions being the most flexible, that does not necessarily mean they are the most efficient. For some applications, it might be more reasonable to just model certain specific distributions, like posteriors, as in discriminative models.

  • @UpperM3
    @UpperM3 11 months ago

    I can't believe that you have just spelled "happyness" with a "Y"...

    • @MachineLearningSimulation
      @MachineLearningSimulation  11 months ago +2

      Thanks for spotting the typo. :)
      English is not my mother tongue, so I tend to make mistakes from time to time.

    • @UpperM3
      @UpperM3 11 months ago

      @@MachineLearningSimulation It's okay, I actually liked your explanation; it helped me study for my exam, thank you. Just pay attention to basic typos.