Factor Analysis and Probabilistic PCA

  • Published 31 May 2024
  • The machine learning consultancy: truetheta.io
    Want to work together? See here: truetheta.io/about/#want-to-w...
    Factor Analysis and Probabilistic PCA are classic methods to capture how observations 'move together'.
    SOCIAL MEDIA
    LinkedIn : / dj-rich-90b91753
    Twitter : / duanejrich
    Enjoy learning this way? Want me to make more videos? Consider supporting me on Patreon: / mutualinformation
    SOURCES
    [1] was my primary source since it provides the algorithm used in Scikit-Learn's Factor Analysis software (which is what I use). Since it walks through the derivation of the fitting procedure, it is quite technical. Ultimately, that level of detail came in handy for this video.
    [2] and [4] were my go-to sources for Probabilistic PCA. A primary reason is that Christopher Bishop is one of the originators of PPCA, so they come with a lot of thoughtful motivation for the approach. The discussion there includes a lot of advantages of PPCA over PCA.
    [3] was my refresher on this subject when I first decided to make this video. Like many of us, I'm a fan of Andrew Ng, so I was curious how he'd explain the subject. He emphasized that this model is particularly useful in high-dimension, low-data environments - something I carry forward in this video.
    [5] is an excellent overview of FA and PPCA (as long as you're comfortable with linear algebra and probability). In fact, Kevin Murphy's entire book is like that for every subject and that's why it's my absolute favorite text.
    ---------------------------
    [1] D. Barber, Bayesian Reasoning and Machine Learning, Cambridge University Press, 2012
    [2] C. Bishop, Pattern Recognition and Machine Learning, Springer, 2006
    [3] A. Ng, "Lecture 15 - EM Algorithm & Factor Analysis | Stanford CS229: Machine Learning (Autumn 2018)", • Lecture 15 - EM Algori...
    [4] M. Tipping and C. Bishop, "Mixtures of Probabilistic Principal Component Analysers", MIT Press, 1999
    [5] K. P. Murphy, Probabilistic Machine Learning (Second Edition), MIT Press, 2021
    CONTENTS
    0:00 Intro
    0:21 The Problem Factor Analysis Solves
    2:27 Factor Analysis Visually
    5:52 The Factor Analysis Model
    10:56 Fitting a Factor Analysis Model
    14:13 Probabilistic PCA
    15:43 Why is it Probabilistic "PCA"?
    16:59 The Optimal Noise Variance
    TOOLS
    If you'd like to apply Factor Analysis, I'd recommend Scikit-Learn: scikit-learn.org/stable/modul...
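    A minimal sketch of how you might call it (the data here is just a random placeholder; FactorAnalysis, n_components, components_ and noise_variance_ are the names scikit-learn uses):

    import numpy as np
    from sklearn.decomposition import FactorAnalysis

    X = np.random.randn(500, 10)           # placeholder data: 500 observations, 10 dimensions
    fa = FactorAnalysis(n_components=3)    # request L = 3 latent factors
    Z = fa.fit_transform(X)                # posterior mean of the factors per row, shape (500, 3)
    print(fa.components_.shape)            # the loading matrix W as sklearn stores it, shape (3, 10)
    print(fa.noise_variance_)              # the diagonal of Psi, one noise variance per dimension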

COMMENTS • 90

  • @sasakevin3263
    @sasakevin3263 1 year ago +32

    The only reason this guy's videos didn't go viral is that only 0.01% of the audience is interested in such complex statistics and formulas. But what he made is really awesome!

    • @Mutual_Information
      @Mutual_Information  1 year ago +5

      That 0.01% are the cool kids - that's who I'm going for!

    • @ruizhezang2657
      @ruizhezang2657 10 months ago

      @@Mutual_Information awesome job! Best video in this area I've ever watched!

  • @pifibbi
    @pifibbi 2 years ago +20

    Please don't stop making these!

  • @MikeOxmol_
    @MikeOxmol_ 2 years ago +21

    It's criminal that you don't have at least 50k subs. Please don't stop making videos, even though they don't have that many views right now, there are people like me who appreciate the videos very much. Certain topics can seem very daunting when you read about them, especially in such "dense" books as Bishop's PRML or Murphy's PML. However, if I start digging into a topic by watching your video and only then do I read the chapter, the ideas seem to connect more easily and I have to spend less time until it "clicks" if you know what I mean.
    On another note, if you're looking for ideas for future vids (which I'm sure you already have plenty of), Variational Inference would be a cool topic

    • @Mutual_Information
      @Mutual_Information  2 years ago +4

      Thanks, that's extremely nice of you! Yea this channel is for people like you/me, who want to understand those intense details in those books. I know I would have loved a channel like this if it was around when I was learning. I'm glad it's doing that job for you.
      And yes VI is coming! Thanks for your support! And please don’t hesitate to share the channel with people doing the same as you :)

  • @divine7470
    @divine7470 2 years ago +8

    Thanks for covering this topic. I learned about FA and PCA and how to use them in bootcamps, but the way you dive into the internals makes it so easily digestible.

  • @Nightfold
    @Nightfold 2 years ago +4

    This sheds some light on what I'm doing with PPCA, but I still deeply resent my lack of training in statistics during my degree.

  • @mainakbiswas2584
    @mainakbiswas2584 6 months ago +1

    Had been looking for this piece of information for quite a long time. I understood FA by sort of re-discovering it after seeing the sklearn documentation. From that point onward I wanted to know why it related to PCA. This gave me the intuition and the resources to look into. ❤❤❤

  • @enx1214
    @enx1214 1 year ago +1

    True old-school techniques, still the best - I've been using them since 2004. They can save you, as you can build amazing models from nowhere with them.

  • @fenncc3854
    @fenncc3854 2 years ago +2

    Great video, really informative, easy to understand, good production quality, and you've also got a great personality for this style of video.

    • @Mutual_Information
      @Mutual_Information  2 years ago

      Thank you! These comments mean a lot. Happy to have you :)

  • @alan1507
    @alan1507 8 months ago +1

    Thanks for the very clear explanation. I was doing my PhD under Chris Bishop when Bishop and Tipping were developing PPCA - good to get a refresher!

    • @Mutual_Information
      @Mutual_Information  8 months ago

      Wow, it's excellent to get your eyes on it - very cool!

  • @mCoding
    @mCoding 2 years ago +2

    Always love to hear your explanations!

    • @Mutual_Information
      @Mutual_Information  2 years ago +1

      Thanks man - fortunately there’s always a mountain of topics to cover. Plenty to learn/explain :)

  • @MrStphch
    @MrStphch 2 years ago +1

    Really really nice videos!! Love your way of explaining.

  • @jakubm3036
    @jakubm3036 1 year ago +1

    great video, understandable explanations and cool format!

  • @Kopakabana001
    @Kopakabana001 2 years ago +4

    Another great video!

  • @xy9439
    @xy9439 2 years ago +3

    Very interesting video, as always

  • @quitscheente95
    @quitscheente95 1 year ago +1

    Damn, I spent so much time going through 5 different books to understand PPCA and here you are, explaining it in an easy, comprehensible, visual manner. Love it. Thank you :)

  • @EverlastingsSlave
    @EverlastingsSlave 2 years ago +1

    Man, how good are your videos, I am amazed at the perfection

  • @user-kn4wt
    @user-kn4wt 2 years ago +1

    these videos are awesome!

  • @AdrianGao
    @AdrianGao 9 months ago +1

    Thanks. This is brilliant.

  • @user-lx7jn9gy6q
    @user-lx7jn9gy6q 10 months ago +1

    Underrated channel

  • @jonastjepkema
    @jonastjepkema 2 years ago +1

    Amazing! Hope your channel will explode soon!

    • @Mutual_Information
      @Mutual_Information  2 years ago

      Lol thanks - it honestly doesn’t need to for me to keep going. This is a very enjoyable hobby

    • @Mutual_Information
      @Mutual_Information  2 years ago

      But if you want to tell all your friends, I won’t stop you 😉

  • @Blahcub
    @Blahcub 9 months ago +1

    This was a super helpful video thank you so much. I love this material and find it super fun.

    • @Mutual_Information
      @Mutual_Information  9 months ago

      Excellent - this one is a doozy so it's nice to hear when it lands

    • @Blahcub
      @Blahcub 9 months ago

      @@Mutual_Information There's a level of background information that takes a while to process, and even though you say it slowly, there may be some extra detail warranted. I had to pause a lot and think and rewind to fully grasp the details.

    • @Mutual_Information
      @Mutual_Information  9 months ago

      @@Blahcub That's good to know.. you are my ideal viewer :) thank you for your persistence

    • @Blahcub
      @Blahcub 9 months ago

      @@Mutual_Information In simpler terms, can't we just say PPCA is just PCA, but we model a distribution over the latent space and sample from that distribution?

    • @Mutual_Information
      @Mutual_Information  9 months ago

      @@Blahcub In all cases, we are considering a distribution over the latent space. PPCA is distinct in that we assume a constant noise term across dimensions, and that gives it some closed form results.
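      (Concretely: FA allows a different noise variance per dimension, Psi = diag(psi_1, ..., psi_D), while PPCA restricts it to Psi = sigma^2 * I. With that restriction the maximum-likelihood fit has the standard closed form from Tipping & Bishop [4], stated here for reference: sigma^2_ML = (1/(D-L)) * (sum of the D-L smallest eigenvalues of the sample covariance), and W_ML = U_L (Lambda_L - sigma^2_ML * I)^{1/2} R, where U_L and Lambda_L hold the top-L eigenvectors/eigenvalues and R is an arbitrary rotation.)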

  • @juliafila5709
    @juliafila5709 1 month ago +1

    Thank you so much for your content!

  • @saeidhoseinipour3101
    @saeidhoseinipour3101 2 years ago +1

    Another nice video. Thanks 🙏
    Please cover data science topics such as Clustering and Classification, or applications like Text Mining, Recommender Systems, Image Processing and so on, from both a statistics perspective and a linear algebra perspective.

  • @michaelcatchen84
    @michaelcatchen84 1 year ago

    Around 10:35 you skip over the posterior inference of p(z_i | x_i, W, mu, psi) and the fact that it is also a normal distribution (because the normal is a conjugate prior for itself). Would love to see this covered in a separate video
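    (For anyone who wants the statement of that result - this is the standard conjugacy calculation, written with the video's symbols: for x = Wz + mu + eps with z ~ N(0, I) and eps ~ N(0, Psi), the posterior is p(z | x) = N(z | m, S), where S = (I + W^T Psi^{-1} W)^{-1} and m = S W^T Psi^{-1} (x - mu).)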

  • @wazirkahar1909
    @wazirkahar1909 2 years ago +2

    Please please please keep doing this :)

  • @taotaotan5671
    @taotaotan5671 2 years ago +1

    Could restricted maximum likelihood, a technique that is often used in mixed effects models, also be applied in factor analysis?

    • @Mutual_Information
      @Mutual_Information  2 years ago +1

      I don't know much about restricted max likelihood, but from what I've (just) read, it appears flexible enough to accommodate the FA assumption. Anytime you're estimating variance/covariance, you could use a low-rank approximation.

  • @timseguine2
    @timseguine2 10 months ago +1

    One question that came to mind, if you are trying to do factor analysis using an iterative method, are the PPCA ML estimates a good initial value?

    • @Mutual_Information
      @Mutual_Information  10 months ago +1

      Possibly.. but if you're going to accept the costs of computing those initial estimates, you might as well just do the FA routine? I don't think it would be worth it

  • @taotaotan5671
    @taotaotan5671 1 year ago +1

    Hi DJ, awesome content as always!!
    I find I can follow your notations much better than textbook notations. At 8:12, I believe the matrix W is shared across all individuals, while z is specific to each sample. It makes intuitive sense to call matrix W common factors, and call z loadings. However, the textbook (Penn State Stat505 12/12.1) seems to call W (in their notation L) factor loadings, while calling z (in their notation f) common factors.
    I am a little confused and I will appreciate it if you can take a look. Thank you again for the tutorial!

    • @Mutual_Information
      @Mutual_Information  1 year ago +1

      Hey Taotao! I just checked this against Barber's book. It appears Stat505 is correct - W is called the "factor loading". I actually recall being confused by this too (which is why I had to double check just now).. and all I can say is.. yea the naming is confusing. For me, I avoid the naming in general by just thinking of z as the latent variable and W as parameters. I agree, this "factor loading" name is shit.

    • @taotaotan5671
      @taotaotan5671 1 year ago

      @@Mutual_Information Thanks so much DJ! That clarifies.

  • @tommclean9208
    @tommclean9208 2 years ago +1

    Is there any code that supplements your videos? I always find I learn more easily by looking at and playing around with code :)

    • @Mutual_Information
      @Mutual_Information  2 years ago +2

      Not in this one unfortunately. For this case, I'd check out the use case for FA from sklearn : scikit-learn.org/stable/modules/decomposition.html#fa
      If you look one level deep into their .fit() method, you'll see the SVD algorithm I reference in the vid.
      I have plans for more code examples in future vids
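      In the meantime, here's a rough sketch of the kind of thing you can play with - sample data from the FA model itself and check what sklearn recovers (the sizes and seed are arbitrary choices, not anything from the video):

      import numpy as np
      from sklearn.decomposition import FactorAnalysis

      rng = np.random.default_rng(0)
      D, L, N = 8, 2, 2000
      W_true = rng.normal(size=(D, L))            # true loading matrix
      psi_true = rng.uniform(0.1, 1.0, size=D)    # true diagonal noise variances

      Z = rng.normal(size=(N, L))                 # latent factors, z ~ N(0, I)
      X = Z @ W_true.T + rng.normal(size=(N, D)) * np.sqrt(psi_true)   # x = W z + eps (mean set to zero)

      fa = FactorAnalysis(n_components=L).fit(X)
      # W is only identified up to a rotation, so compare the implied covariances instead
      cov_true = W_true @ W_true.T + np.diag(psi_true)
      cov_fit = fa.components_.T @ fa.components_ + np.diag(fa.noise_variance_)
      print(np.abs(cov_true - cov_fit).max())     # should be small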

  • @matej6418
    @matej6418 8 months ago +1

    Elite content. Imho, after the introduction I would love to see mainly the content - dunno if staying on screen makes the delivery better? What's the objective here?

    • @Mutual_Information
      @Mutual_Information  8 months ago

      Appreciate the feedback. It's effectively a cheaper way of keeping the video lively without having to create animations, which take a long time. If I'm not on screen and I leave the level of animation the same, it's a lot of audio over still text, which I've separately heard makes people 'zone out'.
      This is also an older video. I really don't like how I did things back then. In the future, I'd like to mature into a more dynamic style.

  • @prodbyryshy
    @prodbyryshy 2 months ago

    Amazing video, I feel like I understand each individual step but I'm sort of missing the big picture

  • @siddharthbisht1287
    @siddharthbisht1287 2 years ago +1

    For anyone who is wondering about the parameter formula:
    2D + D(D-1)/2 = D + D + D(D-1)/2
    D : dimension of the Mean Vector
    D : diagonal of the Covariance Matrix (Variance of every Random Variable)
    D(D-1)/2 : Covariance between any two Random Variables di and dj (di ≠ dj)
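    (A quick worked example with made-up sizes: for D = 100, a full-covariance Gaussian has 2(100) + 100*99/2 = 5,150 free parameters, while an FA model with L = 5 factors needs only D*L + 2D = 500 + 200 = 700 - the entries of W, the mean, and the diagonal of Psi.)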

  • @Blahcub
    @Blahcub 8 months ago +1

    Isn't it a problem that factor analysis, PCA, and any dimensionality reduction done here assume a linear relationship?

    • @Mutual_Information
      @Mutual_Information  8 months ago +1

      Yea, definitely. The nonlinear versions of this rely on the manifold hypothesis, which is akin to saying the assumptions of FA hold, but only *locally* and after nonlinear transformations.. and that essentially changes everything. None of the analytic results you see here hold and we have to resort to other things, like autoencoders.

  • @horacemanfred354
    @horacemanfred354 2 years ago +1

    Great video. Could you cover the use of Energy Functions in ML?

    • @Mutual_Information
      @Mutual_Information  2 years ago

      Maybe one day, but no concrete plans. Fortunately, there's an excellent YouTuber who covers energy models really well : ua-cam.com/video/y6WNrHskm2E/v-deo.html his videos would probably be a primary source if I were to cover the topic

    • @horacemanfred354
      @horacemanfred354 2 years ago +1

      @@Mutual_Information Thanks. So funny, in the video Alfredo Canziani says it has taken him years to understand energy functions. It appears it is about the manifold of the cost function, and I understand it better now.

  • @DylanDijk
    @DylanDijk 1 year ago +1

    Is the log likelihood a concave function of \psi and w?

    • @Mutual_Information
      @Mutual_Information  1 year ago +1

      If you fix w, then the function is a concave func of psi.. and if you fix psi.. yes I bet it's also a concave function (because it's like doing linear regression). I'm fairly sure of this but not 100%.

    • @DylanDijk
      @DylanDijk 1 year ago

      @@Mutual_Information Ok thanks, I was asking because I wanted to know whether the EM algorithm is guaranteed to converge to the MLE of the factor model. As the EM algorithm is guaranteed to increase the log likelihood at each step, I would assume that if it is concave then we should converge to the MLE. But from reading around, it seems that getting the MLE for the factor model using EM is not guaranteed.
      Btw your videos are great!

    • @DylanDijk
      @DylanDijk 1 year ago

      I guess an important point is that if a function is concave in each of its variables keeping the rest fixed, the function is not guaranteed to be concave. So using what you said, we don't know if the log likelihood is a concave function of \psi and w.

  • @muesk3
    @muesk3 2 years ago +1

    Quite funny the difference in treatment of how FA is explained in statistics vs machine learning. :)

    • @gordongoodwin6279
      @gordongoodwin6279 2 years ago +1

      I was literally thinking this. I honestly thought he was clueless for the first minute then realized it’s just a really interesting and different way to look at factor analysis than what it was originally intended to do (and the way it’s taught in most statistics and psychometrics texts). Great video

  • @mrcaljoe1
    @mrcaljoe1 10 months ago +1

    1:18 when you have to use spit instead

  • @njitnom
    @njitnom 2 years ago +1

    i love you bro

  • @InquilineKea
    @InquilineKea 2 years ago +1

    Is this like Fourier decomposition?

    • @Mutual_Information
      @Mutual_Information  2 years ago +1

      Eh, I'd say not especially. It's only similar insofar as things are decomposed as the sum of scaled vectors/functions. Fourier series is specific to the complex plane, sin/cos. I don't see much of that showing up. Maybe there is a connection since the normal distribution concerns circles.. but I don't see it.

    • @abdjahdoiahdoai
      @abdjahdoiahdoai 2 years ago

      I think you're thinking of Fourier decomposition as decomposition in a Fourier basis; in that sense, maybe an SVD is what you are looking for

  • @siddharthbisht1287
    @siddharthbisht1287 2 years ago +1

    I have a couple of questions:
    1. What do you mean by "averaging out"?
    2. What difference does it make to switch the covariance matrix from Psi to a full covariance matrix WW* + Psi?
    Great video though !!

    • @Mutual_Information
      @Mutual_Information  2 years ago +3

      Hey Siddharth, nice to hear from you. For “averaging out”, that was a bit of a hand wave to avoid mentioning the integration p(x) = integral of p(x|z)p(z)dz.. the way I think about that is it’s the distribution over x if you were to rerun the data generation process infinitely and ignore the z’s and ask what distribution over x that would create.
      For your second question, Psi is a diagonal matrix. So WW* + Psi isn’t diagonal but Psi is.
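      (Writing that integral out for this model, it's the standard Gaussian marginalization: p(x) = integral of N(x | Wz + mu, Psi) N(z | 0, I) dz = N(x | mu, WW* + Psi), which is exactly where the WW* + Psi covariance in your second question comes from.)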

    • @siddharthbisht1287
      @siddharthbisht1287 2 years ago

      @@Mutual_Information I wanted to understand the difference the change in covariance makes, why are we changing the covariance matrix?

    • @Mutual_Information
      @Mutual_Information  2 years ago +2

      Hmm, if I understand the question, it's because one way involves a lot fewer parameters. If you say your covariance matrix is WW* + Psi, then that covariance matrix is determined by D*L + D parameters. If it's just the regular covariance matrix of a typical multivariate normal, then the number of parameters in the covariance is D + D*(D-1)/2.

    • @siddharthbisht1287
      @siddharthbisht1287 2 years ago +1

      ​@@Mutual_Information Oh okay. So then L

  • @akhileshchander5307
    @akhileshchander5307 2 years ago +1

    I came to this channel from your comment on another channel; I checked one or two minutes of a video and found this channel is interesting. My request is: please make videos on these "mathematical notations" you're using, because in my personal experience there are many who don't understand things written with these symbols, e.g.: what is the meaning of {x_i^T}_{i=1}^N? Thanks

    • @Mutual_Information
      @Mutual_Information  2 years ago +1

      Hey Akhilesh, I see what you're saying. I'm thinking of creating a notation guide - something like a 1-pager linked to in the description which would go over the exact notation.
      To answer your question {x_i^T: i = 1, ... N} just refers to the set of row vectors (x_i is assumed to be a column vector, so x_i^T is a row vector). The {...} is just set notation. It's just a way of saying.. this is the set of N row vectors.. and we'll indicate each one using the index i. So x_3^T is the third of N row vectors.
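      (In symbols: {x_i^T}_{i=1}^N = {x_1^T, x_2^T, ..., x_N^T}, where each x_i is a column vector in R^D and x_i^T is the corresponding 1-by-D row vector.)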

  • @gwho
    @gwho 10 months ago

    When discussing personality theory, Big 5 (aka OCEAN) is superior to MBTI (Myers-Briggs Type Indicator) because Big 5 uses factor analysis, whereas MBTI presupposes its 4 dimensions.
    Then when comparing MBTI to astrology, just laugh astrology out of the room

  • @EverlastingsSlave
    @EverlastingsSlave 2 years ago +1

    You are doing such good work, therefore I invite you to read the Quran so that you are saved in the afterlife.
    Stay blessed