Factor Analysis and Probabilistic PCA

  • Published 30 Oct 2024

COMMENTS • 91

  • @sasakevin3263
    @sasakevin3263 2 years ago +37

    The only reason this guy's video didn't go viral is that only 0.01% of the audience is interested in such complex statistics and formulas. But what he made is really awesome!

    • @Mutual_Information
      @Mutual_Information  2 years ago +8

      That 0.01% are the cool kids - that's who I'm going for!

    • @ruizhezang2657
      @ruizhezang2657 1 year ago

      @@Mutual_Information awesome job! Best video in this area I’ve ever watched!

  • @pifibbi
    @pifibbi 3 years ago +23

    Please don't stop making these!

  • @divine7470
    @divine7470 3 years ago +8

    Thanks for covering this topic. I learned about FA and PCA, and how to use them, in bootcamps, but the way you dive into the internals is so easily digestible.

  • @MikeOxmol_
    @MikeOxmol_ 2 years ago +21

    It's criminal that you don't have at least 50k subs. Please don't stop making videos; even though they don't have that many views right now, there are people like me who appreciate them very much. Certain topics can seem very daunting when you read about them, especially in such "dense" books as Bishop's PRML or Murphy's PML. However, if I start digging into a topic by watching your video and only then read the chapter, the ideas seem to connect more easily and I spend less time until it "clicks", if you know what I mean.
    On another note, if you're looking for ideas for future vids (of which I'm sure you already have plenty), Variational Inference would be a cool topic.

    • @Mutual_Information
      @Mutual_Information  2 years ago +4

      Thanks, that's extremely nice of you! Yea, this channel is for people like you/me who want to understand those intense details in those books. I know I would have loved a channel like this if it had been around when I was learning. I’m glad it’s doing that job for you.
      And yes, VI is coming! Thanks for your support! And please don’t hesitate to share the channel with people doing the same as you :)

  • @Nightfold
    @Nightfold 2 years ago +5

    This sheds some light on what I'm doing with PPCA, but I still deeply resent my lack of training in statistics during my degree.

  • @alan1507
    @alan1507 1 year ago +1

    Thanks for the very clear explanation. I was doing my PhD under Chris Bishop when Bishop and Tipping were developing PPCA - good to get a refresher!

  • @mainakbiswas2584
    @mainakbiswas2584 11 months ago +1

    I had been looking for this piece of information for quite a long time. I understood FA by sort of re-discovering it after seeing the sklearn documentation. From that point onward I wanted to know how it relates to PCA. This gave me the intuition and the resources to look up. ❤❤❤

  • @enx1214
    @enx1214 2 years ago +1

    True old school, the best techniques; still in use, I've used them since 2004. They can save you, as you can build amazing models from nothing.

  • @quitscheente95
    @quitscheente95 2 years ago +1

    Damn, I spent so much time going through 5 different books to understand PPCA, and here you are, explaining it in an easy, comprehensible, visual manner. Love it. Thank you :)

    • @Mutual_Information
      @Mutual_Information  2 years ago

      Awesome - you are the exact type of viewer I'm trying to help

  • @mCoding
    @mCoding 3 years ago +2

    Always love to hear your explanations!

    • @Mutual_Information
      @Mutual_Information  3 years ago +1

      Thanks man - fortunately there’s always a mountain of topics to cover. Plenty to learn/explain :)

  • @fenncc3854
    @fenncc3854 2 years ago +2

    Great video, really informative, easy to understand, good production quality, and you've also got a great personality for this style of video.

    • @Mutual_Information
      @Mutual_Information  2 years ago

      Thank you! These comments mean a lot. Happy to have you :)

  • @xy9439
    @xy9439 3 years ago +3

    Very interesting video, as always

  • @GabeNicholson
    @GabeNicholson 1 year ago +1

    Underrated channel

  • @Kopakabana001
    @Kopakabana001 3 years ago +4

    Another great video!

  • @jonastjepkema
    @jonastjepkema 2 years ago +1

    Amazing! Hope your channel will explode soon!

    • @Mutual_Information
      @Mutual_Information  2 years ago

      Lol thanks - it honestly doesn’t need to for me to keep going. This is a very enjoyable hobby

    • @Mutual_Information
      @Mutual_Information  2 years ago

      But if you want to tell all your friends, I won’t stop you 😉

  • @Blahcub
    @Blahcub 1 year ago +1

    This was a super helpful video thank you so much. I love this material and find it super fun.

    • @Mutual_Information
      @Mutual_Information  1 year ago

      Excellent - this one is a doozy so it's nice to hear when it lands

    • @Blahcub
      @Blahcub 1 year ago

      @@Mutual_Information There's a level of background information that takes a while to process, and even though you say it slowly, some extra detail may be warranted. I had to pause a lot, think, and rewind to fully grasp the details.

    • @Mutual_Information
      @Mutual_Information  1 year ago

      @@Blahcub That's good to know.. you are my ideal viewer :) thank you for your persistence

    • @Blahcub
      @Blahcub 1 year ago

      @@Mutual_Information In simpler terms, can't we just say PPCA is just PCA, but we model a distribution over the latent space and sample from that distribution?

    • @Mutual_Information
      @Mutual_Information  1 year ago

      @@Blahcub In all cases, we are considering a distribution over the latent space. PPCA is distinct in that we assume a constant noise term across dimensions, and that gives it some closed form results.
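
      A minimal numpy sketch of that closed-form PPCA fit (this is the standard Tipping & Bishop maximum-likelihood solution; the function and variable names below are illustrative, not from the video):

        import numpy as np

        def ppca_ml(X, L):
            """Closed-form ML fit of probabilistic PCA (Tipping & Bishop).
            X: (N, D) data matrix, L: latent dimension (L < D)."""
            D = X.shape[1]
            mu = X.mean(axis=0)
            S = np.cov(X, rowvar=False, bias=True)               # (D, D) sample covariance
            eigvals, eigvecs = np.linalg.eigh(S)                  # ascending eigenvalues
            eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]    # sort descending
            sigma2 = eigvals[L:].mean()                           # average of the discarded eigenvalues
            W = eigvecs[:, :L] * np.sqrt(np.maximum(eigvals[:L] - sigma2, 0.0))
            return mu, W, sigma2                                  # model: x ~ N(mu, W W^T + sigma2 * I)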

  • @jakubm3036
    @jakubm3036 2 years ago +1

    Great video, understandable explanations, and a cool format!

  • @michaelcatchen84
    @michaelcatchen84 1 year ago

    Around 10:35 you skip over the posterior inference of p(z_i | x_i, W, mu, psi) and the fact that it is also a normal distribution, because the normal is a conjugate prior for itself. Would love to see this covered in a separate video.
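
    For reference, the skipped posterior follows from standard Gaussian conditioning (see e.g. Bishop's PRML or Barber's book); with the FA prior z_i ~ N(0, I), it is:

        p(z_i \mid x_i, W, \mu, \Psi) = \mathcal{N}\left( z_i \mid G\, W^{\top} \Psi^{-1} (x_i - \mu),\; G \right),
        \qquad G = \left( I + W^{\top} \Psi^{-1} W \right)^{-1}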

  • @EverlastingsSlave
    @EverlastingsSlave 3 years ago +1

    Man, your videos are so good; I am amazed at the perfection.

  • @siddharthbisht1287
    @siddharthbisht1287 2 years ago +1

    For anyone who is wondering about the parameter formula:
    2D + D(D-1)/2 = D + D + D(D-1)/2
    D : dimension of the mean vector
    D : diagonal of the covariance matrix (the variance of every random variable)
    D(D-1)/2 : covariances between any two distinct random variables d_i and d_j (i ≠ j)
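
    As a quick sanity check on this count: with D = 5 there are 5 means, 5 variances, and 5(4)/2 = 10 distinct covariances, i.e. 2(5) + 5(4)/2 = 20 free parameters in total.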

  • @user-kn4wt
    @user-kn4wt 2 years ago +2

    These videos are awesome!

  • @MrStphch
    @MrStphch 2 years ago +1

    Really really nice videos!! Love your way of explaining.

  • @taotaotan5671
    @taotaotan5671 1 year ago +1

    Hi DJ, awesome content as always!!
    I find I can follow your notation much better than textbook notation. At 8:12, I believe the matrix W is shared across all individuals, while z is specific to each sample. It makes intuitive sense to call the matrix W the common factors and to call z the loadings. However, the textbook (Penn State Stat505 12/12.1) seems to call W (in their notation, L) the factor loadings, while calling z (in their notation, f) the common factors.
    I am a little confused and would appreciate it if you could take a look. Thank you again for the tutorial!

    • @Mutual_Information
      @Mutual_Information  1 year ago +1

      Hey Taotao! I just checked this against Barber's book. It appears Stat505 is correct - W is called the "factor loading". I actually recall being confused by this too (which is why I had to double check just now).. and all I can say is.. yea, the naming is confusing. For me, I avoid the naming in general by just thinking of z as the latent variable and W as the parameters. I agree, this "factor loading" name is shit.

    • @taotaotan5671
      @taotaotan5671 1 year ago

      @@Mutual_Information Thanks so much DJ! That clarifies.

  • @juliafila5709
    @juliafila5709 6 months ago +1

    Thank you so much for your content!

  • @saeidhoseinipour3101
    @saeidhoseinipour3101 2 years ago +1

    Another nice video. Thanks 🙏
    Please cover data science topics such as clustering and classification, or applications like text mining, recommender systems, image processing, and so on, from both a statistics perspective and a linear algebra perspective.

  • @AdrianGao
    @AdrianGao 1 year ago +1

    Thanks. This is brilliant.

  • @wazirkahar1909
    @wazirkahar1909 3 years ago +2

    Please please please keep doing this :)

  • @Anzar2011
    @Anzar2011 3 months ago

    Awesome work!

  • @muesk3
    @muesk3 3 years ago +1

    It's quite funny how differently FA is explained in statistics vs machine learning. :)

    • @gordongoodwin6279
      @gordongoodwin6279 2 years ago +1

      I was literally thinking this. I honestly thought he was clueless for the first minute, then realized it's just a really interesting and different way to look at factor analysis compared with what it was originally intended to do (and the way it's taught in most statistics and psychometrics texts). Great video

  • @mrcaljoe1
    @mrcaljoe1 1 year ago +1

    1:18 when you have to use spit instead

  • @matej6418
    @matej6418 1 year ago +1

    Elite content. IMHO, after the introduction I would love to see mainly the content; dunno if staying on screen makes the delivery better? What's the objective here?

    • @Mutual_Information
      @Mutual_Information  1 year ago

      Appreciate the feedback. It's effectively a cheaper way of keeping the video lively without having to create animations, which take a long time. If I'm not on screen and I leave the level of animation the same, it's a lot of audio over still text, which I've separately heard makes people 'zone out'.
      This is also an older video. I really don't like how I did things back then. In the future, I'd like to mature into a more dynamic style.

  • @prodbyryshy
    @prodbyryshy 7 months ago

    Amazing video. I feel like I understand each individual step, but I'm sort of missing the big picture.

  • @siddharthbisht1287
    @siddharthbisht1287 2 years ago +1

    I have a couple of questions:
    1. What do you mean by "averaging out"?
    2. What difference does it make to switch the covariance matrix from Psi to a full covariance matrix WW* + Psi?
    Great video though!!

    • @Mutual_Information
      @Mutual_Information  2 years ago +3

      Hey Siddharth, nice to hear from you. For “averaging out”, that was a bit of a hand wave to avoid mentioning the integration p(x) = integral of p(x|z)p(z)dz.. the way I think about that is it’s the distribution over x if you were to rerun the data generation process infinitely and ignore the z’s and ask what distribution over x that would create.
      For your second question, Psi is a diagonal matrix. So WW* + Psi isn’t diagonal but Psi is.
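
      A small numpy sketch of that "averaging out", assuming the generative model described above (z ~ N(0, I), then x | z ~ N(Wz + mu, Psi)); it checks that the sample covariance of x approaches W W^T + Psi. The shapes and names here are illustrative:

        import numpy as np

        rng = np.random.default_rng(0)
        D, L, N = 4, 2, 200_000
        W = rng.normal(size=(D, L))
        mu = rng.normal(size=D)
        psi = rng.uniform(0.1, 1.0, size=D)            # diagonal of Psi (noise variances)

        Z = rng.normal(size=(N, L))                                 # z_i ~ N(0, I)
        X = Z @ W.T + mu + rng.normal(size=(N, D)) * np.sqrt(psi)   # x_i | z_i ~ N(W z_i + mu, Psi)

        print(np.round(np.cov(X, rowvar=False), 2))    # empirical covariance of x
        print(np.round(W @ W.T + np.diag(psi), 2))     # model covariance W W^T + Psi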

    • @siddharthbisht1287
      @siddharthbisht1287 2 years ago

      @@Mutual_Information I wanted to understand the difference the change in covariance makes: why are we changing the covariance matrix?

    • @Mutual_Information
      @Mutual_Information  2 years ago +2

      Hmm, if I understand the question, it’s because one way involves a lot fewer parameters. If you say your covariance matrix is WW* + Psi, then that covariance matrix is determined by D*L + D parameters. If it’s just the regular covariance matrix of a typical multivariate normal, then the number of parameters in the covariance is D + D*(D-1)/2.
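
      A quick illustration of how those two counts separate (the values of D and L below are chosen arbitrarily):

        D, L = 100, 5
        low_rank_params = D * L + D               # W (D x L) plus the diagonal Psi
        full_cov_params = D + D * (D - 1) // 2    # variances plus distinct covariances
        print(low_rank_params, full_cov_params)   # 600 vs 5050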

    • @siddharthbisht1287
      @siddharthbisht1287 2 years ago +1

      ​@@Mutual_Information Oh okay. So then L

  • @horacemanfred354
    @horacemanfred354 3 years ago +1

    Great video. Could you cover the use of Energy Functions in ML?

    • @Mutual_Information
      @Mutual_Information  3 years ago

      Maybe one day, but no concrete plans. Fortunately, there’s an excellent YouTuber who covers energy models really well: ua-cam.com/video/y6WNrHskm2E/v-deo.html - that would probably be a primary source if I were to cover the topic.

    • @horacemanfred354
      @horacemanfred354 3 years ago +1

      @@Mutual_Information Thanks. So funny - in the video, Alfredo Canziani says it has taken him years to understand energy functions. It appears it is about the manifold of the cost function, and I understand it better now.

  • @timseguine2
    @timseguine2 1 year ago +1

    One question that came to mind: if you are trying to do factor analysis using an iterative method, are the PPCA ML estimates a good initial value?

    • @Mutual_Information
      @Mutual_Information  1 year ago +1

      Possibly.. but if you're going to accept the costs of computing those initial estimates, you might as well just do the FA routine? I don't think it would be worth it

  • @taotaotan5671
    @taotaotan5671 2 years ago +1

    Could restricted maximum likelihood, a technique that is often used in mixed-effects models, also apply in factor analysis?

    • @Mutual_Information
      @Mutual_Information  2 years ago +1

      I don't know much about restricted max likelihood, but from what I've (just) read, it appears flexible enough to accommodate the FA assumption. Anytime you're estimating variance/covariance, you could use a low-rank approximation.

  • @tommclean9208
    @tommclean9208 3 years ago +1

    Is there any code that supplements your videos? I always find I learn more easily by looking at and playing around with code :)

    • @Mutual_Information
      @Mutual_Information  3 years ago +2

      Not in this one unfortunately. For this case, I'd check out the use case for FA from sklearn: scikit-learn.org/stable/modules/decomposition.html#fa
      If you look one level deep into their .fit() method, you'll see the SVD algorithm I reference in the vid.
      I have plans for more code examples in future vids
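
      A minimal usage sketch of scikit-learn's FactorAnalysis estimator linked above (synthetic placeholder data; the attribute names are sklearn's, the rest is illustrative):

        import numpy as np
        from sklearn.decomposition import FactorAnalysis

        rng = np.random.default_rng(0)
        X = rng.normal(size=(500, 10))        # placeholder data matrix, (N, D)

        fa = FactorAnalysis(n_components=3)   # L = 3 latent factors
        Z = fa.fit_transform(X)               # posterior means of the z_i, shape (N, 3)

        W_T = fa.components_                  # (3, 10): roughly W^T in the video's D x L convention
        psi = fa.noise_variance_              # (10,): diagonal of Psi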

  • @Blahcub
    @Blahcub 1 year ago +1

    Isn't it a problem that factor analysis, PCA, and any dimensionality reduction done here assume a linear relationship?

    • @Mutual_Information
      @Mutual_Information  1 year ago +1

      Yea, definitely. The nonlinear versions of this rely on the manifold hypothesis, which is akin to saying the assumptions of FA hold, but only *locally* and after nonlinear transformations.. and that essentially changes everything. None of the analytic results you see here hold, and we have to resort to other things, like autoencoders.

  • @akhileshchander5307
    @akhileshchander5307 2 years ago +1

    I came to this channel from your comment on another channel; I checked one or two minutes of a video and found this channel interesting. My request is: please make videos on the "mathematical notation" you are using, because in my personal experience there are many who don't understand things with these symbols, e.g. what is the meaning of {x_i^T}_{i=1}^N? Thanks

    • @Mutual_Information
      @Mutual_Information  2 years ago +1

      Hey Akhilesh, I see what you're saying. I'm thinking of creating a notation guide - something like a 1-pager linked to in the description which would go over the exact notation.
      To answer your question {x_i^T: i = 1, ... N} just refers to the set of row vectors (x_i is assumed to be a column vector, so x_i^T is a row vector). The {...} is just set notation. It's just a way of saying.. this is the set of N row vectors.. and we'll indicate each one using the index i. So x_3^T is the third of N row vectors.
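
      Concretely, in numpy terms (the shapes here are chosen purely for illustration):

        import numpy as np

        X = np.arange(12).reshape(4, 3)   # N = 4 samples stacked as rows, D = 3
        x3_T = X[2]                       # the third row vector x_3^T, shape (3,)
        x3 = X[2].reshape(-1, 1)          # the corresponding column vector x_3, shape (3, 1)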

  • @InquilineKea
    @InquilineKea 2 years ago +1

    Is this like Fourier decomposition?

    • @Mutual_Information
      @Mutual_Information  2 years ago +1

      Eh, I'd say not especially. It's only similar insofar as things are decomposed as the sum of scaled vectors/functions. Fourier series is specific to the complex plane, sin/cos. I don't see much of that showing up. Maybe there is a connection since the normal distribution concerns circles.. but I don't see it.

    • @abdjahdoiahdoai
      @abdjahdoiahdoai 2 years ago

      I think you are thinking of Fourier decomposition in terms of the Fourier basis; in that sense, maybe an SVD is what you are looking for.

  • @njitnom
    @njitnom 2 years ago +1

    I love you bro

  • @DylanD-v9g
    @DylanD-v9g 1 year ago +1

    Is the log likelihood a concave function of \psi and w?

    • @Mutual_Information
      @Mutual_Information  1 year ago +1

      If you fix w, then the function is a concave func of psi.. and if you fix psi.. yes I bet it's also a concave function (because it's like doing linear regression). I'm fairly sure of this but not 100%.

    • @DylanD-v9g
      @DylanD-v9g 1 year ago

      @@Mutual_Information Ok, thanks. I was asking because I wanted to know whether the EM algorithm is guaranteed to converge to the MLE of the factor model. As the EM algorithm is guaranteed to increase the log likelihood at each step, I would assume that if the likelihood is concave then we should converge to the MLE. But from reading around, it seems that getting the MLE for the factor model using EM is not guaranteed.
      Btw your videos are great!

    • @DylanD-v9g
      @DylanD-v9g 1 year ago

      I guess an important point is that if a function is concave in each of its variables with the rest held fixed, it is not guaranteed to be jointly concave. So using what you said, we don't know if the log likelihood is a concave function of \psi and w.

  • @gwho
    @gwho 1 year ago

    When discussing personality theory, the Big 5 (aka OCEAN) is superior to the MBTI (Myers-Briggs Type Indicator) because the Big 5 uses factor analysis, whereas the MBTI presupposes its 4 dimensions.
    Then, when comparing the MBTI to astrology, just laugh astrology out of the room.

  • @EverlastingsSlave
    @EverlastingsSlave 3 years ago +1

    You are doing such good work; therefore I invite you to read the Quran so that you are saved in the afterlife.
    Stay blessed