The Exponential Family (Part 2)

  • Published 5 Jun 2024
  • The machine learning consultancy: truetheta.io
    Want to work together? See here: truetheta.io/about/#want-to-w...
    This is part 2 on the Exponential Family, where I cover its useful and remarkable properties. These help explain why distributions within the Family are so frequently used and how the Family can be exploited in more sophisticated applications.
    SOCIAL MEDIA
    LinkedIn : / dj-rich-90b91753
    Twitter : / duanejrich
    Enjoy learning this way? Want me to make more videos? Consider supporting me on Patreon: / mutualinformation
    SOURCES
    Chapter 9 of [2] is where I first learned of the Exponential Family. It covers its definition/properties and shows why it's so widely adopted in statistics/machine learning. If you're looking to supplement this video with more detail, this is the place to start.
    [1] is where I learned how to precisely interpret the components of the Exponential Family and how that maps onto the special cases.
    [4] was my primary source for understanding conjugacy of the exponential family. It's where I discovered the specific setup of the exponential family that yields conjugate pairs.
    [3] provides an in-depth view of the Exponential Family and its usefulness for statistical modeling. It resolves a lot of ambiguity by discussing the sometimes fuzzy relationship between our language and the notation's precise meaning. It's also where I learned why the mean parameterization is really what you want to deal with while modeling.
    [5] showed me how the Exponential Family is used in more sophisticated applications (specifically, for general graphical models). Also, it's where I discovered some of the more technical/theoretical details of the Exponential Family (e.g. there is a 1-to-1 mapping between the mean and canonical parameters if and only if the Exponential Family choices are minimal).
    ---------------------------
    [1] M. I. Jordan, Exponential Family: Basics, University of California, Berkeley, people.eecs.berkeley.edu/~jor...
    [2] K. P. Murphy, Machine Learning: A Probabilistic Perspective, MIT Press, 2012
    [3] C. J. Geyer, "Stat 8054 Lecture Notes: Exponential Families", University of Minnesota Twin Cities, 2020, www.stat.umn.edu/geyer/8054/n...
    [4] D. M. Blei, "The Exponential Family", Columbia University, 2016, www.cs.columbia.edu/~blei/fogm...
    [5] M. J. Wainwright, M. I. Jordan, Graphical Models, Exponential Families, and Variational Inference, Foundations and Trends in Machine Learning, 2008
    EXTRA NOTES
    In the video, I say "*The* Exponential Family" quite a bit, but Geyer thinks that isn't correct. He says (from [3]) : "Many people also use an older terminology that says a statistical model is in the exponential family, where we say a statistical model is an exponential family. Thus the older terminology says the exponential family is the collection of all of what the newer terminology calls exponential families. The older terminology names a useless mathematical object, a heterogeneous collection of statistical models not used in any application. The newer terminology names an important property of statistical models."
    Timestamps
    0:00 Intro
    0:30 Review of the Exponential Family Definition
    1:54 Mean and Covariance
    5:34 Maximum Likelihood Estimation
    8:39 Difficulties from Wild Choices
    10:41 Conjugacy
    16:50 Outro

COMMENTS • 35

  • @StratosFair
    @StratosFair 2 months ago +1

    That was a wonderful introduction to the exponential family, thank you.

  • @rosarioscalise7190
    @rosarioscalise7190 2 years ago +10

    Great videos! I think another nice video about a useful fundamental would be "multivariate change of variables" - something like section 2.6 of the Murphy book (the identity is sketched after this thread). Good visualizations of these sorts of topics are so valuable. Thanks for your hard work!

    • @Mutual_Information
      @Mutual_Information  2 years ago +1

      That's a good idea. Hadn't thought of that one. Added it to the list - it may be a while before I get to it, but the plan is there. Thanks!
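
For reference, the multivariate change-of-variables identity suggested in this thread (Murphy, section 2.6): if y = g(x) for an invertible, differentiable g, the density of y is

```latex
p_y(\mathbf{y}) \;=\; p_x\!\big(g^{-1}(\mathbf{y})\big)\,
\big|\det J_{g^{-1}}(\mathbf{y})\big|,
\qquad
\big[J_{g^{-1}}(\mathbf{y})\big]_{ij} \;=\; \frac{\partial x_i}{\partial y_j}.
```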

  • @lookingfordae8938
    @lookingfordae8938 2 years ago +4

    Great video man, keep em coming. The Algorithm will pick you up eventually

  • @g1sniper2
    @g1sniper2 1 year ago +2

    Absolutely brilliant work! Thanks for this tremendous effort

  • @euyin77
    @euyin77 7 months ago +1

    Slow down, Frenchie. This is gold!

  • @dermitdembrot3091
    @dermitdembrot3091 1 year ago +1

    Thanks for the video and for going into conjugacy!
    You write Z_D(theta) and later Z_l(theta) for the product Z(theta)^N of normalizers that appears in the likelihood.
    The term - log Z_l(theta) = - N * log Z(theta) appears in the conjugate prior, multiplied by tau_0, and in the posterior, multiplied by tau_0 + 1.
    This choice of the prior's sufficient statistics is not wrong, but I think it would make things clearer if the prior did not depend on N.
    Instead, - log Z(theta) could appear in the prior, and tau_0 would then have to take an N-fold value so that the prior distribution is the same. (There is always this kind of arbitrariness: you can freely move a multiplicative constant (in this case N) between sufficient statistics and natural parameters.)
    The multiplicative factor in the posterior would be tau_0 + N. That makes it clear that for each data point a value of 1 is added to tau_0.
    Therefore (the new) tau_0 can be interpreted as the number of data points the prior "acts as if it has observed".
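
A sketch of the reparameterization this comment describes, written with the per-datapoint normalizer Z(theta) so that the prior does not depend on N:

```latex
% Exponential-family likelihood, per data point:
p(x \mid \theta) = h(x)\,\exp\!\big(\theta^\top T(x) - \log Z(\theta)\big)

% Conjugate prior with pseudo-count \tau_0 (independent of N):
p(\theta \mid \tau, \tau_0) \;\propto\; \exp\!\big(\tau^\top \theta - \tau_0 \log Z(\theta)\big)

% Posterior after x_1, \dots, x_N: each data point adds T(x_n) to \tau
% and adds 1 to \tau_0, so \tau_0 acts like a count of pseudo-observations.
p(\theta \mid \mathcal{D}) \;\propto\; \exp\!\Big(\big(\tau + \textstyle\sum_{n=1}^{N} T(x_n)\big)^{\!\top}\theta \;-\; (\tau_0 + N)\,\log Z(\theta)\Big)
```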

  • @psl_schaefer
    @psl_schaefer 5 months ago +1

    This is some awesome content! Also, the references you provide in your video description have been very helpful for me. Thank you so much :)

  • @joed2333
    @joed2333 2 years ago +3

    Hey, just wanted to say thank you very much for putting this all together. I'm learning about Probabilistic Graphical Modeling now and your channel is a great pairing with Koller's and Jordan's books.

    • @Mutual_Information
      @Mutual_Information  2 years ago +1

      Oh those sources are excellent! I actually wrote a series of Quora answers on PGMs (qr.ae/pGSs2o). It goes over the portion of Koller's book I found most interesting/important.

  • @timmae9655
    @timmae9655 2 years ago +1

    Very well made and clear!

  • @Kopakabana001
    @Kopakabana001 2 years ago +1

    Another great video! I started sharing them with people at work and they love them!

  • @KarlaSavina
    @KarlaSavina 6 months ago +1

    You are amazing! Too cool!🤩😎

  • @roshinroy5129
    @roshinroy5129 1 year ago +1

    Amazing video brother! Clearly explained with all the relevant mathematics involved in it!!! Would love to watch more such videos on important topics!!!

  • @NickGeo25
    @NickGeo25 1 year ago +1

    Awesome! Straight out of the Wainwright paper :) Especially the bijection between expected sufficient statistics and canonical parameters.

  • @siddharthbisht1287
    @siddharthbisht1287 2 years ago +2

    I haven't seen this many equations in a single video on a data science channel until now 😂😂😂. I got lost after Conjugacy, but this is helpful. Thank you for citing the sources. Are you aiming for Probabilistic ML? Variational Inference? Generative Modeling ... sort of content? The content you have been covering forms the basis of so many things today, Generative Modeling and RL amongst others. I am actually looking forward to the next video now.

    • @Mutual_Information
      @Mutual_Information  2 years ago +3

      Yea this one was pretty intense! I’ll probably chill that out a little bit. Ultimately, over the long term, I want to hit those topics. Definitely generative modeling and probabilistic programming. That’s the cool shit! Just.. there are a lot of prereqs :)

  • @saeidhoseinipour3101
    @saeidhoseinipour3101 2 years ago +1

    @Mutual Information
    Oh my God, this is an amazing video. Thanks 🙏
    May I ask, what software do you use for making videos?

    • @Mutual_Information
      @Mutual_Information  2 years ago

      Thanks Saeid. I'm using a personal library I've built to turn static animations made with Altair into short videos.
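
A minimal sketch of that kind of pipeline, assuming nothing about the personal library itself; the chart, filenames, and frame loop below are purely illustrative (PNG export needs the optional vl-convert-python package, and video writing needs imageio-ffmpeg):

```python
# Hypothetical sketch: turn a sequence of static Altair charts into a video.
# This is NOT the author's actual library -- just one plausible approach.
import altair as alt
import imageio.v2 as imageio
import numpy as np
import pandas as pd

frames = []
for phase in np.linspace(0, 2 * np.pi, 60):
    x = np.linspace(0, 2 * np.pi, 200)
    df = pd.DataFrame({"x": x, "y": np.sin(x + phase)})
    chart = alt.Chart(df).mark_line().encode(x="x", y="y")
    chart.save("frame.png")                  # static PNG via vl-convert
    frames.append(imageio.imread("frame.png"))

imageio.mimsave("animation.mp4", frames, fps=30)  # stitch frames into a video
```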

  • @user-ni7re5tl4u
    @user-ni7re5tl4u 3 months ago

    Thanks!!
    I have one question:
    how do we know that log Z is convex??
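
A standard answer, for reference: differentiating log Z twice shows that its Hessian is the covariance of the sufficient statistics, which is positive semi-definite, and a function with a positive semi-definite Hessian everywhere is convex:

```latex
\log Z(\theta) = \log \int h(x)\, e^{\theta^\top T(x)}\, dx,
\qquad
\nabla_\theta \log Z(\theta) = \mathbb{E}_\theta[T(x)],
\qquad
\nabla^2_\theta \log Z(\theta) = \operatorname{Cov}_\theta[T(x)] \succeq 0.
```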

  • @antoinestevan5310
    @antoinestevan5310 2 years ago +1

    I have been following a few courses on probability theory and statistics, but very few, if any at all, covered the Bayesian point of view. As I am a bit tired and bored of Maximum Likelihood and statistical tests, this video, and especially the end, was a real pleasure!!
    It was fast, complex, complete and compact, this is true. This might be the kind of video where you learn where things are, and where you can come back later to dig into the derivations and the sources. Not a big deal for me ;-)
    Cheers DJ
    NB: I noticed a slight offset between video and audio... I do not know if it is my connection or something else, just to let you know

    • @Mutual_Information
      @Mutual_Information  2 years ago

      Always like having your comments and feedback, Antoine - much appreciated! It's nice to see others like this style (it's not for everyone).
      I double-checked the audio and it *seems* ok on my end. So maybe it's a connection issue. Or I may just have a less sensitive perception :)

    • @antoinestevan5310
      @antoinestevan5310 2 years ago +1

      @@Mutual_Information if no one reports the sync issue, no worries, it is probably me, you can keep your quality comin' ;-)

    • @antoinestevan5310
      @antoinestevan5310 2 years ago +1

      @@Mutual_Information I double-checked the description of the video and had an idea:
      correct me if I am wrong, but you might be using Python and pyplot/Altair to create your animations?
      It could be really cool to have access to snippets of your code, to play with the examples you put forward and their parameters and visuals!
      Tell me what you think someday about the feasibility and usefulness of such sharing :-)

    • @Mutual_Information
      @Mutual_Information  2 years ago

      Hey Antoine - that idea is absolutely on my radar, but the timeline is a bit further out. I plan on exposing the tools I use to create animations. That just needs to happen when I'm prepared to respond to the feedback. I'd like to be in a position where I can iterate with users on how to use the software or improve it. Right now, production for a single video just takes too long for me to allocate that extra time. But in time, it'll come. Thanks for the idea - it's an excellent one!

    • @antoinestevan5310
      @antoinestevan5310 2 years ago +1

      @@Mutual_Information Okay, this looks absolutely legit to me! No pressure :-)
      All that being said, if you need or want any help in this respect, feel very free to ask; I am sure some people in here would be down to help you ;-)

  • @TRex-fu7bt
    @TRex-fu7bt 11 months ago

    whoa, you did conjugacy without doing a beta/binomial example
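
For what it's worth, a minimal beta/Bernoulli sketch of the conjugate update the comment is asking about; the prior hyperparameters and data below are illustrative:

```python
# Beta/Bernoulli conjugacy sketch (illustrative, not from the video).
# Prior: theta ~ Beta(a, b). Likelihood: x_n ~ Bernoulli(theta).
# Posterior: theta | D ~ Beta(a + k, b + N - k), with k = number of ones.
import numpy as np

rng = np.random.default_rng(0)
a, b = 2.0, 2.0                        # assumed prior hyperparameters
data = rng.binomial(1, 0.7, size=20)   # 20 flips, true theta = 0.7

N, k = len(data), int(data.sum())
a_post, b_post = a + k, b + N - k      # conjugate update: just add counts

theta_mle = k / N                                  # maximum likelihood
theta_map = (a_post - 1) / (a_post + b_post - 2)   # mode of Beta posterior
print(f"MLE: {theta_mle:.3f}, MAP: {theta_map:.3f}")
```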

  • @kimchi_taco
    @kimchi_taco 10 months ago +1

    Brilliant explanation of Bayesian inference + the Exponential family. My understanding of why MAP is better:
    "MLE is the optimization problem of finding the optimal θ. MAP is the optimization problem of finding the optimal 𝛕. Finding 𝛕 is easier because they are sufficient statistics."
    Does that sound right?

    • @Mutual_Information
      @Mutual_Information  10 months ago +1

      Hm, I wouldn't say exactly that. I'd say that the exponential family allows you to do computations, like computing the MAP, that can otherwise be too computationally expensive.

    • @kimchi_taco
      @kimchi_taco 10 months ago

      @@Mutual_Information Thanks for clarifying. Would you mind giving some intuition for the benefit of MAP?

    • @Mutual_Information
      @Mutual_Information  10 months ago

      @@kimchi_taco In general, it's because it also accounts for your prior distribution, which is something you bring to the table. So if you think you have a good prior (something better than the uniform distribution), then you'd prefer the MAP over the MLE.
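
In symbols, the distinction in this reply: the MAP objective is the MLE objective plus the log-prior, so with a flat prior the two estimates coincide, and an informative prior pulls the MAP estimate toward it.

```latex
\hat{\theta}_{\mathrm{MLE}} = \arg\max_{\theta}\, \log p(\mathcal{D} \mid \theta),
\qquad
\hat{\theta}_{\mathrm{MAP}} = \arg\max_{\theta}\, \big[\log p(\mathcal{D} \mid \theta) + \log p(\theta)\big].
```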