Maximum Likelihood Estimate by Automatic Differentiation | Directed Graphical Models

  • Published 20 May 2024
  • In this video, we will use the Maximum Likelihood Estimate to fit the parameters of a DGM with two random variables. Both are observable and are modelled as Bernoulli distributions. We take advantage of TensorFlow's Automatic Differentiation to solve the optimization problem using a gradient-based algorithm.
    Here are the notes: raw.githubusercontent.com/Cey...
    TensorFlow Probability is a great toolbox for modelling joint distributions as Directed Graphical Models. Calculating the log-likelihood builds a computational graph which TensorFlow can use to obtain the gradients needed by an Adam optimizer. It works like magic and removes the burden of deriving the gradients by hand.
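    Below is a minimal sketch of the workflow described above (two Bernoulli variables, log-likelihood via TFP, gradients via GradientTape, Adam updates). It is not the code from the video or the linked notes; the model structure (a parent variable A and a child B conditioned on A) and all names and numbers are illustrative assumptions.
```python
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

# Artificial data from an assumed "ground truth": A ~ Bernoulli(0.3),
# B | A ~ Bernoulli(0.2 if A = 0 else 0.8)
a_data = tfd.Bernoulli(probs=0.3).sample(1000)
b_data = tfd.Bernoulli(probs=tf.gather([0.2, 0.8], a_data)).sample()

# Unconstrained (transformed) variables; a sigmoid maps them into (0, 1),
# so the optimizer does not need box constraints.
theta_a_raw = tf.Variable(0.0)
theta_b_raw = tf.Variable(tf.zeros(2))  # one conditional probability per value of A

def negative_log_likelihood():
    theta_a = tf.sigmoid(theta_a_raw)
    theta_b_given_a = tf.gather(tf.sigmoid(theta_b_raw), a_data)
    log_prob = (
        tfd.Bernoulli(probs=theta_a).log_prob(a_data)
        + tfd.Bernoulli(probs=theta_b_given_a).log_prob(b_data)
    )
    return -tf.reduce_sum(log_prob)

optimizer = tf.keras.optimizers.Adam(learning_rate=0.05)
variables = [theta_a_raw, theta_b_raw]

for _ in range(500):
    with tf.GradientTape() as tape:
        loss = negative_log_likelihood()
    grads = tape.gradient(loss, variables)  # gradients via automatic differentiation
    optimizer.apply_gradients(zip(grads, variables))

print("theta_A   ≈", tf.sigmoid(theta_a_raw).numpy())
print("theta_B|A ≈", tf.sigmoid(theta_b_raw).numpy())
```
    The sigmoid plays the role of the "transformed variables" mentioned in the timestamps: optimizing unconstrained values keeps the Bernoulli probabilities inside (0, 1).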
    -------
    📝 : Check out the GitHub Repository of the channel, where I upload all the handwritten notes and source-code files (contributions are very welcome): github.com/Ceyron/machine-lea...
    📢 : Follow me on LinkedIn or Twitter for updates on the channel and other cool Machine Learning & Simulation stuff: / felix-koehler and / felix_m_koehler
    💸 : If you want to support my work on the channel, you can become a Patreon here: / mlsim
    -------
    ⚙️ My Gear:
    (Below are affiliate links to Amazon. If you decide to purchase the product or something else on Amazon through this link, I earn a small commission.)
    - 🎙️ Microphone: Blue Yeti: amzn.to/3NU7OAs
    - ⌨️ Logitech TKL Mechanical Keyboard: amzn.to/3JhEtwp
    - 🎨 Gaomon Drawing Tablet (similar to a WACOM Tablet, but cheaper, works flawlessly under Linux): amzn.to/37katmf
    - 🔌 Laptop Charger: amzn.to/3ja0imP
    - 💻 My Laptop (generally I like the Dell XPS series): amzn.to/38xrABL
    - 📱 My Phone: Fairphone 4 (I love the sustainability and repairability aspect of it): amzn.to/3Jr4ZmV
    If I had to purchase these items again, I would probably change the following:
    - 🎙️ Rode NT: amzn.to/3NUIGtw
    - 💻 Framework Laptop (I do not get a commission here, but I love the vision of Framework. It will definitely be my next Ultrabook): frame.work
    As an Amazon Associate I earn from qualifying purchases.
    -------
    Timestamps:
    0:00 Opening
    0:27 The Directed Graphical Model
    02:19 The Task of Fitting Parameters
    05:05 Deriving the Likelihood
    07:51 The Log-Likelihood
    09:32 The Optimization Problem & Automatic Differentiation
    11:40 TFP - Defining the Model
    13:48 TFP - Generating (artificial) data
    16:05 TFP - Defining the (transformed) variables
    19:52 TFP - Solving the optimization problem
    22:22 TFP - Investigating the optimal variables & Discussion

COMMENTS • 9

  • @quonxinquonyi8570
    @quonxinquonyi8570 2 years ago +6

    Best-ever playlist on YouTube for probabilistic graphical models... I am a tightwad and don't really comment to show appreciation, but I can't hold back because the quality of these lectures is supreme, with incredible brevity... gem of stuff 👍👍

  • @raajchatterjee3901
    @raajchatterjee3901 27 days ago +1

    It would be cool here to show analytically that the maximum likelihood estimate of the Bernoulli parameter is just the sample mean of the data :) I guess autodiff is just performing this procedure numerically?

    • @MachineLearningSimulation
      @MachineLearningSimulation  26 days ago

      Definitely, autodiff is a bit of overkill here :D
      It was more of a showcase of how to do this with TFP.
      In case you are interested in the MLE for a Bernoulli: ua-cam.com/video/nTizrDsR1x8/v-deo.html
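      For completeness, the standard analytic argument behind this (the textbook derivation, not taken from the video):
```latex
% For N i.i.d. observations x_1, ..., x_N in {0, 1} from a Bernoulli(theta):
\log L(\theta) = \sum_{i=1}^{N} \bigl[ x_i \log\theta + (1 - x_i)\log(1 - \theta) \bigr]

% Setting the derivative to zero and solving for theta:
\frac{\mathrm{d}}{\mathrm{d}\theta} \log L(\theta)
  = \frac{\sum_i x_i}{\theta} - \frac{N - \sum_i x_i}{1 - \theta} = 0
\quad\Longrightarrow\quad
\hat{\theta}_{\mathrm{MLE}} = \frac{1}{N} \sum_{i=1}^{N} x_i
```
      which is exactly the sample mean; a gradient-based optimizer on the same log-likelihood converges to the same value numerically.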

  • @user-or7ji5hv8y
    @user-or7ji5hv8y 3 years ago +2

    This video is truly interesting.

  • @MachineLearningSimulation
    @MachineLearningSimulation  3 years ago

    The TensorFlow Probability part starts at 11:40.
    From minute 9 to 11:30 I wrongly call theta sigma.

  • @user-or7ji5hv8y
    @user-or7ji5hv8y 3 years ago +1

    Would it be possible to have a continuous random variable associated with the conditional probability? For example, depending on good or bad weather, how long the commute to work would take?

    • @MachineLearningSimulation
      @MachineLearningSimulation  3 years ago +1

      Yes, absolutely. Directed Graphical Models are extremely flexible when it comes to how you want to model your joint distribution.
      Think, for instance, of the following: encode bad weather as 0 and good weather as 1. Then define mu as 30 - 5 * W, where W is the weather; mu would then be the center of a Gaussian, perhaps with an unknown, parameterized standard deviation sigma. A Maximum Likelihood Estimate would then fit the theta of the Bernoulli distribution for the weather and the sigma of the Gaussian. (The output of the Gaussian is the time in minutes your commute takes.) I hope that made sense. Let me know if it was confusing :D
      More importantly for our scenario here: all functions you apply (e.g. the affine map 30 - 5*W) have to be differentiable in order for Automatic Differentiation to give correct gradients. On top of that, you might run into local optima as your optimization problem becomes more and more non-convex (more details on this in a future video).
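      A minimal sketch of how this weather/commute example could be written in TFP (all numbers and names are only illustrative, not code from the video):
```python
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

# W: weather (0 = bad, 1 = good), T: commute time in minutes, conditioned on W.
model = tfd.JointDistributionNamed(dict(
    W=tfd.Bernoulli(probs=0.3),
    T=lambda W: tfd.Normal(
        loc=30.0 - 5.0 * tf.cast(W, tf.float32),  # affine map of the discrete parent
        scale=4.0,                                 # sigma, the spread of the commute time
    ),
))

sample = model.sample(5)       # dict with keys "W" and "T"
print(sample["W"].numpy(), sample["T"].numpy())
print(model.log_prob(sample))  # joint log-probability of the sampled values
```
      Replacing the fixed 0.3 and 4.0 with tf.Variables and maximizing model.log_prob on observed data would give the same MLE setup as above, just with a continuous child variable.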