Maximum Likelihood Estimate by Automatic Differentiation | Directed Graphical Models
- Published 20 May 2024
- In this video, we will use the Maximum Likelihood Estimate to fit the parameters of a DGM with two random variables. Both are observable and are modelled as Bernoulli distributions. We take advantage of TensorFlow's Automatic Differentiation to solve the optimization problem using a gradient-based algorithm.
Here are the notes: raw.githubusercontent.com/Cey...
TensorFlow Probability is a great toolbox for modelling joint distributions with Directed Graphical Models. Evaluating the log-likelihood builds a computational graph, from which TensorFlow can obtain the gradients needed by an Adam optimizer. It works like magic and removes the barrier of deriving the gradients by hand.
-------
📝 : Check out the GitHub Repository of the channel, where I upload all the handwritten notes and source-code files (contributions are very welcome): github.com/Ceyron/machine-lea...
📢 : Follow me on LinkedIn or Twitter for updates on the channel and other cool Machine Learning & Simulation stuff: / felix-koehler and / felix_m_koehler
💸 : If you want to support my work on the channel, you can become a Patreon here: / mlsim
-------
⚙️ My Gear:
(Below are affiliate links to Amazon. If you decide to purchase the product or something else on Amazon through this link, I earn a small commission.)
- 🎙️ Microphone: Blue Yeti: amzn.to/3NU7OAs
- ⌨️ Logitech TKL Mechanical Keyboard: amzn.to/3JhEtwp
- 🎨 Gaomon Drawing Tablet (similar to a WACOM Tablet, but cheaper, works flawlessly under Linux): amzn.to/37katmf
- 🔌 Laptop Charger: amzn.to/3ja0imP
- 💻 My Laptop (generally I like the Dell XPS series): amzn.to/38xrABL
- 📱 My Phone: Fairphone 4 (I love the sustainability and repairability aspect of it): amzn.to/3Jr4ZmV
If I had to purchase these items again, I would probably change the following:
- 🎙️ Rode NT: amzn.to/3NUIGtw
- 💻 Framework Laptop (I do not get a commission here, but I love the vision of Framework. It will definitely be my next Ultrabook): frame.work
As an Amazon Associate I earn from qualifying purchases.
-------
Timestamps:
0:00 Opening
0:27 The Directed Graphical Model
02:19 The Task of Fitting Parameters
05:05 Deriving the Likelihood
07:51 The Log-Likelihood
09:32 The Optimization Problem & Automatic Differentiation
11:40 TFP - Defining the Model
13:48 TFP - Generating (artificial) data
16:05 TFP - Defining the (transformed) variables
19:52 TFP - Solving the optimization problem
22:22 TFP - Investigating the optimal variables & Discussion
Best playlist on YouTube for probabilistic graphical models... I'm a tightwad and don't usually comment to show appreciation, but I can't hold back, because the quality of these lectures is supreme, with incredible brevity... gem of stuff 👍👍
Thanks so much
It would be cool here to show analytically that the maximum likelihood estimated parameter for a Bernoulli is just the sample mean of the data :) I guess autodiff is just performing this procedure numerically?
Definitely, autodiff is a bit of an overkill here :D
Was more of a showcase how to do this with TFP.
In case you are interested in the MLE for Bernoulli: ua-cam.com/video/nTizrDsR1x8/v-deo.html
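For reference, the analytic result the comment asks about takes only a few lines. With i.i.d. observations x_1, ..., x_n in {0, 1}:

```latex
\log L(\theta) = \sum_{i=1}^{n} \left[ x_i \log\theta + (1 - x_i) \log(1 - \theta) \right]

\frac{d}{d\theta} \log L(\theta)
  = \frac{\sum_i x_i}{\theta} - \frac{n - \sum_i x_i}{1 - \theta}
  \overset{!}{=} 0
\quad\Longrightarrow\quad
\hat{\theta}_{\text{MLE}} = \frac{1}{n} \sum_{i=1}^{n} x_i
```

So the MLE is indeed the sample mean, and a gradient-based autodiff routine simply converges to that same value numerically.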
This video is truly interesting.
Thanks a lot :)
TensorFlow Probability part starts at 11:40
I wrongly call theta "sigma" from minute 9 to 11:30
Would it be possible to have a continuous random variable associated with the conditional probability? For example, depending on good or bad weather, how long would the commute to work take?
Yes, absolutely. Directed Graphical Models are extremely flexible when it comes to how you want to model your joint distribution.
Think for instance of the following: encode bad weather as 0 and good weather as 1. Then define mu as 30 - 5 * W, where W is the weather; mu would then be the center of a Gaussian, maybe with an unknown, parameterized standard deviation sigma. A Maximum Likelihood Estimate would then fit the theta of the Bernoulli distribution for the weather and the sigma of the Gaussian. I hope that made sense. Let me know if it was confusing :D (The output of the Gaussian is then the time in minutes your commute takes.)
More important for our scenario here: all functions you apply (e.g. the affine map 30 - 5*W) have to be differentiable in order to use Automatic Differentiation. On top of that, you might run into local optima as your optimization problem becomes more and more non-convex (more details on this in a future video).