Stanford CS236: Deep Generative Models I 2023 I Lecture 11 - Energy Based Models
- Published 16 Oct 2024
- For more information about Stanford's Artificial Intelligence programs visit: stanford.io/ai
To follow along with the course, visit the course website:
deepgenerative...
Stefano Ermon
Associate Professor of Computer Science, Stanford University
cs.stanford.ed...
Learn more about the online course and how to enroll: online.stanfor...
To view all online courses and programs offered by Stanford, visit: online.stanfor...
28:45 Autoregressive models and latent variable models can be thought of as clever ways of combining simple normalized objects to build more complicated objects that are normalized by design
43:00 Sampling is hard, even though the unnormalized likelihood is known, because of the normalizing constant (numerical rather than analytical methods)
49:20 Applications without the partition function
51:50? Uncorrupting an image: maximize p(y|x), or equivalently p(x,y); the normalization constant doesn't matter
1:00:00? RBM (restricted Boltzmann machine)
1:17:40 The contrastive divergence algorithm is a Monte Carlo approximation (a single sample) of the expectation, an unbiased estimator of the true gradient
1:23:20 MCMC for sample generation without the partition function; Metropolis-Hastings always accepts an uphill move and accepts a downhill move only with a certain probability
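Not from the lecture itself, but the 1:23:20 idea — accepting a downhill move only with a certain probability, so the partition function never appears — can be sketched in a few lines. The 1-D Gaussian target, proposal width, and step count here are illustrative assumptions:

```python
import math
import random

def metropolis_hastings(energy, x0=0.0, n_steps=10000, step=0.5, seed=0):
    """Sample from p(x) proportional to exp(-energy(x)); Z is never needed.

    Uses a symmetric Gaussian proposal. A move that raises the probability
    is always accepted; a downhill move is accepted only with probability
    exp(energy(x) - energy(x')), so the partition function cancels
    in the acceptance ratio.
    """
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_steps):
        x_prop = x + rng.gauss(0.0, step)        # symmetric proposal
        log_accept = energy(x) - energy(x_prop)  # log p(x')/p(x); Z cancels
        if log_accept >= 0 or rng.random() < math.exp(log_accept):
            x = x_prop
        samples.append(x)
    return samples

# Toy target: standard Gaussian, energy(x) = x^2 / 2 (Z = sqrt(2*pi) is never used).
samples = metropolis_hastings(lambda x: 0.5 * x * x)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

The chain's sample mean and variance should be close to 0 and 1, even though the code only ever evaluates the unnormalized energy.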
theta' depends on x, so the integral over x should be applied to the entire product, including the function P(y)! Then the integral calculation would not be that simple. 25:46
From my understanding, Ptheta(x)(y) is normalized by design and has an analytical solution, so "we know" that the integral over y goes to 1.
@@YRTB2830 What's an analytical solution? I'm new to this.
@@ashishkannad3021 Well, two things.
theta doesn't depend on x; theta is the model parameters. thetaPRIME does depend on x, but we don't really care, and here is why:
the integral is simple because, as said in the slide, Ptheta(x) and Ptheta'(y) are normalized objects, so we already know that they both integrate to one.
So when you are trying to find the double integral of the product P(x)P(y), if you can find a way to separate them, you already know that each of them integrates to one.
And from the slide, if you are trying to find the integral over y of Ptheta(x)*Ptheta'(y), you will notice that Ptheta(x) doesn't depend on y, so it is basically a constant at this point, and you can simplify it to
Ptheta(x)*(integral over y of Ptheta'(y)).
Now, the first assumption we made is that the integral over y of Ptheta'(y) = 1, so it will always integrate to one; x does not matter at this point.
All you are left with is the integral over x of Ptheta(x)*1 dx, which is also 1.
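The factorization argument above can be checked numerically. As an illustrative assumption (not from the lecture), take Ptheta(x) to be a standard Gaussian and Ptheta'(y) to be a Gaussian whose mean depends on x — so theta' really does depend on x, yet the double integral still collapses to 1 because the inner integral equals 1 for every x:

```python
import math

def p_theta(x):
    # Normalized density over x: standard Gaussian (an arbitrary stand-in).
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def p_theta_prime(y, x):
    # Normalized density over y whose parameter (the mean) depends on x.
    # It still integrates to 1 in y for every fixed x.
    return math.exp(-0.5 * (y - x) ** 2) / math.sqrt(2 * math.pi)

def inner_integral(x, half_width=8.0, n=800):
    # Riemann sum for the integral over y of Ptheta(x)*Ptheta'(y):
    # since Ptheta(x) is a constant in y, this should equal Ptheta(x)*1.
    lo, hi = x - half_width, x + half_width
    h = (hi - lo) / n
    return sum(p_theta(x) * p_theta_prime(lo + i * h, x) * h for i in range(n))

# Outer integral over x: the whole double integral collapses to
# integral of Ptheta(x) dx = 1, even though theta' depends on x.
double = sum(inner_integral(-8.0 + i * 0.02) * 0.02 for i in range(800))
```

Despite the x-dependence of the second density, `inner_integral(x)` matches `p_theta(x)` and `double` comes out as 1, which is exactly the separation step from the slide.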