Stanford CS236: Deep Generative Models | 2023 | Lecture 12 - Energy Based Models
- Published May 5, 2024
- For more information about Stanford's Artificial Intelligence programs visit: stanford.io/ai
To follow along with the course, visit the course website:
deepgenerativemodels.github.io/
Stefano Ermon
Associate Professor of Computer Science, Stanford University
cs.stanford.edu/~ermon/
Learn more about the online course and how to enroll: online.stanford.edu/courses/c...
To view all online courses and programs offered by Stanford, visit: online.stanford.edu/
5:40 Contrastive divergence: the gradient of the log partition function wrt theta is easy to estimate if samples from the model can be accessed
8:25 Training energy based models by maximum likelihood is feasible to the extent that samples can be generated, MCMC
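The contrastive divergence gradient can be sketched in 1D with a toy quadratic energy where the model can be sampled exactly (in a real EBM the negative-phase samples would come from MCMC; the names `grad_f` and `cd_gradient` are illustrative, not from the lecture):

```python
import random

def grad_f(theta, x):
    # d/dtheta of the toy energy f_theta(x) = -(x - theta)^2 / 2
    return x - theta

def cd_gradient(theta, data, model_samples):
    """Contrastive-divergence estimate of the log-likelihood gradient:
    positive phase on data minus negative phase on model samples
    (the negative phase estimates grad of the log partition function)."""
    pos = sum(grad_f(theta, x) for x in data) / len(data)
    neg = sum(grad_f(theta, x) for x in model_samples) / len(model_samples)
    return pos - neg

random.seed(0)
theta = 0.0
data = [random.gauss(2.0, 1.0) for _ in range(5000)]  # data mean ~2
for _ in range(200):
    # this toy model is exactly N(theta, 1), so we can sample it directly;
    # a real EBM would need Langevin MCMC here
    samples = [random.gauss(theta, 1.0) for _ in range(500)]
    theta += 0.1 * cd_gradient(theta, data, samples)
print(round(theta, 1))  # theta converges near the data mean of ~2.0
```

Gradient ascent with this estimator drives the model's samples to match the data, without ever computing the partition function itself.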
14:00? MCMC methods, detailed balance condition
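The detailed balance condition can be made concrete with a minimal Metropolis-Hastings sketch on a toy 1D target (helper names are illustrative; the target is a standard normal used unnormalized):

```python
import random, math

def target_unnorm(x):
    # unnormalized density exp(-x^2 / 2); MH never needs the constant
    return math.exp(-0.5 * x * x)

def mh_step(x, step=1.0):
    """One Metropolis-Hastings step with a symmetric Gaussian proposal.
    The acceptance rule min(1, p(x') / p(x)) enforces detailed balance:
    p(x) T(x -> x') = p(x') T(x' -> x)."""
    x_new = x + random.gauss(0.0, step)
    if random.random() < min(1.0, target_unnorm(x_new) / target_unnorm(x)):
        return x_new
    return x

random.seed(0)
x, samples = 0.0, []
for i in range(20000):
    x = mh_step(x)
    if i > 1000:  # discard burn-in
        samples.append(x)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(round(mean, 1), round(var, 1))  # close to 0.0 and 1.0 for N(0, 1)
```

Because detailed balance holds for each step, the chain's stationary distribution is the target, which is why the long-run sample statistics match N(0, 1).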
22:00? log x=x' term
23:25? Comparing likelihoods of two data points is easy for EBMs (the partition function cancels in the ratio)
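Concretely: for p_theta(x) = exp(f_theta(x)) / Z(theta), the likelihood ratio of two points never touches Z (the energy below is an arbitrary toy choice for illustration):

```python
import math

def f(x):
    # toy energy function f_theta(x); any scalar-valued function works
    return -abs(x) ** 1.5

def likelihood_ratio(x, x_prime):
    # p(x) / p(x') = exp(f(x) - f(x')): the unknown Z cancels
    return math.exp(f(x) - f(x_prime))

print(likelihood_ratio(0.0, 2.0))  # ~16.9: x=0 about 17x more likely than x=2
```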
24:15 Training EBMs is very expensive: every training data point requires a sample generated from the model, and generating each sample involves Langevin MCMC with ~1000 steps
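A minimal sketch of the Langevin MCMC inner loop, using the score of a standard normal as a stand-in for a learned EBM's score (`score` and `langevin_sample` are illustrative names):

```python
import random, math

def score(x):
    # grad_x log p(x); for the standard normal target this is -x
    # (in a real EBM this would be grad_x f_theta(x) via autodiff)
    return -x

def langevin_sample(x0, n_steps=500, eps=0.01):
    """Unadjusted Langevin dynamics: x <- x + eps * score(x) + sqrt(2*eps) * z.
    Maximum-likelihood EBM training runs a chain like this per gradient step,
    which is what makes it so expensive."""
    x = x0
    for _ in range(n_steps):
        x += eps * score(x) + math.sqrt(2 * eps) * random.gauss(0.0, 1.0)
    return x

random.seed(0)
samples = [langevin_sample(5.0) for _ in range(500)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(round(mean, 1), round(var, 1))  # chains relax from x0=5 toward N(0, 1)
```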
37:30 Fisher divergence as a derivative of KL divergence: convolve the two densities with Gaussian noise; the derivative of their KL divergence with respect to the noise level is the Fisher divergence
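As a reference point, the identity being described (a de Bruijn-type identity; the exact constant factor depends on how the noise level is parameterized) can be written as:

```latex
% Fisher divergence between densities p and q:
D_F(p \,\|\, q) \;=\; \mathbb{E}_{x \sim p}\!\left[ \left\| \nabla_x \log p(x) - \nabla_x \log q(x) \right\|^2 \right]

% Smooth both densities with Gaussian noise of variance t:
p_t = p * \mathcal{N}(0, tI), \qquad q_t = q * \mathcal{N}(0, tI)

% The KL divergence between the smoothed densities decreases at a rate
% proportional to their Fisher divergence:
\frac{d}{dt}\, D_{\mathrm{KL}}(p_t \,\|\, q_t) \;\propto\; -\, D_F(p_t \,\|\, q_t)
```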
38:40 Score matching, theta is continuous
47:10 Score matching derivation: integration by parts makes the objective independent of the (unknown) score of p_data
51:15? Equivalent to Fisher divergence
52:35 Interpretation of the loss function: the first term makes data points stationary points (local minima or maxima) of the log-likelihood, so small perturbations of a data point should not change the log-likelihood by much; the second term makes data points local maxima rather than minima
55:30? Need to backprop n times (once per input dimension) to compute the trace of the Hessian
56:20 Proved equivalence to the Fisher divergence; in the infinite-data limit, minimizing the score matching objective recovers the exact data distribution
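The p_data-independent objective can be checked numerically in 1D with a Gaussian model, where the score and its derivative are available in closed form (a toy sketch; `sm_loss` and the parameterization are illustrative):

```python
import random

def sm_loss(mu, var, xs):
    """Score matching objective for a Gaussian model N(mu, var):
    score s(x) = -(x - mu) / var, derivative s'(x) = -1 / var,
    per-sample loss = 0.5 * s(x)^2 + s'(x) -- no p_data score needed."""
    total = 0.0
    for x in xs:
        s = -(x - mu) / var
        total += 0.5 * s * s + (-1.0 / var)
    return total / len(xs)

random.seed(0)
xs = [random.gauss(1.0, 2.0) for _ in range(20000)]  # data: N(1, 4)
good = sm_loss(1.0, 4.0, xs)  # the data-generating parameters
bad = sm_loss(0.0, 1.0, xs)   # wrong parameters
print(good < bad)  # True: the objective prefers the data-generating params
```

The objective is minimized at the true mean and variance, consistent with its equivalence to the Fisher divergence up to a constant.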
57:45 Fitting EBM, similar flavor to GANs. Instead of contrasting data to samples from the model, contrast to noise
1:00:10 Instead of using an arbitrary neural network as the discriminator, define it with the same functional form as the optimal discriminator: rather than feeding x into an arbitrary network, evaluate its likelihood under the model p_theta and under the noise distribution. Because of this pre-defined form, the optimal p_theta must match p_data. Parameterize p_theta with an EBM. (In a GAN setting, the discriminator itself would be parameterized by a neural network.)
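A minimal sketch of this fixed-form discriminator, assuming a toy 1D quadratic energy and standard-normal noise (the names `nce_loss`, `log_model`, and the learned scalar `c` standing in for the log partition function are illustrative, not from the lecture):

```python
import random, math

def log_noise(x):
    # fixed noise distribution: standard normal log-density
    return -0.5 * x * x - 0.5 * math.log(2 * math.pi)

def log_model(x, theta, c):
    # EBM: log p_theta(x) = f_theta(x) - c, with toy energy f = -(x - theta)^2 / 2
    # and c a learned stand-in for the log partition function
    return -0.5 * (x - theta) ** 2 - c

def nce_loss(theta, c, data, noise):
    """Binary cross-entropy where the discriminator is *defined* as
    D(x) = p_theta(x) / (p_theta(x) + p_noise(x)) -- not a free network."""
    loss = 0.0
    for x in data:   # label 1: real data
        logit = log_model(x, theta, c) - log_noise(x)
        loss += math.log(1 + math.exp(-logit))
    for x in noise:  # label 0: noise samples
        logit = log_model(x, theta, c) - log_noise(x)
        loss += math.log(1 + math.exp(logit))
    return loss / (len(data) + len(noise))

random.seed(0)
data = [random.gauss(2.0, 1.0) for _ in range(5000)]
noise = [random.gauss(0.0, 1.0) for _ in range(5000)]
c_true = 0.5 * math.log(2 * math.pi)  # true log Z of this Gaussian energy
print(nce_loss(2.0, c_true, data, noise) < nce_loss(0.0, c_true, data, noise))
```

When p_theta equals the noise distribution the logit is exactly zero and the loss is log 2; the loss drops as p_theta moves toward p_data, which is why minimizing it fits the EBM.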
1:03:00? Classifiers in noise correction
1:11:30 The loss function is independent of sampling from the model, but once the EBM is trained, sampling from it still requires Langevin MCMC steps
1:19:00 GAN vs NCE: in a GAN the generator is trained; in NCE the noise distribution is fixed, but its likelihood must be evaluable
1:22:20 Noise contrastive estimation where the noise distribution is a normalizing flow learned adversarially