Stanford CS236: Deep Generative Models I 2023 I Lecture 17 - Discrete Latent Variable Models
- Published May 5, 2024
- For more information about Stanford's Artificial Intelligence programs visit: stanford.io/ai
To follow along with the course, visit the course website:
deepgenerativemodels.github.io/
Stefano Ermon
Associate Professor of Computer Science, Stanford University
cs.stanford.edu/~ermon/
Learn more about the online course and how to enroll: online.stanford.edu/courses/c...
To view all online courses and programs offered by Stanford, visit: online.stanford.edu/
0:40 Close connection between score based models and DDPMs (denoising diffusion probabilistic models)
1:00 A score-based model goes from noise to data by running Langevin dynamics chains. VAE perspective (DDPM): a fixed encoder (the SDE) adds noise (a Gaussian transition kernel) at each time step; the decoder is a joint distribution over the same RVs, parameterized in the reverse direction; the sequence of decoders is also Gaussian and parameterized by neural networks (the simple DDPM formula). Train in the usual VAE way by optimizing the evidence lower bound (minimizing a KL divergence). The ELBO is equivalent to a sum of denoising score matching objectives: learning the optimal decoder requires estimating the score of the noise-perturbed data density, so optimizing the ELBO corresponds to learning a sequence of denoisers (noise-conditional score-based models)
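The ELBO-to-denoising-score-matching equivalence in this note can be sketched numerically. Below is a minimal NumPy toy, not the lecture's exact objective: `dsm_loss` is an illustrative name, and the closed-form `exact_score` stands in for the neural network s_theta(x, sigma) that would be trained in practice.

```python
import numpy as np

rng = np.random.default_rng(0)

def dsm_loss(score_fn, x, sigma):
    """Denoising score matching at one noise level: perturb x with
    Gaussian noise, then regress the model's score onto the score of
    the Gaussian perturbation kernel, -(x_tilde - x) / sigma**2."""
    noise = rng.standard_normal(x.shape)
    x_tilde = x + sigma * noise
    target = -(x_tilde - x) / sigma**2
    return np.mean((score_fn(x_tilde, sigma) - target) ** 2)

# Stand-in "model": for N(0, I) toy data perturbed by N(0, sigma^2 I)
# noise, the perturbed density is N(0, (1 + sigma^2) I), whose exact
# score is below.  This closed form minimizes the DSM objective among
# all functions; a neural network would take its place in practice.
exact_score = lambda x, sigma: -x / (1.0 + sigma**2)

x = rng.standard_normal((256, 2))   # toy "data"
sigmas = [0.1, 0.5, 1.0]            # the encoder's noise levels
# The ELBO decomposes into one such denoising term per noise level.
elbo_surrogate = sum(dsm_loss(exact_score, x, s) for s in sigmas)
```

Note that even the exact score leaves an irreducible variance term in the loss; what matters is that it is the minimizer, which is why optimizing the ELBO learns the scores.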
4:55 The means of the decoders at optimality correspond to the score functions; the updates performed in DDPM are very similar to annealed Langevin dynamics
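The annealed Langevin dynamics mentioned here can be sketched in a few lines. This is a toy NumPy version under assumed simplifications: the `score` closed form (exact for N(0, I) data under Gaussian perturbation) stands in for a trained noise-conditional score network, and the step-size schedule is one common choice, not the lecture's exact recipe.

```python
import numpy as np

rng = np.random.default_rng(0)

def annealed_langevin(score_fn, x0, sigmas, steps_per_level=50, eps=0.01):
    """Annealed Langevin dynamics: run a Langevin chain at each noise
    level, from largest sigma to smallest, warm-starting each level
    from the samples of the previous (noisier) one."""
    x = x0
    for sigma in sigmas:                          # sorted high -> low
        alpha = eps * sigma**2 / sigmas[-1]**2    # level-dependent step
        for _ in range(steps_per_level):
            z = rng.standard_normal(x.shape)
            x = x + 0.5 * alpha * score_fn(x, sigma) + np.sqrt(alpha) * z
    return x

# Toy target N(0, 1): the sigma-perturbed score is -x / (1 + sigma^2).
score = lambda x, sigma: -x / (1.0 + sigma**2)
samples = annealed_langevin(score, rng.standard_normal((2000, 1)),
                            sigmas=[1.0, 0.5, 0.1])
```

With the exact score, the final samples approximate the (slightly noise-broadened) target distribution, which is the DDPM-like "follow the score, add noise" update in action.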
5:45 Diffusion version, infinite noise levels
9:15 The SDE describes how the RVs in the continuous diffusion model (an infinitely finely discretized VAE) are related, and enables sampling
14:25 Reversing the SDE is a change of variables
19:30 Interpolation between two data sets requires gradients wrt t's
19:45? Fokker-Planck equation: the gradient wrt t of the marginal density is completely determined by these objects
20:25 The discretized SDE is equivalent to Langevin dynamics or the sampling procedure of DDPM (follow the gradient and add noise at every step)
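The "follow the gradient and add noise at every step" discretization can be made concrete with an Euler-Maruyama step of the reverse-time SDE. A minimal NumPy sketch, assuming a variance-preserving SDE with constant beta and a toy N(0, 1) data distribution (for which every marginal stays N(0, 1), so the exact time-dependent score is simply -x); a trained score network would replace the lambda.

```python
import numpy as np

rng = np.random.default_rng(0)

def reverse_sde_sample(score_fn, n, beta=1.0, T=1.0, steps=500):
    """Euler-Maruyama discretization of the reverse-time VP SDE:
    each step moves along the drift corrected by the score ("follow
    the gradient") and adds fresh Gaussian noise, the DDPM-style
    update."""
    dt = T / steps
    x = rng.standard_normal((n, 1))      # sample from the prior
    for k in range(steps):
        t = T - k * dt
        drift = -0.5 * beta * x          # forward drift f(x, t)
        x = (x - (drift - beta * score_fn(x, t)) * dt
               + np.sqrt(beta * dt) * rng.standard_normal(x.shape))
    return x

# Toy: N(0, 1) data under the VP SDE has score -x at every time t.
samples = reverse_sde_sample(lambda x, t: -x, n=4000)
```

Running the chain from the Gaussian prior back to t = 0 recovers samples whose statistics match the toy data distribution.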
21:40 Get generative model by learning the score functions (of reverse SDE), score functions parameterized by neural networks (theta)
21:55? Same as DDPM 1000 steps
23:15? Equivalence of Langevin dynamics, DDPM, and diffusion generative modelling
24:40? DDPM, SDE numerical predictor is a Taylor expansion
25:40? Score based MCMC corrector uses Langevin dynamics to generate a sample at the corresponding density
27:15? Score based model uses corrector without predictor, DDPM uses predictor without corrector
27:50 The decoder is trying to invert the encoder and is defined as Gaussian (only the continuous-time limit of infinitely many steps yields a tight ELBO under Gaussian decoders)
29:05? Predictor takes one step, corrector uses Langevin to generate a sample
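The predictor-corrector scheme in the notes above can be sketched by combining the two pieces: one numerical reverse-SDE step (predictor), then a few Langevin steps at the same noise level (corrector). A toy NumPy version under the same assumed VP-SDE/N(0, 1) setup as before; `snr` and `corrector_steps` are illustrative hyperparameters, not the lecture's values.

```python
import numpy as np

rng = np.random.default_rng(1)

def pc_sample(score_fn, n, beta=1.0, T=1.0, steps=200,
              corrector_steps=2, snr=0.1):
    """Predictor-corrector sampling: each iteration takes one reverse
    SDE step (predictor), then a few Langevin MCMC steps at the same
    time level (corrector) to pull samples back toward p_t."""
    dt = T / steps
    x = rng.standard_normal((n, 1))              # prior sample
    for k in range(steps):
        t = T - k * dt
        # Predictor: Euler-Maruyama step of the reverse VP SDE.
        drift = -0.5 * beta * x
        x = (x - (drift - beta * score_fn(x, t)) * dt
               + np.sqrt(beta * dt) * rng.standard_normal(x.shape))
        # Corrector: Langevin dynamics targeting the density at t - dt.
        eps = snr * dt
        for _ in range(corrector_steps):
            x = (x + eps * score_fn(x, t - dt)
                   + np.sqrt(2 * eps) * rng.standard_normal(x.shape))
    return x

# Same toy as before: N(0, 1) data gives score -x at all times.
samples = pc_sample(lambda x, t: -x, n=4000)
```

Setting `corrector_steps=0` recovers a DDPM-style predictor-only sampler; skipping the predictor recovers annealed-Langevin-style corrector-only sampling, matching the 27:15 note.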
34:50 Neural ODE
35:55 Reparameterizing the randomness into the initial condition and then transforming it deterministically (an equivalent computation graph); in variational inference, backprop through the encoder is a stochastic computation
38:55 ODE formula (an integral) to compute the probability density; conversion to an ODE gives access to fast solving techniques for generating samples
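The probability flow ODE mentioned here shares the SDE's marginals but is integrated deterministically, so no noise is injected and ordinary ODE solvers apply. A minimal Euler sketch under an assumed VP SDE with toy N(0, 2^2) data, for which the marginal variance v(t) = 1 + 3·exp(-t) and the exact score -x / v(t) are available in closed form (a score network would replace them):

```python
import numpy as np

rng = np.random.default_rng(0)

def prob_flow_ode_sample(score_fn, x_T, beta=1.0, T=1.0, steps=500):
    """Probability flow ODE: same marginals as the reverse SDE, but
    sampling is a deterministic Euler integration backward in time;
    no noise is added at any step."""
    dt = T / steps
    x = x_T
    for k in range(steps):
        t = T - k * dt
        # ODE drift: f(x, t) - (1/2) g(t)^2 * score(x, t)
        drift = -0.5 * beta * x - 0.5 * beta * score_fn(x, t)
        x = x - drift * dt               # step backward in time
    return x

# Toy: data ~ N(0, 4).  Under the VP SDE the marginal variance is
# v(t) = 1 + 3 exp(-t), so the exact score is -x / v(t).
v = lambda t: 1.0 + 3.0 * np.exp(-t)
score = lambda x, t: -x / v(t)
x_T = np.sqrt(v(1.0)) * rng.standard_normal((4000, 1))  # sample p_T
x0 = prob_flow_ode_sample(score, x_T)
```

Because the flow is deterministic, each prior sample maps to a unique data sample, which is also what makes likelihood computation via the change-of-variables integral possible.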
40:10 DDPM as VAE with fixed encoder and same dimension, latent diffusion which first learns a VAE to map data to a lower dimensional space and then learns a diffusion model over that latent space
44:50? Compounding errors in the denoiser but not in the SDE
46:30 Maximum likelihood would require differentiating through the ODE solver, which is very difficult and expensive
49:35 Scores and marginals are equivalent (SDE and ODE models) and are always learned by score matching; only the inference-time sample generation differs
58:40 Stable Diffusion uses a pretrained autoencoder, not trained end to end: only care about reconstruction (disregarding whether the latent distribution is close to a Gaussian) and getting a good autoencoder; then keep the autoencoder fixed and train the diffusion model over the latent space
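The two-stage latent-diffusion recipe from the 40:10 and 58:40 notes can be sketched end to end. This is only a structural toy: a linear PCA "autoencoder" stands in for the real convolutional one, and a fitted Gaussian (whose score is available in closed form) stands in for the latent diffusion model; only the train-reconstruction-first, freeze-then-model-latents pipeline is the point.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stage 1: train an autoencoder purely for reconstruction (here PCA).
# Toy data living on an 8-dimensional subspace of a 32-dim space.
X = rng.standard_normal((1000, 8)) @ rng.standard_normal((8, 32))
mu = X.mean(0)
U, S, Vt = np.linalg.svd(X - mu, full_matrices=False)
W = Vt[:8].T                          # encoder/decoder weights
encode = lambda x: (x - mu) @ W
decode = lambda z: z @ W.T + mu

# Stage 2: freeze the autoencoder; fit the generative model in latent
# space (a Gaussian here, where a diffusion model would go).
Z = encode(X)
latent_cov = np.cov(Z.T)
score = lambda z: -z @ np.linalg.inv(latent_cov)   # latent-space score

# Sampling: draw latents from the latent model, then decode.
z = rng.multivariate_normal(np.zeros(8), latent_cov, size=5)
x_gen = decode(z)
recon_err = np.mean((decode(encode(X)) - X) ** 2)
```

The autoencoder is judged only by `recon_err`; nothing forces the latent distribution toward a Gaussian, which is exactly why a separate generative model over the latents is trained afterward.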
1:08:35 Score of prior (unconditional score), likelihood (forward model/classifier) and normalization constant
1:09:55 Solve the SDE or ODE following the gradient of the prior plus the likelihood (controlled sampling); Langevin increases the likelihood of the image wrt the prior while making sure the classifier predicts that class (changing the drift pushes the samples toward specific classifications)
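The classifier-guidance decomposition in these two notes is just Bayes' rule under the gradient: the normalization constant p(y) drops out. A tiny 1-D check with an assumed Gaussian prior and Gaussian likelihood, where the posterior score is known exactly:

```python
def guided_score(prior_score, log_lik_grad, x, y):
    """Classifier guidance: grad_x log p(x|y) =
    grad_x log p(x) + grad_x log p(y|x); the intractable
    normalization constant log p(y) vanishes under grad_x."""
    return prior_score(x) + log_lik_grad(x, y)

# Toy: prior x ~ N(0, 1), likelihood y | x ~ N(x, 1).  The exact
# posterior is N(y/2, 1/2), whose score is -2x + y, matching the
# sum of the two gradients below.
s = guided_score(lambda x: -x,          # prior score of N(0, 1)
                 lambda x, y: y - x,    # grad_x log N(y; x, 1)
                 x=0.5, y=2.0)
# s = -0.5 + 1.5 = 1.0, equal to the exact posterior score -2(0.5) + 2
```

Plugging this guided score into the SDE/ODE drift is what pushes the samples toward the desired class during sampling.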
1:12:35 Classifier-free guidance: train two diffusion models, one for the conditional and one for the unconditional score, and take the difference
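The classifier-free guidance combination can be written in one line. A toy 1-D sketch with closed-form Gaussian scores standing in for the two trained diffusion models; the guidance weight `w` is an illustrative choice (w = 0 gives the unconditional model, w = 1 the conditional one, w > 1 extrapolates).

```python
def cfg_score(uncond_score, cond_score, x, y, w=3.0):
    """Classifier-free guidance: extrapolate along the difference
    between the conditional and unconditional scores."""
    s_u = uncond_score(x)
    s_c = cond_score(x, y)
    return s_u + w * (s_c - s_u)

# Toy: unconditional score of N(0, 1), conditional score of N(y, 1).
s_u = lambda x: -x
s_c = lambda x, y: -(x - y)
g = cfg_score(s_u, s_c, x=0.0, y=2.0, w=3.0)   # g = 0 + 3 * 2 = 6.0
```

With w = 3 the guided score at x = 0 behaves like the score of N(6, 1): the conditioning signal is amplified, which is the over-emphasis on the condition that classifier-free guidance is used for.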
Do we have lec 17 Discrete Latent Variable Models recording?