Stanford CS236: Deep Generative Models I 2023 I Lecture 17 - Discrete Latent Variable Models

  • Published 5 May 2024
  • For more information about Stanford's Artificial Intelligence programs visit: stanford.io/ai
    To follow along with the course, visit the course website:
    deepgenerativemodels.github.io/
    Stefano Ermon
    Associate Professor of Computer Science, Stanford University
    cs.stanford.edu/~ermon/
    Learn more about the online course and how to enroll: online.stanford.edu/courses/c...
    To view all online courses and programs offered by Stanford, visit: online.stanford.edu/

COMMENTS • 4

  • @CPTSMONSTER
    @CPTSMONSTER 12 days ago

    0:40 Close connection between score based models and DDPMs (denoising diffusion probabilistic models)
    1:00 A score-based model goes from noise to data by running Langevin dynamics chains. From the VAE perspective (DDPM): a fixed encoder (SDE) adds noise (Gaussian transition kernel) at each time step; the decoder is a joint distribution over the same RVs, parameterized in the reverse direction; the sequence of decoders is also Gaussian and parameterized by neural networks (the simple DDPM formula); train in the usual way (as in a VAE) by optimizing the evidence lower bound (minimizing a KL divergence). The ELBO is equivalent to a sum of denoising score matching objectives (learning the optimal decoder requires estimating the score of the noise-perturbed data density), so optimizing the ELBO corresponds to learning a sequence of denoisers (noise-conditional score-based models); see the denoising score matching sketch after these notes
    4:55 The means of the decoders at optimality correspond to the score functions; the updates performed in DDPM sampling are very similar to annealed Langevin dynamics (see the Langevin sketch after these notes)
    5:45 Diffusion version, infinite noise levels
    9:15 SDE describes how the RVs in the continuous diffusion model (or fine discreteness of VAE) are related, enables sampling
    14:25 Reversing the SDE is a change of variables
    19:30 Interpolation between two data sets requires gradients with respect to the t's
    19:45? Fokker-Planck equation: the gradient with respect to t is completely determined by these objects
    20:25 Discretizing the SDE is equivalent to Langevin dynamics or to the DDPM sampling procedure (follow the gradient and add noise at every step)
    21:40 Get generative model by learning the score functions (of reverse SDE), score functions parameterized by neural networks (theta)
    21:55? Same as DDPM 1000 steps
    23:15? Equivalence of Langevin, DDPM and diffusion generative modelling
    24:40? DDPM / SDE numerical predictor is a Taylor expansion
    25:40? A score-based MCMC corrector uses Langevin dynamics to generate a sample from the corresponding density
    27:15? A score-based model uses the corrector without the predictor; DDPM uses the predictor without the corrector (see the predictor-corrector sketch after these notes)
    27:50 The decoder is trying to invert the encoder and is defined as Gaussian (only in the limit of continuous time, after infinitely many steps, is the ELBO tight assuming Gaussian decoders)
    29:05? Predictor takes one step, corrector uses Langevin to generate a sample
    34:50 Neural ODE
    35:55 Reparameterizing the randomness into the initial condition and then transforming it deterministically (an equivalent computation graph); in variational inference, backprop through the encoder is a stochastic computation
    38:55 ODE formula (an integral) to compute the probability density; converting to an ODE gives access to solver techniques that generate samples fast (see the probability-flow ODE sketch after these notes)
    40:10 DDPM as VAE with fixed encoder and same dimension, latent diffusion which first learns a VAE to map data to a lower dimensional space and then learns a diffusion model over that latent space
    44:50? Compounding errors in denoiser but not SDE
    46:30 Maximum likelihood would require differentiating through the ODE solver, which is very difficult and expensive
    49:35 The scores and marginals are equivalent (SDE and ODE models) and are always learned by score matching; at inference time samples are generated differently
    58:40 Stable Diffusion uses a pretrained autoencoder, not trained end to end; the only concern is reconstruction (disregarding whether the latent distribution is close to a Gaussian) and getting a good autoencoder; keep the initial autoencoder fixed and train the diffusion model over the latent space
    1:08:35 Score of prior (unconditional score), likelihood (forward model/classifier) and normalization constant
    1:09:55 Solve the SDE or ODE and follow the gradient of the prior plus the likelihood (controlled sampling); Langevin increases the likelihood of the image with respect to the prior and makes sure the classifier predicts that image's class (changing the drift to push the samples towards specific classifications)
    1:12:35 Classifier-free guidance: train two diffusion models, for the conditional and unconditional scores, and take the difference (see the guidance sketch after these notes)
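
Denoising score matching sketch (re: the 1:00 note). A minimal, hedged illustration of a single-noise-level denoising score matching objective, the quantity the ELBO decomposes into; `score_model` is a toy placeholder standing in for a trained noise-conditional network, and the "dataset" is a standard Gaussian, so none of the names below come from the lecture.

```python
# Minimal sketch of a denoising score matching objective at one noise level sigma.
# Assumptions: toy Gaussian data, placeholder score_model instead of a trained network.
import numpy as np

rng = np.random.default_rng(0)

def score_model(x_tilde, sigma):
    # Placeholder "model": the exact score if the data were N(0, I), so the
    # sigma-perturbed density is N(0, (1 + sigma^2) I).
    return -x_tilde / (1.0 + sigma**2)

def dsm_loss(x, sigma):
    """Monte Carlo estimate of E_z || s(x + sigma*z, sigma) - (-z/sigma) ||^2.

    The regression target -z/sigma is the score of the Gaussian perturbation
    kernel q_sigma(x_tilde | x), so minimizing this loss recovers the score of
    the noise-perturbed data density (the 'optimal denoiser' view in the notes)."""
    z = rng.standard_normal(x.shape)
    x_tilde = x + sigma * z
    target = -z / sigma
    residual = score_model(x_tilde, sigma) - target
    return np.mean(np.sum(residual**2, axis=-1))

if __name__ == "__main__":
    x = rng.standard_normal((1024, 2))        # toy "dataset"
    for sigma in [0.1, 0.5, 1.0]:
        print(sigma, dsm_loss(x, sigma))
```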
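
Langevin sketch (re: the 1:00 and 4:55 notes). A minimal sketch of annealed Langevin dynamics sampling, assuming a hypothetical `score(x, sigma)` function in place of a trained noise-conditional score network; the noise schedule and step-size heuristic are illustrative choices, not taken from the lecture.

```python
# Minimal sketch of annealed Langevin dynamics: run Langevin chains at a sequence
# of decreasing noise levels, following the estimated score and adding noise.
import numpy as np

def score(x, sigma):
    # Placeholder score: exact for data ~ N(0, sigma^2 I); a real model would be
    # a trained noise-conditional neural network.
    return -x / sigma**2

def annealed_langevin(x, sigmas, steps_per_level=100, eps=1e-4, rng=None):
    """Langevin dynamics at each noise level, from largest to smallest sigma."""
    rng = np.random.default_rng() if rng is None else rng
    for sigma in sigmas:                        # anneal: high noise -> low noise
        step = eps * (sigma / sigmas[-1])**2    # common heuristic: scale step by sigma^2
        for _ in range(steps_per_level):
            z = rng.standard_normal(x.shape)
            # follow the gradient of log-density and add noise at every step
            x = x + step * score(x, sigma) + np.sqrt(2 * step) * z
    return x

if __name__ == "__main__":
    sigmas = np.geomspace(10.0, 0.01, num=10)                     # decreasing schedule
    x0 = np.random.default_rng(0).standard_normal((5, 2)) * sigmas[0]
    print(annealed_langevin(x0, sigmas))                          # concentrates near the mode
```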
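
Predictor-corrector sketch (re: the 24:40-29:05 notes). A minimal illustration of alternating a numerical reverse-SDE (predictor) step with Langevin (corrector) steps, written for a variance-exploding SDE with the same placeholder score as above; the schedule and step sizes are assumptions for the toy setting.

```python
# Minimal predictor-corrector sampler for a variance-exploding SDE.
# Assumptions: placeholder score for toy Gaussian data, illustrative step sizes.
import numpy as np

rng = np.random.default_rng(0)

def score(x, sigma):
    # Placeholder: exact score when the data distribution is N(0, I), so the
    # sigma-perturbed density is N(0, (1 + sigma^2) I).
    return -x / (1.0 + sigma**2)

def pc_sampler(shape, sigmas, corrector_steps=1, snr=0.16):
    """Alternate a reverse-SDE Euler-Maruyama (predictor) step with Langevin (corrector) steps."""
    x = rng.standard_normal(shape) * sigmas[0]            # start from the broad-noise prior
    for i in range(len(sigmas) - 1):
        sig, sig_next = sigmas[i], sigmas[i + 1]
        # Predictor: one Euler-Maruyama step of the reverse-time VE SDE
        dvar = sig**2 - sig_next**2                        # variance removed in this step
        x = x + dvar * score(x, sig) + np.sqrt(dvar) * rng.standard_normal(shape)
        # Corrector: a few Langevin steps at the new noise level
        for _ in range(corrector_steps):
            eps = 2 * (snr * sig_next)**2                  # heuristic step size
            x = x + eps * score(x, sig_next) + np.sqrt(2 * eps) * rng.standard_normal(shape)
    return x

if __name__ == "__main__":
    sigmas = np.geomspace(10.0, 0.01, num=100)
    print(pc_sampler((4, 2), sigmas))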
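
Probability-flow ODE sketch (re: the 34:50-38:55 notes). A minimal illustration of replacing the stochastic reverse process with the deterministic ODE whose marginals match: randomness lives only in the initial condition, and a plain Euler solver maps noise to data. Same toy placeholder score and variance-exploding setting as above.

```python
# Minimal probability-flow ODE sampler: no noise is injected after the initial draw.
# Assumptions: placeholder score for toy Gaussian data, VE noise schedule, Euler steps.
import numpy as np

rng = np.random.default_rng(0)

def score(x, sigma):
    # Placeholder score, as in the sketches above.
    return -x / (1.0 + sigma**2)

def probability_flow_sample(shape, sigmas):
    """Euler-integrate dx = -1/2 * d(sigma^2) * score(x, sigma) from high to low noise."""
    x = rng.standard_normal(shape) * sigmas[0]       # randomness only in the initial condition
    for i in range(len(sigmas) - 1):
        dvar = sigmas[i]**2 - sigmas[i + 1]**2       # variance removed over this step
        x = x + 0.5 * dvar * score(x, sigmas[i])     # pure ODE step: deterministic
    return x

if __name__ == "__main__":
    sigmas = np.geomspace(10.0, 0.01, num=500)
    print(probability_flow_sample((4, 2), sigmas))
```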
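
Guidance sketch (re: the 1:08:35-1:12:35 notes). A minimal illustration of how classifier guidance and classifier-free guidance modify the score used by the sampler; `uncond_score`, `cond_score`, and `classifier_grad` are hypothetical callables standing in for trained networks, and the toy distributions below are assumptions chosen so the math is easy to check.

```python
# Minimal sketch of classifier guidance and classifier-free guidance at the score level.
# Assumptions: placeholder callables instead of trained conditional/unconditional models.
import numpy as np

def classifier_guided_score(x, y, uncond_score, classifier_grad, scale=1.0):
    # Bayes' rule in score form: grad log p(x|y) = grad log p(x) + grad log p(y|x);
    # the normalization constant p(y) drops out when differentiating w.r.t. x.
    return uncond_score(x) + scale * classifier_grad(x, y)

def classifier_free_guided_score(x, y, cond_score, uncond_score, w=2.0):
    # Extrapolate from the unconditional score toward the conditional one:
    # s = s_uncond + w * (s_cond - s_uncond); w = 1 recovers the plain conditional score.
    return uncond_score(x) + w * (cond_score(x, y) - uncond_score(x))

if __name__ == "__main__":
    # Toy placeholders: unconditional data ~ N(0, I); conditioning on y shifts the mean to y.
    uncond = lambda x: -x
    cond = lambda x, y: -(x - y)
    clf_grad = lambda x, y: cond(x, y) - uncond(x)   # grad log p(y|x) = grad log p(x|y) - grad log p(x)
    x = np.zeros(2)
    y = np.array([3.0, 0.0])
    print(classifier_guided_score(x, y, uncond, clf_grad))
    print(classifier_free_guided_score(x, y, cond, uncond, w=2.0))
```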

  • @zhuolin730
    @zhuolin730 12 days ago

    Do we have lec 17 Discrete Latent Variable Models recording?

  • @XEQUTE
    @XEQUTE 1 month ago

    lol 2nd
    Hope this helps in my kaggle competition

  • @ayxxnshxrif
    @ayxxnshxrif 1 month ago

    1st