Stanford CS236: Deep Generative Models I 2023 I Lecture 6 - VAEs

  • Published 5 May 2024
  • For more information about Stanford's Artificial Intelligence programs visit: stanford.io/ai
    To follow along with the course, visit the course website:
    deepgenerativemodels.github.io/
    Stefano Ermon
    Associate Professor of Computer Science, Stanford University
    cs.stanford.edu/~ermon/
    Learn more about the online course and how to enroll: online.stanford.edu/courses/c...
    To view all online courses and programs offered by Stanford, visit: online.stanford.edu/

COMMENTS • 1

  • @CPTSMONSTER • 9 days ago

    2:30 Summary. Latent variables are Gaussian with mean 0 (a standard normal prior). The conditional p(x|z) is also Gaussian, with mean and standard deviation given by two neural networks applied to each z. The resulting marginal p(x) is a complex infinite mixture of Gaussians (sketch below).
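    A minimal PyTorch sketch of this generative side (layer sizes and names are illustrative choices, not the lecture's code):

    ```python
    import torch
    import torch.nn as nn

    class Decoder(nn.Module):
        # Two heads map each latent z to the mean and standard deviation of p(x|z).
        def __init__(self, z_dim=2, x_dim=784, hidden=256):
            super().__init__()
            self.shared = nn.Sequential(nn.Linear(z_dim, hidden), nn.ReLU())
            self.mu = nn.Linear(hidden, x_dim)         # mu_theta(z)
            self.log_sigma = nn.Linear(hidden, x_dim)  # log sigma_theta(z)

        def forward(self, z):
            h = self.shared(z)
            return self.mu(h), self.log_sigma(h).exp()

    decoder = Decoder()
    z = torch.randn(16, 2)   # z ~ N(0, I): the simple Gaussian prior with mean 0
    mu, sigma = decoder(z)   # p(x|z) = N(mu_theta(z), diag(sigma_theta(z)^2))
    # Marginalizing over z, p(x) = ∫ p(z) p(x|z) dz: a mixture of infinitely many Gaussians.
    ```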
    4:15 Different from autoregressive models, where it was trivial to multiply conditionals to get likelihoods
    13:20 ELBO derived from the KL divergence; the bound is tight (equality) when q(z) is the true posterior p(z|x)
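    The identity being used there, with equality exactly when q(z) = p_theta(z|x):

    ```latex
    \log p_\theta(x)
      = \mathbb{E}_{q(z)}\!\left[\log \frac{p_\theta(x, z)}{q(z)}\right]
      + \mathrm{KL}\!\left(q(z)\,\|\,p_\theta(z \mid x)\right)
      \;\ge\;
      \mathbb{E}_{q(z)}\!\left[\log \frac{p_\theta(x, z)}{q(z)}\right]
      = \mathrm{ELBO}
    ```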
    15:45 Too expensive to compute posterior, possibly model with neural network
    17:45 Joint probability is generative model (simple Gaussian prior, map to mean and sd parameters modelled by two neural networks)
    28:00 Jointly optimize theta and phi to minimize the KL divergence
    31:15 Approximation of log-likelihood (see summary explanation in lecture 5), final equations
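    One way to write the sample-based approximation referenced here, with samples drawn from q_phi:

    ```latex
    \mathcal{L}(x; \theta, \phi)
      \approx \frac{1}{K} \sum_{k=1}^{K}
      \left[ \log p_\theta\!\left(x, z^{(k)}\right) - \log q_\phi\!\left(z^{(k)}\right) \right],
      \qquad z^{(k)} \sim q_\phi(z)
    ```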
    32:10 For generation, theta parameter is sufficient, phi discarded
    39:00? EM: hold theta constant and optimize phi? Not joint optimization
    44:40 Single theta and different variational parameters phi for each data sample
    48:00 VAE training steps shown for illustrative purposes; in practice, theta and phi are trained in sync (see the sketch below)
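    A self-contained sketch of such a synchronized update (architecture, data, and hyperparameters are placeholders, not the lecture's; it already uses the amortized encoder and reparameterization trick covered at 50:40 and 1:03:10 below):

    ```python
    import torch
    import torch.nn as nn

    x_dim, z_dim = 20, 2
    data = torch.randn(512, x_dim)   # stand-in dataset

    encoder = nn.Sequential(nn.Linear(x_dim, 64), nn.ReLU(), nn.Linear(64, 2 * z_dim))  # phi
    decoder = nn.Sequential(nn.Linear(z_dim, 64), nn.ReLU(), nn.Linear(64, x_dim))      # theta
    opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

    for epoch in range(5):
        for i in range(0, len(data), 64):
            x = data[i:i + 64]
            mu_q, log_sigma_q = encoder(x).chunk(2, dim=-1)   # q_phi(z|x)
            eps = torch.randn_like(mu_q)
            z = mu_q + log_sigma_q.exp() * eps                # reparameterized sample of z
            x_hat = decoder(z)                                # mean of p_theta(x|z), unit variance assumed
            recon = -0.5 * ((x - x_hat) ** 2).sum(dim=-1)     # log p_theta(x|z) up to a constant
            kl = 0.5 * (mu_q ** 2 + (2 * log_sigma_q).exp() - 2 * log_sigma_q - 1).sum(dim=-1)
            loss = -(recon - kl).mean()                       # negative ELBO
            opt.zero_grad(); loss.backward(); opt.step()      # theta and phi updated in sync
    ```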
    50:40 Amortized inference: a single q network; keeping different variational parameters for each data point is not scalable (though more accurate, since per-point parameters are less constrained)
    58:50? Sampling from a distribution that depends on phi; the samples themselves would change when phi is changed, so the gradient cannot simply be pushed through the sampling step
    1:03:10? Reparameterization trick: sample epsilon from a fixed distribution that does not depend on phi, and write z as a deterministic function of epsilon and phi, so gradients with respect to phi pass through (sketch below)
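    A tiny example of the trick (names are illustrative): epsilon comes from a fixed N(0, I) that does not involve phi, and z = mu + sigma * epsilon is a deterministic, differentiable function of phi = (mu, log_sigma), so gradients reach phi.

    ```python
    import torch

    mu = torch.tensor([0.5, -1.0], requires_grad=True)        # part of phi
    log_sigma = torch.tensor([0.0, 0.0], requires_grad=True)   # part of phi

    eps = torch.randn(1000, 2)                # epsilon ~ N(0, I); sampling does not involve phi
    z = mu + log_sigma.exp() * eps            # z ~ N(mu, sigma^2), differentiable in phi

    loss = (z ** 2).sum(dim=1).mean()         # stand-in for the quantity inside the expectation
    loss.backward()
    print(mu.grad, log_sigma.grad)            # gradients flow back through the sampling step
    ```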
    1:07:45 Reparameterization trick is possible when the sampling procedure can be written as a deterministic transformation of a basic rv that can be sampled from. For discrete (e.g. categorical) rvs, it is possible to sample by inverting the CDF, but there is no way to get gradients through that step. Use REINFORCE, or other methods that relax the optimization problem (sketch below).
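    A minimal sketch of the score-function (REINFORCE) estimator for a categorical latent, where reparameterization is not directly available (the distribution and the function f are illustrative assumptions, not the lecture's example):

    ```python
    import torch

    logits = torch.zeros(3, requires_grad=True)        # phi: parameters of q_phi(z)
    q = torch.distributions.Categorical(logits=logits)

    def f(z):                                          # any function of the discrete sample
        return (z == 2).float()                        # e.g. indicator that category 2 was drawn

    z = q.sample((1000,))                              # the sampling step itself is not differentiable
    # Surrogate whose gradient estimates grad_phi E_q[f(z)] = E_q[f(z) * grad_phi log q_phi(z)]
    surrogate = (f(z).detach() * q.log_prob(z)).mean()
    surrogate.backward()
    print(logits.grad)                                 # unbiased (but noisy) gradient estimate
    ```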
    1:10:00 A separate variational parameter vector for each data point is expensive. Amortization: the encoder of the VAE, with neural-network parameters denoted lambda, performs a regression that outputs the posterior parameters for each data point, so individual phi^i no longer have to be stored or optimized. The benefit: for a new data point, the optimization problem does not have to be re-solved for new variational parameters (sketch below).
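    A sketch of such an encoder (sizes and names are illustrative); the training loop at 48:00 above already uses one. A new data point is simply pushed through it, with no new optimization:

    ```python
    import torch
    import torch.nn as nn

    class Encoder(nn.Module):
        # One network (its weights play the role of lambda) maps any x to its posterior parameters.
        def __init__(self, x_dim=784, z_dim=2, hidden=256):
            super().__init__()
            self.shared = nn.Sequential(nn.Linear(x_dim, hidden), nn.ReLU())
            self.mu = nn.Linear(hidden, z_dim)          # mean of q(z|x)
            self.log_sigma = nn.Linear(hidden, z_dim)   # log std of q(z|x)

        def forward(self, x):
            h = self.shared(x)
            return self.mu(h), self.log_sigma(h)

    encoder = Encoder()
    x_new = torch.rand(1, 784)              # a previously unseen data point
    mu_q, log_sigma_q = encoder(x_new)      # its variational posterior, without solving a new problem
    ```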
    1:17:35 Notation q(z; phi^i) to q_phi(z|x)
    1:21:40 The encoder is the variational posterior. Encoder and decoder jointly optimize the ELBO, a regularized form of an autoencoding objective (written out below).
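    The decomposition behind that reading of the objective: a reconstruction (autoencoding) term plus a KL term regularizing q_phi(z|x) toward the prior.

    ```latex
    \mathcal{L}(x; \theta, \phi)
      = \underbrace{\mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]}_{\text{reconstruction}}
      \; - \;
      \underbrace{\mathrm{KL}\!\left(q_\phi(z \mid x)\,\|\,p(z)\right)}_{\text{regularization}}
    ```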