Stanford CS236: Deep Generative Models I 2023 I Lecture 12 - Energy Based Models

  • Published 5 May 2024
  • For more information about Stanford's Artificial Intelligence programs visit: stanford.io/ai
    To follow along with the course, visit the course website:
    deepgenerativemodels.github.io/
    Stefano Ermon
    Associate Professor of Computer Science, Stanford University
    cs.stanford.edu/~ermon/
    Learn more about the online course and how to enroll: online.stanford.edu/courses/c...
    To view all online courses and programs offered by Stanford, visit: online.stanford.edu/

COMMENTS • 1

  • @CPTSMONSTER • 1 day ago • +1

    5:40 Contrastive divergence: the gradient of the log partition function wrt theta is easy to estimate if samples from the model can be accessed (see the sketch after these notes)
    8:25 Training energy-based models by maximum likelihood is feasible to the extent that samples can be generated from the model, e.g. via MCMC
    14:00? MCMC methods, detailed balance condition
    22:00? log x=x' term
    23:25? Comparing the likelihoods of two points is easy for EBMs, since the partition function cancels in the ratio
    24:15 Very expensive to train EBMs this way: every training data point requires a sample generated from the model, and generating a sample involves Langevin MCMC with on the order of 1000 steps
    37:30 Relationship between KL and Fisher divergence: convolve both densities with Gaussian noise; the derivative of their KL divergence wrt the noise level is the Fisher divergence (written out after these notes)
    38:40 Score matching; the score here is the gradient of the log-density wrt x (x must be continuous), not wrt theta
    47:10 Score matching derivation: integration by parts makes the objective independent of (the score of) p_data
    51:15? Equivalent to Fisher divergence
    52:35 Interpretation of the loss function: the first term (squared norm of the score) makes data points stationary points of the model log-likelihood (local minima or maxima), so small perturbations of a data point should not change the log-likelihood by a lot; the second term (trace of the Hessian) makes data points local maxima rather than minima (see the sketch after these notes)
    55:30? Backprop n times to calculate Hessian
    56:20 Proved equivalence to the Fisher divergence; with infinite data the minimizer would recover the exact data distribution
    57:45 Fitting an EBM with a similar flavor to GANs: instead of contrasting data with samples from the model, contrast data with noise
    1:00:10 Instead of letting the discriminator be an arbitrary neural network, define it with the same form as the optimal discriminator: rather than feeding x into an arbitrary network, evaluate the likelihoods of x under the model p_theta and under the noise distribution. The optimal p_theta must then match p_data, because of the pre-defined form of the discriminator. Parameterize p_theta with an EBM. (In a GAN, the discriminator itself would be parameterized by a neural network.) See the NCE sketch after these notes.
    1:03:00? Classifiers in noise correction
    1:11:30 The NCE loss does not require sampling from the model during training, but drawing samples from the learned EBM still requires Langevin MCMC steps
    1:19:00 GAN vs NCE: in a GAN the generator is trained, while in NCE the noise distribution is fixed, but its likelihood must be easy to evaluate
    1:22:20 Noise contrastive estimation where the noise distribution is a flow that is learned adversarially
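
    A rough PyTorch sketch of the maximum likelihood / contrastive divergence update from 5:40-24:15. This is my own illustration, not code from the lecture; the energy function f, the step size, and the number of Langevin steps are assumptions.

        import torch

        def langevin_sample(f, x_init, n_steps=1000, step_size=0.01):
            # Langevin MCMC on p_theta(x) proportional to exp(f(x)):
            # x_{t+1} = x_t + step * grad_x f(x_t) + sqrt(2 * step) * noise
            x = x_init.clone().detach()
            for _ in range(n_steps):
                x.requires_grad_(True)
                grad = torch.autograd.grad(f(x).sum(), x)[0]
                x = (x + step_size * grad
                     + (2 * step_size) ** 0.5 * torch.randn_like(x)).detach()
            return x

        def cd_loss(f, x_data, x_init):
            # grad_theta log-likelihood = grad_theta f(x_data) - E_model[grad_theta f(x)],
            # so gradient descent on f(model samples) - f(data) ascends the likelihood.
            x_model = langevin_sample(f, x_init)
            return f(x_model).mean() - f(x_data).mean()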
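
    The 37:30 relationship between KL and Fisher divergence, written out from memory (take the exact constant with a grain of salt): with p_t and q_t the two densities convolved with Gaussian noise of variance t,

        D_F(p \| q) = \mathbb{E}_{x \sim p}\left[ \| \nabla_x \log p(x) - \nabla_x \log q(x) \|^2 \right],
        \qquad
        \frac{d}{dt} D_{KL}(p_t \| q_t) = -\tfrac{1}{2} D_F(p_t \| q_t).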
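
    A rough PyTorch sketch of the score matching objective from 47:10-56:20, computing the Hessian trace with one extra backprop per input dimension as mentioned at 55:30. Again my own illustration; the energy function f and the looped trace computation are assumptions about a generic implementation.

        import torch

        def score_matching_loss(f, x):
            # Hyvarinen score matching for log p_theta(x) = f(x) - log Z(theta):
            # E_data[ 0.5 * ||grad_x f(x)||^2 + trace(Hessian_x f(x)) ]
            x = x.clone().detach().requires_grad_(True)
            score = torch.autograd.grad(f(x).sum(), x, create_graph=True)[0]
            trace = torch.zeros(x.shape[0])
            for i in range(x.shape[1]):  # one extra backprop per dimension
                trace = trace + torch.autograd.grad(score[:, i].sum(), x,
                                                    create_graph=True)[0][:, i]
            return (0.5 * score.pow(2).sum(dim=1) + trace).mean()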
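
    A rough PyTorch sketch of the NCE setup from 57:45-1:11:30: the discriminator is fixed to the optimal form D(x) = p_theta(x) / (p_theta(x) + p_noise(x)), with p_theta an EBM whose log partition function is treated as a free parameter. My own illustration; the learnable log_z parameter and the choice of noise_dist are assumptions.

        import torch
        import torch.nn.functional as F

        def nce_loss(f, log_z, x_data, noise_dist):
            # Classifier logit = log p_theta(x) - log p_noise(x),
            # where log p_theta(x) = f(x) - log_z.
            x_noise = noise_dist.sample((x_data.shape[0],))
            logit = lambda x: f(x) - log_z - noise_dist.log_prob(x)
            ones = torch.ones(x_data.shape[0])
            zeros = torch.zeros(x_data.shape[0])
            return (F.binary_cross_entropy_with_logits(logit(x_data), ones)
                    + F.binary_cross_entropy_with_logits(logit(x_noise), zeros))

    Here noise_dist could be, for example, a torch.distributions.MultivariateNormal fit to the data statistics; training minimizes the loss over the parameters of f together with log_z.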