Stanford CS236: Deep Generative Models I 2023 I Lecture 13 - Score Based Models

  • Published 24 Nov 2024

COMMENTS • 5

  • @客家饒舌執牛耳 4 months ago +5

    Extremely great explanation, highly recommend Professor Ermon's courses!

  • @CPTSLEARNER 6 months ago +1

    2:00 Summary
    8:00 EBM training: maximum likelihood requires estimating the partition function; contrastive divergence requires samples to be generated (MCMC Langevin with 1000 steps); instead, minimize the Fisher divergence (score matching) rather than the KL divergence (the identity is written out after this list)
    19:15 EBMs parameterize a conservative vector field (gradients of an underlying scalar function); score-based models generalize this to an arbitrary vector field. EBMs directly model the log-likelihood, score-based models directly model the score (no constraint that it be a gradient field).
    29:15? Backprops in EBM, f_theta derivative is s_theta, Jacobian of s_theta
    34:00 Fisher divergence between model and perturbed data
    36:25 Noisy data q to remove trace of Jacobian in calculations
    39:30 Linearity of gradient
    42:15 Estimating the score of the noise perturbed data density is equivalent to estimating the score of the transition kernel (Gaussian noise density)
    44:10 Trace of Jacobian removed from estimation, loss function is a denoising objective
    46:05 Sigma of noise distribution as small as possible
    48:55? Stein's unbiased risk estimator trick: evaluating the quality of an estimator without knowing the ground truth; denoising objective
    51:55 Denoising score matching: the two objectives are equivalent up to a constant, so minimizing the bottom (denoising) objective is equivalent to minimizing the top objective, which estimates the score of the distribution convolved with Gaussian noise (a minimal training sketch follows this list)
    52:20? Individual conditionals
    53:25 Reduced generative modelling to denoising
    55:35? Tweedie's formula, an alternative derivation of the denoising objective (written out after this list)
    58:25 Interpretation of the equations: the conditional of x given the perturbed x is proportional to their joint density; q_sigma is the integral (marginal) of the joint density; Tweedie's formula expresses the estimate of x in terms of the perturbed x plus an optimal adjustment (the gradient of log q_sigma, which is tied to the density of x conditional on the perturbed x)
    1:01:35? Jacobian vector products, directional derivatives, efficient to estimate using backprop
    1:04:20 Sliced score matching needs a single backprop (a directional derivative); without slicing, d backprops are needed (sketched after this list)
    1:07:00 Sliced score matching not on perturbed data
    1:12:00 Langevin MCMC, sampling with the score (see the sampling sketch after this list)
    1:14:35 Real world data tends to lie on a low dimensional manifold
    1:21:00 Langevin mixes too slowly; the mixture weights disappear when taking gradients
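
    The score matching identity referenced above (8:00, 36:25, 44:10), written out in my own notation rather than transcribed from the slides: the Fisher divergence only involves the model score s_theta, and integration by parts removes the unknown data score at the cost of a trace-of-Jacobian term.

    \[
    D_F(p_{\text{data}} \,\|\, p_\theta)
      = \tfrac{1}{2}\,\mathbb{E}_{p_{\text{data}}}\!\left[\big\|\nabla_x \log p_{\text{data}}(x) - s_\theta(x)\big\|^2\right]
      = \mathbb{E}_{p_{\text{data}}}\!\left[\operatorname{tr}\!\big(\nabla_x s_\theta(x)\big) + \tfrac{1}{2}\big\|s_\theta(x)\big\|^2\right] + \text{const.}
    \]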
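
    A minimal PyTorch sketch of the denoising objective (51:55). The network score_net, its architecture, and the fixed noise level sigma are my own illustrative assumptions, not code from the lecture; the idea is to regress s_theta(x_tilde) onto the score of the Gaussian transition kernel, -(x_tilde - x) / sigma^2.

    import torch
    import torch.nn as nn

    # Hypothetical score network for 2-D data: noisy sample in, estimated score out.
    score_net = nn.Sequential(nn.Linear(2, 128), nn.SiLU(), nn.Linear(128, 2))
    sigma = 0.1  # assumed fixed noise level

    def dsm_loss(x):
        """Denoising score matching: match s_theta(x_tilde) to
        grad_{x_tilde} log q_sigma(x_tilde | x) = -(x_tilde - x) / sigma**2."""
        x_tilde = x + sigma * torch.randn_like(x)
        target = -(x_tilde - x) / sigma**2          # score of the Gaussian kernel
        return ((score_net(x_tilde) - target) ** 2).sum(dim=1).mean()

    # one optimization step on a toy batch
    opt = torch.optim.Adam(score_net.parameters(), lr=1e-3)
    loss = dsm_loss(torch.randn(64, 2))             # stand-in for a data batch
    opt.zero_grad(); loss.backward(); opt.step()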
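
    Tweedie's formula (55:35), written out: for x_tilde = x + sigma * eps with eps ~ N(0, I), the posterior mean of the clean sample is the noisy sample plus a correction proportional to the score of the perturbed density,

    \[
    \mathbb{E}[\,x \mid \tilde{x}\,] = \tilde{x} + \sigma^2 \,\nabla_{\tilde{x}} \log q_\sigma(\tilde{x}).
    \]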
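
    A sketch of sliced score matching (1:04:20), under the same illustrative assumptions as above (a score_net for 2-D data): projecting onto a random direction v turns the trace of the Jacobian into a single Jacobian-vector product, i.e. one extra backprop instead of d.

    import torch

    def ssm_loss(score_net, x):
        """Sliced score matching: E_v[ v^T (grad_x s_theta(x)) v + 0.5 (v . s_theta(x))^2 ]."""
        x = x.clone().requires_grad_(True)
        v = torch.randn_like(x)                                      # random projection direction
        s = score_net(x)                                             # s_theta(x)
        sv = (s * v).sum()                                           # scalar: sum over batch of v . s
        grad_sv = torch.autograd.grad(sv, x, create_graph=True)[0]   # each row is v^T J for one sample
        jvp_term = (grad_sv * v).sum(dim=1)                          # v^T J v, one slice of the trace
        norm_term = 0.5 * (s * v).sum(dim=1) ** 2                    # sliced norm term
        return (jvp_term + norm_term).mean()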
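
    A sketch of sampling with Langevin MCMC (1:12:00) once a score network is trained; the step size, step count, and standard-normal initialization are my own illustrative choices.

    import torch

    @torch.no_grad()
    def langevin_sample(score_net, n_steps=1000, step_size=1e-3, n_samples=64, dim=2):
        """x_{t+1} = x_t + (eps/2) * s_theta(x_t) + sqrt(eps) * z_t, with z_t ~ N(0, I)."""
        x = torch.randn(n_samples, dim)              # initialize from a simple prior
        for _ in range(n_steps):
            z = torch.randn_like(x)
            x = x + 0.5 * step_size * score_net(x) + (step_size ** 0.5) * z
        return x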

  • @thehigheststateofsalad 1 month ago

    Dear Stanford, some of the students are pretty tough to understand; it would be great if you could train an LLM subtitle generator to extract whatever they were saying and add it to the video as subtitles. Thanks for the free content.

  • @해위잉 6 months ago +2

    Best explanation!

  • @ashishkannad3021 2 months ago

    6:28 The derivative curve is not correct. The score values in the right graph should be zero in between (-5, 0) and (5, 0), because the score comes out as (1/P(x)) * (dP(x)/dx); the score should be zero wherever the density has a maximum or a minimum.
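
    In symbols (my notation, not from the video), the point being made is that the score is the derivative of the log-density,

    \[
    s(x) = \frac{d}{dx}\log p(x) = \frac{1}{p(x)}\,\frac{dp(x)}{dx},
    \]

    which vanishes exactly where dp(x)/dx = 0, i.e. at local maxima and minima of the density.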