7:00 Sliced score matching is slower than denoising score matching because it requires taking derivatives of the score network
13:45 Denoising score matching wants sigma to be small, but a minimal sigma is not optimal for perturbing the data when sampling
27:15 Annealed Langevin dynamics, ~1000 sigmas (a sketch follows this outline)
38:50 The Fokker-Planck PDE couples the scores at different noise levels; enforcing it is intractable, so the per-level loss functions (scores) are treated as independent
45:00? Weighted combination of denoising score matching losses: estimate the score of the data perturbed by each sigma_i, then combine the per-level losses with weights (see the loss sketch after this outline)
48:15 As efficient as estimating a single unconditional score network: joint estimation of all the scores is amortized by one noise-conditional score network
49:50? Noise levels run from smallest to largest during training, and from largest to smallest during inference (annealed Langevin)
52:10? Notation: p_sigma_i is equivalent to the earlier q (the distribution of the perturbed data)
57:20 Mixture of denoising score matching losses is expensive at inference time (many Langevin steps), but the deep computation graph doesn't have to be unrolled at training time, since no samples are generated during training
1:07:00 An SDE describes the perturbation iterations as a continuous process over time (the relevant equations are written out after this outline)
1:08:50 Inference (largest to smallest noise) is described by the reverse-time SDE, which depends on the data only through the score functions of the noise-perturbed densities
1:12:00 Euler-Maruyama discretizes time to numerically solve the SDE
1:13:25 Numerically integrating the reverse SDE goes from noise to data
1:15:00? SDE predictor step plus Langevin corrector (see the predictor-corrector sketch after this outline)
1:20:25 Infinitely deep computation graph (refer to 57:20)
1:21:45 The SDE model can be converted to a normalizing flow, which yields latent variables
1:22:00 The SDE can be described as an ODE (the probability flow ODE) with the same marginals
1:23:15 This machinery defines a continuous-time normalizing flow whose invertible mapping is given by solving an ODE; solution paths with different initial conditions can never cross (hence the map is invertible, i.e. a normalizing flow), the model is trained not by maximum likelihood but by score matching, and it is a flow of infinite depth from which likelihoods can be obtained
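For the annealed Langevin step at 27:15, here is a minimal NumPy sketch. The names are illustrative assumptions, not the lecture's code: `score_fn(x, sigma)` stands in for a trained noise-conditional score network, `eps` and `steps_per_sigma` are assumed hyperparameters, and the step-size rule alpha_i = eps * sigma_i^2 / sigma_min^2 follows the schedule from the NCSN paper.

```python
import numpy as np

def annealed_langevin(score_fn, x, sigmas, eps=2e-5, steps_per_sigma=100, rng=None):
    """Annealed Langevin dynamics (27:15): run Langevin MCMC at each noise
    level, moving from the largest sigma to the smallest (cf. 49:50)."""
    rng = np.random.default_rng() if rng is None else rng
    for sigma in sigmas:                          # sigmas sorted largest -> smallest
        alpha = eps * (sigma / sigmas[-1]) ** 2   # step size shrinks with sigma^2
        for _ in range(steps_per_sigma):
            z = rng.standard_normal(x.shape)
            x = x + 0.5 * alpha * score_fn(x, sigma) + np.sqrt(alpha) * z
    return x
```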
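And for the weighted combination of denoising score matching losses (45:00, 48:15), a minimal PyTorch sketch, assuming a hypothetical `score_net(x, sigma)` that conditions on the noise level, and using the common weighting lambda(sigma_i) = sigma_i^2:

```python
import torch

def ncsn_dsm_loss(score_net, x, sigmas):
    """Weighted sum of denoising score matching losses, amortized by a
    single noise-conditional score network (45:00 / 48:15)."""
    # sample one noise level per example instead of summing over all levels
    idx = torch.randint(len(sigmas), (x.shape[0],), device=x.device)
    sigma = sigmas[idx].view(-1, *([1] * (x.dim() - 1)))
    z = torch.randn_like(x)
    x_tilde = x + sigma * z                  # sample from p_sigma (cf. 52:10)
    score = score_net(x_tilde, sigma)        # estimates grad_x log p_sigma(x_tilde)
    # DSM target is -z / sigma; the sigma^2 weighting turns the per-level
    # loss into || sigma * score + z ||^2, putting all levels on the same scale
    return ((sigma * score + z) ** 2).view(x.shape[0], -1).sum(dim=1).mean()
```

Sampling a random sigma_i per example is what makes training all noise levels no more expensive than training one unconditional network, which is the point made at 48:15.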
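For the SDE part of the outline (1:07:00, 1:08:50, 1:22:00), these are the three equations in the standard notation of the score-SDE framework, with drift f, diffusion coefficient g, and perturbed marginals p_t:

```latex
% Forward (perturbation) SDE, 1:07:00:
\mathrm{d}\mathbf{x} = f(\mathbf{x}, t)\,\mathrm{d}t + g(t)\,\mathrm{d}\mathbf{w}

% Reverse-time SDE, 1:08:50 -- depends on the data only through the scores:
\mathrm{d}\mathbf{x} = \bigl[ f(\mathbf{x}, t) - g(t)^2\,\nabla_{\mathbf{x}} \log p_t(\mathbf{x}) \bigr]\,\mathrm{d}t + g(t)\,\mathrm{d}\bar{\mathbf{w}}

% Probability flow ODE, 1:22:00 -- same marginals p_t, no noise term:
\frac{\mathrm{d}\mathbf{x}}{\mathrm{d}t} = f(\mathbf{x}, t) - \tfrac{1}{2}\,g(t)^2\,\nabla_{\mathbf{x}} \log p_t(\mathbf{x})
```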
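Finally, a NumPy sketch of Euler-Maruyama on the reverse SDE with a Langevin corrector (1:12:00 through 1:15:00). Everything here is an assumption for illustration: `score_fn`, the drift `f` and diffusion `g` callables, and the signal-to-noise step-size heuristic for the corrector, which follows the predictor-corrector samplers of the score-SDE paper.

```python
import numpy as np

def pc_sampler(score_fn, f, g, x, t_grid, corrector_steps=1, snr=0.16, rng=None):
    """Predictor-corrector sampling (1:12:00-1:15:00): one Euler-Maruyama step
    on the reverse SDE, then Langevin corrector steps at the new time."""
    rng = np.random.default_rng() if rng is None else rng
    for t, t_next in zip(t_grid[:-1], t_grid[1:]):  # t_grid runs from ~1 down to ~0
        dt = t_next - t                             # negative: integrating backwards
        # predictor: Euler-Maruyama step of dx = [f - g^2 * score] dt + g dw_bar
        drift = f(x, t) - g(t) ** 2 * score_fn(x, t)
        x = x + drift * dt + g(t) * np.sqrt(-dt) * rng.standard_normal(x.shape)
        # corrector: Langevin MCMC targeting p_{t_next} (1:15:00)
        for _ in range(corrector_steps):
            s = score_fn(x, t_next)
            z = rng.standard_normal(x.shape)
            step = 2.0 * (snr * np.linalg.norm(z) / np.linalg.norm(s)) ** 2
            x = x + step * s + np.sqrt(2.0 * step) * z
    return x
```

The corrector steps are the same Langevin updates as in the annealed Langevin sketch above, just with the step size set adaptively from a signal-to-noise ratio instead of a fixed sigma schedule.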
Can anyone suggest some books or courses for understanding SDEs? I’m kinda new to this
Thanks for the great lectures! Tiny detail: shouldn't this one be called Noise Conditional Score Networks, instead of Energy Based Models? Simply because it's the main topic in the lecture, and would allow for people to find it more easily when searching?