Stanford CS236: Deep Generative Models I 2023 I Lecture 11 - Energy Based Models

  • Published 16 Oct 2024
  • For more information about Stanford's Artificial Intelligence programs visit: stanford.io/ai
    To follow along with the course, visit the course website:
    deepgenerative...
    Stefano Ermon
    Associate Professor of Computer Science, Stanford University
    cs.stanford.ed...
    Learn more about the online course and how to enroll: online.stanfor...
    To view all online courses and programs offered by Stanford, visit: online.stanfor...

COMMENTS • 5

  • @CPTSLEARNER • 4 months ago +1

    28:45 Autoregressive models and latent variable models can be thought of as clever ways of combining simple normalized objects to build more complicated objects that are normalized by design
    43:00 Sampling is hard, even though the unnormalized likelihood is known, because of the normalizing constant (numerical rather than analytical methods)
    49:20 Applications that don't require the partition function
    51:50? Uncorrupting an image: maximize p(y|x), or equivalently maximize p(x,y); the normalization constant doesn't matter
    1:00:00? RBM
    1:17:40 The contrastive divergence algorithm is a Monte Carlo approximation (single sample) of the expectation, an unbiased estimator of the true gradient
    1:23:20 MCMC for sample generation without the partition function; Metropolis-Hastings accepts a downhill move only with some probability
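The Metropolis-Hastings procedure mentioned at 1:23:20 can be sketched in a few lines. This is a minimal 1D illustration, not the lecture's exact algorithm: the energy function (a standard normal, E(x) = x²/2) and the random-walk proposal are assumptions chosen for simplicity. The key point the comment makes is visible in the acceptance test: only the energy difference appears, so the partition function Z cancels.

```python
import math
import random

def energy(x):
    # Hypothetical energy: E(x) = x^2 / 2, so p(x) ∝ exp(-E(x)) is a standard normal.
    return 0.5 * x * x

def metropolis_hastings(n_samples, step=1.0, burn_in=1000, seed=0):
    rng = random.Random(seed)
    x = 0.0
    samples = []
    for i in range(burn_in + n_samples):
        # Symmetric random-walk proposal.
        x_new = x + rng.gauss(0.0, step)
        # Acceptance ratio p(x_new)/p(x) = exp(E(x) - E(x_new)): Z cancels,
        # so uphill moves (lower energy) are always accepted and downhill
        # moves (higher energy) are accepted only with this probability.
        if rng.random() < min(1.0, math.exp(energy(x) - energy(x_new))):
            x = x_new
        if i >= burn_in:
            samples.append(x)
    return samples

samples = metropolis_hastings(50000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

With enough samples, the empirical mean and variance should approach 0 and 1, the moments of the target standard normal.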

  • @ashishkannad3021 • 1 month ago

    theta' depends on x, so the integral over x should be applied to the entire product, including the function P_theta'(y)! Then the integral calculation would not be that simple. 25:46

    • @YRTB2830 • 22 days ago

      From my understanding, if P_theta'(x)(y) is normalized by design and has an analytical solution, then "we know" that the integral goes to 1.

    • @ashishkannad3021 • 21 days ago

      @@YRTB2830 What's an analytical solution? I'm new to this.

    • @YRTB2830 • 20 days ago

      @@ashishkannad3021 Well, two things.
      theta doesn't depend on x; theta is the model parameters. theta' does depend on x, but we don't really care, let me explain:
      The integral is simple because, as said in the slide, P_theta(x) and P_theta'(y) are normalized objects, so we already know that they both integrate to one.
      So when you are trying to find the double integral of the product P_theta(x) * P_theta'(y), if you can find a way to separate them, you already know that they each integrate to one.
      And from the slide, if you are trying to find the integral over y of P_theta(x) * P_theta'(y), you will notice that P_theta(x) doesn't depend on y, so it is basically a constant at this point, and you can simplify it to
      P_theta(x) * (integral over y of P_theta'(y)).
      Now, the first assumption we made is that the integral over y of P_theta'(y) = 1, so it will always integrate to one; x does not matter at this point.
      All you are left with is the integral over x of P_theta(x) * 1 dx, which is also 1.
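The separation argument in that comment can be checked numerically. In this sketch two normalized 1D Gaussians stand in for P_theta(x) and P_theta'(y) (an illustrative assumption; the lecture's densities could be anything normalized), and a simple trapezoid rule shows that the inner integral over y is 1, so the double integral collapses to the integral of P_theta(x) alone:

```python
import math

def gauss(x, mu=0.0, sigma=1.0):
    # Normalized 1D Gaussian density.
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def integrate(f, lo=-10.0, hi=10.0, n=20001):
    # Trapezoid rule on a uniform grid; [-10, 10] captures
    # essentially all the mass of these Gaussians.
    h = (hi - lo) / (n - 1)
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n - 1):
        total += f(lo + i * h)
    return total * h

# Inner integral over y: P_theta'(y) is normalized, so this is ~1
# regardless of x, which is why theta' depending on x doesn't matter.
inner = integrate(lambda y: gauss(y, 1.0, 0.5))

# Double integral reduces to integral over x of P_theta(x) * 1, also ~1.
double = integrate(lambda x: gauss(x) * inner)
```

Both `inner` and `double` come out numerically indistinguishable from 1, matching the argument that the product of two normalized densities is itself normalized.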