How I Understand Diffusion Models

  • Published 20 Sep 2024
  • Diffusion models are powerful generative models that enable many successful applications, such as image, video, and 3D generation from text.
    In this tutorial, I share my understanding of the diffusion model basics, including training, guidance, resolution, and speed.
    Below are some other great resources to learn more about diffusion models.
    ===== Slides =====
    Here are the slides used in this video
    Training: bit.ly/3WudEPH
    Guidance: bit.ly/3wedCky
    Resolution: bit.ly/4bqxHmo
    Speed: bit.ly/4bpJzoJ
    ===== Tutorials =====
    [CVPR 2022 Tutorial] Denoising Diffusion-based Generative Modeling: Foundations and Applications
    cvpr2022-tutor...
    [CVPR 2023 Tutorial] Denoising Diffusion Models: A Generative Learning Big Bang
    cvpr2023-tutor...
    [A short course by DeepLearning.AI] How Diffusion Models Work
    • How Diffusion Models W...
    ===== Training =====
    [Sohl-Dickstein et al. 2015] Deep Unsupervised Learning using Nonequilibrium Thermodynamics
    arxiv.org/abs/...
    [Ho et al. 2020] Denoising Diffusion Probabilistic Models
    arxiv.org/abs/...
    [Luo 2022] Understanding Diffusion Models: A Unified Perspective
    arxiv.org/abs/...
    [Karras et al. 2022] Elucidating the design space of diffusion-based generative models
    arxiv.org/abs/...
    [Karras et al. 2023] Analyzing and Improving the Training Dynamics of Diffusion Models
    arxiv.org/abs/...
    ===== Guidance =====
    [Dhariwal and Nichol 2021] Diffusion Models Beat GANs on Image Synthesis
    arxiv.org/abs/...
    [Ho and Salimans 2022] Classifier-Free Diffusion Guidance
    arxiv.org/abs/...
    [Sander Dieleman 2022] Guidance: a cheat code for diffusion models
    sander.ai/2022...
    [Sander Dieleman 2023] The geometry of diffusion guidance
    sander.ai/2023...
    ===== Resolution =====
    [Ho et al. 2021] Cascaded Diffusion Models for High Fidelity Image Generation
    arxiv.org/abs/...
    [Saharia et al. 2022] Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding
    arxiv.org/abs/...
    [Rombach et al. 2021] High-Resolution Image Synthesis with Latent Diffusion Models
    arxiv.org/abs/...
    [Vahdat et al. 2021] Score-based Generative Modeling in Latent Space
    proceedings.ne...
    [Podell et al. 2023] SDXL: Improving Latent Diffusion Models for High-resolution Image Synthesis
    arxiv.org/abs/...
    [Hoogeboom et al. 2023] Simple diffusion: End-to-end diffusion for high resolution images
    arxiv.org/abs/...
    [Chen et al. 2023] On the importance of noise scheduling for diffusion models
    arxiv.org/abs/...
    [Gu et al. 2023] Matryoshka Diffusion Models
    arxiv.org/abs/...
    ===== Speed =====
    [Song et al. 2021] Denoising Diffusion Implicit Models
    arxiv.org/abs/...
    [Salimans and Ho 2022] Progressive Distillation for Fast Sampling of Diffusion Models
    arxiv.org/abs/...
    [Meng et al. 2023] On Distillation of Guided Diffusion Models
    arxiv.org/abs/...
    [Song et al. 2023] Consistency models
    arxiv.org/abs/...
    [Luo et al. 2023] Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference
    arxiv.org/abs/...
    [Luo et al. 2023] LCM-LoRA: A Universal Stable-Diffusion Acceleration Module
    arxiv.org/abs/...
    [Sauer et al. 2023] Adversarial Diffusion Distillation
    arxiv.org/abs/...
    [Yin et al. 2023] One-step Diffusion with Distribution Matching Distillation
    arxiv.org/abs/...

COMMENTS • 76

  • @ayushsaraf8421
    @ayushsaraf8421 8 months ago +13

    incredible explanation with so much detail packed in so little time. Looking forward to more of these

    • @jbhuang0604
      @jbhuang0604 8 months ago

      Thanks, Ayush! Glad that you like it!

  • @JoseColmenarezMoreno
    @JoseColmenarezMoreno 6 months ago +7

    BRAVO! No one has ever explained the diffusion model in such an easy way with all the details.

    • @jbhuang0604
      @jbhuang0604 6 months ago

      Thank you so much for your kind words! This makes my day!

  • @rtluo1546
    @rtluo1546 5 months ago +7

    This is truly a great tutorial video, so well-made. I can't believe it covers so many things in only 17 minutes.

    • @jbhuang0604
      @jbhuang0604 5 months ago

      Thanks a lot! Happy that you enjoyed the video!

  • @alexpeng6705
    @alexpeng6705 8 months ago +6

    Thanks for your efforts in making such a high-quality video!
    I like the way you break down such complex ideas in a concise manner and visualize them intuitively and elegantly. I wish I'd had this video six months ago, lol.

    • @jbhuang0604
      @jbhuang0604 8 months ago

      Thanks for your kind words! It's a fun video to make, and I also learn a lot about diffusion models through the process.

  • @wangy01
    @wangy01 6 months ago +3

    Thank you for your great work in making the video enjoyable without requiring much prior knowledge from the audience. For example, you mentioned maximum likelihood and explained what it is immediately. It is such a challenge to lay all of this out in a 17-minute video, but you did a great job. Thank you!

    • @jbhuang0604
      @jbhuang0604 5 months ago

      Glad that you liked it! Appreciate your kind words! This made my day!

  • @JionghaoWang-fs1uq
    @JionghaoWang-fs1uq 8 months ago +5

    You are a true educator! Great video!

    • @jbhuang0604
      @jbhuang0604 8 months ago

      Thank you so much! Glad that you like the video.

  • @welann
    @welann 3 months ago +1

    Thank you for making such a high quality video! It's very helpful for me to understand the diffusion model!

    • @jbhuang0604
      @jbhuang0604 3 months ago +1

      You're very welcome! Happy that it was helpful!

  • @4thlord51
    @4thlord51 4 months ago +1

    I'm building my own diffusion model myself. This is the best breakdown and visualization of the mathematics and implementation. Well done.

    • @jbhuang0604
      @jbhuang0604 4 months ago +1

      Thank you! This comment just made my day!

  • @curiousobserver2006
    @curiousobserver2006 5 months ago +1

    seriously one of the best educational videos I've ever watched.

  • @Funnyshoes321
    @Funnyshoes321 8 months ago +1

    Thanks a lot for the videos! I've been self-studying diffusion models on the side for a few months now and this is the only video I've seen that gives an in-depth yet intuitive explanation of the math.

  • @kathyker3498
    @kathyker3498 27 days ago +1

    Shout-out to NCTU alumni! Great video with so many sound effects, good visualizations, and metaphors!
    I just wish there were more references for the derivation in the math part, as it's still a bit hard to follow even though I paused the video so many times haha

  • @faiz.wahab7
    @faiz.wahab7 8 months ago +1

    Very comprehensive and precise. Thanks. Also, thanks for covering Tweedie's formula and simplifying the score-based model. That is the most convoluted part in most papers. Looking forward to a demystified NeRF video from you!

  • @morrisfan2004
    @morrisfan2004 21 days ago +1

    Great explanation

  • @nikitadrobyshev7953
    @nikitadrobyshev7953 6 months ago +1

    OK, this is the best video explanation of diffusion models I've seen. Ideal ratio between simplification and depth☺👏

    • @jbhuang0604
      @jbhuang0604 6 months ago

      Glad it was helpful! Thank you so much for your kind words!

    • @wangy01
      @wangy01 6 months ago +1

      I agree. The author must have carefully chosen the most efficient path into the complex concept hierarchy, and every single word, to achieve that efficiency.

  • @ElLoza
    @ElLoza 8 months ago +1

    I would say top-quality video! Congratulations!🎉 More like this would be awesome!

  • @Charles-my2pb
    @Charles-my2pb 8 months ago +1

    Thank you so much for your contribution. This tutorial made diffusion clear to me as a beginner.

    • @jbhuang0604
      @jbhuang0604 8 months ago

      You are welcome. Glad it was helpful!

  • @AIwithAndy
    @AIwithAndy 7 months ago +1

    I appreciated the explanation of conditional generations. Nice job!

    • @jbhuang0604
      @jbhuang0604 7 months ago

      Thanks so much! Glad that you like it.

  • @bingzha6099
    @bingzha6099 8 months ago +1

    Really enjoyed watching this video and learned a lot. Hope to see more videos like this in the future.

    • @jbhuang0604
      @jbhuang0604 8 months ago

      Will do! Stay tuned! 😊

  • @ye8495
    @ye8495 2 months ago +1

    Great video and explanation! There's a lot left for me to explore.

  • @khalilsabri7978
    @khalilsabri7978 4 months ago

    Just one minute into the video, you know it's extremely well done. Thanks for the video!

    • @jbhuang0604
      @jbhuang0604 4 months ago

      Glad you liked it! Thanks so much for the comment!

  • @pinkpig7505
    @pinkpig7505 8 months ago +1

    What a timing 🙌 needed this explanation so bad... thanks ✌️

    • @jbhuang0604
      @jbhuang0604 8 months ago

      Glad it helps! Thanks a lot!

  • @youtube_showcase
    @youtube_showcase 4 months ago +1

    Amazing work! Thank you for sharing 😀

  • @420_gunna
    @420_gunna 8 months ago +2

    Awesome video, hope I'm smarter when I try to rewatch it in 3 months ;)

    • @jbhuang0604
      @jbhuang0604 8 months ago

      Glad you liked it! Let me know if you have questions.

  • @emreakbas9289
    @emreakbas9289 8 months ago +1

    Great explanation, Jia-Bin! Thanks!

  • @nutshell1811
    @nutshell1811 5 months ago +1

    Best video on diffusion!!

    • @jbhuang0604
      @jbhuang0604 5 months ago

      Great! Glad that it’s helpful!

  • @pedroenriquelopezdeteruela6545
    @pedroenriquelopezdeteruela6545 6 months ago +1

    Awesome post, Jiang, thank you so much for the great job!
    One small comment/question on your video (not very important, I assume). At 5:56 you say (as a direct derivation from formula (7) in the paper "Denoising Diffusion Probabilistic Models") that mu^hat_t(x_t, x_0) lies on the line joining x_0 and x_t. While this is approximately true for "normal" beta_t scheduling, I think the estimated mean as a function of x_0 and x_t need not be exactly on that line since, in general, the respective multipliers of x_0 and x_t in that equation need not add up to one.
    In fact, with "normal" scheduling, this sum seems to move progressively further from 1 as t increases, so although mu_t obviously remains a simple linear combination of x_t and x_0, it progressively moves away from this line (though by a small amount).
    Would you agree with this observation?
    Greetings, and again, congratulations on the video and thank you very much for clarifying the inner workings of diffusion models for us!

    • @jbhuang0604
      @jbhuang0604 6 months ago

      Thank you so much for your comment! You are right! It won’t be on the line when the multipliers do not add up to one.
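The observation in this exchange can be checked numerically. Below is a minimal sketch that evaluates the two multipliers from formula (7) of Ho et al. 2020 and their sum. The linear schedule (beta from 1e-4 to 0.02 over 1000 steps) is the one used in that paper and is only an assumption about what "normal" scheduling means here; the function name `posterior_mean_coefficients` is made up for this illustration.

```python
import numpy as np

def posterior_mean_coefficients(betas):
    """Multipliers of x_0 and x_t in the posterior mean (Ho et al. 2020, Eq. 7)."""
    betas = np.asarray(betas, dtype=np.float64)
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)                            # \bar{alpha}_t
    alpha_bar_prev = np.concatenate(([1.0], alpha_bar[:-1]))  # \bar{alpha}_{t-1}, with \bar{alpha}_0 = 1
    coef_x0 = np.sqrt(alpha_bar_prev) * betas / (1.0 - alpha_bar)
    coef_xt = np.sqrt(alphas) * (1.0 - alpha_bar_prev) / (1.0 - alpha_bar)
    return coef_x0, coef_xt

# Assumed schedule: linear betas as in Ho et al. 2020.
betas = np.linspace(1e-4, 0.02, 1000)
coef_x0, coef_xt = posterior_mean_coefficients(betas)
coef_sum = coef_x0 + coef_xt

print("sum at t=1:   ", coef_sum[0])
print("sum at t=1000:", coef_sum[-1])
```

Under this schedule the sum starts at one and drifts to roughly 0.99 by the last step, which matches the "small amount" described in the comment. Algebraically the sum simplifies to (sqrt(alpha_bar_{t-1}) + sqrt(alpha_t)) / (1 + sqrt(alpha_bar_t)), which never exceeds one.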

  • @HuangMichel
    @HuangMichel 2 months ago +1

    Great content!

    • @jbhuang0604
      @jbhuang0604 2 months ago

      Thanks a lot! Glad you like it!

  • @SurajBorate-bx6hv
    @SurajBorate-bx6hv 3 months ago

    Thank you for the great step-by-step explanation. Can you share any good resources and insights for implementing diffusion on my own custom images?

    • @jbhuang0604
      @jbhuang0604 3 months ago

      Hi! No problem. I think Hugging Face's Diffusers library probably has the best resources. Check it out: huggingface.co/docs/diffusers/en/index

  • @yuktikaura
    @yuktikaura 8 months ago +1

    @Jia-Bin Huang we want to maximize likelihood and also minimize KL divergence so that we can "maximize" similarity between the two distributions. It is stated the other way round at timestamp 1:19 to 1:21.

    • @jbhuang0604
      @jbhuang0604 8 months ago

      Yes! You are right! Maximize likelihood -> Minimize KL divergence -> Maximize similarity between the two distributions.
      I got confused with too many negations. :-P
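The chain in this reply can be written out in a few lines. This is the standard identity, with p_data for the data distribution and p_theta for the model; these symbols are assumed here and may differ from the ones used in the video.

```latex
\begin{aligned}
D_{\mathrm{KL}}\!\left(p_{\text{data}} \,\|\, p_\theta\right)
  &= \mathbb{E}_{x \sim p_{\text{data}}}\!\left[\log p_{\text{data}}(x)\right]
   - \mathbb{E}_{x \sim p_{\text{data}}}\!\left[\log p_\theta(x)\right], \\
\arg\min_\theta \, D_{\mathrm{KL}}\!\left(p_{\text{data}} \,\|\, p_\theta\right)
  &= \arg\max_\theta \, \mathbb{E}_{x \sim p_{\text{data}}}\!\left[\log p_\theta(x)\right].
\end{aligned}
```

The first expectation does not depend on theta, so minimizing the KL divergence and maximizing the expected log-likelihood pick out the same parameters.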

  • @orisenbazuru
    @orisenbazuru 4 months ago

    Great video! At 1:21, it should be maximizing the similarity between the two distributions, or minimizing the distance between them.

    • @jbhuang0604
      @jbhuang0604 4 months ago

      Thanks for pointing this out! Yes, you are right! It should be *maximizing* the similarity between the two distributions.

  • @jasoncampbell1464
    @jasoncampbell1464 8 months ago +7

    Saw the cow, heard the moo. 5 stars.

  • @sokak01
    @sokak01 3 months ago

    I think there should be a \nabla log q(x_t) instead of p(x_t) in the score-matching part.
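For reference, a short sketch of how the score of the noised distribution q relates to the added noise, written in the notation of Ho et al. 2020. The symbols \bar{\alpha}_t and \epsilon come from that paper and are assumed here rather than taken from the video.

```latex
\begin{aligned}
q(x_t \mid x_0) &= \mathcal{N}\!\left(x_t;\ \sqrt{\bar{\alpha}_t}\,x_0,\ (1-\bar{\alpha}_t)\,I\right),
\qquad x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon, \\
\nabla_{x_t} \log q(x_t \mid x_0) &= -\frac{x_t - \sqrt{\bar{\alpha}_t}\,x_0}{1-\bar{\alpha}_t}
 = -\frac{\epsilon}{\sqrt{1-\bar{\alpha}_t}}, \\
\mathbb{E}\!\left[x_0 \mid x_t\right] &= \frac{x_t + (1-\bar{\alpha}_t)\,\nabla_{x_t} \log q(x_t)}{\sqrt{\bar{\alpha}_t}}
\quad \text{(Tweedie's formula)}.
\end{aligned}
```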

  • @Raymond-zv5gr
    @Raymond-zv5gr 5 months ago +1

    BRO YOU ARE EPIC

  • @johnini
    @johnini 2 months ago

    I still need to get my head around the math! But like everyone else said, amazing video!!
    One question!
    How do you imagine a distribution of high-resolution images?!
    Would an image be like a point in a high-dimensional space, where the coordinates are the intensities of its pixels?! And do we move from a high-dimensional noise vector to a vector on the dataset distribution?
    Thanks, looking forward to future videos

    • @jbhuang0604
      @jbhuang0604 2 months ago +1

      Thanks for the question. I agree that it's kind of difficult to imagine the distribution of images as it's high-dimensional. For a grayscale 100x100 image, we are talking about a 10,000-dim space! And you are right, the "coordinate" of each dimension indicates the intensity of a particular pixel. Diffusion models learn to predict vectors in this space so that we can iteratively push random noise toward regions of this high-dimensional space that look like real images in the dataset.
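To make the "image as a point" picture concrete, here is a small sketch. The random array stands in for a real grayscale image, and the row-major flattening is just one convention chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
height, width = 100, 100

# Stand-in for a real grayscale image with intensities in [0, 1).
image = rng.random((height, width))

# Flattening turns the image into a single point in a 10,000-dimensional space.
point = image.reshape(-1)

# Each coordinate is the intensity of one pixel (row-major order).
row, col = 37, 5
coordinate_index = row * width + col

# A sampler would start from a noise vector living in the same space.
noise = rng.standard_normal(point.shape)

print(point.shape, noise.shape)
```

Both the image and the starting noise are vectors of the same length, so "moving noise toward the data distribution" means moving one point of this space toward regions where real images lie.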

  • @truonggiangnguyen8844
    @truonggiangnguyen8844 5 months ago

    I have a question: are all the distributions mentioned distributions of continuous variables, since we're using integrals here?

    • @jbhuang0604
      @jbhuang0604 5 months ago

      Good question! I think there has been some development of discrete variational autoencoders and diffusion models. Those methods can deal with discrete variables.

  • @yasserothman4023
    @yasserothman4023 2 months ago

    Thanks for the work. If I want to recover x from y = Hx + n, where y is a noisy observation of x, what should be done to use diffusion for this? Do you know of any literature that has tackled similar problems?

    • @jbhuang0604
      @jbhuang0604 2 months ago

      Thanks for the question. Diffusion models have been applied to various image restoration tasks.
      The earliest work is probably this one: arxiv.org/pdf/2011.13456 (see section 5), where they can perform conditional (on noisy/masked image) restoration using an unconditioned model.
      You can also directly train a model for image restoration if you have paired examples. See a recent work here arxiv.org/abs/2303.11435
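As a rough illustration of the measurement model y = Hx + n from the question: if n is Gaussian with standard deviation sigma (an assumption, the question does not say), the gradient of log p(y | x) with respect to x has a closed form. Score-based conditional sampling adds a term of this kind to the unconditional score. The sketch below covers only this likelihood gradient for a clean x; how it is combined with a diffusion sampler at noisy intermediate steps is left out, and the function name and sizes are hypothetical.

```python
import numpy as np

def measurement_score(x, y, H, sigma):
    """Gradient w.r.t. x of log N(y; Hx, sigma^2 I), i.e. H^T (y - Hx) / sigma^2."""
    return H.T @ (y - H @ x) / sigma**2

rng = np.random.default_rng(0)
n_signal, n_meas, sigma = 8, 5, 0.1          # hypothetical sizes and noise level

H = rng.standard_normal((n_meas, n_signal))  # linear measurement operator
x_true = rng.standard_normal(n_signal)
y = H @ x_true + sigma * rng.standard_normal(n_meas)  # y = Hx + n

x = rng.standard_normal(n_signal)            # current estimate of the signal
grad = measurement_score(x, y, H, sigma)

print(grad.shape)
```

The gradient points in the direction that makes Hx agree better with the observation y, which is what pulls samples toward signals consistent with the measurement.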

  • @mcarletti
    @mcarletti 4 months ago +1

    My like comes with the 5th Symphony (9:39) 😸🎶

    • @jbhuang0604
      @jbhuang0604 4 months ago +1

      Oh My! Finally one person noticed that! (Spent a lot of time making that lol)

  • @herrbonk3635
    @herrbonk3635 8 months ago

    Wish I could hear what you say:
    0:36 "this stickholder"?
    0:43 "hyber we do not know"
    1:13 "just the cadirabigdes"
    and so on

    • @jbhuang0604
      @jbhuang0604 8 months ago +2

      You can see the full script by turning on the subtitles/CC. Hope this helps.

    • @herrbonk3635
      @herrbonk3635 8 months ago +1

      @jbhuang0604 I will try, thanks!