Ultimate Guide to Diffusion Models | ML Coding Series | Denoising Diffusion Probabilistic Models

  • Published Dec 16, 2024

COMMENTS • 74

  • @TheAIEpiphany  2 years ago +39

    Time to cover diffusion models in greater depth! Do let me know how you like this combination of papers + coding!

    • @prabhavkaula9697  2 years ago +3

      Thank you so much for uploading the tutorial. Good resources on diffusion models are such a rarity.

    • @prabhavkaula9697  2 years ago

      13:49 I too am okay with the mathematics and the proofs, but I wanted to know why it works.

    • @prabhavkaula9697  2 years ago

      It would be great if you could share the code!

    • @nickfratto2439  2 years ago

      Might be better to separate the code & papers into their own videos

    • @prabhavkaula9697  2 years ago

      Thank you for the video. I have some doubts:
      I wanted to know: if one runs the training script, how does the model save the checkpoints?
      I also wanted to know: while sampling, where does the model save the samples?

  • @JorgeGarcia-eg5ps  2 years ago +13

    I have been learning about diffusion models for a week, so the timing on this video was perfect. Thank you!

  • @tinysquareradius8186  1 year ago +3

    Hi Aleksa, the zero_module here is meant to zero-initialize the weights of the last layers, avoiding the situation where the last layers learn everything. But as learning goes on, the last layer will still learn something. You can check the paper. ovo
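For reference, here is a minimal numpy sketch of the idea being described (the actual repo helper zeroes a torch module's parameters in place; this toy version just zeroes a list of arrays). With the final layer of a residual branch zero-initialized, the block starts out as the identity, yet gradients with respect to that layer are still non-zero, so it "un-zeroes" itself during training:

```python
import numpy as np

def zero_module(params):
    # Mirrors the idea of the repo's zero_module helper, which zeroes
    # all parameters of a torch module in place and returns it.
    for p in params:
        p[...] = 0.0
    return params

rng = np.random.default_rng(0)

# Toy residual block: out = x + W_last @ hidden, with W_last zeroed.
w_last = zero_module([rng.standard_normal((4, 4))])[0]
x = rng.standard_normal(4)
hidden = rng.standard_normal(4)
out = x + w_last @ hidden

# At initialization the residual branch contributes nothing, so the
# block is exactly the identity mapping.
print(np.allclose(out, x))  # True
```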

  • @akrielz  2 years ago +4

    Hi Aleksa,
    The formula side-by-side comparisons are really useful.
    Thank you a lot for your dedication!
    P.S.: I might be wrong, but I believe the bug you mentioned at the end of the video, where images generated with 4k steps come out over-saturated, is caused by the following:
    The whole reason diffusion models work is that we assume the last step of the noising process will be noise with mean=0 and variance=1.
    While it is true that if we take an image and gradually apply Gaussian noise for n steps, with n tending to infinity, we will reach that state with mean=0 and var=1, it is important to notice that we can define an n_epsilon at which the image has already essentially reached the desired mean and variance. This n_epsilon is about 2k in this case. Every image generated in x steps where x > n_epsilon will be roughly the same image as the one generated at n_epsilon.
    Thus, when a diffusion model starts to sample, the noise that is initially generated will be equivalent to the one at n_epsilon. This means that the first n_epsilon of the x steps will actually be able to generate a good image, while all the steps past n_epsilon just destroy the image.
    This limit, with n_epsilon being around 2k, might also have to do with the precision of the operations, tho'.
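The saturation point described above is easy to see numerically. A small sketch, assuming the linear beta schedule from the DDPM paper (1e-4 to 0.02 over T steps; the exact saturation step depends on the schedule actually used in the video):

```python
import numpy as np

# Linear beta schedule from the DDPM paper (assumed here): beta_t
# rises from 1e-4 to 0.02 over T steps.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)

# q(x_t | x_0) = N(sqrt(alpha_bar_t) x_0, (1 - alpha_bar_t) I), so once
# alpha_bar_t is ~0 the marginal is ~N(0, I) no matter what x_0 was.
print(alpha_bar[0])    # ~0.9999: almost all signal at t=1
print(alpha_bar[-1])   # ~4e-5: signal essentially gone at t=T
```

Past the step where alpha_bar has numerically vanished, extra noising steps change nothing about the marginal, which is consistent with the n_epsilon argument.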

  • @blakerains8465  2 years ago

    The side by side really does help give me an understanding of the formulas

  • @improvement_developer8995  2 years ago +1

    Thanks for showing the code and paper side by side. Really helpful!

  • @arshakrezvani3562  1 year ago +1

    Your walkthroughs are perfect, please keep up the good work ❤

  • @arunram6687  1 year ago

    Loved the code and paper side-by-side explanation! Kudos to you! Follow the code-and-paper explanation format in all your videos if you can!

  • @sg.stefan  2 years ago +1

    Thanks for this very useful video full of clear explanations about diffusion models and the bridge between paper formulas and code!

  • @Skinishh  2 years ago +23

    Food for thought: I think it'd be cooler and more informative to build the simplest diffusion model from scratch, using PyTorch/TensorFlow/JAX and other packages of course

    • @TheAIEpiphany  2 years ago +5

      100%!

    • @pjborowiecki2577  1 year ago +2

      Or even a series, where we start from the simplest possible diffusion-based model and improve it over time in consecutive videos, implementing the latest discoveries from the most recent papers. This would be incredible.

    • @rajkiran1982  1 year ago

      +1
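For anyone wanting to attempt this from-scratch version, the training objective itself is only a few lines. A schematic numpy sketch of the simplified DDPM loss, with a stand-in eps_model where the real thing would be a PyTorch U-Net (names and shapes here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(1e-4, 0.02, T)   # linear schedule, as in the DDPM paper
alpha_bar = np.cumprod(1.0 - betas)

def eps_model(x_t, t):
    # Stand-in for the noise-prediction network (a U-Net in practice).
    return np.zeros_like(x_t)

def simple_loss(x0):
    # One Monte Carlo sample of L_simple = E ||eps - eps_theta(x_t, t)||^2:
    # pick a random t, noise x0 to x_t in closed form, regress the noise.
    t = int(rng.integers(T))
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return np.mean((eps - eps_model(x_t, t)) ** 2)

loss = simple_loss(rng.standard_normal((3, 8, 8)))
print(loss)  # finite, non-negative
```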

  • @AZTECMAN  2 years ago +1

    Finally got around to watching this. I quite enjoyed the video.

  • @sg.stefan  2 years ago +1

    Thank you very much for this video! Really, really great explanation (although not easy going) of the improved diffusion models, and a perfect preparation for your stable diffusion video!

  • @xiangyuguo9856  2 years ago +1

    I'm fairly familiar with the DDPM code but I still learned a lot, thanks for the nice video!

  • @ЕгорКолодин-й2з  2 years ago +1

    Amazing! Keep up the good work. It is very interesting!

  • @GuanlinLi-l8j  2 years ago +2

    Great video. Hope to see a video explaining the code of the "Diffusion model beat GANs" paper.

  • @kargarisaac  2 years ago +1

    Amazing, Aleksa :) We cannot wait for GLIDE and DALL-E 2 :)

    • @TheAIEpiphany  2 years ago

      Glide is already uploaded! 😀 Check it out!

  • @anarnurizada9586  1 year ago

    Your videos are amazing. I especially like this simultaneous covering of both the paper and the code. Keep it up! However, maybe you can still make some short (lighter) videos for beginners.

  • @wenbogao2630  9 months ago

    amazing video, really helpful!

  • @davita6379  2 years ago +2

    I love this series

  • @almogdavid  2 years ago

    Excellent video, thank you very much!

  • @Vikram-wx4hg  2 years ago +2

    Love your tutorials, Aleksa!
    Also wanted to know if you have covered DDIMs in any tutorial?

  • @omarabubakr6408  1 year ago

    Hey, I have a question about the research paper: why are they using the integral at the beginning of the background section? Thanks in advance. 3:39
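On the 3:39 question, in case it helps: that integral is just marginalization over the latent variables. The reverse process defines a joint density over the clean image x_0 and all the noisy latents x_1..x_T, and the model's likelihood of the data is obtained by integrating the latents out (Eq. 1 in the DDPM paper):

```latex
p_\theta(x_0) = \int p_\theta(x_{0:T}) \, dx_{1:T},
\qquad
p_\theta(x_{0:T}) = p(x_T) \prod_{t=1}^{T} p_\theta(x_{t-1} \mid x_t)
```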

  • @lanjiang9870  1 year ago

    Excellent video, it is very helpful ❤

  • @anatolicvs  2 years ago

    It was quite a nice video! Well done, sir!

  • @angelacast135  2 years ago +1

    Thanks for this video, it's really helpful. Could you please cover the DDIM paper too? It's super helpful to have the code and equations side-by-side.

  • @DED_Search  1 year ago +2

    Hi, could you kindly share the repo please? I can't find it on your GitHub. Thanks.

  • @sh4ny1  7 months ago

    Hi, I am always confused by the forward process equation defined in (2). We say that our images x come from an unknown distribution q(.), but in equation (2) we are saying that this distribution is normal? We are sampling from a normal distribution to get the next forward step. Sorry, I am not that good when it comes to probability theory.
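A common point of confusion, and a fair one: equation (2) only defines the transition kernel q(x_t | x_{t-1}) to be Gaussian, by construction; the data distribution q(x_0) itself stays unknown and is never assumed normal. A small numpy sketch of one forward step (shapes and values here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_step(x_prev, beta_t, rng):
    # Eq. (2): q(x_t | x_{t-1}) = N(sqrt(1 - beta_t) x_{t-1}, beta_t I).
    # The Gaussian here is the *transition*, chosen by us; the data
    # distribution q(x_0) remains unknown.
    eps = rng.standard_normal(x_prev.shape)
    return np.sqrt(1.0 - beta_t) * x_prev + np.sqrt(beta_t) * eps

x0 = rng.standard_normal((8, 8))  # stand-in for an image drawn from q(x_0)
x1 = forward_step(x0, 1e-4, rng)
print(x1.shape)  # (8, 8): a slightly noisier copy of x0
```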

  • @hesselbosma1998  2 years ago +1

    Hey nice vid! Do you have any idea why they zero the weights of some of the convolutional layers?

  • @alexijohansen  2 years ago

    Super valuable video! Many thanks. Can you post a link to your GitHub repo for Windows?

  • @susmithasai204  1 year ago

    Hi. Great explanation. Also, can you do a video explaining score-based generative models, i.e. the score-based SDE paper and code?

  • @leonardoberti917  1 year ago

    The explanation was great. It would be super if you went back to making these types of videos.

  • @VarunTulsian  2 years ago

    Great video Aleksa. I am new to torch; I read that PyTorch's rand_like samples from a uniform distribution instead of a Gaussian. How does that work, since we need samples from a standard Gaussian?
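For what it's worth, `rand_like` and `randn_like` are different functions: `torch.rand_like` is uniform on [0, 1), while `torch.randn_like` (note the extra `n`) samples a standard Gaussian, and the latter is what the noising code needs. The same naming convention exists in numpy, which makes the distinction easy to check:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

u = rng.random(n)            # numpy analogue of torch.rand_like: Uniform[0, 1)
g = rng.standard_normal(n)   # numpy analogue of torch.randn_like: N(0, 1)

print(u.mean(), u.std())  # ~0.5, ~0.289 (= 1/sqrt(12))
print(g.mean(), g.std())  # ~0.0, ~1.0
```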

  • @imranq9241  2 years ago +1

    Thanks for the video, is there a good toy project that uses diffusion models that you would recommend?

    • @TheAIEpiphany  2 years ago

      Hm, a toy project - not that I am aware of. I mean, if you treat the model as a black box, everything is a toy project.
      GLIDE, DALL-E mini, etc. Although I think you can't run DALL-E mini on a single machine; I might be wrong. Stay tuned! ;)

  • @baharazari976  2 years ago

    Perfect explanation. I would really appreciate it if you could share the code that runs on a single GPU; I am having trouble running the code in distributed mode.

  • @jianxiongfeng  1 year ago

    Your video is wonderful

  • @ArjunKumar123111  2 years ago +1

    Hey Aleksa, I have a question. When you come across a topic such as text-to-image generation or just diffusion models, how do you find fundamental papers/articles/reading materials to gain in-depth knowledge of them? And how do you plan and follow through on your learning process?
    I'm big on self-learning but often lack the planning to follow through. I'm inspired by your journey and seek some guidance. Thanks in advance!

    • @TheAIEpiphany  2 years ago +3

      Hey Arjun! Check out my Medium blogs. I literally have my process captured there. :)) Maybe start with the blog on how I landed a job at DeepMind

  • @snsa_kscc  2 years ago +1

    Gigachad!

  • @alessandrozuech61  2 years ago

    Very nice video! Just a question: how can I apply denoising to a noisy image? It seems to me that this paper can only generate a new image from the learned data distribution, right? Maybe I missed some steps...

    • @anonymousperson9757  1 year ago

      Hey! I am working on the same problem. It would be great if @Aleksa could make a video on that.
      I think the paper "Image Super-Resolution via Iterative Refinement", a follow-up to the original DDPM, has the solution, although it focuses on super-resolution. To my understanding, in the original DDPM you are trying to minimize the MSE loss between the noise added in the forward process at time t and the noise predicted by the network, so the predicted noise is only a function of the noisy input at step t and t itself. In denoising/super-resolution, I would assume there should also be some way of feeding the noisy image to the network as input during training. So in this case the network would take in the noisy (to-be-denoised) input, the noisy input from the forward diffusion process, and the time step. But I am not entirely sure. Would you like to connect through Discord to discuss this, in case you are still working on it?
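If I remember the SR3 setup correctly, the conditioning mechanism is even simpler than an extra loss term: the conditioning image is just concatenated channel-wise with x_t before it enters the U-Net. A schematic numpy sketch of that input construction (shapes and names here are illustrative, not from any repo):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: a 3-channel noisy sample x_t and a 3-channel
# conditioning image y (e.g. the degraded/low-res input, upsampled).
x_t = rng.standard_normal((3, 32, 32))
y = rng.standard_normal((3, 32, 32))

# SR3-style conditioning: stack along the channel axis, so the U-Net's
# first conv takes 6 input channels and the model learns eps(x_t, y, t).
net_input = np.concatenate([x_t, y], axis=0)
print(net_input.shape)  # (6, 32, 32)
```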

  • @daniel-mika  1 year ago +1

    I am curious: is the problem seen at 1:15:05 addressed? It's quite a big error tbh. I am curious whether they actually used this code, with the error, to train, because then the theory behind how it works would be shaky

    • @orip333  1 year ago

      There is no error in the code: the parentheses come just before the 1 over \bar{\alpha}_t factor, so it's all good.
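For what it's worth, a quick numerical check supports this reading: with the parentheses placed so that sqrt(1/alpha_bar_t) multiplies x_t and sqrt(1/alpha_bar_t - 1) multiplies eps, the eps-to-x0 conversion exactly inverts the forward marginal. A small numpy sketch (values illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
alpha_bar_t = 0.3  # some intermediate value of the cumulative product

x0 = rng.standard_normal((8, 8))
eps = rng.standard_normal((8, 8))

# Forward marginal: x_t = sqrt(ab_t) x0 + sqrt(1 - ab_t) eps
x_t = np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps

# eps -> x0 conversion with the parentheses as in the code:
x0_hat = np.sqrt(1.0 / alpha_bar_t) * x_t - np.sqrt(1.0 / alpha_bar_t - 1.0) * eps

print(np.allclose(x0_hat, x0))  # True: the conversion inverts exactly
```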

  • @alexijohansen  2 years ago

    Do you know how outpainting/inpainting works?

  • @nirmalbaishnab4910  2 years ago

    Fantastic tutorial! It will be very helpful if you share the code. Thanks.

  • @rezabagherian3331  2 years ago

    thank you

  • @rajroy2426  1 year ago

    The variational lower bound part is not very clear, to be honest
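For anyone else stuck on that part: the decomposition the paper derives bounds the negative log-likelihood by a per-timestep sum of KL divergences, each between the tractable Gaussian forward posterior and the learned reverse step (Eq. 5 in the DDPM paper):

```latex
-\log p_\theta(x_0) \le \mathbb{E}_q\Big[
\underbrace{D_{\mathrm{KL}}\!\big(q(x_T \mid x_0) \,\|\, p(x_T)\big)}_{L_T}
+ \sum_{t=2}^{T} \underbrace{D_{\mathrm{KL}}\!\big(q(x_{t-1} \mid x_t, x_0) \,\|\, p_\theta(x_{t-1} \mid x_t)\big)}_{L_{t-1}}
\underbrace{\,-\log p_\theta(x_0 \mid x_1)}_{L_0}
\Big]
```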

  • @convolutionalnn2582  2 years ago

    What maths are required to be a research scientist in computer vision? What are the best resources? And the best book for computer vision?

    • @sergeychirkunov7165  2 years ago

      Multiple View Geometry in Computer Vision. It's fundamental and quite helpful for research in CV

    • @convolutionalnn2582  2 years ago

      @@sergeychirkunov7165 Could you look something up on YouTube for me? I searched for "geometry for computer vision"; which playlist should I watch: "Multiple View Geometry in Computer Vision" by Sean Mullery, Cvprtum, "3D Computer Vision" by the CVRP Lab, or any other recommendation?

    • @convolutionalnn2582  2 years ago

      @@sergeychirkunov7165 People mostly say linear algebra, calculus, probability and statistics, and optimization, and even talk about tensor algebra... Are these maths required too?

    • @saurabhshrivastava224  2 years ago

      @@convolutionalnn2582 Yes, that's true. Basics of linear algebra, probability, and optimization are sort of mandatory.

    • @convolutionalnn2582  2 years ago

      @@saurabhshrivastava224 Best resource for geometry in computer vision?

  • @lorenzo.padoan  2 years ago +1

    I think they initialize some of the layers with zero weights in order to speed up the training process

    • @TheAIEpiphany  2 years ago +1

      Any pointers/papers?

    • @lorenzo.padoan  2 years ago +1

      @@TheAIEpiphany Unfortunately I can't give any paper reference; during an AI course my professor explained some rules of thumb for weight initialization, and one of them is the technique implemented in this code.

  • @bibhabasumohapatra  1 year ago +1

    I am not understanding anything 😭