Time to cover diffusion models in greater depth! Do let me know how you like this combination of papers + coding!
Thank you so much for uploading the tutorial. Good resources on diffusion models are such a rarity.
13:49 I too am okay with the mathematics and the proofs, but I want to know why it works.
It would be great if you could share the code!
Might be better to separate the code & papers into their own videos
Thank you for the video, I have a couple of questions:
I wanted to know: if one runs the training script, how does the model save the checkpoints?
I also wanted to know: while sampling, where does the model save the samples?
I have been learning about diffusion models for a week, so the timing on this video was perfect. Thank you!
Nice!
Hi Aleksa, the zero_module here is meant to initialize the weights of the last layers to zero, avoiding the situation where the last layers learn everything. But as training goes on, the last layer will still learn something. You can check the paper. ovo
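For reference, a minimal sketch of what such a zero-init helper looks like (modeled on the zero_module function in the improved-diffusion codebase):

```python
import torch.nn as nn

def zero_module(module: nn.Module) -> nn.Module:
    """Zero out all parameters of a module and return it."""
    for p in module.parameters():
        p.detach().zero_()
    return module

# Illustrative usage: zero-init the final projection of a residual block,
# so the block initially acts as an identity (its residual branch outputs
# zeros) and only gradually learns a contribution during training.
out_proj = zero_module(nn.Conv2d(128, 128, kernel_size=3, padding=1))
```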
Hi Aleksa
The side-by-side formula comparisons are really useful.
Thanks a lot for your dedication!
P.S.: I might be wrong, but I believe the bug that you mentioned at the end of the video, with images generated with 4k steps being over-saturated, is caused by the following:
The whole reason why diffusion models work is that we assume the last step of the noising process will be noise with mean=0 and variance=1.
While it is true that if we take an image and gradually apply Gaussian noise for n steps, as n tends to infinity we will reach that state with mean=0 and var=1, it is important to notice that we can define an n_epsilon at which the image has already reached the desired mean and variance. This n_epsilon in this case is about 2k. Every image generated in x steps where x > n_epsilon will be roughly the same image as the one generated at n_epsilon.
Thus, when a diffusion model starts to sample, the noise that is initially generated will be equivalent to the one at n_epsilon. This means that the first n_epsilon steps out of x will actually be able to generate a good image, while all the steps past n_epsilon just destroy the image.
This limit, with n_epsilon being 2k, might have to do with the precision of the operations too, though.
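A quick way to sanity-check this intuition: a minimal sketch (assuming the scaled linear beta schedule used in the improved-diffusion repo, with T=4000) that prints how fast the signal coefficient sqrt(ᾱ_t) decays toward zero:

```python
import numpy as np

# Scaled linear beta schedule (the improved-diffusion repo scales the
# original DDPM range 1e-4..2e-2 by 1000/T), here with T = 4000.
T = 4000
scale = 1000 / T
betas = np.linspace(scale * 1e-4, scale * 2e-2, T)
alpha_bar = np.cumprod(1.0 - betas)

# sqrt(alpha_bar_t) is the coefficient of the original image x_0 in x_t;
# once it is ~0, x_t is numerically indistinguishable from pure noise.
for t in [999, 1999, 2999, 3999]:
    print(f"t={t + 1}: sqrt(alpha_bar) = {np.sqrt(alpha_bar[t]):.4f}")
```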
The side-by-side really does help give me an understanding of the formulas
Thanks for showing the code and paper side by side. Really helpful!
Your walkthroughs are perfect, please keep up the good work ❤
Loved the code and paper side-by-side explanation! Kudos to you! Please keep the code-and-paper explanations in all videos if you can!
Thanks for this very useful video full of clear explanations about diffusion models and the bridge between paper formulas and code!
Food for thought: I think it'd be cooler and more informative to build the simplest diffusion model from scratch, using PyTorch/TensorFlow/JAX and other packages, of course
100%!
Or even a series, where we start from the simplest possible diffusion-based model and improve it over time in consecutive videos, implementing the latest discoveries from the most recent papers. This would be incredible
+1
Finally got around to watching this. I quite enjoyed the video.
Glad to hear that!!
Thank you very much for this video! Really, really great explanation (although not easy going) of the improved diffusion models, and a perfect preparation for your Stable Diffusion video!
I'm fairly familiar with the DDPM code but I still learned a lot, thanks for the nice video!
Amazing! Keep up the good work. It is very interesting!
Great video. Hope to see a video explaining the code of the "Diffusion Models Beat GANs" paper.
Amazing, Aleksa :) We cannot wait for GLIDE and DALL-E 2 :)
GLIDE is already uploaded! 😀 Check it out!
Your videos are amazing. I especially like this simultaneous covering of both the paper and the code. Keep it up! However, maybe you can still make some short (lighter) videos for beginners.
Amazing video, really helpful!
I love this series
🥳🥳🥳
Excellent video, thank you very much!
Love your tutorials, Aleksa!
Also wanted to know: have you covered DDIMs in any tutorial?
Hey, I have a question about the research paper: why are they using the integral at the beginning of the background section? Thanks in advance. 3:39
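My guess, in case it helps: that integral is just the marginalization over the latent variables that defines the model's likelihood of the data,

```latex
p_\theta(x_0) = \int p_\theta(x_{0:T}) \, dx_{1:T}
```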
Excellent video, it is very helpful ❤
It was quite a nice video! Well done, sir!
Thanks for this video, it's really helpful. Could you please cover the DDIM paper too? It's super helpful to have the code and equations side-by-side.
Hi, could you kindly share the repo, please? I can't find it on your GitHub. Thanks.
Hi, I am always confused about the forward process equation defined in (2). We say that our images x come from an unknown distribution q(.), but in equation (2) we are saying that this distribution is normal? We are sampling from a normal distribution to get the next forward step. Sorry, I am not that good when it comes to probability theory.
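(If it helps anyone: only the transition q(x_t | x_{t-1}) is defined to be Gaussian; the data distribution q(x_0) itself stays unknown. A minimal sketch of one forward step via the reparameterization trick, with illustrative names:)

```python
import torch

def forward_step(x_prev: torch.Tensor, beta_t: float) -> torch.Tensor:
    """Sample x_t from q(x_t | x_{t-1}) = N(sqrt(1 - beta_t) * x_{t-1}, beta_t * I).

    Only this conditional is Gaussian by construction; the marginal
    distribution of the data x_0 remains unknown.
    """
    eps = torch.randn_like(x_prev)  # eps ~ N(0, I)
    return (1.0 - beta_t) ** 0.5 * x_prev + beta_t ** 0.5 * eps
```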
Hey nice vid! Do you have any idea why they zero the weights of some of the convolutional layers?
Wondering the same thing
Super valuable video! Many thanks. Can you post a link to your GitHub repo for Windows?
Hi. Great explanation. Also, can you do a video explaining score-based generative models, i.e., the score-based SDE paper and code?
The explanation was great. It would be super if you went back to making these types of videos.
Great video, Aleksa. I am new to torch; I read that PyTorch's rand_like samples from a uniform distribution, not a Gaussian. How does that work, since we need samples from a standard Gaussian?
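(In case this helps: torch.rand_like does sample uniformly on [0, 1), but the diffusion code uses torch.randn_like, which samples a standard Gaussian. A quick check:)

```python
import torch

x = torch.empty(4, 3, 64, 64)
u = torch.rand_like(x)   # uniform on [0, 1); NOT what the forward process needs
g = torch.randn_like(x)  # standard Gaussian N(0, I); used for the noise eps
print(u.min().item() >= 0.0, u.max().item() < 1.0)   # True True
print(round(g.mean().item(), 2), round(g.std().item(), 2))  # roughly 0.0 and 1.0
```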
Thanks for the video, is there a good toy project that uses diffusion models that you would recommend?
Hm, toy project - not that I am aware of. I mean, if you treat the model as a black box, everything is a toy project.
GLIDE, DALL-E mini, etc. Although I think you can't run DALL-E mini on a single machine, I might be wrong. Stay tuned! ;)
Perfect explanation. I would really appreciate it if you could share the code that runs on a single GPU. I am having trouble running the code in distributed mode.
Your video is wonderful
Hey Aleksa, I have a question. When you come across a topic such as text-to-image generation or just diffusion models, how do you find fundamental papers/articles/reading materials to gain in-depth knowledge on them? And how do you plan and follow through on your learning process?
I'm big on self-learning but often lack the planning to follow through. I'm inspired by your journey and seek some guidance. Thanks in advance!
Hey Arjun! Check out my Medium blogs. I literally have my process captured there. :)) Maybe start with the "How I landed a job at DeepMind" blog.
Gigachad!
Lol! Such a wordchad thing to say!
Very nice video! Just a question: how can I apply denoising to a noisy image? It seems to me that this paper can only generate a new image from the learned data distribution, right? Maybe I missed some steps...
Hey! I am working on the same problem. It would be great if @Aleksa could make a video on that.
I think the paper "Image Super-Resolution via Iterative Refinement", a follow-up to the original DDPM, has the solution, although it focuses on super-resolution. To my understanding, in the original DDPM you are trying to minimize the MSE loss between the noise added in the forward process at time t and the noise predicted by the network. So the noise predicted by the network is only a function of the noisy input at step t and of t itself. In denoising/super-resolution, I would assume there should also be some way of feeding the image to be denoised to the network as input during training. So in this case, the network would take in the noisy (to-be-denoised) input, the noisy input from the forward diffusion process, and the time step. But I am not entirely sure. Would you like to connect through Discord to discuss this, in case you are still working on it?
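For what it's worth, a minimal sketch of the conditioning idea described above (channel-wise concatenation as in SR3; all names here are illustrative, not from the actual codebase):

```python
import torch
import torch.nn as nn

class ConditionalDenoiser(nn.Module):
    """Illustrative wrapper: condition the denoiser on the degraded
    image y by concatenating it channel-wise with the noisy sample x_t."""

    def __init__(self, unet: nn.Module):
        super().__init__()
        # `unet` is assumed to accept 6 input channels (x_t and y stacked)
        # plus the timestep t, mirroring the SR3 setup.
        self.unet = unet

    def forward(self, x_t, y, t):
        # x_t: sample from the forward process at step t, shape (B, 3, H, W)
        # y:   the degraded (noisy / low-res) image to condition on, same shape
        return self.unet(torch.cat([x_t, y], dim=1), t)
```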
I am curious: is the problem seen at 1:15:05 addressed? It's quite a big error, tbh. I am curious whether they actually used this code with the error to train, because that would mean the theory behind how it works is shaky.
There is no error in the code
The parentheses are just before the 1/ᾱ_t term; it's all good.
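For reference, my best reconstruction of the identity being discussed: the two ways of writing the x_0 prediction agree, since

```latex
x_0 = \sqrt{\tfrac{1}{\bar{\alpha}_t}}\, x_t - \sqrt{\tfrac{1}{\bar{\alpha}_t} - 1}\,\epsilon
    = \frac{x_t - \sqrt{1 - \bar{\alpha}_t}\,\epsilon}{\sqrt{\bar{\alpha}_t}},
\qquad \text{because} \quad
\sqrt{\tfrac{1}{\bar{\alpha}_t} - 1} = \frac{\sqrt{1 - \bar{\alpha}_t}}{\sqrt{\bar{\alpha}_t}}.
```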
Do you know how outpainting/inpainting works?
Fantastic tutorial! It would be very helpful if you could share the code. Thanks.
Thank you
The variational lower bound part is not very clear, to be honest.
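For anyone stuck on the same part, the standard decomposition from the DDPM paper (as I understand it) is:

```latex
L_{\mathrm{vlb}}
= \mathbb{E}_q\Big[
    \underbrace{D_{\mathrm{KL}}\big(q(x_T \mid x_0)\,\|\,p(x_T)\big)}_{L_T}
    + \sum_{t>1} \underbrace{D_{\mathrm{KL}}\big(q(x_{t-1} \mid x_t, x_0)\,\|\,p_\theta(x_{t-1} \mid x_t)\big)}_{L_{t-1}}
    \underbrace{-\log p_\theta(x_0 \mid x_1)}_{L_0}
  \Big]
```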
What maths are required to be a research scientist in computer vision? What are the best resources? And what is the best book for computer vision?
Multiple View Geometry in Computer Vision. It's fundamental and quite helpful for research in CV
@@sergeychirkunov7165 Can you look something up for me on YouTube? I searched for "geometry for computer vision"; which playlist should I watch: the "Multiple View Geometry in Computer Vision" playlist by Sean Mullery, or Cvprtum, or "3D Computer Vision" by CVRP Lab? Or any other recommendations?
@@sergeychirkunov7165 People mostly mention linear algebra, calculus, probability and statistics, and optimization, and some even talk about tensor algebra... Is this maths required too?
@@convolutionalnn2582 Yes, that's true. Basics of LA, probability, and optimization are sort of mandatory.
@@saurabhshrivastava224 What's the best resource for geometry for computer vision?
I think they initialize some of the layers with zero weights in order to speed up the training process
Any pointers/papers?
@@TheAIEpiphany Unfortunately I can't give any paper reference; during the AI course my prof explained some rules of thumb for weight initialization, and one of them is this technique implemented in the code.
I don't understand anything 😭