*Github Code* - github.com/explainingai-code/DDPM-Pytorch
*DDPM Math Explanation Video* - ua-cam.com/video/H45lF4sUgiE/v-deo.html
Thank you! It was amazing. While there is limited content available for diffusion models, you did a really nice job. ❤
Thank you for your kind words :)
I am very thankful for your nice video; it's the best explanation of the diffusion model I have seen!
Thank you so much for your encouraging words!
Nicely explained! Keep the good work going! 😁
This is incredible
Hi, amazing explanation! Thanks for all the effort you put into making the video.
Can you please share the details of the UNet model that you've used (maybe a link to a paper/blog)? Thank you!
Thank you for the appreciation! For the UNet model, I just mimicked the architecture from the huggingface Unet2DModel class in the diffusers library (huggingface.co/docs/diffusers/en/api/models/unet2d) with minor changes (at what point concatenation and upsampling happen in the upblock). The diffusers Unet2DModel class (which itself is based on the UNet paper, arxiv.org/abs/1505.04597) and this comment thread (ua-cam.com/video/vu6eKteJWew/v-deo.html&lc=UgzBFfe4anyDf4txEZx4AaABAg) should give you all the necessary information regarding the UNet model. Do let me know if that ends up not being the case.
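For anyone who wants a quick reference, here is a minimal sketch of instantiating that diffusers Unet2DModel; the channel counts and block types below are illustrative assumptions, not the exact configuration used in the video's code.

import torch
from diffusers import UNet2DModel

# Illustrative configuration only -- swap in the channels/blocks you actually need.
model = UNet2DModel(
    sample_size=28,                      # spatial size of the square input images
    in_channels=1,                       # e.g. greyscale MNIST
    out_channels=1,                      # predicted noise has the same shape as the input
    layers_per_block=2,
    block_out_channels=(32, 64, 128, 256),
    down_block_types=("DownBlock2D", "DownBlock2D", "DownBlock2D", "AttnDownBlock2D"),
    up_block_types=("AttnUpBlock2D", "UpBlock2D", "UpBlock2D", "UpBlock2D"),
)

x_t = torch.randn(4, 1, 28, 28)          # batch of noisy images
t = torch.randint(0, 1000, (4,))         # diffusion timesteps
noise_pred = model(x_t, t).sample        # predicted noise, same shape as x_t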
Great video, thank you.
When passing the transposed feature maps into the ConvBlocks, do you deliberately skip adding positional encodings to them (like ViT does, for example), or did it just turn out that way?
I didn’t intentionally skip position embeddings. The reason they are not included is that this code mimics the official implementation provided by the authors of Latent Diffusion, where positional embeddings are skipped.
I’m not entirely sure why the authors made that choice, but I’ve discussed this in a bit more detail in the issue linked below.
If you're interested, then do check it out:
github.com/explainingai-code/DDPM-Pytorch/issues/4
Amazing.
How does self attention work in convnets (instead of transformers)? 😊
Very well explained. What changes would we need to make if we used our own dataset? Specifically greyscale images.
Thank you. Have replied on github regarding this.
Yeah, it was me @@Explaining-AI
Thank you so much, sir.
Glad you found it helpful
Thank you so much!
Hi there, thanks for the video. May I ask a question? To my understanding, multi-headed attention first applies 3 feed-forward networks for key, query, and value. In this model you applied multi-headed attention on the image where the channels act as the sequence length and the flattened image acts as the token_length. That should mean the query network, for example, would be a Linear(token_length/4, token_length/4), which means its parameter count would be (token_length*token_length)/16 = ((h*w)**2)/16, which is huge. Or am I wrong?
Thank you! @binyaminramati3010
So the channel dimension here is the embedding dimension and H*W is the sequence length.
If you notice, before attention we do a transpose; this is to make the channel dimension the embedding dimension.
Assume the feature map is 128x7x7 (CxHxW), and let's assume we only have one head.
That means we have a sequence of 49 tokens (feature map cells), each of 128 dimensions.
The Q/K/V weight matrices will each be 128x128.
The attention weights (QK^T) will be 49x49.
The weighted values will be 49x128.
So no huge computation is required as such, right? Or am I not understanding your question correctly?
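To make those shapes concrete, here is a minimal sketch of the transpose-then-attend pattern described above, using torch.nn.MultiheadAttention with a single head (an illustration, not the repo's exact code).

import torch
import torch.nn as nn

b, c, h, w = 2, 128, 7, 7                 # feature map: batch x channels x height x width
feat = torch.randn(b, c, h, w)

# Flatten the spatial dims and transpose so that C (128) becomes the embedding
# dimension and H*W (49) becomes the sequence length.
seq = feat.flatten(2).transpose(1, 2)     # (b, 49, 128)

attn = nn.MultiheadAttention(embed_dim=c, num_heads=1, batch_first=True)
out, attn_weights = attn(seq, seq, seq)   # self-attention over the 49 "pixel tokens"

print(out.shape)                          # torch.Size([2, 49, 128])  -> weighted values
print(attn_weights.shape)                 # torch.Size([2, 49, 49])   -> QK^T attention weights

# Reshape back to the original feature-map layout.
out = out.transpose(1, 2).reshape(b, c, h, w)

The Q, K, and V projections inside nn.MultiheadAttention are each 128x128 here, so the parameter count depends on the channel count, not on (H*W)^2.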
@@Explaining-AI Thank you, I missed the transpose. And again, applause for the impressive content 👏
Amazing explanation. But I have a question: I want to train on my custom RGB data with shape 128x128 or 256x256, but I always get out-of-memory errors, even though the model only has about 10M parameters. Can you help with that?
Moreover, I set the batch size in the config to 1, and I trained on a T4 GPU.
@@colder4163 It's most likely because of the image size. Can you try with 64x64? I have responded on what changes need to be made for this here: github.com/explainingai-code/DDPM-Pytorch/issues/1#issuecomment-2236651773
@@Explaining-AI Oh I see, thank you so much.
@@Explaining-AI If I want to restore blurred images, like motion blur or exposure blur, what should I do? Could you give me some advice?
Thanks for the very informative video! I am having trouble using my own dataset in this. I'm doing this on a MacBook in Google Colab. Currently, I have mounted my Drive to the Colab notebook and pulled in my dataset from my Drive through the default.yaml. However, I am getting an error saying that num_samples should be positive, and not 0. I am not sure what you mean by "Put the image files in a folder created within the repo root (example: data/images/*.png ).". What is this repo root and where can I find it? Is it local on my computer? Could you help with this? Thank you in advance!
You are welcome! The path in the config can either be a relative path from the "DDPM-Pytorch" directory (the repo root) or an absolute path. Currently the config assumes that inside the DDPM-Pytorch directory there is a data/images folder which holds all the image files.
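A quick way to sanity-check that the path resolves the way you expect (and that the "num_samples should be positive" error is just an empty or wrongly-pointed folder) is to count the files yourself. The 'data/images' path below is just an example; use whatever path you put in default.yaml.

import glob
import os

im_path = 'data/images'                  # the path from the config, relative to the DDPM-Pytorch directory
print(os.path.abspath(im_path))          # confirm this points where you think it does
files = glob.glob(os.path.join(im_path, '*.png'))
print(len(files))                        # if this prints 0, the dataset is empty and num_samples will be 0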
Unbelievable!
Thank you so much for the video. It was amazing, and your video explained many things that I couldn't understand anywhere else. Though I have a question regarding the up channels. You have given down channels as [32, 64, 128, 256]. As per your code, the channels for the first upsample will be (256, 64), but after concatenating from the last down layer, the number of channels for the first convolution of the resnet layer should be 128 + 256 = 384, whereas as per your code it is 256. The same thing will happen for each upblock: in the second case 128 + 64 should be the in channels, but as per your code it is 128, and the third upsample layer should have in channels 64 + 32 = 96, but as per your code it is 64. I think there is a little miscalculation.
Hello, according to the code, the first down-layer feature to be concatenated is not from the last down layer but from the second-last down layer. It's a bit easier to explain with a diagram, so can you take a look at the text below representing what's happening and let me know if you still have any issues.
Downblocks                              Upblocks
 32 --------------------------------->  64 -> 16
  | down                                 | upsample (& concat)
 64 ------------------------> 128 -> 32
  | down                        | upsample (& concat)
128 --------------> 256 -> 64
  | down              | upsample (& concat)
256 ----- 256 ----- 128
@Explaining-AI Sorry, my mistake. I got it. You are saving the feature tensors before passing them through the down block, hence the math works out if we consider that. But don't we normally concatenate the feature tensor obtained after passing through the downblock? In my brief experience with UNets I have usually seen that. That's why I thought there was a mistake.
@@takihasan8310 Yes, you are right. That way is indeed closer to the "official UNet" implementation. After spending a limited amount of time on this, I found this way enabled me to write simpler code, so I went with it. And as long as the network has layers of downsampling followed by layers of upsampling, together with concatenation of downblock feature maps, I would say it still qualifies as a UNet per se. But yes, definitely not the official paper's UNet implementation.
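For readers following the thread, here is a minimal sketch of the pattern being discussed: the feature map is saved before each downblock, and each upblock concatenates its upsampled input with the corresponding saved feature. The conv_block helper and the pooling/upsampling layers are hypothetical stand-ins, not the repo's actual classes.

import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # Hypothetical stand-in for the repo's resnet/attention blocks.
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.SiLU())

down_channels = [32, 64, 128, 256]
downs = nn.ModuleList([conv_block(down_channels[i], down_channels[i + 1]) for i in range(3)])
downsample = nn.AvgPool2d(2)
mid = conv_block(256, 128)
ups = nn.ModuleList([conv_block(256, 64), conv_block(128, 32), conv_block(64, 16)])
upsample = nn.Upsample(scale_factor=2)

x = torch.randn(1, 32, 32, 32)
saved = []
for down in downs:
    saved.append(x)                         # feature saved BEFORE the downblock
    x = downsample(down(x))
x = mid(x)                                  # 256 -> 128 channels
for up in ups:
    skip = saved.pop()                      # 128-, then 64-, then 32-channel skip
    x = upsample(x)
    x = up(torch.cat([x, skip], dim=1))     # in-channels: 128+128=256, 64+64=128, 32+32=64
print(x.shape)                              # torch.Size([1, 16, 32, 32])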
I am getting a CUDA out-of-memory error when using my own dataset. The dataset consists of .npy files.
Hello, if you have already tried reducing the batch size and are still getting this error, could you take a look at github.com/explainingai-code/DDPM-Pytorch/issues/1, specifically this comment: github.com/explainingai-code/DDPM-Pytorch/issues/1#issuecomment-1862244458, and see if that helps get rid of the out-of-memory error.
@Explaining-AI
Sorry to bother you, but I don't know why, whenever I am training on any dataset (I tried MNIST, CIFAR-10, etc.), the MSE loss is always NaN. Is this expected? I checked my transformation and it is correct: first transforms.ToTensor(), then transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5]). All the losses are NaN values; will the model learn anything meaningful?
Were you able to get rid of this issue? Is it possible for you to send me a link to your repo, in case you have changed any part of the code or the training parameters?
Hi Sir, I would like to request that you kindly make a change in the Stable Diffusion model repository regarding the size of the images, because this repository does not support large image sizes and requires very high GPU memory; for 256x256 images it requires almost 200 GB, which is very costly. Also, if possible, include a few evaluation metrics for quantitative analysis between the original and the generated images. Waiting for the next video!
Hi @muhammadawais2173, I will next start working on the Stable Diffusion video, but unfortunately it will take me a month to get it up with code and video. Sorry, but it's going to take that long given my other work. In case you are really blocked because of this, might I suggest using the Hugging Face diffusers library? They will anyway have a much more efficient implementation than mine :)
@@Explaining-AI Thank you so much. I will go through it. In fact, I already went through many diffusion model implementations, but you explained it very well and in the easiest way; also, your model gives satisfactory results compared to others.
Amazing.