VQ-GAN | Paper Explanation
- Published May 30, 2024
- Vector Quantized Generative Adversarial Networks (VQGAN) is a generative model for image modeling, introduced in Taming Transformers for High-Resolution Image Synthesis. The approach is built in two stages. The first stage trains in an autoencoder-like fashion: images are encoded into a low-dimensional latent space and vector-quantized against a learned codebook, and a decoder then projects the quantized latent vectors back to the original image space. Encoder and decoder are fully convolutional. The second stage trains a transformer on the latent space; over the course of training it learns which codebook vectors go together and which do not. The transformer can then be used autoregressively to generate previously unseen images from the data distribution.
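The quantization step described above can be sketched in a few lines of PyTorch. This is a minimal illustration, not the repo's implementation: a nearest-neighbour codebook lookup with the straight-through gradient trick, with codebook size and latent dimension chosen arbitrarily.

```python
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    """Nearest-neighbour codebook lookup with a straight-through gradient."""
    def __init__(self, num_codes=512, dim=64):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)
        self.codebook.weight.data.uniform_(-1.0 / num_codes, 1.0 / num_codes)

    def forward(self, z):                 # z: (B, H*W, dim) encoder output
        # L2 distance from every latent vector to every codebook entry
        d = torch.cdist(z, self.codebook.weight)   # (B, H*W, num_codes)
        idx = d.argmin(dim=-1)                     # nearest code per position
        z_q = self.codebook(idx)                   # quantized latents
        # Straight-through estimator: copy decoder gradients to the encoder
        z_q = z + (z_q - z).detach()
        return z_q, idx

vq = VectorQuantizer()
z = torch.randn(2, 16, 64)                         # fake encoder output
z_q, idx = vq(z)
print(z_q.shape, idx.shape)  # torch.Size([2, 16, 64]) torch.Size([2, 16])
```

The returned `idx` grid is exactly what the second-stage transformer is trained on.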
#deeplearning #gan #generative #vqgan
0:00 Introduction
0:42 Idea & Theory
9:20 Implementation Details
13:37 Outro
Further Reading:
• VAE: towardsdatascience.com/unders...
• VQVAE: arxiv.org/pdf/1711.00937.pdf
• Why CNNS are invariant to sizes: www.quora.com/How-are-variabl...
• NonLocal NN: arxiv.org/pdf/1711.07971.pdf
• PatchGAN: arxiv.org/pdf/1611.07004.pdf
PyTorch Code: github.com/dome272/VQGAN
Follow me on instagram lol: / dome271
Really cool video! 😎Can't wait for the next one.
omg u here??? i know u from your videos. thats so cool!
@@NoahElRhandour Haha, I can only reply with: omg, u recognize me??? That is so cool!
Yes, I am here. I have to keep a close eye on the competition! 😆
@@AICoffeeBreak i see :D
What an amazing video. Please keep up the great work! :)
By far the best video on VQVAE. Great job, outlier!
Excellent visualization for this smooth transition from VQVAE -> VQGAN (focus on main idea first and details second). 10/10
Incredible video! Can't tell you how much clearer everything is now. Looking forward to the future of your channel!
That's so nice to hear, and motivating. The next video, about cross-attention, is already in the making!
This is such a great channel!!!! Why didn't I find it earlier? Thanks a lot for the great work...
Your videos are great! Super clearly explained :) Thanks!!
that made some things click in my understanding! thanks a lot
awesome!! More of this please.
after three days of struggling with the paper, I found this amazing explanation of VQ-GAN.
Nice explanation and visualizations!
Didactically, visually, and content-wise absolutely insane, big props
Brilliantly explained
Incredible videos
Great work !!!!
Thank you for this video, now I can be better
So excited for the next one!
Very cool video
Hey! Really great video:) I have one question. Imagine you want to use a diffusion model to learn image-to-image translation, more specifically, from segmentation masks to synthetic images. Then, you can have a tool to create images from hand-painted segmentation masks, and then, you can augment a dataset and see if state-of-the-art segmentation networks trained with the augmented dataset improve its performance. Do you know a diffusion model for this image-to-image translation task with some explanations and available repos?
Thank you so much for the explanation
Hopefully one can now go ahead with CLIP and create a free version of DALL-E-like text-to-image models
The strange pattern in the reconstructed image and the generated image is likely caused by the perceptual loss. I have no idea why, but it disappears when I take the perceptual loss away.
Those pictures that were generated with VQGAN are surprisingly coherent. How do you do that?
great video
Why make 2 loss functions with sg instead of optimizing ||E(x) -z_q||_2^2 directly?
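For context on the question above: the paper's objective splits that term into a codebook loss and a commitment loss using the stop-gradient operator (sg), so the codebook and the encoder can be pulled toward each other at different rates (β weights the encoder side); optimizing ||E(x) − z_q||² directly would move both with equal force. A minimal PyTorch sketch of the two terms, using `detach()` as sg (the β value is the one commonly used, not taken from this video):

```python
import torch

def vq_losses(z_e, z_q, beta=0.25):
    """Codebook + commitment loss from VQ-VAE, with detach() as stop-gradient."""
    codebook_loss = ((z_e.detach() - z_q) ** 2).mean()  # moves codebook toward encoder
    commit_loss = ((z_e - z_q.detach()) ** 2).mean()    # moves encoder toward codebook
    return codebook_loss + beta * commit_loss

z_e = torch.randn(4, 16, requires_grad=True)  # stand-in encoder output E(x)
z_q = torch.randn(4, 16, requires_grad=True)  # stand-in quantized latents
loss = vq_losses(z_e, z_q)
loss.backward()
print(z_e.grad is not None, z_q.grad is not None)  # True True
```

Both tensors receive gradients, but each only through the term where it is not detached.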
Crisp explanation! I would request you to talk a little bit slower; it would be really helpful. Keep up the good work.
cool!
I can't find the VQGAN paper!
I love you, math man ❤️
you are truly an outlier!
How do we decide on what goes to the codebook? Is it filled with random vectors?
It seems to be the case, and they converge over the course of training
Hmm, isn't trying to train the whole network (encoder and decoder) using the discriminator just too complicated? Wouldn't it result in a loss function so complex that minimizing it with gradient descent would be inefficient? I mean, wouldn't it take longer to train?
Hence the following idea: why not use separate discriminators to train the decoder and the encoder separately? Yes, it would be quite a lot more complicated to design, but I guess it's worth giving it a shot 😀
If someone knows whether something like this has already been done (cuz I have a feeling it probably has), may they enlighten me, thanks
Hey, great video. Can you tell me why random sampling of codebook vectors doesn't generate meaningful images? In a VAE we sample from a standard Gaussian; why doesn't the same work for VQ autoencoders?
Because in a VAE you only predict a mean and a standard deviation, so sampling is easier. Sampling the codebook vectors independently ignores the dependencies between positions, and this is why the output isn't meaningful.
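To illustrate the reply above: VQGAN's second stage fixes this by sampling code indices autoregressively, each conditioned on the previous ones, instead of independently. A toy sketch of that sampling loop; the conditional table here is random noise standing in for a trained transformer, so it only shows the mechanics, not real dependencies.

```python
import torch

num_codes, seq_len = 8, 16
# Toy "prior": logits over the next code given the previous one.
# In VQGAN this conditional is produced by a trained transformer.
cond_logits = torch.randn(num_codes, num_codes)

indices = [torch.randint(num_codes, (1,)).item()]   # random first code
for _ in range(seq_len - 1):
    probs = torch.softmax(cond_logits[indices[-1]], dim=-1)
    indices.append(torch.multinomial(probs, 1).item())

# Independent sampling, by contrast, ignores all such dependencies:
independent = torch.randint(num_codes, (seq_len,)).tolist()
print(len(indices), len(independent))  # 16 16
```

The decoded image is only coherent when the index sequence respects the dependencies the prior has learned.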
This is insanely good
Thanks boy :)
Please speak louder in the video; your voice is low. :)