VQ-GAN | Paper Explanation

  • Published May 30, 2024
  • Vector Quantized Generative Adversarial Networks (VQGAN) is a generative model for image modeling. It was introduced in Taming Transformers for High-Resolution Image Synthesis. The concept is built upon two stages. The first stage learns in an autoencoder-like fashion: images are encoded into a low-dimensional latent space, which is then vector-quantized using a codebook. Afterwards, the quantized latent vectors are projected back to the original image space by a decoder. Encoder and decoder are fully convolutional. The second stage learns a transformer over the latent space. Over the course of training it learns which codebook vectors go together and which do not. This can then be used in an autoregressive fashion to generate previously unseen images from the data distribution.
    #deeplearning #gan #generative #vqgan
    0:00 Introduction
    0:42 Idea & Theory
    9:20 Implementation Details
    13:37 Outro
    Further Reading:
    • VAE: towardsdatascience.com/unders...
    • VQVAE: arxiv.org/pdf/1711.00937.pdf
    • Why CNNs are invariant to input sizes: www.quora.com/How-are-variabl...
    • NonLocal NN: arxiv.org/pdf/1711.07971.pdf
    • PatchGAN: arxiv.org/pdf/1611.07004.pdf
    PyTorch Code: github.com/dome272/VQGAN
    Follow me on instagram lol: / dome271
  • Science & Technology
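
The core of stage one described above is the vector-quantization step. Below is a minimal NumPy sketch of that step; the names (`quantize`, `z_e`, `codebook`) are illustrative and are not the API of the linked repo:

```python
import numpy as np

def quantize(z_e, codebook):
    """Nearest-neighbour vector quantization (illustrative sketch).

    z_e:      (N, D) encoder outputs, one D-dim vector per spatial position
    codebook: (K, D) learned embedding vectors
    Returns the quantized vectors and the chosen codebook indices.
    """
    # Squared L2 distance from every encoder vector to every codebook entry
    d = ((z_e[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (N, K)
    idx = d.argmin(axis=1)  # index of the closest codebook entry
    z_q = codebook[idx]     # snap each encoder vector to that entry
    return z_q, idx
```

Stage two then trains the transformer autoregressively on the resulting index sequences, so that sampling indices (rather than raw latents) yields coherent images.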

COMMENTS • 40

  • @AICoffeeBreak
    @AICoffeeBreak 2 years ago +26

    Really cool video! 😎Can't wait for the next one.

    • @NoahElRhandour
      @NoahElRhandour 2 years ago +6

      omg u here??? i know u from your videos. thats so cool!

    • @AICoffeeBreak
      @AICoffeeBreak 2 years ago +8

      @@NoahElRhandour Haha, I can only reply with: omg, u recognize me??? That is so cool!
      Yes, I am here. I have to keep a close eye on the competition! 😆

    • @NoahElRhandour
      @NoahElRhandour 2 years ago +2

      @@AICoffeeBreak i see :D

  • @felixvgs.9840
    @felixvgs.9840 2 years ago +7

    What an amazing video. Please keep up the great work! :)

  • @reasoning9273
    @reasoning9273 5 months ago +1

    By far the best video on VQVAE. Great job, outlier!

  • @code4AI
    @code4AI 1 year ago

    Excellent visualization for this smooth transition from VQVAE -> VQGAN (focus on main idea first and details second). 10/10

  • @aratasaki
    @aratasaki 1 year ago +1

    Incredible video! Can't tell you how much clearer everything is now. Looking forward to the future of your channel!

    • @outliier
      @outliier  1 year ago +2

      That's so nice to hear, and very motivating. The next video, about cross-attention, is already in the making.

  • @devashishprasad1509
    @devashishprasad1509 1 year ago

    This is such a great channel!!!! Why didn't I find it earlier? Thanks a lot for the great work...

  • @smbonilla
    @smbonilla 10 months ago

    Your videos are great! Super clearly explained :) Thanks!!

  • @joanrodriguez6212
    @joanrodriguez6212 2 years ago +1

    that made some things click in my understanding! thanks a lot

  • @mchahhou
    @mchahhou 2 years ago

    awesome!! More of this please.

  • @rezarawassizadeh4601
    @rezarawassizadeh4601 1 year ago +5

    After three days of struggling with the paper, I found this amazing explanation of VQ-GAN.

  • @igorvaz6055
    @igorvaz6055 1 year ago

    Nice explanation and visualizations!

  • @NoahElRhandour
    @NoahElRhandour 2 years ago +2

    Didactically, visually, and content-wise absolutely insane, big props

  • @filipequincas1485
    @filipequincas1485 1 year ago

    Brilliantly explained

  • @alexandterfst6532
    @alexandterfst6532 1 year ago +2

    Incredible videos

  • @baothaiba7099
    @baothaiba7099 1 year ago +1

    Great work !!!!

  • @tiln8455
    @tiln8455 2 years ago +2

    Thank you for this video, now I can be better

  • @melisakilic726
    @melisakilic726 2 years ago

    So excited for the next one!

  • @Paul-wk7rp
    @Paul-wk7rp 2 years ago +3

    Very cool video

  • @saulcanoortiz7902
    @saulcanoortiz7902 3 months ago

    Hey! Really great video :) I have one question. Imagine you want to use a diffusion model to learn image-to-image translation, more specifically from segmentation masks to synthetic images. Then you would have a tool to create images from hand-painted segmentation masks, augment a dataset, and see if state-of-the-art segmentation networks trained on the augmented dataset improve their performance. Do you know a diffusion model for this image-to-image translation task, with some explanations and available repos?

  • @prabhavkaula9697
    @prabhavkaula9697 1 year ago

    Thank you so much for the explanation
    Hopefully one can now go ahead with CLIP and create a free version of DALL-E-like text-to-image models

  • @JeavanCooper
    @JeavanCooper 17 days ago

    The strange pattern in the reconstructed and generated images is likely caused by the perceptual loss. I have no idea why, but it disappears when I take the perceptual loss away.

  • @DollyNipples
    @DollyNipples 1 year ago

    Those pictures that were generated with VQGAN are surprisingly coherent. How do you do that?

  • @AIwithAniket
    @AIwithAniket 1 year ago +1

    great video

  • @rikki146
    @rikki146 11 months ago

    Why make two loss functions with sg (stop-gradient) instead of optimizing ||E(x) - z_q||_2^2 directly?
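
For reference, the two terms this question refers to are the codebook and commitment losses from the linked VQ-VAE paper (sg = stop-gradient, β the commitment weight):

```latex
L_{\mathrm{VQ}} = \big\| \mathrm{sg}[E(x)] - z_q \big\|_2^2 + \beta \, \big\| E(x) - \mathrm{sg}[z_q] \big\|_2^2
```

Splitting the distance with sg lets the two terms update different parameters at different strengths: the first moves codebook entries toward the encoder outputs, while the second (scaled by β) keeps the encoder committed to its chosen entries; the unsplit distance would force both updates to share a single weight.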

  • @sourabhpatil9406
    @sourabhpatil9406 2 years ago +2

    Crisp explanation! I would request you to talk a little bit slower; it would be really helpful. Keep up the good work.

  • @maralzarvani8154
    @maralzarvani8154 1 year ago +1

    cool!

  • @TheAero
    @TheAero 8 months ago

    I can't find the VQGAN paper!

  • @yendar2806
    @yendar2806 2 years ago +3

    I love you, Mathemann ❤️

  • @raeeskhan9058
    @raeeskhan9058 1 year ago

    you are truly an outlier!

  • @MrArtod
    @MrArtod 1 year ago

    How do we decide on what goes to the codebook? Is it filled with random vectors?

    • @rikki146
      @rikki146 11 months ago +1

      It seems to be the case, and they converge over the course of training.
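
For what it's worth, a common initialization in VQ-VAE-style implementations looks like the NumPy sketch below; the uniform range is a convention from public implementations, not confirmed for the linked repo:

```python
import numpy as np

K, D = 512, 256                  # number of codebook entries, embedding dim
rng = np.random.default_rng(0)
# Small random start; during training, the codebook loss drags each entry
# toward the cluster of encoder outputs that selects it, so the entries
# converge over the course of training.
codebook = rng.uniform(-1.0 / K, 1.0 / K, size=(K, D))
```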

  • @yassinesafraoui
    @yassinesafraoui 1 year ago

    Hmm, isn't trying to train the whole network (encoder and decoder) using the discriminator just too complicated? Wouldn't it result in a loss function so complex that minimizing it with gradient descent would be inefficient? I mean, wouldn't it take longer to train?
    Hence the following idea: why not use separate discriminators to train the decoder and the encoder separately? Yes, it would be quite a lot more complicated to design, but I guess it's worth giving a shot 😀
    If someone knows whether something like this has already been done (because I have a feeling it probably has), may they enlighten me, thanks

  • @readbyname
    @readbyname 29 days ago

    Hey, great video. Can you tell me why randomly sampling codebook vectors doesn't generate meaningful images? In a VAE we sample from a standard Gaussian; why doesn't the same work for VQ autoencoders?

    • @outliier
      @outliier  29 days ago +1

      Because in a VAE you only predict a mean and a standard deviation, so sampling from that distribution is easier. Codebook vectors, in contrast, are sampled independently at each position, and that is why the output isn't meaningful.

  • @user-mh8pl5wd1s
    @user-mh8pl5wd1s 1 year ago

    Insanely awesome

  • @idealintelligence7009
    @idealintelligence7009 1 year ago +2

    Thanks boy :)
    Please speak louder in the video; your voice is low. :)