Lesson 9: Deep Learning Foundations to Stable Diffusion

  • Published Jun 8, 2024
  • (All lesson resources are available at course.fast.ai.) This is the first lesson of part 2 of Practical Deep Learning for Coders. It starts with a tutorial on how to use pipelines in the Diffusers library to generate images. Diffusers is (in our opinion!) the best library available at the moment for image generation. It is feature-rich and very flexible. We explain how to use these features, and discuss options for accessing the GPU resources needed to use the library.
    We talk about some of the nifty tweaks available when using Stable Diffusion in Diffusers, and show how to use them: guidance scale (for varying the amount the prompt is used), negative prompts (for removing concepts from an image), image initialisation (for starting with an existing image), textual inversion (for adding your own concepts to generated images), and Dreambooth (an alternative approach to textual inversion).
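    For reference, here is a minimal sketch of this kind of Diffusers pipeline call, covering the guidance scale and negative prompt tweaks (the checkpoint name and prompts are illustrative, not necessarily the ones used in the lesson):

      import torch
      from diffusers import StableDiffusionPipeline

      # Load a Stable Diffusion checkpoint (any SD checkpoint works here)
      pipe = StableDiffusionPipeline.from_pretrained(
          "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
      ).to("cuda")

      image = pipe(
          "an astronaut riding a horse",          # the text prompt
          guidance_scale=7.5,                     # how strongly to follow the prompt
          negative_prompt="blurry, low quality",  # concepts to steer away from
      ).images[0]
      image.save("out.png")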
    The second half of the lesson covers the key concepts involved in Stable Diffusion (each maps onto a component of the pipeline, as sketched after this list):
    - CLIP embeddings
    - The VAE (variational autoencoder)
    - Predicting noise with the unet
    - Removing noise with schedulers.
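    In Diffusers, these pieces correspond directly to the components of the pipeline object shown above:

      pipe.tokenizer     # splits the prompt into tokens for the text encoder
      pipe.text_encoder  # CLIP text model: turns the tokens into embeddings
      pipe.vae           # variational autoencoder: pixels <-> compressed latents
      pipe.unet          # U-Net: predicts the noise present in a noisy latent
      pipe.scheduler     # uses the noise prediction to remove noise step by step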
    You can discuss this lesson, and access links to all notebooks and resources from it, at this forum topic: forums.fast.ai/t/lesson-9-par...
    0:00 - Introduction
    6:38 - This course vs DALL-E 2
    10:38 - How to take full advantage of this course
    12:14 - Cloud computing options
    14:58 - Getting started (Github, notebooks to play with, resources)
    20:48 - Diffusion notebook from Hugging Face
    26:59 - How Stable Diffusion works
    30:06 - Diffusion notebook (guidance scale, negative prompts, init image, textual inversion, Dreambooth)
    45:00 - Stable Diffusion explained
    53:04 - Math notation correction
    1:14:37 - Creating a neural network to predict noise in an image
    1:27:46 - Working with images and compressing the data with autoencoders
    1:40:12 - Explaining latents that will be input into the unet
    1:43:54 - Adding text as one-hot encoded input to the noise and drawing (aka guidance)
    1:47:06 - How to represent numbers vs text embeddings in our model with CLIP encoders
    1:53:13 - CLIP encoder loss function
    2:00:55 - Caveat regarding "time steps"
    2:07:04 - Why don’t we do this all in one step?
    Thanks to fmussari for the transcript, and to Raymond-Wu (on forums.fast.ai) for the timestamps.

COMMENTS • 63

  • @TTTrouble
    @TTTrouble 1 year ago +43

    Wow this is such a treasure to have freely available and I am so thankful that you put this out for the community. Many many thanks good sir, your work towards educating the masses about AI and Machine Learning is so very much appreciated. 🎉❤

  • @rajahx
    @rajahx 1 year ago +9

    This is beautifully explained Jeremy! From real basics to some of the most complicated state of the art models we have today. Bravo.

  • @gilbertobatres-estrada5119
    @gilbertobatres-estrada5119 1 year ago +3

    I am so glad you took the time to correct the math mistake! Great work! And thank you for your mission of teaching us new findings in AI and deep learning 🙏

  • @numannebuni
    @numannebuni 1 year ago +9

    I absolutely love the style in which this is explained. Thank you very much!

  • @MuhammadJaalouk
    @MuhammadJaalouk 1 year ago

    Thank you so much for this insightful video. The lecture breaks down complex ideas into segments that are very easy to comprehend.

  • @ItzGanked
    @ItzGanked 1 year ago +1

    I thought I was going to have to wait until next year, thank you for making this content accessible

  • @AIBites
    @AIBites 4 months ago +2

    This is a nicely thought-through course. Amazing Jeremy! :)

  • @chyldstudios
    @chyldstudios 1 year ago +21

    Wonderful, I was waiting for this series of videos. Bravo!

  • @akheel_khan
    @akheel_khan 8 months ago

    Undoubtedly an accessible and insightful guide

  • @kartikpodugu
    @kartikpodugu 8 months ago +1

    🙏🙏🙏
    Amazing information.
    I knew bits and pieces, now I know the entire picture.

  • @johngrabner
    @johngrabner 1 year ago

    Very informative video. Thank you for taking the time to produce it.

  • @yufengchen4944
    @yufengchen4944 1 year ago +3

    Great! I can only see the 2019 version of Part 2; looking forward to seeing the new Part 2 course available!

  • @markm4642
    @markm4642 1 year ago

    Liberating the world with this quality of education

  • @asheeshmathur
    @asheeshmathur 9 months ago +2

    Outstanding, the best description so far. God Bless Jeremy. Excellent service to curious souls.

  • @sushilkhadka8069
    @sushilkhadka8069 8 months ago

    Excellent intuition. You're doing a huge service to humanity.

  • @ricardocalleja
    @ricardocalleja 1 year ago +5

    Awesome material! Thank you very much for sharing

  • @SadAnecdote
    @SadAnecdote 1 year ago

    Thanks for the early release

  • @atNguyen-gt6nd
    @atNguyen-gt6nd 1 year ago

    Thank you so much for your lectures.

  • @cybermollusk
    @cybermollusk 1 year ago +6

    You might want to put this series into a playlist. I see you have playlists for all your other courses.

  • @kirak
    @kirak 1 year ago

    Wow this helped me a lot. Thank you!

  • @tinkeringengr
    @tinkeringengr 1 year ago

    Thanks -- great lecture!

  • @rashulsel
    @rashulsel 1 year ago +2

    Amazing video and really easy to follow along with the topics. It's neat how different research is coming together to build something more efficient and promising. So is the future of AI how models fit together?

  • @SubhadityaMukherjee
    @SubhadityaMukherjee 1 year ago

    YAY it's hereeee. My excitement!!

  • @user-ny9zc5nw7s
    @user-ny9zc5nw7s 1 year ago

    Thank you for this lecture

  • @rubensmau
    @rubensmau 1 year ago

    Thanks, very clear.

  • @ramansarabha871
    @ramansarabha871 1 year ago

    Thanks a ton! Have been waiting.

  • @ayashiyumi
    @ayashiyumi 10 months ago

    Great video. Keep posting more things like this.

  • @super-eth8478
    @super-eth8478 1 year ago +1

    THANKS 🙏🏻🙏🏻

  • @edmondj.
    @edmondj. 1 year ago

    I love you, it's so clear as usual. I owed you embeddings, now I owe you diffusion too.

    • @edmondj.
      @edmondj. 1 year ago

      Please open a Tipeee

  • @Beyondarmonia
    @Beyondarmonia 1 year ago

    Thank you 🙏

  • @mariuswuyo8742
    @mariuswuyo8742 1 year ago

    An excellent course. I would like to ask a question: at 1:21:51, is the noise N(0, 0.1) added equally to each pixel, or to the whole image? Are these two equivalent?

    • @user-wf3bp5zu3u
      @user-wf3bp5zu3u 1 year ago

      Different per pixel! You're drawing a vector of random noise samples and then reshaping it into an image, so you get many values, all from a distribution with low variance. The Python random-number libraries let you sample in the shape of an image directly, so you don't need to reshape manually. But that's just for convenience.
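      A minimal numpy illustration of the two equivalent ways (assuming a 28×28 image and treating 0.1 as the standard deviation of N(0, 0.1)):

        import numpy as np

        rng = np.random.default_rng(0)
        # Draw a flat vector of per-pixel noise, then reshape it into an image...
        flat = rng.normal(loc=0.0, scale=0.1, size=28 * 28).reshape(28, 28)
        # ...or draw in the image's shape directly; both give i.i.d. noise per pixel.
        img = rng.normal(loc=0.0, scale=0.1, size=(28, 28))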

  • @ghpkishore
    @ghpkishore 1 year ago +2

    That math correction was essential for me. Coming from a mechanical background, I knew something was off, but then thought I didn't know enough about DL to figure out what it was, and that I was in the wrong. With the math correction, it clicked, and it was something I knew all along. Thanks.

  • @pranavkulkarni6489
    @pranavkulkarni6489 1 year ago +2

    Thank you for the great explanation. I just wanted to know the answer to 'what is a U-Net?' I could not understand where it is used in the whole process. What I could not get is: what is the difference between the VAE (autoencoder) and a U-Net?

    • @tildebyte
      @tildebyte 8 months ago

      During *training*, you pass an actual image into the VAE ENcoder (to reduce the amount of data you have to deal with), which then passes the latent it produces on to the UNet, which does the learning involving noising/denoising the latent. During *inference* ("generating"), the UNet (after a lot of other stuff happens :D) passes a denoised latent out to the VAE DEcoder, which then produces an actual image.
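      A rough sketch of that flow (the names here are hypothetical stand-ins, not the actual Diffusers API):

        # Training: image -> VAE encoder -> latent; the UNet learns to predict the added noise.
        latent = vae.encode(image)
        noisy = add_noise(latent, noise, t)          # noise the latent for timestep t
        loss = mse(unet(noisy, t, text_emb), noise)  # UNet is trained to predict that noise

        # Inference: start from pure random noise, denoise step by step, then decode.
        latent = random_noise_like(latent)
        for t in timesteps:
            latent = scheduler_step(unet(latent, t, text_emb), t, latent)
        image = vae.decode(latent)                   # VAE decoder turns the latent back into pixels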

  • @yufengchen4944
    @yufengchen4944 1 year ago +1

    Looks like the Part 2 2022 webpage is still not public, right? Or did I just not find it?

  • @peregudovoleg
    @peregudovoleg 2 months ago

    At 1:13:20, aren't we supposed to add the derivatives to the pixel values, since we are maximizing P? Unless, since P is binary and it looks like a classification problem, we are going to get negative logits, in which case subtracting seems OK (not touching the sign). Great course!

  • @homataha5626
    @homataha5626 1 year ago

    Can I ask for a video on how these models are used for colorization?

  • @edwardhiscoke471
    @edwardhiscoke471 1 year ago

    Already out, wow. Then I'd better push on with part 1!

  • @mohdil123
    @mohdil123 1 year ago

    Awesome

  • @sushilkhadka8069
    @sushilkhadka8069 8 months ago

    At 1:56:50 I'm having a hard time understanding the cost function. I think we need to maximise (green summation − red summation); for that reason we can't call it a cost function, because cost functions are usually minimised. Please correct me if I'm wrong.
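    For what it's worth, one way to reconcile this: maximising a quantity is the same as minimising its negation, so it can still be framed as a cost to minimise (hypothetical names taken from the lecture's diagram):

      loss = red_sum - green_sum  # minimising this == maximising (green_sum - red_sum)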

  • @sotasearcher
    @sotasearcher 4 months ago

    28:36 - I'm here in February '24, where they are good enough to do it in 1 go with SDXL-Turbo / ADD (Nov '23) :)

  • @andrewimanuel2838
    @andrewimanuel2838 1 year ago

    Where can I find the latest recommended cloud computing resources?

  • @tildebyte
    @tildebyte 8 months ago

    I've been working on/with diffusion models (and before that, VQGANs!) for years now, so I'm pretty familiar (from the end-user/theoretical POV, not so much the math/code side, heh) with samplers/schedulers - this is the first time I've conceived of them as optimizers, and that seems like a *really* fertile area to research. Have you (or anyone else, for that matter) made any progress in this direction? It's (not too surprisingly) VERY hard to prompt today's search engines to find anything to do with "denoise|diffusion|schedule|sample|optimize" and NOT come up with dozens of either HuggingFace docs pages, or pages w.r.t. Stable Diffusion ROFL

  • @pankajsinghrawat1056
    @pankajsinghrawat1056 1 month ago

    Since we want to increase the probability of our image being a digit, we should "add" and not "subtract" the grad of the probability w.r.t. the image. Is this right? Or am I missing something?
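    The two sign conventions agree once the objective's sign is flipped; a hypothetical one-step update:

      x = x + lr * grad_P     # gradient ascent on P: maximise the probability directly
      x = x - lr * grad_loss  # gradient descent with loss = -P (or -log P): the same step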

  • @susdoge3767
    @susdoge3767 3 months ago

    gold

  • @mikhaeldito
    @mikhaeldito 1 year ago +1

    Released already??

  • @jonatan01i
    @jonatan01i 9 months ago

    The good thing is that with git we can go back to the state of the code as of (11/20).10.2022.

  • @gustavojuantorena
    @gustavojuantorena 1 year ago

    👏👏👏

  • @kawalier1
    @kawalier1 6 months ago

    Jeremy, Adam has eps; SGD has momentum.

  • @TiagoVello
    @TiagoVello 1 year ago

    UHUUUL IT'S OUT

  • @AndrewRafas
    @AndrewRafas 1 year ago +2

    There is a small (not that important) correction: when you talk about 16384 bytes of latents, it is 16384 numbers, which is in fact 65536 bytes.

    • @sambitmukherjee1713
      @sambitmukherjee1713 2 months ago

      Each number is 4 bytes because it's float32 precision?
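      Right, assuming float32 (which makes the parent comment's numbers consistent). The worked arithmetic, using the 4×64×64 latent shape for a 512×512 image:

        numbers = 4 * 64 * 64  # = 16384 numbers in the latent
        nbytes = numbers * 4   # float32 is 4 bytes each -> 65536 bytes (64 KiB)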

  • @sotasearcher
    @sotasearcher 4 months ago

    52:12 - The upside down triangle is "nabla", "del" is the squiggly d that goes before each partial derivative. Also, I'm jealous of people who started calculus in high school lol

    • @sotasearcher
      @sotasearcher 4 months ago

      Never mind! Getting to the next section, the correction is edited in lol

    • @sotasearcher
      @sotasearcher 4 months ago

      1:04:12 wait you still mixed them up 😅 At this rate with your following, you're going to speak it into existence though lol. Math notation ultimately is whatever everyone agrees upon, so I could see it being possible.

    • @sotasearcher
      @sotasearcher 4 months ago

      1:05:22 - While I'm being nit-picky - Right-side-up triangle is called "delta", and just means change, not necessarily small
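      For reference, the usual names for these symbols (standard usage, not from the lesson):

        \nabla f                  % "nabla", also commonly read "del": the gradient of f
        \partial f / \partial x   % "partial" (the squiggly d), also sometimes read "del"
        \Delta x                  % "delta": a change in x, not necessarily a small one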

  • @myfavor2827
    @myfavor2827 1 year ago

    I was wondering if I could give a suggestion: you spent 20 minutes explaining the course materials and introducing the different people. Why not start with the main fun part first, and introduce more about the course materials later? People will lose interest listening to 20 minutes about course materials.
