Stable Diffusion - How to build amazing images with AI

  • Published May 20, 2024
  • This video is about Stable Diffusion, the AI method to build amazing images from a prompt.
    If you like this material, check out LLM University from Cohere!
    llm.university
    Get the Grokking Machine Learning book!
    manning.com/books/grokking-ma...
    Discount code (40%): serranoyt
    (Use the discount code at checkout)
    0:00 Introduction
    1:27 How does Stable Diffusion work?
    2:55 Embeddings
    12:55 Diffusion Model
    15:00 Numerical Example
    17:39 Embedding Example
    19:37 Image Generator Example
    28:37 The Sigmoid Function
    34:39 Diffusion Model Example
    41:03 Summary
  • Science & Technology

COMMENTS • 42

  • @amirkidwai6451 4 months ago +2

    Arguably the greatest teacher alive

  • @thebigFIDDLES 5 months ago +5

    These videos are always incredibly helpful, informative, and understandable. Very grateful

  • @krajanna 3 months ago +1

    I am a fan of your work. I read your "Grokking Machine Learning". It's awesome. I am totally impressed. I stopped watching other AI videos and following you for most of the stuff. Simple and practical explanation. Thanks a lot and grateful for spreading the knowledge.

  • @NigusBasicEnglish 8 days ago

    You are the best explainer ever. You are amazing.

  • @enginnakus9550 5 months ago +2

    I respect your concise explanation.

  • @jasekraft430 3 months ago

    Always impressed with how understandable yet detailed your videos are. Thank you!

  • @wanggogo1979 5 months ago +2

    Amazing, I hope to truly understand the mechanism of Stable Diffusion through this video!

  • @anthonymalagutti3517 5 months ago +2

    excellent explanation - thank you so much

  • @kyn-ss4kc 5 months ago +1

    Amazing!! Thanks for this high level overview. It was really helpful and fun 👍

  • @MikeTon 2 months ago

    Really incredible job of stepping through the HELLO WORLD of image generation, especially how the video compresses the key output to a 4x4 pixel grid and clearly hand-computes each step of the way!

  • @avijitsen8096 4 months ago

    Superb, such an elegant explanation. Big thanks, Sir!

  • @reyhanehhashempour7157 5 months ago

    Amazing as always!

  • @AravindUkrd 4 months ago

    Thank you for such a wonderful visualization that conveys an overview of complex mathematical concepts.
    Can you please do a video detailing the underlying architecture of the neural network that forms the diffusion model?
    Also, are Generative Adversarial Networks (GANs) not used anymore for image generation?

  • @skytoin 4 months ago

    Great video, it gives good intuition for deep network architecture. Thanks!

  • @abhaymishra-uj6jp 2 months ago

    Really amazing work, easy to understand and grasp, and doing a great deal for the community. Thanks a lot.

  • @priyankavarma1054 5 months ago

    Thank you so much!!!

  • @olesik 5 months ago +2

    So can we just use the diffusion model to denoise low-quality or nighttime shots?

    • @SerranoAcademy 5 months ago +1

      Yes, absolutely: they can be used to denoise already existing images.
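
    As a rough illustration of the reply above, here is a minimal sketch (assuming a hypothetical trained noise-prediction model, not code from the video) of how a diffusion model can clean up an existing noisy photo: treat the photo as if it were an intermediate step of the diffusion process and run the usual reverse (denoising) updates from there.

    import torch

    # DDPM-style noise schedule (the exact values are illustrative)
    T = 1000
    betas = torch.linspace(1e-4, 0.02, T)
    alphas = 1.0 - betas
    alphas_bar = torch.cumprod(alphas, dim=0)

    def denoise(model, noisy_image, t_start=200):
        # `model(x, t)` is assumed to be a trained network that predicts the
        # noise contained in x at diffusion step t, as described in the video.
        x = noisy_image.clone()
        for t in reversed(range(t_start)):
            eps = model(x, torch.tensor([t]))                 # predicted noise
            coef = betas[t] / torch.sqrt(1.0 - alphas_bar[t])
            mean = (x - coef * eps) / torch.sqrt(alphas[t])   # reverse-step mean
            if t > 0:
                x = mean + torch.sqrt(betas[t]) * torch.randn_like(x)
            else:
                x = mean
        return x

    Choosing a smaller t_start removes less noise but stays closer to the original photo; a larger t_start cleans more aggressively at the cost of altering details.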

  • @qwertyntarantino1937 4 months ago

    thank you

  • @samirelzein1095 5 months ago

    Amazing job of deeply dismantling complex structures. That's real ML/AI democratization.

  • @abhishek-zm7tx 2 months ago

    Hi @Louis. Your videos are very informative and I love them. Thank you so much for sharing your knowledge with us.
    I wanted to know if "Fourier Transforms in AI" is in your pipeline. Could you please share some intuition around that in a video? Thanks in advance.

    • @SerranoAcademy 2 months ago

      Thanks for the suggestion! It's definitely a great idea. In the meantime, 3blue1brown has great videos on Fourier transforms, take a look!

  • @ASdASd-kr1ft 5 months ago

    Could it be that the diffusion model is trained to learn what amount of noise has to be removed from the input image, rather than the image with less noise? That is what I understood from other sources; they say that is easier for the model. Thank you, and good video, very enlightening.
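
    For reference, that matches the standard DDPM training setup: the network is trained to predict the noise that was added to the image, and the slightly less noisy image is then recovered from that prediction. A minimal sketch of the objective, using a made-up tiny network on flattened 4x4 images (toy code, not the video's):

    import torch
    import torch.nn as nn

    T = 1000
    betas = torch.linspace(1e-4, 0.02, T)
    alphas_bar = torch.cumprod(1.0 - betas, dim=0)

    # Made-up tiny network: input is a flattened 4x4 image plus a timestep feature.
    model = nn.Sequential(nn.Linear(16 + 1, 64), nn.ReLU(), nn.Linear(64, 16))
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)

    x0 = torch.rand(32, 16)                      # batch of clean 4x4 "images"
    t = torch.randint(0, T, (32,))               # a random diffusion step per image
    eps = torch.randn_like(x0)                   # the noise that gets added
    a = alphas_bar[t].unsqueeze(1)
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * eps   # noisy image at step t

    # The training target is the added noise itself, not the cleaner image.
    pred = model(torch.cat([x_t, t.float().unsqueeze(1) / T], dim=1))
    loss = nn.functional.mse_loss(pred, eps)
    loss.backward()
    opt.step()

    At generation time, part of the predicted noise is subtracted at each step, which is what yields the slightly less noisy image shown in the video.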

  • @BigAsciiHappyStar 15 days ago

    Muy BALL-issimo 😄 Loved the puns!!!!!😋😋😋

  • @olesik 5 months ago +2

    Thanks for teaching, Mr. Luis! I still fondly remember you teaching me machine learning basics over drinks in SF.

    • @SerranoAcademy 5 months ago

      Thanks Jon!!! Great to hear from you! How’s it going?

  • @melihozcan8676 3 months ago

    Serrano Academy: The art of Understanding
    Luis Serrano: The GOD of Understanding

    • @SerranoAcademy 3 months ago +1

      Thank you so much, what an honour! :)

    • @melihozcan8676 3 months ago

      @SerranoAcademy Thank you, the honour is ours! :)

  • @maxxu8818 2 months ago

    Hello Serrano, is there a paper like "Attention Is All You Need" for Stable Diffusion?

    • @SerranoAcademy 2 months ago +1

      Good question, I'm not fully aware. There's this, but I'm not 100% sure if it's the original: stability.ai/news/stable-diffusion-public-release
      I always use this explanation as a reference; there may be some good leads there: jalammar.github.io/illustrated-stable-diffusion/

    • @maxxu8818 2 months ago

      thanks @SerranoAcademy 🙂

  • @NVHdoc 5 months ago

    (At 17:25) in the image on the right, the baseball and bat should have 3 gray squares, right? Very nice channel, I just subscribed.

    • @SerranoAcademy 5 months ago

      Thank you! Yes, the ball and bat should be three gray or black squares. Since these images are not so exact, there could also be dark gray, or some variation.

  • @parmarsuraj99 5 months ago +1

    🙏

  • @AI_Financier 5 months ago

    Finally the diffusion penny dropped for me, many thanks

  • @850mph 23 days ago

    This is wonderful…
    Perhaps the best low-level description of the diffusion process I've seen…
    But discrete images of bats and balls represented as single pixels are a long way away from a PHOTO-REALISTIC pirate standing on a ship at sunrise.
    What I can't get my head around is how these discrete images (which actually exist in the multi-dimensional data set space) are combined, really grafted together (parts pulled from each existing image), into a single image with correct composition, scaling, coloring, shadows, etc.
    Even if I lay specifically chosen (by the NN) bat and ball pictures over each other to produce a "fuzzy" combined image (composition) and then use another NN to sharpen the fuzzy image into a crisp composition with all the attributes defined in the prompt and pointed to by the embeddings…
    there's still too much magic inside the DIFFUSION black box which I just don't understand, even after understanding the denoising and self-attention processes.

    • @850mph 23 days ago

      I guess what I have not been able to determine, after watching maybe 30-35 hours of diffusion videos, is specifically how the black box COMPOSES a complicated scene BEFORE the process that "tightens" the image up (by removing noise between the given and target in successive passes of the decoder) even begins.
      I get the fact (one) that the prompts correspond to embeddings, and the embeddings point to some point in multi-dimensional space which contains all sorts of related info and perhaps a close image representation of the prompted request….. or perhaps not.
      I get the fact (two) that the diffusion process is able to generate virtually any complicated scene starting from random noise when gently persuaded to a target by the prompt….
      What I don’t understand is how the black box builds a complicated FUZZY image once the various “parts” of the composition are identified.
      Does the composing process start with a single image if available in the dataset and scale individual attributes to correspond with the prompt…?
      -or-
      Does the composing process start with segmented attributes, scale all appropriately, and combine into a single image…?
      A closer look at how the scene COMPOSITION works would be a great addition to your very helpful library of vids, thanks.

    • @850mph 23 days ago

      Ok… for those with the same “problem…”
      The missing part, at least for me, is the “classifier” portion of the model which I have NOT seen explained in the high-level Diffusion explanation vids.
      This tripped me up…
      Here is a good vid and corresponding paper which help in understanding the "feature" set extraction within the image convolution process, which ultimately creates an "area/segment-aware" data set (image) that can be directed to include the visual requirements described in a text prompt.
      ua-cam.com/video/N15mjfAEPqw/v-deo.htmlsi=6sZxibtFvjrVNHeE
      In a nutshell… the features extracted from each image are MUCH more descriptive than I had pictured, allowing for much better interpolation, composition and reconstruction of multiple complex forms in each image.
      Of course the cues to build these complex images all happen as the model interpolates its learned data, converging on the visual representation of the text prompt, somewhere in the multi-dimensional space which we cannot comprehend… so in a sense it's still all a black box.
      I don't pretend to understand it all… but it does give the gist of how certain abstract features within the model's convolutional layers blow themselves up into full-blown objects.

    • @850mph 21 days ago

      Another good short vid which shows how diffusion accomplishes image COMPOSITION:
      ua-cam.com/video/xtlxCz349WU/v-deo.htmlsi=PJl_vWueiQdZxLn1

    • @850mph 15 days ago

      Another good vid which gets into composition:
      ua-cam.com/video/3b7kMvrPZX8/v-deo.htmlsi=AwNQJAjABKn-iV4F
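
    One concrete piece of the "how does the prompt steer the composition" question raised in this thread is classifier-free guidance: at every denoising step, Stable Diffusion typically predicts the noise twice, once with the text embedding and once without, and amplifies the difference. A minimal sketch (the unet and the embeddings are placeholders, not the actual library API):

    import torch

    def guided_noise(unet, x_t, t, text_emb, empty_emb, guidance_scale=7.5):
        # `unet` stands in for a trained noise-prediction network; `text_emb` and
        # `empty_emb` are the embeddings of the prompt and of an empty prompt.
        eps_cond = unet(x_t, t, text_emb)     # prediction conditioned on the prompt
        eps_uncond = unet(x_t, t, empty_emb)  # unconditioned prediction
        # Push the prediction away from "generic image" and toward the prompt.
        return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

    A higher guidance scale follows the prompt more closely at the cost of variety; many implementations default to a value around 7 or 8.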