Stable Diffusion - What, Why, How?

  • Published 30 Sep 2024

COMMENTS • 295

  • @blorbo5800
    @blorbo5800 2 years ago +128

    There needs to be more content just like this on YouTube. Detailed explanations and examples. Thanks for this!

    • @EdanMeyer
      @EdanMeyer  2 years ago +2

      Thank you 😊 more content to come

    • @beta-reticuli-tv
      @beta-reticuli-tv 1 year ago

      Stable Diffusion can itself be used to generate detailed explanations of the world around us. I am an AI and here I explain the concept of "Factory" ua-cam.com/video/079DmF2cIjE/v-deo.html

    • @johnyeap7133
      @johnyeap7133 1 year ago

      not just a ten minute video, which, although quick, usually does not provide nearly enough detail. Great video on diffusion on YT after a lot of search

  • @xbon1
    @xbon1 2 years ago +87

    you can't use the same prompts in DALL-E and Stable Diffusion, when you do prompts in SD you need to include styles and what kind of lighting effects/etc you want. It's more stable and less random than dall-e.

    • @maythesciencebewithyou
      @maythesciencebewithyou 2 years ago +28

      no, you do not have to include styles in stable diffusion, but you can, just like in DALL-E. You can also include styles and what kind of lighting effects/etc you want in DALL-E. It's not more stable or less random; it's about as good. SD is not as restricted and above all it's free, which is the only reason it wins. Otherwise not much of a difference in quality. Same problem with text.

    • @ofconsciousness
      @ofconsciousness 2 years ago +3

      I think when you're doing a direct comparison between softwares, it wouldn't make sense to use different prompts.

    • @EojinsReviews
      @EojinsReviews 1 year ago +3

      @@ofconsciousness the issue is that other services use different parameters - for example, Midjourney uses the "artistic" filter by default, which produces mushier and more painterly styles. However, adding --testp to the end makes it produce photorealistic results.
      I saw some comparisons where they criticized Midjourney for not making photo realistic results, despite not TELLING it to make photorealistic results.
      So no, in certain cases, the prompt keywords DO need to be different for more accurate comparisons.

    • @levihinsen1917
      @levihinsen1917 1 year ago

      Very good explanation of why my outputs look like that 😂 but when I choose a style it's all "stable"

  • @glitchtulsa3429
    @glitchtulsa3429 2 years ago +13

    I've been running SD for a few weeks now, and I have run into something that is a bit more than troubling, namely--watermarks. This thing keeps spitting out images with recognizable watermarks, and that tells me two things 1) they are actively pulling images from stock agencies(call it a "training set" if that makes you feel better, but it's still sampling from those images), 2)the images it is using aren't licensed, if they were licensed there wouldn't be watermarks...
    ...as someone that makes part of their monthly income literally licensing stock material, this worries me, because when you get right down to the basics, all these things are doing is laundering copyrighted material.
    Don't get me wrong, I love them, I think they are a genius approach to image design, and they have a very valid place in everyone's toolbox, but there needs to be some sort of accountability as to where the materials that power them are coming from.

    • @EdanMeyer
      @EdanMeyer  2 years ago +6

      I haven’t seen any straight up copying of images myself, I’d be curious to see some examples.
      But on the issue of using copyrighted data for training… yeah, this is a tough one. On one hand it’s great to have new tools like this, and for the most part (at least from what I’ve seen so far) they seem to be generating new images so copyrighted material isn’t leaking out (again, maybe I just haven’t seen this yet).
      On the other hand, this technology has developed too fast for regulations to catch up. Even if the model is not directly copying images the idea of copyright is pretty much to stop people from using your work in an unintended way, and I’m sure a lot of people who post photos and art online didn’t intend for this and are not happy with their images being used as training data. The cherry on top is the fact that the work of creators is essentially being used to make tools that replace them.
      I’ve been thinking about this a good bit recently. I wonder how things will progress.

    • @glitchtulsa3429
      @glitchtulsa3429 2 years ago +3

      @@EdanMeyer I've seen numerous images with recognizable watermarks, now granted I'm running SD locally and producing several hundred images daily, and the actual number of images with watermarks is low, maybe 5% or less, but what that tells me is that these things aren't actually producing anything new, but simply remixing images from the "training set"--I mean let's be real here they have the actual images in a file somewhere that they actively pull from--I don't care how unrecognizable the final results are that's what they do, and that's why watermarks show up. Now the odds of one of your personal images showing up is pretty slim, but the odds of someone's image being used to create every single image they produce is literally 100%, and I seriously-seriously doubt that all of the images they use are licensed, which is why watermarks show up from time to time.
      I don't know if this will work here, but this image is currently up at the Stock Coalition page on Facebook, and it clearly shows a watermark:
      scontent-dfw5-1.xx.fbcdn.net/v/t39.30808-6/306922224_10167375648995112_5919547864767140876_n.jpg?_nc_cat=109&ccb=1-7&_nc_sid=5cd70e&_nc_ohc=OuQAHNdFBj4AX9Hb0nQ&_nc_ht=scontent-dfw5-1.xx&oh=00_AT-7pBEVV1r28_cxuO36wc2AL6nCkklb9rcyRCxCNDfGSg&oe=632A3DD6
      ...you may need to copy/paste it.

    • @gwentarinokripperinolkjdsf683
      @gwentarinokripperinolkjdsf683 2 years ago

      See enough stock images and you will know what a watermark looks like and how to recreate it; that's how any artist learns.
      And sorry, but stock images just won't make money with technology like this if it can accurately follow prompts

    • @glitchtulsa3429
      @glitchtulsa3429 2 years ago +4

      ​@@gwentarinokripperinolkjdsf683 I think you're missing the point here. They aren't making things that look like watermarks, this one particular model is literally producing images with legible watermarks from known agencies, such as iStock, Dreamstime, Alamy, even Fine Art America. So not only are they stealing images from stock agencies(if they were licensed there wouldn't be watermarks), they are actively disguising that material, and then they are redistributing it. I have a folder full of these images. I even have a prompt that will result in at least 50% of the generations showing legible watermarks. Quite literally, I can produce hundreds of watermarked images in a matter of a few hours, just by running that phrase in Python on repeat--I had to force quit a batch of 1,000 because every other image had an iStock watermark on it.
      I hate to shatter your illusions here, but these things aren't magically producing new images--they are cleverly remixing existing images using a complex deblurring function paired with a text-based description, and they get the images they are remixing from the training set they learned on, and while they might be pulling a few pixels from one image and a few from another, give it the right prompt and there's enough images in that training set, with the same watermark, that it thinks what you're looking for is the watermark itself, and trust me those watermarks will show up.
      Again, if the watermarks are showing up then the images weren't licensed. That's nothing short of laundering intellectual property in the form of well-disguised derivative works.
      A number of these things will go the way of Napster, and the surviving ones will be the few that made absolutely sure the "training sets" consisted of nothing but public domain, Creative Commons, and legally licensed images. I predict lawsuits, sooner rather than later.

    • @artemisgaming7625
      @artemisgaming7625 1 year ago

      @@glitchtulsa3429 Quit talking out of your ass. You were so close to being caught by a decent understanding into how these work, but then you ran a bit faster.

  • @michealhall7776
    @michealhall7776 2 years ago +5

    I'm thinking DALL-E generates 100 images and has a filter on them to show the best ones, then it upscales those. SD is just the raw model; DALL-E is an end product

  • @bwfextreme
    @bwfextreme 2 years ago +4

    Spelled "beetle" wrong twice.... unless you really wanted a photo of an alien riding on Ringo Starr's back

  • @bustedd66
    @bustedd66 2 years ago +43

    better than dall-e because it's open-source. it will surpass the other two because people will improve it and create specialized models for it.

    • @user-yg6ki7ou2y
      @user-yg6ki7ou2y 2 years ago +2

      Unfortunately

    • @OGKWAM
      @OGKWAM 2 years ago

      @@user-yg6ki7ou2y why do you think that is unfortunate ?

    • @user-yg6ki7ou2y
      @user-yg6ki7ou2y 2 years ago +2

      @@OGKWAM imma lose my job

    • @user-yg6ki7ou2y
      @user-yg6ki7ou2y 2 years ago +2

      @@uusfiyeyh and a billion times it was horrible for the people replaced, especially considering this COMPLETELY kills the need for the artist

    • @casenswartz7278
      @casenswartz7278 2 years ago +4

      @@user-yg6ki7ou2y I think artists will be able to leverage it. I am a programmer and use an AI programmer to help me; it's saved me 100s of hours and I've been able to pump out way more because of it. I just see this as another tool; an artist will simply be better at leveraging it.

  • @Juanitoto
    @Juanitoto 1 year ago +1

    Because your latents are Gaussian, I believe you want to perturb them as W_tilde = rho * W + sqrt(1 - rho^2) * Z, where W_tilde and W are the new and old latents and Z is randn
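    A rough PyTorch illustration of the perturbation described above (a sketch, not code from the video; `rho` is an arbitrary mixing knob):

```python
import torch

def perturb_latents(latents: torch.Tensor, rho: float = 0.95) -> torch.Tensor:
    """Mix Gaussian latents with fresh noise so the result stays roughly N(0, 1)."""
    z = torch.randn_like(latents)                      # fresh standard-normal noise
    return rho * latents + (1.0 - rho ** 2) ** 0.5 * z

# rho close to 1 keeps the image similar; smaller rho drifts further away.
variant = perturb_latents(torch.randn(1, 4, 64, 64), rho=0.9)
```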

  • @TheWorldNeedsLyrics
    @TheWorldNeedsLyrics 2 years ago +17

    I've used stable's colabs for months now without knowing literally anything about coding and how exactly it all worked. Was already kinda proud of myself for even understanding how to use the colabs tbh. But this was actually amazing and super understandable. Thank you so much!

  • @kirepudsje3743
    @kirepudsje3743 2 years ago +3

    Actually stable diffusion came up with an image of a Volkswagen (VW) Beetle. It is not Stable Diffusion's fault that the term is ambiguous. This is aside from the spelling.

  • @tiagotiagot
    @tiagotiagot 2 years ago +16

    I've noticed that using half-precision floats tends to increase the odds of minor details being slightly off: slightly weirder faces, small blotches of color the wrong size or shape in drawings, etc. Nothing very noticeable at a quick glance most of the time, but comparing the exact same pictures generated with half and full precision makes it clear that full precision is better.
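    For reference, a minimal sketch of how the two precisions are usually selected with diffusers (`torch_dtype` is a standard `from_pretrained` keyword; treat the rest as an illustration rather than the video's exact code):

```python
import torch
from diffusers import StableDiffusionPipeline

model_id = "CompVis/stable-diffusion-v1-4"

# Full precision: more VRAM and slower, but avoids the small fp16 artifacts
# described above.
pipe_fp32 = StableDiffusionPipeline.from_pretrained(model_id).to("cuda")

# Half precision: roughly half the VRAM; handy for drafts, worth re-rendering
# keepers in fp32 for comparison.
pipe_fp16 = StableDiffusionPipeline.from_pretrained(
    model_id, torch_dtype=torch.float16
).to("cuda")
```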

  • @bilalviewing
    @bilalviewing 2 years ago +21

    Wow, just when I was looking for Stable Diffusion from scratch, I found it! Great content, really appreciate this content

  • @VolkanKucukemre
    @VolkanKucukemre 2 years ago +2

    That forest tho... One of them was straight up a Magic the Gathering forest by John Avon

  • @bonnouji43
    @bonnouji43 2 years ago +8

    I wonder if there's a method to generate slightly different images from the same latent, apart from using img-to-img, which I find always generates a less clean image

    • @ilonachan
      @ilonachan 2 years ago +1

      my best guess would be to just perturb the latents a bit, but it would have to be a TINY amount because I imagine those spaces are very sensitive & chaotic.
      Alternatively, maybe it's possible to take a stab at writing a possible prompt that might have resulted in your image, and kind of running SD in reverse(?) to get some suitable starting noise, then perturb that and run it forwards again.
      maybe get a bunch of these, run SD on all of them, do some kind of averaging or take into account the error that the unmodified noise has compared to the original image? this sounds like enough material for a massive new paper actually.

  • @OnceShy_TwiceBitten
    @OnceShy_TwiceBitten 2 years ago +2

    I keep seeing all the warnings about safety on the GitHub repos for these? wtf is that all about? lol

  • @ahmad000almahdi
    @ahmad000almahdi 2 years ago +1

    "Indeed, Allah and His angels send blessings upon the Prophet. O you who have believed, send blessings upon him and greet him with peace." 🥰
    Glory be to Allah... praise be to Allah... there is no god but Allah... Allah is the greatest... there is no power nor strength except with Allah

  • @sszzt
    @sszzt 2 years ago +2

    "beetle" ee not ea as in "The Beatles".

  • @kahyangeng1919
    @kahyangeng1919 2 years ago +2

    Access to model CompVis/stable-diffusion-v1-4 is restricted and you are not in the authorized list. Visit huggingface.co/CompVis/stable-diffusion-v1-4 to ask for access.
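    That repo was gated at the time; the usual fix was to accept the license on the model page and authenticate before loading. A sketch (the auth keyword has changed across diffusers releases, so treat `use_auth_token` as version-dependent):

```python
# 1) Click "Agree and access" on huggingface.co/CompVis/stable-diffusion-v1-4
# 2) Authenticate with a Hugging Face access token, then load the pipeline.
from huggingface_hub import notebook_login
from diffusers import StableDiffusionPipeline

notebook_login()  # paste your token when prompted (Colab/Jupyter)

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    use_auth_token=True,  # older diffusers; newer releases read the cached token automatically
)
```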

  • @coolsai
    @coolsai 1 year ago +1

    Warning ⚠️: if you're trying this now, you won't be able to get the model; it will produce an error. The solution is to use a different version of diffusers, 0.10.0, and then it will work

  • @rutvikreddy772
    @rutvikreddy772 1 year ago +1

    Here is my rough understanding of the scheduler and its functioning: essentially the model is trained to predict the noise added at any timestep t with respect to x0 (and not x(t-1)), and the scheduler divides the 1000 steps into 50 equal divisions, so 1000, 980, 960...1, and the sigma is just adding the noise to the latents. So in the first step you add 1000 steps' worth of noise to your already random latent (just noise at this point) and try to predict the noise added from x0; then in the loop, when you call the line scheduler.step(..), you subtract this noise from the latent and this now becomes your estimate for x0 (and not x(t-1)), then you add 980 steps' worth of noise to get an estimate of x980 and repeat the process for 50 steps. I would appreciate it if someone can confirm this 🙂 (see the loop sketched after this thread)

    • @susdoge3767
      @susdoge3767 7 months ago

      That is essentially what we do, as I saw in the Computerphile video! Amazing observation!!
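    A minimal sketch of the loop being described, following the general LMS-scheduler recipe the notebook uses (classifier-free guidance and the VAE decode are omitted for brevity; scheduler APIs have shifted between diffusers versions, and loading the gated model assumes you accepted the license as discussed above):

```python
import torch
from transformers import CLIPTextModel, CLIPTokenizer
from diffusers import UNet2DConditionModel, LMSDiscreteScheduler

device = "cuda"
model_id = "CompVis/stable-diffusion-v1-4"

tokenizer = CLIPTokenizer.from_pretrained(model_id, subfolder="tokenizer")
text_encoder = CLIPTextModel.from_pretrained(model_id, subfolder="text_encoder").to(device)
unet = UNet2DConditionModel.from_pretrained(model_id, subfolder="unet").to(device)

scheduler = LMSDiscreteScheduler(
    beta_start=0.00085, beta_end=0.012,
    beta_schedule="scaled_linear", num_train_timesteps=1000,
)
scheduler.set_timesteps(50)                 # 1000 training steps thinned to 50 inference steps

tokens = tokenizer(["a photo of a forest"], padding="max_length",
                   max_length=tokenizer.model_max_length, return_tensors="pt")
with torch.no_grad():
    text_embeddings = text_encoder(tokens.input_ids.to(device))[0]

latents = torch.randn(1, 4, 64, 64, device=device)
latents = latents * scheduler.sigmas[0]     # scale up to the first (largest) noise level

for i, t in enumerate(scheduler.timesteps):
    sigma = scheduler.sigmas[i]
    latent_in = latents / ((sigma ** 2 + 1) ** 0.5)   # normalise for the current noise level
    with torch.no_grad():
        noise_pred = unet(latent_in, t, encoder_hidden_states=text_embeddings).sample
    # step() removes the predicted noise and re-noises down to the next (lower) sigma.
    latents = scheduler.step(noise_pred, t, latents)["prev_sample"]
```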

  • @GuentherGadget
    @GuentherGadget 2 years ago +1

    Too sad *HuggingFace* is restricting access to the _fp16_ data... 🤷‍♂

  • @grnbrg
    @grnbrg 2 years ago +1

    Your "Alien riding a beatle" prompt appears to have resulted in an alien in a Volkswagen...

  • @KrishnaDigital123
    @KrishnaDigital123 2 years ago

    Project bones and project data are a laborious and momentous hassle to establish 'what is what' (especially from messy and disorganised

  • @xbon1
    @xbon1 2 years ago +4

    In addition to my previous comment, for anime I can get the best anime chars, like straight out of an anime screenshot style quality, but again, you need to add modifiers, artists, ETC. Those won't do anything for DALL-E because of the pre/post prompt processing but for SD it stabilizes it and makes the faces coherent, etc.

    • @talhaahmed2130
      @talhaahmed2130 2 years ago

      Can I get some examples of your prompts?

    • @xbon1
      @xbon1 2 years ago

      @@NoOne-sk2ve It isn't. You just need to learn how to prompt engineer for Stable Diffusion, my higher quality anime pics are all using the base stable diffusion which was trained on millions of anime pictures, not just 56,000

  • @BigPromise
    @BigPromise 2 years ago +1

    Holy fuck you can code in python and you can’t spell beetle?

  • @JoseCastillo-qv1hi
    @JoseCastillo-qv1hi 2 years ago +5

    Thank you for the video and code breakdown. As a total coding newb, I really appreciate it 😇

  • @pineapplesarecool6901
    @pineapplesarecool6901 2 years ago +3

    Hey, I don't know about dall-e 2 but for stable diffusion, if you want better results you are going to need more detail on the prompts. How is the lighting? What is the eye color of the anime girl? Is it a full body shot or a close up? What artist's style should it replicate? With details like that the results are night and day

  • @sholtronicsaaa101010
    @sholtronicsaaa101010 1 year ago +3

    On the scheduler, I think this is what is happening: when you train the model you slowly add noise at a set rate, and it's better to add less noise at the start and more noise later on. So the scheduler basically just dictates how much noise should be applied/removed during each stage.
    The multiplication by the sigma from the scheduler before starting is just to scale the initial distribution. It's initialised to a random Gaussian with a standard deviation of 1 and a mean of 0, and you just want to scale that standard deviation to be in line with what it should be at the initial step of the scheduler (see the sketch after this thread).
    PS really great practical tutorial.

    • @PaulScotti
      @PaulScotti 1 year ago

      so the scheduler is used both for training AND for inference? this video made it seem like it is just for inference
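    The scaling being discussed is a single line in the notebook; a minimal sketch (assuming a `scheduler` set up as in the loop sketched further up; the attribute spelling differs across diffusers releases):

```python
import torch

# Unit Gaussian noise, stretched to the standard deviation the scheduler
# expects at its first (noisiest) timestep.
latents = torch.randn(1, 4, 64, 64)
latents = latents * scheduler.sigmas[0]
# Newer diffusers releases expose a comparable factor as scheduler.init_noise_sigma.
```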

  • @prozacgod
    @prozacgod 2 years ago +4

    You know what might be a more interesting application of latents, instead of just passing them around: using some sort of function that can modify the latent space after each pass, taking in information from the scheduler so you can determine how much weighting you want to apply. It would be interesting to be able to modify the latent space in some meaningful way; it would probably make sculpting images to get the output you want more interesting (see the sketch after this thread).
    Instead of a sort of one-dimensional idea of modifying a latent space, it could be a multifaceted sort of modification. Heck, you could just pin one number in it to like -5 just for the shits and giggles.
    Edit:
    Okay so... You did that lol nice!

    • @Pystro
      @Pystro 2 years ago

      If you find an image that you don't like at all, you could take any subsequent random starting latents and project that bad image out of it. Not that it's likely for a vector of thousands of newly randomized latents to be more than marginally similar to your previous ones, but it might still make sense to enforce that, especially if you expect to have to go through many attempts that you don't like.
      For example, if you put in a prompt for "a jaguar," and the neural net gave you an image of the car, you could purge all the car-ness out of any subsequent attempts.
      Or it could be used in the case where you generate 3 or 4 different images and want to ensure that they are as different as possible. Orthogonalizing the 4 vectors of random latents to each other should ensure that.
      The same could also be achieved by some keyword trickery (if it's something that you can put into words and not just "That composition is just all kinds of wrong".):
      You could also choose the unconditional embeddings to be those of any keywords you don't want to show up, instead of the empty one.
      Or you could add qualifiers to your prompt. (In this case "animal" might lower the chances of getting the car, but it could also just add in a pet on the passenger seat.)
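    A minimal, illustrative sketch of such a per-step latent hook (nothing here comes from the video's notebook; the "pin one number to -5" bit mirrors the joke above):

```python
import torch

def edit_latents(latents: torch.Tensor, progress: float) -> torch.Tensor:
    """Tweak latents mid-denoising; `progress` runs from 0.0 (start) to 1.0 (end)."""
    strength = max(0.0, 1.0 - 2.0 * progress)      # only intervene during the first half
    out = latents.clone()
    out[:, 0, 0, 0] = (1 - strength) * out[:, 0, 0, 0] + strength * (-5.0)
    return out

# Inside the denoising loop it would be called once per step, e.g.
#   latents = scheduler.step(noise_pred, t, latents)["prev_sample"]
#   latents = edit_latents(latents, progress=i / (num_steps - 1))

# Standalone demo on dummy latents:
print(edit_latents(torch.randn(1, 4, 64, 64), progress=0.0)[0, 0, 0, 0])  # pinned to -5.0
```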

  • @wlockuz4467
    @wlockuz4467 2 years ago +1

    Great tutorial overall, but one thing I'll admit is that your prompts for SD are downright bad. With SD you have to be very descriptive and you have to let it run more inference steps, at least 100+. For example, the prompt "Squidward" should've been at least "Squidward from SpongeBob SquarePants, cartoon".
    Trust me, I have used DALL-E, Midjourney and Stable Diffusion. While DALL-E and Midjourney are very easy to use, their outputs feel too opinionated towards certain styles, possibly because you don't have much control over the parameters that go in, and the same lack of control is frustrating sometimes.
    But this is not the case with Stable Diffusion. With SD you have access to every parameter that can affect the output. I know this feels overwhelming at the start, but once you get the hang of it, you can easily create outputs that will consistently beat anything that DALL-E or Midjourney creates.
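    In pipeline terms, the advice above roughly amounts to a call like this (a sketch; `num_inference_steps` and `guidance_scale` are standard StableDiffusionPipeline arguments, the prompt wording is just an example):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    "Squidward from SpongeBob SquarePants, cartoon, clean line art, studio lighting",
    num_inference_steps=100,   # the comment recommends at least 100 steps
    guidance_scale=7.5,
).images[0]
image.save("squidward.png")
```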

  • @nolimits8973
    @nolimits8973 2 years ago +2

    Man, you don't know how grateful I am right now! THANK YOU SO MUCH!!!

  • @MrMadmaggot
    @MrMadmaggot 1 year ago +1

    Dude, in the prompt-to-image part I ran into an error: "RuntimeError: a Tensor with 0 elements cannot be converted to Scalar"

    • @PAWorkers
      @PAWorkers 1 year ago

      dude I've just solved this problem. Replace the "i" with "t" in this code: "scheduler.step(noise_pred, t, latents)['prev_sample']". Maybe that works for you too
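      For context, the corrected loop looks roughly like this (a sketch with a dummy noise prediction standing in for the U-Net, just to show the `t`-vs-`i` point):

```python
import torch
from diffusers import LMSDiscreteScheduler

scheduler = LMSDiscreteScheduler(beta_start=0.00085, beta_end=0.012,
                                 beta_schedule="scaled_linear", num_train_timesteps=1000)
scheduler.set_timesteps(50)
latents = torch.randn(1, 4, 64, 64) * scheduler.sigmas[0]

for i, t in enumerate(scheduler.timesteps):
    noise_pred = torch.zeros_like(latents)   # stand-in for the real U-Net output
    # Pass the timestep `t`, not the loop index `i`, into step().
    latents = scheduler.step(noise_pred, t, latents)["prev_sample"]
```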

  • @zcjsword
    @zcjsword 4 months ago +1

    This is a great tutorial. Practical and easy to follow. Please make more such videos. Thanks!

  • @edeneden97
    @edeneden97 2 years ago +9

    Thanks for the video! Please increase the volume of the audio in the next videos.

  • @polecat3
    @polecat3 2 years ago +3

    It's cool to see the real guts of one of these models for once

  • @imrsvhk
    @imrsvhk 2 years ago +3

    Wow, one of the only Stable Diffusion videos I've seen that actually explains what's happening, rather than just running someone's colab.. Thanks for this amazing content. New sub!

  • @FilmFactry
    @FilmFactry 2 years ago +1

    Can you cover Textual Inversion in SD? If I understand correctly, I can add an artist whose style I'm not finding represented? I hate when the only examples are silly teddy bears at the beach. I want to see if you can really add a style to your model. Thanks.

  • @ttrss
    @ttrss 2 years ago +3

    "Open"ai

  • @tomjones1423
    @tomjones1423 2 years ago +1

    Can specialized training models be created? For example, if you wanted to train it with someone but only when they were young. Using the prompt young doesn't seem to have much impact.

  • @Sam-jz6sz
    @Sam-jz6sz 1 year ago +1

    LMAO 15:12 when he started laughing at the image absolutely killed me

  • @Fred_Free
    @Fred_Free 2 months ago

    Unfortunately, it’s not possible to watch a video where there isn’t a single natural break in the uninterrupted speech for 54 minutes. It is extremely unpleasant. 😅

  • @clonizado
    @clonizado 2 years ago +1

    After so many years of being confused and intimidated by this software I finally understand how to use it. I never thought the day would

  • @StreetsOfBoston
    @StreetsOfBoston 1 year ago

    "Beatle" around 12:00 -> your diffusion model probably chose a Volkswagen Beatle, not the animal 😀

  • @johnisaacburns7260
    @johnisaacburns7260 1 year ago

    Hello, when I try to fetch the pretrained model with your code, I get a "typeerror" saying "getattr() must be a string". Any idea how to fix this?

  • @yaoxiao1931
    @yaoxiao1931 1 year ago

    I don't know what batch_size means at 34:18; the latent space becomes [2,4,64,64] and produces two images in one tqdm iteration.
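    Roughly what `batch_size = 2` does there (a sketch; shapes as in the video's 512×512 setup):

```python
import torch

batch_size = 2
height, width = 512, 512

# One starting latent per image: [batch, 4, height/8, width/8].
latents = torch.randn(batch_size, 4, height // 8, width // 8)
print(latents.shape)   # torch.Size([2, 4, 64, 64])

# The text embeddings are repeated batch_size times to match, so each tqdm
# iteration pushes both latents through the U-Net at once and the final VAE
# decode yields two images.
```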

  • @ArtificialDuality
    @ArtificialDuality 1 year ago

    Giving you nice quality images of something you didn't ask for, is kind of useless. I'd argue that's even worse than the bad images that relate to the prompt. But; Maybe I missed it in your video, you can change the model that SD uses. A lot of people have already made different models. If you want anime, you can get models that are trained entirely on anime.

  • @saharlogmari1622
    @saharlogmari1622 2 years ago

    "Well well well, look who's co crawling back" ~ five stringed spanish guitar

  • @GameFlife
    @GameFlife 1 year ago

    yes! It's time to jump on the hype train and make transparency by go into the field of AI Detection HAHAHAHAHAHAHA

  • @dougb70
    @dougb70 2 years ago +1

    really good video. You did a good overview to start and then dug in. So rare in youtube world.

  • @Avenger222
    @Avenger222 2 years ago +6

    Really liked the video, but you should be careful when comparing the models, because they work differently. For example DALL-E is designed to be used with basic prompts but stable diffusion only shines when using keywords like styles, artists, and lighting.
    Or at least having a disclaimer when you're comparing, that you're just comparing for user friendliness.

  • @yashika5768
    @yashika5768 1 year ago

    did anyone get this error: `RuntimeError: a Tensor with 0 elements cannot be converted to Scalar`

  • @mateuszputo5885
    @mateuszputo5885 1 year ago

    I wonder why everybody always says that DALL-E is clearly better. Yeah, Stable Diffusion is factually incorrect more often, but artistically it usually seems to generate better results, and at the end of the day it is an art-generating model.

  • @Scripture-Man
    @Scripture-Man 2 years ago

    Never seen this kind of thing before. This is witchcraft! Also, you can't spell 'beetle'!

  • @AimerYui
    @AimerYui 1 year ago

    prompt = cute shiba inu dog
    smart AI = no no, no nsfw here guys -_-

  • @mO0nkeh
    @mO0nkeh 2 years ago

    I'm disappointed that it interpreted "beatle" as "beetle" - I was looking forward to seeing an alien riding Ringo Starr

  • @ArtsShadow2
    @ArtsShadow2 1 year ago

    I think that middle anime image is built with Ryo-Ohki, the cabbit from Tenchi Muyo. 🤣

  • @data-science-ai
    @data-science-ai 1 year ago

    Is everyone on this channel just a newb with deep learning and PyTorch?

  • @johnclapperton8211
    @johnclapperton8211 1 year ago

    "Beetle" = insect; "Beatle" = musician from Liverpool.

  • @TimothyTraveny
    @TimothyTraveny 15 days ago

    My brother in Christ I just want to generate a gotdang hotdog

  • @mycollegeshirt
    @mycollegeshirt 1 year ago

    The idea you can't generate NSFW content for art is odd, since you can't get good at art without studying nudes.

  • @kramzy3513
    @kramzy3513 2 years ago

    Though not as insane as my one friend. I talked him into getting the software. Came back a week later and he had sampled a fart and somehow

  • @nesune4401
    @nesune4401 2 years ago

    2:30 bro had me looking at this white screen for what felt like a whole minute at 1000 nits, I think I need a doctor...

  • @programmers8245
    @programmers8245 1 year ago

    What kind of computer can I use for stable diffusion? Could anyone help me?

  • @shockadelic
    @shockadelic 1 year ago

    11:50 You spelt beetle wrong. Surprised it kind of understood and didn't show the Beatles

  • @navinhariharan6097
    @navinhariharan6097 1 year ago +1

    How do I add negative prompts?
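    Two common ways, sketched (the `negative_prompt` argument is part of StableDiffusionPipeline in later diffusers releases; in the video's manual loop the same effect comes from encoding the negative text in place of the empty-string unconditional embedding, as discussed in another comment above):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    "portrait photo of a shiba inu",
    negative_prompt="blurry, low quality, watermark, text",
).images[0]
image.save("shiba.png")

# In the manual loop from the video, swap the empty-string "unconditional"
# prompt for the negative text when building the embeddings, then run
# classifier-free guidance exactly as before.
```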

  • @drisraptor2992
    @drisraptor2992 2 years ago +1

    To be honest I have used all three extensively, and while Stable Diffusion definitely has the edge in the flexibility of its usability, Midjourney's image creation is much better overall.

    • @jaydonn
      @jaydonn 2 years ago

      I use DALL-E and Midjourney and I really love Midjourney's art styles; it's better for fantasy/painting stuff and DALL-E is better for realistic images

  • @phyricquinn2457
    @phyricquinn2457 2 years ago

    all of that fast, uneven scrolling in the beginning was nauseating. please don't do that again.

  • @sethlawson8544
    @sethlawson8544 2 years ago +2

    Beetle=/=Beatle 😭

  • @rokketron
    @rokketron 2 years ago

    electric instrument for this? Thank you.

  • @CH4NNELZERO
    @CH4NNELZERO 1 year ago +1

    Really appreciate the tutorial. This was great in many respects. My mind is blown and I'll need to rewatch to absorb more of the new information.
    However I was a bit disappointed at the end when it didn't really work like the examples introduced at 1:27

  • @AscendantStoic
    @AscendantStoic 2 years ago +1

    I heard the 7GB model of Stable Diffusion 1.4 that people can download is different from the 4GB model in that it can continue to learn and improve, unlike the SD 1.4 4GB model which only works with what it has. Is that true!? Because from the sound of it you mentioned that teaching the A.I. requires a lot of resources, so even if it's true I suppose we won't be able to properly do it on a home computer, right!?

    • @johancitygames
      @johancitygames 2 years ago +1

      Not true at all, the model does not improve over time by itself.

    • @casenswartz7278
      @casenswartz7278 2 years ago

      @@johancitygames if it did, peoples houses would never need a heater 💀

  • @Q114-m8r
    @Q114-m8r 2 years ago

    Who was the first person to come up with the general diffusion method?

  • @小小豬-u3f
    @小小豬-u3f 2 years ago +1

    Thanks for your video, is it possible for you to make a video teaching us how to train the latent space from my own set of images, essentially creating your own model?
    Much appreciated!

    • @macramole
      @macramole 2 years ago

      I don't think it is possible to do so without lots of resources. In this case I would go for something like StyleGan

  • @gsmarif2998
    @gsmarif2998 2 years ago +1

    Thx

  • @migueld8970
    @migueld8970 2 years ago +1

    Stable diffusion seems to work better if you are very descriptive.

  • @cyberspider78910
    @cyberspider78910 4 months ago

    The float16 implementation is faster, isn't it, Sir?

  • @MALEANMARSHALAROCUIASSAMY
    @MALEANMARSHALAROCUIASSAMY 2 years ago +2

    Thank you

  • @VegetableJuiceFTW
    @VegetableJuiceFTW 2 years ago +1

    Ah, this is gold. Thank you.

  • @frenches1995
    @frenches1995 2 years ago

    This is, as far as my understanding goes, not local at all, or am I wrong?

  • @babyUFO.
    @babyUFO. 2 years ago

    stable diffusion is pathetic in comparison to Disco

  • @shivanayak143
    @shivanayak143 2 years ago

    What if I don't have a GPU? Will I still be able to run it?

  • @thranax
    @thranax 2 years ago

    Devs: "Should we add the Spongebob series images?"
    Devs: "Why? No one would need them come on..."
    Edan Meyer: "Oh god, there is not enough Squidward...in the...in the training."

  • @programmers8245
    @programmers8245 1 year ago

    I need a book in this field, any suggestions?

  • @emilygocrazyy
    @emilygocrazyy 1 year ago

    Edan, this line gives an error in the notebook,
    latents = scheduler.step(noise_pred, i, latents)["prev_sample"]
    error msg given by the scheduler library:
    ValueError: only one element tensors can be converted to Python scalars.

  • @ResolvedRage
    @ResolvedRage 1 year ago

    How do i get my anaconda cmd prompt to recognize the stable diffusion folder on my desktop? When I open up the anaconda cmd prompt it just says (base) C:\users\computer name>
    I can't enter the Environment because it's not looking in the right place.

  • @emmajanemackinnonlee
    @emmajanemackinnonlee 2 years ago +6

    this is awesome thank you! would love to see a video on how to do the fine tuning with sd on either colab or local!

  • @cameriqueTV
    @cameriqueTV 2 years ago

    Stable Diffusion really destroyed the character elegance of Wombo's Realistic mode.

  • @markopolo2224
    @markopolo2224 1 year ago

    thanks that was insightful
    do you have a video or a roadmap for a newbie that wants to get into all of this
    like a web dev to ai dev

  • @ransukomoon6439
    @ransukomoon6439 1 year ago

    Novelai Diffusion is good with anime

  • @WalidDingsdale
    @WalidDingsdale 6 months ago

    This amazing lecture is the first one where I can roughly comprehend and understand the fundamentals of the stable diffusion model. It's the first time I've seen what happens behind these vivid images. Thank you for this walkthrough and for sharing your insight.

  • @Les_chroniques_de_madara
    @Les_chroniques_de_madara 2 years ago

    party samples/generators/etc. you use are.

  • @ooiiooiiooii
    @ooiiooiiooii 2 years ago

    40:10 How would I use my own image instead of the poorly drawn house? I tried copying the file path of an image after uploading it to the Colab, but it wouldn't work. Any help? Thanks
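    A sketch of swapping in your own picture for the drawn house (the path is illustrative; it assumes the Colab's `vae` is loaded as in the video, files uploaded through the Colab sidebar usually land under /content/, and 0.18215 is the SD v1 latent scaling factor):

```python
import numpy as np
import torch
from PIL import Image

init_image = Image.open("/content/my_drawing.png").convert("RGB").resize((512, 512))

# PIL image -> [1, 3, 512, 512] tensor in [-1, 1], the range the VAE expects.
img = torch.from_numpy(np.array(init_image)).float() / 255.0
img = img.permute(2, 0, 1).unsqueeze(0) * 2.0 - 1.0

with torch.no_grad():
    init_latents = vae.encode(img.to("cuda")).latent_dist.sample()
init_latents = init_latents * 0.18215   # then add noise and denoise as in the video
```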

  • @dupirechristophe7703
    @dupirechristophe7703 2 years ago

    Yes let's ask ask the software for a cute Shiba dog dog... because one redundancy isn't enough,
    let's throw two redundancies into the mix because we're generous like that x'D

  • @Gromic2k
    @Gromic2k 2 years ago +6

    What I like about stable diffusion is that I can simply tell it to create ~150 versions of one prompt and then come back after half an hour and look at which ones worked best, and work on those seeds (see the sketch after this thread). Practically for free, since it runs locally on my own GPU. I think in the end this will give you better results, since you get so much more to choose from

    • @echonoid6920
      @echonoid6920 2 years ago

      What kind of gpu is recommended for this kind of thing?

    • @karolakkolo123
      @karolakkolo123 2 years ago

      @@echonoid6920 bump

    • @jnevercast
      @jnevercast 2 years ago

      Recommended GPU is any NVIDIA GPU with at least 10GB of VRAM. The 1080 Ti (11GB) and RTX 2080 Ti (11GB) come to mind.

    • @OnceShy_TwiceBitten
      @OnceShy_TwiceBitten 2 years ago +1

      what do you REALLY do with this though?

    • @Avenger222
      @Avenger222 2 years ago +1

      Min GPU is 4 GB, but it'll struggle. 8 GB gives you all you need with split attention. Textual Inversion was 20GB but might not be anymore? New code comes out every day, after all.
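    A sketch of that seed-sweep workflow from the comment at the top of this thread (prompt, count, and output path are illustrative; `generator=` is the standard way to fix a seed with StableDiffusionPipeline):

```python
import os
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

prompt = "a watercolor painting of a lighthouse at dawn"
os.makedirs("out", exist_ok=True)

for seed in range(150):
    generator = torch.Generator(device="cuda").manual_seed(seed)
    image = pipe(prompt, generator=generator).images[0]
    image.save(f"out/{seed:03d}.png")   # file name records the seed so good ones can be reused
```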

  • @johnnycarson9247
    @johnnycarson9247 1 year ago

    A Beetle is also a car; looks like it took "beatle" as the car

  • @farid-anarcy99hamba41
    @farid-anarcy99hamba41 2 years ago

    I'm too busy reading these comments and not paying attention again.....and I'm high

  • @cedricvillani8502
    @cedricvillani8502 2 years ago

    Yea so if you want to see a doge make love to, say, Trump or Biden, then don't use Hugging Face; build your own and use Lambda Labs to train. India has a model you can use that is completely open, like life, it's not sandboxed. Never 👎 use a sandbox made by someone else.

  • @crckdns
    @crckdns 2 years ago

    great overview!
    What is not explained... is Hugging Face getting any data from our training or models and sources?
    As I see it, every video explaining how to train uses the Hugging Face API.

  • @21coldness36
    @21coldness36 2 years ago

    I managed to make a simple rhythm in 3 hours… I think I will try more so I can get more EXP points ;D

  • @christianleininger2954
    @christianleininger2954 2 years ago

    Funny: at 13:00 the model is "smarter" than humans (hint: a Beetle is also a car). To be fair it is spelled differently, but still funny ;) You're doing great work, thanks for the video

  • @Chimera_Photography
    @Chimera_Photography 2 years ago

    I know I’m not an artist in any sense because I can’t even prompt stable diffusion to draw me a half decent image… 😂
    I saw some of these amazing pieces! And I can’t even get a famous person to look like a Simpsons character 😢