I've discovered your channel and I'm blown away by your hard work!!
keep going bro..
This was an amazing breakdown. I would love to see something of this nature repeated using numbers of descriptors. Prompt artists often use dozens of descriptions and details, and I'm curious how those affect the desired results, and when/if there is a point where it becomes less effective.
I might do a follow-up focusing more on descriptors later on, but for now I'm more focused on ControlNet.
Thanks for making these videos bro, very informative .. appreciate it!
Your content is invaluable, these are true masterclasses! I have a question about something I find a bit confusing in the way you break down prompts into their elements. In some examples you seem to define a prompt as: Subject(s) + Prompt Ending. But in other parts of the video you talk about the prompt as composed of: Subject(s) + Neutral Elements, Descriptors, Quantity, Style and Artist. Do you reckon the latter elements are all part of the "prompt ending", or are some of these elements part of the Subject(s) description?
I consider anything that applies to the whole image to be the prompt ending (neutral descriptors like high resolution, artist, style). The subject is objects, numbers of objects, and specific descriptions (e.g. a green dog). Though based on the results for colors, I'm not sure SD makes that distinction like we do.
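The decomposition described here can be sketched as a tiny helper (the function name and structure are hypothetical, just to illustrate the split between per-subject descriptors and whole-image terms):

```python
# Hypothetical sketch of the prompt decomposition described above:
# per-subject descriptors stay attached to their subject, while
# whole-image modifiers go at the end of the prompt.

def build_prompt(subjects, prompt_ending):
    """subjects: list of (count, descriptors, noun); prompt_ending: whole-image terms."""
    parts = []
    for count, descriptors, noun in subjects:
        desc = " ".join(descriptors)
        parts.append(f"{count} {desc} {noun}".strip())
    return ", ".join(parts + list(prompt_ending))

prompt = build_prompt(
    subjects=[("two", ["green"], "dogs"), ("one", ["red"], "cat")],
    prompt_ending=["high resolution", "oil painting", "by Claude Monet"],
)
print(prompt)
# two green dogs, one red cat, high resolution, oil painting, by Claude Monet
```

Whether SD actually keeps a "green" descriptor bound to its subject is exactly the concept-bleeding question raised in the video.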
Amazing breakdown. As a newbie to SD, I always thought using deepbooru tags was better than typing directly, no matter what checkpoint I used, especially for the long prompts. Thanks for the great guide.
Using the Cutoff extension decreases concept bleeding a lot, but I don't know how it actually works.
I guess a part 2 of this video covering ControlNet's capabilities would be extremely interesting to watch.
Many thanks for taking the time to do all that (quite the electricity bill :P)
Very useful! Even when training a model with text descriptions that would allow several subjects, it still gets it wrong and duplicates one subject across them.
7:00 is how i imagine "Sil" typing in prompts😂❤
Wow, what an elaborate video! Thank you for the work! For me it will probably stay like this: before I prompt myself to death, I'd rather use inpainting, outpainting, image-to-image and a bit of Photoshop... nevertheless, that was helpful and inspiring!
CLIP ViT-L/14's text encoder has 12 attention heads per layer. I'm surprised the concept-subject limit is three. (Although this more or less matches my testing.)
great video
Thanks! I was considering doing a followup video to this one but then ControlNet nation attacked.
I was also told to try Latent Couple, but I haven't been able to get it to work.
I'm experimenting a lot with styles and am trying to use descriptors to describe the style I want. Maybe a video on how to craft a prompt that asks ChatGPT for a style description of an artist?
The reason I'm moving away from artists is, as you pointed out in your video, that styles via artist prompts are not universal (the missing cellphone example in your video).
I also noticed that adding "in the style of artist xyz" can influence who is portrayed in the image if it's a person.
Also, some artists are better represented in different token configurations. For example, some work with the phrase "in the art style", some with "style by", some with "art by", and so on.
Bottom line: SD1.5 is currently a mess, and I wish the community would move on to developing tools and extensions for SD2.1.
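One way to test which phrasing a given artist responds to is to generate the variants mechanically and render each one. A small sketch (the template list and function name are my own, nothing SD-specific):

```python
# Generate prompt variants to A/B-test which artist phrasing a model
# responds to ("art by X" vs "style by X" vs "in the art style of X").

ARTIST_TEMPLATES = [
    "art by {artist}",
    "style by {artist}",
    "in the art style of {artist}",
]

def artist_variants(base_prompt, artist):
    return [f"{base_prompt}, {t.format(artist=artist)}" for t in ARTIST_TEMPLATES]

for p in artist_variants("a castle on a hill", "Greg Rutkowski"):
    print(p)
```

Feeding the variants through an X/Y/Z plot with a fixed seed makes the per-phrasing differences easy to compare side by side.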
This is awesome
I always knew patrick was a secret genius.
Thank god someone did it already; I was about to do this stuff myself, among other things. xd
I think you should post your stuff on Reddit; it would help more people.
I'm glad you enjoy the videos. I usually post on Reddit when I upload new videos but usually they're not super popular there. My Reddit username is the same as my channel name.
Use Latent Couple if you want to put different things in an image.
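For context, Latent Couple works by denoising with a separate prompt per image region and blending the results with masks. A toy NumPy sketch of that core idea (random arrays stand in for the per-prompt denoising outputs; this is not the extension's actual code):

```python
import numpy as np

# Toy illustration of region-masked blending, the core idea behind
# Latent Couple: each prompt's denoising output only affects its
# assigned region of the latent.

h, w = 64, 64
latent_a = np.random.randn(h, w)  # stand-in for the "subject A" denoising output
latent_b = np.random.randn(h, w)  # stand-in for the "subject B" denoising output

mask_a = np.zeros((h, w))
mask_a[:, : w // 2] = 1.0       # subject A owns the left half
mask_b = 1.0 - mask_a           # subject B owns the right half

blended = mask_a * latent_a + mask_b * latent_b
assert blended.shape == (h, w)
```

Because each subject's prompt only ever influences its own region, the concept bleeding discussed earlier in the thread can't cross the mask boundary.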
So when describing an anime character, it'd be the person, their clothes, and their hair?
Thanks, my man. I hope the nerds "fix" this pain-in-the-ass liability of the code.
Great vid... but after a few months of playing with Stable Diffusion, I think it should have been named "unstable illusion"... it has no idea what biological creatures are... like asking a Martian to describe a human... the key word is "artificial"... the misnomer is "intelligence"... artificial means fake, intelligence means the ability to think progressively... a 5-year-old can describe anatomy better than Stable Diffusion... to A.I., we humans are nothing but static... I still enjoy messing around with it because it truly is in its infancy and holds promise... but I'm sorry, this does not compute... lol