Very helpful, thank you!
And congratulations on the new house!
Thank you too!
Oh, thank you! That was really helpful. I was doing the exact opposite, trying to be as short as possible by just adding keywords. Now I understand why I was struggling to get what I wanted. I was sure there was something wrong in my prompts, and now I know what to do!
Glad it was helpful!
Can't believe I've never used the Flux text encode. Thanks. I'm curious what results it gets with ControlNets and LoRAs, though. Do you put LoRA keywords in the top style CLIP box?
Also, using a 4070 Ti, man, I feel ripped off on VRAM for the big price tag in AUS. I'll have to see how long mine takes, as I feel like it's 60-120 seconds. Do you use low VRAM mode?
Hello, yes, you put any keywords in the top box. However, the workflow for the Flux model changes quite rapidly. If you are using the fp8 simplified version, there will be only one CLIP Text Encode node (the normal default one, the same as for SD1.5 and SDXL).
As for VRAM, it defaults to normal VRAM mode.
Great videos! Love the depth.
Where did you get the resource monitor under the Queue Prompt button?
Thank you! The resource monitor is a custom node pack named "Crystools". You can install it from the Manager.
I enjoy your channel, but here I think you perhaps aren't being quite analytical enough. Take the long LLM prompt, for example: how much of it was responsible for the image, and how many of those words were having a significant effect? You can test this by removing classes of words systematically: first the pronouns, then the adjectives, then the verbs, etc. When assessing, you must be careful not to mix up "different" with "better". The test should be whether the image has improved in quality, not whether it has changed. I know this is subjective, but a rough assessment is not too hard. With Flux there is great variation from seed to seed, so it is easy to get a random very high-quality image that might not reflect the average output. Testing across a range of seeds is therefore important.
Thanks for bringing this up! My primary goal was to demonstrate that LLMs can be integrated into the image generation process, but I completely agree that for a thorough analysis of how LLMs affect the output, more detailed testing is necessary. This kind of testing, as you suggested, would be valuable not just for LLM-generated prompts but even when manually crafting them. The advantage of using LLMs is that they generate prompts for you, so there's often no need to tweak the output unless you're aiming for a specific result or planning to use the result of the tests to fine-tune a model specifically for prompt crafting.
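For anyone who wants to try the kind of systematic test described above, here is a minimal Python sketch. The word-class lists are illustrative (a real test might use a POS tagger such as spaCy to classify words automatically), and the prints stand in for your actual pipeline call:

```python
# A rough sketch of a word-class ablation across several seeds.
BASE_PROMPT = "a weathered old fisherman slowly mends his tattered net at dawn"

# Illustrative word classes to remove one at a time.
WORD_CLASSES = {
    "adjectives": {"weathered", "old", "tattered"},
    "adverbs": {"slowly"},
    "pronouns": {"his"},
}

SEEDS = [1, 2, 3, 4, 5]  # several seeds, to average out seed-to-seed luck


def strip_words(prompt: str, words: set[str]) -> str:
    return " ".join(w for w in prompt.split() if w not in words)


for name, words in WORD_CLASSES.items():
    ablated = strip_words(BASE_PROMPT, words)
    for seed in SEEDS:
        # Replace these prints with your generation call, then judge
        # whether the image *improved* in quality, not merely changed.
        print(f"seed={seed} full:     {BASE_PROMPT}")
        print(f"seed={seed} -{name}: {ablated}")
```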
I agree. Using LLM prompting is convenient and fast, but those long prompts can get out of hand and make it difficult to tweak the result. If you become too dependent on LLM prompts, you sometimes can't get exactly what you want. I have also run into problems with the LLM refusing to write a prompt because it doesn't like what you're asking for. Even just a model posing at the beach will sometimes trigger it to go into censorship mode.
@helveticafreezes5010 Well, you still pretty much have only 70 tokens; more than that and it begins randomly ignoring words.
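If you want to check where a prompt lands relative to that limit, the Hugging Face CLIP tokenizer makes it easy to count. CLIP-L's hard context window is 77 tokens including the start/end markers, so roughly 75 are usable for actual words:

```python
from transformers import CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

prompt = "a weathered old fisherman slowly mends his tattered net at dawn"
ids = tokenizer(prompt)["input_ids"]

# The count includes the <|startoftext|> and <|endoftext|> special tokens;
# CLIP-L truncates anything past its 77-token context window.
print(f"{len(ids)} tokens (limit 77, ~75 usable)")
```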
awesome
Great explanation! Question: can I set up ComfyUI on an M2/16 GB MacBook? Thanks
Yes, you can, but you will have to use the CPU to generate the images, which will be slow. Here is the official installation guide: bit.ly/3T8Pbgu
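One thing worth checking before settling for pure CPU: recent PyTorch builds expose Apple's MPS (Metal) backend on M-series chips, which can be considerably faster than plain CPU generation. A quick check, assuming PyTorch is already installed:

```python
import torch

# On an M2 MacBook, the MPS backend may be available for acceleration.
print("MPS available:", torch.backends.mps.is_available())
print("MPS built:    ", torch.backends.mps.is_built())
```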
Great video, but what's missing is why there is a need for two different prompt boxes for CLIP and T5. What is the difference?
Thank you! The clip_l prompt primarily controls the style of the image (though its influence is relatively low), while the T5-XXL prompt has a stronger influence over the content and final outcome of the image. I went over this in detail in another video, which is why I skipped it here. Thanks for bringing it up!
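For anyone who wants to see the same split outside ComfyUI, the diffusers FluxPipeline exposes both text encoders: prompt feeds CLIP-L and prompt_2 feeds T5-XXL (if prompt_2 is omitted, the CLIP prompt is reused for both). A minimal sketch; the model, guidance, and step values are just typical dev-model settings:

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # helps on limited-VRAM cards like a 4070 Ti

image = pipe(
    # Short, style-oriented keywords -> CLIP-L (weaker influence)
    prompt="cinematic, moody lighting, 35mm film grain",
    # Full natural-language description -> T5-XXL (drives the content)
    prompt_2="A lone hiker stands on a rocky ridge at dusk, storm clouds "
             "gathering behind distant mountains, warm light on her jacket.",
    guidance_scale=3.5,
    num_inference_steps=28,
    generator=torch.Generator("cpu").manual_seed(42),
).images[0]
image.save("flux_dual_prompt.png")
```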