Saying thanks won't be enough for your hard work. You are a true legend. I started learning this a couple of weeks back, still learning the basics..
You're the person who presents what we're looking for while we're still trying to understand and think. Thank you, Matteo.
We are eagerly waiting for the IP adapter :)
Everyone's praising you already but I gotta do it too. Your videos are of really high scientific quality. I subbed to your channel and watched some vids a while ago, but lately I "rediscovered" them and I've been watching many for the 2nd and 3rd time. As I'm training, learning and gaining experience, the same video can provide different information, depending on my level.
I am having the same experience. What a difference time and experience make!
I would like to express my gratitude for the exceptional work that has been completed. I wish you good health and strength on this difficult but very fun path.
One of the best vids out there on the use of ComfyUI and Flux! Thank you.
The man just presents small tricks casually; I had to pause the video every few seconds and try to learn.
🎯 Key points for quick navigation:
00:00:00 *⚡ Testing Flux Model with Various Parameters*
- The speaker discusses challenges in achieving consistent results with the Flux model.
- Tested combinations of samplers, schedulers, guidance, and steps on multiple setups.
- Initial findings indicated a complex model behavior with variable results.
00:05:05 *🖼️ Optimal Number of Steps for Image Convergence*
- Portraits need up to 40 steps for clear and detailed convergence.
- Different prompt lengths didn't significantly affect convergence.
- Simple prompts might stabilize quicker, but detailed prompts need more steps.
00:09:08 *🔄 Evaluating Samplers and Schedulers*
- Multiple sampler and scheduler combinations were tested across different step thresholds.
- Certain samplers and schedulers better handle noise or detail at fewer steps.
- No specific sampler outperforms others universally; it's subject-dependent.
00:11:16 *🎨 Handling Illustrations and Style Adjustments*
- Illustrations vary significantly in style across samplers and schedulers with step adjustments.
- Higher steps shift illustration style from clip art to detailed digital art.
- Illustration outcomes are subjective and vary with personal taste and required detail.
00:13:42 *🧭 Guiding Principles for Adjusting Guidance*
- Default guidance is reliable but can be adjusted for better results depending on the artistic need.
- Lower guidance can lead to washed-out styles which might be desirable in certain scenarios.
- Overuse of guidance might lead to hallucinations, especially in text representation.
00:16:21 *🌀 Role of Shift Parameters in Image Quality*
- Base and Max shift impact noise and clarity, with distinct variations across image sizes.
- High shift might introduce noise but also increased detail; ideal around default values.
- Shift can subtly enhance or negatively affect details and artifacts.
00:19:18 *🔍 Experimenting with Attention Patching*
- New Flux attention patching technique allows more control over image blocks.
- Adjustments to query, key, and value can lead to varied artistic results.
- The exact function of each block remains complex and exploratory.
00:23:12 *💬 Conclusion and Future Potential of Flux*
- Flux is intricate, often offering rigid results but providing potential within certain domains.
- Current limitations exist, awaiting enhancements like better adapters.
- The model holds promise, yet practical utility requires further development.
Thanks for doing such thorough research and reporting on it in such detail. And thank you for mentioning that existing IP Adapter. Seeing how quickly it was released and that you weren't involved made me not even bother trying it, guess I was right to assume it wasn't very good. Waiting for the real deal as long as it takes ♥
Very informative, thank you! It gave me a lot more understanding of what's under the hood and what all the parameters actually do haha
The most useful video I've watched so far about Flux and Comfy. Thanks.
Never miss a video by Matteo! thank you for sharing your research!
You’re incredible! So much work, well analyzed and perfectly presented! This has brought me so much new knowledge I have to think about. I just love your content!
Great video! 14:40 I've been using 1.0 for all my training samples and I couldn't figure out where I was going wrong. Thanks!
Fantastic as always. Thank you!
Wow sir. Wow.
You are a special kind of angel.
As someone who uses your *_truly essential products_* in their workflow I am in *awe* of you!!!
What AN ANGEL onto this community you are!!!
I am trying to get past the flood of basic videos flooding YT...
You are THE SIREN SONG captain wearing your blindfold leading the space forward...
Absolutely *_UNFATHOMABLE_* CREDIT GOES TO YOU!!!
YOU ARE AN *ANGEL!*
Thanks for this video so much. I've been wanting to compare values for image generation somehow but hadn't found a good way yet. This is perfect!
Thanks once again. I just love the way you humbly and honestly share your knowledge and experiences with the community. This is truly appreciated.
I'm throwing a celebratory party when you release IPAdapter for Flux, there will be cake.
save me a slice!
@@latentvision the cake is a lie
@@Wodz30 noooooooooooooooooooooooooooooooooooooo!
I'll bring the digital cake... its endless!
Incredible work. Invaluable information as always! I was struggling with the exact same thing and then you post this absolute gem of a video.
the amount of work put into this video is mindblowing, thank you sir
Danke!
Hi Matteo, thanks for the second Flux video, your in-depth analysis and almost surgical tests are amazing. Thanks for sharing your knowledge with the community! 👏
OMG! You have done a gigantic job! This is really helpful, thank you!
thanks, I was cracking my head over the noise problem... in the end I just removed the noise with a latent SDXL upscale... now I know it's the max_shift / base_shift ratio 😁
Always look forward to your deep dives that teach us how to use the latest tech. While I don't understand most of it, you help me to understand enough to experiment. I'll add my thanks to everyone else's
you are simply the best. there is no one even near. GOAT
yes, your videos are always useful! thank you so much for explaining each parameter
Thank you Sir!!! Looking forward to the next video!!
I sort of did my own testing early on w/Flux and settled on 25 steps to be my go-to standard.
that seems a good standard
Thank you for your research on this!
thanks for doing these comparisons. good job
thanks for takin on that bull!
You are legend. Thank you for your hard work!
Thank you Matteo. That's one of the most useful videos I've seen about Flux.
I've certainly been interested in Flux since it got released. I wish I could do more, but unfortunately each image takes me about 5-10 minutes under normal circumstances, so it is hard for me to run mass testing like I did for SD1.5 and SDXL. This video presented me with many of the kinds of tests I would normally do, so thank you for that! :)
Incredible work, thank you for all this. I myself have been trying to figure out this model for almost a month, and my conclusions are about the same as yours: the model is very unstable, very specific, and you can't just work with it. The same applies to training: trainings seem successful at first glance, but once we start checking them in a real working process everything starts to fluctuate from side to side, depending on the seed, the scheduler, the number of steps, the image resolution and many other factors. Thank you again, I appreciate your work.
Thanks Matteo, such a lot of useful information, it will take me a while to process!
Thank you very much for the hard detailed work!
I appreciate all of the work that you put into this and everything else you do for this community. Thanks man!
Thank you very much for your "avant-garde" work!!!
Thank you so much for your content. Still a bit confused about shifts. And I'm a big fan of your essentials pack
shift kind of moves the target where the image converges. more shift will require more steps to denoise but if you add a little you may end up with more details (because the image has more noise)
@@latentvision Is this relation to max shift or base shift? (or both?)
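Both, as far as I can tell. My reading of ComfyUI's ModelSamplingFlux node (the real implementation may differ in detail) is that base_shift and max_shift are linearly interpolated based on image size, and the resulting value warps every sigma of the schedule toward the noisy end. A rough sketch, with the token math and function names being my own approximation:

```python
import math

def effective_shift(base_shift, max_shift, width, height):
    tokens = (width // 16) * (height // 16)      # Flux patchifies the image into 16px patches
    t1, t2 = 256, 4096                           # anchor sizes the two shift values are pinned to
    slope = (max_shift - base_shift) / (t2 - t1)
    return base_shift + slope * (tokens - t1)    # base_shift at 256 tokens, max_shift at 4096

def shift_sigma(sigma, mu):
    # time shift: a larger mu pushes each sigma of the schedule toward the noisy end
    return math.exp(mu) / (math.exp(mu) + (1 / sigma - 1))

mu = effective_shift(base_shift=0.5, max_shift=1.15, width=1024, height=1024)
print(round(mu, 3))                    # ~1.15 -> at 1MP the result is basically max_shift
print(round(shift_sigma(0.5, mu), 3))  # ~0.76 -> mid-schedule sigmas get pulled up (more residual noise)
```

If that reading is right, at a typical 1MP generation the interpolation lands almost exactly on max_shift, which would explain why base_shift seems to matter much less at that size.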
Hey Mateo, thanks for sharing as always!
Offtopic: how are you doing? I got to wondering if something changed in your voice or if it's just the mic. Anyway, just checking that you're good. I want to let you know I admire you, and your hard work provides tons of value.
Thanks for all you do, and let us know, the audience, whatever you need or want from us 🙂 you really rock
lol, thanks, yeah a bit of a sore throat, nothing too bad but it took a little more effort to talk
" I tested the hell out of Flux " xD that made me laugh so hard, thanks for the guide as always
Another Matteo video, I'm here for it
Fantastic work, Matteo.
Great, as usual ! Thanks a lot.
Thank you very much for the very useful video! Always learning a lot. Keep the amazing work. Looking forward for your ipadapter 👍
Argh! So many variables!!! I wish we could prompt an LLM with machine vision/clip interrogator so that we can add a prompt to what we want and then it would run these tests with all the variables we allow to be open, so that it would self calibrate.
Like for example if you had in your must-haves that the girl must wear a green jumper with leather straps and rose-gold buckles.
Then you put your prompt, and the LLM layer between you and Comfy would run tests and check the outputs, calibrating all these variables until it got minimal deviation.
Awesome video by the way, very good empirical approach. Thank you so much for your work and sharing these results!
Thanks Matteo! Made my day
Thanks so much for this video!
I haven't seen your videos in a while; great content, easy to understand as usual. I noticed your ComfyUI UI, so minimal, I like it, everything at the bottom. Is that a theme?
hey thanks! it's the new beta menu, you can activate it from the preferences
@@latentvision ohh, how did I miss that, thank you
Thank you sir!
Thank you for your great research!
Thanks so much, you are a hero.
Thx a lot for this extensive review 😎
this laboratory was... amazing! Ty a lot!
Great video and research, thanks! There are many unsolved mysteries around Flux. For instance, it struggles with some styles, although it can generate them perfectly under certain circumstances. Also, some quantizations perform better for certain tasks... The Dev model also seems to understand prompts differently, and so on.
styles work relatively well with very short prompts, as soon as you add more tokens Flux goes ballistic and steers towards the same few styles it likes
@@latentvision True, at some point the model gets completely stubborn (or 'rigid'), forcing you into a couple of very similar styles. You can somewhat circumvent this in img2img (so an IP adapter could help here I guess); it is clear the model is capable of variety. Using very short prompts is not always an option. Also, I am using various models and quantizations for tests, and some even feel like completely different models and require a different approach too. I will dig into this more.
@@sandnerdaniel I might be wrong but I feel the position of the keyword matters; it feels strongest at the beginning of the prompt. I also tried repeating or adding synonyms to try to get out of the stubbornness, not sure if it's doing much but it seems to help
Thanks for the research and video. Very interesting 😊
very good experiment! I will go back to this when i try flux next time
This guy knows how to do a Design of Experiments
Thanks for sharing. We were just about to walk this road too.
You know FLUX inside out, fantastic work. 👍💯
thanks for this advanced information, waiting for the IPAdapter to celebrate
can't wait for faceid2 flux - you are the best! ;)
best 6 dollars I've ever spent thank you.
I am always grateful for the wonderful videos. There is one thing.
🥰
Great job as always!
insanely good video
Great video, as always.
I originally did the research on using Clip Attention Multiply, which Sebastian presented in his video, as a way to improve image quality and prompt following. On average it had a measurable positive effect on the number of generated images that improved with this trick.
After watching this video I also did a new matrix and compared the original image not only with images that were generated by only "boosting" Clip-L but also with images that were generated with "boosted" T5 conditioning through your attention seeker node.
Once again I saw a slight increase in number of improved images when only changing the attention for Clip-L (14%), but a higher number of images that got worse when changing the attention for T5 (30%). So my conclusion is to use Clip Attention Multiply to only change the Clip-L attention, but leaving T5 untouched (with only Clip-L "boost": 50% good 50% bad / with T5 "boost": 40% good 60% bad).
In both cases (changing only Clip-L attention vs also changing T5 attention) there were also a number of images where it made the resulting images worse to do that, but in the latter case the chance to make it worse was twice as high as in the former case (14% when only changing Clip-L vs 30% when also changing T5).
T5 has a higher impact on the image, it's hard to quantify "better" or "worse". Depends on your subject and your desired result. The attention seeker lets you target single blocks, the multiplier goes ballistic on all. It's more complicated to use but more precise. Up to you, use whatever works for you.
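For anyone wondering what the "boost" in the exchange above actually does: conceptually it just multiplies the query/key/value (and output) projection weights of the text encoder's self-attention layers. A minimal sketch of the idea, keeping T5 untouched as suggested; the key-name patterns here are assumptions for illustration, not the actual node code:

```python
def boost_clip_l_attention(state_dict, q=1.1, k=1.1, v=1.0, out=1.0):
    """Scale CLIP-L self-attention projections, leave everything else (incl. T5) alone."""
    scales = {
        "self_attn.q_proj": q,
        "self_attn.k_proj": k,
        "self_attn.v_proj": v,
        "self_attn.out_proj": out,
    }
    boosted = {}
    for name, tensor in state_dict.items():
        factor = 1.0
        if "t5" not in name:                     # only touch the CLIP-L branch
            for pattern, scale in scales.items():
                if pattern in name:
                    factor = scale
                    break
        boosted[name] = tensor * factor
    return boosted
```

In practice you would do this with the nodes discussed above rather than by hand; the sketch is only meant to show which weights end up being scaled.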
Very nice. Thanks for sharing your research.. yet again.
mateo you are the greatest
no, you are!
amazing video thank you a lot. It came just as I needed to test my latest flux lora
thank you for your hard work ♥
Always a pleasure. I have to ask, when generating the 6 images, are you using the 6000 or the 4090? I ask because I have a 3090 which in comparison crawls along!
the 6000 was used only with the huge generations (over 60 images at a time)
In the halls of eternity it is murmured that once the legend delivers IPAdapter for Flux, Humanity shall reach Enlightenment. :D Thank you for your work.
Feeding long sentences to Clip-L destroys Flux's performance. Run a test with the Flux dual prompt node, leave Clip-L empty, and compare the results. The quality boost is staggering.
And then, forget that and go grab the awesome finetune of Clip-L from HF, and you'll get slightly better performance than with leaving Clip blank.
Thanks for the data! ❤
Your point is not very accurate. If you have used an LLM in ComfyUI to expand the prompt, you will find that the more prompt words you use, the closer you get to the picture details and composition you need. So long sentences will not destroy Flux's performance; high-quality long paragraphs let Flux perform even better.
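If you want to run the comparison suggested a couple of comments up, something like this should work. The node and file names are my assumptions about the current ComfyUI API, so adjust as needed; in the graph UI it is simply two CLIPTextEncodeFlux nodes, one with the clip_l box left empty:

```python
# Sketch only: assumes ComfyUI's DualCLIPLoader and CLIPTextEncodeFlux nodes
# and that the listed model files exist in models/clip.
from nodes import DualCLIPLoader
from comfy_extras.nodes_flux import CLIPTextEncodeFlux

clip = DualCLIPLoader().load_clip("clip_l.safetensors", "t5xxl_fp16.safetensors", "flux")[0]
encode = CLIPTextEncodeFlux().encode

prompt = "macro photo of a rusty pocket watch on wet moss, soft morning light"

cond_both = encode(clip, clip_l=prompt, t5xxl=prompt, guidance=3.5)[0]  # long prompt in both encoders
cond_t5_only = encode(clip, clip_l="", t5xxl=prompt, guidance=3.5)[0]   # CLIP-L left empty

# sample with the same seed/sampler/steps using each conditioning and compare the results
```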
Thanks! I trained a lora for 8mm film which has a super high texture look that is very hard to steer flux towards. I found the only way to get accurate film grain for an 8mm look was with a max shift of 0 and base shift of 8. It seems that high base shifts are good when you want a more analog / degraded look.
Thanx!
Amazing job! How do you feel this affects your choice between models? It seems there is still a solid place for SDXL at the moment for flexibility and control of style, though at a loss for prompt adherence and AI artifacts. How would you describe the use cases for the top models currently? (though I realize there is still so much to understand about Flux)
that depends on the use-case. it's impossible to answer in a general way. For example if you do animations with animatediff you are still going to use SD1.5. Probably the main problem of flux at the moment is the hw requirements in a local environment. The entry point is pretty steep. SDXL at the moment has a wider ecosystem of course, so we will be using it for a while still
Fair point. Most of my use with Flux has been limited to RunPod on an A40. A lot of my focus has been on exploring environmental, architectural, and interior design prototyping and concept inspiration. I’ve been trying to keep up with the Flux hype cycle, but poor ControlNets and currently available tools have slowed the discovery process. However, experimenting with the Flux model has been enjoyable, especially with its prompt adherence and seemingly fewer AI artifacts.
Your IP Adapter videos have been immensely helpful in becoming comfortable with SDXL, which I find myself returning to for speed, comfort, and control.
Thanks for all you do!
very practical
best compliment!
quality content as always from Matteo, not everyone understands how good this video is.
Matteo, you are an effing madman! 💚
I know... and what you see on screen is like 5% of all the images I generated for this research
@@latentvision Thats Insane! Thanks for all the work and info, man! It helps a lot.
Hey @Mateo can you upload this workflow?
nice video!
Thanks for that! Did you try the guidance limiter nodes? I don't know if it's a placebo, but I sometimes get really good results.
Great information. Is there an sdxl node for the sampler parameters?
yeah I'm thinking of doing that
When you use the "clip text encoder (prompt)" does it use the t5xxl or clip_l? The original workflow I found uses a ClipTextEncodeFlux node that has a text box for Clip and a text box for t5xxl, but it doesn't seem to make any change when I use one or the other.
it uses both
I was missing you 🙏.
Hey Mateo, great video. I can only imagine the work that goes into this.
I have a question: I am experimenting with DEIS (sampling method) and KL Optimal (scheduler) in Forge and it does give me stunning results compared to others. But then I slightly change the prompt and it suddenly looks terrible. I feel FLUX is very sensitive to certain keywords and changes the output style really randomly compared to SDXL, for example. I wonder what it is that makes FLUX so unpredictable. People say FLUX likes long prompts, maybe... but at a cost that I find too high.
Keep making more videos please🎉🎉
thanks for the awesome job, that's amazing! Could you address Flux upscaling (latent and model) and possibly some other methods (tiled diffusion etc...) in your next video? It seems like Flux behaves differently compared to other models.
yeah I guess we need to talk about upscaling sooner or later :P
Hi, do you know if there is any method to plug multiple LoRA loaders into the Flux Sampler Parameters node? I want to test different versions of a LoRA. I can see there is a lora params connection but I don't know what to do with it.
Thanks Matteo! :) Is there a way to download the rendered images so I can take a look? Also, any idea how to do this comparison but with LoRAs? I want to try a few prompts against some LoRAs and compare the results...
they are in my discord
@@latentvision all right, I joined. But where to look, there are many channels :)
@@andreizdetovetchi ping me
The Euler sampler actually adds noise back in after each step, and thus denoises more than 100%. If you have it set to a simple schedule with less and less noise added back in, that would explain the "tiering" convergence you were seeing in the first part of your video. Did you try with a more deterministic sampler?
Thanks Matteo, great job. I learned a lot by following your videos, it was a bit complicated at first but now I understand many things.
Do you think it is possible to improve the style matching using the method explained with individual UNet blocks once an IPAdapter becomes available?
it's not a unet of course, but yes I believe it should be possible
It would be great if you could talk about flux style training. It's a subject we don't see much of, and never treated as well as on your channel.
Hello, thank you for your research, it was interesting. It would be nice to add tests at different sizes; according to my observations, there are differences in composition at 1, 1.5 and 2 megapixels.
true, but you know... it takes time to repeat everything at higher resolutions. I would hope what is valid at 1M is also valid at 2M
Hi Matteo! Thanks for the video, it was interesting! I have an unrelated question though. I think many Stable Diffusion users want drawn-character consistency, but it is really hard. Are you aware of any model or library creators who have tried to address this problem directly? Maybe you can explain why it is so hard to create an IP-Adapter variant that would make images based on one example of a drawn character, just like instant-id does for faces? Do you think it's even possible within Stable Diffusion?
Can you give me the workflow you use?
I'm new to ComfyUI and wanted to use "plot sampler parameters" in an SDXL workflow, but after googling and searching for an hour I cannot find anything that outputs a "params" noodle... I need help.