Update (Sep 4, 2024): There is an accidental cut in the video 2 seconds after 01:04:34. You have to add the ModelSamplingFlux node. Use the Get nodes to connect the Model, Width, and Height, as per this screenshot: drive.google.com/file/d/13jd_c9SwZw7Tza9T3N5XAW2ZB3XGlpBf/view?usp=sharing
Update (Sep 5, 2024): For Flux inpainting, the Preview Bridge node in the latest update has a new option called "Block". Ensure that it is set to "never" and not "if_empty_mask". This allows the Preview Bridge node to pass on the mask. If set to "if_empty_mask", the node will block the inpaint model conditioning input, as the switch default is set to SAM + Custom. I asked the dev to update the node so that the default behavior is always "never", and he has done so. Update the node again to the latest version.
Update (Sep 2, 2024): The LLMs used are very small models; higher variants are available as well. The "llava:34b" model performs best for vision but requires 24 GB VRAM; there is a 13B version for lower VRAM. Llama 3.1 also has an 8B instruct model in fp16, "llama3.1:8b-instruct-fp16", which requires about 17 GB VRAM. I have tested it and there are no issues with the workflow, as we unload the LLM as soon as it's done. Llama 3.1 also has 70B and 405B parameter variants, which cannot be run on consumer-grade hardware. If you require such models, you can run them via API; ChatGPT-4o or Gemini Pro (for non-human subjects) perform best for API usage. To use an API, just replace the node and ensure the connections are the same; the workflow will maintain the logic.
The video is quite long, and I had initially planned to include GGUF, but couldn't. ControlNet support for Flux was added in Comfy but was not stable at the time of recording, and more tests are needed anyway. I will probably make another video with ControlNet and GGUF if they warrant a tutorial on the channel.
ControlNet would be super cool in this flow. Nice work...
@SpikeColgate Yup, next video. Check the members post; I just revealed what's upcoming.
I just wanted to express my gratitude for your tutorials! Instead of simply providing workflows to download, you're taking the time to explain how each node functions. Thanks to your guidance, I'm starting to grasp how ComfyUI works, and it's making a big difference. Keep up the amazing work! thank you!
I'm new playing around with ComfyUI and image generation, and this is the first demo that really showed me the potential of the entire infrastructure. It helped my learning immensely to build this workflow out step by step, following along as you explained each piece. Bravo, well done, and I hope you continue to produce more such great content!
I managed to get the workflow fully assembled this weekend, and everything works great. This is the most interesting video I've ever seen on this topic. And thank you for helping me find the lowercase-letter bug in my workflow. Looking forward to more videos.
Wow! This gives me a headache, but a good one; I've learned so much in just this video. So many things seem so much simpler with just some basic knowledge that many other YouTubers seem to ignore in their tutorials. All that logic part is great for simplifying the workflow.
It was great to be able to understand most of it even though I've only been learning ComfyUI for less than a week.
The video was really fast, so I slowed it down to 50% to have more time to follow what you were doing.
Thanks a lot for all this knowledge, which surely took a lot of time to prepare and put into a video.
Finished! Thanks so much for this awesome tutorial. It's taught me a lot about ComfyUI that I can now utilise with other workflows. I haven't yet run the 5.04x upscale, but I have to say that the 2.52x upscale definitely softens the image compared to the 1.261x. I tried with a couple of different sampling methods and images.
Hi, the 1.261x upscale is designed to soften the image on purpose, because when you run the 5x upscale it balances out. If you don't want to run the 5x upscale, use the same upscale model as 1x Details for the 1.26x upscale; the result will be sharper.
Exactly what I was looking for. Good work!
great, I was looking for a good workflow for Flux, appreciate that you go through the whole thing
Thanks!
Seems like every time I get a little more comfortable in Comfy, a new development comes along to make us restart again T_T
Thank you so much for this, learning about all brand new things in AI image generation is not so easy!
That's right. I stuck with SD 1.5 till now, started using Pony, and got into Flux... now I'm brain dead.
@Kvision25th Same.
My man, I have been using ComfyUI for months trying to figure out what goes wrong here and there... This was an excellent example of how we can use the visual scripting of ComfyUI, like a master class! Thanks a lot for your contribution to the community!
I was having trouble with pose style transfer with all the ControlNets, but this is working like magic most of the time, and the upscaler! I was stuck on 4x and my computer was on fire! Now I can get 5x, faster! Thank you so much. You rock!
Btw, what do you think about SUPIR? I think it gives me better results for vector-style images. What do you think about using Ultimate SD and then SUPIR?
SUPIR is no good with Flux if the image has text; otherwise SUPIR is great. However, you cannot go 5x in SUPIR. I wanted to change the approach, as I figured text would be needed by many people. To be honest, the current upscale I showed has limitations. I'm in the process of making another video, which will be part 2. We managed to apply a noise injection and blend method with our own node, which gives you very realistic skin and textures after upscaling, with no plastic/velvet-looking lines. It works for other types of images as well, like landscapes; basically any part of the image that has overly soft lines. And in this method there is no Ultimate SD Upscale; we skip that and still get a 5x upscale.
Insane, bro... that is the largest logic (workflow) I have ever seen.
Absolutely fantastic tutorial, learned a lot here. Finally got some image/vision stuff set up thanks to it, much appreciated.
With the 1.261x upscale, an alternative is to downscale by 0.3125 instead of 0.315. The math for this is the desired scaling (1.25) divided by the upscale done by the model (4), so you get 1.25/4 = 0.3125. If you had a height of 1024 and you scale up by 4 (4096) and then down by 0.3125, you get 1280, the same as 1024 * 1.25. The problem of how to get a float constant with that precision is solved by putting in a math node with an expression of 1.25 / 4.0 and using the float output.
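As a quick sanity check of that arithmetic (plain Python as an illustration, not a ComfyUI node):

```python
# Downscale factor = desired overall scale / upscale factor of the model.
desired_scale = 1.25
model_upscale = 4.0
factor = desired_scale / model_upscale   # 0.3125

height = 1024
print(height * model_upscale * factor)   # 1280.0, i.e. the same as 1024 * 1.25
```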
Lol, I know. Thanks for the input; will see what I can do for video 2. Will create a node to accept that constant and expression if necessary. I did not go that route since it was 1.261x and there was no harm in that; it was working with Flux with no scan lines.
Amazing tutorial mate! thanks a lot for all the effort 🏆
Amazing detailed video, thank you!
I've learned a lot! Good and complete video! Congrats!
Finally got to implementing all the bells and whistles. I added a small section with the main Flux configurations like guidance, steps, sampler, scheduler, and seed, extracted from the complex flow. Also, I have a 12 GB 3080 Ti GPU, so GGUF was a must for me... I'm able to use Q8 with both your recommended LoRAs at about 3 to 4.5 sec/it on EULER/BETA, so I'm really happy with it. As you said, I don't think the GGUF switch merits another video; it's just replacing the Unet Model Loader and the Dual CLIP Loader with the GGUF versions, and that's it. At least I hope so. I'm really happy with the results. Thanks a bunch!
Was just about to sit and watch this and saw that Black Forest Labs have released their own tools (Fill, Depth, Canny, Redux). Hope you do a tutorial for them :)
Yeah, those are different. Will explain them in a tutorial. However, I highly recommend you watch this one to understand the logical flow.
@controlaltai Yes! I am excited to see what they can do. While I'm grateful for the efforts of other groups who have been making canny and depth, inpaint, and IPAdapters, I have not been impressed; I have much higher expectations for these. I took a brief look at the ComfyUI site and it looks like they have native support and a few examples to play with.
I have done a Union Pro tutorial. The next video is about Flux region; we made custom nodes for that, a very cool concept. After that video I will make a full workflow tutorial using these tools as well. We can do some amazing restoration and conversion using these tools and Union Pro.
I'll be sure to check them both out. Thank you! :)
A quick note:
There is an Impact Logical Operations node that you can use for most of the Boolean logic; it reduces the complexity of the switches.
How? Please elaborate; some switches are manually controlled by the user.
@controlaltai As I saw in your tutorial, you are using a Select input to select elements. In practice you have 3 switches:
Image LLM
Text LLM
Modify Image LLM
Using logical operations you get
(Image LLM && Text LLM) || Image LLM
(Image LLM && Text LLM) || Text LLM
Modify image LLM is only valid if you go that Route.
All actions are switchable by 3 boolean buttons.
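For illustration only, here is a minimal sketch of the three-toggle gating this comment describes, with purely hypothetical flag names (as the reply below notes, the actual workflow exposes only two user choices):

```python
# Hypothetical three-toggle gating; names are illustrative, not actual node names.
def select_conditioning(image_llm: bool, text_llm: bool, modify_image_llm: bool) -> str:
    use_image = (image_llm and text_llm) or image_llm
    use_text = (image_llm and text_llm) or text_llm
    if modify_image_llm and use_image:
        return "modify image LLM"      # only valid when going the image route
    if use_image:
        return "image LLM"
    if use_text:
        return "text LLM"
    return "custom conditioning"       # all LLM toggles off

print(select_conditioning(True, False, True))   # -> "modify image LLM"
```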
Can you please email me this integrated into the workflow? I'm interested to learn something new, because I am still confused. There are only two switches, not three: Image/Text/Disable LLM (one choice) and Modify (second choice). If Image, then Modify; otherwise always ignore. So the user has only two options; Modify is a choice and not always applied. (mail @ controlaltai . com without spaces)
I learned something new today. Thanks!
You make the best AI tutorials in the world.
Could you help me? I can't find the .json file for that giant workflow you showed at 2:00.
I build that giant workflow in the video. The whole video is about how to build that workflow.
That flow is ridiculous! 😂😂😂
Hmm, I can't find the Florence2Run node at 55:15. I have Florence-2, but that specific node doesn't show up.
That node comes from SAM2, I think. Ensure you have all the mentioned custom nodes installed as per the requirements section.
@controlaltai Thanks, it was the ComfyUI-Florence2 node from kijai that I had missed.
This is amazing, not only for the final result, but as a learning tool. I have learnt more from this than any other process in ComfyUI. Thanks!
I think I'm missing something: at 1:00:57, how did you pause the workflow between the 1x image generation and the inpaint portion? There seem to be a couple of icons on your groups that I don't have in mine (edit: found them in the rgthree config), but how do you do it in a way that, after the 1x image generation, you no longer go through that portion of the process and only focus on the inpaint part? Maybe fixing the original seed is enough? Maybe this is an extra module, feature, or checkbox I have no idea where to find? Thanks for any insight.
Thanks. That's a new setting in rgthree. Go to the rgthree settings and tick "show fast toggles in groups". After this, look at the right end of a group's title bar; you should see a group mute and a group bypass button (they are very faint).
For pausing, I mute the KSampler node. Since it's muted, the workflow won't progress beyond that point. Ctrl+M after clicking the node is the shortcut key.
@controlaltai Thanks! Maybe I explained myself poorly. How do you make it so that you first get the 1x image and then work quickly on the mask, without the main Flux sampler running over and over again? Do you set the seed to fixed? Or is there a way to just "continue" from where it left off without fixing the seed? Thanks.
Okay, so that depends on your workflow build. ComfyUI progresses in a linear way, say from point A to point D. Now if you stop at point C and don't make any changes to A or B, you can keep working on C while D is disabled. Once you finish work on C, you enable D and get the output. If for some reason you are at C and, without making any changes, it starts again from A, then something is changing at A. The seed should be fixed. Does this explain it?
@@controlaltai Completely, thanks. Makes perfect sense.
Nice downloading now
At 28:02 you said you would include it in the description, but I think you've forgotten to do that. I've included it here:
Objective: Modify the given text prompt based on specific change instructions while preserving all other details.
Read the Change Instruction: Identify the specific change(s) to be made as provided in the next line.
"Reimagine the room decor as a kids room who loves space and astronomy"
Implement the Change: Apply the specified modification(s) to the text.
Preserve Original Context: Ensure that all other aspects of the text, including descriptions, mood, style, and composition remain unchanged.
Maintain Consistency: Keep the language style and tone consistent with the original text.
Review Changes: Verify that the modifications are limited to the specified change and that the overall meaning and intent of the text are preserved.
Provide Response: Just output the modified text. Ensure your response is free of additional commentary.
I rechecked; it's already included in the description.
@controlaltai Maybe I'm just blind 😅😅
@gohan2091 It's easy to overlook; there is so much going on here. I had forgotten a couple of things as well.
There is a cut at 1:04:36 where you left something out. You added a ModelSamplingFlux node and some other nodes, but you don't go back and explain it. Can you tell us what happened? Which model do you drop in there?
Hi, thanks for pointing this out; I will add this information to the pinned post. It is an accidental cut. You have to add the ModelSamplingFlux node and use Get nodes to connect width, height, and model as shown in this screenshot:
drive.google.com/file/d/13jd_c9SwZw7Tza9T3N5XAW2ZB3XGlpBf/view?usp=sharing
Amazing and not much spaghetti :D
48:45
"...image right at the beginning of the workflow, add an Impact integer node and set its value to one, connect it with the..." Shouldn't the integer be set to 2?
Default is 1. When it's 1 img2img is off. When it's 2 img2img is on. Later on when showcasing how to use img2img, I mention the same. By default you want it off, unless you specifically need to use img2img.
@controlaltai Very nice tutorial. I have learned a lot; thanks for your time and effort.
Once again, thanks so much for this. I am learning so much. I still have a few questions; if you would extend your generosity, I would like to pick your brain: if the 2.52x step renders too soft an image, what would be a reasonable replacement for the RealVisXL v4.0 checkpoint used in the upscale? The 1.25x renders consistently perfect results, but I find the 2.5x (and consequently the 5x) a bit too soft. Thanks once more.
Hi, welcome. The 5x is built on the 2.5x, so if the 2.5x is sharp, the 5x will be sharp. Do this: skip 2.52x. In the 5.04x group, change the input to 1.26x in the Get node. Disconnect the Upscale Image By node where we downscale the image by 0.5. What this will do is take the 1.26x image, which you are finding sharp, and upscale it 4x without downscaling, skipping the SDXL input entirely. Alternatively, if you want to maintain the same flow, try this: change the upscale model in the SDXL stage. Use the same model used in 1x Details, not the one in 1.26x. This may, however, over-sharpen the image, but it depends on the image. This should have a more prominent effect than changing the checkpoint. Still, try Juggernaut or epiCRealism XL Kiss. Let me know which solution gave you the desired results; I'm curious as well.
@controlaltai Thanks for this well thought out reply. I need to do a bit more testing, but going directly into 5.04x without the 2.52x and without the 0.5x really yielded unusable results. It's too much of a direct step... realistic photos almost look like a painting.
As you foretold, changing the model to the Upscale Model made the biggest difference... I see your point about over-sharpening, but I think I like it, depending on the scene.
Changing the checkpoints has much less of an impact, but it is still clearly noticeable on further-away faces. I believe epicrealismXL_v8Kiss yielded the better results for me (hopefully that was the one you recommended? I'm not sure I was able to match your suggestions 1:1, especially with the Juggernaut, since there are just too many named the same).
Will do some more testing for sure, but I believe that now that I understand the impact of each stage better, it will be easier to start making finer adjustments.
Thanks once more!
We faced the exact same problem when going from 1.26x to 5x directly. In fact, I was not satisfied with the in-between 2.5x either. The faces come out realistic, like real skin, but become plastic during the upscale process. A clear-cut solution would be latent upscale, which is avoided here for plenty of reasons: text and Flux cannot do a 1.5x latent upscale without causing scan lines when using any LoRA. So we came up with our own solution, and to simplify it for the user we ended up creating a new node. Part 2 of the video will cover this. For you it will simply be a matter of enabling a setting called Preserve Details; everything will happen automatically in the workflow, and the faces will be ultra realistic like the 1x Details output, just at 5x upscale, with no plastic lines or texture. The node sits right between 1x Details and the 1.26x upscale. To explain it simply, the node injects Gaussian noise into the 1x Details pixel output and blends it using a soft-light technique before passing it to the 1.26x KSampler. Since there is noise injection, when it is enabled the updated workflow disables the 2.52x pixel upscale. The results are quite satisfactory. Stay tuned; this will take some time to release.
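For the curious, here is a minimal sketch of the noise-injection-plus-soft-light-blend idea described above (not the author's actual node, just the general idea; it assumes float RGB arrays in [0, 1] and uses a common soft-light formula):

```python
import numpy as np

def soft_light(base: np.ndarray, blend: np.ndarray) -> np.ndarray:
    """Common soft-light blend of `blend` over `base`, both in [0, 1]."""
    low = 2 * base * blend + base ** 2 * (1 - 2 * blend)
    high = 2 * base * (1 - blend) + np.sqrt(base) * (2 * blend - 1)
    return np.where(blend <= 0.5, low, high)

def inject_detail_noise(image: np.ndarray, strength: float = 0.05, seed: int = 0) -> np.ndarray:
    """Blend mid-gray Gaussian noise into the 1x-detail output before the next sampler pass."""
    rng = np.random.default_rng(seed)
    noise = np.clip(0.5 + rng.normal(0.0, strength, image.shape), 0.0, 1.0)
    return np.clip(soft_light(image, noise), 0.0, 1.0)

# Example with a dummy 64x64 RGB image
img = np.random.default_rng(1).random((64, 64, 3))
out = inject_detail_noise(img, strength=0.08)
print(out.shape, out.min(), out.max())
```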
I'm new to Flux and keen to try your Flux Resolution Calculator node shown in your video. However, I notice that below 1.0 megapixel it currently only supports 0.1 and 0.5, but 0.6 is the ideal MP I need. Do you have a similar node that allows manual MP input (so I can enter 0.6), or is there any workaround I can explore? Thanks
No; I should update the node to give that option. If you want a workaround, you have to use a bunch of math nodes for multiplication and division and calculate the entire formula.
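As an illustration of that formula (a rough sketch, not the actual node; it assumes you want dimensions snapped to a multiple of 64, which Flux-style models generally prefer):

```python
import math

def resolution_for_megapixels(megapixels: float, aspect_w: int, aspect_h: int, multiple: int = 64):
    """Compute a width/height pair close to the target megapixels for a given aspect ratio."""
    total_px = megapixels * 1_000_000
    ratio = aspect_w / aspect_h
    height = math.sqrt(total_px / ratio)
    width = height * ratio
    # Snap both sides to the nearest preferred multiple
    return (int(round(width / multiple) * multiple),
            int(round(height / multiple) * multiple))

print(resolution_for_megapixels(0.6, 16, 9))   # roughly (1024, 576)
```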
Please include ControlNet, like Union or the best-performing ones, and also the role of MistoLine in Flux. Thanks.
I made a ControlNet Union node (compatible with InstantX). Part 2 of the video will be coming shortly, probably by next week. I don't like XLabs; will check MistoLine.
I respect how much the lecture contains. My computer has a 3080, so it took me a day to implement about two-thirds of your lecture. Is Hyper Dev possible?
I am working on another video to integrate GGUF. I am not sure about the quality. See, the basic requirement is that the quality should be better than SDXL; if you are using quantized models that give you poorer output than SDXL, then it's not worth switching. We need extensive testing. This video was released after nearly 20 days of very extensive testing. If the tests are fine, then probably in 2 to 3 weeks I plan to make a GGUF, Hyper Dev, and ControlNet integration for this base workflow. My advice is to wait for the video; basically it will have the same workflow logic, only for lower VRAM.
Thank you for your reply!! I will wait!!!!❤
While I've pulled the llava-llama3:latest model with Ollama, it doesn't show up in the Ollama Vision node, only llava:latest. I'm on Ubuntu 24.04.
ollama.com/library/llava-llama3 - here is the model; pull it as shown. It's a 5.5 GB model. After pulling, you have to close and restart ComfyUI, and ensure the Ollama app is open in the background.
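If you want to sanity-check the pulled model outside ComfyUI, here is a rough sketch against Ollama's local REST API (assuming Ollama is running on its default port 11434; "test.jpg" is just a placeholder image path):

```python
import base64
import requests  # pip install requests

# Encode a local test image for the multimodal /api/generate endpoint
with open("test.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llava-llama3",
        "prompt": "Describe this image in detail.",
        "images": [image_b64],
        "stream": False,
    },
    timeout=300,
)
print(resp.json()["response"])
```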
39:32 you "select input 3" to disable LLM. but your TEXT LLM is enabled. Why?
i set my workflow like yours (i hope), but my disable LLM do disable all LLM. (thats what it should do, right?)
Yes, 3 disables all LLMs and pushes custom conditioning. The Text LLM is not enabled. Pause at 39:35 and recheck: all LLMs are disabled; only custom conditioning is enabled.
@controlaltai I am glad to hear that. I was irritated because before starting the queue the Show Text node after the LLM was empty; after pressing Queue there is a prompt in it. Thanks for the response; I am fine. Btw, great vid.
I've been able to build it but am getting some issues. Is it possible to post screenshots of the node groups so I can compare and see what I did wrong (like the ModelSampling update in the pinned comment)?
Of course, sure. Send me whatever you have via email: mail @ controlaltai . com (without spaces). Will reply directly there.
At 22:16 he tells me to add a Get Node for Text Boolean, but there is no such thing in my dropdown menu. What am I doing wrong? :(
Hi, look at 17:53; I explain where to define the text and image Booleans.
There is so much to unpack here, for both experienced users and newcomers! I'm midway through building it, but I hit a brick wall with Layer Styles. I noticed when I was adding the Purge VRAM node that it was not coming up; then in the Manager it says "Import Failed". I tried everything to get it running: I uninstalled and reinstalled it through the Manager, tried to "Fix" it through the Manager, uninstalled it and then did a git clone into the custom node folder, and installed the pip dependencies inside the node folder... I'm just not sure what else to try... Any ideas?
Thanks!! Send me an email and I will try to see why the import is failing for you. You are on Windows with Comfy portable locally, right? Git cloning a custom node doesn't help. Install via the Manager, boot Comfy, and send me the entire cmd console text; the reason for the import failure will be there. mail @ controlaltai . com (without spaces).
thank you!
What if I don't get a window popping up for modifying prompts?
You won't get any window popup. To modify the prompt, you have to change the switch to Modify Prompt and enter the modification.
@controlaltai I am a layman, but I have grasped the whole workflow from the tutorial, and this is the only thing I don't understand: how do I do it? :P
I just explained it in the above comment. The tutorial is advanced; you need to be quite familiar with the basics of Comfy and the diffusion process.
@goodsniper7766 He goes over that part pretty thoroughly. You change the text within the quotes in the 'Modify Image LLM' node and change the switch to Modify Image LLM.
Just watching this video for the second time, I heard you say that if you are short on VRAM some LoRAs would not work. Do you mean that they are not loaded but don't cause a CUDA out-of-memory crash?
Comfy by default uses smart memory management. What this does is take up your entire VRAM and then load/unload models when it thinks they are not important. I think with Flux it's buggy. So say Flux takes 90% of VRAM and two LoRAs require 15%; Comfy unloads one LoRA and runs the whole thing without crashing.
I know of people observing the same behavior with Google Style Align: on 12 GB VRAM, when they do 8 images it breaks after 4. Later on we found out Comfy was clearing the cache in attention, which is required by Style Align. The only solution at that time was to reduce the batch size as per VRAM.
A CUDA out-of-memory crash is an actual crash. When I disabled smart memory, VRAM usage with Comfy went down to 80%.
@controlaltai So it would be better to disable smart memory?
I have a 12 GB 3080 Ti and it works fine even with fp16; it slows down sometimes and rarely crashes. I just clear the VRAM and then it's fine for a while. But I have noticed that sometimes some LoRAs would not work.
Some LoRAs ask to use skip layer, but it seems to crash every time I try it with Flux; it crashed during the prompt node.
@kukipett Yeah, try disabling smart memory and see if that solves the generation issues; it did for me on my 4090. Don't use clip skip. Monitor the VRAM usage when it loads the LoRA.
Awesome video and awesome workflow! Thank you :)
Does anyone know what ComfyUI interface extension he is using? What is the bar at the top, please?
Glad you found the video helpful. This is official Comfy, no extension. Go to Settings; the Comfy beta menu is disabled by default. Enable the bar at the top or bottom.
I found it in Settings :)
Hey controlaltai, could you make an updated animation video on the best way to make an AI animation? There are tons of better software and workflows now!
I don't think I have made any animation videos like AnimateDiff, except Stable Video Diffusion, on the channel. Only made two videos about SVD.
Will it also look great with NF4 + LoRA?
No; NF4 is deprecated in favor of GGUF models as per the Comfy dev. Obviously the quality won't be the same. I have to test GGUF properly before I can answer your question.
Is it possible to run this entire project with 16GB of RAM?
With a GGUF model you can; it's not the size of the project but rather the size of the Flux model. In part 2 of the video, I will show how to run this with lower than 12 GB VRAM. If you disable smart memory, it should run on 16 GB VRAM as it is, provided your GPU is fully free of any other tasks.
@controlaltai Very interesting, because I got to the step where the first image was generated with DualCLIPLoader, but in my process it dies due to running out of memory.
My configuration is like this at the moment without any modifications:
Total VRAM 12288 MB, total RAM 5934 MB
pytorch version: 2.4.1+cu121
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 3060 : cudaMallocAsync
@fatosmindset Ohh, you have 12 GB VRAM; I thought you said 16 GB. 12 GB is too low for FP8 Flux. You have to use a GGUF model of Flux that works with 12 GB or lower VRAM. You can search for which GGUF model is compatible, as the next Flux video is still 1-2 weeks away.
that workflow lol.... my brain is not ready for that...
How do I add the ControlAltAI Nodes to ComfyUI?
Like any other custom node, via the ComfyUI Manager.
I tried the workflow and it works well, except that the image description is extremely censored. Well, it's ok if you intend to show a cat or a woman in everyday clothes, but the moment you have to describe something normal but with less textile, Ollama totally ignores it. I had a picture of a barbarian woman riding a horse, dressed in a long leather skirt and a leather top, but the AI insisted on telling me that she was wearing full armor, and when I tried to modify the prompt to say that she is wearing a leather bikini top, the AI just said yes but insisted on adding a leather jacket!! That was funny and ridiculous; well, I guess this AI was designed by a group of Amish and Mormons under the guidance of a very concerned preacher!
Yeah, I get what you are saying. We should test another LLM model that works; there are other vision models I have not tested. You can also use Gemini, which is very accurate but even more censored: it does not describe people at all, no matter the clothes, plus you have to pay so much for the API. ChatGPT-4o works best, but the API is too costly; if you upload the image to the website and put in an instruction asking for everything the vision node needs, it gives the best results and is not censored at all. Again, not an ideal solution. Also, you can replace the Ollama nodes with any other custom LLM node if there are models not in Ollama.
Also, this was using the 7B model. They have a 13B and a 34B model; maybe they are way better, and technically they should be. Not sure if they work with Flux. Another way to overcome this: enable the img2img setting and set the max shift to option 2 or 3. You should get something similar to your original.
@controlaltai Well, after many tests with that LLM, I can just say it's a scam; that AI is absolute crap. Oh, it's great at talking and talking, using great, advanced language about... nothing. Just banalities and supposedly intellectual thoughts made to impress the average Joe, the kind of conversation you can hear at parties from people who don't understand what they are saying but are proud to lecture you with knowledge they found on the internet.
And worse, this AI is a blatant liar; so many times it just added things that don't exist in the image or ignored the most obvious. I'm starting to think it just gets the average feeling of the picture and then tells you anything, hoping it will maybe match reality.
I was using Florence before, and it at least gives a very good description: very accurate, shorter, but containing 10 times more useful information.
Well, if you want to impress some idiots, just use that LLM to write them a message; they will probably not get it, get bored, and cautiously avoid you, so in fact it's a great idea! 😆
@controlaltai If it helps, there is one model known for being uncensored: Mistral. Apparently it is considered underrated, but I haven't tested it, so no direct knowledge.
Can you leave an email for consultancy?
mail @ controlaltai . com (without spaces)
@controlaltai Thanks. Btw, if you add canny with depth, won't it disrupt the process? I am thinking about Union CN...
It's not stable; I am getting some depth control even without ControlNet. Depending on the end goal, once Union is stable I would implement it in this workflow, but if you are getting your desired output with only Flux, there is no need for ControlNet. The ControlNet requires extra VRAM as well; there is enough for stage one, as I ensured that upscaling works without conditioning, so we can unload the conditioning after the Flux core output.
Before I even begin with this video, can someone tell me whether the workflow would work with a 12GB VRAM card? I fast-forwarded, saw LLMs, SAM2 with Flux, etc. and stopped. 😂
The LLM and SAM load and offload, and they're optional: disabled by default unless you want to inpaint, and even then they load and unload. The workflow is designed to run with logical processing. It's not a matter of whether the workflow runs or not; it's whether raw Flux with t5xxl runs on your system. Flux 1 Dev FP8 is a 17 GB model with t5xxl, so 12 GB won't cut it. You need to use GGUF, which can be integrated into the same workflow; however, that's for another video, as this video is very long as it is.
@controlaltai How does this perform with GGUF Q8 or NF4?
@controlaltai Yeah, so my assumption was right that it may not work. People are using SwarmUI and Flux even on 12 GB machines, so when a ComfyUI workflow like that comes out, you begin to wonder. Glad I asked before diving in; shall stay tuned for your other video.
Not tested with GGUF; will do some testing this week.
@controlaltai I made the entire workflow and ran the test. GGUF models are working fine with it, though they are slower. Also, dunno why people say one won't be able to generate using the FP8 model / Q8_0 model on a 12 GB card; they worked fine on mine. PS: Dunno how Comfy does it, but I just used the full 23 GB Flux Dev model with it for the SAM2 x Florence-2 inpaint and was able to get the results in like 96 secs. I was able to run it on my card using SwarmUI anyway, so some kind of resource management thingy does exist.
looks like a motherboard