ComfyUI: Flux with LLM, 5x Upscale Part 1 (Workflow Tutorial)

  • Published Jan 29, 2025

COMMENTS • 128

  • @controlaltai
    @controlaltai  5 months ago +26

    Update (Sep 4, 2024): Accidental cut in the video 2s after 01:04:34. You have to add the Model Sampling Flux node. Use the Get nodes to connect the Model, Width and Height, as per this screenshot: drive.google.com/file/d/13jd_c9SwZw7Tza9T3N5XAW2ZB3XGlpBf/view?usp=sharing
    Update (Sep 5, 2024): Flux Inpainting: the Preview Bridge node in the latest update has a new option called "Block". Ensure that it is set to "never" and not "if_empty_mask". This allows the Preview Bridge node to pass on the mask. If set to "if_empty_mask", the node blocks the inpaint model conditioning input, since the switch default is set to SAM + Custom. I asked the dev to update the node so that the default behavior is always "never"; he has fixed it. Update the node to the latest version again.
    Update (Sep 2, 2024): The LLMs used here are very small models; they have higher variants as well. The "llava:34b" model performs best for vision but requires 24 GB VRAM. There is a 13B version for lower VRAM. Llama 3.1 also has an 8B instruct model in fp16 ("llama3.1:8b-instruct-fp16"), which requires 17 GB VRAM. I have tested it; there are no issues with the workflow, as we unload the LLM as soon as it's done. Llama 3.1 also has 70B and 400B parameter versions, which cannot be run on consumer-grade hardware. If you require such models, you can run them via API; ChatGPT 4o or Gemini Pro (the latter for non-human subjects) perform best for API usage. To use an API, just replace the node and ensure the connections are the same; the workflow will maintain the logic.
    The video is quite long, and I had initially planned to include GGUF, but couldn't. ControlNet support for Flux was added in Comfy but was not stable at the time of recording, and more tests are needed anyway. I will probably make another video with ControlNet and GGUF if they warrant a tutorial on the channel.
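    For reference, a minimal sketch of pulling the larger model variants mentioned above, assuming the "ollama" Python client (pip install ollama) and a locally running Ollama server; the model tags are the ones named in the update, everything else is illustrative:

    # Minimal sketch (assumption: the "ollama" Python client is installed and the
    # Ollama server is running in the background).
    import ollama

    # Pull the larger variants named in the update; each is a plain model tag.
    for tag in ["llava:34b", "llama3.1:8b-instruct-fp16"]:
        ollama.pull(tag)

    # Print what the server reports as installed, so the vision/text nodes can see them.
    print(ollama.list())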

    • @SpikeColgate
      @SpikeColgate 4 months ago

      ControlNet would be super cool in this flow. Nice work...

    • @controlaltai
      @controlaltai  4 months ago

      @SpikeColgate Yup, next video. Check the members post; it reveals what's upcoming.

  • @dankazama09
    @dankazama09 4 months ago +6

    I just wanted to express my gratitude for your tutorials! Instead of simply providing workflows to download, you're taking the time to explain how each node functions. Thanks to your guidance, I'm starting to grasp how ComfyUI works, and it's making a big difference. Keep up the amazing work! thank you!

  • @mikedoug7
    @mikedoug7 4 months ago +3

    I'm new playing around with ComfyUI and image generation, and this is the first demo that really showed me the potential of the entire infrastructure. It helped my learning immensely to build this workflow out step by step, following along as you explained each piece. Bravo, well done, and I hope you continue to produce more such great content!

  • @iArthurA
    @iArthurA 4 months ago +1

    I managed to get the workflow fully assembled this weekend, everything works great. This is the most interesting video I've ever seen on this topic. And thank you for helping me find the lower case letter bug in my workflow. Looking forward to more videos.

  • @kukipett
    @kukipett 5 months ago +2

    Wow! This gives me a headache, but a good one; I've learned so much in just this video. So many things seem so much simpler with just some basic knowledge that many other YouTubers seem to ignore in their tutorials. All that logic is great for simplifying the workflow.
    It was great to be able to understand most of it even though I've been learning ComfyUI for less than a week.
    The video was really fast, so I slowed it down to 50% to have more time to follow what you were doing.
    Thanks a lot for all this knowledge; it surely took a lot of time to prepare and put into a video.

  • @livinagoodlife
    @livinagoodlife 3 months ago

    Finished! Thanks so much for this awesome tutorial. It's taught me a lot about ComfyUI that I can now utilise with other workflows. I haven't yet run the 5.04x upscale, but I have to say that the 2.52x upscale definitely softens the image compared to the 1.261x. I tried a couple of different sampling methods and images.

    • @controlaltai
      @controlaltai  3 months ago

      Hi, the 1.261x upscale is designed to soften the image on purpose, because when you run the 5x upscale it balances out. If you don't want to run the 5x upscale, use the same upscale model as in 1x details for the 1.26x upscale; the result will be sharper.

  • @Novalis2009
    @Novalis2009 4 months ago +1

    Exactly what I was looking for. Good work!

  • @christofferbersau6929
    @christofferbersau6929 5 months ago +1

    Great, I was looking for a good workflow for Flux. I appreciate that you go through the whole thing.

  • @xandervera01
    @xandervera01 5 months ago +3

    Seems like every time I get a little more comfortable in Comfy, a new development comes along to make us restart again T_T
    Thank you so much for this, learning about all brand new things in AI image generation is not so easy!

    • @Kvision25th
      @Kvision25th 5 months ago

      That's right. I stuck with SD 1.5 till now, started using Pony, and got into Flux... now I'm brain dead.

    • @---Nikita--
      @---Nikita-- 4 months ago

      @@Kvision25th Same

  • @phaedon_Imperius
    @phaedon_Imperius 4 months ago

    My man, I have been using ComfyUI for months trying to figure out what goes wrong here and there... This was an excellent example of how we can use the visual scripting of ComfyUI, like a master class! Thanks a lot for your contribution to the community!

  • @zecetry4970
    @zecetry4970 4 months ago

    I was having trouble with pose style transfer with all the ControlNets, but this works like magic most of the time, and so does the upscaler! I was stuck on 4x and my computer was on fire! Now I can get to 5x faster! Thank you so much. You rock!

    • @zecetry4970
      @zecetry4970 4 months ago

      Btw, what do you think about SUPIR? I think it gives me better results for vector-style images. What do you think about using Ultimate SD and then SUPIR?

    • @controlaltai
      @controlaltai  4 months ago +1

      SUPIR is no good with Flux if you're using text; otherwise SUPIR is great. However, you cannot go 5x in SUPIR. I wanted to change the approach as I figured a lot of people would be working with text. To be honest, the current upscale I showed has limitations. I'm in the process of making another video, which will be part 2. We managed to apply a noise injection and blend method with our own node, which gives you very realistic skin and textures after upscaling, with no plastic/velvet-looking lines. It works for other types of images as well, like landscapes; basically any part of the image that has overly soft lines. In this method there is no Ultimate SD Upscale; we skip that and still get a 5x upscale.

  • @gvstello
    @gvstello 2 months ago

    Insane, bro... that is the largest logic (workflow) I have ever seen.

  • @systematicpsychologic7321
    @systematicpsychologic7321 5 months ago

    Absolutely fantastic tutorial, learned a lot here. Finally got some image/vision stuff set up thanks to it, much appreciated.

  • @jamesharrison8156
    @jamesharrison8156 3 months ago +1

    With the 1.261x upscale, an alternative is to downscale by 0.3125 instead of 0.315. The math for this is the desired scaling (1.25) divided by the upscale done by the model (4), so you get 1.25/4 = 0.3125. If you have a height of 1024 and you scale up by 4 (4096) and then down by 0.3125, you get 1280, the same as 1024 * 1.25. To get a float constant with that precision, put in a math node with the expression 1.25 / 4.0 and use the float output.
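    A quick numeric check of the calculation above; a minimal sketch containing nothing beyond the arithmetic stated in the comment:

    # Quick numeric check of the scaling described above (plain arithmetic).
    desired_scale = 1.25        # target overall scale
    model_scale = 4             # scale applied by the 4x upscale model
    downscale = desired_scale / model_scale
    print(downscale)            # 0.3125

    height = 1024
    print(height * model_scale * downscale)   # 1280.0
    print(height * desired_scale)             # 1280.0, i.e. the same result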

    • @controlaltai
      @controlaltai  3 months ago

      Lol, I know. Thanks for the input; I'll see what I can do for video 2. I will create a node to accept that constant and expression if necessary. I did not go that route since it was 1.261x and there was no harm in that; it was working with Flux with no scan lines.

  • @juani1
    @juani1 4 months ago

    Amazing tutorial mate! thanks a lot for all the effort 🏆

  • @angelovestieri9013
    @angelovestieri9013 3 months ago

    Amazing detailed video, thank you!

  • @pedroj.fernandez
    @pedroj.fernandez 5 months ago

    I've learned a lot! Good and complete video! Congrats!

  • @RafaPolit
    @RafaPolit 4 months ago

    Finally got to implementing all the bells and whistles. I added a small section with the main Flux configurations like Guidance, Steps, Sampler, Scheduler and Seed, extracted from the complex flow. Also, I have a 12GB 3080 Ti GPU, so GGUF was a must for me... I'm able to use Q8 with both your recommended LoRAs at about 3 to 4.5 sec/it on EULER/BETA, so I'm really happy with it. As you said, I don't think the GGUF switch merits another video; it's just replacing the Unet Model Loader and the Dual CLIP Loader with the GGUF versions, and that's it. At least I hope so. I'm really happy with the results. Thanks a bunch!

  • @Daniel-zy4by
    @Daniel-zy4by 2 months ago

    I was just about to sit down and watch this and saw that Black Forest Labs have released their own tools; hope you do a tutorial on them :) Fill, Depth, Canny, Redux

    • @controlaltai
      @controlaltai  2 months ago

      Yeah, those are different. I will explain them in a tutorial. However, I highly recommend you watch this one to understand the logical flow.

    • @Daniel-zy4by
      @Daniel-zy4by 2 months ago

      @@controlaltai Yes! I am excited to see what they can do. While I'm grateful for the efforts of other groups who have been making Canny and Depth, inpaint, and IPAdapters, I have not been impressed; I have much higher expectations for these. I took a brief look at the ComfyUI site, and it looks like they have native support and a few examples to play with.

    • @controlaltai
      @controlaltai  2 months ago +1

      I have done a Union Pro tutorial. The next video is about Flux regions; we made custom nodes for that, a very cool concept. After that video I will make a full workflow tutorial using these tools as well. We can do some amazing restoration and conversion using these tools and Union Pro.

    • @Daniel-zy4by
      @Daniel-zy4by 2 months ago

      I'll be sure to check them both out. Thank you! :)

  • @_satanasov_
    @_satanasov_ 4 months ago

    A quick note:
    There is an Impact Logical Operations node that you can use for most of the Boolean logic. It reduces the complexity of the switches.

    • @controlaltai
      @controlaltai  4 months ago

      How? Please elaborate; some switches are manually controlled by the user.

    • @_satanasov_
      @_satanasov_ 4 months ago

      @@controlaltai As I saw from your tutorial, you are using a Select input to select elements. In practice you have 3 switches:
      Image LLM
      Text LLM
      Modify Image LLM
      Using logical operations you get:
      (Image LLM && Text LLM) || Image LLM
      (Image LLM && Text LLM) || Text LLM
      Modify Image LLM is only valid if you go that route.
      All actions are switchable by 3 boolean buttons (see the sketch below).
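      Loosely speaking, the suggestion above reduces Boolean toggles to a single select index. A minimal sketch follows; the toggle names and the index convention (1 = Image LLM, 2 = Text LLM, 3 = disable LLM / custom conditioning) are assumed for illustration, not taken from the actual workflow, which wires this with switch and logic nodes instead:

      # Minimal sketch: map Boolean toggles to a "select input" index.
      def llm_select(image_llm: bool, text_llm: bool) -> int:
          if image_llm:
              return 1              # image LLM route
          if text_llm:
              return 2              # text LLM route
          return 3                  # disable LLM, use custom conditioning

      # The Modify step is a separate toggle that only matters on the image route.
      def use_modify(image_llm: bool, modify_image_llm: bool) -> bool:
          return image_llm and modify_image_llm

      print(llm_select(True, False), use_modify(True, True))    # 1 True
      print(llm_select(False, True), use_modify(False, True))   # 2 False
      print(llm_select(False, False), use_modify(False, False)) # 3 False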

    • @controlaltai
      @controlaltai  4 months ago

      Can you please email me this integrated into the workflow? I'd be interested to learn something new, because I am still confused. There are only two switches, not three: Image/Text/Disable LLM (one choice) and Modify (a second choice). If Image, then Modify can apply; otherwise it is always ignored. So the user has only two options; Modify is a choice and does not always apply. (mail @ controlaltai . com without spaces)

  • @dankazama09
    @dankazama09 4 months ago

    I learned something new today. Thanks!

  • @mupmuptv
    @mupmuptv 5 months ago

    You make the best AI tutorials in the world.

  • @RodrigoSantos-gw9mw
    @RodrigoSantos-gw9mw 4 months ago +1

    Could you help me? I can't find the .json file for that giant workflow you showed here at 2:00.

    • @controlaltai
      @controlaltai  4 months ago

      I build that giant workflow in the video. The whole video is about how to build that workflow.

  • @Pauluz_The_Web_Gnome
    @Pauluz_The_Web_Gnome 5 months ago

    That flow is ridiculous! 😂😂😂

  • @livinagoodlife
    @livinagoodlife 3 months ago

    Hmm, I can't find the Florence2Run node at 55:15. I have Florence-2, but that specific node doesn't show up.

    • @controlaltai
      @controlaltai  3 months ago +1

      That node comes from SAM2, I think. Ensure you have all the mentioned custom nodes installed as per the requirements section.

    • @livinagoodlife
      @livinagoodlife 3 months ago

      @@controlaltai thanks, it was the ComfyUI-Florence2 node from kijai that I had missed

  • @RafaPolit
    @RafaPolit 4 months ago

    This is amazing, not only for the final result, but as a learning tool. I have learnt more from this than any other process in ComfyUI. Thanks!
    I think I'm missing something: at 1:00:57, how did you pause the workflow between the 1x image generation and the inpaint portion? There seem to be a couple of icons on your groups that I don't have in mine (edit: found them in the rgthree config), but how do you do it in a way that, after the 1x image generation, you no longer go through that portion of the process and only focus on the inpaint part? Maybe fixing the original seed is enough? Maybe this is an extra module or feature or checkbox I have no idea where to find? Thanks for any insight.

    • @controlaltai
      @controlaltai  4 months ago

      Thanks. That's a new setting in rgthree. Go to the rgthree settings and tick the "show fast toggle in groups" option. After this, just left-click anywhere in a group; at the right end of the title bar you should see a group mute and a group bypass button (they are very faint).
      For pausing, I mute the KSampler node. Since it is muted, the workflow won't progress beyond that point. Ctrl+M after clicking the node is the shortcut key.

    • @RafaPolit
      @RafaPolit 4 months ago

      @@controlaltai Thanks! Maybe I explained myself poorly. How do you make it so that you first get the 1x image and then work quickly on the mask, without the main Flux sampler running over and over again? Do you set the seed to fixed? Or is there a way to just "continue" from where it left off without fixing the seed? Thanks.

    • @controlaltai
      @controlaltai  4 months ago +1

      Okay, so that depends on your workflow build. ComfyUI progresses in a linear way, say from point A to point D. Now if you stop at point C and don't make any changes to point A or B, you can keep working on C while D is disabled. Once you finish working on C, you enable D and get the output. If for some reason you are at C and, without making any changes, it starts again from A, then some changes are happening at A. The seed should be fixed. Does this explain it?

    • @RafaPolit
      @RafaPolit 4 months ago

      @@controlaltai Completely, thanks. Makes perfect sense.

  • @stopminingmydata
    @stopminingmydata 5 months ago +1

    Nice, downloading now.

  • @gohan2091
    @gohan2091 4 months ago

    At 28:02 you said you would include this in the description, but I think you've forgotten to do that. I've included it here:
    Objective: Modify the given text prompt based on specific change instructions while preserving all other details.
    Read the Change Instruction: Identify the specific change(s) to be made as provided in the next line.
    "Reimagine the room decor as a kids room who loves space and astronomy"
    Implement the Change: Apply the specified modification(s) to the text.
    Preserve Original Context: Ensure that all other aspects of the text, including descriptions, mood, style, and composition remain unchanged.
    Maintain Consistency: Keep the language style and tone consistent with the original text.
    Review Changes: Verify that the modifications are limited to the specified change and that the overall meaning and intent of the text are preserved.
    Provide Response: Just output the modified text. Ensure your response is free of additional commentary.

    • @controlaltai
      @controlaltai  4 months ago

      I rechecked; it's already included in the description.

    • @gohan2091
      @gohan2091 4 months ago

      ​@@controlaltai maybe I'm just blind 😅😅

    • @controlaltai
      @controlaltai  4 months ago +1

      @@gohan2091 It's easy to overlook; there is so much going on here. I had forgotten a couple of things as well.

  • @Novalis2009
    @Novalis2009 4 months ago

    There is a cut at 1:04:36 where you left something out. You added a ModelSamplingFlux node and some other nodes, but you don't go back and explain it. Can you tell what happened? Which model do you drop in there?

    • @controlaltai
      @controlaltai  4 months ago +1

      Hi, thanks for pointing this out; I will add this information to the pinned post. It is an accidental cut. You have to add the Model Sampling Flux node and use Get nodes to connect the width, height and model, as shown in this screenshot:
      drive.google.com/file/d/13jd_c9SwZw7Tza9T3N5XAW2ZB3XGlpBf/view?usp=sharing

  • @rolarocka
    @rolarocka 4 months ago

    Amazing and not much spaghetti :D

  • @timbersavage90
    @timbersavage90 4 months ago

    48:45
    Right at the beginning of the workflow you add an Impact Integer node, set its value to one and connect it with the image. Shouldn't the integer be set to 2?

    • @controlaltai
      @controlaltai  4 months ago +1

      Default is 1. When it's 1 img2img is off. When it's 2 img2img is on. Later on when showcasing how to use img2img, I mention the same. By default you want it off, unless you specifically need to use img2img.

    • @timbersavage90
      @timbersavage90 4 months ago

      @@controlaltai Very nice tutorial. I have learned a lot; thanks for your time and effort.

  • @RafaPolit
    @RafaPolit 4 months ago

    Once again, thanks so much for this. I am learning so much. I still have a few questions; if you would extend your generosity, I would like to pick your brain: if the 2.52x step renders too soft an image, what would be a reasonable replacement for the realvizxlv40 upscaler? The 1.25x renders consistently perfect results, but I find the 2.5x (and consequently the 5x) a bit too soft. Thanks once more.

    • @controlaltai
      @controlaltai  4 months ago

      Hi, welcome. The 5x follows from the 2.5x, so if the 2.5x is sharp, the 5x will be sharp. Do this: skip the 2.52x. In the 5.04x group, change the input to 1.26x in the Get node. Disconnect the upscale-by-image node where we downscale the image by 0.5. What this will do is take the 1.26x image, which you find sharp, upscale it 4x without downscaling, and skip the SDXL input entirely. Alternatively, if you want to maintain the same flow, try this: change the upscale model in SDXL. Use the same model used in 1x details, not the one in 1.26x. This may, however, over-sharpen the image, but it depends on the image. It should have a more prominent effect than changing the checkpoint. Also try Juggernaut or Epic Realism Kiss. Let me know which solution gave you the desired results; I'm curious as well.

    • @RafaPolit
      @RafaPolit 4 months ago

      @@controlaltai Thanks for this well thought out reply. I need to do a bit more testing, but going directly into 5.04x without the 2.52x and without the 0.5x really yielded unusable results. It's too much of a direct step... realistic photos almost look like a painting.
      As you foretold, changing the model to the upscale model made the biggest difference... I see your point about it being overly sharpened, but I think I like it depending on the scene.
      Changing the checkpoints has much less of an impact, but it is still clearly noticeable on further-away faces. I believe epicrealismXL_v8Kiss yielded the better results for me (hopefully that was the one you recommended? Not sure I was able to match your suggestions 1:1, especially with the Juggernaut; there are just too many workflows named the same).
      Will do some more testing for sure, but I believe that now that I understand the impact of each stage better, it will be easier to start making finer adjustments.
      Thanks once more!

    • @controlaltai
      @controlaltai  4 months ago

      We faced the exact same problem when going from 1.26x to 5x directly. In fact, I was not satisfied with the in-between 2.5x either. The faces come out realistic, like real skin, but they become plastic during the upscale process. A clear-cut solution to this would be latent upscale, which is avoided here for plenty of reasons; with text, Flux cannot do a 1.5x upscale without causing scan lines when using any LoRA. So we came up with our own solution, and to simplify it for the user we ended up creating a new node. Part 2 of the video will cover this. For you, it would simply be enabling a setting called "preserve details"; everything happens automatically in the workflow, and the faces will be ultra realistic like the 1x detail, just at 5x upscale, with no plastic lines or texture. The node comes in a section right in between 1x detail and 1.26x upscale. To explain it simply, the node injects Gaussian noise into the 1x detail pixel output and blends it using a soft-light technique before passing it to the 1.26x KSampler. Since there is noise injection, when it is enabled we designed the updated workflow to disable the 2.52x pixel-on-pixel upscale. The results are quite satisfactory. Stay tuned; this will take some time to release.
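      For intuition, a minimal NumPy sketch of the general idea described above (Gaussian noise injected into the pixel output, then blended back with a soft-light formula); the sigma, the Pegtop blend variant and the opacity are assumptions chosen for illustration, not the actual node's implementation:

      import numpy as np

      def inject_noise_soft_light(image, sigma=0.04, opacity=0.35, seed=0):
          """image: float array in [0, 1]. Returns the noise-blended image."""
          rng = np.random.default_rng(seed)
          # Grey-centred Gaussian noise; 0.5 is neutral for soft-light blending.
          noise = np.clip(0.5 + rng.normal(0.0, sigma, image.shape), 0.0, 1.0)
          # Pegtop soft-light blend (one common variant): base = image, blend = noise.
          blended = (1.0 - 2.0 * noise) * image**2 + 2.0 * noise * image
          return np.clip((1.0 - opacity) * image + opacity * blended, 0.0, 1.0)

      # Example call on a random 64x64 RGB "image", just to show the shapes involved.
      img = np.random.default_rng(1).random((64, 64, 3))
      out = inject_noise_soft_light(img)
      print(out.shape, float(out.min()), float(out.max()))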

  • @Nick-q4o
    @Nick-q4o 6 days ago

    I'm new to Flux and keen to try the Flux Resolution Calculator node shown in your video. However, I notice that below 1.0 megapixel it currently only supports 0.1 and 0.5, but 0.6 is the ideal MP I need. Do you have a similar node that allows manual MP input (so I can input 0.6), or is there any workaround I can explore? Thanks.

    • @controlaltai
      @controlaltai  6 days ago

      No, I should update the node to give that option. If you want a workaround, you have to use a bunch of math nodes for multiplication and division and calculate the entire formula yourself.
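      For reference, a minimal sketch of the math such a chain of nodes would implement, assuming the usual convention of deriving width and height from target megapixels and aspect ratio; the rounding to a multiple of 64 is an assumption for illustration, not taken from the node:

      import math

      def resolution_from_megapixels(megapixels, aspect_w, aspect_h, multiple=64):
          """Derive (width, height) for a target pixel count and aspect ratio."""
          pixels = megapixels * 1_000_000
          ratio = aspect_w / aspect_h
          width = math.sqrt(pixels * ratio)
          height = width / ratio
          # Snap to the nearest multiple (assumed here; common for diffusion models).
          snap = lambda v: max(multiple, int(round(v / multiple)) * multiple)
          return snap(width), snap(height)

      print(resolution_from_megapixels(0.6, 16, 9))  # (1024, 576)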

  • @eveekiviblog7361
    @eveekiviblog7361 4 months ago

    Please include ControlNet, like Union or the best-performing ones, and also the role of MistoLine in Flux. Thanks.

    • @controlaltai
      @controlaltai  4 months ago +1

      I made a ControlNet Union node (compatible with InstantX). Part 2 of the video will be coming shortly, probably by next week. I don't like XLabs; I will check MistoLine.

  • @moviecartoonworld4459
    @moviecartoonworld4459 4 months ago

    I respect a lecture that contains so much. My computer has a 3080, so it took me a day to implement two-thirds of your lecture. Is Hyper Dev possible?

    • @controlaltai
      @controlaltai  4 months ago +1

      I am working on another video to integrate GGUF. I am not sure about the quality. The basic requirement is that the quality should be better than SDXL; if you are using quantized models which give you poorer output than SDXL, then it's not worth switching. We need extensive testing. This video was released after nearly 20 days of very extensive testing. If the tests are fine, then probably in 2 to 3 weeks I plan to add GGUF, Hyper Dev and ControlNet integration to this base workflow. My advice is to wait for the video; basically it would just have the same workflow logic, only for lower VRAM.

    • @moviecartoonworld4459
      @moviecartoonworld4459 4 months ago

      Thank you for your reply!! I will wait!!!!❤

  • @koudkunstje
    @koudkunstje 3 months ago

    While I've pulled the llava-llama3:latest model with ollama, it doesn't show up in the ollama vision node. Only llava:latest. I'm on Ubuntu 24.04

    • @controlaltai
      @controlaltai  3 months ago

      ollama.com/library/llava-llama3 - here is the model; pull it as shown. It's a 5.5 GB model. After pulling, you have to close and restart ComfyUI, and ensure the Ollama app is open in the background.

  • @hoylogefkal
    @hoylogefkal 4 months ago

    At 39:32 you "select input 3" to disable the LLM, but your Text LLM is enabled. Why?
    I set up my workflow like yours (I hope), but my Disable LLM does disable all the LLMs. (That's what it should do, right?)

    • @controlaltai
      @controlaltai  4 months ago +1

      Yes, 3 disables all the LLMs and pushes custom conditioning. The Text LLM is not enabled; pause at 39:35 and recheck. All LLMs are disabled, only custom conditioning is enabled.

    • @hoylogefkal
      @hoylogefkal 4 months ago

      @@controlaltai I am glad to hear that. I was irritated because before starting the queue, the Show Text beyond the LLM was empty; after pressing queue there is a prompt in it. Thanks for the response, I am fine. Btw, great vid.

  • @Willer202
    @Willer202 4 months ago

    I've been able to build it but I'm getting some issues. Is it possible to get screenshots of the node groups, to be able to compare and see what I did wrong? (Like the model sampling update in the pinned comment.)

    • @controlaltai
      @controlaltai  4 months ago

      Of course, sure. Send me whatever you have via email: mail @ controlaltai . com (without spaces). I will reply directly there.

  • @ernatogalvao
    @ernatogalvao 4 months ago

    At 22:16 he tells me to add a Get Node Text Boolean, but there is no such thing in my dropdown menu. What am I doing wrong? :(

    • @controlaltai
      @controlaltai  4 months ago

      Hi, look at 17:53; I explain where to define the Text and Image Booleans.

  • @dkamhaji
    @dkamhaji 4 months ago

    There is so much to unpack here, for both experienced users and newcomers! I'm midway through building it, but I hit a brick wall with Layer Styles. I noticed it was not coming up when I was adding the Purge VRAM node, and in the Manager it says "Import Failed". I tried everything to get it running: I uninstalled it and reinstalled it through the Manager, I tried to "Fix" it through the Manager, I uninstalled it and then did a git clone into the custom nodes folder, and I installed the pip dependencies inside the node folder... I'm just not sure what else to try. Any ideas?

    • @controlaltai
      @controlaltai  4 months ago

      Thanks!! Send me an email and I will try to see why the import is failing for you. You are on Windows with Comfy portable locally, right? Git cloning a custom node doesn't help. Install via the Manager, boot Comfy and send me the entire cmd console text; the reason for the import failure will be there. mail @ controlaltai . com (without spaces).

  • @agusdor1044
    @agusdor1044 5 months ago

    thank you!

  • @goodsniper7766
    @goodsniper7766 3 months ago

    What if I don't get a window popping up with modifying prompts?

    • @controlaltai
      @controlaltai  3 months ago +2

      You won't get any window popup. To modify the prompt, you have to change the switch to modify prompt and enter the modification.

    • @goodsniper7766
      @goodsniper7766 3 months ago

      @@controlaltai I am a layman, but I have grasped the whole workflow from the tutorial, and this is the only thing I don't understand. How do I do it? :P

    • @controlaltai
      @controlaltai  3 months ago +1

      I just explained it in the comment above. The tutorial is advanced; you need to be quite familiar with the basics of Comfy and the diffusion process.

    • @livinagoodlife
      @livinagoodlife 3 months ago

      @@goodsniper7766 He goes over that part pretty thoroughly. You change the text within the quotes in the 'Modify Image LLM' node and change the switch to Modify Image LLM.

  • @kukipett
    @kukipett 5 months ago

    Watching this video for the second time, I heard you say that if you are short on VRAM some LoRAs would not work. Do you mean that they are not loaded but don't cause a CUDA out-of-memory crash?

    • @controlaltai
      @controlaltai  5 months ago

      Comfy by default uses smart memory management. What this does is take up your entire VRAM and then load/unload models it thinks are not important. I think with Flux it's buggy. So, say Flux takes 90% of VRAM and two LoRAs require 15%; Comfy then unloads one LoRA and runs the whole thing without crashing.
      I know people observed the same behavior with Google Style Align. On 12 GB VRAM, when they did 8 images it broke after 4; later on we found out Comfy was clearing the cache in attention, which is required by Style Align. The only solution at that time was to reduce the batch size to fit the VRAM.
      A CUDA out-of-memory crash is an actual crash. When I disabled smart memory, VRAM usage with Comfy went down to 80%.
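      For reference, smart memory is controlled at launch via ComfyUI's --disable-smart-memory argument; a hedged launch sketch below, assuming a standard (non-portable) install where main.py sits in the ComfyUI root and this Python environment is the one ComfyUI normally uses:

      # Hedged launch helper: runs ComfyUI with smart memory management turned off.
      # Assumptions: main.py is in the current directory; portable builds use their
      # own embedded Python, so adapt the command accordingly.
      import subprocess
      import sys

      subprocess.run([sys.executable, "main.py", "--disable-smart-memory"], check=True)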

    • @kukipett
      @kukipett 5 months ago

      @@controlaltai So would it be better to disable smart memory?
      I have a 12 GB 3080 Ti and it works fine even with fp16; it slows down sometimes and rarely crashes. I just clear the VRAM and then it's fine for a while. But I have noticed that sometimes some LoRAs would not work.
      Some LoRAs ask you to use skip layer, but it seems to crash every time I try it with Flux; it crashed during the prompt node.

    • @controlaltai
      @controlaltai  5 months ago

      @kukipett Yeah, try disabling smart memory and see if that solves the generation issues; it did for me on my 4090. Don't use skip clip. Monitor the VRAM usage when it loads the LoRA.

  • @matejpergl7040
    @matejpergl7040 5 months ago

    Awesome video and awesome workflow! Thank you :)
    Does anyone know what ComfyUI interface extension he is using? What is the bar at the top, please?

    • @controlaltai
      @controlaltai  5 months ago

      Glad you found the video helpful. This is official Comfy, no extension. Go to Settings; the Comfy beta menu is disabled by default. Enable the bar at the top or bottom.

    • @matejpergl7040
      @matejpergl7040 5 months ago

      I found it in settings :)

  • @iseahosbourne9064
    @iseahosbourne9064 4 months ago

    Hey controlaltai, could you make an updated animation video on the best way to make an AI animation? There are tons of better software options and workflows now!

    • @controlaltai
      @controlaltai  4 months ago

      I don't think I have made any animation video like AnimateDiff on the channel, except for Stable Video Diffusion. I've only made two videos about SVD.

  • @eveekiviblog7361
    @eveekiviblog7361 5 months ago

    Will it also look great with NF4 + LoRA?

    • @controlaltai
      @controlaltai  5 months ago +1

      No, NF4 is deprecated in favor of GGUF models as per the Comfy dev. Obviously the quality won't be the same. I have to test GGUF properly before I can answer your question.

  • @fatosmindset
    @fatosmindset 4 months ago

    Is it possible to run this entire project with 16GB of RAM?

    • @controlaltai
      @controlaltai  4 months ago +1

      With a GGUF model you can; it's not the size of the project but rather the size of the Flux model. In part 2 of the video, I will show how to run this with less than 12 GB VRAM. If you disable smart memory, it should run in 16 GB VRAM as it is, provided your GPU is fully free of any other tasks.

    • @fatosmindset
      @fatosmindset 4 months ago

      @@controlaltai Very interesting, because I got to the step where the first image is generated with DualCLIPLoader, but in my case it dies due to running out of memory.
      My configuration is like this at the moment without any modifications:
      Total VRAM 12288 MB, total RAM 5934 MB
      pytorch version: 2.4.1+cu121
      Set vram state to: NORMAL_VRAM
      Device: cuda:0 NVIDIA GeForce RTX 3060 : cudaMallocAsync

    • @gseth8950
      @gseth8950 4 months ago +1

      @@fatosmindset Ohh, you have 12 GB VRAM; I thought you said 16 GB. 12 GB is too low for fp8 Flux. You have to use a GGUF model of Flux that works with 12 GB or lower VRAM. You can search for which GGUF model is compatible, as the next Flux video is still 1-2 weeks away.

  • @Kvision25th
    @Kvision25th 5 months ago

    that workflow lol.... my brain is not ready for that...

  • @danielnico3950
    @danielnico3950 5 months ago

    How do I add the ControlAltAI Nodes to ComfyUI?

    • @controlaltai
      @controlaltai  5 months ago

      Like any other custom nodes, with the ComfyUI Manager.

  • @kukipett
    @kukipett 5 months ago +1

    I tried the workflow and it works well, except that the image description is extremely censored. Well, it's okay if you intend to show a cat or a woman in everyday clothes, but the moment you have to describe something normal but with less textile, Ollama totally ignores it. I had a picture of a barbarian woman riding a horse, dressed in a long leather skirt and a leather top, but the AI insisted on telling me that she was wearing full armor, and when I tried to modify the prompt to say that she is wearing a leather bikini top, the AI just said yes but insisted on adding a leather jacket!! That was funny and ridiculous. Well, I guess this AI has been designed by a group of Amish and Mormons under the guidance of a very concerned preacher!

    • @controlaltai
      @controlaltai  5 months ago +2

      Yeah, I get what you are saying. We should test another LLM model that works. There are other vision models, but I have not tested them. You can also use Gemini, which is very accurate but even more censored: it does not read people, no matter what clothes they wear. Plus you have to pay so much for the API. ChatGPT 4o works best, but the API is too costly; if you upload to the website and put an instruction asking for everything in the vision node, it gives the best results and is not censored at all. Again, not an ideal solution. Also, you can replace the Ollama nodes with any other custom LLM node if there are models not available in Ollama.
      Also, this was using the 7B model. They have a 13B and a 34-billion-parameter model. Maybe they are way better; technically they should be. Not sure if it works with Flux. Another way to overcome this is to enable the image2image setting and set the max shift to option 2 or 3. You should get something similar to your original.

    • @kukipett
      @kukipett 4 months ago

      @@controlaltai Well, after many tests with that LLM, I can just say that it is a scam; that AI is absolute crap. Oh, it's great at talking and talking, using grand, advanced language about... nothing. Just banalities and supposedly intellectual thoughts made to impress the average Joe, the kind of conversation you can hear at some parties from people who don't understand what they are saying but are proud to lecture you with knowledge they found on the internet.
      And worse, this AI is a blatant liar; so many times it just added things that don't exist in the image or ignored the most obvious. I'm starting to think that it just gets the average feeling of the picture and then tells you anything, hoping it will maybe match reality.
      I was using Florence before, and it at least gives a very good description, very accurate, shorter but containing ten times more useful information.
      Well, if you want to impress some idiots, just use that LLM to write them a message; they will probably not get it, get bored, and cautiously avoid you, so in fact it's a great idea! 😆

    • @RetroFutureArts
      @RetroFutureArts 4 months ago

      @@controlaltai If it helps, there is one model known for being uncensored: Mistral. Apparently it is considered underrated, but I haven't tested it, so no direct knowledge.

  • @eveekiviblog7361
    @eveekiviblog7361 5 months ago

    Can you leave an email for consultancy?

    • @controlaltai
      @controlaltai  5 months ago +1

      mail @ controlaltai . com (without spaces)

    • @eveekiviblog7361
      @eveekiviblog7361 5 months ago

      @@controlaltai Thanks. Btw, if you add Canny with Depth, won't it disrupt the process? I am thinking about Union CN...

    • @controlaltai
      @controlaltai  5 months ago +1

      It's not stable; I am getting some depth control even without ControlNet. Depending on the end goal, once Union is stable I would implement it in this workflow, but if you are getting your desired output with only Flux, there is no need for ControlNet. ControlNet requires extra VRAM as well; there is enough for stage one, as I ensured that the upscaling works without conditioning, so we can unload the ControlNet after the Flux core output.

  • @divye.ruhela
    @divye.ruhela 5 months ago

    Before I even begin with this video, can someone tell me whether the workflow would work with a 12GB VRAM card? I fast-forwarded, saw LLMs, SAM2 with Flux, etc. and stopped. 😂

    • @controlaltai
      @controlaltai  5 months ago +1

      The LLM and SAM load and offload, and they're optional; they're disabled by default unless you want to inpaint, and even then they load and unload. The workflow is designed to run with logical processing. It's not a matter of whether the workflow runs or not, it's whether raw Flux with t5xxl runs on your system. Flux 1 Dev FP8 is a 17 GB model with t5xxl, so 12 GB won't cut it. You need to use GGUF, which can be integrated into the same workflow; however, that's for another video, as this video is very long as it is.

    • @eveekiviblog7361
      @eveekiviblog7361 5 months ago

      @@controlaltai How does this perform with GGUF Q8 or NF4?

    • @divye.ruhela
      @divye.ruhela 5 months ago

      @@controlaltai Yeah, so my assumption was right that it might not work. People are using SwarmUI and Flux even on 12 GB machines, so when a ComfyUI workflow like that comes out, you begin to wonder. Glad I asked before diving in; I shall stay tuned for your other video.

    • @controlaltai
      @controlaltai  5 months ago +1

      Not tested with GGUF; will do some testing this week.

    • @divye.ruhela
      @divye.ruhela 4 months ago

      @@controlaltai I made the entire workflow and ran the test. GGUF models are working fine with it, though they are slower. Also, dunno why people say one won't be able to generate using the FP8 model / Q8_0 model on a 12 GB card; they worked fine on mine. PS: Dunno how Comfy does it, but I just used the full 23 GB Flux Dev model with it for the SAM2 x Florence-2 inpaint and was able to get the results in about 96 seconds. I was able to run it on my card using SwarmUI anyway, so some kind of resource management thing does exist.

  • @synthoelectro
    @synthoelectro 5 months ago

    looks like a motherboard