I love that Arnold Schwarzenegger is giving AI tips.
Thanks man! This dramatically increased my performance!
That's great to hear!
Installing xformers was the big one for me. I have a 3070 Ti and went from 22 minutes for a single 1920 x 1080 image to 50 seconds. It was pretty fast as it was at 512 x 512, but for some reason 1080p images slowed it to a crawl. Thank you!
If I remove --medvram, image generation slows down drastically (it goes from 1 min to 20-28 min). What could be the issue?
thanks for telling me --medvram reduces speed.
Isn't it sufficient to right-click the .bat file and choose Edit to open it in Notepad?
For editing .bat files, can't you just Shift + right-click and then choose "Open with"?
Or click "Show more options" and then "Open with".
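However you open it, the file you end up editing is just a short batch script. As a rough reference, this mirrors the stock webui-user.bat that ships with Automatic1111; the --xformers flag is only an example, use whatever arguments from the video fit your GPU:

@echo off
set PYTHON=
set GIT=
set VENV_DIR=
rem example: enable the xformers optimization from the video
set COMMANDLINE_ARGS=--xformers
call webui.bat

Save it and relaunch webui-user.bat for the change to take effect.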
I have Forge UI with an RTX 3060 12 GB, and under Settings in Stability Matrix I have CUDA malloc and CUDA stream. Should I activate both?
Wow, thank you! From waiting 60 seconds for a 512x512 on Pony models, it now takes 10 seconds. RTX 3070 Ti.
The problem: it takes extremely long even though I only entered 2 prompts. If I enter several prompts, it doesn't work at all.
What if I have a 2060 Super and a 4060, but only the 4060 is being used? I would like to set up my 2060 Super to run my Auto1111, and then, if I feel like it, also run ComfyUI separately.
Don't the arguments '--opt-split-attention' and '--xformers' conflict with each other? You can only choose one of them.
Hi there, the documentation does not mention any compatibility issues. In fact, using the "--opt-split-attention" option may enhance the performance of the "--xformers" command line argument. Could you provide me with a source, as I'm very interested in optimizing performance?
You confused "--opt-sdp-attention" with "--opt-split-attention".
@@Archive-pg2zn
--xformers
--opt-split-attention-v1
--opt-sub-quad-attention
I think those three can't work together, as the code is set up with conditions so that only one of them is actually applied.
@@Albedowo Check out the code, you'll find they're incompatible with each other.
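If so, the practical takeaway is to set only a single attention optimization in webui-user.bat rather than stacking them. A minimal example, assuming they really are mutually exclusive as discussed above:

rem pick ONE attention optimization; combining e.g. --xformers,
rem --opt-split-attention-v1 and --opt-sub-quad-attention would mean
rem only one of them gets applied anyway
set COMMANDLINE_ARGS=--xformers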
My laptop freezes every time I hit generate 😢
Please help
Works way better now. Yet it feels like my RTX 3080 still does not run at 100%. Not that that means much, but the GPU constantly stays at 60 degrees with barely any cooling going on, yet it moves between 80% and 100% 3D usage in Task Manager. I adjusted everything in the global settings of the Nvidia Control Panel, yet it seems like there is still a lot of headroom left. Comparing it to other benchmarks confirms this. Is there anything more you can advise?
Is there a system info extension for ComfyUI?
I'm really struggling to find good settings for the RX 6600. Do you have an idea where I could find the best command line arguments?
Just drop it, Radeon cards aren't made for AI stuff.
Use Stable Diffusion with ZLUDA for better performance on AMD.
Thank you! My images were taking 3 minutes each; now it's down to 20 seconds.
Hi, I'm using an RTX 2060 Super. My problem is that SD is using a lot of my RAM, not the GPU. How do I make it use the GPU? Thanks.
Hello, I'm using a Windows Surface 4 with an AMD Ryzen 5 (GPU?), but my CPU is running at 100%, my memory (RAM?) is at 90%, and images are taking 20 minutes to generate (an hour for one image was the worst). Which tutorial would best suit my computer's needs? (I'm a complete n00b to this, BTW)
are you still having issues?
How do I restart Automatic1111 on Windows? What's the command?
Thank you for the video!
In the console, press Ctrl+C.
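In other words, assuming a standard Windows install launched from webui-user.bat (the folder path below is just a placeholder):

rem in the console window where A1111 is running:
rem   press Ctrl+C to stop the server (confirm with Y if prompted)
rem then start it again from your install folder:
cd /d C:\path\to\stable-diffusion-webui
webui-user.bat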
Can someone please help? It's using 100% of my CPU, giving me super slow performance, and not using my GPU at all. BTW my specs are a 7600 CPU and a 6800 XT GPU, plus 32 GB of DDR5 RAM.
My PC keeps shutting down after running Stable Diffusion for a while. I'm using Windows with an NVIDIA card. Any suggestions?
Hello there! This is most likely a power delivery issue. Remove these flags if you have set them as command-line arguments: "--opt-split-attention-v1 --opt-sub-quad-attention", and see if your computer still crashes. Be sure you have updated your NVIDIA drivers for your specific GPU. Try installing Automatic1111 on WSL2; it is very easy and improves performance in some cases. Tutorial: ua-cam.com/video/sfQvP5VGxKI/v-deo.html Use a program like CPU-Z to see the power draw and usage of your GPU and CPU, and check whether your computer crashes while it is at 99% or 100% utilization. It may also be your CPU not being cooled properly and thermally shutting down, although this is unlikely to be the case. There is a GitHub discussion that specifically addresses the problem you mentioned: github.com/AUTOMATIC1111/stable-diffusion-webui/discussions/11598
Do you recommend xformers and the other settings on an RTX 3090?
xformers IS recommended on an RTX 3090.
Thx, it runs really fast now.
That's great! Glad it helped!
My performance is good when generating pictures, but when I switch between checkpoints it takes 5-10 minutes. Do you know how to make it faster?
Use an SSD.
Switching between checkpoints also takes a bit longer for me because I'm using a hard drive. The first image generation/iteration is usually slower when you load a checkpoint because it needs to be loaded into VRAM. That's what I heard people saying.
@@SpookySpiralz You don't need to have the models/checkpoints on an SSD, nor A1111's installation. I have everything on a 7200 RPM HDD. Switching between them takes 2-3 minutes depending on their size in GB.
@@SupremacyGamesYT Yes, but on an SSD my switches are about 15 seconds
Excellent video! Thanks for making this!
I appreciate it, you too.
You showed it all very simply and clearly and I still don't get it, I'm so stupid man...
No worries, what doesn't work?
You are a Wizard. Thanks for making the videos so easy to follow so dum dums like me can make higher res pictures :)
Thank you too!
Thank you bro for all you do. I have a GTX 1660 Super but extremely low performance. Can you help me if I share my PC over Discord or AnyDesk?
Hi, thank you! Try installing Stable Diffusion this way: ua-cam.com/video/sfQvP5VGxKI/v-deo.html Be sure to install the latest Nvidia Drivers.
@@Archive-pg2zn I already have the newest NVIDIA drivers bro :(
BRO YOU NEED TO TEACH US ABOUT ERRORS!! MEMORY ERRORS ETC
Thank you!!!
KEK! The bloat and spyware of Windows is no match for your performance. Get a custom ISO if you need Windows, but don't go messing with all the spyware and bloatware.
Hi, tried out the new args and removed --medvram on an RTX 3060 Ti, and I get a memory error when trying to 2x upscale in img2img from 512x768 to 1024x1536. What's the deal?
Args are --xformers --opt-channelslast --upcast-sampling --opt-split-attention --no-half-vae --medvram --vae-path "models\VAE\vae-ft-mse-840000-ema-pruned.safetensors"
Hi there, it might be worth trying a different approach. Instead of specifying the VAE via the command-line arguments, you could try using the UI. If the VAE is in the VAE folder (it appears that this is already the case on your computer), go to "Settings" > "Stable Diffusion" > "SD VAE", choose the VAE from the drop-down menu, and click "Apply Settings". Additionally, try using the following command line arguments: "--xformers --upcast-sampling --opt-sub-quad-attention --opt-channelslast --medvram --no-half-vae". Hopefully, this will help you avoid the memory error. Note that upscaling the generated image requires a lot of memory, so I suggest generating all your images without the upscaling feature. After the generation, restart Automatic1111 with the "--lowvram" argument and batch upscale all your images. You can let this run while you're away. Let me know if it works!
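In webui-user.bat that would look roughly like this, with the low-VRAM upscaling variant kept as a commented-out line to swap in for the batch-upscale session (same flags as above, nothing new):

rem normal generation, VAE selected via the UI instead of --vae-path:
set COMMANDLINE_ARGS=--xformers --upcast-sampling --opt-sub-quad-attention --opt-channelslast --medvram --no-half-vae
rem separate batch-upscaling session, swap in this line instead:
rem set COMMANDLINE_ARGS=--xformers --upcast-sampling --opt-sub-quad-attention --opt-channelslast --lowvram --no-half-vae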
Use xformers only. Do not use any other args, and especially do not use medvram if possible. My 3070 Ti is at 85 percent load (was at 100 percent), 6.9 GB / 8 GB, fast and efficient. 32 768x768 images with hires fix take less than 3 minutes; 16 images at 1.8 seconds.
TY
TYVM
Thanks Man :)
Thank you!
Why on Earth do you need to rename the .bat file to .txt? What a lamer...
🔥
Did as you showed, no improvements at all )))
Such a shame... you are making a tutorial on Windows, bro! The title suggests a general tutorial that also covers server-side setups. Windows is such trash in terms of performance! :P
Xformers helps it go faster for sure, but I keep running into an error when I use image2image with 768 by 512 saying, "modules.devices.NansException: A tensor with all NaNs was produced in VAE. This could be because there's not enough precision to represent the picture. Try adding --no-half-vae commandline argument to fix this. Use --disable-nan-check commandline argument to disable this check."
Whenever I use text2img it works fine. I'm using an RTX 2070 Super, btw. Thanks for the informative video.
Never mind, I just added --no-half-vae along with xformers, which might've fixed it for now. I also noticed that using different command line arguments slightly changes the image even if it's using the same seed. Not necessarily a bad thing, just something I noticed.
@@Statvar Great job on finding the solution to the error by adding the "--no-half-vae" command line argument. You're absolutely right that changing the command line arguments can slightly change the image. With "--xformers" you trade a tad bit of precision for generation speed, although most of the time the difference is negligible. Keep up the good work!
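For anyone else hitting the same NaN-in-VAE error, the combination that worked here boils down to something like this in webui-user.bat:

rem --xformers for speed, --no-half-vae to keep the VAE in full precision
rem and avoid the "A tensor with all NaNs was produced in VAE" error
set COMMANDLINE_ARGS=--xformers --no-half-vae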
Hi, first of all thanks for this video! I HAVE A PROBLEM and need your help pls
If I use the user.bat with lowvram, everything works properly but very slow; if I use it without lowvram I get this error: "stable diffusion error PYTORCH_CUDA_ALLOC_CONF"
my system info is: cpu: Intel64 Family 6 Model 151 Stepping 2, GenuineIntel
system: Windows
release: Windows-10-10.0.22621-SP0
python: 3.10.6
device: NVIDIA GeForce GT 1030 (1) (compute_37) (6, 1)
cuda: 11.8
cudnn: 8700
ram: free:8.42 used:7.39 total:15.82
gpu: free:0.99 used:1.01 total:2.0
gpu-active: current:0.01 peak:1.07
gpu-allocated: current:0.01 peak:1.07
gpu-reserved: current:0.02 peak:1.69
gpu-inactive: current:0.02 peak:0.18
events: retries:0 oom:0
utilization: 0
Can I do something to make SD work faster?
Hi! I'm glad you found my video helpful. One solution you could try is to use these flags: "--medvram flag --xformers --opt-split-attention". Try lowering the resolution at which you're rendering pictures. Don't use any upscalers, because they allocate extra VRAM. Run the benchmark to see which flags work the best. If you're still experiencing slow performance, it might be worth considering upgrading your graphics card to one with at least 4 GB of VRAM. The GT 1030 only has 2 GB of VRAM, which is lower than the recommended VRAM of 4 GB for Stable Diffusion. You may use Google Colab. Their free tier has high performance graphics cards. The GTX 970 and 960 4 GB Edition should offer solid performance too. I hope this helps!
@@Archive-pg2zn thank u so much for all information! 👍👍I will check it out and reply
@@Archive-pg2zn For these flags: "--medvram flag --xformers --opt-split-attention" it gives me an error: flag
@@pinielka I made a typo. This "--medvram flag --xformers --opt-split-attention" should actually be "--medvram --xformers --opt-split-attention", without the "flag". If these flags result in a "cuda alloc" error, your best bet is to use your original flags "--lowvram --xformers --opt-split-attention". If you really want to speed up the generation process, consider upgrading your graphics card to one with at least 4 GB of VRAM.
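So for a 2 GB card like the GT 1030, the working line in webui-user.bat from this thread would be, roughly:

rem low-VRAM setup that worked here; try --medvram instead of --lowvram
rem only if it doesn't throw a CUDA allocation error on your card
set COMMANDLINE_ARGS=--lowvram --xformers --opt-split-attention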
@@Archive-pg2zn thanks for your reply! I will check it out and let you know
Did everything in the video and image generation still takes a long time; it may even be slower because I took out medvram.