This man watched Vikings for years, waited for next-gen AI, got himself a YT channel, and said: my time has come.
:)
I have an RTX 4090 and damn, SDXL is fast
5 sec generation for 1 image, instead of like 3 minutes with my old one haha
Speed was my main problem with SDXL and that's why I never used it before
But now that speed isn't the problem anymore, there is another one that keeps me on SD 1.5
It's true that the realism is better in SDXL, even if, with a complex prompt and a good checkpoint, you can do super realistic things with SD too
But the lack of checkpoints and the lack of LoRAs is the worst part
Also, it seems like SDXL LoRAs are harder to combine and give more artifacts than SD ones; I don't know if that's only me.
Anyway, for characters I use FaceSwapLab combined with a LoRA; it works incredibly well, because using a LoRA alone doesn't give the realism I want.
But using both of them is incredible: the body fits, so the face fits too.
SDXL training usually gives slightly more realistic results than SD, but SD results are smoother and contain less grain. As you mentioned, the biggest issue is speed. In SD one must combine different techniques, as you mentioned, to achieve good results; there is no single solution for anything thus far, so having a workflow is a must to achieve quality results.
Mac settings are different; very, very few people have even been able to launch Kohya so far. Would you by any chance know what parameters would work for Kohya running on a Mac?
Unfortunately no, but the principles in this video remain true regardless of changes in the environment or some parameters (how captioning works, how SD trains, the approach to image selection, and so on).
Hi, the settings seem to have changed for AdamW8bit at 0.0001; the model seems to overfit. Have you noticed a change?
I have not done any recent training, and usually the algorithm is fixed, so learning rates are unlikely to be used differently; not sure. Anyway, if you see things overfit quickly, then using a smaller number of steps could be better. Besides, a learning rate smaller than 0.0001 doesn't make much sense, I think, so we usually consider increasing it, not decreasing it, to learn faster for instance... not sure if any recent changes in Kohya have made things different.
How can I contact you? I want to train a LoRA with images of myself/a person to create images with the epicrealism model, basically profile pictures. I've been running and testing on a RunPod installation but not getting good results :(
of course I would pay
sorry, not considering any work at the time being, thank you.
@@AI-HowTo damn! Ok :)
Awesome content! It would also be cool if you left the specs of your PC in the description, so we know what you're working with and can kind of guess what to expect with our own PCs.
Thanks, will do so; will paste it here and in the description too:
Adapter Type: NVIDIA GeForce RTX 3070 Laptop GPU, NVIDIA compatible, 8GB VRAM
Physical Memory (RAM): 16.0 GB
Processor: AMD Ryzen 7 5800H, 3201 MHz, 8 cores, 16 logical processors
SDXL training was performed online on RunPod, using a 24GB RTX 3090 GPU, because it didn't work on my laptop.
Great video, thanks a lot!! How long did the training take with SD 1.5? I don't recall hearing you say it. And the SDXL model on RunPod, how much time? And was it easy to configure? I haven't used it.
You are welcome. On RunPod, it took around 50 minutes to achieve stable results; that used an RTX 3090. RunPod usually comes with Stable Diffusion preinstalled; one must choose a suitable pod for that. Last year I made this video ua-cam.com/video/arx6xZLGNCA/v-deo.html for RunPod; it is possible that things have changed since then too, gotten easier that is, with models preinstalled. On my laptop's RTX 3070, I think similar training took more than 3-4 hours, if I am not mistaken, for 1024 image sizes. This depends largely on how many images are used, and the image sizes.
@@AI-HowTo Thank you SO much for your answer, I will see that other video. Take care!
I don't really understand the "undesired tags". If my face has a mole and I always want the mole to be there, do I put the word 'mole' there? What about a scar on the face?
Definitely don't include mole in the captions if that is a trait of the face. Any trait of the character that you want absorbed by the LoRA should not be included in the captions; if you include them, you will also have to include them in the prompting, and that makes prompting more complex and less effective. Remove 1girl, mole, scar, lips, nose, eye color, hair color, realistic, and everything else that relates to your character and repeats across all the images, and just caption everything else.
@@AI-HowTo what about tags like "standing", "looking away", "looking at viewer", and different items of clothing?
If your images are always looking at the viewer, then don't include that in the captions. If your character is always wearing the same dress and you want that dress to be part of the LoRA, then don't describe the dress either. Anything that repeats in all the images and that you want to be part of your LoRA, don't caption it.
@@AI-HowTo Thanks for the reply. The problem is that in most of the photos the character does look at the viewer, and only looks away in 3-4 photos.
Should I include "looking away" in those 3-4 photos and nothing on the subject (looking at viewer) in the rest?
Yes, that's possible. Don't include looking at viewer since it is the default, and only in those few write "looking away". It is not a 100% exact science, but I think this is the most logical approach.
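As a minimal sketch of this caption-cleaning step, here is one way to strip such trait tags from WD14-style .txt caption files; the folder path and the tag list are placeholders, not the video's exact values:

```python
from pathlib import Path

# Hypothetical traits we want absorbed into the LoRA, so they must
# not appear in the captions (adapt to your own character).
UNDESIRED = {"1girl", "mole", "scar", "blue eyes", "blonde hair",
             "realistic", "looking at viewer"}

caption_dir = Path("train/20_xyzkw woman")  # hypothetical dataset folder

for txt in caption_dir.glob("*.txt"):
    tags = [t.strip() for t in txt.read_text(encoding="utf-8").split(",")]
    kept = [t for t in tags if t and t.lower() not in UNDESIRED]
    txt.write_text(", ".join(kept), encoding="utf-8")
```

Kohya's captioning tab has an "undesired tags" field that does the same filtering at caption time; a script like the one above is only useful for cleaning captions after the fact.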
Are regularisation images required? Some say they are not. Is there any difference if we use them or not?
Nope, reg images are not required, because SD already knows the concept of a person. I prefer to train both with them and without them and see which model is better, because it is a purely experimental process. My best results in private LoRAs were produced with regularized LoRAs; they became richer, adapted to prompts better, and produced better results.
@@AI-HowTo in my limited practice it is the opposite. It would be interesting to see a comparison.
Great tutorial. Thank you. For generating photos of a particular person (face + body), do you think DreamBooth is outdated now that we can use LoRAs?
I don't think so; it is just that LoRAs are more practical and achieve a good level of accuracy, and they are faster to train and simpler, so we can generate as many as we want and reuse them efficiently for objects/people.
CalledProcessError: Command '['C:\\Users\\grome\\stable-diffusion-webui\\cd kohya_ss\\venv\\Scripts\\python.exe',
The script activates well from the folder. I can't locate the command line within Kohya to change the command path. What am I doing wrong?
Not sure... but if you are trying to run the script directly from the command line with something like (accelerate ...params...), then first you must activate the venv from inside the Scripts folder, then go up two steps using cd.., cd.., and run the script from there, so the script is run from the Kohya folder, not from the Scripts folder.
It happens when I run Kohya trying to make a LoRA. I get this after it has classified all the images, together with a low memory error, even though I have a 3070.
Would running the script before starting to make the LoRA fix it? Or is there anywhere within Kohya to point toward the exact script location to avoid the error?
Not sure; try setting precision to fp16 and save precision to fp16, use xformers, gradient checkpointing if necessary, and try again (a 3070 should work with bf16), but not sure; possibly there is something missing in your installation, for instance.
Usually we only run a script that was made by someone else, or to avoid using the GUI, but I always used the GUI. I also have a 3070, and it works without memory problems or errors for SD 1.5. Try setting the parameters yourself in the GUI and avoid using scripts when possible. Also check your images in case some of them have large resolutions, which may cause errors too, and try with 1 image until you figure out the source of the error.
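For reference, a minimal sketch of launching a Kohya training script with its own venv interpreter from the repo root rather than from venv\Scripts; the install path and script name are assumptions based on a default kohya_ss layout, not the commenter's actual setup:

```python
import subprocess
from pathlib import Path

# Hypothetical install location; adjust to where kohya_ss actually lives.
KOHYA_ROOT = Path(r"C:\Users\grome\kohya_ss")
VENV_PY = KOHYA_ROOT / "venv" / "Scripts" / "python.exe"

# cwd is set to the repo root so the script's relative imports and
# paths resolve correctly; running from inside venv\Scripts breaks them.
subprocess.run(
    [str(VENV_PY), "train_network.py", "--help"],
    cwd=KOHYA_ROOT,
    check=True,
)
```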
How do you go about training a model that works with your own pictures? Do I just start taking selfies daily for a couple of days and then adjust the w/h to 1024? I keep getting bad details around the eye area. Any way to improve this?
Same as here, but it's better if someone else takes the photos, since selfies are not so good; the angle of view is not good. Just let someone take the pictures from different angles, with good natural room lighting, and some full body shots, and train normally.
I also use Photoshop to smooth the images slightly; for example, if there are wrinkles in the face or some features that I don't want to appear in my LoRA, Photoshop can hide them. I use Photoshop neural filters for this purpose, which are part of Photoshop 2021 and above; this helps improve the quality of the LoRA greatly.
Use a good regularization set on a good checkpoint if you want pretty images, such as the Photon or MajicMix checkpoints, and follow the same steps as here.
Errors keep appearing when using the SDXL 1.5 model. Is there any solution?
Errors usually happen when your GPU doesn't have enough VRAM, as mentioned in the video. The best we can do is turn on gradient checkpointing and xformers, and if VRAM is only 12GB, use the Adafactor optimizer. I used RunPod for training this SDXL LoRA because my VRAM was only 8GB, which is not sufficient for Kohya SDXL training.
Thank you for this useful comparison and tutorial. From your experiments, which LoRA do you prefer (1.5 or XL), and in your view, what are the key advantages of your preference?
Thus far SD 1.5 looks smoother, is faster, is good enough for characters, and produces wonderful results at resolution 1024x1024 with checkpoints such as Photon/MajicMix, while SDXL is more realistic in things like skin details/moles/wrinkles, but a lot slower, unfortunately. For now, I continue using SD 1.5 because of my GPU; in the future, once I get a better GPU and better community checkpoints exist, I will use SDXL. SDXL, on the other hand, is better with anatomy, that is: body, hands, fitting a character on a bike/horse; it also understands objects better, so for complex scene prompting, SDXL is superior.
@@AI-HowTo thank you!
You are most welcome.
@@AI-HowTo hello, have you made more discoveries since a month ago in the art of fine-tuning SDXL models, be it LoRAs or DreamBooth?
Nope, I'm currently working in another field and won't finish before June 2024, so I doubt I will be posting videos about training, and possibly about this subject in general; not sure yet... The training process is a procedure, so following the principles mentioned in this video is all that a person really needs; the rest is just experimental, and the same parameters that work for some dataset may not work 100% optimally for another dataset, but the principle remains the same. I expect soon there will be better, faster, more accurate training methods, even better than DreamBooth, that will give people better results.
Can I allow some sunglasses in the dataset?
Sure :) I made this model for educational purposes only and have no use for it, but if I develop it further later, I will add glasses and improve the full body images to get higher resolution ones.
Is it important to remove the incorrect anatomy images from the regularisation images? In other words, how do I prevent the disfigured hands and fingers, eyes not aligning, etc.?
Yes, it is better to have good class images; based on what I have tested recently, it will improve the quality of the images, and it can also improve the color of your character. For instance, if your class images have doll-like skin/faces, it makes your character smoother and better looking. Avoid having any kind of deformation in the class set; while its effect will be limited, even if it has a 1% effect you should remove it. Kohya ss is definitely learning from the class set too, based on what I have seen in my recent tests.
@@AI-HowTo thanks a lot for the answer!
you are welcome
Can you use img2img with an image of a piece of clothing to make your LoRA person wear it, or would you need to make a LoRA of that piece of clothing too?
Thanks for the video, great stuff.
This is very difficult to do directly, even with ControlNet (reference+openpose or other ControlNet combinations and masking), because Stable Diffusion doesn't understand geometry; accurate dressing is done using the Blender 3D software, while in Stable Diffusion you get something approximate, unless you trained your clothes into a LoRA, for instance as in this video ua-cam.com/video/wJX4bBtDr9Y/v-deo.html, and even then SD might miss some accurate details. img2img in general is used to change image style and do inpainting, that is: changing part of the image.
I read your answer from the previous video and I'm doing it that way; I always put the trigger word in the prompt... but I'm giving up on these regularization images. I'm training a character right now; it's already on the 4th epoch and the samples show an image that has 0% of what I'm training. I have 20 images being processed, of different sizes but good quality, with 100 repeats, and I'm using 900 random 512x512 images of girls (even my character) generated on the same checkpoint. What's wrong? I've seen videos saying you need 100 regularization images per trained image, and I've seen videos saying 10... I'm confused.
I got the same: if I use regularization images, the likeness of the subject is very low.
@@fi5h81 I'm totally lost. I think the solution will be to give up these images and focus on training without them. The training I mentioned is still being processed; it was estimated at 5 hours... it's been 5 epochs and not even 0.1% of the character has appeared yet. When I train without these regularization images, it takes less than half the time and I can see results.
You should do what works for you... training can work really well, and faster, even without regularization. There is no requirement of 100 reg images per 1 training image, and there are no rules like 100 reg images / 3000 steps; all of these are made up by the trainers, and training is entirely experimental. I suggest you reduce the 100 repeats down to 20 and increase the number of epochs. I usually never use more than 20 repeats per image, and often 10 when I have more images, and it works really well both with reg and without. As mentioned in the video, Kohya will use (repeats * image count) reg images only, so if you use for instance 100 repeats and have 20 images, then you need 2000 reg images; otherwise your class images will be repeated, since Kohya uses 2-factor regularization, so if it doesn't find the extra reg images it will repeat them, which increases the effect of the reg set over your character and takes longer to train. I also strongly recommend using real world regularization. But once again, regularization is not necessary for a character LoRA; it only has a positive effect on flexibility and can make the LoRA richer for inpainting.
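A minimal sketch of that arithmetic, using the numbers from this thread:

```python
def reg_images_needed(train_images: int, repeats: int) -> int:
    """Kohya draws (repeats * training image count) regularization images;
    if fewer exist, the reg set is cycled, amplifying its effect."""
    return train_images * repeats

# The setup described above: 20 training images at 100 repeats.
print(reg_images_needed(20, 100))  # 2000 -> only 900 reg images exist, so each is reused ~2.2x
# The suggested fix: drop to 20 repeats and add epochs instead.
print(reg_images_needed(20, 20))   # 400  -> 900 reg images are more than enough
```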
@@AI-HowTo Thank you very much, you are saving me, and thank you for your patience with a beginner. I am not comfortable just asking you questions; I am also reading articles on the internet about it. I will do the tests you recommended tomorrow; at the moment I am testing 2 epochs of a LoRA I just created. Just one more question: what is the best way to test a LoRA? The one I'm testing is a complete mess on certain checkpoints, works fine on others, and on some I need to change a lot so it doesn't get blurred. Usually at weight 1 it never works, and below 0.7 the character is already mischaracterized; I'm testing just between 0.7, 0.8 and 0.9...
Totally fine; we are all beginners in this area and learning. You should ask when you have a question; even if no one replies, you lose nothing. That is how we all learn. Character LoRAs are often delicate and don't work properly on other checkpoints; they work best on the same checkpoint they were trained on. A character LoRA that is not overfitted should work perfectly at weight 1; anything less, and the character gradually loses its features. But when you want to create full body shots, we may use 0.8 or lower and then automatically repaint the face with the After Detailer extension. If you are not getting good results at weight 1, then most likely you are training with a large number of repeats, such as 50+; this is why using a smaller number of repeats, such as 20 or less, is necessary. Use smaller repeats, and you can increase the number of epochs as much as you like.
Subbed! Very good tutorial! I know this is an old video, but I had a few queries.
Is it harder to create/train a 'realistic character LoRA' if the original dataset contains AI generated images created on realistic checkpoints, instead of a real person's photos? I guess what I mean to ask is, can a LoRA created using AI generated datasets achieve such realism?
PS. Also, what would be the best checkpoint to create such an AI generated dataset? TIA!
Yes, as long as the training dataset is of good quality and does not contain deformations in the eyes or fingers; even the slightest deformation could be amplified after training if it is repeated. As for the best checkpoint, not sure at the time being, unfortunately; previously, for 1.5 I got the best results with MajicMix v4, even for western characters, despite the checkpoint being Asian, and for SDXL, Juggernaut XL. Not sure now... I think in general the principles of training do not change over time, so the video is still good to rely on for training.
I normally use the "easy lora training scripts" for LoRA training on 1.5, mainly because I have already tweaked my settings for the best results.
Can I still use that to train SDXL?
I think you can do that partially, but it's not practical: load the script, then go back and check SDXL and fill in the SDXL related parameters, image sizes, noise offset, etc. There are some SDXL scripts already in newer versions of Kohya which populate lots of settings for you, such as the noise offset 0.0357, amongst other optimization parameters such as the Adafactor optimizer. Such scripts are just an alternate way of storing the JSON config, included as part of Kohya, so you can export one as JSON too, or save it for future use that fits your data.
Thank you! Perfectly explained! Btw, do you have captions for your regularization images?
You are welcome; nope, just the class name woman.
Did you actually apply ": 1girl,blonde hair,blue eyes,solo,realistic,looking at viewer" like in your video? Also, in order to make the regularization images as similar as possible to the training photos, shouldn't we prepare something similar to the images being trained? I'm asking because I see things like AI images in your video. Please understand, my English is bad.
Yes: 1girl,blonde hair,blue eyes,solo,realistic,looking at viewer, plus any feature that is specific to our character; removing those tags helps absorb these features into the LoRA, so we caption only everything else that we want to be able to change.
The regularization set must not be similar to our subject; it must be different, but from the same class, so that we get small features from the reg set and make our LoRA richer. A reg set is also not a must, but I found it gives more flexible results.
@@AI-HowTo drive.google.com/drive/folders/1N139LyAgXfRN1hDrCjfkQixtroP5RQzR?usp=sharing Thank you so much for your reply. I followed everything exactly as you did, starting with the female model, but the results are so different.
@@AI-HowTo drive.google.com/drive/folders/1Rd2scCO-tBk_0_XZxGwZubOH09LIQscE?usp=sharing Take a look when you get a chance. I'm pretty sure I did the exact same thing as you, but it's so different.
I just checked them. Your only problem is the resizing of the images: you are resizing them in the wrong way, which stretches them and causes distortion in the training process; the training seems to progress correctly given how stretched these images are. You need to maintain the aspect ratio when resizing, or use better resizing/cropping software, that is all. Also, I suggest removing "long hair" and "smile" from the captions, but that is another matter.
So based on your incorrect image resizing, the output is logical. Also make sure all your images are of good quality; some of them are really not. You also need more reg images, such as 20 (images) x 20 (repeats) = 400 images; that would be better.
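A minimal sketch of aspect-ratio-safe preprocessing with Pillow (center-crop to a square, then resize); the folder names and target size are placeholders:

```python
from pathlib import Path
from PIL import Image

SRC, DST, SIZE = Path("raw"), Path("resized"), 1024  # hypothetical folders/size
DST.mkdir(exist_ok=True)

for p in SRC.glob("*.*"):
    img = Image.open(p).convert("RGB")
    side = min(img.size)                    # largest centered square
    left = (img.width - side) // 2
    top = (img.height - side) // 2
    img = img.crop((left, top, left + side, top + side))
    img = img.resize((SIZE, SIZE), Image.LANCZOS)  # ratio preserved: no stretching
    img.save(DST / f"{p.stem}.png")
```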
Have you tried using nude photos as regularization images? Is it worth doing that?
It's just that an archive of very high-quality nude photos is easier to find on a torrent than one with clothes.
Nope, didn't try. I think having such pictures will increase the likelihood of producing similar pictures in generations. They will work too, of course, since they are of the person or woman class; however, I don't think the model will be richer or more flexible when it comes to fashion or clothing styles.
@@AI-HowTo how many photos do you need, in different poses with different facial expressions, so that the model is as flexible as possible and does not resist the prompt? And the same for regularization images: if there are 10,000 maximally diverse ones, will it be better? (I'm training XL)
For regularization images: we only need (number of repeats x image count) regularization images, so if we have 30 images and we are doing 20 repeats, Kohya will only use 600 reg images and ignore the rest.
As for how many images we need to train: most important is having high quality images. One image per pose/expression/angle of the face should be enough, but I have seen that having more images produces better results in general. So around 20 total images is suitable, 40 images may be ideal to cover more poses/expressions, and up to 100 for a character or object is good if it helps capture the face/body from all angles with different clothing and poses. For style: 100-400 images is good. So there is no set rule for this.
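A minimal sketch of how those numbers interact, using Kohya's repeats-in-folder-name convention; the folder names and values here are hypothetical:

```python
# Kohya-style dataset layout (repeats are encoded in the folder name):
#   img/20_xyzkw woman   -> 30 training images, 20 repeats, class "woman"
#   reg/1_woman          -> regularization images of the same class

train_images, repeats, epochs, batch_size = 30, 20, 10, 2

reg_used = train_images * repeats                # 600 reg images used, extras ignored
steps_per_epoch = train_images * repeats // batch_size
total_steps = steps_per_epoch * epochs           # reg steps add on top when a reg set is present

print(reg_used, steps_per_epoch, total_steps)    # 600 300 3000
```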
Hope a good realistic checkpoint will be released for SDXL :)
I hope so too! Without community checkpoints or LoRAs, realistic results in SDXL are limited.
@@AI-HowTo I think there are some already released on civitai
Yes; still, it will take a while for them to be tested and filtered by the community so we know which is better than which. The biggest problem for wide SDXL adoption thus far seems to be speed, not quality: it is slower than SD 1.5 and requires more GPU to train in comparison.
Why is captioning not working?
If it is not working for you, then possibly the Kohya version you are using needs you to explicitly set the caption extension to .txt in the parameters section, under the Caption Extension field. It used to take .txt by default, if I am not mistaken; now the default seems to be .caption, so if your captions are text files, you should type .txt. That might be the case.
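A quick way to check for that mismatch; a sketch, assuming the captions sit next to the images in the dataset folder:

```python
from pathlib import Path

dataset = Path("train/20_xyzkw woman")  # hypothetical dataset folder

exts = {p.suffix for p in dataset.iterdir() if p.suffix in {".txt", ".caption"}}
print("caption extensions found:", exts)
# If this prints {'.txt'} while Kohya's Caption Extension field says .caption
# (or the reverse), the captions are silently ignored during training.
```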
I have a 3080 Ti and my training process is so much slower! It's 120.77s for each step and I can't solve the problem :(
Sorry, I never experienced this issue, and this should not happen; a 3080 Ti is almost as powerful as a 3090 even if it has less VRAM, and SDXL should run even with 12GB, so your training speed should match this video or be a little bit slower. You can try a different optimizer such as Adafactor, which is better for 12GB than Adam8bit; see this video ua-cam.com/video/RT2jj-5t8x8/v-deo.html. Make sure the Gradient Checkpointing option is on, since your VRAM is less than 24GB, and if it still doesn't work, use the --network_train_unet_only option shown in ua-cam.com/video/RT2jj-5t8x8/v-deo.html; hopefully things work better then, otherwise there must be a driver issue in your system.
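For reference, a sketch of a low-VRAM SDXL launch combining those options; the paths and values are assumptions, not the video's exact settings, though the flags themselves are kohya sd-scripts options:

```python
import subprocess

args = [
    "accelerate", "launch", "sdxl_train_network.py",
    "--pretrained_model_name_or_path", "sd_xl_base_1.0.safetensors",  # hypothetical path
    "--optimizer_type", "Adafactor",   # lighter on VRAM than AdamW8bit
    "--gradient_checkpointing",        # trades compute for memory
    "--network_train_unet_only",       # skips text-encoder training to save VRAM
    "--mixed_precision", "fp16",
    "--xformers",
]
subprocess.run(args, check=True)
```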
What's your graphics card for training XL?
Mine is an RTX 3070, OK for training 1024x1024 SD 1.5 but not OK for SDXL, unfortunately; we need at least 12GB for SDXL training. That's why I used RunPod for making this video only, then generated the images locally on my 8GB graphics card for both SDXL and SD 1.5.
You are welcome; turning gradient checkpointing on will allow you to run SDXL training on 12GB, together with the Adafactor optimizer, because it requires less GPU memory than Adam8bit.
@@AI-HowTo Ah, thanks for the reply! I also own a 3070 Ti... no luck with XL, I guess :(
I trained it with SD 1.5, doing everything the same as you, and it came out like a cartoon. Why is that?
If you are training and prompting on a realistic checkpoint, you should not get a cartoonish person; I never got this kind of result with any LoRA before. Double check which checkpoint you are training on, or producing the results with.
overlays at the end are hiding the conclusion image....
Thank you for mentioning it; updated, I removed them from the conclusion section.
Thanks for the video! Why does it take a lot more time with XL? I hope it'll be as fast as 1.5 :/
SDXL uses a different, larger UNet structure, which requires more time to generate images and to train. Also, even in SD 1.5, if I train on 1024 images it is 4 times slower than on 512 images (4x the pixels); however, on SD 1.5, training on 1024 resolution images usually produces studio quality images, so I don't have to do many repeats. For this model, results were great on the first attempt because the data was 1024x1024.
How does training work if, say, the male character you are using is always wearing some cultural head gear (Native American chieftain, Mongolian skullcap, Viking helmet, Thai crown dress, Arab male shawl, etc.)? Say I have 50 images of an individual in which he's wearing the same thing on his head in all 50. What kind of classification images should I use: regular photos of men, or do I curate a new set of classification photos of men all wearing the head garb? Some head accessories are easier for the AI models to understand without specific training (Viking helmet), given how commonly they are used, but others not so much. I had many issues with various ethnic head gear, most notably that of Vietnam, Thailand, Mongolia, and the Arabs.
We use normal regularization images of men; we can also include men with various types of helmets, different from yours, in the class images.
True, it is easier for SD to improve on something it has seen before than on something new. Most importantly, don't mention anything related to the head gear in the captions, or anything related to the dress if the dress repeats and you want it to be part of the final results. You don't even need to use man in the captions; just use a trigger word absorbing all your character's features. The class (in the folder name), on the other hand, is man.
@@AI-HowTo Thank you
Watched it, thank you for your work!
You are most welcome
Thanks, I watched it a year ago) Now, the answer to your request: usually, when we use a large number of pictures, such as hundreds, it becomes more of a style training. There is however no max number; some say around 1000 is the max, but one can also train with around 10 pictures and get good results using a mix of techniques and lots of trial and error. This video ua-cam.com/video/vA2v2IugK6w/v-deo.html is a better example of character training, with lots of useful notes. For me, I got better results with 30+ pictures, up to around 80 pictures, for people; it captures more changes and details, but even around 10 can work, and new SD versions can train on smaller numbers... I remember when I was coaching a year ago on the 1.5 model, I did this by throwing in folders of photos, with a lot in each folder))) named 100_body... 80_face... 80_fullbody, set Network Rank 128, Network Alpha 1, and lo and behold, the model was super obedient, flexible, super responsive and high quality. It's been exactly a year; I've been waiting for people to start training SDXL, but there's still not much information, so I want to try to train now, and I don't know where to start...))) thank you for the answer!
Hey, can you provide regularization images for men, please?
Sorry, I have none that I can share at the time being, because images must be checked for copyright issues before sharing publicly. You can just generate some using SD, or collect some real pictures from Freepik or Google, which is the best approach, then crop them manually. When used on your own computer, it doesn't matter whether they are copyrighted or not, since the pictures are not used directly nor memorized, so the generated models will have no copyright issues.
Do you have a Discord or a community you can get into?
sorry no, not at the moment
Great tutorial; would you mind sharing your reg dataset?
Thank you, I will make it available; I need to check whether it has any copyrighted content, just in case, because it has some real images as well. I will make it available on the same video, or send you a link as soon as possible.
@@AI-HowTo can you send me a link? I just want to check out the difference between using reg data and not.
Sorry, for the LoRA training of Olivia I won't make the LoRA available, due to copyright issues and some complaints I got about the video. For this video of Katheryn, civitai.com/models/126328/katheryn-winnick-xyzkwv1sdxl, you can see three links, but all are regularized. I did several models recently and got better results with regularization, though in some cases going without class images seems to work as well. Personally, I always do regularization for models that I want to be better and more flexible; it takes more time to train, but when a regularization set is used, results are often better.
thanks for your advice, it helps a lot!!!@@AI-HowTo
you are welcome
Make a discord community please
sorry, not planning on any for the following few months at least.
You skip so much that it's hard to follow what you're doing; half of the clicks you make are skipped and impossible to follow.
I was trying to shorten the video as much as possible, but point taken; thanks for the input.
Yeah, try to get a render from the side, or from behind. A LoRA is a patch, not a solution, and it only works if your subject is always taking a selfie.
If the training data has side views, the results will be really good; this one has a limited dataset. For long shots, using After Detailer is the only solution for LoRAs, which, like you said, are more like a patch for a certain object/character. LoRAs and even the SD models have many shortcomings, but there are many ways around them.
It works from the side too; I've made many LoRAs like that. You need side photos in your dataset.