You are fast becoming one of the best channels on YouTube for SD content.
I think it takes months or years to actually reach that point, and I doubt I will be able to allocate the time for that in the future. We will see.
@@AI-HowTo did you go to university to learn about AI etc.? Are there any good resources you could recommend to learn what is happening behind the curtain?
Current AI tools do not require any university degree to use; they are meant to give people more power, as long as they have the patience, time, and tools, and want to learn. Unfortunately, right now it is all just a mess: mostly experimental and not well documented. I didn't find any one good place to learn from, nor any one right way to do something; there are many ways. So YouTube can be a useful place to look up info, with the help of Bard/ChatGPT for answering certain questions, which can sometimes give useful answers.
Again, very useful information, thank you for being so thorough. It's clear now that I've been getting poor results by setting alpha the same as dim (which is widely recommended elsewhere), with my best results coming from using an adaptive learning rate. I will try alpha at 1/2 or 1/4 instead and train for more epochs.
Did your regularization not pick up the body well because of a lack of body shots in the regularization set? (edit: I see you answered this in another comment, thanks!) I noticed that when I generated my own set with "photo of a woman" at 512x512, all the images were portrait close-ups. I wondered if generating part of the set at 512x768 might help, or prompting more specifically. Always more to experiment with!
How about one of these for styles with and without reg?
The strange problem I'm facing is that whenever I use regularization images, they bleed into the model. If I use my created LoRA to generate an image, I always get images that resemble the reg images, and there's no trace of the face model I tried to train. Do you know what might be the problem?
True, reg images can sometimes affect the LoRA too much, especially if we don't include a trigger word inside each caption of the training data. To reduce spilling, reg images must not contain repeating faces or data; the only thing that should repeat is your subject. Second, try without regularization; you might get better results in some cases, since regularization is not a must, because SD has already learned the human concept.
Another reason could be using a smaller number of repeats per training image (a folder named 5_xyz woman, for instance), which makes Kohya use a smaller subset of the class images. Kohya ss will only use (number of training images × repeats) regularization images; all remaining images are ignored. For example, if you have 40 repeats per image and only 20 images, Kohya will only use 40×20 = 800 reg images and ignore the rest. If we use smaller repeats such as 5 with 20 images, Kohya will only use 100 images from the reg set, and that increases the spilling too, especially if the set contains images with similar features.
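A minimal sketch of that arithmetic (the counts and folder convention here are just illustrative examples):

```python
# How many regularization (class) images Kohya ss actually uses.
# Training folder convention: "<repeats>_<instance> <class>", e.g. "40_xyz woman".

def reg_images_used(train_images: int, repeats: int, reg_set_size: int) -> int:
    """Kohya uses at most train_images * repeats class images; the rest are ignored."""
    return min(train_images * repeats, reg_set_size)

print(reg_images_used(20, 40, 1300))  # 800 -> 500 of the 1300 reg images go unused
print(reg_images_used(20, 5, 1300))   # 100 -> a much smaller slice, more spilling risk
```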
@@AI-HowTo I must say, so far I've had the best results with your exact settings! I usually use between 30-40 training images with 30 repeats and 10 epochs. So far the LyCORIS produced better quality than my previous LoRAs!
Yes, I saw LyCORIS producing better results than standard LoRA models as well: more realistic and with finer details.
Could you please do an updated guide for SDXL once you've figured out some optimal settings?
Will check if my GPU can handle SDXL. SDXL will most definitely give better results, because it has learned person/object concepts much better.
@@AI-HowTo Wonderful, I've had some good results training a standard SDXL LoRA with 896x1152 and 1024x1024 images. Adafactor optimizer, gradient checkpointing turned on. Max resolution 1024x1024.
Training is taking 5-6 hours on a 300-image dataset, given the batch size and an RTX 4090. I'm not able to go higher because the memory runs out.
Great to hear. I checked SDXL today; unfortunately it didn't work on my 8GB RTX 3070, so I must find alternative options for SDXL training, such as online servers. It is clear from testing in A1111 that SDXL is superior for anatomy and understanding of concepts, so the training results will be better, but the GPU requirements make it more difficult to use. According to Stability, good results can also be obtained from fewer images for LoRAs, as long as they cover all aspects and angles of the characters/objects.
Excellent follow-up video; appreciate the work you're doing, it's really helpful.
Do you have a rough ratio of training images (10 faces, 5 full body, etc.)? And do you get better results trimming to 512/768, or would sticking to cropping to the same ratio on bigger images improve quality?
I'm currently combining BLIP and WD14 to caption my images, but do I remove every tag that mentions any physical feature, or keep some core things so that it knows vaguely what's in the image?
The higher the resolution of the training data, the better: 768x768, 512x768, 768x1024, 1024x1024... but my GPU cannot handle higher resolutions. With buckets we can work with multiple resolutions. We could also crop to resolutions such as 400x600, etc., but it is better to stick to the SD model aspect ratios, which are 1:1, 2:3, etc.
Regarding the number of images: no, we don't have to include more faces than bodies. The more body shots we have, the better the model will learn body shots. Having more face images, such as cropped faces, will only help with inpainting, nothing else. There are some workarounds that I will try to upload later to improve full-body shots with clearer face details.
Ideally, we remove every tag that repeats in all images and that we don't want to change in the final output (such as an eye description), and keep those we want to change, for instance the clothing type/color, etc.
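As a rough illustration of that caption cleanup, here is a hedged sketch that strips tags assumed to repeat in every caption (the folder name and tag list are hypothetical; Kohya reads one .txt caption file per image):

```python
# Remove tags that repeat in every caption (e.g. "1girl") so the LoRA
# absorbs those traits instead of leaving them tied to a prompt tag.
from pathlib import Path

CAPTION_DIR = Path("img/40_oliviacastav03 woman")  # hypothetical training folder
TAGS_TO_DROP = {"1girl", "solo"}                   # tags assumed to repeat everywhere

for txt in CAPTION_DIR.glob("*.txt"):
    tags = [t.strip() for t in txt.read_text(encoding="utf-8").split(",")]
    kept = [t for t in tags if t and t not in TAGS_TO_DROP]
    txt.write_text(", ".join(kept), encoding="utf-8")
```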
Also, where can I read up more on network dimension, alpha, and convolution with LyCORIS? Is there useful reading that's not too esoteric?
This is a good resource: github.com/bmaltais/kohya_ss/wiki/LoRA-training-parameters
I think it is taken from a translation of the Japanese page at hoshikat.hatenablog.com/entry/2023/05/26/223229 (you can use translate to get it in English). I think it is a great resource, with many in-depth articles about the parameters/SD/LoRA.
@@AI-HowTo thanks... it's crazy how there isn't much in between the repo and the white papers. One is very light reading and the other impenetrable. Good job assimilating it for us.
@@AI-HowTo Still confused about one thing. Usually Dreambooth is without captions, whereas captions are for finetuning multiple concepts, and Dreambooth uses a trigger keyword to call up the character. But here you're saying that if the dataset is all one cohesive character, with or without captions, and without a trigger (which is the default use in the Dreambooth extension and the Kohya script), you still get the learned character no matter what? You're using Kohya Dreambooth LoRA, right, but the trigger is not necessary?
Usually trigger words are useful for multiple concepts. I also found them good for a character when using regularization, but not useful without regularization in most cases, so it is better to test with and without for each LoRA separately. A LoRA in general will affect the SD Woman class (which includes girl, 1girl, female, young woman) even without the trigger word, but with regularization I should have used the trigger word. The video was a test case, and more tests should have been included to give a clearer picture. I developed other LyCORIS for characters and found trigger words very useful for getting better resemblance with regularized datasets.
Thank you very much for the video! I have a question: how do I make a regularization folder?
You can use one of these I generated from SD, for instance huggingface.co/datasets/AIHowto/Chilloutmix_woman_regset1/blob/main/1340chilloutmix_class_woman.zip or huggingface.co/datasets/AIHowto/Chilloutmix_woman_regset1/blob/main/1832_reg_images_general_with_full_body_faces_upperbody_images.zip
Or generate from your A1111 SD: choose your target checkpoint, prompt "a woman", and generate a few hundred images. If your data has full-body shots, also generate a few hundred full-body shots; you can use a prompt such as (realistic woman, masterpiece, best quality, young woman, full body). Add a few hundred upper-body shots as well, so the set is mixed and resembles your training data in terms of variety (see the sketch below). Kohya will eventually only use (repeats × image count) from your class folder and ignore the rest.
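A minimal way to batch-generate such a class set, assuming the A1111 web UI is running locally with the --api flag (the output folder name, prompt, and counts are just examples):

```python
# Generate regularization/class images through the A1111 txt2img API.
import base64
import pathlib
import requests

OUT = pathlib.Path("reg/1_woman")  # Kohya class folder (hypothetical name)
OUT.mkdir(parents=True, exist_ok=True)

payload = {
    "prompt": "realistic woman, masterpiece, best quality, young woman, full body",
    "negative_prompt": "deformed, bad hands",
    "steps": 25,
    "width": 512,
    "height": 768,
    "batch_size": 4,
}

for batch in range(50):  # 50 batches x 4 images = 200 full-body class images
    r = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload, timeout=600)
    r.raise_for_status()
    for i, img_b64 in enumerate(r.json()["images"]):
        (OUT / f"full_body_{batch:03d}_{i}.png").write_bytes(base64.b64decode(img_b64))
```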
Hey hey, such great content. Can you create a LoRA training video for clothing, please? For example a set of jackets, or leather and cotton, and how to make LoRAs for those specifics; there is not much content on that.
will do
Hi, I have replicated your settings for LyCORIS exactly, but keep getting a CUDA error. I have a 16GB 4070 Ti Super, so I cannot imagine why it should be overloading, as I have read of people running Kohya for LyCORIS with much, much less power. Do you have a Patreon or some way of tipping? I would be willing to purchase 1-on-1 help.
I am sorry, not at the time being. You might want to check this video's settings, which might be of help: ua-cam.com/video/vA2v2IugK6w/v-deo.html; it has a more recent update on character training and even shows how to train on 1024x1024 images without running into CUDA errors. Usually this error happens when the image size is too large. I did training on my 8GB RTX 3070 laptop card without this issue, but that only worked for SD 1.5; for 1024x1024 images I get a CUDA error with SDXL. You just need to recheck your image dimensions once again to make sure all are proper.
Let me ask you a question about the prompt. I see you put the name of your character as Oliviacastav03, but when you type the prompt you don't use it; you just use 1girl.
I also train like you, but when I don't put the character name from the folder in the prompt, it doesn't seem to output the same image as the one I trained.
Is it mandatory to put the folder name in the prompt?
Oliviacastav03 was the instance name. The entire thing is experimental; you do what works for you. In my case we don't really need to repeat the instance name in the prompt, because I already had 1girl in the captions, so adding 1girl to the prompt emphasized the LoRA. Each common keyword from the LoRA captions will increase the LoRA effect if I don't remove it from the captions (I should have, but I didn't). Ideally, we should remove 1girl from the captions and put the instance name in the prompt instead, but to save time I kept the default captions, so 1girl was enough to affect the results. This could also be related to the alpha value: when it is high (which is not recommended), it gives a stronger effect in the prompt.
Hello! How do I name the reg image folder for clothing? For example, if I train a jean jacket, what name do I give the reg images?
I think there is no direct answer to this, because training in general is a bit messy and based more on trial and error than on clear scientific rules. One can name the reg image folder something like 1_woman, where the women wear various jackets; one can also use a reg folder name like 1_jean or n_jacket. Reg images will obscure some features and add new, simpler, richer features into the LoRA. This video may have some tips on object training: ua-cam.com/video/wJX4bBtDr9Y/v-deo.html
thanks a lot, the conclusion is very helpful
Glad it was helpful!
I seem to get different results depending on the reg images used; do you know anything about this? Are your reg images available for public download?
huggingface.co/datasets/AIHowto/Chilloutmix_woman_regset1/blob/main/1832_reg_images_general_with_full_body_faces_upperbody_images.zip
I used a subset of this.
In the previous video I think I used this one:
huggingface.co/datasets/AIHowto/Chilloutmix_woman_regset1/blob/main/1340chilloutmix_class_woman.zip
note: Kohya ss will eventually only use (repeats*image count) from your class folder and ignores the rest
@@AI-HowTo awesome. I seem to remember you use about 500 of them?
In the class folder in this video I put more than 1300 class images, but Kohya will eventually not use all of them, only (repeats × image count) as mentioned earlier. If we check the console, we can verify that through the number of images loaded plus your training images; the console actually says that not all reg images were loaded, but it doesn't show the details.
@@AI-HowTo ok got you
Best content on YT!!
Please can you talk about prompting 2 checkpoints or LoRAs into the same result? Latent Couple or other solutions...
thank you, glad you find it useful
Could you teach how one can do training using LyCORIS? Is LyCORIS included in Kohya ss? Do you teach how to do this in this video? I have not seen it, and I am not sure if I will find what I am looking to learn. Thanks.
Yes, you can check ua-cam.com/video/dUlki1IAB0w/v-deo.html. LyCORIS is also done in Kohya, as seen starting around 6:58 and 7:46 for training settings and 9:42 for usage of LyCORIS and installation of the extension. The prompting in this video is not accurate because I didn't use the trigger word, I think, which made the results less accurate. You can watch a more accurate video at ua-cam.com/video/vA2v2IugK6w/v-deo.html, which explains LoRA in SD 1.5 and SDXL and talks a bit about LyCORIS. Maybe in the future I will do a video for LyCORIS only, but the steps are the same, with just slight changes in settings.
@@AI-HowTo Oh nice, thank you, I will check that. The power of LoRA is incredible; I have been able to try some models downloaded from the internet, and I have been impressed. Based on your experience, could you tell me which one you prefer, LoRA or LyCORIS? With which one do I get a better result? And are the requirements to use LyCORIS similar to LoRA, or do I need a very powerful GPU? I used LoRA once, but out of curiosity I found out about this alternative. I would like to know from your experience whether it is better for me to continue using LoRA, or to try this. Thank you.
You are welcome. Both are good, and the GPU requirements are no different. LyCORIS in general seems to give slightly better results when complex patterns are present, and it also has a smaller file size. For my personal use I only develop LyCORIS, but if you want the model to be used by others, then LoRA is preferable. Without them, SD is really limited in what it can do.
Awesome video! I tested some of your LyCORIS settings, but I am struggling a bit with my training steps; I can't find a good formula. Any advice on how many steps?
There is no formula; everything from other people is just a suggestion, such as repeats = 1000 / image count, or some say 3000 steps is good. I tested some models with a larger number of images that took 15000+ steps to produce great results, though mainly regularized ones. It is purely experimental: just check the Kohya output, and once it starts overfitting or stops changing toward something desirable, stop the training and test in A1111.
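For reference, a sketch of how total steps fall out of those knobs (the numbers are hypothetical; Kohya prints the real figure at startup):

```python
# Rough total-step arithmetic for a Kohya LoRA run.
def total_steps(images: int, repeats: int, epochs: int,
                batch_size: int, regularized: bool) -> int:
    steps = images * repeats * epochs // batch_size
    return steps * 2 if regularized else steps  # reg images double the step count

# 20 images, repeats from the rule-of-thumb 1000 / image count, 3 epochs:
print(total_steps(images=20, repeats=1000 // 20, epochs=3,
                  batch_size=2, regularized=True))  # 3000
```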
Sorry if I missed it, but what kind of GPU do you have? I have a 3080 with 10GB of VRAM and I can train at 728x768 with different resolutions (not downsized!) no problem.
A 3070. Yes, it is possible to train even up to 1024 with the right settings; it just gets slower. Higher-resolution images are better for obtaining better and clearer results.
Hello, after I finish training the model, when I look at the pictures generated after each epoch, the person looks much older, like elderly. Why? The only differences I have in the parameters are the mixed precision (fp16) and the network rank and alpha, which are 16 and 8.
If you use a regularization set with old people, the LoRA tends to produce older people. You can also use "young woman" or "young girl" with your LoRA to help produce a younger version. You might also have overfitted. I can also suggest smoothing the images using Photoshop or Topaz AI software to remove any wrinkles if the trained person has them. You can also check this new video, an update to this older, inconclusive one, which will hopefully help you define how your regularization set should look: ua-cam.com/video/vA2v2IugK6w/v-deo.html
@@AI-HowTo I will definitely watch the video, but my regularization images should be fine (2000 were made with Realistic Vision v3 and 1000 with v4), and the images produced are visually young.
Thanks for doing this. One comment: I think the WD14 captioning is causing issues and not making the comparison to non-captions fair. Remember to exclude anything like woman/girl etc. from the captions. It should just learn those; we don't want to change those kinds of things at inference.
so it seems, thank you.
Hello. Really enjoyed it. Just wondering if you are thinking about doing the same with SDXL 1.0?
Will try to do so; my GPU is only 8GB, not enough for SDXL, so I will check alternatives.
@@AI-HowTo could do it with Runpod
yes will check it out
Hi! Thank you for this awesome tutorial! Do you maybe do a LORA training gig? Thank you!
will do one on learning objects which covers this topic, may also do gigs soon too
Also, one question: in your sample prompts you use 1girl instead of your LoRA trigger keyword. I'm confused. Can you explain this?
LoRA weights will be applied with or without the trigger word for single subjects, so all SD subjects will be overridden by my LoRA subject of a woman. Trigger words are more important when there are two or more subjects in the dataset.
Adding the trigger word to the prompt instead of 1girl or young woman, for instance, may give the same result or a stronger effect. In some cases I don't add the trigger word and the results are prettier; other times I only use the trigger word in After Detailer to make the automatic inpaint effect stronger. I usually use an XYZ plot to test the effect of using 1girl, young woman, the trigger word, etc., to see which combination gives the best results from the checkpoint.
@@AI-HowTo thanks for your answer. Just a question: when you run LoRA training samples, does the sampling run from the model only, or does it take the LoRA network into account? My samples in Kohya don't have any resemblance to my dataset, but when I generate in Auto1111 with the LoRA I get good likeness.
The sample is initially generated as noise from the SD model, then gets affected by the LoRA weights. If samples in Kohya don't resemble your subjects, most likely you need to add the trigger word to that sample caption, especially if you are using regularization, and use the same seed as in the seed field of the parameters section. When I use the same seed I get the same result in Kohya and A1111 for the same prompt.
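For example, a line in Kohya's sample-prompts file that pins the seed and includes the trigger word might look like this (the trigger word and values are hypothetical; --n is the negative prompt, --w/--h the size, --d the seed, --l the CFG scale, --s the steps):

```
oliviacastav03, 1girl, portrait, best quality --n deformed, bad hands --w 512 --h 768 --d 1234 --l 7 --s 25
```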
@@AI-HowTo i entered two sample captions, one with trigger word one without. Both samples did not resemble at all. I could see them following global tokens but not the resemblance to the character. Again, webui + lora gives very good resemblance. So not sure the trigger word works in the sampling. I started a thread for this on the BMaltais git discussions but haven't heard back yet
Great tutorial!! How many reg images per training image?
Thanks. Kohya only uses (training images × repeats) class images and ignores the rest. So for example, if you have 100 images and you are doing only 10 repeats, Kohya only needs 1000 class images; if the class folder has more images, it will ignore them, and if it has less, it will use class images twice. Regularization doubles the steps. Sometimes a LoRA or LyCORIS produces better results without regularization, which is why experimentation and comparison are necessary.
Wait so did having more body shots in the regularization set help with body results?
I am not confident that adding more body shots to the regularization set was the reason results improved; it could be a stochastic thing, but I suggest adding them anyway so the set resembles the training subject in terms of variety. My results on regularization were not conclusive, especially since in some cases results without regularization were actually better.
Did they change the entire environment? My Kohya_ss web UI is completely different; I can't follow any tutorial.
A1111 and the Kohya ss GUI change slightly with every release. I suggest doing a fresh install to get the most recent version, which is v21.8.2 at the time being.
@@AI-HowTo The problem is that they have completely changed the interface, and all the tutorials have become obsolete for me. Many options and tabs have been removed, and there are new things that I have no idea what they are for.
excellent and very informative approach. Already hit subscribe and like !
Glad to hear, hopefully the content contains some useful information.
So is LORA better than LYCORIS? 😊
Based on my tests I found both OK, but LyCORIS often produces slightly better results. In my case, I would actually always go for LyCORIS for character or style training.
Thank you so much! So informative and precise. This video and the previous one really helped me train my first person LoRA and get good results. I want to try training a LyCORIS now! I have just two questions, if you or someone else can help... 1) Is it normal that the LoRA doesn't work that well with the model I used as a base for training (but works quite well with other checkpoints)? 2) How do I choose a good (realistic) checkpoint to use as a base for training? I have used SD1.5 and RealisticVision2.0 - do you have any suggestions as to which other checkpoints could be good to use as a base for training? Thank you, and please keep up the great job - it really helps a lot of us!
Thank you. If it didn't work well on your training checkpoint but well on others, that could be accidental, since all are eventually based on 1.5, so this is possible. Realistic Vision v4 is now available, but I didn't get very aesthetic results from it. I tested Analog Madness and saw it giving good, pretty realistic results for non-Asian models; Photon also has some good results. I think all of these will soon be ditched in favor of SDXL. For perfect Asian-style models, I found Majicmix to be one of the best out there; it can be used to train non-Asians too, but they won't look very realistic.
@@AI-HowTo Thank you so much for your reply! I will try Analog Madness next! Please keep up the great job! Also... do you usually caption expressions, like smile/serious/sad/etc.?
Yes, you should, because some images are smiling and others are not. Only remove tags that repeat in all captions and that you want to be part of your LoRA. It is better to have different facial expressions in your training data and, most importantly, different face poses. In the Olivia examples, the dataset actually lacks different face angles; otherwise the results would have been a lot better. I only used this model because I didn't want to spend much time looking up images; her data was just out there and ready to use. I tried other models, and they gave more realistic results and better resemblance than this one, because they had more facial angles/features.
@@AI-HowTo Great, thank you so much - I will try to improve my tagging before re-training. I also wanted to tell you that, like you, I got mixed results with reg images; in all my training I am never sure if they improve things or not, so for the time being I'll try without. Thank you!
@16:58 this is likely the result of a typo in the Prompt S/R script.
I don't think so. I think it's because SD produces such stochastic results, and the full-body shots are not good enough, since the model didn't learn them well, due to the lower number of such images in the data and not resorting to other methods such as a detailer. More experimentation is required.
What version of Kohya is this? Mine looks a lot different!
I think v21.7; there is a newer version, 21.8, now, but I didn't install it yet. I don't think much changed, but I will check it later.
@AI-HowTo can you please provide yours or a way to download it? I'm not able to use xformers on the one I have.
Your GPU might not support it. Anyway, xformers only speeds up the training; it doesn't make the results better, and actually degrades the quality of the output slightly.
Unfortunately, I cannot upload Kohya myself; it is huge, more than 19GB. Just remove your old installation and start a new one. Sometimes the update doesn't work, so a fresh installation from the Kohya page is better; it will automatically install the newest pytorch and xformers versions if your card supports them. Just follow the instructions, or check my previous video about LoRA: ua-cam.com/video/clRYEpKQygc/v-deo.html
Hey man, nice video. I'll give it a try, because I'm losing my mind trying to train a face and it's a total failure haha.
By the way, would you mind sharing the training dataset of the girl used as an example?
Check ua-cam.com/video/clRYEpKQygc/v-deo.html; I explained how we got the data. I can't upload the data because the model's images don't belong to me; I only used them for this video and cannot use them in real life without the owner's permission. But I suggest you try different data, because the problem with this dataset, while it is pretty, is that it doesn't have enough face variations, which is not good: the final results are less flexible, and we should have the face from different angles too.
You're from France?
No sorry.
A trick:
use AWS Rekognition for facial comparison of your generated images against a real image.
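A minimal sketch of that check with boto3 (assumes AWS credentials are already configured; the file names are placeholders):

```python
# Compare a real reference photo against a generated image with AWS Rekognition.
import boto3

client = boto3.client("rekognition")

with open("real_face.jpg", "rb") as src, open("generated.png", "rb") as tgt:
    response = client.compare_faces(
        SourceImage={"Bytes": src.read()},
        TargetImage={"Bytes": tgt.read()},
        SimilarityThreshold=70,  # only return matches above 70% similarity
    )

for match in response["FaceMatches"]:
    print(f"similarity: {match['Similarity']:.1f}%")
```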
Thanks for the tip
Can a non-coder do that, please? I went to their site and it looks really complicated. Is there any website for facial comparison online?
Do you have an email I can reach you at?
sorry, not at the time being, not accepting business or personal inquiries outside this channel.
@@AI-HowTo No worries. Is there still a way to keep in touch with you?
I used regularization, but the results are not similar.
It is entirely experimental; different data/settings can produce different results. We should choose what works for our data and keep experimenting.
Listen: we generate with the LoRA trained with regularization, and upscale with the LoRA trained without regularization.
You mean upscaling in img2img with another LoRA without regularization? Recently I created better LoRAs where regularization conclusively showed it is better than going without, on different datasets, unlike the results in this video for this data, such as those here ua-cam.com/video/vA2v2IugK6w/v-deo.html and others that I didn't publish. But I often just use After Detailer to fix the generated faces at higher resolutions; I found it better and more practical than hires fix or other solutions.
huggingface.co/datasets/AIHowto/460RegImages1024x1024/resolve/main/reg_images_1024x1024jpg.zip
I removed some real images which might have been copyrighted. These images are from Freepik and SD-generated, so some of them are not so great, for instance some full-body shots with bad hands in a few pictures; it is better if class images have no deformations, though. The dataset should still include at least 350 good images.
@@AI-HowTo Thank you
@@AI-HowTo are there guys in there? Real photos?
Olivia Casta is a female model who uses a filter that gives her that face.
True, I didn't know that when I first created the LoRA; still, it's a valid set of images for testing.
Train one LoRA for the body, train another LoRA for the face, and merge the LoRAs.
Thanks for the tip. Merging gave acceptable results in some cases, but the best results for realistic people seem to come from using After Detailer with a LoRA (LyCORIS) trained on face/upper-body shots alone. I think if the GPU were good enough, training on 1024x1024 images would give perfect results even with body shots and a smaller number of images. With lower-res images, using After Detailer for automatic inpainting seems like the best option I found for the best details and resemblance.
so, LoRA or LyCORIS is better?
I saw LyCORIS giving slightly better results in general. Ideally, one should develop different models using the same dataset and see which one is better for their data.