The speed of change is breathtaking. By the time you have written (or demonstrated) something, it is out of date. It makes my brain melt trying to understand, let alone keep up. I am absolutely loving what's happening in the AI image development arena. There is almost certainly a moral maze in sampling work without permission, in passing AI-created work off as your own, and endless others, but right now I am seeing a torrent of images that are simply amazing. Pushing a button has never been so addictive.
Absolutely. It's developing really fast and also getting very unique with the tools. So the times of "this isn't art" are already past us
@@OlivioSarikas twitter will never accept that though, at least for now. When it'll finally dawn on them is when this crosses over to 3D and people start quickly making scenes in Unreal Engine. By that time, screaming "AI isn't art" would be like farting in the wind.
@@afrosymphony8207 twitter? do you mean artstation? i'm confused
@@OlivioSarikas wait, artstation doesn't allow it either? WTF!!?
twitter will fall in line sooner or later, and you guys better start learning, because in 10 years everyone will do AI. It's not about the artist or passion but the business of getting these for marketing companies and small businesses. Now imagine starting a business with almost nothing and getting close-to-excellent art for ads. It's huge for the marketing industry, and as a marketer I can tell you where the trend is going for art. peace guys 💚
Here are some tips for the new versions:
- you can now specify more than one value for the learning rate, so that you can slow down the learning at the specified steps: the lower the value, the slower the learning, refining the final result. Enter the values as comma-separated rate:step pairs, e.g.
0.05:200, 0.005:1000, 0.0005:10000, 0.0001:10000, 0.00005:20000
(see the Python sketch just below)
- keywords are now stored in a separate TXT instead of filenames
- it is also possible to request a sample based on individual text using the parameters and texts of the txt2img tab
- the sample image will be displayed when "Show image creation progress every N sampling steps. Set 0 to disable." is set above zero (e.g. to 5)
- I'm still experimenting with the "Unload VAE and CLIP from VRAM when training" option in Settings. When it's on, it seems to learn faster, but when it's off it somehow fits the environment better during generation. This needs to be set before training, but it's not clear yet when it's better.
- You can now train a Hypernetwork, which is specifically good for style blending; it is similar to Textual Inversion, but you have to train it with a lower learning rate, and it can be blended into any image
That's it for now, and thanks a lot for your video, because I'm still experimenting with vectors!
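To make that rate:step schedule format concrete, here is a rough Python sketch of how such a schedule string could be parsed and looked up per step. This is only an illustration of the idea, assuming comma-separated rate:step pairs; it is not AUTOMATIC1111's actual implementation:

```python
def parse_lr_schedule(schedule: str):
    """Parse a string like '0.05:200, 0.005:1000' into (rate, until_step) pairs."""
    pairs = []
    for chunk in schedule.split(","):
        rate, until_step = chunk.strip().split(":")
        pairs.append((float(rate), int(until_step)))
    return pairs


def lr_at_step(pairs, step: int) -> float:
    """Return the learning rate that applies at a given training step."""
    for rate, until_step in pairs:
        if step <= until_step:
            return rate
    return pairs[-1][0]  # past the last boundary, keep the final (smallest) rate


schedule = parse_lr_schedule(
    "0.05:200, 0.005:1000, 0.0005:10000, 0.0001:10000, 0.00005:20000"
)
print(lr_at_step(schedule, 150))    # 0.05  -> fast learning early on
print(lr_at_step(schedule, 15000))  # 5e-05 -> slow refinement later
```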
I somehow managed to do my first hypernetwork training perfectly, but after two hours I had things to do and stopped it with the idea to carry on later. Later that day, when I wanted to carry on the training, I couldn't for the life of me remember every setting value and kept derailing the training from my previous last checkpoint. Giving up after hours, I decided to just start over. Now I can barely get any subject trained with hypernetworks; it barely makes it past 300 to 500 steps without becoming corrupted, and at the steps before the corruption it doesn't seem like it did a very good job of studying the subject. So it's very strange that the training becomes corrupted before it ever even properly learns the subject in the dataset.
Not sure for the life of me what I'm doing wrong; the CLIP/VAE stuff certainly plays a huge part in that. Numerous tests with those settings on and off produce massively different results, with no clear lean one way or the other as to whether it's good or not. I've heard from some people playing with hypernetwork training that it's better to just switch CLIP/VAE off. But I do remember that my first training was the only time it went well, and that was the setting I didn't know to switch off, so it was on during that first very good training run. Damn odd...
But hypernetworks certainly seem to be the solution for training on lower VRAM, which is a big step in the right direction. This feature being very new, we should see some drastic changes in the coming weeks to it or its integration into SD. Reading about what hypernetworks are, there seems to be a lot of potential here, once refined, to upgrade various parts of SD.
@@kernsanders3973 I had mine stop working altogether, but I was playing with all sorts of things. Restarted the PC and it works fine now. No clue what I had done. Also, watching 4K videos while it's running will cause issues. Possibly throw another card in to do other things while it's going, or sacrifice a chicken to the AI lord before you start.
@@oldman5564 That was two months ago; hypernetwork training is much more fluid and accurate now, I feel. In fact, some of the best quick training out there. So far I've been successful with training subjects and styles. The hypernetworks have also been fine to use with other models, so sometimes I merge models and switch between different hypernetworks depending on what I'm trying to do.
I think the textual inversion embeddings were moved to the "Train" tab in WebUI
I don't have a "Textual Inversion" tab, but I have a "Train" tab, where "Create Embedding" is a subtab.
Thank you!!!
needed this comment indeed
Hey, just wanted to thank you for creating these videos. You're pretty much the sole reason that I'm diving into this stuff. Absolutely love it.
I’m just getting into stable diffusion and all these videos are great 👍
Thank you :)
THANK YOU.
Seriously, I'd been digging through how to actually do this for days, but no one had provided a clear and concise tutorial on the information, most ran off broken scripts, and the web variant had no clean documentation of what the heck you'd actually see. This has saved me oodles of time and I greatly appreciate it.
I installed a new version yesterday, came around to this tonight, and realized that it has evolved. Hypernetworks must be next. But I can't keep up with this speed of change. Well, that's just the way it is.
Thanx a lot for your tutorials and very instructive videos, which I have had good use for.
Now on to trying to make a few bucks, so I can afford the bills.
:)
Sarikas-San has reached diffused enlightement.
Hush, fellow students, and listen to his words of stability.
You're not from japan.
It looks like the Automatic 1111 GUI has been updated, so that this is now in several Tabs, but the way to do this is pretty much the same as in this video :)
Links from the Video:
Install Automatic 1111: ua-cam.com/video/Pyze0seDHzA/v-deo.html
Textual Inversion wiki: github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Textual-Inversion
Tokenizer: beta.openai.com/tokenizer
Stable Diffusion Conceptualizer: huggingface.co/spaces/sd-concepts-library/stable-diffusion-conceptualizer
Thanks so much!
Thanks, my training was not working at all, and with this video I found that we have to change "style" to "subject" in the template file path.
Just one thing: if you are training a subject and therefore using the "subject_filewords" template, then the prompt should be "a photo of my japbun" instead of "by japbun" (the latter is for the style template).
Yeah, this seems like it's totally backward. He's clearly using a subject, keeps showing subject_filewords.txt, but then uses style_filewords, and uses it as a style. I guess if that's what he wanted, then cool. [Maybe I should have watched the whole video, at the end it's clearly being used as a style with different animal prompts. So now it all makes sense.]
A little out of date even two weeks later since the tab is now called "Train" and has new sub-tabs including Hypernetwork. Regardless, I was able to start training knowing zero when I started. Thanks, Olivio!
This is what I've been waiting for, I was using cmdr2 UI but this was worth installing. My first style train ETA 1h20m using a 3090. Very good video!
What works better for a certain style, an embedding or training a model?
Hi Olivio, can you correct my understanding if it is wrong: when we train the model using textual inversion, is it understanding fully what this object is, so it can then be used in different contexts? E.g. if you said you wanted your character in a sitting pose instead, facing out at an epic landscape. Is that something that is possible to do given enough unique training data inputs? Or are you basically training in the exact character and pose position, with free rein to change only small parameters about the composition of the image?
Personally, with both the textual inversion method and the trained .ckpt model generation method, I have not had significant luck reproducing my detailed character in other positions / environments. My character is my profile image: a humanoid light being with a neurological light pattern running across his entire body. So I now have one .ckpt for front poses, and one for back... But the issue is that with any complexity in my prompts, my character starts becoming extremely distorted / downright removed from the picture altogether. I haven't found any good way to work the prompts to maintain character integrity while also having him do something unique / be in a unique environment.
How do you train a textual inversion for SD 1.5 vs. 2.1? Do you need to select the model? Not sure this was covered in the video. I couldn't find it.
I'm having some trouble with mine. Every time I go to train the embedding, it says finished at none steps. I'm not sure what I am doing wrong.
Awww hey, love those japbun images, they really made me laugh 😀 and the exquisite detail and quality is something to behold >> it would be good to see a final gallery of your favorites 👌💯👀🖼😎🌟
Hi there, for some reason in the most recent version of A1111 there's no option to do textual inversion training under the Train tab. There are options to create an embedding, create a hypernetwork, and preprocess the dataset, but no option to train. Am I doing something wrong? Any insight would be much appreciated!
Hi there, just one question: if I train it for a style, can I use more different images in the same style? It would be great to train it on my own art. Thank you!
Hi Olivio, I have downloaded the latest Automatic1111, unzipped it, and copied it over my existing Automatic1111 Stable Diffusion folder, but no luck, I don't see the Textual Inversion tab. Please can you help me and explain why this is happening? I have seen below in the comments that there are some people who also don't have this extra tab... Thank you!
My bad!! ...They have changed it already; the tabs are now as follows: || txt2img | img2img | Extras | PNG Info | Image Browser | Checkpoint Merger | Train | Settings || As you can see there are now 8 menu items, and Textual Inversion has changed into Train. They have split the new "Train" tab into 4 sections. I was a little confused at first, because I wanted to follow everything step by step, but then I realised that there was a difference. Hope this will help. There is also a new section under Train, the second tab, called "Create Hypernetwork". I think there will be a video soon from Olivio with an explanation of this! :D
Oh another question: now you can train with "Embedding" or "Hypernetwork" mode: which one would you recommend, and why?
Hi! First of all, thanks for making this video. Would it be possible to update it on how to train TI for concepts like lighting, photography styles, hair styles, etc.? Also, I have a 3060 12GB VRAM card, but most of the time it says it's out of memory when performing training.
Thank you for the detailed video! If some of my images in the "Input" folder are causing unwanted results, how do I remove them? Is it as simple as deleting them from both the Input and Output folders?
That's a great video, thank you! But if I want to train SD not for a style but for an object or face, I guess it's exactly the same process, but then later the name of the model is the name of the object I write in the prompt, regardless of style, right?
In this video the approach is to use the same identifier for the "initialization token" as the one you are naming your inversion with.
I've not seen that approach before, but I've heard differing thoughts about what you're supposed to do there.
Does anyone have any solid knowledge on this?
Is this also how you could train a model for your own face for prompts?
In the Stable Diffusion Conceptualizer I don't see which model each concept works with. How do I know? Thanks
Followed all steps; after clicking Train Embedding it processes for about 5 seconds, then says "Preprocessing finished" with no image and nothing in the inversion folders.
Thank you, I was really confused with this.
Hey Olivio, thank you very much for the videos! I did exactly as you teach, but something doesn't work for me. I don't have Textual Inversion. These are the options I have in the menu:
txt2img
img2img
Extras
PNG Info
History
Checkpoint Merger
Train
Settings
I even tried to make it on "Train", but in the end, I have these options:
Interrupt
Train Hypernetwork
Train Embedding
The images are not generated. Could you tell me what to do?
Go to Train and use "Create Embedding" for the textual inversion. The rest of the steps are the same; just the interface looks a little different.
I tried to download those bin files from the concept library. I renamed them and put them in embeddings. I tried them in prompts but can't get them to work. I'm trying to figure out what I'm missing. I used ", by xyz" and also just ", by", but it's not working.
Setting up git and python, pulling the latest repository and adding git pull to the batch file will keep you always at the latest version too. No more downloading zip files.
I'm trying to train in InvokeAI but am getting the error "trigger term must only contain alphanumeric characters". But there's no place to enter characters; so far it is just checkboxes in the command line interface.
Also, I think it can make sense to lower the learning rate over time. Otherwise, you quickly reach a point of diminishing returns.
Interesting. I just saw that they made that possible in the updated version
@@OlivioSarikas Really, did they? I always did that manually^^
@@perschistence2651 Have a look at the comment by Mike Menders. He wrote: "you can now specify more than one value for the learning rate, so that you can slow down the learning at the specified steps: the lower the value, the slower the learning, refining the final result. Enter the values as comma-separated rate:step pairs, e.g.
0.05:200, 0.005:1000, 0.0005:10000, 0.0001:10000, 0.00005:20000"
@@OlivioSarikas Wow, nice!
Very interesting. But I got the latest version from the master branch, and the textual inversion still doesn't appear for me. Maybe I need to do some configuration, or they moved this update to another branch, or whatever, I don't know.
One question, sir: if we use BLIP, do we need to rename each image afterwards using the tag we got in the text file? Or will it automatically be read by Stable Diffusion when said data is used in training?
Hm, doesn't work for me... it does prepare the dataset and then says "Training finished at 0 steps."
Do you train or have a class for new students?
I just wanna know WHAT folder to put my "Textual Inversion" files in; I'm using Easy Diffusion. Thanks
Awesome! Thank you very much!
Just one question: when I train my images, will they stay local or will they be available to the public? For example, if I train it with the images I shot of friends and family, will it remain private or will it be in the public domain?
Just on your machine, though you can send the embedding files to people.
@@KadayiPolokov thanks
A like without even watching, as always.
So I trained a model, but I can't get it to show up at all in the text to image :/
I do have the latest Stable Diffusion, I just installed it today, it is up to date, and I don't see any Textual Inversion tab.
You can add "git pull to" the webui-user file and the Automatic 1111 will update automatically when it is run
That's a great idea, assuming you have git installed. It's just `git pull`, though. Not `git pull to`.
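If you'd rather script the update step than edit the batch file by hand, here is a minimal sketch of the same idea in Python, assuming the webui folder was cloned with git (the path below is just a placeholder):

```python
import subprocess
from pathlib import Path

# Placeholder path -- point this at your own stable-diffusion-webui checkout.
WEBUI_DIR = Path(r"C:\stable-diffusion-webui")

# Equivalent of running `git pull` inside the webui folder before launching it.
subprocess.run(["git", "-C", str(WEBUI_DIR), "pull"], check=True)
```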
Excellent video! Does it work for faces?
8:28 if u wanted to train for a subject, then why'd u choose style_filename?
Great explanation! I wonder if automatic1111 could also run with OpenVINO. I don't care about the training time, I just want to test before I buy a massive GPU.
I updated my webui like you did in the video, but I still don't have the "textual inversion" tab; it's still a "Train" tab.
Same here. A tad confusing, any help?
"Train" tab is the latest version, "Textual Inversion" is the outdated one.
2:47- 4:42 is in "Create Embedding" subtab
4:42 - 7:25 is in "Preprocess Images" subtab
7:25 onwards is in the "Train" subtab. Ignore the "Create Hypernetwork" and "Hypernetwork" selects, as it's a completely different thing (textual inversion kind of adds new objects or styles; a hypernetwork modifies the entire generation model).
If you run your Stable Diffusion with the --medvram or --lowvram parameters, the entire thing probably won't work.
Also, any of this can change by the time you read this, as that repository tends to add something every day.
Even after your explanation I still don't know what textual inversion is Q.Q is it just training the AI? or is there more to it?
Basically, what it does is train Stable Diffusion to learn a certain style, object, person or animal that you want it to be able to do. And then you can use the prompt to create all kinds of images with that.
Ilya Kuvshinov is the artist referenced at 1:40, please consider purchasing their amazing art books, MOMENTARY and ETERNAL
Hi. What's your opinion on Artroom AI?
Can you share embeddings? If you create one of a character and save it on your PC can that same embedding file be transferred to another? If so, where do I find them?
He mentions early in the video that if you don't have an "embeddings" folder in your SD folder, then you have an outdated version. That's the folder where the embeddings are saved in .pt format.
I still don't have the tab after updating
This won't work with a 4GB NVIDIA card, correct? I have Automatic1111 running on my card, but this feature won't function, right?
I have 6GB.. it works?
What's the difference between Textual Inversion, LoRA, and a Checkpoint?
Basically, your checkpoint is your main model; you need one to generate anything. Embeddings (trained through textual inversion) use up tokens in your prompt to change the style of your output. LoRAs and hypernetworks do not take any tokens, but (in different ways which I don't really understand or care about) affect the model itself during generation to change the style of the output. Embeddings don't increase the generation time of an image, nor use more memory (afaik), but LoRAs and hypernetworks do both.
LoRAs are awesome because you can use them to mix many different styles (with varying levels of strength) to get very unique outputs, all without taking up tokens in your prompt. That's the main thing you've got to know, unless you really want to get into the technical side of things, but that isn't necessary for just working with the tools.
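As a rough illustration of that difference (a toy NumPy sketch, not any library's actual API): an embedding adds a learned vector that sits in the prompt as an extra token, while a LoRA-style approach leaves the prompt alone and nudges the frozen weights with a low-rank delta whose strength you can dial up or down:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_out, rank = 8, 8, 2

# Frozen weight matrix standing in for part of the base checkpoint.
W = rng.normal(size=(d_out, d_model))

# Textual inversion: learn one new token vector; the model weights stay untouched.
learned_token = rng.normal(size=(d_model,))
prompt_tokens = [rng.normal(size=(d_model,)), learned_token]  # "a photo of <my-token>"
embedding_out = [W @ t for t in prompt_tokens]                # base weights, extra token

# LoRA-style idea: no extra token, but the weights get a low-rank delta instead.
A = rng.normal(size=(d_out, rank)) * 0.01
B = rng.normal(size=(rank, d_model))
strength = 0.8                      # blend strength you can vary per generation
W_lora = W + strength * (A @ B)     # perturbed copy of the base weights
lora_out = [W_lora @ t for t in prompt_tokens[:1]]

print(embedding_out[1].shape, lora_out[0].shape)  # both produce normal activations
```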
Really helpful, thanks ! 🙂
Instead of Textual Inversion I have the Train tab... can you help me?
How do I use the flags --no-half, --precision full?
Is it possible to do this for anime hands, or even feet?
I don't understand what I'm doing wrong. I followed the tutorial exactly, but when training I'm noticing the previews look nothing like the images I'm using to train. It seems like it's just taking the BLIP caption and outputting a random picture with that prompt; none of the pictures look like the person in the photos. Could someone help please?
Did you check "Use cross attention optimizations while training" by any chance? If so, disable it. It uses more VRAM (almost 10 GB) if disabled, but if enabled it simply doesn't work.
@@jonathaningram8157 I did have it enabled; I turned it off but the results are still about the same.
Excellent
8:28 wait wait wait so you use subject or style as a template for faces?
The video is unfortunately out of date; the interface is new now. Can you update it?)
Why is my model getting worse after 20000 training steps? I have trained a figure with 12 photos; the model was getting better and better, and then after 20000 it trains wrong?
How can I make an X/Y plot testing different embeddings?
Can I install SD on my iMac?
I watched the start and, damn it, you have so many things to learn:
- middle-click or ctrl + click to open a link in a new tab with one click
- shift + right-click in a folder to see the "open console" option (if you don't have Git Bash)
- "git pull" in an opened console, in your stable diffusion webui folder, to update your current stable diffusion webui install (if you installed it with git, as explained in the installation documentation) (it will not work anymore since you modified the files by updating manually)
- shift + right-click on a folder or a file to see "copy as path"
Great. Thanks a lot
So I'm trying to train an embedding right now on an RTX 3060 with 12 GB VRAM. CUDA always runs out of memory. I've tried setting the max split size down and using the low/med VRAM option... is there anything I can do?
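For reference, the "max split size" tweak mentioned here is usually set through PyTorch's allocator environment variable. A minimal sketch of setting it before the webui (and torch) start in the same process; the value 128 is just an example, not a recommendation:

```python
import os

# PyTorch reads this when CUDA is first initialized, so it has to be set
# before torch / the webui are imported or launched from this process.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
```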
Okay, I've installed Automatic1111 on Windows and now it works. On Linux, somehow, it doesn't...
Will this work for training people and faces?
Are animations possible yet?
20000 steps is wild lol....I can't imagine ever doing 40000, I'd be there until Christmas 2025
Hi great tutorial thank you! I have a question - could you take a style generated in this way and use it in an img2img prompt?
yes, that works too :) I tried it with faces
It should work fine
Does doing this training generate .ckpt files? How could I use my trained model in a different environment, say if I wanted to run it on RunPod after creating it locally?
No, it creates a .pt file. I think you can use it with other Stable Diffusion installs if they support textual inversion.
Please make an updated video for the new version.
Basically someone who wants a "personal" style could keep retraining it with their generated (and retouched) images to make something really unique and consistent.
Yes, i was thinking that too. Over time you could get something that is totally yours and nobody else could copy it
thats a cool idea
This is great, thanks! One question: What model should I use? If you use different models for training with the same input, do I get different results?
I installed Stable Diffusion locally yesterday, but it doesn't quite look like this, so I'm a little confused. Is there an official SD that I managed to miss, perhaps? I think I might have installed a slightly different UI that still works on SD, but I would like to work with the original version that I keep seeing in all these YouTube videos. Any input would be appreciated here.
The “textual inversion” tab has been updated to say “train”. They’ve added a new type of training called Hypernetworks, so there’s some extra stuff, and the layout changed 2 or 3 days ago, but if you can find the fields he’s talking about under “train” you should be good
Mine is dark mode
How do you think training embeddings compare with hypernetworks?
I haven't tried hypernetworks yet, so i really can't tell
What if I train SD textual inversion on 512x768 images?
Do you know where I can get information on CUDA out of memory errors? Does that mean I have a graphics card that doesn't have the horsepower to do Textual Inversion?
I'm having this problem too. I'm using a 3070 and keep getting a CUDA memory error. But I see people using a 3060, etc., so I'm wondering what I'm doing wrong! Can anyone help?
@@mikelatter9484 I'm using 3070 as well and faced the same issue. Did you manage to resolve it?
I've already updated everything, but I still don't have that textual inversion tab.
I think he's using an old build. That tab was replaced with "Train" a week or two back and now includes Hypernetwork.
I may be missing something, but what is leica?
Leica is a camera brand. I put it in to have a word that wouldn't change the output of SD too much.
This guy is like if Notch had gotten the good ending.
bruuuuh
If you don't see the new embedding that you created, check the console for this message:
`The file may be malicious, so the program is not going to read it.
You can skip this check with --disable-safe-unpickle commandline argument.`
Is anyone else having the problem that clicking Train Embedding does not work? No error; it starts initializing and then just stops after half a second.
I have done all the steps; just this last thing does nothing.
I had the same problem. Turns out I hadn't preprocessed my images, which didn't work initially because I used a different technique to copy my folder locations that enclosed them in " marks. Once I removed those, preprocessing worked, and once that worked, my training started as expected.
What's the advantage to using Midjourney as opposed to Automatic 1111? I know Midjourney is cloud based and Automatic 1111 is installed locally, thus using your own GPU. However, is Midjourney superior to Automatic 1111 other than that fact?
Midjourney is easy to use and has a great model; there's no need for a good PC.
Stable Diffusion is entirely customisable, and there are no rules, no one is looking over your shoulder, everyone is collaborating and working together to create features, models, scripts, whatever really, and sharing it with the community. It's fantastic.
No. With Auto1111 you get more control over what kind of image you get. Therefore it's more technical, with a larger learning curve than Midjourney. Midjourney uses a learning algorithm that prioritizes certain styles and settings based on what other users like.
With Auto1111, you need to give a lot more detailed prompts, and adjust settings.
So for example, if you put just "Man" in prompt of Auto1111, it will shoot out a simplistic image of a man, with no styles.
But if you do the same for Midjourney, the algorithm will choose a random style for you, typically styles that are popular.
Neither one is superior to the other. I prefer Auto1111 because I don't like the handholding from Midjourney.
Plus I think auto1111 is better because if you have the technical know-how, you can create something totally unique from the rest of Midjourney, like animated images with Deforum and entire music videos
@@user-pc7ef5sb6x Thanks for the explanation! That makes a lot of sense.
@@Muzick here, this is the kind of stuff you can do with Auto1111. Text prompt to video is mind blowing ua-cam.com/video/R52hxnpNews/v-deo.html
is a 3060 up to the task of training?
I'm doing it right now on my 3060! 512x512px, 16 images, default speed of 0.0005, and it's running at about 7000 steps/hour.
@@ModestJoke Thanks! I gave it a go and it worked great. Yes, it was about 7000 steps per hour.
I think AI help is the solution for any artist who suffers from artist's block.
Thanks
You are welcome :)
Also, you can mix unrelated pictures, like when I was pushing buttons and pulling levers with no clue what I was doing, and make some crazy nightmare fuel. 🤣🤣
Did the original artists upload their art there to be used for this?
I created all the images with Midjourney and Stable Diffusion
@@OlivioSarikas What I meant by it: did the original artists upload their images to this site as training data for this thing? What I see here as a style uses images from Sakimi Chan.
@@NotJustCreative There is no need for that. It's done automatically. Web scraping is legal under US law and done by many internet companies. How do you think Google search works?
Has anyone been able to successfully train using Textual Inversion on an M1 Mac?
Dreambooth has killed off TI as well as hypernetworks, due to its quality and sheer speed.
I have to try Dreambooth next, but i read that it creates massive files
@@OlivioSarikas Well, TI takes 4h, DB 15m with better quality. They can be trimmed, so everyone has jumped to them. Tech is moving so fast DB will be obsolete tomorrow, lol.
@@OlivioSarikas Half precision is 2 GB, as it creates a complete model. Also, if you're creating a model of a person, don't bother using prior preservation. It means you don't need any "class images"... you only need training pictures (your pictures for training you, etc.). Saves even more time.
Holy sh*t
Hanfu (Han Chinese) is traditional Chinese clothing; traditional Japanese clothing is called a kimono.
You never explained what textual inversion is. Just how to install it, not how it works
0:45
Nooo, it is not easy. It is easy to try it, but not easy to achieve it.