I hope they will keep improving this model. I love the concept. I am also impressed with my results.
What it does is amazing, but the actual image generation side of it is like 2 generations behind current models :(
4:10 The 6 fingers curse strikes again
Love that shirt!
Can you tell me how to darken the theme of this page? I couldn't find the setting
Is this model censored?
@@mr_pip_ asking the real questions
Well, it does look like you 😅 We don't see ourselves the way others do. The same happens when we hear a recording of our own voice; it sounds nothing like us. But it does 😊 Also, I found that AI does way better face swaps with females than with males. Or is it just me... Have fun either way 😅 😊😊😊 Thank you for all the content.
Women are biologically more neotenous, meaning closer to our childlike state, and possibly for that reason deviate less from the average. Another possibility is that women in general are more similar, however negative that may sound, because in terms of evolution their main focus or job has often been raising children, whereas men through the ages had many more different types of tasks they would do, possibly leading to greater genetic variation and more directions they evolved towards. They say the same thing about IQ: the dumbest people are typically men and the smartest are men, while women on average are more distributed in, well... the average area.
So, long story short: physically, women may look more alike, which possibly leads to better results. However, everything I say is of course basically pure speculation based on logic and biology, nothing set in stone. It just made me wonder!
Installation with Pinokio was successful. How can I activate the Dark Mode? It's not as easy as you describe in the video. On my first attempt with the default images with the blue flower and the vases, I aborted the rendering after 45 minutes of waiting. The indicator showing that OmniGen was calculating kept rotating and the seconds counter kept running. Only when I switched to the terminal did I see that the reloading of a model had stalled. I stopped and restarted OmniGen and opened the terminal first; a model that had not been found was reloaded, so of course no calculations could have taken place before. With a very small photo of my own, I then got a nice result with my own prompt after a few seconds. I dramatized the picture of an old, dilapidated barn a bit: the old barn is ablaze, flames are bursting out of the wall openings, and thick, black smoke is rising into the blue sky. With a guidance scale value around 2 or lower, not much happened. At 4, flames burst out of the building and thick smoke rose. Very exciting! I would like you to be more practical in your videos and also describe the problems as I experienced them. I only found out where the saved images were stored by using the "Everything" program and searching for images in WebP format.
Unfortunately, each image is saved in a new folder under the same file name. If you want to copy or move all the images into one folder, this is very cumbersome and time-consuming in Explorer. It worked better with the Everything overview, in which I selected all the pictures, copied them, and pasted them into a folder of my choice. During that operation you can instruct Windows to add numbers to the images with the same name. A small script like the one sketched after this comment can automate the whole thing.
In the output window, I also discovered the gray progress indicator, which provides information about the status of the calculation.
However, the gray area reaching the right edge of the output window is not a sign that the calculation of the image is finished. I have no idea what it actually means.
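For anyone with the same scattered-folders problem, here is a minimal, standard-library-only Python sketch that collects the .webp files into one folder and numbers them so identical names don't collide. Both paths are placeholders; point OUTPUT_ROOT at wherever Pinokio actually stores OmniGen's outputs on your machine.

```python
# Gather OmniGen's .webp outputs, scattered across same-named
# subfolders, into a single folder. Standard library only.
import shutil
from pathlib import Path

OUTPUT_ROOT = Path(r"C:\pinokio\api\omnigen")     # placeholder: real Pinokio output location
DEST = Path(r"C:\Users\me\Pictures\omnigen_all")  # placeholder destination
DEST.mkdir(parents=True, exist_ok=True)

# Walk every subfolder, copy each .webp out, and number the copies
# so identically named images don't overwrite each other.
for i, src in enumerate(sorted(OUTPUT_ROOT.rglob("*.webp"))):
    shutil.copy2(src, DEST / f"{src.stem}_{i:04d}{src.suffix}")
```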
Your t-shirt here lol 😎
I tried several ComfyUI nodes and they are very buggy. I can't get one to work yet.
Cool. I thought it was just a normal text to image and nothing else.
Neat.
Incredible!
I've followed all the steps, but every attempt finishes with an "ENOENT: no such file or directory, stat" error, and the site-packages folder in the env is almost empty (trying to install the dependencies manually doesn't work either).
I’d like to be able to use it in ComfyUI. But even after installing what I believe to be the correct nodes and refreshing, it’s saying the nodes are missing.
The model is extremely impressive when you consider what it's doing under the hood: this level of linguistic visual understanding -- understanding not only the image, but also how linguistic descriptions would visually change the image -- is bleeding edge, even if the resulting images aren't "perfect photorealistic commercial quality". The intelligence it has is more interesting than the results it produces, to me.
Unfortunately, I only have 8GB of VRAM, so according to the OmniGen docs, I'd have to use every optimization (including CPU offloading; see the sketch after this comment) and it would take over 8 minutes per prompt to run. So... I've just been using the HuggingFace demo, which hits the resource request quota after only one or two generations per day 😅
Oh, to be rich and afford the best GPUs...
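For reference, here is a minimal Python sketch of what that pipeline call might look like, loosely following the OmniGen repo's README. The `offload_model` kwarg mirrors the "offload model" toggle mentioned elsewhere in this thread, but treat the exact parameter names as assumptions and verify them against the current OmniGen docs.

```python
# Minimal OmniGen sketch, loosely following the repo README's
# pipeline API. offload_model trades speed for lower VRAM use;
# verify the exact kwargs against the current OmniGen docs.
from OmniGen import OmniGenPipeline

pipe = OmniGenPipeline.from_pretrained("Shitao/OmniGen-v1")

images = pipe(
    prompt="An old barn ablaze, thick black smoke rising into a blue sky",
    height=512,
    width=512,
    guidance_scale=4.0,  # ~2 reportedly did little; 4 produced flames (see comments above)
    offload_model=True,  # CPU offloading for low-VRAM cards
    seed=0,
)
images[0].save("barn.png")
```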
I feel the pain. Currently saving 2K+ $$ for just the 5090. That will probably cost more than ALL the other parts for the new PC >.< I've been using a 1070Ti this whole time for local install of Stable Diffusion heh
Very cool ❤
Hey can you use a drawing and a photo to make a new image?
Where are the models or safetensors stored for OmniGen? It appears that when using ComfyUI it downloads them each time. Please advise.
In OmniGen that is all handled inside the single model. There are no extra models you need; that's the advantage.
I tried this and saw some potential for face swapping and consistent characters. It worked okay-ish for one instance, but as soon as I tried two, it only gave me out-of-memory errors with 12GB VRAM at any resolution, even with offload model selected.
I need to try it with lower-res versions of the faces (they were likely too high), but if that fails it's a no from me. PuLID is also better for consistent characters IMO.
Yeah, I (at only 8GB of VRAM myself) definitely can't wait for the community to optimize it further. That said, while there are other models that may be better for one task or another, the impressive bit is that this works (given enough VRAM) for *any* visual task all at once, by just describing the task. The GitHub repo even has everything in place, with documentation, for how to fine-tune it to add new types of tasks by example, so it can do pretty much *anything* you want... with enough resources.
Does Pinokio run as if it's Docker, so it won't change anything in my Linux settings?
Is it uncensored?
What would you like to do with it?
@centerdepenter adult fun stuff
@@centerdepenter NSFW
I'm trying to run it on a MacBook with an M2 and 32GB of RAM and it doesn't work. Is there any configuration I could apply to make it work?
It's not working on my computer. I let it work overnight and it was still grinding. I gave up and tried something straightforward, like adding a girl holding a pumpkin. 17,000 minutes and still nothing.
Hello and thank you for this video. What are the recommended hardware resources?
Tried it with 8GB VRAM - no chance. Have to wait for a slimmer, optimized model.
@@MikevomMars Would 32GB work? I thought these things relied mostly on the GPU?
What about Forge?
all I get are errors
I've used many, many different open-source tools and models, and Pinokio is the one tool I've never been able to get to work. For some reason it consistently fails to correctly set up the requirements, regardless of the tool it is trying to install. The tools work no problem when I install them independently outside of Pinokio by setting up a venv.
Can this be used to easily change existing clothing?
Kind of. Yes, it can change it, but it will not look exactly like the clothing you give as an example. For now this is more of an early-access version with lots of room to improve.
Unfortunately, it does not work with 8GB VRAM ...yet 😐
Yep, it is a great idea but only for high end GPUs.
how much memory do you need?
@@FuryisBACKHD I am curious about that too
@@FuryisBACKHD @PowerRedBullTypology I'm on 12GB VRAM and it manages a single image with a prompt, using the first of the options that come with the Pinokio web UI. Anything more gives me out-of-memory errors, even with the offload model option selected. I haven't tried it in ComfyUI to see if it has better memory management there.
I got it working with 8GB VRAM and 64GB RAM, but quite slow
OMG Olivio has a girlfriend!!! 😊
Hi, I have a 10-year-old PC, but I wanna get into this. Is it possible?
If you have a modern GPU then the rest of the system doesn't matter as much. Otherwise, I'd say no.
@@Nelwyn I get a CPU bottleneck when playing games; won't that matter here?
Processor: Intel(R) Core(TM) i5-4670K CPU @ 3.40GHz
Installed RAM: 16.0 GB
GPU: GeForce GTX 1050 Ti
No, because your PC is too old. You can't possibly have a modern GPU or CPU in it, and VRAM and RAM matter as well.
A GeForce GTX 1050 is decent, but generating even one image would take time, especially at higher resolutions or with LoRAs, upscaling, ControlNet, etc.
Wait for a better program; skip this one.
If it could take LoRAs, it would be amazing.
I tried it on Pinokio, but even with an RTX 4070 I kept waiting longer than ever and nothing was generating.
Try with a smaller image
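If oversized inputs are the culprit, a quick Pillow sketch for downscaling an image before feeding it in (file names are placeholders):

```python
# Downscale an input image before handing it to OmniGen; very large
# inputs are a common cause of slow or seemingly stalled generations.
from PIL import Image

img = Image.open("face.png")  # placeholder input file
img.thumbnail((512, 512))     # caps the longest side at 512 px, keeps aspect ratio
img.save("face_small.png")
```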
Same here; I installed it with Pinokio and even tried their examples.
My machine has no problems running the high-end Flux model with ComfyUI or Forge.
Useless tool
Can't wait to see what the AI community comes up with in 2025
Video Pony. Wait for it: prompt video-to-video using a PH source, and before you know it... another explosion in data center energy needs and yet more orders for A200s.
interesting, thanks!
This is becoming absolutely nuts... it's almost magic
If I ask for a woman from image one, I get multiple people all the time. Not sure what's up with that.
For one 512x512 image I get about 9.90 s per iteration, but when I add a picture to it, it's already at 36.80 s per iteration.
I'm running a 3070... anyone else seeing such slow performance with OmniGen?
what if my computer had only 6GB of VRAM?
Not sure, but that sounds like you need to upgrade your GPU or rent an online server with a good GPU.
It's text-to-image and image-to-image and an LLM and SAM all in one... Seems like a nice concept, but I think it's too early as a concept. Nice t-shirt though.
True, but they've got to start somewhere.
Photoshop is done. Unless...
To be honest, I'm not impressed. I also miss an inpainting function with a mask tool. My first choice for post-processing images is still Fooocus. But thank you for the video.
lion with a dog tongue 🤣🤣
🙂👌
I am not convinced about this type of model. It seems super cool to have an LLM as the text encoder, but on the other hand, all the features combined in one model seem to make it very heavy and not very friendly for specializing or fine-tuning.
It is a good proof of concept, though.
It's definitely bulky, but in terms of fine-tuning, the GitHub repo already has the code and documentation in place for how to fine-tune it on any visual task you want. All it requires are examples of input-instruction-result sets, in the form of images and JSON lines that reference those images.
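As a rough illustration of what one of those training records could look like, here is a hypothetical Python sketch; the field names and the `<img><|image_1|></img>` placeholder are assumptions, so check the repo's fine-tuning docs for the actual schema.

```python
# Hypothetical sketch of one input-instruction-result training record
# written as a JSON line. Field names are illustrative placeholders;
# consult OmniGen's fine-tuning docs for the real schema.
import json

record = {
    "instruction": "Replace the dog in <img><|image_1|></img> with a cat.",
    "input_images": ["data/inputs/park_dog.png"],
    "output_image": "data/targets/park_cat.png",
}

with open("train.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record) + "\n")
```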
Technically impressive, but still not usable. My experience: the more you inpaint, the worse the results get. If a model can't get it right the first time, it's best to either do it manually or create a heap of images until you get something acceptable.
Aaargh not again the dreaded ENOENT with Pinokio
have you managed to find a fix?
OmniGen is more of a fun toy. It's more one of those party-trick AI tools than anything useful yet.
Should I flag this video as rated+, so kids don't mistakenly watch it?
(Reason: Cookie Monster is teasing its food)
🤣
Well, to be fair, you said to remove the dog, not the leash :D
true :)
Someone make an easier tool to use; this one installs all kinds of stuff into our system.
We need Nvidia to release the 5090 soon.
Nvidia - take my cash!
Honestly it's pretty bad. Maybe it will be better in the future, but it butchers most of the outputs and the gen time is terrible.
Yep, not perfect at all. This has a ways to go. But the potential it shows is pretty amazing nonetheless. Can't wait to see where this is 6 months from now.
useless model for production
But for everything else is pretty great ❤
For production?
@@BlazarAzul Production: movie or photography production. Commercial-level software that can produce something worth charging for. And no, it's not ready for production. But it's pretty freaking cool for one of the first tools to ever accomplish something like this. We are living in the future.
@@yoanhg421 Oh, okay. Excuse my naivety, I'm new to this, but what software is considered most suitable for film and photography?
You pay for professional tools. This is a free tool (basically an early tech demo).