The Open Source KING is BACK. Stability's NEW AI Image Generator!

MattVidPro AI

Published 9 Sep 2024

COMMENTS • 262

@matsleonrichter5305 6 months ago ⁺⁴⁸
Thanks for covering our work, thrilled to see how our research gets adopted this way. Also, I still find it hilarious that "Würstchen" stuck as the name of our architecture. Sorry in advance to all non-German speakers who break their tongues trying to pronounce it.
@paulmuriithi9195 6 months ago
Wow, never knew a tongue could be broken... must be a bony tongue
@Athari-P 6 months ago
I'll just call it Worse Ten.
@CryptoTonight9393 6 months ago ⁺²
Small sausage?
@KurtWoloch 6 months ago ⁺¹
@CryptoTonight9393 Yeah, small sausage... that's the translation. It's actually hard to find an English sequence of characters that sounds remotely like "Würstchen"... the "ü" is an umlaut of "u", which isn't used in English, and the "ch" is a single phoneme as well... imagine "k" but spoken softly... in a way, "ch" is to "k" what "f" is to "p".
@RideShareRocks 6 months ago ⁺¹
The biggest problem I have is duplicating a face I've created in different poses. It's infuriating.
@MattVidPro 6 months ago ⁺²²
King's back.
@PigeonyStudios 6 months ago ⁺¹
Emperor Pigeon is back (me)
@The_Questionaut 6 months ago
@SW-fh7he it's subjective
@hipjoeroflmto4764 6 months ago ⁺¹
@SW-fh7he stop boofing monster energy
@SW-fh7he 6 months ago
@hipjoeroflmto4764 what do you mean?
@CrystalBreakfast 6 months ago
The king never left. 😅
@Lar_me 6 months ago ⁺³⁷
Trying to wrap my head around how it can get a 1024x1024 image from 24x24 o_o
I really REALLY want to see Stability's models pull ahead of the competition soon! I hope the (supposedly) easier training times can allow Stable Cascade to reach Midjourney's level of detail somehow.
@pon1 6 months ago ⁺⁸
It probably can. This is only the base model and it is very general, so it can probably do a lot better than SDXL when fine-tuned, and SDXL can achieve Midjourney-level detail in some circumstances (like in Fooocus using certain styles and settings).
@jeffwads 6 months ago
Reminds one of the quants.
@kuromiLayfe 6 months ago ⁺⁴
Just wait till they figure out how to encode the image in subpixels 😂
1024x1024 encoded to 0.2 x 0.2 pixels
@jonmichaelgalindo 6 months ago ⁺³
@kuromiLayfe You can actually escape the pigeonhole limit by just setting the font size to 0.
@Kylo27 6 months ago ⁺¹
Lol
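For a rough sense of what "1024x1024 from 24x24" means, here is a small back-of-the-envelope sketch comparing latent grid sizes. It assumes only two figures: the 24x24 latent quoted in this thread for Stable Cascade, and Stable Diffusion's VAE downsampling of 8 per side. It counts spatial positions only and ignores how many channels each latent position carries, so it is not a measure of total information.

```python
import math

# Assumed figures (not measured here):
IMAGE_SIDE = 1024      # output resolution per side, in pixels
CASCADE_LATENT = 24    # Stable Cascade latent side quoted in the thread (24x24)
SD_FACTOR = 8          # Stable Diffusion VAE downsampling per side


def compression_per_side(image_side: int, latent_side: int) -> float:
    """Pixels per side that map onto one latent position per side."""
    return image_side / latent_side


cascade_factor = compression_per_side(IMAGE_SIDE, CASCADE_LATENT)  # ~42.67
sd_latent = IMAGE_SIDE // SD_FACTOR                                # 128

# Spatial positions only; channels per position are ignored.
cascade_positions = CASCADE_LATENT ** 2   # 576
sd_positions = sd_latent ** 2             # 16384
position_ratio = sd_positions / cascade_positions  # ~28.44

if __name__ == "__main__":
    print(f"Cascade: {CASCADE_LATENT}x{CASCADE_LATENT} latent, ~{cascade_factor:.1f}x per side")
    print(f"SD:      {sd_latent}x{sd_latent} latent, {SD_FACTOR}x per side")
    print(f"SD has ~{position_ratio:.1f}x more latent positions "
          f"(~{math.log10(position_ratio):.2f} orders of magnitude)")
```

Per side that is roughly 42.7x versus 8x, and about 28 times fewer spatial positions for the model to work on, which is one plausible reason the video and comments connect this design with cheaper training.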
@oholimoli 6 months ago ⁺⁶⁰
"Würstchen" is German and the translation could be "small sausage" 😂
@MattVidPro 6 months ago ⁺¹³
ah..
@christianstein5130 6 months ago ⁺⁴
It's always funny to hear ü, ä and ö in English. In Polish it's easier to make a "smaller" version of a word, like wodka is a small woda (water).
@hy7at 6 months ago ⁺³
I watched this whole thing mainly because of Matt saying "Würstchen" multiple times throughout this video 😁
@pierruno 6 months ago
Haha
@vitesh6429 6 months ago
I translated it from German, and it translated to 'hot dog'
@martianingreen 6 months ago ⁺²⁸
As a German speaker, that's a really funny architecture name; it literally just means sausage 😅
@CanadaBlue85 6 months ago ⁺⁶
Sausage AI™
@pierruno 6 months ago
Haha
@sasbe1852 6 months ago ⁺¹
*The diminutive of sausage, to be more precise.
@justinwhite2725 6 months ago
I used to work at a German pub called Wurst. It closed during the pandemic.
@anthony_leckie 6 months ago ⁺⁶
Great video as always, Matt. Very happy to see this new model. I got my first job using Stable Diffusion and Video Diffusion 1.1 last week.
@jtjames79 6 months ago ⁺¹¹
Even the text kerning was basically perfect. 😯
@ahsookee 6 months ago ⁺¹⁵
Würstchen is pronounced Vürst-yen. V as in view, ü like the u in lurk, st as in stash and yen like the currency.
@1Know1tHurts 6 months ago
Americans never give a fuck about how names and words from other languages are pronounced.
@GearForTheYear 6 months ago ⁺⁷⁶
Anyone else get the feeling that we're hitting diminishing returns with what's possible using the current NN architectures?
@pepenakamoto3675 6 months ago ⁺¹⁸
Yes. But I think there is a clear movement of capital and intelligence towards advancement in other areas of AI
@GearForTheYear 6 months ago ⁺⁵
@blakecasimir Right, I agree. It's just a bummer that we may see another protracted plateau before getting something genuinely revolutionary to use in a commercial context (i.e., better than humans). The Transformer architecture is so close and yet so far away.
@BeginningInfluence55 6 months ago ⁺⁷
@GearForTheYear You are right in terms of image fidelity/aesthetics. It won’t get any better than Midjourney v6. However, prompt understanding and following are still not optimal. DALL-E 3 shows that they can still be much better. The problem is the training data: it lacks more concepts than it provides. You can’t create truly creative images because, for example, there is no training example of a horse riding a human, so it can’t do it at all.
@clickpwn 6 months ago ⁺¹⁵
It’s not just a limitation of the architecture. A lot of it stems from the limitations of language itself. We train and guide these models using natural language; however, words are not sufficient for pinpointing the exact image you are looking for. A picture is worth more than a thousand words, and using just a few sentences as a prompt will only get you a general image that could look okay but is not exactly what you want, down to the nuance. Even if AI becomes smarter than humans, it still cannot read your mind and has only your words to go off of. Words carry too little bandwidth, and the only breakthrough I can think of is when we are able to upload our minds and thoughts directly to AI.
@jdietzVispop 6 months ago ⁺¹
@clickpwn Great comment! So what to do about it?
@chanm01 6 months ago ⁺⁵
Just awesome.
I kinda lost interest in text-to-image for a while. It isn't reliable enough to use in commercial applications yet (imo), and it didn't feel as competitive as text gen, where there was news almost every week.
Nice to see open-source text-to-image making progress towards catching up to the state of the art in this field.
@Athari-P 6 months ago
Open source isn't catching up with GPT-4, GPT-4 is still costly, and a GPT-5 tier doesn't exist. Overall, pretty meh too.
@jeffbull8781 6 months ago ⁺³
I think this is more focused on efficiency and speed, which means things like animation and video (using similar methods) are going to be much more realistic, since the static models are currently being sort of shoehorned into animation workflows.
@abandonedmuse 6 months ago ⁺¹
Their video is insanely realistic. Been beta testing it for a few days already.
@2CSST2 6 months ago ⁺⁶
Matt, I absolutely adore all your videos, but 42 is not orders and orders of magnitude greater than 8, it is barely half an order of magnitude!
@Athari-P 6 months ago
That's more than two orders of magnitude in binary though.
@ShoMorphias 6 months ago
This comment is half an order of magnitude more accurate than the subject matter!
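The disagreement above comes down to a logarithm: the number of orders of magnitude between two positive values is log_base(a / b). A minimal sketch using the two figures from the thread (42 and 8); the helper name `orders_of_magnitude` is just illustrative and nothing here is specific to Stable Cascade.

```python
import math


def orders_of_magnitude(a: float, b: float, base: float = 10.0) -> float:
    """Number of powers of `base` separating a from b, i.e. log_base(a / b)."""
    if a <= 0 or b <= 0:
        raise ValueError("both values must be positive")
    return math.log(a / b, base)


ratio = 42 / 8                                # 5.25
decimal = orders_of_magnitude(42, 8)          # ~0.72 (base 10)
binary = orders_of_magnitude(42, 8, base=2)   # ~2.39 (base 2)

if __name__ == "__main__":
    print(f"42 / 8 = {ratio}")
    print(f"base 10: {decimal:.2f} orders of magnitude")
    print(f"base 2:  {binary:.2f} orders of magnitude")
```

Since 42 / 8 = 5.25, that is about 0.72 orders of magnitude in base 10 (under one, though a bit more than half) and about 2.39 in base 2, which is the "binary" reading in the reply.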
@DiceDecides 6 months ago ⁺¹
15:20 even though there's no mustache, there's something about the quality that's really soothingly satisfying, I think!
@IcyLucario 6 months ago ⁺¹
Awesome, glad to see SD keeping up. 1.5 is still relevant thanks to the community; I hope to see something like this treated the same way.
@AttenBot 6 months ago ⁺¹¹
I would love to make consistent 16-bit style video game character sprite sheets
@petitemasque5784 6 months ago
This model is non-commercial, but if you want to make free games...
@AttenBot 6 months ago
Nah, I don't care about non-commercial; it's more of a personal project to achieve. Go have a look at WWF Royal Rumble sprite sheets, for example: one sheet for one character, with walking, running, jumping, punching, kicking, etc.
@drew5564 6 months ago ⁺³
My boy!!!!!! What's good, Matt! I've just been sick recently and have been away from YT as usual. I'm here now though; it's looking like an amazing video and I can't wait to get my popcorn and watch.
@JonnyCrackers 6 months ago ⁺¹
Sick! Been hoping they'd come out with something to compete with Midjourney and Dall-E. I love Dall-E 3, but I get so tired of getting "prompt blocked" with prompts that have nothing offensive or copyrighted in them. Wasn't aware of Pinokio either, so I'm excited to give that a try. Thank you!
@abandonedmuse 6 months ago ⁺¹
I have actually been beta testing their video generation, which is absolutely amazing compared to anybody else, even Pika. I also was able to ask for extra credits and they gave them to me because of the project I’m doing with their video, so I’m super excited.
@MyAmazingUsername 6 months ago ⁺²
This was something I looked for a few days ago, since I am tired of SDXL being pretty bad compared to DALL-E and Midjourney, especially SDXL's extremely deformed hands and feet. So I checked Stability for news and saw nothing. Then your news dropped. Thanks. I just got excited about open-source AI again.
@Airbender131090 6 months ago
Don't get your hopes up. This is not the model that will rival MJ. The next one probably will (but MJ will have released v7 by then).
@davidbangsdemocracy5455 6 months ago ⁺²
Perhaps, but image generators use Convolutional Neural Networks, and Transformers are for sequential data such as text. So I assume huge improvements will be realized with both types of models and whatever improvements are made to them. It may seem more subtle because they are already great, but they will be faster, more controllable, more efficient, and integrated into useful apps.
@gameswithoutfrontears416 6 months ago ⁺³
I just did a quick text test. Wow, perfect on the first one, but then not so great on the follow-ups.
@MrTk3435 6 months ago ⁺¹
Good Job Matt!! Truly Exciting... We need more competition so the subscription price will go lower! ✨✨🤟✨✨
@RodgerE2472 6 months ago ⁺⁹
Updated Forge UI is out too!!!
@hipjoeroflmto4764 6 months ago
Well, I don't know what that is, so yes, Matt should make a video
@Elwaves2925 6 months ago
I thought you meant a new update with the ControlNet fixes, but it's the one that's been out a few days. 😞
@abandonedmuse 6 months ago
Which one is Forge? Hard to keep up. Not sure I have used it.
@Elwaves2925 6 months ago
@abandonedmuse Search for SD WebUI Forge.
@LouisGedo 6 months ago ⁺⁶
From my testing, SDXL Turbo is utter garbage 💩 🤮.
I'm looking forward to Cascade
@ahsookee 6 months ago ⁺³
I didn't like it either, although I really tried.
@aouyiu 6 months ago
Garbage how? It just needs tweaking to reach its potential.
@LouisGedo 6 months ago
@aouyiu
The quality of the images is like that of Midjourney 2, based on my testing... utter garbage
@FRareDom 6 months ago ⁺¹
This came at a time we needed it most
@okolenmi7511 6 months ago ⁺²
It will be better than Midjourney. 16x training performance + open source = magic
@AmandaFessler 6 months ago ⁺¹
I was starting to lose hope, but here they are! And with a focus on cost efficiency too! I hope it has backwards compatibility with 1.5. I have way too many LoRAs for it stored up.
@Athari-P 6 months ago
All LoRAs are tightly coupled with their base models; nothing will be compatible with SD 1.5, ever.
@MrPablosek 6 months ago ⁺⁶
Does this mean it will require less VRAM to use? My 3070 struggles with SDXL without setting up various parameters and such to make it work, and then it takes a pretty long time to generate an image.
@sherpya 6 months ago ⁺¹
I've read something on Reddit about it needing more instead
@kuromiLayfe 6 months ago
I think it's pretty much the same amount. The concept is similar to running a workflow in Comfy that generates an image at 256x256, then does image-to-image with an upscale to 1024x1024, and then once more to detail the final sampler output.
@povang 6 months ago ⁺²
Bro, this is crazy. Looks like it'll blow Midjourney out of the water once it's been in the hands of open-source trainers for a few more months.
@saymydomain9504 6 months ago ⁺¹
Mage and Leonardo will probably implement this model as soon as possible.
@MilesBellas 5 months ago
Elon needs to take over!
"Robin Rombach, Andreas Blattmann, and Dominik Lorenz essentially created Stable Diffusion while at a German university. Stability AI got involved after the publication of their research and offered them the company’s computing resources. According to Forbes, all three have now left Stability AI which is also experiencing cash flow problems."
- Petapixel
@I-Dophler 6 months ago
It's an interesting concern, especially with the rapid evolution in AI. While Transformers have indeed been groundbreaking, the tech field's nature is to innovate continuously. Who knows, the next big breakthrough could be just around the corner, rendering today's limitations a thing of the past.
@HouseOfSynister 6 months ago
Thanks for these videos! I learn so much from them, keep it up!
@Modioman69 6 months ago
I can’t wait to see what the trained models of Cascade end up producing later. Heck, I say later, but someone will probably have a trained model by the end of the week or something, with the current pace of things lol.
@jonmichaelgalindo 6 months ago ⁺¹
Got it running on Windows (command line). It has to be possible to make it run in Comfy, but it would take some work.
@SuperAleaiactaest 6 months ago
There's a way easier way to do this. You just loop a clip the length of each note's phase. You do this and extend the loop out till it merges back in, and you do this for all of the notes, then you press Ctrl+J to consolidate it.
@vi6ddarkking 6 months ago ⁺³
The Stable Zero123 model still has, and Stable Video Diffusion had, the same limited licence during its experimental phase.
So nothing new here.
Still, being vigilant is always the way to go.
@starblaiz1986 6 months ago
Do we have any idea, based on past experience, how long that licence will be limited? Are we talking weeks? Months? Over a year? 😮
@vi6ddarkking 6 months ago
@starblaiz1986 Once version 1.0 releases, it usually switches to the new fully open-source licence.
@hipjoeroflmto4764 6 months ago
Matt, I just had (or still have) COVID and need to retest, but this video made me feel good
@jay_sensz 6 months ago
Not sure if I'm just spoiled by community-finetuned SDXL models and Fooocus, but I'm not terribly impressed by what I've seen so far. But then again, I was initially underwhelmed by SDXL as well.
What keeps me interested is the possibility of much more efficient finetuning compared to SDXL, but it might take a while for tooling and fine-tuned models to become available/usable.
@KurtWoloch 6 months ago
Interestingly, at 11:07 when the picture of Barack Obama comes together, at times it looks a bit like Alfred E. Neuman from Mad magazine.
@sinayagubi8805 6 months ago ⁺²
I don't think you realize: this means open source totally won today. We just need to do this with language models too.
@MattVidPro 6 months ago
You haven’t seen anything yet :)
@aouyiu 6 months ago
Meta might get us that, maybe sooner than you think now that Gemini is officially competing with ChatGPT.
@haileycollet4147 6 months ago
Miqu is getting there... It's not GPT-4 level, but it's definitely better than 3.5 all around, nearly as good as Gemini Ultra... And it's 70B 😂 It's coming!
@tonyzed6831 6 months ago ⁺¹
Wow, and in Pinokio already??? Love that!
@jeffwads 6 months ago
Wow. Never heard of this before.
@tonyzed6831 6 months ago ⁺¹
@jeffwads I think he made a video about it... Pinokio allows you to run AI tools on your PC without the hassle of installing complicated stuff; it's truly game-changing. But you'll need a good GPU with a lot of VRAM (I went "cheap" by buying a used 1080 Ti, and 11 GB of VRAM seems to be enough for what I do... for now).
@AzoreanProud 6 months ago
Nice
@godnyx117 4 months ago
Thanks for sharing!
@cagnazzo82 6 months ago
I'm sorry to say, but with the endless possibilities now available with Midjourney's --sref feature, I think they ran away with the crown. What's possible now is absolutely mind-blowing.
@vitesh6429 6 months ago
With the same prompting, you can get better images than SDXL (NightVision XL) (not definitive testing, just a couple of tests); the images have an HDR Midjourney look to them.
@LoneBagels 6 months ago
God: "walter white eating a big mac inside of mcdonalds, there are blue crystals in the big mac burger, walter white is dressed in a yellow hazmat suit"
Dall-E: "Even though I am just a tool and don't have a soul, I will pretend I have one. Therefore, I cannot do what my master commanded me to create, even though I'm fully capable of doing the job."
God: "Kicks Dall-E from the heavens; Downloads Stable Cascade!"
@fire17102 6 months ago
Soon in SD5... For my kids: remake this folder of movies to take out all the non-wholesome parts.
For example, in Bambi the mother doesn't die, no one is in mortal danger, and they all meet happily in the end. In The Lion King, Mufasa and Scar are good friends and Simba is raised with his dad. Ariel doesn't lose her voice. Remove the nightmare fuel from Pinocchio and Dumbo, etc etc etc etc etc.
Generate new wholesome scenes, keep the characters and style as in the originals, voice with 11Labs.
We will actually be able to give nice content to our kids, without passing on any horror from the hydra studios.
@zingsnapbites 6 months ago ⁺¹
Are the images free to use commercially?
@shaunralston 6 months ago
Always appreciate you being on the cutting edge of OS reporting, Matt.
@elishevafreely3206 6 months ago
I really hope that Playground AI picks this up.
@goodtothinkwith 6 months ago
Würstchen? Um… little sausage? Hot dog?
@ahsookee 6 months ago
11:50 it's easier to finetune this way than starting from a model biased towards photorealism
@cysshorts1529 6 months ago
People: 1980: we will have flying ca-
*literally 2024:*
@BTMYYY 6 months ago ⁺¹
yoo this is so exciting, I love open source :D
Unfortunately it takes like 30 minutes to generate a photo locally on my 3060 with Pinokio
@ilplopperz 6 months ago ⁺¹
xD
@BTMYYY 6 months ago
Updated Pinokio; now it takes like 15 minutes
@AscendantStoic 6 months ago ⁺²
What are the hardware requirements for running it locally?
@LukePellen 6 months ago
Open Source FTW.
Open Source means everyone is a winner.
@DezorianGuy 6 months ago
Why does it take Stable Cascade several minutes to generate an image with my RTX 3060 12GB? No problems with Stable Diffusion etc.
@julianopajaro2005 6 months ago ⁺²
Hey, Matt. Do you know any A.I. that makes Cinemagraphs?
@doben 6 months ago
I think "Imagen 2" can do that.
@ilyass-alami 6 months ago ⁺¹
Hi Matt, you can test the LLaVA 1.6 34B demo, an LLM vision assistant.
@havemoney 6 months ago
Happy Valentine's Day 💓
@nachod9772 6 months ago ⁺¹
Tried it, but idk, DALL-E 3 gives me much more specific and better results
@consig1iere294 6 months ago
I am curious, why did it take so long to implement the Würstchen tech? This was shown by the actual people behind Würstchen last year.
@alexnorth3393 6 months ago
Exciting news!!
@toCatchAnAI 6 months ago
Curious why they didn't show a benchmark against MJ
@twilightfilms9436 6 months ago
You mention Krea, and Krea uses SDXL under the hood, so I wonder if you have found a way to get Krea or Magnific results for free using Comfy or A1111? I actually wonder how come no one is even trying to do it... anyways, great video!
@fontenbleau 6 months ago ⁺¹
Creating something from nothing by spells, is it Harry Potter in real life? It's magic!
@Fustercluck06 6 months ago ⁺¹
I also feel mppy inside lol
@isajoha9962 6 months ago
This video makes me happy for the future.
@moelleunbelievable 6 months ago
As a German, I have to admit they did y'all dirty by calling internationally used software (or at least part of it) "Würstchen" 😂😂😂 ... It means small sausage, if anyone is wondering.
@tradehut2782 6 months ago
OH my god...
Talk about seeing something unexpected when opening YouTube
@raaghavgr1990 5 months ago
How many free prompts a day do you get on the free plan of Stable Cascade?
@TheGoodContent37 6 months ago
What specs should a PC have to run an SD model relatively fast? Is it all about the graphics card?
@shazolislam6359 6 months ago
Honestly, I have a really interesting question, @mattvidpro. What is the relation between you and Lemon?
@3DArtistree 6 months ago
Of course, right when I just uninstalled Pinokio to make room for more checkpoint models! lol Hope someone ports it to Comfy in the next few days!
@faymo8925 6 months ago
It's a bit slow: one minute 40 seconds on a 3060 Ti. But as you said, it's FREE.
@jopansmark 6 months ago
It's over for Midjourney and OpenAI.
@A-uz3uj 6 months ago
It’s crazy though, OpenAI just released Sora yesterday, way ahead of anyone else on AI video
@RomiWadaKatsu 6 months ago
I'm running it locally and it's far slower than SDXL for some reason; the web demo works better. Also, the results are clearly inferior to DALL-E 3, so there must be some setting I'm missing. I'd say one can skip it until it's in the hands of someone who can run it to satisfactory levels
@PostmetaArchitect 6 months ago
The model is not open source. It's non-commercial use only, the dataset is not available, and the training method is undisclosed. Just because you can run a model locally doesn't mean it's open source.
@BlackMita 6 months ago ⁺¹
How censored is it though?
@MrPablosek 6 months ago ⁺¹
From what I saw, not at all.
@USBEN. 6 months ago
Looks a lot better.
@christopherd.winnan8701 6 months ago
Can it handle compound nouns yet? How about magnet fishing, for example?
@Ariane-qq9co 6 months ago
Nightshade is coming.
@realWorsin 6 months ago
Requires 20 gigs of VRAM though. That will eliminate most people.
@gionicol_ 6 months ago
My honest reaction was: "Oh no..." 🤣
I'm really trying to catch up with everything, but oh boy, it's hard
@AndreFelipeF 6 months ago ⁺¹
niccee, going to check right now!
@andyone7616 6 months ago
Can this model be used in Automatic1111?
@hermeticsense8805 6 months ago
42 is not orders of magnitude larger than 8. It isn't even 1 order of magnitude larger. I'm not completely confident in my criticism, but I hope my comment is useful. 2:00 4:51
@0ceanswave 6 months ago
Close, but not even 1 order of magnitude; 1 if we round up.
@KlimovArtem1 6 months ago
I’ve tried it. Not even close to DALL-E 3 in following difficult prompts. Not even close to the realism of MJ v6.
@pn4960 6 months ago
I am hyped!
@morizanova 6 months ago
Just trying it. Not a full test, but generating text seems OK
@user-ef4df8xp8p 6 months ago
Stability AI is cool....
@CrystalBreakfast 6 months ago
Nothing currently beats SD1.5 due to ControlNet, IP-Adapter, and LoRA training, just to name a few massive game-changers with NO comparable equivalents among closed models.
SD1.5 can even animate, with a HUGE amount of control over just about everything. SDXL is still catching up while the community continues to expand 1.5's capabilities. It'll probably be a while before Cascade can be controlled to that degree.
Everything else is just a toy. If you think there's anything SD1.5 can't do that a closed-source model can, then quite frankly you're in the dark. The community has expanded 1.5 in so many ways that nothing else comes close.
@danielleza908 6 months ago
stability ai are the best!
@SchusterRainer 6 months ago
try photo taken on Fujifilm XT3
@RickPMandel 6 months ago
The question I have is, as always, how does it handle censorship? What happens if you give it a prompt that many AIs will label as NSFW and will not render?
@GearForTheYear 6 months ago
It seems to just ignore those parts of the prompt. I couldn’t even get two mechs to shoot at each other.
@TruthTrill 6 months ago
Can this run in Forge WebUI?
@zodiacblue9312 6 months ago
One of the example generations being literally a cherry being picked is hilarious
@ctrlartdel 6 months ago ⁺¹
Slow Magic!
@zodiacblue9312 6 months ago
@ctrlartdel yeah, really like his music, and the album cover is goated
@darkman237 6 months ago
No way to install it apart from Pinokio?
