AI that makes thumbnails (or any image)
- Published 9 Jun 2024
- A video about using AI to generate YouTube thumbnails. I explore the classic GAN method and compare it with a newer method called diffusion. One turns out to be better than the other!
Reviewed by Andrew Carr: / andrew_n_carr
Disclaimer: All thumbnails were deleted after use. I do not aggregate YouTube data.
The losses are not exactly inverse for the generator and discriminator because the two are not trained on the same data.
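That asymmetry can be sketched with toy numbers (the scores below are hypothetical, not from the video's actual model): the discriminator's loss is computed on one real/fake batch pair, while the generator's loss uses a different fake batch, so the two losses are related but not exact mirrors.

```python
import math

def d_loss(real_scores, fake_scores):
    # Discriminator: maximize log D(real) + log(1 - D(fake)),
    # i.e. minimize the negative of that average.
    return -(sum(math.log(s) for s in real_scores) / len(real_scores)
             + sum(math.log(1 - s) for s in fake_scores) / len(fake_scores))

def g_loss(fake_scores):
    # Non-saturating generator loss: maximize log D(fake).
    return -sum(math.log(s) for s in fake_scores) / len(fake_scores)

# D and G are scored on *different* fake batches, so the losses
# move in related but not exactly opposite ways.
d = d_loss(real_scores=[0.9, 0.8], fake_scores=[0.2, 0.3])
g = g_loss(fake_scores=[0.25, 0.35])  # a different fake batch
```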
LINKS
Twitter: / max_romana
Discord: / discord
Patreon: / emergentgarden
The life engine: thelifeengine.net
SOURCES
Original GAN Paper: proceedings.neurips.cc/paper/...
Face interpolation: • StyleGAN2 Interpolatio...
BigGAN Paper: arxiv.org/abs/1809.11096
thispersondoesnotexist.com
Flower Gan: / 1527890938386857984
Katydid: • The Katydid (Leaf Bug)
Mantis: • Praying Mantis Hunts a...
Diffusion Paper: arxiv.org/abs/2006.11239
Diffusion beats GANs: arxiv.org/abs/2105.05233?curi...
Blog Post: gretel.ai/blog/diffusion-mode...
Diffusion explanation: • Diffusion Models | Pap...
Diffusion Visualization: / 1537042940475883520
Water Diffusion: • demo - (hot and cold w...
Dall-E 2: openai.com/dall-e-2/
Imagen: imagen.research.google/
CogView: arxiv.org/abs/2105.13290
Parti: parti.research.google/
TIMESTAMPS
(0:00) Intro
(0:32) The Goal
(1:11) The Data
(2:15) Latent Image Generators
(3:03) GANs
(4:35) GAN training
(7:45) Diffusion
(8:55) Diffusion training
(10:43) ☆Generated thumbnails☆
(13:58) Diffusion beats GANs
(15:27) Conclusion
(16:28) Outro
MUSIC
• Closed Circuits - Science & Technology
Fantastic video, this channel deserves much more attention. You have a real talent for breaking down complex ideas and making them easy to understand. Thanks!
It would be super awesome to have a GAN trained to do camouflage. In fact, there are papers that describe this already. They train a GAN with one NN to colour a triangle at a random position on a random background, and a second NN to try to detect it. As a result, the triangles take on patterns that are harder to make out. There are cool websites where you can try to spot these triangles yourself.
I've always wanted to do a few variants of this. Firstly I'd love to see this done, but with a "poisonous" triangle added to the image along with the camouflaged triangle, one with some very distinct pattern. Then the spotting NN is penalised for detecting the position of that triangle. It would be awesome to see if aside from camouflage, mimicry would evolve - and which one would be more likely.
Secondly a variant where some parameter influences the contour of the triangle as well, like a frayed edge, would be cool. I'm sure you'd get some crazy good results.
Hey, can you please post a link to such a website? Thanks in advance!
What an amazing video. Thank you Emergent Garden. What a great channel name btw. You're awesome.
This is an awesome project. I hope you can take this to the next level.
🎉 I am glad YouTube recommended your channel 🔥🔥🔥🔥 this is something else 🎉
This is super neat! Amazing explanation of diffusion!
Idea#1: AI that can generate great comments.
Idea#2: AI that can generate a script for a movie/Stories.
Idea#3: AI that can generate tips for demotivated people.
Idea#4: AI that can tell approximately when human civilization will end.
Idea#5: AI that can generate Idea like this.
Omg. This is the best sick day ever. Thank you sir for taking the time to teach us all that you have! I just cannot stop watching your content!
You know what, I am realizing that the process of diffusion is a lot like Reddit's r/place event from a few years back. People would contribute pixel colors individually anywhere on the page canvas, and it was always amazing to me to see how the image would "evolve" over time. People would organically arrive at recognizable images by seeing the patterns that emerged from others who had laid down pixel colors before them. As the images began to take shape, more pixels would iteratively fill in the gaps and hone them into a final form resembling a flag, a person's face, a logo, etc. In this case, however, it was an image that was collectively well known to the people who participated. Obviously a lot of the images were coordinated and didn't undergo this process, but there were a lot of areas where this seemed to be the case.
Awesome video and explanation, thank you!
Love the presentation!
My experience with GANs is exactly the same. After solving many bugs and issues, I end up with "mode collapse".
Okay, seeing the thumbnail change to what it is now made me watch this. I'm making this comment before I watch because I want to say it made me wonder whether you gave the AI access to the YouTube statistics to try and learn to make better thumbnails for this video, and if you did, that's cool; that's why I wanted to watch this. But if you didn't, I'd love your input on whether that's a good idea, or even a possible one.
Yes, it sounds like you could extend a diffusion model with something like this. Since a diffusion model already has an internal measure of how good an image is (it's trained to estimate noise, i.e. unrealisticness), you could try to map such statistics into that space, so that worse thumbnails are "worse" in the same sense that noisier images are worse; that could be tricky, though. Alternatively, you could use conditioning, like with text- or class-conditioned diffusion, to get a "slider" for generating better or worse thumbnails. But one of the real problems is how to get such statistics in the first place: you can't just use view counts or something like that, because they depend on so many things (channel popularity, trends, the YouTube recommendation system, and so on). Something like click-through rate per impression of the thumbnail would probably be good, but that's internal YouTube info we're not going to get.
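The conditioning "slider" idea can be sketched in the style of classifier-free guidance. Everything here is hypothetical: `eps_uncond` and `eps_cond` stand in for a real denoiser's noise predictions with the quality signal dropped vs. supplied, and the quality signal itself is assumed to exist.

```python
# Toy classifier-free-guidance blend: push the noise prediction
# toward the direction implied by the "high CTR" conditioning.
def guided_eps(eps_uncond, eps_cond, scale):
    # scale = 0 ignores the condition; larger scales lean harder on it.
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]

eps_u = [0.1, -0.2]   # hypothetical denoiser output, condition dropped
eps_c = [0.3,  0.0]   # hypothetical output given "high quality" signal
eps = guided_eps(eps_u, eps_c, scale=2.0)
```

A sampler would use the blended `eps` at each denoising step; turning `scale` up or down is the "better or worse thumbnails" slider the comment describes.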
Great video!! I sort of agree with the other comment. I'd say this video deserves a slightly better title that reflects its educational value and content. I only clicked because I was subscribed and wanted to see if I had to unsubscribe. I'm picky!^^ And I think I've come to associate clickbait with poor-quality video content, and this is definitely not that!
Larger datasets may not even be necessary. You can increase diversity by deduplicating the data you already have, potentially actually increasing performance with a smaller dataset!
Deduplication may be tricky. But one method might be to train up a purposefully relatively small network to simply distinguish images. If it thinks two images are the same, chances are, the images are really similar.
And to further improve this, you can train up *multiple* such networks and say that if more than half of them think two images are the same, they are too similar. That is, you group up "similar" images, then randomly select groups, and finally randomly select an image from each group.
Alternatively, more easily, you can just discard all but one of the images of each group to shrink your dataset down to only sufficiently unique thumbnails.
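The grouping idea above can be sketched without any trained network at all, using a simple average hash as a stand-in for the "does it think two images are the same" check (all images and thresholds below are toy values):

```python
def ahash(pixels):
    # Average hash: one bit per pixel, set if above the image mean.
    mean = sum(pixels) / len(pixels)
    return tuple(1 if p > mean else 0 for p in pixels)

def hamming(a, b):
    # Number of differing bits between two hashes.
    return sum(x != y for x, y in zip(a, b))

def group_duplicates(images, max_dist=1):
    # Greedily assign each image to the first group whose
    # representative hash is within max_dist bits.
    buckets = []
    for img in images:
        h = ahash(img)
        for rep, members in buckets:
            if hamming(h, rep) <= max_dist:
                members.append(img)
                break
        else:
            buckets.append((h, [img]))
    return [members for _, members in buckets]

# Two near-identical 2x2 "thumbnails" and one distinct one.
imgs = [[10, 10, 200, 200], [10, 12, 200, 198], [200, 10, 10, 200]]
groups = group_duplicates(imgs)
deduped = [g[0] for g in groups]  # keep one representative per group
```

A real pipeline would hash downscaled grayscale thumbnails (or compare learned embeddings), but the group-then-pick-one logic is the same.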
Can you share a link to the dataset generated or a tutorial on how to do it?
Also, is it feasible to create YouTube thumbnails with the current state of the art in AI?
Just a little after this was released, the most impressive diffusion based image generator yet was open-sourced. Stable Diffusion is the most promising AI image creator yet, at least until Parti's techniques are perfected and data researchers go back to the drawing board.
Who's Parti?
Is there a hybrid approach? Like using diffusion to generate images and then a gan to tune that model? I'm not a computer scientist and I may be either saying something super stupid or super obvious but I'm genuinely curious.
Can you similarly walk the latent space with a diffusion model by modifying the input noise?
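With a deterministic sampler (e.g. DDIM) you can: interpolate between two input noise vectors and run each interpolant through the same sampler. A common trick is spherical interpolation, which keeps the vector near the Gaussian shell; here is a minimal sketch with toy 2-D vectors standing in for full noise tensors:

```python
import math

def slerp(t, a, b):
    # Spherical interpolation between two noise vectors: unlike a
    # straight line, it roughly preserves the norm, which diffusion
    # samplers expect of their Gaussian inputs.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    omega = math.acos(max(-1.0, min(1.0, dot / (na * nb))))
    if omega < 1e-8:
        return list(a)
    sa = math.sin((1 - t) * omega) / math.sin(omega)
    sb = math.sin(t * omega) / math.sin(omega)
    return [sa * x + sb * y for x, y in zip(a, b)]

noise_a = [1.0, 0.0]
noise_b = [0.0, 1.0]
midpoint = slerp(0.5, noise_a, noise_b)
# Feeding each interpolated noise through the same deterministic
# sampler yields a smooth walk between the two generated images.
```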
amazing video
This is the first time I click on a clickbaity thumbnail and get good content.
So with thumbnails you could include the text of the video, so it would have to make a good thumbnail and text prompt.
What happens if you use a diffusion network as generator for a GAN?
As long as diffusion models cannot generate samples in one forward pass, I think GANs have a reason to exist in use cases where synthesis speed is an important factor.
very cool
great video
can't wait to play the new MIAECROOFT: MRGROOTBU update 13:48
If it takes 24 hours to train on a batch of images, how do Wombo and DALL-E generate images in less than a minute?
so generating the perfect image is like a slot machine
600/10
Seems like my comment got deleted because of an arxiv link. I was saying you could try StyleGAN-XL, as it showed quite good performance on diverse datasets like ImageNet and trains relatively fast (despite its big size). My second piece of advice is to use fine-tuning instead of training from scratch; it's much faster and more stable for GANs.
Oh yes, the best way to do it would be to fine-tune a big pretrained model like StyleGAN. But I'd rather do that with a diffusion model first, and maybe StyleGAN for comparison.
Hi, can you share your datasets? Or is it from kaggle?
Fucking Great video man!!
Imagen is pronounced I-mi-gen. Great vid!
5:02 Is that by brute force? So the discriminator will see 1,000,000 pure-noise pictures, and when, by chance, the generator generates two black pixels beside each other, it will decide that it prefers that picture?
gg !
The more I've learned about neural networks, particularly while watching Machine Learning Street Talk, the more I doubt it when people say "these people do not exist" about the deepfaked face images. I believed it blindly before, but now I am concerned that it's basically just interpolating between faces, which is pretty great regardless, but I think someone needs to take the input images the network was trained on and compare them to the best, most convincing outputs it generates, and see which faces it most resembles, if not matches almost perfectly. Sure, with a GAN it's encoding down to a much lower-dimensional latent variable, and then decoding back up to image resolution, but that still could just mean that it's showing us faces it's learned, and interpolating between them within the latent space. At any rate, I'd just like to see comparisons between the "random" outputs and the actual images the network was trained on.
That's the point of latent spaces of human faces … ? Given the right parameters it should be able to generate every possible picture of a human face within and outside the training data.
12:23 Does anyone know if the Chinese text is intelligible?
Looking at the patreon, we must all be broke
This video's script was entirely AI generated, then read by an AI, with pictures created by an AI.
I recently played with Stable Diffusion beta 1.5 (they have a trial and some points for everyone to try their model), and my impression is that their diffusion model is really great only for generating artistic images and paintings; in many other situations it looks either too artificial or overfitted (it copies too much), and all the faces look awful, especially compared to Midjourney's model and images.
I guess more work is needed on this type of neural network...?! _I'm not an expert, just a hobby programmer._
Errant was here
Well we DID "invite" you. Now guess how.
Have you generated Adam Neely?
I have a friend who is an "artist" (not really that good, but she thinks she is), and she is pissed about this AI thing. It's pretty funny; it sucks to be an artist now :D (or soon, when this thing gets really REALLY good).
annealing
defusion
waveform collapse
…
Make a thumbnail of a bot sitting in front of a computer making/editing a thumbnail.
As if youtuber's jobs weren't easy and lazy enough.
So the discriminator is basically solving the Turing test...
Unfortunately…Oil is still the oil of the 21st century …
true lol
Man, love the vid, but for real, change the thumbnail. The current one just falls too much into the uncanny valley, as you know too. I didn't click on the vid for many hours and even considered unsubbing because I thought it was junk cluttering up my "subscribed" feed.
Worst possible reaction ever. I did not have the same reaction, I saw it and was intrigued. Reminded me of some of VSauce's thumbnails!
Not the reaction I was going for lol! I'll be messing around with the thumbnail/title, I figured uncanny ones would catch the eye but they can also freak people out.