It's a different kind of mess-up, because this AI isn't anything like humans. Humans still easily get the number of fingers right; even a child does. These AI (ML) tools are just forming shapes from tonnes of other people's art turned into a weighted matrix. That's not what humans do: humans learn to conceptualize the kind of thing they want to draw and go through many steps of constructing it using a deep understanding of what the thing is.
In the lucid dreaming community, one of the most reliable "reality checks" is inspecting your hand and confirming that you have 5 fingers. For whatever reason, the brain has a difficult time generating a five-fingered hand while dreaming. It's kind of a creepy coincidence that AI has the same issue.
Same reason people generally have trouble drawing hands from memory or imagination. I would bet that artists, particularly animators or illustrators, who have to express action and emotion in hands and limbs, don't lack an ability to dream "correct" hands.
It's weird how humans can instantly tell when something looks wrong, but the same humans can't necessarily correct it or make it right from scratch. As a beginning artist there's a weird rift between your mind's eye and your skill.
I'm missing where's the weird part in all this... We observe and sense far more frequently than we create, and both are different skillsets that need practicing to master. When something goes against those ingrained patterns that you've built over thousands of hours of observing, you'll sense it because the result doesn't meet your expectations. You'll know when a circle isn't a circle, but only if you've mastered making one enough will you be able to make a perfect one. If you've never done any martial arts an incoming kick might startle you, but when you're an experienced practitioner you may instantly sense that the kick was never going to hit you in the first place based on its motion and all kicks you've seen before, and you'll not even flinch. Different situation, same principle, and it works in all learning.
It's called millennia of evolution. If you couldn't quickly tell whether something is close or far away, or a predator or your mom, then you wouldn't have survived.
I don't mean to be rude but maybe you've been looking for calming videos in the wrong places lol cos they exist plenty. Look up gregorian chants, for instance.
@@dundermifflinity Exactly. It's funny how he says "even AI struggles" as if it is a benchmark to reach. Goes to show how far it has progressed recently. Feels like all of this happened so quickly, almost overnight. Unreal
@@anmolagrawal5358 >almost overnight it's kinda interesting to think about. machine learning models have almost certainly been under development for the past couple of years, but it's only been in the last few months that they've been publicly released. and the interest and demand they've generated just encourages the developers to work faster and harder to create higher quality models
@@hiddendrifts iirc from our university lectures, machine learning has been around before the 2000s. It's just that nowadays, we have more data and more processing power to train models with.
I remember watching Proko's lessons about anatomy years ago. Drawing hands from imagination was one of the most difficult parts for me. I think I am just an AI lol.
If you know a thing or two about sewing, you notice pretty fast that AI is also terrible about clothing. Buttons merging into zippers, fabrics changing textures and weights, folds appearing and disappearing without seams, those are all things you see commonly in AI art but people don't notice as much because your average AI artist isn't a seamstress.
Not a seamstress (or seamster?) but I do notice! Drapery was one of my favorite things to draw in school, but AIs so often get all the details wrong. They don't have the structural knowledge they need to make everything convincing enough.
Interesting! Makes you wonder what else AI gets wrong that only people with certain skills or knowledge would notice. I guess AI isn't the crafty know-it-all artist we thought it was. At least not yet.
Not sure what is interesting about this take. AI is making all of these images with little context. It doesn't know what a material is or how it should behave. It just "knows" what something looks like on average, basically. If you are shocked or surprised by that you fundamentally misunderstand how it works and are expecting it to generate images based on parameters it just doesn't account for.
Sigh, it's a different kind of struggle. Please don't act like this is evidence that the AI is similar to humans. Humans struggle with the perspective of hands; they don't struggle with the number of fingers, even a child gets that right. This is evidence of how this "AI" is purely algorithmic, merging data from millions of images. Humans do not create art this way; we create from a deep understanding and many other factors that cannot be quantified by a layered neural net.
Definitely! I understand why hands are difficult but it's sort of mind boggling how hard feet are. You wouldn't think it but it's very frustrating to try to draw feet well
Lately I've noticed that at least some neural networks draw faces as a separate module, on top of the rest of the picture. The same should be done with hands. There should also be a setting to "hide the hands" so that they simply end up behind the back, in pockets, etc.
That last setting is simply including it in the prompt. It still won't always work, and even simple ideas like "facing the camera" or "facing forward" sometimes get you a back shot.
Best yet, it's subtle. You can watch the video without noticing it. So many editors early in their career go over the top with their editing which...some people like (usually younger people) but is really bad by the standards of editing.
@Tinil0 oh wow! you create the rigid, never changing standards of editing!? I'm star struck, its so nice to meet you, i have so many questions about editing seeing as we've got an expert here 😁
As an artist, I will confirm, hands have an EXTREMELY low margin for error. There are many different body types, face shapes, limb proportions. Consequently, there's wiggle room. Not so with hands. People will still compliment most artwork that slightly misses the mark, but they will go silent if you mess up hands.
Biggest mistake even decent artists make is messing up the direction the hand and fingers are facing. Like someone draws a left hand palm out, but the thumb starts on the right side. Technically you drew the hand right, but the orientation is wrong, and it's obvious when that hand is a part of a human/character drawing. I've been drawing for years and years and I still struggle with hands and foreshortening.
I think the way to solve the "AI knows how things look, but not how they work" problem is to train the AI not only on images, but also on rigged models, like Blender models before you hit "Render." I personally find out how things work and what proportions they generally have by spending a few minutes fiddling with the object and studying it from different angles before trying to draw. Edit: Sorry I'm late.
I think the problem is that those are entirely separate logic bases. It would be like training a single AI to perform both image recognition and audio recognition. While that might be possible, the complexity of the neural network involved would be exponentially greater than simply creating a separate AI for each task. The image-generating AIs do not have any concept of the physical structure of the environments they create. All they do is generate pixel patterns. If you tell it to draw a tree, it draws pixels. It has no idea what a tree is. But it has reference material labeled as "tree" and the pixels it draws are consistent with that reference. So both what it's trained on and what it produces are just colored pixels with labels. To try to expand that training to include 3D models and an understanding of structure and space and form would be an unimaginably daunting task, I think.
@@ecMathGeek But couldn't you use 3D software to generate billions of 2D images as base for AI to learn from? …instead of waiting for real hand photos to appear.
@@karoljankaminski5793 You could give it a huge array of hands to train on, but I don't think that would fix the main issue: the AI doesn't know what the structural rules are for hands. It only knows grouping rules for the pixels that are labeled as hands. It looks for general patterns, not specific rules. So it can recognize that finger-like protrusions exist on most hand images, but it doesn't know how to count them, or what angles make sense, or whether showing the back or front of a finger makes sense, etc. And since the AI isn't just copying and pasting hands it's seen in other images, it has to rely on the rulesets it has created to draw them from scratch. Perhaps the real problem is that the rulesets for drawing hands are too general and contradictory. Some hands in images show all fingers, others might only show one or two. Some show them positioned at odd angles. Some will be holding something and others won't. And each variation implies different rules for how the hands should be drawn. If the AI is generalizing all of those contradictory rules into one set, creating abominations is almost inevitable.
“The AI knows how things look but not how they work.” I’ve gotten into so many frustrating conversations trying to correct friends and colleagues who talk about ChatGPT as if it had some internal logic and self-referencing reflective capabilities.
it must have internal logic however; that is the whole point of training a network. Like, if you train a network to add two numbers, you would expect an analogue of a summing circuit to form in its weights. Otherwise it would simply be memorizing.
@@biocode4478 Thank you. You're completely right. Sure, gpt-3 is just a trained neural network, not a calculator, but through training from human data that includes a lot of logic, the neural net actually "learns" that. Now, it will contain the same mistakes that a human might make, but there is absolutely internal logic. Thanks for bringing this up.
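The summing-circuit claim above is easy to demonstrate with a minimal sketch (a toy example, not any production model): train a single linear neuron on random number pairs and watch its two weights converge to roughly [1, 1], i.e. an actual adder encoded in the weights rather than a lookup table of memorized pairs.

```python
import numpy as np

# Toy illustration: a single linear "neuron" y = w1*a + w2*b is trained
# to add two numbers. If it had merely memorized the training pairs it
# would fail on unseen inputs; instead the weights converge to ~[1, 1],
# a literal summing circuit encoded in the weights.
rng = np.random.default_rng(0)
X = rng.uniform(-10, 10, size=(200, 2))   # random (a, b) pairs
y = X.sum(axis=1)                         # targets: a + b

w = np.zeros(2)
lr = 0.01
for _ in range(500):
    pred = X @ w
    grad = X.T @ (pred - y) / len(X)      # gradient of mean squared error
    w -= lr * grad

print(w)                          # close to [1. 1.]
print(np.array([3.0, 4.0]) @ w)   # unseen pair: close to 7
```

The unseen pair (3, 4) was never in the training set, which is exactly the memorization vs. generalization distinction the comment above makes.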
I don't like reading AI responses to customer service questions. The answers seem very hollow and rigid as if someone was reading an instruction manual.
At 7:57 he said something that resonated with me: "AI art is basically bad at art, we're just able to see it with hands". A lot of times, when you look closely at an AI generated image, you start to notice all kinds of strange things, like shapes that don't make sense, roads leading nowhere, details that are simply wrong. Will this change, and what will it take? Right now it seems that you either have to accept a lot of errors or "peculiarities" with AI generated images, or you have to do a lot of manual work to get it right.
He also followed by saying "But both of these things are also a bit wrong". AI will indeed get better, and it is getting better. It's only a question of time until AI can do all of those things that the video said it wasn't able to do at the moment.
@@deadeaded Are you sure? Because above, LuisPereira stated that "This is just nonsense that people who don't understand AI like telling themselves to feel better. Much like humans, neural networks also conceptualize everything they draw, i.e. they also break down large complex shapes into patterns of smaller shapes and learn the patterns between them." I don't know enough about AI to know who to believe.
@@deadeaded Well, without knowing exactly what that built-in structure does, it's hard to agree or disagree. I think that shouldn't be a problem, but I can be wrong. We'll see in the future.
I’ve tried to use it to make fun renaissance style paintings of modern scenes and I can say it also seriously struggles with feet (toes specifically) and keeping track of limbs in crowds. You wind up with disembodied arms and legs.
Yeah, it is all a moving target. It used to be getting a single person even just standing there might be mangled. Now a single person is likely to be very good, even the hands if they aren't doing something super complicated. Feet are also better than they used to be, but obviously haven't had as much attention as hands. And the more people and more details in general, the more likely something goes wrong. It'll keep getting better but the standards will keep getting higher too.
SUMMARY: AI art models struggle with drawing hands due to data size, data quality, and a low margin for error.
- AI models have limited exposure to hand images and lack annotated datasets to learn how hands work, unlike the abundance of face images available.
- Hands are more complex and diverse in appearance and function than other body parts, making it difficult for AI models to learn and replicate them accurately.
- AI models can create visually appealing art in many other respects, but "hand-like" isn't sufficient, as humans expect more accuracy when it comes to hands.
- Improvements in AI art generation could come from increasing computing power, using more human feedback, and encouraging people to rank the quality of generated images.
and if you look carefully you'll see the skyscraper is also a bit creepy: its lines aren't straight enough, they're ragged and weird, and the whole texture isn't consistent :D
I really like how this simplifies the concept so people who are neither software engineers nor illustrators can start to understand how complicated all of this stuff actually is, even though the years of learning and practice that go in are kinda invisible.
@Zaydan Alfariz The thing is that this would need a fundamentally new approach. The current way does exactly NOT work with anything 3D, it just learns patterns from 2D imagery. There is no feasible way to integrate 3d components into its 2d workflow in this way.
@@nameless9084, yeah, no. It'll take so much more to do that. Unless you can somehow teach it the thousands of rules, techniques, and other such things that artists can learn much easier, along with making sure that it actually understands how an idea is supposed to work in reality, artistry will never come to this concept.
@@Sir.light1 Idk, you can do that to the subscribe button too. The buttons highlight when the person in the video says to like and/or subscribe, but they have to choose whether the highlight effect is enabled, from what I know.
I used to draw and paint semi-professionally and the hardest things for me were always hands and feet. Especially how intricate they can get and the multiple poses they can have and achieve. Our body is art itself, because some of the things and poses we do are very weird and complicated and don't even look unnatural.
I didn't have the same problem drawing simple hands and feet. It takes time, but you have to really watch your own extremities to understand their shapes and why they are that way. It's fascinating to look and contemplate, and it has led to me liking beautiful hands and feet hahaha.
If you break it down to the anatomy of hands and feet, then it's clear why they're so difficult. There are a lot of tiny bones and muscles clustered in a very intricate way. The rest of the human body has a lot of long straight or curved bones. That's easier to understand and picture in your mind.
03:43 You do realise you just picked the worst example possible next to imagining hands? I want you to take a very close look at this symmetrical perfection of a skyscraper...
I learned about this recently and you reminded me of it: the "Chinese Room" problem. The AI knows where the hands are and, generally, WHAT hands are, but it doesn't UNDERSTAND what they do or how they work. It can draw everything it knows about "hands", but that doesn't include complex things like "range of movement".
I heard someone say that a good way to find good poses is to pretend like a part of your body hurts. For example you have a headache so you rest the back of your hand on your forehead. I'm not a model but the poses they were striking looked great and it was pretty funny watching them explain it at the same time.
There's a problem with any repeating features (teeth etc) because the AI works like autocomplete: after a finger tends to be another finger, then after that finger comes another finger, then ooh a finger! Usually there's a finger next to that! You also see it in text, it puts shapes next to each other that tend to be neighbours and the results are hilarious. (Look up the AI Waffle House and In-N-Out signage)
My question is why the AI can't grasp the concept that people have five fingers, and that if your picture has more, that means you messed up. Teeth make more sense, because yeah, you could have basically any number visible, but fingers are pretty much always five, maybe a couple fewer in very rare instances, but definitely never more.
@@monhi64 I would say that the majority of photos of hands do not show all five fingers. Look at your own hands holding objects at different angles to you. How many fingers do you see? Also, photos of people with two hands in the frame will have up to 10 fingers. AI does not care that they're on two different hands. It doesn't solve for problems like that. AI is trained basically like: these pixels next to each other are called this. Here is another set of pixels next to each other called the same thing. Over and over... Then can you (AI) tell what the similarities are? Basically, I'm trying to say that AI doesn't "grasp concepts" as you wrote in your comment. It doesn't have concepts. It's basically just very advanced pattern identification, without any additional "thought" behind it.
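The "autocomplete" failure mode described a few comments up can be sketched with a toy bigram model (a stand-in for illustration only; real diffusion and transformer models are far more complex). Even when every training example has exactly five fingers, a model that only learns local "what comes next" statistics generates variable finger counts, because nothing in it keeps a global count.

```python
import random
from collections import Counter, defaultdict

# Toy "autocomplete": every training example is a perfect 5-fingered hand.
training_hands = [["palm"] + ["finger"] * 5 + ["end"] for _ in range(100)]

# Learn only local transition counts: what token tends to follow what.
transitions = defaultdict(Counter)
for hand in training_hands:
    for cur, nxt in zip(hand, hand[1:]):
        transitions[cur][nxt] += 1

# Learned local rule: after a finger, another finger follows 80% of the
# time (4 of every 5 finger->next steps), "end" only 20%. No global count.

def generate(rng):
    fingers, cur = 0, "palm"
    while cur != "end" and fingers < 20:
        nxt = transitions[cur]
        cur = rng.choices(list(nxt), weights=list(nxt.values()))[0]
        if cur == "finger":
            fingers += 1
    return fingers

rng = random.Random(42)
counts = Counter(generate(rng) for _ in range(1000))
print(sorted(counts.items()))  # finger counts scatter, no spike at exactly 5
```

Despite flawless training data, the generated hands have 1, 2, 7, 12... fingers, which is exactly the "ooh, a finger! usually there's a finger next to that!" behaviour the comment describes.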
To be fair, drawing hands is really hard irl; it's always the number one thing new artists struggle with when starting out. I have been painting for 25 years and I still get anxious whenever I have to paint hands doing anything advanced.
when I found out about that, it suddenly made sense why AI art generators struggle with them; they're at that same stage in their life as the newbie artist that is just starting out. Just like the newbie artist, I'm sure eventually that will get sorted out.
Here's a thought: What about making use of a 3D model, like you'd use for a game character, to create training data? Program in some constraints so it can't make any impossible or painful poses, then render a tonne of random poses with a few hundred random orientations each to give the neural network a decent idea of what a hand looks like.
Thought about this too. I imagine it's not done because people want the AI training on "real" things. But using 3D to train it on attributes and amounts (like the number of fingers) would probably be good, with the alternative just being more data. A hundred random orientations is probably not enough though. Could do thousands, since you can just rig and reuse the model and let it feed.
This is exactly what I have been thinking throughout this video. I believe this is possible, as I've learnt that Disney has built a similar "learning program" to animate the hair and water used in movies such as Frozen II.
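A minimal sketch of the synthetic-data idea from this thread (the joint names and limits below are made up for illustration, not real anatomy): sample poses from a constrained rig so that every generated example is a possible, five-fingered hand. A real pipeline would then render each pose in Blender from many camera angles to produce the training images.

```python
import random

# Hypothetical flexion limits per joint, in degrees (illustrative only).
JOINT_LIMITS = {
    "mcp": (0, 90),    # knuckle
    "pip": (0, 100),   # middle joint
    "dip": (0, 80),    # fingertip joint
}
FINGERS = ["thumb", "index", "middle", "ring", "pinky"]

def random_pose(rng):
    # Sample each joint angle within its limit, so impossible or painful
    # poses can never appear in the dataset -- correct by construction.
    return {
        finger: {joint: rng.uniform(lo, hi)
                 for joint, (lo, hi) in JOINT_LIMITS.items()}
        for finger in FINGERS
    }

rng = random.Random(0)
dataset = [random_pose(rng) for _ in range(10_000)]

print(len(dataset), "poses; example:", dataset[0]["index"])
```

The point of the sketch: unlike scraped photos, every sample here is guaranteed to have exactly five fingers and valid joint angles, which is the property the commenters above want the network to absorb.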
This analogy of the museum reminds me of "Mary's Room", a philosophy thought experiment. While it was not originally made for the AI question, it asks whether there is extra knowledge generated through conscious experience versus just the physical descriptions of something. Which in this case would be comparable to whether there is more understanding gained through the human experience of knowing how hands work versus computing every pixel of a hand image.
Lately I went to a Contemporary Art Museum and saw a movie created entirely with AI. Everything looked perfect except for the hands (and some faces). The overall effect was so creepy that I had to get out of the screening room. I was literally scared.
This is a reminder for artists: draw from real life as much as you can, not just photos. Our understanding of volumes and structure over simple outlines and textures is what will set us apart from AIs
Why are artists trying to be better than AI? Just try to make your style as unique and interesting as possible; that's what will set you apart. We chess players gave up long ago on ever getting better than an AI, because they're a combination of all human efforts.
Until people figure out how to make them analyze 3D models to implicitly understand the rules. I agree with the others: using AI to improve is smarter than trying to surpass it.
@@thecousindeci1103 not to mention "better" is subjective. and technical skill is not the only thing that makes good art, just what our ableist colonial capitalist society values most. btw I'm a hyper surrealist... lol.
Lol what I do now is I doodle and everything, but then I just use Stable Diffusion to automate stuff. It's still painting. But you baby the AI to the point you're basically doing the work yourself. It just so happens to do the heavy lifting for you.
I find it interesting that one of the telltale signs of being in a dream is your hands being shaped abnormally. Not even our brains can make hands accurately, consciously or not.
I found from playing with imagine AI that while it also struggles with feet (including animal feet) it has an easier time with human feet than hands because there's so many more photos of feet on the internet... for reasons. 👀
It's not an unsolvable problem; they just have to change the training method. It can't just be a black box trained on more pictures of hands: the programmers could provide a 3D mesh of hands and limbs, and include the mapping of the mesh to people in a few hundred models until this specific AI learns how to map hands and limbs to the mesh correctly. Then apply this AI to modify existing images to produce nice-looking hands, a bit like how phone cameras apply a moon filter to make 100x-zoom moon shots look detailed.
@@samthesomniator No, not really. In a neural network (the AI thing), every neuron performs a certain operation on the inputs it receives. This operation is fixed for a given neuron and doesn't change, neither while training, nor afterwards. All that can be adjusted are the weights, which determine to what extent the respective inputs factor into the calculation. That's really different from how the neurons in our brains work. There, simply put, every neuron comes with a certain threshold. It then absorbs (electric) signals and, somehow, keeps track of the accumulated amount. Once this amount surpasses the threshold, the neuron fires, which means it forwards a signal of a certain strength to every neuron it is connected to. Again, adjustable weights factor in and determine how well the connections transmit the signals and so on. But the fundamental logic of the system is fairly different (and not yet particularly well understood). Also, I reject the notion that we are our brains controlling the body. We're far from our whole brains; we're but an emergent process taking place in it, a subroutine if you will.
This is why there are physical wooden articulated artist models of pairs of hands that you can get from high-end art supply stores to help people draw hands. Another thing to note: AI art usually struggles to make faces that aren't perfectly symmetrical, and is bad at making faces in motion that have to show muscle movement (such as speaking specific letters/words, eating, or licking the lips, etc.).
At this point Phil's main channel videos are the treat and his Vox videos are like a second channel bonus. xD Give him more creative control, his main channel is honestly more everything to me than Vox ever managed. Absolute gem, that guy.
Personally, I kind of hope that A.I. never truly gets hands right. I mean, yeah, it would be kind of cool, but I think it's really important to be able to tell what's real/made by a human artist versus what was generated by an A.I.
I read a book about lucid dreaming. One tip to train yourself to check whether you are dreaming is, when awake, to create the habit of consciously looking at your hands a couple of times throughout the day. Thus, when dreaming, you will find yourself remembering to look at your hands. This actually works!! And the thing is, your dream generator has difficulties with hands too: the number of fingers keeps changing, and the hand's shape keeps changing. That's when you can realize you are dreaming and get on to having fun dreaming lucid.
Midjourney v5 largely seems to have hands handled. And realistically, they could feed it a lot of 3d model based images, and maybe just do style transfer to make them more real looking, and you'd have a solid synthetic dataset.
At one point AI will start training with 3D video cameras filmed just for it. Like, they will film a 360 video of a person and will tell it that’s how a person looks from all sides.
aaaaaww it was so nice to see Stan! :) amazing video, as always! loved the editing 😄😄 i really liked the idea that the standard for hands being accurate is much higher than for other stuff it's a common thing among artists to see these discrepancies in AI art pretty quickly - like lines that go nowhere, furniture that makes no sense, seriously messed up anatomy. but since the overall look, the light, the colors are good, the usual viewer doesn't see that
That was very well explained without getting too technical. Another reason is we understand hands from the inside. Our pattern recognition has been trained by moving our fingers and looking at the result. You can see babies do this sometimes. Perhaps the chap whose first love is robotics should create a humanlike robot hand and then have a feedback system where the AI can self train while adjusting the hand.
@@dibbidydoo4318 That seems to just push the problem one step down the road. Unless there is a test against reality, how do we "reward" a correct hand posture guess during training? Grading all the guesses with human input seems tedious, and there are all sorts of confounding factors like mittens, jewellery, knuckledusters, hands holding each other, shadows, people who actually have joined fingers or an extra finger, etc.
I've wondered for a while whether you could use an AI to generate a rig or skeleton of the most prominent few people in an image, plus some basic shapes they could be interacting with, and then use that in the training process and, later, generate it first as part of generating an image, to get a better-quality human form. The first AI has to fit all the fingers somewhere, and they would be stored in a non-image format, so they could be seen behind occluding forms by the second AI. That way it always gets the same number of fingers, and doesn't need to know how many fingers or arms there should be, because it's constrained by the rig. It would also open up the possibility to input the rig manually, or have it only change a little between frames in an animation, etc.
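The rig-first idea above can be sketched as a toy two-stage pipeline (all names below are hypothetical; real systems such as ControlNet condition image models on pose skeletons in a related way). Because the finger count lives in a structured rig rather than in pixels, the rendering stage cannot get it wrong.

```python
from dataclasses import dataclass, field

# Stage 1 output: a structured, non-image rig. The finger count is a
# property of the data structure, not something the image model guesses.
@dataclass
class Hand:
    fingers: tuple = ("thumb", "index", "middle", "ring", "pinky")

@dataclass
class Rig:
    hands: list = field(default_factory=lambda: [Hand(), Hand()])

def render_conditioning(rig):
    """Flatten the rig into the keypoint list a second, image-generating
    stage would be conditioned on (as ControlNet does with skeletons)."""
    return [f"{i}:{name}" for i, hand in enumerate(rig.hands)
            for name in hand.fingers]

keypoints = render_conditioning(Rig())
print(len(keypoints))  # always 10 finger keypoints for two hands
```

The rig can also be edited manually or interpolated between animation frames, which is exactly the extra flexibility the comment points out.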
This is a really good visual representation of why you need to be careful when using or relying on generative models. For example, they might know how to provide a code example that works, but they don't know the rules we would expect it to adhere to, e.g. security principles. Especially because they're trained on a lot of data which also doesn't follow good-practice rules.
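A concrete instance of "works but breaks the rules" (a hypothetical example, not taken from any model's output): string-formatted SQL runs fine on benign input, and generated code often looks exactly like this, but only the parameterized version resists injection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

name = "alice' OR '1'='1"   # attacker-controlled input

# Looks fine and "works" during testing -- but is injectable:
unsafe = conn.execute(
    "SELECT * FROM users WHERE name = '%s'" % name).fetchall()

# Parameterized query: the input is treated as data, not as SQL:
safe = conn.execute(
    "SELECT * FROM users WHERE name = ?", (name,)).fetchall()

print(unsafe)  # returns the row despite the bogus name -- injection
print(safe)    # returns [] -- no user is literally named that
```

Both statements execute without error, which is precisely why "it runs" is a bad acceptance test for generated code.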
This struggle is nonexistent for pros; only amateurs or hobbyists have this problem. That's why the games you play and the movies you watch all have perfect hands, perfect environments, perfect everything. That's what "production grade" means. AI is good for producing one single image, but one single image is not a movie or a game; it's not a "product". One single image with a mutated hand is worth nothing; you can't sell it. Just create an AI-generated video and compare it side by side with a real movie. It can't even hold a candle to it.
When I tried using AI image generators for the first time, the first thing I noticed was that they generated some weird animals, particularly stylised/non-photorealistic animals. A turtle is a carapace plus a plastron plus some things that stick out from between them, like a number of legs and heads and tails and whatnot. 🐢
Out of curiosity, I just went and did 3 prompts in Midjourney (03.2024) and the results were interesting. The first was a woman holding an apple. Surprisingly, it wasn't too bad. Only one picture had completely messed up fingers; the others were okay (if something was off, it didn't jump right out at you). The second one was a person holding an open umbrella. The AI "cheated" its way out of the hands situation by turning the person around (so no hands visible, only the back) in 3 pictures out of 4. In the 4th one the hands were not too bad (both of them "holding"), but there was no umbrella handle, so they were holding thin air. And lastly, "a woman holding an umbrella in her hands": the pictures turned out very well, but once more the AI "cheated" and in 3 pictures out of 4 cut out the hands, so you can see a face and an umbrella but only a little bit of fingers, as they are out of the shot. Only one contained a full palm holding a handle, and it looked very realistic..., even though it had 6 fingers :D
Human artists, even really good human artists, struggle with hands. Go to any art gallery and see how well-known artists often "cheat" by hiding hands when painting people. Seems like the AI is learning this same technique.
I remember in the original Westworld movie you could always tell the androids from the hands. At the time, I thought that was a bit weird and forced, but I'm starting to come around to their way of thinking!
A lot of actual artists struggle with hands also, because they are so flexible and able to move in different weird ways. Ways that we would normally hold things, when pointed out, don't look natural at all. Hands are complicated and difficult to reproduce.
It also struggles with arms and legs. I've made prompts like "two men boxing", and either their arms will be connected to the other person, or there will be an extra arm, a missing arm, or three legs on one person.
the thing with transformer models is that they take the same time to produce "simple" outputs and more complex outputs. it's a direct mapping from an input to an output (with a random seed), so it doesn't try again if it messes up catastrophically. if you draw hands, you're probably going to iterate on the drawing, perfecting the hands over time. Dall-E will give you its first shot at it
Try feeding the output back in and see if you can get it to refine things. The popular Automatic1111 UI for Stable Diffusion has a "Send to img2img" button where you can easily put the current output (both image and prompt) into the input for the img2img mode.
Yeah, that's the problem with how AI systems are made, lowkey. They don't really have the capability to reflect back. Maybe it would be cool if people added an extra step for "fixing the image" while still remembering its first attempt.
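The feedback loop suggested in this thread can be sketched abstractly (a toy number-line stand-in for img2img, not a real pipeline): each pass feeds the previous output back in with a smaller "strength", and the error to the target shrinks overall across passes instead of being fixed by a single one-shot generation.

```python
import random

def generate(current, target, strength, rng):
    # Move toward the target, but re-introduce noise proportional to
    # strength -- a crude stand-in for how much img2img repaints.
    noise = rng.uniform(-1, 1) * strength
    return current + strength * (target - current) + noise

rng = random.Random(1)
target, image = 10.0, 0.0
errors = []
# Decreasing strength schedule, like lowering denoise strength each pass.
for strength in [0.9, 0.6, 0.4, 0.25, 0.15]:
    image = generate(image, target, strength, rng)
    errors.append(abs(target - image))

print([round(e, 2) for e in errors])  # error shrinks overall across passes
```

The single-pass result (the first error) is much worse than the final one, which is the iteration-vs-first-shot contrast drawn a couple of comments up.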
Over the past year this has changed, as you may have seen: AI can draw hands. I don't know which prompts are necessary, but the overall level (the models) has grown.
That's why you use img2img! 😉 We are also acting like all artists are good at hands which as an artist myself I know I've struggled with for a looooooong time and they are honestly never easy.
Every time the models get slightly less error prone a bunch of people declare the hand problem "fixed" as if it's a puzzle with a definite solution that we've finally cracked. Yet I continue to see images with weird looking hands fairly often. As the models improve it'll mess things up in general less and less, but there will still be things it's better and worse at. It's not just hands, it's text, musical instruments, muscles, animal anatomy (especially bugs), machinery, board games. Basically, things that are complex, not always the same, and where the details matter.
So the main message is: AI art does not reflect 3D reality because it is not trained on 3D reality. You can also see this when you look closely at the image at 8:36. If light fell through trees like that, the sun would be very, very close.
Midjourney's V5 is pretty good at doing hands. You can pretty easily get a very realistic image of a hand. I tested this out and it worked almost every time, but occasionally it would give me a hand with 4 fingers. These AI image generation technologies are adapting super fast, so these kinds of videos get outdated faster than ever.
Now I’m extremely curious how good AI is at drawing feet. There have to be substantially fewer images of feet for it to learn from. On the other hand (pun intended), feet aren’t as intricate, but they have their own complexities. Has anyone else thought of the foot question?
I suppose AI could mess feet up less than hands, since feet aren't as flexible. Also, it seems AI messes up the fingers more than the palm. And as toes are a lot smaller (lengthwise) and less bendable, AI supposedly should struggle less with them.
It still messes up feet a lot, especially shoes, where it might draw the overall design backwards at some angles. It's not horrible, but it's just as bad of a problem.
The point about crowdfunding quality control from people is interesting because, if I'm not mistaken, that's more or less the byproduct of some Captcha programs. Yes, they were "testing to see if you're not a robot," but in the background it was helping computers get better at identifying objects and text by having humans type in what an image the system didn't yet know was portraying (along with ones it already knew, so it could make sure you were being accurate).
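The control-image trick described above can be sketched in a few lines. This is purely illustrative: the function name and data shapes are made up, not how any real CAPTCHA service works.

```python
def accept_labels(user_labels, known_answers):
    """Keep a user's labels for unknown images only if their answers on
    'control' images (whose labels we already know) all match."""
    for image, expected in known_answers.items():
        if user_labels.get(image) != expected:
            return {}  # failed the control check: discard everything
    # keep only the labels for images we didn't already know
    return {img: lbl for img, lbl in user_labels.items()
            if img not in known_answers}

# a user who gets the known image right contributes a new label
new_labels = accept_labels({"img1": "cat", "img2": "stop sign"},
                           {"img1": "cat"})  # → {"img2": "stop sign"}
```

Real systems aggregate over many users rather than trusting one, but the known-answer check is the core idea.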
It's like us; humans struggle to draw hands as well. But it's just a matter of time before it masters this. Actually, Midjourney V5 is already doing some pretty good hands.
Considering how much human artists struggle with hands, I’m not surprised the AI can’t do it
I like how outdated this video is already.
As of right now, Midjourney draws perfect hands 9/10.
AI just needs more data to learn. Give it months and AI will learn faster than any human.
It shouldn’t be hard to just study an anatomy book
@@futon2345 but that’s the point of the video, the anatomy allows for so much variety that’s hard to translate into a 2d image
@@shivanibatra7659 idk it’s not hard for me and my classmates with practice but then again I’m human
My grandfather is a semi-famous artist and he gives the family the art he messed up. It's usually the hands he messed up.
Who's he?
Your grandpa's got cool art!
That's so cool!
It's a different kind of mess-up, because this AI isn't anything like humans. Humans still easily get the number of fingers right; even a child does. These AI (ML) tools are just forming shapes from tonnes of other people's art turned into a weighted matrix. That's not what humans do: humans learn to conceptualize the kind of thing they want to draw and go through many steps to construct it, using a deep understanding of what the thing is.
So the ai hands are bad because you can't draw hands?
as someone who went to art school, and was required to take a course on drawing hands, I can confirm: drawing hands is hard.
not if you draw it like this all the time: 🖐🤪
It's hard but you won't make the same kind of mistakes
as someone who draws for fun, I can confirm: it is hard. that's why I usually choose positions where they aren't visible lol
Yeah but at least you understand what hands are. AI doesn’t lol
Once you get good at them, it's actually probably the most fun thing to draw. Besides the human ear, my no.1 fav. ^^
In the lucid dreaming community - one of the most reliable "reality checks" is inspecting your hand and confirming if you have 5 fingers. For whatever reason, the brain has a difficult time generating a five fingered hand while dreaming. It's kind of a creepy coincidence that AI has the same issue.
Same reason people generally have trouble drawing hands from memory or imagination. I would bet that artists, particularly animators or illustrators, who have to express action and emotion in hands and limbs, don't lack an ability to dream "correct" hands.
Interestingly, animals don't know how many legs are normal, that our hands are attached to our body, that humans aren't supposed to have a face on the back of their head, etc.
@@DiscoFang untrue, no matter how good you are as an artist, hands will always be a nightmare
@@GabeHowardd Very funny. (:
@@GabeHowardd As an artist I can confirm this
It's weird how humans can instantly determine when something looks wrong, but the same humans cannot necessarily correct it or make it right from scratch. As a beginning artist there's a weird rift between your mind's eye and your skill.
Agreed.
I'm missing where's the weird part in all this... We observe and sense far more frequently than we create, and both are different skillsets that need practicing to master. When something goes against those ingrained patterns that you've built over thousands of hours of observing, you'll sense it because the result doesn't meet your expectations. You'll know when a circle isn't a circle, but only if you've mastered making one enough will you be able to make a perfect one. If you've never done any martial arts an incoming kick might startle you, but when you're an experienced practitioner you may instantly sense that the kick was never going to hit you in the first place based on its motion and all kicks you've seen before, and you'll not even flinch. Different situation, same principle, and it works in all learning.
It's called millennia of evolution. If you couldn't quickly tell if something was close or far away, or a predator or your mom, you wouldn't have survived.
Everyone's a critic
This reminds me of P vs NP: "It's hard to solve a problem, but it's easy to verify the solution"
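A toy illustration of that asymmetry, using subset-sum (an NP-complete problem): checking a claimed answer is one cheap pass, while finding one may mean trying every subset. This only gestures at the P vs NP intuition; it proves nothing.

```python
from itertools import combinations

def verify(nums, subset, target):
    # easy: verifying a proposed solution is a couple of cheap checks
    return all(x in nums for x in subset) and sum(subset) == target

def solve(nums, target):
    # hard (worst case): brute force over all 2^n subsets
    for r in range(len(nums) + 1):
        for combo in combinations(nums, r):
            if sum(combo) == target:
                return list(combo)
    return None

answer = solve([3, 7, 1, 8, -2], 9)        # slow exhaustive search
ok = verify([3, 7, 1, 8, -2], answer, 9)   # instant check
```

Spotting a six-fingered hand is the `verify` step; drawing a correct one from scratch is the `solve` step.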
I love how unhurried this video is at explaining things, and the really calm music. We need more of these kinds of videos. Thanks, Vox!
I don't mean to be rude, but maybe you've been looking for calming videos in the wrong places lol, because plenty exist. Look up Gregorian chants, for instance.
You know it’s hard to draw hands, when even AI struggles with it.
That’s true. But what’s mad is that nobody would’ve said that 2 years ago. That’s how far it’s come
@@dundermifflinity Exactly. It's funny how he says "even AI struggles" as if it is a benchmark to reach. Goes to show how far it has progressed recently.
Feels like all of this happened so quickly, almost overnight. Unreal
@@anmolagrawal5358 >almost overnight
it's kinda interesting to think about. machine learning models have almost certainly been under development for the past couple of years, but it's only been in the last few months that they've been publicly released. and the interest and demand they've generated just encourages the developers to work faster and harder to create higher quality models
Another proof that we are living in a computer simulation
Just look at your hands within a dream
@@hiddendrifts iirc from our university lectures, machine learning has been around since before the 2000s. It's just that nowadays we have more data and more processing power to train models with.
Thanks for the talk, Phil! We live in some interesting times for art.
Now, back to practicing drawing hands! 😅
Woah hey proko
I guess you're precious knowledge that AI wants to take 😅
I remember watching Proko's lessons about anatomy years ago. Drawing hands from imagination was one of the most difficult parts for me. I think I am just an AI lol.
My man Proko featuring in a Vox video! Nice!
I was so surprised when you showed up!! One of the greatest teachers and artists I know!!
Your videos are a true testament to your passion for creation.
If you know a thing or two about sewing, you notice pretty fast that AI is also terrible at clothing. Buttons merging into zippers, fabrics changing textures and weights, folds appearing and disappearing without seams: those are all things you commonly see in AI art, but people don't notice as much because your average AI artist isn't a seamstress.
Not a seamstress (or seamster?) but I do notice! Drapery was one of my favorite things to draw in school, but AIs so often get all the details wrong. They don't have the structural knowledge needed to make everything convincing enough.
This is a really interesting point!
This is also the case with AI architecture. The details are super janky and nonsensical when you look closely.
Interesting! Makes you wonder what else AI gets wrong that only people with certain skills or knowledge would notice. I guess AI isn't the crafty know-it-all artist we thought it was. At least not yet.
Not sure what is interesting about this take. AI is making all of these images with little context. It doesn't know what a material is or how it should behave. It just "knows" what something looks like on average, basically. If you are shocked or surprised by that you fundamentally misunderstand how it works and are expecting it to generate images based on parameters it just doesn't account for.
Hands are tough for humans too. Ask any artist what they have struggled with the most, and the answer will be hands, followed closely by feet.
After years of practice, I still find feet harder than hands 😅
Sigh, it's a different kind of struggle. Please don't act like this is evidence the AI is similar to humans. Humans struggle with the perspective of hands; they don't struggle with the number of fingers. Even a child gets that right. This is evidence of how this "AI" is purely algorithmic, merging data from millions of images. Humans do not create art this way; we create from a deep understanding and many other factors that cannot be quantified by a layered neural net.
Yeah, hands are so hard to draw that I end up drawing 7 or 4 fingers.
Humans struggle so hard they totally draw 4 or 6 fingers... not. We struggle with different parts of hands compared to AI.
Definitely! I understand why hands are difficult but it's sort of mind boggling how hard feet are. You wouldn't think it but it's very frustrating to try to draw feet well
At the moment, I've noticed that at least some neural networks draw faces as a separate module, on top of the rest of the picture. The same should be done with hands. There should also be a setting to "hide the hands" so that they simply end up behind the back, in pockets, etc.
That last setting is simply including it in the prompt. It still won't always work, and even simple ideas like "facing the camera" or "facing forward" sometimes get you a back shot.
i'm impressed by vox's editing team every single time. the pixelated theme throughout this whole video is so good
Best of all, it's subtle. You can watch the video without noticing it. So many editors early in their career go over the top with their editing, which... some people like (usually younger people) but is really bad by the standards of editing.
@@moomoocowsly they mentioned v5 in the video
@@moomoocowsly they literally mentioned Midjourney v5. You spent so much time writing this comment you didn’t watch the video.
@@moomoocowsly please see the whole video 😂😂😂 you should have some patience
@Tinil0 oh wow! You create the rigid, never-changing standards of editing!? I'm star-struck, it's so nice to meet you, I have so many questions about editing seeing as we've got an expert here 😁
As an artist, I will confirm, hands have an EXTREMELY low margin for error. There are many different body types, face shapes, limb proportions. Consequently, there's wiggle room. Not so with hands. People will still compliment most artwork that slightly misses the mark, but they will go silent if you mess up hands.
The biggest mistake even decent artists make is messing up the direction the hand and fingers are facing. Like, someone draws a left hand palm-out, but the thumb starts on the right side. Technically you drew the hand right, but the orientation is wrong, and it's obvious when that hand is a part of a human/character drawing. I've been drawing for years and years and I still struggle with hands and foreshortening.
I think the way to solve the "AI knows how things look, but not how they work" problem is to train the AI not only on images, but also on rigged models, like Blender models before you hit "Render." I personally find out how things work and what proportions they generally have by spending a few minutes fiddling with the object and studying it from different angles before trying to draw.
Edit: Sorry I'm late.
I think the problem is that those are entirely separate logic bases. It would be like training a single AI to perform both image recognition and audio recognition. While that might be possible, the complexity of the neural network involved would be exponentially greater than simply creating a separate AI for each task.
The image-generating AIs do not have any concept of the physical structure of the environments they create. All it does is generate pixel patterns. If you tell it to draw a tree, it draws pixels. It has no idea what a tree is. But it has reference material labeled as "tree" and the pixels it draws are consistent with that reference. So both what it's trained on and what it produces are just colored pixels with labels. To try to expand that training to include 3D models and an understanding of structure and space and form would be an unimaginably daunting task, I think.
@@ecMathGeek But couldn't you use 3D software to generate billions of 2D images as a base for the AI to learn from? …instead of waiting for real hand photos to appear.
@@karoljankaminski5793 You could give it a huge array of hands to train on, but I don't think that would fix the main issue: the AI doesn't know what the structural rules are for hands. It only knows grouping rules for the pixels that are labeled as hands.
It looks for general patterns, not specific rules. So it can recognize that finger-like protrusions exists on most hand images, but it doesn't know how to count them, or what angles make sense, or whether showing the back or front of a finger makes sense, etc.
And since the AI isn't just copying and pasting hands it's seen on other images, it has to rely on the rulesets it has created to draw them from scratch.
Perhaps the real problem is that the rulesets for drawing hands are too general and contradictory? Some hands in images show all fingers, others might only show one or two. Some show them positioned at odd angles. Some will be holding something and others won't. And each variation implies different rules for how the hands should be drawn. If the AI is generalizing all of those contradictory rules into one set, creating abominations is almost inevitable.
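That "generalized, contradictory rules" idea can be caricatured in a few lines. All numbers below are invented for illustration: a model that only learns the marginal chance that each "finger slot" is visible, then samples the slots independently, will emit finger counts that never occurred in its training data.

```python
import random

random.seed(0)

# toy "training set": visible-finger counts in hand photos (never > 5)
training_counts = [5, 5, 4, 2, 5, 3, 5, 1, 5, 5]
# the model learns only a local statistic: how often a slot is filled
p_visible = sum(training_counts) / (len(training_counts) * 5)

def generate_hand(slots=7):
    # each slot is sampled independently: no global "exactly 5" rule
    return sum(random.random() < p_visible for _ in range(slots))

samples = [generate_hand() for _ in range(1000)]
# some samples will show 6 or 7 fingers, which no training image had
```

The bug isn't in any single local rule; it's that nothing enforces the global constraint.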
“The AI knows how things look but not how they work”
I’ve gotten into so many frustrating conversations trying to correct friends and colleagues talking about chat gpt as if it had some internal logic and self-referencing reflective capabilities
It must have internal logic, however; that is the whole point of training a network. Like, if you train a network to add two numbers, you'd expect an analogue of a summing circuit to form in its weights. Otherwise it would simply be memorizing.
@@biocode4478 Thank you. You're completely right. Sure, GPT-3 is just a trained neural network, not a calculator, but through training on human data that includes a lot of logic, the neural net actually "learns" that. Now, it will contain the same mistakes that a human might make, but there is absolutely internal logic. Thanks for bringing this up.
An example of Dunning-Kruger.
LLMs do have internal logic and with chain of thought reasoning even reflective capabilities
I don't like reading AI responses to customer service questions. The answers seem very hollow and rigid as if someone was reading an instruction manual.
At 7:57 he said something that resonated with me: "AI art is basically bad at art, we're just able to see it with hands". A lot of the time, when you look closely at an AI-generated image, you start to notice all kinds of strange things, like shapes that don't make sense, roads leading nowhere, details that are simply wrong. Will this change, and what will it take? Right now it seems you either have to accept a lot of errors or "peculiarities" in AI-generated images, or do a lot of manual work to get it right.
He also followed that by saying "But both of these things are also a bit wrong". AI will indeed get better, and it is getting better. It's only a question of time until AI can do all of the things the video said it couldn't do at the moment.
@@deadeaded Are you sure? Because above, LuisPereira stated that "This is just nonsense that people who don't understand AI like telling themselves to feel better.
Much like humans, neural networks also conceptualize everything they draw, i.e. they also break down large complex shapes into patterns of smaller shapes and learn the patterns between them."
I don't know enough of Ai to know who to believe.
@@deadeaded Well, it should be possible by using the same technique we humans use: exploration of the real 3D world
@@deadeaded Well, without knowing exactly what that built-in structure does, it's hard to agree or disagree. I think that shouldn't be a problem, but i can be wrong. We'll see in the future.
It'll get better and better over time.
Nolan went back in time to be his younger self and explain complex stuff like these to us. Thanks Vox for bringing him aboard
I’ve tried to use it to make fun renaissance style paintings of modern scenes and I can say it also seriously struggles with feet (toes specifically) and keeping track of limbs in crowds. You wind up with disembodied arms and legs.
Yeah, it is all a moving target. It used to be getting a single person even just standing there might be mangled. Now a single person is likely to be very good, even the hands if they aren't doing something super complicated. Feet are also better than they used to be, but obviously haven't had as much attention as hands. And the more people and more details in general, the more likely something goes wrong. It'll keep getting better but the standards will keep getting higher too.
SUMMARY
AI art models struggle with drawing hands due to data size, data quality, and low margin for error.
AI models have limited exposure to hand images and lack annotated datasets to learn how hands work, unlike the abundance of face images available.
Hands are more complex and diverse in appearance and function than other body parts, making it difficult for AI models to learn and replicate them accurately.
AI models can create visually appealing art in many other aspects, but "hand-like" isn't sufficient, as humans expect more accuracy when it comes to hands.
Improvements in AI art generation could come from increasing computing power, using more human feedback, and encouraging people to rank the quality of generated images.
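The last point above ("rank the quality of generated images") could feed a simple pairwise rating scheme. A hedged sketch using an Elo-style update (one standard way to turn pairwise human preferences into scores, not necessarily what any of these systems actually use):

```python
def elo_update(winner, loser, k=32):
    """One pairwise comparison: the image a human preferred gains
    rating, the other loses the same amount."""
    expected = 1 / (1 + 10 ** ((loser - winner) / 400))
    delta = k * (1 - expected)
    return winner + delta, loser - delta

# two images start equal; a human prefers the first
a, b = elo_update(1000.0, 1000.0)  # a rises to 1016.0, b drops to 984.0
```

Repeated over many comparisons, the scores give a training signal far cheaper to collect than detailed annotations.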
3:44 "It can make a beautiful skyscraper" - literally a box with many boxes inside, in a clear geometrical pattern.
But it shiny!
You could say basically the same thing about abs though? And yet... 9:35
And if you look carefully you'll see the skyscraper is also a bit creepy: its lines aren't straight enough, they're ragged and weird, and the whole texture isn't consistent :D
can't you say the same thing about the shape of hands?
@@AustrianEconomist no you can't
hi, 11 months later, they can do hands now
Sure, but the number is still wrong: 4 fingers where 5 should be visible, or 6 or even 7.
I really like how this simplifies the concept so people who are neither software engineers nor illustrators can start to understand how complicated all of this stuff actually is, even though the years of learning and practice that go into it are kind of invisible.
You're corny, kid
The fact that AI struggles with hands means that it really became more like humans
Welp the time has come when humans lose their jobs
@Zaydan Alfariz The thing is that this would need a fundamentally new approach. The current way does exactly NOT work with anything 3D, it just learns patterns from 2D imagery. There is no feasible way to integrate 3d components into its 2d workflow in this way.
As someone who loves (and has always loved) drawing, I agree 100% lol hands are very hard to master
they already can draw hands, so yeh
@@nameless9084, yeah, no. It'll take so much more to do that.
Unless you can somehow teach it the thousands of rules, techniques, and other such things that artists can learn much more easily, along with making sure it actually understands how an idea is supposed to work in reality, artistry will never come to this technology.
You accidentally made the like button highlight at 7:50 when saying "button-like".
dang dude since when can the like button highlight like that?
@@Sir.light1Idk, you can do that to the subscribe button too. The buttons highlight when the person in the video says to like and/or subscribe, but they have to choose if the highlight effect is enabled, from what I know.
Woaah that's actually wild, I had no idea it did that
AI struggles with abs because we all struggle with abs.
not me )
Tell me about it. I like pasta too much. 🤪
I used to draw and paint semi-professionally, and the hardest things for me were always hands and feet. Especially how intricate they can get and the multiple poses they can achieve. Our body is art itself, because some of the things and poses we do are very weird and complicated, yet don't even look unnatural.
No matter how bad you were at drawing hands, you surely never were as bad as AI.
I didn't have the same problem drawing simple hands and feet. It takes time, but at least you have to really watch your own extremities to understand their shapes and why they are the way they are. It's fascinating to look and contemplate, and it has led to me liking beautiful hands and feet hahaha.
If you break it down to the anatomy of hands and feet, it's clear why it's so difficult. There are a lot of tiny bones and muscles clustered in a very intricate way. The rest of the human body has a lot of long, straight or curved bones. That's easier to understand and picture in your mind.
Did you also struggle with number of fingers🤣
@@Nat-oj2uc a TikTok kid here, everyone!
03:43 You do realise you just picked the worst example possible next to imagining hands? I want you to take a very close look at this symmetrical perfection of a skyscraper...
I learned about this recently and you reminded me of it, the "Chinese Room" argument: the AI knows where the hands are and, generally, WHAT hands are, but it doesn't UNDERSTAND what they do or how they work. It can draw everything it knows about "hands," but that doesn't include complex things like range of movement.
As a photographer I can tell you that getting my models to know what to do with their hands is one of the more difficult aspects of my craft.
I heard someone say that a good way to find good poses is to pretend like a part of your body hurts. For example you have a headache so you rest the back of your hand on your forehead. I'm not a model but the poses they were striking looked great and it was pretty funny watching them explain it at the same time.
A year later, and it's already gotten a lot better than this video shows.
There's a problem with any repeating features (teeth, etc.) because the AI works like autocomplete: after a finger there tends to be another finger, then after that finger comes another finger, then ooh, a finger! Usually there's a finger next to that!
You also see it in text, it puts shapes next to each other that tend to be neighbours and the results are hilarious. (Look up the AI Waffle House and In-N-Out signage)
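The "after a finger tends to be another finger" behavior can be sketched as a toy Markov chain. The states and transition probabilities here are invented for illustration; the point is that a purely local rule never pins the total at five.

```python
import random

random.seed(1)

# purely local rule: what tends to come after the current part
transitions = {
    "palm":   [("finger", 0.9), ("edge", 0.1)],
    "finger": [("finger", 0.75), ("edge", 0.25)],  # fingers beget fingers
}

def draw_hand():
    state, fingers = "palm", 0
    while state != "edge":
        r, cum = random.random(), 0.0
        for nxt, p in transitions[state]:
            cum += p
            if r < cum:
                state = nxt
                break
        fingers += state == "finger"
    return fingers

counts = [draw_hand() for _ in range(2000)]
# counts drift all over: 0, 3, 7... nothing enforces exactly 5
```

Real diffusion models aren't literal Markov chains over body parts, but the failure mode of local-pattern generation is the same.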
My question is why the AI can't grasp the concept that people have five fingers, and that if your picture has more than that, it means you messed up. Teeth make more sense, because you could have basically any number visible, but fingers are pretty much always five, maybe a couple less in very rare instances, but definitely never more.
@@monhi64 I would say that the majority of photos of hands do not show all five fingers. Look at your own hands holding objects at different angles to you. How many fingers do you see? Also, photos of people with two hands in the frame will have up to 10 fingers. AI does not care that they're on two different hands. It doesn't solve for problems like that. AI is trained basically like: these pixels next to each other are called this. Here is another set of pixels next to each other called the same thing. Over and over... Then can you (AI) tell what the similarities are? Basically, I'm trying to say that AI doesn't "grasp concepts" as you wrote in your comment. It doesn't have concepts. It's basically just very advanced pattern identification, without any additional "thought" behind it.
@@monhi64It is because AI cannot grasp any concept. It does not have the ability to perform logic or understand concepts.
lol I looked them up.
noun and nonut really got me XD
To be fair drawing hands is really hard irl, it's always the number one thing new artists struggle with when starting out.
I have been painting for 25 years and I still get anxious whenever I have to paint hands doing anything advanced.
When I found out about that, it suddenly made sense why AI art generators struggle with them; they're at the same stage as a newbie artist just starting out. Just like the newbie artist, I'm sure they'll eventually get it sorted out.
@@santosic The newest Midjourney models have already figured it out.
At least we know they only have five fingers, a top and bottom part, and nails go on the top and tips.
No new human artist would seriously draw 6 fingers. Not having skills to draw isn't the same as not having a clue what you're drawing
8 months later and this is basically fixed. Incredible how fast this all is advancing.
Here's a thought: What about making use of a 3D model, like you'd use for a game character, to create training data? Program in some constraints so it can't make any impossible or painful poses, then render a tonne of random poses with a few hundred random orientations each to give the neural network a decent idea of what a hand looks like.
I think your idea points to the correct solution
Thought about this too. I imagine it's not done because people want the AI training on "real" things. But using 3D to train it on attributes and counts (like the number of fingers) would probably be good, with the alternative just being more data.
A hundred random orientations is probably not enough, though. Could do thousands, since you can just rig and reuse the model and let it feed.
You beat me to it. And you were a lot more detailed in your solution!
This is exactly what I've been thinking throughout this video. I believe this is possible, as I've learned that Disney built a similar "learning program" to animate hair and water, used in movies such as Frozen II.
Same idea here. 2D models like pictures won’t be enough for AI to draw hands!
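A minimal sketch of the constraint idea from this thread: sample joint angles only inside allowed ranges, so the renderer never sees an impossible pose. The joint limits below are made-up placeholder numbers, and the actual rendering step (e.g. in Blender) is left out entirely.

```python
import random

random.seed(2)

# illustrative per-joint limits in degrees (real rigs differ per finger)
JOINT_LIMITS = [(0.0, 90.0), (0.0, 110.0), (0.0, 80.0)]

def random_finger_pose():
    # uniform sampling inside the limits: no impossible bends possible
    return [random.uniform(lo, hi) for lo, hi in JOINT_LIMITS]

def random_hand_pose(fingers=5):
    # five fingers by construction, unlike a purely statistical model
    return [random_finger_pose() for _ in range(fingers)]

poses = [random_hand_pose() for _ in range(1000)]  # training-data candidates
```

Each pose would then be rendered from many camera angles to produce labeled 2D images, which is the part the thread argues the models are missing.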
This analogy of the museum reminds me of "Mary's Room," a philosophy thought experiment. While it wasn't originally made for the AI question, it asks whether there is extra knowledge gained through conscious experience vs. just the physical descriptions of something. Which in this case would be comparable to whether there is more understanding gained through the human experience of knowing how hands work vs. computing every pixel of a hand image.
Vox, amazing video you deserve more subscribers
Great choice for a collaborator. Proko, Stan's channel, helped me get past a few challenges as an artist.
I'm glad I could help!
1:39 The AI 'being trapped in a museum' and only learning from pictures online reminds me of people who don't go outside.
This is, hands down, the best video about the subject.
Lately I went to a Contemporary Art Museum and saw a movie created entirely with AI. Everything looked perfect except for the hands (and some faces). The overall effect was so creepy that I had to get out of the screening room. I was literally scared.
And everybody clapped for you for how brave you were... not
@@Valadion1 I was alone -- and honestly I do not quite catch your sarcasm.
AI gives me uncanny valley too
@@Valadion1 for real! Scared though? If anything I'd be fascinated
@@biohazard737 in a way it's kinda bodyhorror
This is a reminder for artists: draw from real life as much as you can, not just photos. Our understanding of volumes and structure over simple outlines and textures is what will set us apart from AIs
Why are artists trying to be better than AI? Just try to make your style as unique and interesting as possible; that's what will set you apart. We chess players gave up long ago on ever being better than an AI, because they're a combination of all human efforts.
@@thecousindeci1103 Yes, I say use AI to make your art 10x better.
Until people figure out how to make them analyze 3D models to implicitly understand the rules. I agree with the others: using AI to improve is smarter than trying to surpass it.
And yet current-day artists will keep saying "realism is not art" 😏...
@@thecousindeci1103 not to mention "better" is subjective. and technical skill is not the only thing that makes good art, just what our ableist colonial capitalist society values most. btw I'm a hyper surrealist... lol.
Today, 9 months later, AI has gotten so much better at hands.
nope they still have not
I liked doodling but sorta stopped because drawing hands was too difficult. Glad to know AI is struggling as well.
It’s not struggling. Midjourney V5 makes hands extremely accurate. This video is dated.
Once you get the technique down you'll be good at it.
Lol, what I do now is I doodle and everything, but then I just use Stable Diffusion to automate stuff.
It's still painting. But you baby the AI to the point where you're basically doing the work yourself. It just so happens to do the heavy lifting for you.
I find it interesting that one of the telltale signs of being in a dream is your hands being shaped abnormally. Not even our brains can make hands accurately, consciously or not.
I found from playing with imagine AI that while it also struggles with feet (including animal feet) it has an easier time with human feet than hands because there's so many more photos of feet on the internet... for reasons. 👀
so drawing hands is so hard that even an AI can't do it
1:05 I felt that
Phil shoving that drawing into the chair is so deeply real to me about creating things.
It's not an unsolvable problem; they just have to change the training method. It can't just be a black box trained on more pictures of hands. The programmers can provide a 3D mesh of hands and limbs, and include the mapping of the mesh onto people in a few hundred models until this specific AI learns to map hands and limbs to the mesh correctly. Then apply this AI to modify existing images to produce nice-looking hands. This could be a bit like how phone cameras apply a moon filter to make 100x-zoom moon shots look detailed.
Even we as humans struggle to draw some simple hands, so it's understandable
Well, you are a neural network as well 🤷🏻♂️😅
Yeah, hands are so hard to draw that I end up drawing 7 or 4 fingers.
@@samthesomniator you should talk for yourself
@@yashwardhansingh4787 Aren't we all?
@@samthesomniator No, not really. In a neural network (the AI thing), every neuron performs a certain operation on the inputs it receives. This operation is fixed for a given neuron and doesn't change, neither while training, nor afterwards. All that can be adjusted are the weights which determine to what extend the respective inputs factor into the calculation.
That's really different from how the neurons in our brains work. There, simply put, every neuron comes with a certain threshold. It then absorbs (electric) signals and, somehow, keeps track of the accumulated amount. Once this amount surpasses the threshold, the neuron fires, which means it forwards a signal of a certain strength to every neuron it is connected to. Again, adjustable weights factor in and determine how well the connections transmit the signals and whatnot. But the fundamental logic of the system is fairly different (and not yet particularly well understood).
Also, I reject the notion that we are our brains controlling the body. We're far from our whole brains; we're but an emergent process taking place in it, a subroutine if you will.
Another thing to note - AI also has trouble with things like glasses, even though they appear in face databases and are annotated.
the fact that this wasn't even a year ago and is already pretty much outdated
This is why there are separate physical wooden artist articulated models of pairs of hands that you can get from high end art supply stores to help people draw hands.
Another thing to note: AI art usually fails to make faces that aren't perfectly symmetrical, and is bad at making faces in motion that have to show muscle movement (such as speaking specific letters/words, eating, or licking the lips, etc.).
I find it hilarious that even ai struggles with drawing hands. I remember in school the one thing that most people found hard to do was draw hands
At this point Phil's main channel videos are the treat and his Vox videos are like a second channel bonus. xD
Give him more creative control, his main channel is honestly more everything to me than Vox ever managed.
Absolute gem, that guy.
1:26 I didn’t think Proko would find me slacking off watching a Vox video
Personally, I kind of hope that A.I. never truly gets hands right. I mean, yeah, it would be kind of cool, but I think it's really important to be able to tell what's real/made by a human artist versus what was generated by an A.I.
I read a book about lucid dreaming.
One tip to train yourself to check if you are dreaming is to - when awake - create the habit of consciously looking at your hands a couple of times throughout the day.
Thus, when dreaming, you will find yourself remembering to look at your hands.
This actually works!! And the thing is, your dream-generator has difficulties with hands too: the number of fingers keeps changing, and the hand’s shape keeps changing - that’s when you can realize you are dreaming and get to have fun dreaming lucidly.
Midjourney v5 largely seems to have hands handled. And realistically, they could feed it a lot of 3d model based images, and maybe just do style transfer to make them more real looking, and you'd have a solid synthetic dataset.
Nah. Marked improvement, but still needs work.
At some point AI will start training on 3D video filmed just for it. Like, they will film a 360° video of a person and tell it that’s how a person looks from all sides.
aaaaaww it was so nice to see Stan! :)
amazing video, as always! loved the editing 😄😄
i really liked the idea that the standard for hands being accurate is much higher than for other stuff
it's a common thing among artists to see these discrepancies in AI art pretty quickly - like lines that go nowhere, furniture that makes no sense, seriously messed up anatomy. but since the overall look, the light, the colors are good, the usual viewer doesn't see that
Phil's good at what he does! It was a great chat.
5:03 wait that’s my Professor’s name lol
That was very well explained without getting too technical. Another reason is we understand hands from the inside. Our pattern recognition has been trained by moving our fingers and looking at the result. You can see babies do this sometimes.
Perhaps the chap whose first love is robotics should create a humanlike robot hand and then have a feedback system where the AI can self train while adjusting the hand.
not really necessary, we already have a pattern recognition that can detect the pose of hands and it has been applied to an AI art generator.
@@dibbidydoo4318 That seems to just push the problem one step down the road. Unless there is a test against reality, how do we "reward" a correct hand posture guess during training? Grading all the guesses with human input seems tedious, and there are all sorts of confounding factors like wearing mittens, jewellery, knuckledusters, hands holding each other, shadows, people who actually have joined fingers or an extra finger, etc.
@3:18 'fingers don't bend like this' - well, I beg to differ. Just look at my hands! Oh wait, you can't... but believe me, mine can.
I’ve wondered for a while: what if you used an AI to generate a rig or skeleton of the most prominent few people in an image, plus some basic shapes they could be interacting with, and then used that in the training process and, later, generated it first as part of generating an image, to get a better-quality human form? The first AI has to fit all the fingers somewhere, and they would be stored in a non-image format so they could be seen behind occluding forms by the second AI. That way it always gets the same number of fingers, and doesn’t need to know how many fingers or arms there should be, because it’s constrained by the rig. It would also open up the possibility to input the rig manually, or have it only change a little between frames in an animation, etc.
0:47 Idk why, but looking at that hand in my dark room in bed at 12:21 AM got me scared.
This is a really good visual representation of why you need to be careful when using/ relying on generative models.
For example, they might know how to provide a code example that works, but they don't know the rules we would expect it to adhere to, e.g. security principles. Especially because they're trained on a lot of data which also doesn't follow good-practice rules.
Also, they don't really have long-term memory. They take forever to learn, so they ain't all that adaptable either.
That generative image software "AI" can't render human hands is so wonderfully poetic.
It is no coincidence that Leonardo Da Vinci studied anatomy in such depth, in an age when that was particularly difficult!
not when you are rich and have the ruler loving you.
@@xBINARYGODx it still was basically heretical to do autopsies
its comforting to know that even ai struggles to draw hands as much as I do
This struggle is nonexistent for pros; only amateurs or hobbyists have this problem.
That's why the games you play and the movies you watch all have perfect hands, perfect environments, perfect everything.
That's what "production grade" means.
AI is good for producing one single image, but one single image is not a movie or a game; it's not a "product".
One single image with a mutated hand is worth nothing; you can't sell it.
Just create an AI-generated video and compare it side by side with a real movie.
It can't even hold a candle to it.
@@jensenraylight8011 just wait, this AI will put these so-called pros out of business soon.
Best video on the subject. Hands down
I find it very interesting that AI struggles with hands, the hardest body parts for us to draw, as much as we do.
Depends on the hand pose and the artist.
Not for long.
Right?!
I like how outdated this video is already.
As of right now, Midjourney draws perfect hands 9/10.
@@pt9845 Yep but now ask it to draw a load of bare feet.... 😕
Probably 'cause apples look very similar, but hands and fingers look very different depending on what object they're holding or how they're positioned.
4:25 saved you time
When I tried using AI image generators for the first time, the first thing I noticed was that it generated some weird animals. Particularly stylised/non-photorealistic animals. A Turtle is a carapace plus a plastron plus some things that stick out from between them, like a number of legs and heads and tails and whatnot. 🐢
Critics: AI you can't draw hands
AI: Wrong, I'm Picasso
Out of curiosity, I just went and did 3 prompts in Midjourney (03.2024) and the results were interesting. The first was a woman holding an apple. Surprisingly, it wasn't too bad. Only one picture had completely messed-up fingers; the others were okay (if something was off, it didn't jump right out at you). The second one was: a person holding an open umbrella. The AI "cheated" its way out of the hands situation by turning the person around (so no hands visible, only the back) in 3 pictures out of 4. In the 4th one the hands were not too bad (both of them "holding"), but there was no umbrella handle, so they were holding thin air. And lastly, "a woman holding an umbrella in her hands" - the pictures turned out very well, but once more the AI "cheated" and in 3 pictures out of 4 cut out the hands: you can see a face and an umbrella, but only a little bit of the fingers, as they are out of the shot. Only one contained a full palm holding a handle, and it looked very realistic..., even though it had 6 fingers :D
Human artists, even really good human artists, struggle with hands. Go to any art gallery and see how well-known artists often "cheat" by hiding hands when painting people. Seems like the AI is learning this same technique.
1:00 meme material 😂
I remember in the original Westworld movie you could always tell the androids from the hands. At the time, I thought that was a bit weird and forced, but I'm starting to come around to their way of thinking!
9 Months later, and hands are now accurate
You know this is old, because now, a year later, AI is already able to make more realistic hands.
A lot of actual artists struggle with hands also, because they are so flexible and able to move in different weird ways. Ways that we would normally hold things, when pointed out, don't look natural at all. Hands are complicated and difficult to reproduce.
Pretty much every artist knows how many fingers to draw though lol
It also struggles with arms and legs. I've made prompts like "two men boxing", and often their arms will be connected to the other person, or there will be an extra arm, a missing arm, or three legs on one person.
The thing with transformer models is that they take the same time to produce "simple" outputs and more complex outputs.
It's a direct mapping from an input to an output (with a random seed), so it doesn't try again if it messes up catastrophically. If you draw hands, you're probably going to iterate on the drawing, perfecting the hands over time. DALL-E will give you its first shot at it.
Try feeding the output back in and see if you can get it to refine things. The popular Automatic1111 UI for Stable Diffusion has a "Send to img2img" button where you can easily put the current output (both image and prompt) into the input for the img2img mode.
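The idea in the comments above — one-shot generation versus feeding the output back in to refine it — can be illustrated with a toy Python loop. This is not how img2img actually works internally; it's just a sketch of why iteration shrinks error, with `strength` playing a role loosely analogous to img2img's strength setting:

```python
# Toy illustration: one pass leaves a large error; feeding the
# output back in repeatedly corrects part of the remaining error
# each round, like iterating on a drawing's hands.
def refine_once(current, target, strength=0.5):
    # Keep part of the current attempt, correct part of the error.
    return current + strength * (target - current)

target = 10.0                        # the "correct hand"
one_shot = refine_once(0.0, target)  # a single attempt

refined = 0.0
for _ in range(5):                   # feed the output back in
    refined = refine_once(refined, target)

print(abs(target - one_shot))  # error after one pass: 5.0
print(abs(target - refined))   # error after five passes: 0.3125
```

Each round multiplies the remaining error by (1 - strength), which is why a few refinement passes get much closer than a single shot.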
Yeah, that's the problem with how AI systems are made, lowkey.
They don't really have the capability to reflect back.
Maybe it would be cool if people added an extra step for "fixing the image" while still remembering its first attempt.
8:52 "a handful of images" 💀
As an artist, it's very comforting to know that even AI struggles with drawing hands
One year later and AI does it right most of the time.
Will Smith: "You can't draw hands."
AI: "Can you?"
Will Smith slaps.
Well, I'm here from the future to let you know it doesn't struggle anymore.
In the year since, this has changed, as you may have seen: AI can draw hands. I don't know which prompts are necessary, but the overall level (models) has grown.
No it can't. Unless your standards of what a hand is have plummeted.
@Tifinagh. Sora is literally a 3D version of DALL-E 3
That's why you use img2img! 😉
We're also acting like all artists are good at hands, which, as an artist myself, I know I've struggled with for a looooooong time; they're honestly never easy.
Funny how this was no longer a problem merely a few months after this video came out...
Every time the models get slightly less error prone a bunch of people declare the hand problem "fixed" as if it's a puzzle with a definite solution that we've finally cracked. Yet I continue to see images with weird looking hands fairly often. As the models improve it'll mess things up in general less and less, but there will still be things it's better and worse at. It's not just hands, it's text, musical instruments, muscles, animal anatomy (especially bugs), machinery, board games. Basically, things that are complex, not always the same, and where the details matter.
In Renaissance period there were artists who were just hired to do hands because they are inherently difficult to do.
In Stable Diffusion, you can use Control Net to give the AI more information about how the hand should look and be positioned
So the main message is: AI art doesn't reflect 3D reality because it isn't trained on 3D reality.
You can also see this when you look closely at the image at 8:36. If light fell through trees like that, the sun would have to be very, very close.
Midjourney's V5 is pretty good at doing hands. You can pretty easily get a very realistic image of a hand. I tested this out and it worked almost every time, but occasionally it would give me a hand with 4 fingers. These AI image-generation technologies are adapting super fast, so these kinds of videos get outdated faster than ever.
Now I’m extremely curious how AI is at drawing feet?
There have to be substantially fewer images of feet for it to learn how to draw feet from.
On the other hand (pun intended), feet aren’t as intricate but they have their own complexities.
Has anyone else thought of the foot question?
I suppose AI could mess feet up less than hands, as compared to hands feet aren't as flexible. Also, it seems AI messes up the fingers more than the palm of the hand. And as toes are a lot smaller (lengthwise) and less bendable, AI supposedly should struggle less with them.
Your feet are a lot less flexible than your hands. You can curl your toes, but otherwise your feet are pretty much always in the same exact position.
It still messes up feet a lot, especially shoes, where it might draw the overall design backwards at some angles. It's not horrible, but it's just as bad of a problem.
they will feed the model the library of Quentin Tarantino's movies.
Lol more foot pics than hand pics online
Even my art teacher avoids drawing hands. Somehow drawing hands is so hard that even AI avoids/messes up hands.
The point about crowdsourcing quality control from people is interesting because, if I'm not mistaken, that's more or less the byproduct of some Captcha programs. Yes, they were "testing to see if you're not a robot", but really, in the background, it was helping computers get better at identifying objects and text by having humans type in what an image the system didn't know was portraying (along with ones it already knew, so it could make sure you were being accurate).
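The scheme described above — pairing a known-answer "control" image with an unknown one, and trusting the unknown answer only when the control is answered correctly — can be sketched in a few lines of Python. This is a toy illustration of the idea, not reCAPTCHA's actual implementation; the `"unknown_img"` key and function name are made up for the example:

```python
# Toy sketch of crowdsourced labeling via a captcha:
# if the user gets the control image right, their answer for the
# unknown image is kept as a candidate label; otherwise it's discarded.
def collect_label(control_answer, control_truth, unknown_answer, labels):
    if control_answer == control_truth:   # user proved they're human
        labels.setdefault("unknown_img", []).append(unknown_answer)
        return True
    return False                          # untrusted answer discarded

labels = {}
collect_label("stop sign", "stop sign", "fire hydrant", labels)  # kept
collect_label("bus", "stop sign", "traffic light", labels)       # rejected
print(labels)  # only the trusted answer survives
```

With enough trusted answers collected this way, the most common label for the unknown image becomes its training label.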
I love when he adds a side note he literally turns his head to the side and denotes it on another camera (angle)
He treats the second camera as the skeptic.
Mah man Proko done got himself in to a Vox video... I feel proud for some reason.
It's like us; humans struggle to draw hands as well.
But it's just a matter of time until it masters this; actually, Midjourney V5 is already doing some pretty good hands.
v5 pretty much solved this
Yes. This video was made before Midjourney V5. It just released after the fact unfortunately.
@@vectoralphaSec A day is a month, and a week is a year considering AI evolution, so it's okay to be outdated on this subject.