Encoder: "░▒▓█▒▒▒||█░||▓"
RNN: "Totally reasonable Encoder, can do!"
Cary: "But what is it that you can do?"
RNN: *_*SCREEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEECH*_*
183 likes, 2 months old and I'm the only reply....
Not for long.
Wow BFB.
Cooliokid 956 cra-
-__-
You should use MMD dance data. It'll be much more compact, because it's just 3D animation data, and the dances will be more involved with a third dimension.
Also there will be shitloads of samples available.
Asdayasman upvote this pls
...and shitloads of new weeb carykh subscribers as well!
VMD could be a difficult file format to work with - unlike with VSQ*, it's not obvious how VMD is encoded. I think you have the right idea. The homebrew motion capture community probably has a system to easily read Kinect motion data. With motion tracking, things like dancers having different builds, heights, or haircuts wouldn't affect the video. It would be much easier to read, as well.
*VSQ is a modification of INI format
Senshi Sun I think what he should do is compile a bunch of random motion data, apply it to a base model, render out each sample, and then put them all into the AI.
and processing 3
Can you upload three hours of your computer dancing to the three hours of computer jazz music?
We need this more than anything else
I second this
Yes!
I quintuple this
😂
14:44
*Gotta break this bedrock*
Hi Asriel. Also, in Minecraft it's not worth breaking bedrock. Also, nice comment!
You don’t know how much I laughed at this as a 9 year old
WilloRuby Gaming How is that a r/whoosh?!
@@rubybirdy no
@@rubybirdy No
You're a real inspiration that motivates me to keep studying cs!
Matheus Gouvea ahahaha I love this channel 😍
Hi, Guru! Good to see you here... do you plan to make a video related to this subject?
Israel Raizer Cruvinel definitely, I started studying computer science exactly because of this field!
Cool, I'd really like to know more about the auto-encoders mentioned in the video and what they're capable of... maybe with some examples of how they're being used in research these days
Yeah, this dude got me interested in neural networks, which I'm currently planning to major in!
How about making an ai that makes yt videos
bot
there is actually enough material to feed it
Just train it on content farm channels... wait, those are basically the same thing...
There is one, it's named T drinker. It posted fake news videos from 2013 to 2015. And I know one bot that makes memes from social network images.
He made that he's just fooling us.
Carykh is back, the blue color constant is back
don't try to betray the alphaverse language by using Karthus'...
Color Is Right If You Are American, Colour Is used if You Are British (I think)
Nobody Actually Cares More About Grammar Or Punctuation On The Internet More Than Their Desire To Prove That Everyone Else Is Wrong And That They Are Clearly Superior To Everyone Else
cary kill hippos is back
or cary knows hitler
"What? Dancer, punching the air over and over isn't dancing!"
*Charli D'Amelio wants to know your location*
Show me the sauce
I'm pretty sure Carykh might be the one who starts the robot takeover.
That video is interesting
Him and Michael Reaves.
But the robots are trained on limited sets of data and are useless in any situation except the super specific one they're trained for.
@@dmdizzy Robots who kill humans by generating dance moves!
Don’t forget about Code Bullet
Welcome back Cary :D This video was great to watch and I can't wait to see your AI/TWOW content in the future! We've missed you (but not completely since there's still humany but you know what I mean)
🥞
And you watch him too hmmmmmm
yup
Yep, I can’t wait for what’s coming up next.
Hello.
14:44
Uhm I dont know that to do now
*continuously hits herself in the face*
The rnn is trying to send him a message
*starts raving*
6:35
When a kid speaks of plasma and accurately explains what it is
I got a chocolate ad
Animating is such a pain... you put a lot of effort into this video!
He could just develop an AI that animates the face of his stick figure, so it would animate itself according to what he says, or even just at random. At least that would be better than his looped animations. Did you see its face while talking? It's just on a loop.
6:12
Ik
holy crip he is still in our plane of existence after ascending to the world of speaking desktop items
yoo here is the creator of One so can you make Two please
A 6 year old chessyhfj comment on carykh's video with only 1 (now 2) replies?!
How about making a RNN that animates a stick figure of yourself, based on your ranting. Making it animate the lip-flap would be too easy. It needs to read your mood from your tone of voice and produce expressions and arm movements as well.
You can then use it to help make all your future videos faster! :)
:o
How about a ai that yells racist shit at you
And make it black so it can use the N word
@@waterwaifu612 How about... not doing that?
Jandiqar Thace wdym that’s a great idea
6:37 when my friend kills me in minecraft but I have 100 dogs.
Also, it’s a reference to bfb(battle for BFDI). It’s one of the teams theme song.
@TheGoldMushroom yea lol. Death P.A.C.T will have to save @Line Over Time. 😂
This is enlightening in many ways, the "boring" parts were well worth reading too.
make an ai that animates your character according to your voice
JoraForever 👍👍👍
Yes! Everybody spam this in the comments!
Damn that could be something!
But also, the animation's mouth movements, for example, must be based on Cary's voice, which might lengthen the training period, as it also needs to recognize speech properly (maybe existing speech recognition can be used, but there isn't any 100% accurate one out there), plus the mouth's shape for each sound of the speech (like AA, EE, OO, etc.) and maybe more things which I probably don't know about. But it'd be interesting if Cary gives it a try!
I think that's been done in half life 2
JoraForever cool idea
I have 0 clue on what any of this means but it's still really enjoyable
Edit: I just realized that's almost every Cary video
EvanFireHD It’s you.
EvanFireHD now I don't need to feel stupid alone
Yeah same :P
lol u just need to rewatch a couple times to understand it better yknow?
HAHA SUCH A RELATABLE COMMENT
*OH GOD OH NO THEY'RE LEARNING TO DANCE*
edit: Also, the way you animated the RNN is really cute!
After about 15k iterations of the training, the reconstructed image appears better than the original
a-lpha of Zeldaforme Gaming No, because the decoder blurs it
after about 15k iterations of training, the tetris player is at a slight disadvantage
Apathy *_P i ?_*
Apathy SHIGU
I actually think that could be part of what makes it look better. It looks less JPEGed to shit!
PERSONALLY
I think it would be a bad idea to do shorter length/bitesize/5 minute videos. What I missed the most from when you were gone and what I found myself rewatching the most was the epic, hour long evolution sim videos.
The only way I see something like that working is still doing those epic long videos but broken up into parts like your first evolution sim video or into a series like the algislocathlon (I did that from memory don't judge my spelling ;---;)
ANYWAY
Welcome back Cary! We missed you and we're glad to have you back.
algicosathlon is hard to speel
*HEAVEN HIGH POOP PUSH!*
*pee push
CarsonG1017 autocorrect ruins everything
Eurovision Cyan yeah. Autocorrect: only there when you don't need it.
Hell Low Pee Pull
Wild Animal Channel HEAVEN HIGH PIÑA-COLADA PUSH!
6:36 when u say something smart that the teacher didn't know
Death P.A.C.T. noises
You made this in a second year CS course?!?! I guess I should just dropout now
I'm kind of intimidated now since I'll be taking CS for college starting next year :(
Memes For Humemes I'm taking it on a university level... I am afraid.
I learned so much in this video 😁😀
Don't be, Memes For Humemes... He's without a doubt top of his class, don't compare yourself to others.
Not necessarily top of his class, just interested in this type of thing.
Cary: *long hiatus*
Me: That wasn't very cash money of you.
I've been wooshed. Explain.
not very chest full of drawers.
No Fordnigt here.
🤣🤣🤣🤣🤣🤣🤣🤣🤣🤣🤣🤣🤣
Wondo Crafter ???
OH MY GOD CARYKH YOU FINALLY UPLOADED I LOVE YOU MAN IM SO HAPPY
6:34
“Are we talking about the compression JPEG uses?”
*death pact theme song intensifies*
Yes
do dddddd d UIDGSYHFBEWRGJFWEGKU
Do this for your animations. Convert lip movement to animation frames. Pair animation frames to audio. Train an AI to be able to auto-animate your audio
LOTS OF YES
if this is possible
It would be interesting an AI that animates a character.
“Can we make an AI that can dance better than humans?”
Mettaton: am I a joke to you?
Mettaton is the robot version of Freddie Mercury
@@orangelake2268 yes
the f why it say 57 years ago
@@HauntedBronco cause THATS a part of his username
@@artzy188 ok
Heck yeah, Cary’s back! Reset the doomsday clock
6:37 i must have this sound effect
Its like windows xp tada but edited
@@1234abcd-i6b windows vista
@@sineNonymus windows 7 and vista have the same, *not xp* (xp have his own tada)
@@1234abcd-i6b YOU SAID WINDOWS XP
@@sineNonymus yeah i said windows xp
Welcome back!
OKAY WHAT
you can't escape justin y
God damnit you again
WHY ARE YOU EVERYWHERE
go away :(
"A long video of dancing silhouettes"
bad apple intensifies
Yeah but.... “reverse colour intensifies”
“It’s 2018 now?” Not anymore
Your right... The way we started this year we might as well be in 2001
Congratulations! You understanf how time works!
@@diamonddynamite1557 understand*
@@STAREI471 if you're gonna correct someone correct the person before me for using the wrong your
@@diamonddynamite1557 just saying
14:38
I literally CANNOT BREATHE
13:46 Please make it technical! If people don't want to see that, they can skip with an annotation. Amazing videos, please keep it coming.
No more annotation
make an ai that animates your characters mouth according to your voice
He did make a lip sync AI
@@rebelli65
I'm pretty sure this is a burn because when he talks the mouth is just in a constant loop.
@@rebelli65 then why doesn't he use it
@@MaksKCS right
He did !! ONG !!
i don't understand even half of the video, yet i'm still highly entertained by it lol
I understand the video.
19:06
He guessed the future. Here we are, with jackandjellify asking for animators to work with him.
This is the only computer related youtube channel that I can watch without being bored. Keep it up; your videos are funny and informational.
15:04 this is when I kinda start laughing
2:45 "Is this a pigeon?"
Jørgen Galdal mmm... why
I don't know... is it?
"Hint, hint. Wink, wink. Nudge, nudge. Stab, stab."
punching IS a dance move
...for murders.
I laughed so hard at that part.
MyOpinion same
I'm a vector: *_screams_*
Sam Teinert
Imagine a vector of a 4K picture.
"Brain.exe has stopped working..."
*SesamToast* I did that.... Linux just screamed and killed the C++ binary. Memory overflow xD
Mempler HAHA WTH
Sam Teinert 12 year old me screaming about vectors
*_HOLY SCHNIDERS A NEW VIDEO_*
shadecho wait you were on the soothouse comment section
Cary: "No way! its 2018 now?"
Me: "Yeah its even 2021 now and it is bad year"
Been a while then you bring us a 20 minute video. Yay :)
14:33 guess who was actually only wearing underwear
Koos Naamloos Fiery Boy's Underwear
u
ur mom
i was
31 people.
14:55
**Flashbacks to "Marshmellow's Dancing Skills of Sweetness"**
13:24 I love that animation loop for some reason
mettaton goes hard
You should try to get it to make you a video script
Yes.
Yes.
Yes.
Yea.
*Read More*
Yes.
*You should really Read More guys*
The quality of this channel is stunning!
16:24 why is she removing her hair XD
12:01
Cary: So RNN, come back here! I know I made you scream in agony twice, but you know what they say; third times the charm!
RNN: Three times a charm? More like three times a harm!
"Can you at least try?"
"AHHHHHH"
You should include the music as an input to the AI as well. Maybe on every frame, take a windowed FFT of the audio and bin it into like 20 different frequency bands. It would be interesting to see how the AI adapts to whatever music you throw at it.
Dance moves aren't based just on the sound at that very instant, but also on the sound in the instants before, so you would need several sound samples for each frame.
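For anyone curious, the two ideas above (a windowed FFT binned into about 20 bands per video frame, plus a few earlier frames as context) could be sketched roughly like this. This is only a guess at one way to do it, not anything from the video: the function names `band_features` and `with_context`, the 30 fps rate, the 2048-sample window, the linear band spacing, and the 4 frames of context are all assumptions picked for illustration.

```python
import numpy as np

def band_features(audio, sample_rate, fps=30, n_bands=20, window=2048):
    """Return one row of n_bands log-energies per video frame (assumed layout)."""
    hop = sample_rate // fps                      # audio samples per video frame
    n_frames = max(0, (len(audio) - window) // hop + 1)
    taper = np.hanning(window)                    # Hann window to reduce spectral leakage
    # Split the FFT bins into n_bands contiguous groups (linear spacing, for simplicity).
    edges = np.linspace(0, window // 2 + 1, n_bands + 1).astype(int)
    feats = np.zeros((n_frames, n_bands))
    for i in range(n_frames):
        chunk = audio[i * hop : i * hop + window] * taper
        power = np.abs(np.fft.rfft(chunk)) ** 2   # power spectrum of this window
        for b in range(n_bands):
            feats[i, b] = np.log1p(power[edges[b] : edges[b + 1]].sum())
    return feats

def with_context(feats, context=4):
    """Concatenate each frame's bands with those of the `context` frames before it."""
    padded = np.vstack([np.zeros((context, feats.shape[1])), feats])
    # Oldest frame first, current frame last; missing history is zero-padded.
    return np.hstack([padded[k : k + len(feats)] for k in range(context + 1)])
```

With 20 bands and 4 frames of context, each video frame would get a 100-number audio vector that could sit next to the dance frame as extra input. Log-spaced bands would probably match how music is heard better, but linear bands keep the sketch short.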
6:37 when your dinner is all about eating a photo
I was literally W H E E Z I N G when i saw it punching over and over again, I would request a loop of the dancer punching over and over again without stopping, If you could make it for me i would be so happy and i would appreciate all your videos
*h e a v e n h i g h p o o p p u s h*
*H E L L L O W P E E P U L L*
*WORLD REGULAR TOILET DOOR*
Heaven High Poop Push --> Hell-Low Pee-Pull
Hell-Low Pee-Pull = *Hello People*
wut?
Zachyshows I know what that means, and so does everyone else, so idk why you wrote this
carykh's new video is edited and uploaded by an AI
Upload a video made by an AI before you watch it
Did anyone notice the funnel of the machine at 2:02 said, "The quick brown fox jumps over the lazy dog. Also, the quick brown dog jumps over the lazy fox. But this fox isn't so lazy. The dog gets eaten."
14:26
Cary: **looks in magnifying glass**
Me, who's watching on mobile: cary... why are you looking at an upside down triangle?
Also while writing this I realized that carys name is one letter off of my name, so I guess I am cory keyhole, cory kitten hugger, etc, etc
Took me a minute
Who whats the scream here
3:50
6:12
Here s the begging
12:00
14:34 Cary Enters The Abyss
a big hole
HE PREDICTED THE YT SHORTS LOGO 0:59
what timestamp?
@@theguywhoaskedyoutube0:58
@@theguywhoaskedyoutube He literally said 0:59 bro what 😭
“I’ll try to keep this short”
Video is 20 minutes long
I want that dancing stick figure slider thing, with randomized moving sliders.
It's in the video description
Why would you need an AI for animating the lips? Why not just write (or use, I’m sure it already exists) an algorithm that takes a transcript (handwritten or using existing speech recognition (which I know is probably still technically an AI)) of what you’re saying as input and then move the mouth? I’m sure there are some parts that you’d have to manually do, eg screaming, but it’d be a lot more reliable and robust than an AI based on the audio. If I were to code it I’d mine a dictionary for the International Phonetic Alphabet (or some other pronunciation respelling) representation of each word. Then just figure out what mouth shape you make and how long you make it for each sound and put it all together into an animation. Obviously you’d probably still need to tweak it some more, depending on how time-accurate your transcript is, and that might be where an AI could help. But, I still don’t think an AI would be robust enough for the whole process, especially for a pretty discrete animation where if it picks the wrong mouth shape it’s pretty noticeable. Whereas if you were to just use it to help with temporal alignment, it being wrong would only show up as a small offset, less noticeable.
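A rough sketch of the transcript-based approach described above, under heavy assumptions: the tiny `PRONUNCIATIONS` table stands in for a real mined pronunciation dictionary, it uses ARPAbet-style symbols rather than IPA, the `MOUTH_SHAPES` names are made up, and each sound in a word simply gets an equal share of that word's time span. The word timings are assumed to come from somewhere else (by hand or from speech recognition).

```python
# Hypothetical pronunciation table; a real one would be mined from a dictionary.
PRONUNCIATIONS = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
}

# Hypothetical mapping from each sound to a mouth drawing.
MOUTH_SHAPES = {
    "HH": "mid", "AH": "open", "L": "tongue", "OW": "round",
    "W": "round", "ER": "mid", "D": "closed",
}

def mouth_frames(timed_words, fps=30):
    """timed_words: list of (word, start_sec, end_sec). Returns one mouth shape per frame."""
    if not timed_words:
        return []
    total = int(round(timed_words[-1][2] * fps))
    frames = ["closed"] * total                   # mouth stays closed during silence
    for word, start, end in timed_words:
        phones = PRONUNCIATIONS.get(word.lower(), [])
        if not phones:
            continue                              # unknown word: leave for manual animation
        step = (end - start) / len(phones)        # equal time per sound (a simplification)
        for i, phone in enumerate(phones):
            first = int(round((start + i * step) * fps))
            last = int(round((start + (i + 1) * step) * fps))
            for f in range(first, min(last, total)):
                frames[f] = MOUTH_SHAPES.get(phone, "mid")
    return frames
```

For example, `mouth_frames([("hello", 0.0, 0.4)], fps=10)` gives `["mid", "open", "tongue", "round"]`. The equal-time split is exactly the part where a learned alignment could help, since a mistake there only shifts the timing a little instead of picking the wrong mouth shape.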
but what about expositionalizing the photosnthesis for the purpose of calculating the velocity perfunctuary
@@ratamat Lol u so noob
GaBoX17 DA r/woooosh
@@jackmaara Stop linking subreddits outside of Reddit, no one likes that.
@Leafa 0910 What???
3:22
R.N.N: (Moves it's arms up and down very swiftly)
Pillow: *Oh* My god *HE* does *NOT CARE*
Wow, who needs mikumikudance when you got this advanced piece of technology lol xD
8:04 hint hint, wink wink, nudge nudge, *stab stab.*-
14:50 "punching over and over again is not dancing.
The hype: Am I a joke to you?
3:48
When Pillow Stops 4 from Screeching at everyone
Did Cary say... *Hiring animators?*
Yes Cheesy. Now you can sneak in ONE references.
yeah 🗣️🗣️🗣️🗣️🔥🔥🔥🔥🔥 but i have a request, add airy doing the griddy pls 🗣️🗣️🗣️🔥🔥🔥
Problem?
1. AI
2. ???
3. Profit
4. And then they'll be sorry
5. Deal with those meddling kids
6.Detroit: Become human
8. a peanut right now
Problem? Provago
Cary the kitten hugger is back!!!!
olleke Bolleke he's also called himself "Cary Kill Hitler". Just letting you know
pluey200 And Cary Knows Hell.
6:36 The Theme of Death Prevention and Creating Trust
Tbh this could have been a 1 minute video saying that auto encoders exist and do cool things and it still would've been amazing.
#StopNeuralNetworkAbuse
@Boyfriend from Friday Night Funkin'
Aren't you supposed to be saying distorted animal crossing noises
HARRY POTTER AND THE PORTRAIT OF WHAT LOOKED LIKE A LARGE PILE OF ASH
CHAPTER 13: THE CORRECT PLACE TO BEGIN
xXFieryDragonzXx *_HE SAW HARRY AND IMMEDIATELY BEGAN TO EAT HERMIONE’S FAMILY_*
Ron was going to be spiders. He just was.
xXFieryDragonzXx *_WHAT ABOUT RON MAGIC?_*
I heard jk Rowling had to sign contracts for all sorts of random stuff like Hogwarts mystery, which had an energy system and micro transactions. I guess everyone’s making fun of the series now T_T
1:12 Take 2 Cary: “Hold on. I think this is boring.”
Good to see you back, man!
If you wanna automate your lip sync, Adobe Character Animator can do it straight from the audio track. FYI.
m
ok
I think you should not give up that long format because it is really really interesting. But you could, and this is just an example, make a series about your evolution game where you explore different movements and play around with stuff, which is in my opinion better suited for a shorter and more consistent format
Pewdiepie just played your game, hell yeah
A Brazilian on a non-Brazilian video
•Willer Alves• BRs aren't even people
BRAZIL OF COURSE, OR PORTUGAL
Um Cara Normal num Planeta normal what game
rose lia GOLAD, Game of Life and Death
0:09 I read it as "oh heavens hi pupils"
HEY PUNCHING ITS A DANCE MOVE THANK YOU VERY MUCH.
I haven’t watched the video yet, but
Yes. Yes it does.
1. 1 it does.
AI gets the possibility to dance for the first time
*starts twerking*
7:18 "Oh remember January 6th? I don't like that one bit!" I can't tell if this is a reference or if Cary just decided to predict the Capitol insurrection and throw it somewhere in the video.
"one bit"
its time to start training!
*does one lift*
and you're done.
mood
Everyone gangsta till the AI does the Penguin Club dance
Untill he cleans poop?
1:21 (DIFFERENT DRAMATIC STING)
It feels great to be able to come back to these videos and actually understand what cary is saying.
with enough computing power... wouldn't it be possible to use a preexisting image recognition ai on dancing footage - just to track head, shoulder, shoulder, elbow, elbow, hand, hand, hip, knee, knee, foot, foot and output their coordinates as well as their size on screen / apparent distance
then you'd ideally get - with 3 dimensions per point (two coordinates and size) a 36 dimensional vector for each frame that could then be fed through your text imitation network and the result could be put back through some basic 2d animation software to actually create images
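To make the arithmetic concrete: 12 tracked points times 3 numbers each (x, y, size) is 36 numbers per frame. Here is a small sketch of packing and unpacking such a frame. The joint names and the function names `pose_to_vector` and `vector_to_pose` are made up for illustration, and the tracker that would produce the coordinates is assumed to exist elsewhere.

```python
import numpy as np

# The 12 points listed above, with hypothetical left/right names.
JOINTS = [
    "head", "l_shoulder", "r_shoulder", "l_elbow", "r_elbow", "l_hand",
    "r_hand", "hip", "l_knee", "r_knee", "l_foot", "r_foot",
]

def pose_to_vector(pose):
    """pose: dict mapping joint name -> (x, y, size). Returns a flat length-36 array."""
    return np.array([value for name in JOINTS for value in pose[name]], dtype=float)

def vector_to_pose(vec):
    """Inverse of pose_to_vector, e.g. for drawing a network's output as a stick figure."""
    return {name: tuple(vec[3 * i : 3 * i + 3]) for i, name in enumerate(JOINTS)}
```

A whole dance would then be an array of shape (number of frames, 36), which is far smaller than a stack of pixel images and could be fed to a sequence model one row at a time.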
With enough computing power, we could calculate at what point humans will stop watching porn. Probably the exact instant the human race ends, realistically. With a margin of error as large as the amount of time beforehand that we would know the end is coming. Hmm...
actually, that is kind of too low. what about two cameras tracking it, with the footage sent to a (kinda more powerful) ai that tracks the positions you mentioned, then makes skeletal animation data from it and sends it to a 3d animation software?
In addition to compute power, you also need data tho. And if you just beef up the network with increasing complexity, without adding more data, at some point you will have to overfit really strongly, in order to configure your network with the given training data. However with more concise data, you can not only train the network faster, but also achieve higher generalization. If instead of having pixel images we had positional descriptions of each joint, that would be more ideal for understanding the motion of the dancer - one way to achieve that would be using a NN to extract/prepare that data.
So yeah @Julian Danzer, joint positions are a more accurate way of describing the dancer, aka better data, and thus it could be used to achieve better results.
more accurate description yes, but also more difficult to extract from images - also I was not necessarily thinking of overly complex image recognition - you could probably even get it preconfigured - something like the really simple face-finding filters used for camera autofocus - except now for hands and feet as well - or the tracking function you can find in After Effects
I don't know if anybody noticed this, but the human is DABBING at 8:30
Poor RNN, Cary just kept shoving it in twice 😱
Not Neo That sounds wrong.
Thank you carykh! I found your channel through the algicosathlon and now I’m learning so much about AIs and RNNs and all this other stuff and it’s very interesting to me. Thank you!