so basically the equivalent of putting a baby in a timeloop and teaching it to steal.... i approve
"1 hour here is 7 years on earth"
And pumping dopamine into it every time it succeeds
@@stoobidthing 😭
Why is there no comment here
@@SirNob theres 3, now 4
I think there should have been a negative reward for jumping off too. It was clearly a preferable strategy to risking touching police officers, especially before discovering that coins give rewards or when the AI thought there was no way to get coins.
Letting you in on a lil secret. I did code a negative reward for falling off the map, or let's say at least I tried to 🙂. However, after 4 days of repeated training sessions, for the life of me, the implementation wasn't working. I knew things would work just fine without the falling penalty, at the expense of increased training time, so dats the way we went.
Walls.
@@toasterhavingabath6980
Get this man a job at NASA
@@cozmouz walls tagged as police that kill on contact?
@@toasterhavingabath6980 hog rida
You should have added a negative reward for getting seen by the police, that way Loki will sneak around them instead of speedrunning through them. Maybe a level where the police couldn't be outrun could have helped.
If I added the police officers' activation radius as an input for the AI, it's plausible the AI would've learned to sneak around them by not entering the radius. Thanks for the idea!
A negative reward might've been too much, as it could lead to a local maximum where the AI thinks going through police only leads to a lower score, unaware that if it sacrificed a little score it could reach more coins. This could leave the AI stuck, never going past the first police officer, if not getting seen was impossible.
@@cozmouz but what about levels where you have to, like level 2
@@Mcervera In machine learning, after a series of experiments, one realizes that there is no "have to". There are many possible inputs I could've added, many different elements I could've implemented. It's impossible to implement everything! Implementing the baseline requirements to get the thing working was the idea behind this video. In level 2, Loki learned to maneuver around the officers regardless of whether the activation radius was an input or not.
You should have made jumping into the void a negative reward@@cozmouz
Next Video: “AI Learns Tax Evasion”
yes
i guess you could say hes _lowkey_ a fast learner
nice one!
Idk. 3.5 million tries... Wonder how many controllers Loki broke trying to beat these levels
Yup.
The problem with videos like this is that the AI can overfit to a specific map. You need to have some sort of shuffled dataset or randomly generated sequence of maps/coin arrangements for proper training.
I remember seeing a video where they trained a Trackmania bot to solve a maze, and they tried to fix it by having the car spawn randomly in the maze once that issue became the bottleneck
@@AceTheAro7 sounds expensive
Yea, this video's coding seems good enough to mostly prevent that though. Notice how the AI doesn't always take the same path in different levels. I have definitely seen videos where this does happen though.
if the police start using robot dogs, we will start making robot cat robbers
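The fix proposed in this thread, a freshly shuffled layout every episode instead of one fixed map, could look roughly like the Python sketch below. The `sample_level` name, the grid size, and the coin and police counts are all assumptions made up for illustration; the video does not say how its levels are built.

```python
import random

def sample_level(rng, width=10, depth=10, n_coins=5, n_police=2):
    """Return a random layout: one spawn cell, coin cells, police cells.

    Hypothetical helper: every cell is a distinct (x, z) grid position,
    so nothing spawns on top of anything else.
    """
    cells = [(x, z) for x in range(width) for z in range(depth)]
    picks = rng.sample(cells, 1 + n_coins + n_police)
    return {
        "spawn": picks[0],
        "coins": picks[1:1 + n_coins],
        "police": picks[1 + n_coins:],
    }

# A new layout per training episode, so the agent cannot memorize one map.
rng = random.Random(0)
levels = [sample_level(rng) for _ in range(3)]
```

Seeding the generator keeps training runs reproducible while still varying the layout from episode to episode.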
that would be Purr-fect
That would be funny, cats are perfect for this
A cat burglar, if you will.
They're really sneaky, I think it'd work imo. I'm a cat owner so I'd know.
@@thebooknerd5223 Nami 😂
11:17 I think this happens because the AI only learned to effectively collect coins in the one direction, or gets confused by there being no police to dodge.
AI is not that good at changing its perspective, since it has no real correlation between x, y and z. It doesn't know that they are just sides of the same coin, it only knows what outputs will change them individually.
I saw a video of a table tennis AI that worked great for one player, but once they spun it around for the second player, it just fell over, because it only learned to stay upright while looking in one direction. Their solution was to rotate the coordinate system with it (rotating a parent object and using local coordinates probably).
I think something similar may work here too, by changing Loki's sensors to be relative to his orientation, thereby eliminating the need to correlate different axes (unless you are already doing that).
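As a rough Python sketch of the "sensors relative to his orientation" idea: a world-space offset to a target is rotated into the agent's own frame, so "ahead" and "to the right" mean the same thing whichever way the agent faces. The function name and the yaw convention (heading measured from the +z axis toward +x, on a flat ground plane) are assumptions for illustration, not details taken from the video.

```python
import math

def to_local(agent_x, agent_z, agent_yaw, target_x, target_z):
    """Express a target position in the agent's local frame.

    Assumed convention: agent_yaw is in radians, 0 means facing +z,
    and pi/2 means facing +x. Returns (right, forward).
    """
    dx = target_x - agent_x
    dz = target_z - agent_z
    sin_y, cos_y = math.sin(agent_yaw), math.cos(agent_yaw)
    forward = dx * sin_y + dz * cos_y   # projection onto the heading
    right = dx * cos_y - dz * sin_y     # projection onto the right-hand axis
    return right, forward

# A coin one unit ahead reads as (0, 1) whichever way the agent faces.
print(to_local(0.0, 0.0, 0.0, 0.0, 1.0))
print(to_local(2.0, 3.0, math.pi / 2, 3.0, 3.0))
```

With inputs like these the network never has to learn separately that +x and +z are interchangeable, which is the point the comment is making.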
Amazing explanation!
I think it was just hasty getting to the coins above and didn't bother moving a little bit to get the coins leading up to it.
It's shown to be able to turn to pick up coins before, so idk.
@@silasnebulous4533 yeah, seemed more like it detected more coins further ahead and therefore decided to ignore ones immediately ahead for a bigger long-term payoff (also cause they were away from the bad thing)
"That's her, officers! That's the woman who programmed me for evil!" - Bender
if you increased the reward from coins by dividing it by the amount of time from the last coin (less time more reward) you'd also make it so that he doesn't skip nearby coins too often, but it would also result in more speedrun-ish behavior
That's a great recommendation. More complex reward functions are something I will implement in the coming videos. Stay tuned!
wow nice idea i hope i remember this too in the future
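A minimal Python sketch of the time-scaled coin reward suggested above. The function name, the `min_seconds` floor (added here so two coins grabbed almost simultaneously do not produce a huge reward), and the example numbers are assumptions, not anything from the video.

```python
def coin_reward(base_reward, seconds_since_last_coin, min_seconds=0.5):
    """Divide the coin reward by the time elapsed since the previous coin.

    Less time between coins means more reward. The floor on elapsed
    time is an assumed safeguard against division by near-zero values.
    """
    elapsed = max(seconds_since_last_coin, min_seconds)
    return base_reward / elapsed

print(coin_reward(1.0, 2.0))   # coin picked up 2 s after the last one
print(coin_reward(1.0, 0.1))   # clamped to the 0.5 s floor
```

As the comment notes, this pushes toward speedrun-like play, since waiting for a safe opening now costs reward.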
I like how Loki figured that it’s better to die than get caught by the pigs
he knew he was going to drop the soap
@@generaldelasmontanas2699 lmaoo
Are you an anarchist
@@undefinedchannel9916 Why? Without cops we'd have anarchy... so they must be an anarchist?
@@undefinedchannel9916 pigs aren't also known as cops. Cops are known as pigs.
You might want to add a very small negative reward that accumulates over time, and/or a time limit, so Loki is encouraged to pick up the pace. He might also be less scared of the police, as the penalty for meandering aimlessly will eventually be worse than just running for it.
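One way to picture the suggestion above, as a hedged Python sketch: a small fixed penalty on every step plus a hard step limit. The class name and the default values are hypothetical and would need tuning against the coin and police rewards actually used.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class EpisodeClock:
    """Hands out a small per-step penalty and ends the episode at a limit."""
    step_penalty: float = -0.001   # assumed value
    max_steps: int = 1000          # assumed time limit
    steps: int = 0

    def tick(self) -> Tuple[float, bool]:
        """Advance one step; return (reward for this step, episode done?)."""
        self.steps += 1
        done = self.steps >= self.max_steps
        return self.step_penalty, done

clock = EpisodeClock(step_penalty=-0.01, max_steps=3)
total = 0.0
done = False
while not done:
    reward, done = clock.tick()
    total += reward   # idling accumulates penalty until the limit hits
```

The penalty has to stay small relative to a coin, otherwise ending the episode early (for example by jumping off the map) becomes the cheapest option.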
Loki isn't evil he's just a silly guy
Programmers already teach AI how to do crimes. Perfect for our Sci fi apocalyptic fantasy doom.
In Mass Effect 1 someone created an illegal AI to steal money from gambling machines. When caught, it self-destructed to try to kill you along with itself rather than be shut down.
Maybe we shouldn't be teaching AI to break the law, maybe that's just me.
Yes this definitely applies to actual irl crime. I love it when police digitize themselves to charge at rectangles
just maybeeeee
what do you mean, this AI is based?
It's just flavor
@JulieGallows are you stupid? It's in the name of the video 😂
This is cool, but wouldn't the A.I. learn more effectively if the levels scaled more slowly in difficulty and repeated the same sort of scenarios?
Idk, this just seemed to scale at a rate that's fine for players but maybe staggering for an A.I.
Sir, you are absolutely right. Gradual scaling in difficulty would've resulted in more thorough learning.
@@cozmouz Btw, what were the inputs?
I mean, were the cops and rewards registered separately from the walls?
Or was there like a separate input that changed based on what it hit?
The 360-degree raycast is the main input source for the AI. It's like lasers being fired in all directions and waiting to hit something. If the AI hits a cop and gets a negative reward, then over time, whenever the raycast beams hit anything tagged "police", the AI will try to avoid that area. Raycast hits a wall: this is something I can stand on and jump over! That's how it works, basically.
@@cozmouz So the AI knows in which direction "things" are, but not at what distance?
It knows direction as well as distance.
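To make "direction as well as distance" concrete, here is one common way raycast hits get turned into network inputs, sketched in Python. It is an assumption about the general technique, not the video's actual code: the tag list, the function names, and the 20-unit ray length are invented for illustration. Each ray contributes a one-hot tag plus a normalized distance, and the direction is implied by the ray's position in the list.

```python
TAGS = ("wall", "coin", "police")   # assumed set of detectable tags

def encode_ray(hit_tag, hit_distance, max_distance):
    """Encode one ray as [is_wall, is_coin, is_police, normalized_distance].

    hit_tag is None when the ray hits nothing; distance then reads 1.0.
    """
    one_hot = [1.0 if hit_tag == tag else 0.0 for tag in TAGS]
    if hit_tag is None:
        return one_hot + [1.0]
    return one_hot + [min(hit_distance, max_distance) / max_distance]

def encode_observation(hits, max_distance=20.0):
    """Flatten all rays, in fixed angular order, into one input vector."""
    observation = []
    for hit_tag, hit_distance in hits:
        observation.extend(encode_ray(hit_tag, hit_distance, max_distance))
    return observation

# Two rays: a cop 5 units away on the first, nothing on the second.
obs = encode_observation([("police", 5.0), (None, 0.0)])
```

Because the rays are always listed in the same angular order, the slot a value lands in tells the network which direction it came from.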
"started to associate negatives with something tagged as police"
it started using twitter
damn i can finally create army of ai thief with ability to escape on its own
He kept getting caught when teasing the cops
It’s so good that you finally have recognition for this.
Ayyy I remember ur comment from the basim video, thanks a lot man.
Ye@@cozmouz
Although a 6-year-old (i.e. my younger siblings) can finish these levels with wayyyyyy fewer tries, this is still so impressive from something with no consciousness.
That's the point: it's a learning AI, it's not supposed to get it right the first time. Eventually it would be better than the best human player.
@@xxxD3FC0N_1xxx Actually no, it wouldn't be better than a human. Change the map or enemy slightly and the AI would crumble. Considering it only went for the safe route, a human would be faster and take less time completing this.
@@xxxD3FC0N_1xxx actually no! If you changed anything major about, say, the map at 8:50, Loki would freak out and take millions of tries to figure it out again. It could get very good at this specific map but nothing else. It can't figure out how to apply the "knowledge" from this to a changed terrain. It just eventually figured out "these motions get me positive rewards and avoid the negative".
@@eldritchcupcakes3195 that's called overfitting, usually AI is trained on a lot of different data to prevent this
@@ethantasti2521 if you trained it on a wide variety of different maps, it would probably become much better than any person
1:30 ULTIMATE SPASM GO
I love how, right at the end of his journey, he doesn't just collect all the reward, he also jumps around, which looks like he really does have consciousness and is happy to see so much reward! It looks really nice and interesting)
Ok so basically training an AI is like beating ur kid if it doesnt bring you beer and giving it candy if it does
In this video: programmer explains criminal psychology without realizing it.
Loki is Low-key one of the AIs of all time
bros learning ai to evade taxes 💀
Teaching?
I think there should be a negative reward for coming into close proximity of a coin and then leaving proximity without collecting it.
I love how patient you are 😊
Thanks !
When Loki moves randomly he kinda looks like a speedrunner lol
Very good work! also you could include what the AI is receiving as input too
Infantile robber tries to steal drugs in broad daylight while avoiding police officers: visualised
I wonder if you can make it solely focus on the positive rewards of the coins and learn that obstacles are naturally detrimental because of the way they prevent the collection of coins
I have experience training an early AI / LLM (me along with many other associates) at a large wealth management firm, starting in 2009 and leaving in 2015. It was not at all a primary focus or task we had to do, but very simply … we did it voluntarily when we had time. This video is a very good explanation for people new to AI of how it works in general, for such a complex area of study.
There should have been a boing sound when he jumps
Man I had this exact intrusive thought when I was programming Loki LOL, but it would've made the audio chaotic so I ditched the idea.
next video: " AI cops learn """pattern recognition""" "
next video: i reprogrammed elons self driving cars to outrun police cars
5:35 So like people
i love how at 6:49 it almost looks like he's taunting the cop lmao
The music is too loud
next video: AI Learns To Evade Taxes
This is the easy mode. The police just rewards you less
Cool Video! Would love more of an end goal to it though..
This was really cool to watch! I wonder how it would go if you made a city for Loki to run from police in. It’d be interesting seeing if Loki develops an optimal route to go.
Is that the cat ninja music???? I loved that flash game so much as a child
Well
Well
Well
Nice! I just came back after a day and he's at 1k, keep it up!👍
Reinforcement learning is such a cool concept. It just learns things by trial and error, just like people do.
I’m going to use this knowledge to get away with violent crimes
Now optimize him
welcome to our new friend loki! :DDDDDDDD
do part two but loki has to learn how to use legs
200th subscriber here you have earned a sub keep up the work bro 💯💪🙏
You are a Legend, Thanks a Lot 😎👊
Earned my sub! Keep it up!
Sir, you are a legend. Thanks a ton.
I know nothing about AI but i think scaling the reward function of coins dependent on the closeness of police could encourage riskier behaviour, as long as contacting the police shortly after would remove those bonuses. or it could be fun to see what the ai does without that safeguard
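A hedged Python sketch of the idea above: coins collected near police pay a bonus, and getting caught shortly afterwards takes those bonuses back. The class name, the linear closeness formula, and the radius and window values are all assumptions made up for illustration.

```python
from collections import deque

class RiskBonusTracker:
    """Pays extra for coins taken near police, with a clawback on capture."""

    def __init__(self, base_reward=1.0, radius=5.0, max_bonus=1.0, window_steps=50):
        self.base_reward = base_reward
        self.radius = radius              # police farther than this give no bonus
        self.max_bonus = max_bonus
        self.window_steps = window_steps  # how long a bonus stays at risk
        self._recent = deque()            # (step, bonus) pairs

    def on_coin(self, step, police_distance):
        """Reward for a coin: base plus a bonus that grows as police get closer."""
        closeness = max(0.0, 1.0 - police_distance / self.radius)
        bonus = self.max_bonus * closeness
        self._recent.append((step, bonus))
        return self.base_reward + bonus

    def on_caught(self, step):
        """Negative reward that removes bonuses earned within the window."""
        clawback = sum(b for s, b in self._recent if step - s <= self.window_steps)
        self._recent.clear()
        return -clawback

tracker = RiskBonusTracker()
near = tracker.on_coin(step=10, police_distance=2.5)   # risky coin
far = tracker.on_coin(step=20, police_distance=10.0)   # safe coin
penalty = tracker.on_caught(step=40)                   # caught soon after
```

Dropping the `on_caught` clawback gives the "without that safeguard" variant the comment is curious about.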
He is so happy at the end lol
shoulda added a reward for going closer towards the coins and for going faster (and negative reward for going slower aka jumping around)
this video was recommended to me, most likely by an algorithm.
I am now scared.
cool project! i think the jumping behaviour observed is a result of the raycasts being centred on the character's body; the AI is initiating a jump because it causes the rays to jump with it, which means they don't hit the chasers, so the AI associates jumping with that positive outcome. might be worth only associating being caught with a negative reward for more distinct emergent behaviour
You should teach the cops to catch the robber now
What if the police is also ai
Lupinranger Vs Patranger lookin different
This is so underrated, great job! Also, if possible, can you do a tutorial on how to make these?
Loki is goated fr
"ai will be used to help people!"
ai:
Really like this one keep going 👍
i feel like i just went through all the stages of parenthood with loki
yooo ur content is incredible!, new sub
ayyy thanks!
The way the AI seemed to celebrate at the end was cute :)
bro puts so much effort for his video holy sh*t
It seemed soo happy at the end. Lol.
Next up: AI learns to drive my car to the bank
AI LEARNS TAX EVASION! (real1!1!1!!)
Im glad my kids didnt spam jump when they were infants. Yeeting themselves off the edge of the world tracks though.
Wouldn't it be funny if "loki" comes to the conclusion that stealing is not the glorious purpose he's looking for? 😉
I am Loki of Asgard, and I am burdened with devious purpose!
Little bro breaking some ankles
was it possible to place invisible barriers?
Yes it was possible.
The landing full of coins... truly the best ending to this video
majestic
Thanks man my wheelchair bound sister didn’t stand a chance from the tactics displayed here
Watching this made me think that it is exactly like evolution. For us it might not seem that long, but for Loki it took countless generations to achieve victory. This can be used as proof that we are indeed in a simulated universe and this is exactly how evolution works: our genes just do reinforcement learning.
Is Loki like… relearning everything each level? Or do you keep his knowledge for the next level? Cuz I've seen other AIs that do that more effectively.
Loki was probably like: OK WHAT DID I DO WRONG
9:30
This video was incomplete without you explaining what the architecture, input/output structure, and training algorithm were. We don't just want to see cool art, we want to know that the AI is good.
I dunno, 2.5k people seem to like the video at the time of writing
@@beywheelzhater8930 just because people liked the video doesn't mean they wouldn't like the vid more if it improved
@@beywheelzhater8930 case in point, many other commenters are asking for details on the input structure, reward function, etc.
If AI starts hacking banks, this guy is gonna be held for accusations
My timbers are shivering
Bro's called lowkey
😂
instructions unclear, i got caught stealing orphans and they sent me to the shadow realm irl
bro
i know right, so strange... they used to send me to the 4th dimension but yesterday they sent me there
When pillars decide to run away from police pillars
Really cool!
Soo he's got no eyes to actually see where the cop is and respond like a human?
He got Raycast sensors
I’ll be taking notes.
If you are gonna make a sequel, you should try procedurally generating the map to make the AI more general-purpose.
Finally, a real world use case for AI
wtf did i stumble upon
you should make it so he can fight the police in dire circumstances
Loki has spent a few million years in prison now
Thank you!!
AI is gonna learn to shoplift now
AI like this could benefit from getting points for surviving for long periods of time. That should, in theory, cause the character to do whatever allows it to prolong the run as much as possible, such as avoiding police.
Ironically, it does exactly the opposite with that reward mechanism. The AI simply stays away from the officers and keeps jumping around in safe areas to survive till the end of round, getting rewards for doing nothing essentially!
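The failure described in this reply can be shown with a little arithmetic. In the Python sketch below, the function name and every number are invented for illustration: with a per-step survival bonus, an agent that hides for the whole round can out-earn one that goes for coins and gets caught.

```python
def episode_return(steps_survived, coins, caught,
                   survival_bonus=0.01, coin_reward=1.0, caught_penalty=1.0):
    """Total reward for one episode under a survival-bonus scheme.

    All reward values are assumed, not taken from the video.
    """
    total = steps_survived * survival_bonus + coins * coin_reward
    if caught:
        total -= caught_penalty
    return total

# Hides in a safe corner for the full 1000-step round, collects nothing.
idle = episode_return(steps_survived=1000, coins=0, caught=False)

# Grabs 5 coins but is caught at step 300.
risky = episode_return(steps_survived=300, coins=5, caught=True)

print(idle > risky)   # the do-nothing policy scores higher
```

Whether idling wins depends on the ratio between the survival bonus and the coin reward, which is why such a bonus needs to be kept very small if it is used at all.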
i wonder what sparked this idea
finally i can hop on this dude to escape the feds from all those endangered animal killing, trespassing, animal abuse, and a kill count in the triple digits!
When i was scrolling past i thought the thumbnail was a gen z joke
I thought the video name was I train ai to overrun the police
Maybe in the future 😅
@@cozmouz you didn't specify. DO IT NOW