AI Learns to DESTROY pensioner AIs (Mario Kart Wii)

AI Tango

Published 22 Jun 2024
An AI races against Mario Kart Wii's own AIs and shows them what next-gen AI has to offer. This video makes many improvements upon my last Mario Kart video, AI Learns to beat other AIs.
If you have any questions, I'd love to answer them!
00:00 Introduction
1:20 New Rewards
5:16 Training Starts
7:12 20 Hours Training
9:17 Final AI
Gaming

COMMENTS • 194

@Vexxter 1 year ago ⁺⁷¹
Something that could help is giving a reward based on the distance driven between checkpoints. That way the AI will try to minimise how far it drives while still achieving the same goal, which should result in taking turns tighter.
@aitango 1 year ago ⁺²⁴
That’s a really interesting idea, I didn’t think of focusing on distance instead of just time between checkpoints
@52flyingbicycles 1 year ago ⁺¹⁹
The fastest route between two checkpoints is not necessarily the shortest one. Basic cornering theory: you start wide, cut in, then end wide. If you take the shortest route (hug the inner edge) then you will either be going too fast to finish the corner or you will be going slow enough to make the turn but be ultimately slower than the competition.
@noahwhipkey6262 1 year ago ⁺¹
@@52flyingbicycles it's literally just a simple optimization problem, speed vs distance whilst maximizing both. just a system of equations
@stilliving 1 year ago
@@noahwhipkey6262 doesn't sound like there's anything "just" in what you said. Seems real complicated to the uninitiated mind
@noahwhipkey6262 1 year ago
@@stilliving meh. just a bunch of matrices
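To make the thread above concrete, here is a minimal sketch of a checkpoint reward that charges for both distance driven and time taken. It is not the reward used in the video: the function names, the weights `w_dist` and `w_time`, the `base` bonus, and the `(x, z)` position samples are all assumptions chosen for illustration.

```python
import math

def path_length(positions):
    """Total distance travelled along a list of (x, z) position samples."""
    return sum(math.dist(a, b) for a, b in zip(positions, positions[1:]))

def checkpoint_reward(positions, elapsed_s, w_dist=0.01, w_time=1.0, base=10.0):
    """Bonus for reaching a checkpoint, minus a cost for distance and for time.

    All weights are hypothetical; they would need tuning in practice.
    """
    return base - w_dist * path_length(positions) - w_time * elapsed_s

# Two hypothetical lines through the same pair of checkpoints.
tight = [(0, 0), (50, 0), (100, 0)]    # shortest route
wide = [(0, 0), (50, 40), (100, 0)]    # longer route

# Same time: the shorter line earns more.
print(checkpoint_reward(tight, 2.0) > checkpoint_reward(wide, 2.0))
# Wide line driven faster: the time term can outweigh the extra distance.
print(checkpoint_reward(wide, 2.0) > checkpoint_reward(tight, 2.5))
```

Keeping a time term alongside the distance term is what lets the "start wide, cut in, end wide" line win when it really is faster; a pure distance reward would always favour hugging the inside.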
@FranXiT 1 year ago ⁺²²²
Very good progress, a shame that it still got obsessed with dodging speed boosts XD
@aitango 1 year ago ⁺⁶⁰
Thanks! I was a little sad about that too, but I’m sure I can fix it in the next version!
@RealSkello 1 year ago ⁺²
@AI Tango yay! Next version!
@RealSkello 1 year ago
@AI Tango yay! Next version!
@redyoshi793 1 year ago ⁺¹
I think franx is the ultimate representation of a person digging his own grave
@Galakktika 1 year ago ⁺⁶
@@redyoshi793 why?
@colororb4105 1 year ago ⁺¹³⁷
Maybe you could give it a reward for holding a wheelie for longer, so that it can get more of that speed boost instead of randomly jumping
To help with over-fitting, maybe you could play with items on. Have Funky just spam his use-item button at all times
Great vid!
@aitango 1 year ago ⁺⁴⁵
I think that’s a good idea, wheelies are definitely a problem. Spam using items isn’t ideal, but definitely the most realistic solution so I might give it a go
@AxidoDE 1 year ago ⁺¹⁶
With items on, the "reset on low speed" rule would be pointless, if not counterproductive. Any hit with an item would trigger it. That rule limits the AI in the first place by discouraging it from even considering shortcuts, so getting rid of it might be a good thing.
@SandTurtle 1 year ago ⁺³
@@aitango i think giving it one more input would probably be best. let it decide when to throw items. make sure red and green are distinguishable with the greyscale method you are using if you plan to do this though
@SandTurtle 1 year ago ⁺³
@@AxidoDE maybe it could be a limit of how long it can go below a certain speed, with it allowing for recovery from items
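The "limit of how long it can go below a certain speed" idea can be sketched as a small counter. This is only an illustration: the class name, the 40 km/h threshold and the 90-frame grace period are hypothetical values, not numbers from the video.

```python
class LowSpeedReset:
    """Signal an episode reset only after speed stays low for too many frames."""

    def __init__(self, min_speed=40.0, grace_frames=90):
        self.min_speed = min_speed        # hypothetical threshold in km/h
        self.grace_frames = grace_frames  # hypothetical recovery allowance
        self.slow_frames = 0

    def update(self, speed):
        """Feed one frame's speed; return True when the episode should reset."""
        if speed < self.min_speed:
            self.slow_frames += 1
        else:
            self.slow_frames = 0  # recovered, so forgive the slow spell
        return self.slow_frames > self.grace_frames
```

A brief slowdown from an item hit would fit inside the grace period, while an agent stuck against a wall would still be reset once the counter runs out.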
@farazmirza6048 1 year ago ⁺¹¹
Hey Tango, this is amazing! I couldn't have imagined so much progress in such a short amount of time. I really think your contribution is extremely valuable to the MKWii community and beyond.
@aitango 1 year ago
Thanks so much! Feels great to hear comments like this and really makes me want to keep producing good content!
@NateM135 1 year ago ⁺⁵
Awesome! I really enjoyed seeing some of the code/output of variables you are working with for rewards. Other people probably gave that same feedback on the last episode but I'm very glad to see you included it :) Nice video!
@aitango 1 year ago
Good to know! Thank you very much!
@X4R80 1 year ago ⁺⁴²
Very important improvement: give it the option to turn without drifting, since it hops around a lot for alignment, which you can only avoid by giving it the option not to drift, and maybe even give it a penalty if it starts drifting without getting the mini turbo (not quite sure about that one tho)
@aitango 1 year ago ⁺²²
That’s a very valid point, it should definitely help with all the hopping and is something I will add to the next version. Rather than a penalty for not getting a turbo, I will give a penalty for dropping below 75km/h since spam hopping will cause this and I already have access to speed. Plus if it needs to do a single hop, this shouldn’t invoke the penalty
@52flyingbicycles 1 year ago ⁺²
Yeah. Even though handling is considered a weak stat in MKW, vehicles still benefit from the ability to turn without drifting. It’s also possible the “wheelie” gene gets killed early by locking in and driving straight into the off-road, whereas small adjustments from repeatedly hopping produce a lot of early offspring due to the ability to maneuver around the other CPUs
@lenabeck989 1 year ago ⁺²
Great improvement on the last video!
@aitango 1 year ago ⁺¹
Thanks, always love to hear feedback!
@rasmuspedersen4891 1 year ago ⁺⁴
to combat the overfitting more you might wanna spread the spawn positions more, since these spawn positions were pretty close
or if you want to experiment with more tracks i think modding in some SNES Mario Circuit variants could make it easier since all those tracks have the same color scheme, making it easier for the ai to carry over visual understanding
@aitango 1 year ago
Yeah, spreading out the starts should help to give the AI a wider range of experiences. True, if I can find tracks with similar features, such as the family of Mario Circuit tracks, it will probably make learning across tracks much faster
@ryanm6268 1 year ago ⁺⁸
Insane to think that a bot is playing at that level, great job! I would suggest something like a penalty for multiple drifts in a certain timeframe (like multiple in one second) to help with the rehopping and incentivizing the wheelie with no turn
@aitango 1 year ago ⁺¹
Thank you so much! My next video didn't include this, but future videos might, as I think this is the most foolproof way to stop the spam hopping.
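A penalty for "multiple drifts in one second" could look roughly like the sliding-window counter below. It is a sketch under assumptions: the class name, the 60-frame window (one second at an assumed 60 frames per second), the allowance of one hop per window and the penalty size are all hypothetical.

```python
from collections import deque

class HopPenalty:
    """Penalise a hop when too many hops fall inside a sliding frame window."""

    def __init__(self, window_frames=60, max_hops=1, penalty=-1.0):
        self.window_frames = window_frames  # assumed: 60 frames is about 1 s
        self.max_hops = max_hops            # hops allowed per window for free
        self.penalty = penalty
        self.hop_frames = deque()           # frame numbers of recent hops

    def step(self, frame, hopped):
        """Return the penalty (or 0.0) for this frame."""
        if hopped:
            self.hop_frames.append(frame)
        # Forget hops that have slid out of the window.
        while self.hop_frames and frame - self.hop_frames[0] >= self.window_frames:
            self.hop_frames.popleft()
        if hopped and len(self.hop_frames) > self.max_hops:
            return self.penalty
        return 0.0
```

A single hop to start a drift costs nothing here; only a second hop within the same window is punished, which targets the spam hopping without discouraging drifting itself.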
@owenkeith1188 1 year ago ⁺⁴
Liked and Subbed. Can't wait for the next Mario Kart Wii AI video!
In terms of what you could improve on in the next iteration (from the perspective of a computer science theory major with an emphasis on computational learning theory): Typically the more assumptions about the way we "think" an AI should perform that we program into our models, the worse they perform. Good examples of this are chess and Go AIs. The best Go AI wasn't given any information other than the game rules, and it outperformed heuristic-based models that were given some human assumptions about strategy.
Essentially, you will likely not achieve significantly better results by looking at the ways this current iteration under-performed and trying to remedy them one-by-one. Have you looked at any of the current research on racing game AIs? I can send you a couple papers that might help a lot.
@aitango 1 year ago ⁺¹
Thanks! Some papers would help a lot, I'm curious what others did. I would make a far simpler reward function, however I lean towards using heavily descriptive reward functions, because despite potentially being less optimal in terms of final performance, they can speed up training significantly, which I need to do since I only have a single desktop computer. There's definitely a better balance I need to find though
@ezintheplaywork6212 1 year ago ⁺¹
this circuit never seems that long except in that video 🤣
@aurastrike 1 year ago ⁺¹
I feel like you should try to reward the AI for taking smooth racing lines as that will encourage the AI to not jump back and forth. I don't know how easy that would be, but it might help the AI look more human-like and improve its cornering times in general
@aitango 1 year ago
I've been wanting to figure out a reward for taking good lines, I just haven't been able to come up with a good solution. Originally I tried using the game's race completion variable, and giving it a higher reward for taking fewer steps to get to the same location (i.e. taking better lines), however the game's variable was just too unreliable for this to really work
@johnsimpsen5 1 year ago ⁺¹
It’d be interesting to reward the AI based on the amount of time it could go straight forward in a wheelie, it would get rid of the random hopping problem
@aitango 1 year ago
In my most recent video I actually included something similar to this. The AI would get increasingly more reward for going at higher speeds between 84 and 97 km/h, giving it an incentive to wheelie for longer, as this is the range of speeds the AI moves through when wheelieing
@austin2000coursey 1 year ago ⁺³
Loving the videos! I’m curious about some of the more technical aspects of the RL algorithm you’re using. Specifically, you kind of imply you’re using something like a Deep Q Network when you mention that you’re feeding in image frames. Is this what you’re using?
Do you think you could use data from your emulator to more efficiently learn this task with another algorithm (a model-based alg instead of model-free)? Like by feeding in the position of your agent, speed, other players, walls, etc. instead of just an image or stacked frames?
@aitango 1 year ago ⁺³
Glad to hear it! I'm using an algorithm called Rainbow DQN, which is a very advanced version of a Deep Q Network with many improvements. In theory I could use data from the emulator to use a model-based algorithm, however retrieving data such as speed and race completion takes substantial effort. I've gotten reasonably good at it now, but it still takes me some time to do. One idea I'd love to explore more is model-based algorithms where the model is learnt rather than given. These often require significant computational resources, but provide amazing results
@TASSnoop 1 year ago ⁺⁵
This is an incredibly interesting project. As someone who spends a lot of time optimizing this game, I would love to see how far this is able to be pushed. It's unfortunate that the AI cancels wheelies so often. After the first time it makes sense, since Funky's acceleration while in a wheelie is not very good, so you are usually better off drifting for a mini turbo to get back to max speed. I wonder what would be a good way to teach the AI that you want to be in a wheelie for max speed as much as possible and use mini turbos and drifting to reach that max speed.
@OrangeYTT 1 year ago
Yooo tas snoop! Used to watch your taf montages
@aitango 1 year ago ⁺²
Thanks! It's fun to hear the opinions of experts such as yourself! I've been looking into rewards for going over 88 km/h, as this should encourage the AI to hold the wheelie to pick up the reward. The results of that will be shown in the next video!
@farazmirza6048 1 year ago
Let's go, the TAS community found this channel!
@aaronkbutler 1 year ago ⁺¹
What great progress! I was wondering if you could provide updates about how the AI’s best completion times compare to the world record on each course with each video.
@aitango 1 year ago ⁺³
Thanks! I might add something like that in for reference in future videos to give some perspective
@haunter013 1 year ago ⁺¹
i like this. what's the song at the beginning of the training montage though? banger for sure
@aitango 1 year ago ⁺¹
Glad to hear it! That song is called Good Times by Patrick Patrikios
@ethandavis7310 1 year ago ⁺¹
I like that bots are included in the training. This adds a great amount of variation to prevent overfitting. It's nice to see that the AI can still perform even after bumping or getting a draft. Unfortunate that this variation is kinda lost once the AI improves to the point of frontrunning the whole race
@aitango 1 year ago
Yeah the other racers do provide a lot of variation in the input, but once the AI starts frontrunning it's just a time trial at that point. I might add some more different start locations in future videos so that the AI is forced to race against others at all points of the track. The only reason I've avoided this so far is because getting all the savestates takes a while!
@PigletTube 1 year ago
Do this for next ai video:
Do a side by side of what we see and what the ai sees
Show the neural network of the ai
I also love this content. Keep it UPPP
@aitango 1 year ago ⁺¹
Thanks, I love to hear it! I definitely plan on showing what the AI sees, however the neural network is a bit problematic since it’s huge (three convolutional layers with 32, 64, 64 channels and a fully connected layer with 512 neurons)
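To get a feel for how big that network is, the weights can be counted with plain arithmetic. Only the channel counts (32, 64, 64) and the 512-neuron layer come from the comment above. The input of 4 stacked 84x84 greyscale frames and the kernel/stride choices (8/4, 4/2, 3/1) are assumptions borrowed from the classic DQN layout, and the count ignores the output head and any Rainbow-specific extras.

```python
def conv_out(size, kernel, stride):
    """Output width/height of an unpadded convolution."""
    return (size - kernel) // stride + 1

def count_params(in_channels=4, in_size=84,
                 convs=((32, 8, 4), (64, 4, 2), (64, 3, 1)), hidden=512):
    """Count weights and biases up to and including the hidden dense layer.

    convs holds (out_channels, kernel, stride) per layer; the kernel and
    stride values are assumed, not taken from the video.
    """
    total, channels, size = 0, in_channels, in_size
    for out_channels, kernel, stride in convs:
        total += channels * out_channels * kernel * kernel + out_channels
        channels, size = out_channels, conv_out(size, kernel, stride)
    flat = channels * size * size       # features fed to the dense layer
    total += flat * hidden + hidden
    return total, flat

total, flat = count_params()
print(total, flat)
```

Under these assumptions almost all of the weights sit in the dense layer rather than the convolutions, which is one reason a node-by-node diagram of the network would be hard to show on screen.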
@CyberCat3O 1 year ago ⁺⁵
I kinda feel instead of getting a reward for the time between checkpoints, it would be more effective if it got a reward based on the current race time at each checkpoint or when it finishes a lap but I'm not entirely sure.
Also after this ai gets good at mkwii, it would be very interesting to see what would happen if it played another mario kart game like 8 and how much skill would carry over if any.
@alansmithee419 1 year ago ⁺⁵
From what I can tell the AI literally just works with image input.
The different art style of mario kart 8 alone might be enough to completely baffle it. It might not have any idea of what to do.
That said, just learning how to differentiate what is road and what is not is an ability it might be able to pick up generally, regardless of art style, so as you say it may be interesting to see.
@flooferdog2311 1 year ago ⁺⁵
I feel like doing this on Time Trials would be really cool
@aitango 1 year ago ⁺¹
Check out my other videos! Before doing AI vs AI, all my videos were time trial based! They didn't have mushrooms, but apart from that it was a time trial!
@Luna5829 1 year ago ⁺¹
You should definitely do it so that it checks how much speed it's gaining, and the more speed it gains the bigger of a reward you give it
@xxhollyxx97 1 year ago ⁺¹
ur voice is so calming
@aitango 1 year ago
Thanks!
@franciszekjanusztalarek7017 1 year ago ⁺¹
It would be awesome to see an AI playing in teams that tries to push the other team into worse positions instead of trying to beat everyone
@aitango 1 year ago ⁺¹
It would be interesting to see it just hunt down the enemy team, would be entertaining
@Jess-uk8bj 1 year ago ⁺³
I'm curious, is it possible to save what the AI has learned and have it pick up training later? I'm assuming this is running for 50 hours straight, but I could be wrong.
If it's possible, I think it would be interesting if a final agent from another video could pick up where it left off to become even stronger.
@aitango 1 year ago ⁺²
It is running for 50 hours straight generally, but I am able to save training and continue later. I plan on exploring this idea soon! The only reason I haven’t in the past is because I keep changing the rewards drastically, which makes it difficult for the AI to use what it knows.
@ZachAttackIsBack 1 year ago ⁺¹
In the game Perfect Dark, you could play with customizable bots. For example, you could pick the grudge bot, who would attack the last person to kill him. Or slapper bot only used his fists in combat. It would be fun to have similar options for each AI opponent in Mario Kart.
@aitango 1 year ago ⁺¹
That does sound cool! The game would be more interesting if the built in AIs had more personality than just being so homogeneous
@Nick-yq5uz 1 year ago ⁺¹
Great work on this!! I apologize if you already answered this one, but by chance have you released the code anywhere? I’d love to take a peek and see what’s under the hood
@aitango 1 year ago
Thanks! I haven't released the code yet, but I'm looking to do so reasonably soon! I was thinking of doing this as a 1000 subscriber special, so if you want to see it be sure to subscribe! I would likely do a whole video on it, since the way it's set up makes it very difficult to get working just from code, as it requires some work on the side of the Dolphin emulator.
@ThorinWolf 1 year ago ⁺¹
This is pretty darn good, but I ofc have one main criticism: you can't just train it on 3 tracks and have it be able to figure out what to do, like what you can do with the Trackmania AIs. I'm guessing that here that would require some object detection module running at the same time though.
@aitango 1 year ago ⁺¹
I don't think object detection is required for this form of AI, in fact Deep Reinforcement Learning (what I'm using) has been used before to teach a single AI to play 57 different Atari games to above human standard. The main issue I currently have is limited compute resources to train the AI, as learning multiple tracks can take a huge amount of training
@vincenzofranchelli2201 1 year ago ⁺¹
needs to be punished for cancelling wheelies and not getting a miniturbo
@aitango 1 year ago
I'm currently looking into the solution of extra rewards for speed of over 88km/h and penalties for under 78km/h. A drift is 84km/h, so this will encourage the AI to get mini turbos (they give speed of over 100), but when the AI keeps cancelling wheelies the speed can drop into the 60s-70s, so it will be punished. In addition, wheelies cause the speed to reach up to 97km/h, so it will be rewarded for holding the wheelie.
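The thresholds in that reply can be written down as a tiny shaping term. The 88 and 78 km/h cut-offs come from the comment; the function name and the size of the bonus and penalty are hypothetical, and the real reward may combine this with other terms.

```python
def speed_shaping(speed_kmh, bonus_above=88.0, penalty_below=78.0,
                  bonus=0.5, penalty=-0.5):
    """Extra reward above one speed threshold, penalty below another."""
    if speed_kmh > bonus_above:
        return bonus
    if speed_kmh < penalty_below:
        return penalty
    return 0.0

# Speeds quoted in the comment: plain drift, wheelie, cancelled wheelie.
for label, speed in [("drift", 84.0), ("wheelie", 97.0), ("cancelled", 65.0)]:
    print(label, speed_shaping(speed))
```

With these cut-offs a plain drift at 84 km/h is neutral, a held wheelie or mini turbo earns the bonus, and the 60s-70s speeds seen after cancelled wheelies are punished.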
@sahil_tayade 1 year ago ⁺¹
Hey have you heard of test data? You should try taking the final AI and running it on a new course, or new starting position that it did not train on. I'm very interested in how this goes! Keep improving!
@sahil_tayade 1 year ago ⁺¹
By the way, huge fan of this project. Would love a tutorial video on how to run this code so we can play around with the reward function! Could even make this open source on github 👀👀
@aitango 1 year ago ⁺¹
I have! I actually did that in a previous video of mine! The AI did kind of ok, but really wasn't great, indicating there was some level of overfitting going on, but I'm hoping to improve this over time. I am currently working on making the code accessible, however the current version is very hacky and unconventional, so even if I open sourced it in its current form it probably wouldn't be usable by anyone but me. I am hoping to make a new version which is faster and easier to use, so that is coming soon!
@sahil_tayade 1 year ago
@@aitango that's awesome! I'm glad I subscribed to you :)
@beaksters 1 year ago ⁺²
God I love AI, except I will always prefer natural learning AIs
@aitango 1 year ago
Glad to hear it! What do you mean by natural learning AIs?
@guardianwaldo 1 year ago ⁺¹
So, fundamentals of teaching here: make sure you as the teacher prioritise the right motivators with your students and give them the amount of time and encouragement they need to flesh out their understanding. Literally we are using ai to relearn how to teach things... if only we got the chemicals from our family and friends and communities that we get from watching this ai grow in the exact same way...
@aitango 1 year ago
Interesting point. Reinforcement Learning does show some interesting things about teaching since agents are solely driven by rewards. This of course does sometimes go horribly wrong, which I have to think quite hard about to avoid!
@SonyUSA 1 year ago
Fun, but the other racers may as well not be there, they are invisible to the AI as far as it's concerned and only serve to slow down the start of the race with unknown variables that make it reset itself. The only thing it's (unknowingly) learning is how to manipulate the enemy AI from a fixed starting position until it can get into first place reliably and quickly, at which point the rest of the track is static learning. It helps he's heavier too and can bump most of the racers to reduce rng jitter.
@aitango 1 year ago ⁺¹
Yeah the AI really doesn't care about the other racers, they are just kind of obstacles. They also serve as a lot of noise in the input too compared to when the AI is in first and everything looks the same. In many videos I used multiple starting positions to try and alleviate this, but it's still a tough issue. Would need quite a lot of varied positions to try and fix it
@michaelmoran9020 1 year ago ⁺¹
Mentioned this on the last video, if you're using a CNN have you tried giving it a horizontally symmetric kernel so it will function identically in mirror mode?
@aitango 1 year ago
I am using a CNN, but haven't tried horizontally symmetric kernels. Would be interesting to see though. Furthermore, not using a horizontally symmetric kernel but training the AI on both the mirror and non-mirror version of a track could be an interesting way of preventing overfitting
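Both ideas in this exchange can be sketched on plain nested lists. Nothing here is from the author's code: `symmetrize` is one hypothetical way to make a kernel horizontally symmetric (averaging it with its own mirror image), and `mirror_sample` shows the data-side alternative, where a mirrored frame must be paired with a left/right-swapped action. The three action names are made up for the example.

```python
def hflip(grid):
    """Mirror a 2D grid (list of rows) left to right."""
    return [row[::-1] for row in grid]

def symmetrize(kernel):
    """Average a kernel with its mirror so it equals its own horizontal flip."""
    return [[(a + b) / 2 for a, b in zip(row, row[::-1])] for row in kernel]

# Hypothetical action set; steering must be swapped when the image is mirrored.
MIRROR_ACTION = {"left": "right", "right": "left", "straight": "straight"}

def mirror_sample(frame, action):
    """Turn one (frame, action) training sample into its mirror-mode twin."""
    return hflip(frame), MIRROR_ACTION[action]

kernel = symmetrize([[1, 2, 3], [4, 5, 6]])
print(kernel == hflip(kernel))
print(mirror_sample([[0, 1], [2, 3]], "left"))
```

One caveat worth keeping in mind: a symmetric kernel makes the feature maps of a mirrored image the mirror of the original ones, so the steering outputs would still need the same left/right swap to behave identically in mirror mode.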
@stilliving 1 year ago
The speed reward is giving it a warm time with upcoming corners. Not sure what would help with that
@Spike11302000 1 year ago
I would like to know more about how you are reading the memory values from the game. I've been wanting to do this for a while but I couldn't find a way to read memory from Dolphin. Also, how are you figuring out which memory addresses hold which value, like speed and race completion?
@aitango 1 year ago
The code I show in the video gives most of how to access Dolphin's memory values, just after installing the dolphin memory engine library. Finding the specific memory addresses is really difficult though
@thevillainscott 1 year ago ⁺¹
So this is how Skynet starts, huh?
@aitango 1 year ago
I imagine the first scene of the film would be it playing Mario Kart haha
@somisimons 1 year ago
This is not funky kong, this is flashy kong!
@AudioYT 1 year ago
Subbed
@TheDannyBoy12 1 year ago ⁺³
I’m very impressed, I don’t know if you could optimize it much more.
@aitango 1 year ago ⁺⁵
Glad to hear it! I tried my best, but after finishing I think I needed something to encourage wheelies
@GWLmantap 1 year ago ⁺³
@@aitango yeah man encouraging wheelies, wait do you have a control set for the ai to do wheelie?
Edit: when will you make the ai learn how to rocket start?
@aitango 1 year ago ⁺²
Yeah I can make the AI do wheelies! Rocket starting is a little odd since it’s so different to the rest of the game, it can quite heavily stunt the progress of the ai
@totaleclipse2225 1 year ago ⁺²
You should add item use and enemy hit. That way it can take shortcuts and it could be rewarded for its place.
@aitango 1 year ago
I'm working on it!
@NateM135 1 year ago ⁺¹
For 1:50 how did you get the value 17964696? Is there a wiki page or spreadsheet somewhere with this info?
@aitango 1 year ago
Much to my dismay, there is not! I had to find these values myself, and it can take some substantial effort. Even using a fairly smart search to look through memory, I still had trouble finding them
@NateM135 1 year ago
@@aitango Insane. Is there at least a place that lists all the variables that exist somewhere in MKW’s memory?
@NateM135 1 year ago
@tyler clark Yeah I can’t really find anything either. I tried using the same words he uses in the video as well with dorks and can’t find any info about this anywhere. I also couldn’t find comments in the previous video mentioning the lap completion variable; I assume the people who commented that would know where to find this info
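The thread does not say how the memory search was actually done. A common general technique for this kind of hunt is to take snapshots of memory and keep only the addresses whose values change the way the target variable should. The sketch below runs on made-up dictionaries standing in for memory, not on Dolphin, and every address and value in it is invented for illustration.

```python
def narrow(candidates, before, after, predicate):
    """Keep addresses whose value change between snapshots fits the predicate."""
    return {addr for addr in candidates if predicate(before[addr], after[addr])}

def increased(old, new):
    return new > old

def decreased(old, new):
    return new < old

# Fake snapshots (address -> value) taken at three moments of a race.
idle = {0x10: 5, 0x14: 80, 0x18: 3}
accelerating = {0x10: 5, 0x14: 95, 0x18: 7}
braking = {0x10: 5, 0x14: 60, 0x18: 9}

# Looking for "speed": it should rise when accelerating and fall when braking.
candidates = set(idle)
candidates = narrow(candidates, idle, accelerating, increased)
candidates = narrow(candidates, accelerating, braking, decreased)
print(sorted(hex(addr) for addr in candidates))
```

Each round of filtering discards addresses that behave like timers or constants; on a real game the candidate set starts out enormous, which fits the "substantial effort" described above.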
@morgan0 1 year ago
so crazy idea, give the ai all the wii buttons and some way of selecting multiple (maybe there’s some cutoff for the value to decide if it’s pressed or not, and shaking or turning the wiimote processed in some other way), and let it loose in multiple games. would take longer to train but maybe it would learn a more general understanding of how to play video games on the wii. maybe 3d understanding would conflict with 2d but i’d guess there’s a lot of shared understanding between games in each category.
@morgan0 1 year ago
i think if you had multiple instances of dolphin running, and if you could programmatically pause and start them and send stuff like inputs and savestate loading to the right one, you could have that switch between multiple games without manual input
@aitango 1 year ago
Making a general Wii AI would be really interesting, but sadly can't be done by me alone as this would require a LOT of compute. We're talking like 1 billion frames plus, whereas these AIs are usually trained for between 5-25 million. Perhaps a company like Deepmind could take a shot at it though, or I just need to buy a load of computers haha. Using multiple instances of Dolphin is something I've looked into. It's possible, just a real pain to get set up and would take ages to re-setup for every new AI I make, so I've kept away so far.
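A quick back-of-the-envelope check puts those frame counts in perspective. It uses two figures quoted elsewhere in these comments (about 50 hours per run, emulation at 2x speed) plus two assumptions of mine: the game runs at 60 frames per second and every emulated frame counts as one training frame.

```python
def frames_in(hours, game_fps=60, speed_multiplier=2.0):
    """Frames seen in a given number of wall-clock hours (fps is assumed)."""
    return hours * 3600 * game_fps * speed_multiplier

def training_hours(frames, game_fps=60, speed_multiplier=2.0):
    """Wall-clock hours needed to see a given number of frames."""
    return frames / (game_fps * speed_multiplier) / 3600

print(frames_in(50))                         # one 50-hour run
print(training_hours(1_000_000_000) / 24)    # days for a billion frames
```

Under these assumptions a 50-hour run comes to roughly 21.6 million frames, inside the quoted 5-25 million range, while a billion frames would take on the order of three months on a single emulator instance.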
@littlelum9773 1 year ago ⁺¹
Would it be possible to minimize hopping by allowing turns that aren't drifts?
@aitango 1 year ago
I think that’s something I’m looking to add in the next version as most of the AIs I’ve made have had a problem with alignment due to not being able to just slightly turn
@HugRunner 10 months ago
Really nice video and well done with your AIs. I don't mean this as a critique of you, but I'm still leaning towards the view that this type of AI hasn't really learned anything. It's just forced to act towards a predetermined goal that you have set rules for. Take for example, as a simplified thought experiment, that your goal is for an AI to achieve a goal which we represent as the alphabet, ABCD... etc. If we "reward" getting to B, C, D etc. and punish A, D, B..., eventually even by random picks, we will reach the goal given enough time. Is that really learning though?
To me, the "ultimate" AI learning would be to allow the AI to learn a game through a somewhat simple tutorial where it gets to experience different concepts of the game for a limited amount of time, and then the AI goes through the actual game putting all the pieces together to overcome both new and previously seen obstacles. Obviously this has to be done without 1000000 tries "learning" each level. Haven't seen anything getting close to that so far. (And I'm not saying that we should be there yet or that your videos aren't impressive and entertaining regardless.)
@blackghostcat 1 year ago
I was wondering about something. Does this use a navmesh that gives higher point values based on apexes of turns?
@aitango 1 year ago ⁺¹
No this does not use a Navmesh!
@blackghostcat 1 year ago
@@aitango whoahhhh now I'm hyper curious
@WilsontheWolf 1 year ago ⁺¹
I'm wondering, to prevent over-fitting, could you just change the map every x generations, while keeping the same ai? Might take more time to train, but then it would be suited to a variety of tracks.
@aitango 1 year ago
On an old video (doing time trials rather than vs other AI), I actually trained the AI on multiple tracks. It's hard to say if it prevented overfitting, but it definitely helped the issue. Took ages to train though!
@CodeF53 1 year ago ⁺¹
Time trials and let it use mushrooms
Assign negative weight to using mushrooms that is only outweighed by a good shortcut
@aitango 1 year ago
If I was to add mushrooms, that is a good idea. Otherwise the AI would likely just spam them all at the start unless sufficiently deterred
@Cqrt3r 1 year ago ⁺³
Is there a way you could discourage the constant hopping?
@aitango 1 year ago ⁺²
I’ll have to check the speeds again, but I believe I could use another speed threshold. If the constant hopping causes the speed to go below 84km/h (the speed of a drift), I could give it a negative reward causing it to avoid that behaviour
@gbyt034 1 year ago
@@aitango or you could allow it to turn normally to align drifts
@birchy188 1 year ago ⁺²
Are you running this at 1.0x emulation speed or different for training?
@aitango 1 year ago
This runs at 2x speed during training. I wish I could get it faster, but Dolphin sadly wasn’t built for this type of thing
@RMED24 1 year ago ⁺¹
@@aitango Is this because of the framerate cap limit or because your PC struggles to get past 2x speed?
@aitango 1 year ago
@@RMED24 It's actually running the AI algorithm. Currently my setup runs one frame of the game, then does one training iteration and repeats. Another problem as well though is how quickly Dolphin can accept input, as I'm not sure if it can accept more than 60 unique inputs per second
@coolcat8098 1 year ago ⁺¹
Couldn't you use a few different characters on the same track to prevent overfitting, rather than manipulating spawns, since each driver has different stats?
@aitango 1 year ago
That is an interesting idea, it would force it to be much more general. I've considered using different vehicles as well such as the mach bike, for similar reasons. Perhaps using different spawns/vehicles/characters would create a very general agent
@Cracks094 1 year ago
would it be possible to make the AI "see" the map on the bottom right, or rather the vision cone of that map, and make a decision based on that information? It might keep it from trying to turn left when the track obviously goes to the right
@aitango 1 year ago
The bottom right is currently in the image the AI is given, but it's probably too low resolution to make much out. Would be interesting to see how it would do just with the map though; as you say it should simplify turning significantly, even if it loses some precision by not knowing exactly how close to walls and things it is
@r4z0rb3ard6 1 year ago ⁺¹
now it needs to learn how to use items, how to attack other racers and defend itself from them, plus using shrooms and stars for shortcuts. this would be hard to program
@aitango 1 year ago
I've avoided them so far, but it's definitely possible, would just require a huge amount more training. If I had google's servers to myself I definitely would haha
@depressedowl 1 year ago
Just for curiosity's sake, why not add an object detection model to help provide more information for the AI? One model for object detection that can detect important information about the track, such as the location of the character itself and speed boosts on the track. Then somehow combine that information as rewards to the movement AI.
@depressedowl 1 year ago
Note that i think it might be better for other games such as new super mario bros wii
@aitango 1 year ago
That’s a very interesting idea that definitely has potential, especially for new super Mario bros as you mention. I will have to have a look through the RL literature and see if anyone has done something like this before
@mechaboy95 1 year ago
new thought
find a texture pack that removes textures and just shows the ai what is road (green) offroad (red) boostpanel (yellow) and maybe some other stuff like that
that will probably give it a better understanding of where it is going, idk if this exists, but it's a thought
@aitango 1 year ago
I've had a few comments suggesting this, and I may look into mods/texture packs which do something like this as I think it would be a really interesting comparison to view
@tomrotelli1355 1 year ago ⁺¹
How long do you reckon a human has to play before he can win courses on the hardest difficulty?
@aitango 1 year ago
That’s an interesting question that I haven’t really thought about! It would vary hugely based on the player, but maybe anywhere from 25 hours to 250 hours is my guess! I’m curious to see what others think of that too
@kurtisharen 1 year ago
Once you have an AI learn the track really well, what happens when you drop that same AI onto a new track entirely? Does its existing knowledge carry over and help it at all?
Also, what happens if you tie its current race position into the reward variable? So it gets the highest reward for staying in first place. Could you have it train in multiplayer mode against copies of itself, to accelerate the learning process? Like, one is still in "learning mode" and the opponents are locked at the current "final agent" AI. Do you think that would affect the learning process at all?
@aitango 1 year ago
I explore this a little in my video "Mario Kart Wii but its an AI". Current race position in the reward would help at the start, but then once it reaches first it probably wouldn't do much. Could be an interesting way to accelerate the early stages though. Multiplayer is an interesting idea, as I am looking for ways to speed things up. The only issue is resetting the AI: currently it dies whenever the speed drops below a certain amount, but it would be unclear when to reset it with multiple agents. If I had a good solution to that problem though, it would definitely be good! I'm not sure how having one in learning mode and the others in final agent mode would affect learning, but it would definitely be interesting to see
@kurtisharen 1 year ago ⁺¹
@@aitango Keep in mind that the opponent AI in this case would be advanced enough to more likely pass the learning AI, and keep changing race position. It might still be useful in that case.
As far as when to reset it, it doesn't seem to make any more stupid mistakes at that point; any time it slows down too much after a while appears to be when it's trying to use a shortcut, and even then it gives up after enough penalties. I think that once you get the AI to a point where you can train it against multiples of itself, you don't need to have low speed as the death condition. At that point, you might want to change it to either "Speed = 0" or when the spin-out animation plays, because that would mean it hit something, and you can start teaching it exclusively to avoid obstacles/threats. Doing it this way also means that you can make the opponent AI spam the item button and see if you can teach your still-learning AI to try and evade enemy items.
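The two suggestions in this thread (a race-position term in the reward, and a stricter death condition) can be sketched roughly as below. This is only an illustration, not code from the video: the function names, the `scale` weight, and the `spinning_out` flag are all hypothetical, and how the position or spin-out state would actually be read from the game is left open.

```python
def position_reward(prev_position: int, position: int, scale: float = 0.1) -> float:
    """Small bonus per place gained since the last step, penalty per place lost.

    Positions are 1-based (1 = first place). `scale` is a hypothetical weight,
    kept small so the term does not swamp the other rewards.
    """
    return scale * (prev_position - position)


def is_terminal(speed: float, spinning_out: bool) -> bool:
    """End the episode on a full stop or a spin-out, not on a low-speed threshold.

    `spinning_out` is an assumed flag; the thread does not say how to detect it.
    """
    return speed == 0 or spinning_out
```

Rewarding the change in position rather than the position itself means the term goes quiet once the agent is holding first place, which matches the reply above that it would mostly help in the early stages.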
@montyy3130 1 year ago ⁺¹
You should get this AI to play a t1 mogi in competitive Mario Kart Wii
@aitango 1 year ago
Maybe after about 10000 hours training :)
@montyy3130 1 year ago
Lol
@emperorjustinanthefirstoft6320 1 year ago
Just thinking, I saw it spam hopping, which could be a sign of 2 things. It could be trying to align a drift, even when it isn't necessary, or it could be a sign of overfitting
@aitango 1 year ago
I definitely think it uses the spam hopping to some extent to align for drifts. What do you mean by a sign of overfitting?
@emperorjustinanthefirstoft6320 1 year ago
@@aitango it could be that spam hopping randomly happened to a successful AI so others try to replicate it
@gamerkiba15 1 year ago
bro I can't see Funky Kong and don't remember WOTFI 2022😂😂
@NoFaceShiba 1 year ago
I do wonder how many hours it would take for it to understand items
@aitango 1 year ago
No idea, probably a lot though given how many different interactions there are to learn, such as getting starred, red shelled, blocking shells with items, the list goes on. I'm just hoping to get the AI to win a race with items in, let alone use them effectively!
@NoFaceShiba 1 year ago
@@aitango yeah I'd imagine it's quite a while, it's very interesting that it's even a possibility though
@oktayyildirim2911 1 year ago ⁺¹
Cool video, but your audio mixing could use some work; the music was much louder than your commentary.
@aitango 1 year ago ⁺¹
Yeah I noticed that, I think in future videos I'll try and balance it out more
@vincenzofranchelli2201 1 year ago ⁺¹
Is it possible to teach it using some human races as well, to learn things it's unlikely to figure out itself?
@aitango 1 year ago
It's definitely possible, there's a fairly simple method called Imitation Experience Replay which basically just allows me to give the AI access to some human gameplay. This could drastically help the AI explore in the right direction
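One generic way to "give the AI access to some human gameplay" is to keep recorded human transitions in the replay buffer alongside the agent's own and mix both into every training batch. The sketch below shows that general pattern only; it is not claimed to be the method named above or anything from the video, and the class name, `demo_fraction`, and the transition format are assumptions.

```python
import random
from collections import deque


class MixedReplayBuffer:
    """Replay buffer that mixes fixed human demonstrations with agent experience."""

    def __init__(self, capacity, demo_fraction=0.25, seed=None):
        self.agent = deque(maxlen=capacity)  # oldest agent transitions are evicted
        self.demos = []                      # human transitions are never evicted
        self.demo_fraction = demo_fraction   # assumed share of each batch
        self.rng = random.Random(seed)

    def add_demo(self, transition):
        self.demos.append(transition)

    def add(self, transition):
        self.agent.append(transition)

    def sample(self, batch_size):
        # Take up to demo_fraction of the batch from demos, the rest from the agent.
        n_demo = min(len(self.demos), int(batch_size * self.demo_fraction))
        n_agent = min(len(self.agent), batch_size - n_demo)
        batch = self.rng.sample(self.demos, n_demo)
        batch += self.rng.sample(list(self.agent), n_agent)
        self.rng.shuffle(batch)
        return batch
```

Keeping the demonstrations in a separate list means they stay available for the whole run, even after the agent's own early experience has been overwritten.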
@SimpleAmadeus 1 year ago
I find it suspicious that the AI does really well and then, as soon as it reaches a new PB, completely forgets what a road is and immediately attacks the nearest wall. This does suggest that it is using its node formulas to encode a sort of memory of the track, through trial and error, rather than actually responding to its environment in the moment.
@aitango 1 year ago
For learning a single track, especially at the start, the network is likely to overfit, but that isn't really avoidable given the AI's limited experience. The hope however is that as it gets further through the track and explores more different areas, it will struggle to overfit, and be forced to learn a more general policy
@SimpleAmadeus 1 year ago
@@aitango Wouldn't it have a really hard time switching to a completely different generalist approach, when everything it knows is the "memorization" approach? Intuitively this feels like teaching the AI to build a very specific traffic-light intersection, then urging it to optimize that intersection, and hoping that in this optimization it will suddenly invent the roundabout. I would expect it to stick to its known paradigm.
@sonicwaveinfinitymiddwelle8555 1 year ago ⁺²
next time make it use items
@aitango 1 year ago
It's on my to-do list, but will take quite a while to train!
@brawlfan 1 year ago
Hey if you get a friend who has a PC they'd be willing to use for AI training you could do 2 tracks at once.
edit: Actually, what about training with CTGP's item rain? The chaos could help the AI get better.
Also, has the AI learnt to use items and if not will it?
@aitango 1 year ago
I wish I had a friend who would let me do that to their PC haha! Training on item rain would be interesting as it is just complete chaos, plus it would be fun to see it learn to pick up bullets and stars and things. I haven't done items yet, but it is something I really want to do in the future. It has quite a few technical challenges which make it difficult, but if I can solve those I'll be straight on it!
@itz_ult 1 year ago
Do Delfino Square next
@aitango 1 year ago
Perhaps I can make that happen
@Flarefan60 1 year ago
I bet you can encourage it to actually stay on the road by rewarding the color of the road on both sides of the character
@vincenzofranchelli2201 1 year ago
At some point, even though it adds a huge layer of complexity, you've gotta teach it to use items
@aitango 1 year ago
It's on my list, but I'm working up to that one slowly as there's a chance it'll take like a year to train
@RustyNova 1 year ago
Why not reuse the model from last time? Could maybe help in speeding up the process and prevent overfitting
@aitango 1 year ago
Typically I would, and that is something I will do in the future. The only reason I didn't is because, while the knowledge is transferable, I keep changing the reward function, and these AIs typically struggle to transfer between tasks when the reward changes. When I stop messing around with the reward function, it'll definitely be something I'll do
@RustyNova 1 year ago
@@aitango I didn't know that changing the reward was harder for the AI. The more you know
@MineBuoy 1 year ago ⁺¹
Why does it keep jumping tho? :P
@aitango 1 year ago ⁺¹
My guess is partially for alignment, and partially to avoid getting wheelie bumped, but that's just speculation
@farazmirza6048 1 year ago
@@aitango Yeah it's very likely that's what's causing it since turning in a wheelie immediately reduces speed. The main challenge now is creating an AI that learns how to get proper alignments.
@RitaTheCuteFox 10 months ago
Last Mario Kart Video, huh?
@martilechadescals653 1 year ago
Hi, nice improvement. Could you please share this code too??
@aitango 1 year ago ⁺¹
Thanks! I will need to do a video when I release the code as it requires quite a lot of setup rather than just the source code. Was planning to do it as a 1000 subscriber special or something like that
@jaroto12 10 months ago
So basically you made an AI with a praise kink and told it to play Mario Kart?
@stargaming5912 1 year ago
add higher rewards than 1 or -1
like -2 for losing a position in the game
and -4 for getting last
@aitango 1 year ago
I typically try to keep the rewards in quite a small range, because they can add up quite quickly and neural networks are designed to predict values relatively close to 0. For example, a reward of 0.5 might not seem like a lot, but if it gets 0.5 every frame, it will end up predicting a value of 47.6, which is way too high. For one-off rewards though, such as finishing the race, I do use higher values such as +12 for finishing.
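To see why a small per-frame reward turns into a large predicted value, it helps to sum it with a discount factor. The sketch below assumes a standard discounted return with γ = 0.99; the video does not state the discount factor or horizon actually used, so the exact 47.6 figure is not reproduced here, only the order of magnitude.

```python
from typing import Optional


def discounted_return(reward_per_step: float, gamma: float,
                      steps: Optional[int] = None) -> float:
    """Sum of reward_per_step * gamma**t for t = 0 .. steps-1.

    With steps=None the infinite-horizon limit r / (1 - gamma) is returned.
    """
    if steps is None:
        return reward_per_step / (1.0 - gamma)
    return reward_per_step * (1.0 - gamma ** steps) / (1.0 - gamma)


GAMMA = 0.99  # assumed discount factor, not taken from the video

every_frame = discounted_return(0.5, GAMMA)    # 0.5 on every frame, forever
one_off = discounted_return(12.0, GAMMA, 1)    # +12 received once
print(every_frame, one_off)
```

Under that assumption a constant 0.5 per frame is worth about 50 in total, several times more than a single +12 for finishing, which is why the per-frame terms are kept small.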
@noobtracker 1 year ago
Can the AI "see" the time?
@aitango 1 year ago
The timer is in the region of the screen the AI can see, so yes. No idea if it ever actually uses it though
@noobtracker 1 year ago
@@aitango ok, I hope that it doesn't just learn at which times it should output something ...
@aitango 1 year ago
Using multiple start locations will prevent this as it will start in different places with different times. Plus looking at the final agent it completes the track in a variety of different times
@noobtracker 1 year ago
@@aitango So it can't just learn the 4 (there are 4 starting positions, right?) variations?
@Fiscooemoismydaddy 1 year ago
I know
@aitango 1 year ago
What do you know?
@Fiscooemoismydaddy 1 year ago
@@aitango look below
@carlrygwelski586 1 year ago ⁺¹
Impressive, now stop "playing" with the AI. 🤣
@DreamyToast1 1 year ago
Your so looks like it's in last
@noahwhipkey6262 1 year ago ⁺¹
this is weak. optimal AI bunny hops so fast it's like a hovercraft
@Litschi21 1 month ago ⁺¹
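Code:
import time

import dolphin_memory_engine as dme
import numpy as np

dme.hook()
print(dme.is_hooked())

# The comment as posted reads speed and race completion from the same address;
# one of them is presumably a copy-paste slip, so substitute the real addresses.
SPEED_ADDR = 17964696
COMPLETION_ADDR = 17964696


class Reward:
    def __init__(self):
        self.last_comp = 0

        # Next race-completion threshold that pays a checkpoint bonus.
        self.check = 1.05
        self.check_inc = 0.05
        self.check_timer = time.time()

    def reward_function(self):
        terminal = False
        speed = dme.read_float(SPEED_ADDR)
        race_completion = dme.read_float(COMPLETION_ADDR)

        if speed < 37:
            # Too slow: penalise and end the episode.
            reward, terminal = -1, True
        else:
            reward = speed / 2200
            # Reward progress made since the previous call.
            if self.last_comp != 0:
                reward += (race_completion - self.last_comp) * 100
            self.last_comp = race_completion
            if race_completion > 3.99:
                # Race finished.
                reward, terminal = 10, True
            else:
                if speed > 100:
                    reward += 0.1
                if race_completion > self.check:
                    # Bonus shrinks the longer the checkpoint took to reach.
                    reward += 1 / (time.time() - self.check_timer)
                    self.check += self.check_inc
                    self.check_timer = time.time()

        if terminal:
            # Restart from one of four spawn locations, each equally likely.
            ran = np.random.random()
            if ran < 0.25:
                pass  # spawn location 1
            elif ran < 0.5:
                pass  # spawn location 2
            elif ran < 0.75:
                pass  # spawn location 3
            else:
                pass  # spawn location 4
        return reward, terminal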
@BlazingImp77151 1 year ago ⁺¹
I find it interesting how the AI was able to recognise the shortcut. Maybe it was building where it would go based off the positions of walls?
I think with this course a way that the AI could've picked up the layout is that grey is good, and red and white stripes means it needs to turn. I wonder what it actually did
@aitango 1 year ago
True, given that it tried to go through the gap that does sound like a plausible explanation!
