AI Learns MARIO KART WII
- Published 15 Jul 2022
- In this video, an AI uses Machine Learning (Reinforcement Learning) to play the Mario Kart Wii time trials on Luigi Circuit, playing as Funky Kong with the Bowser Bike / Flame Runner, using just pixel input.
To give some technical details, this uses Rainbow DQN arxiv.org/abs/1710.02298. All parameters are the same as in the original paper, except for sigma (for noisy nets), which was set to 0.1 instead of 0.5.
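The sigma tweak mentioned above applies to the noisy linear layers from the Noisy Nets component of Rainbow. As a rough, numpy-only sketch (not the video's actual implementation, which presumably uses a deep learning framework), a factorised noisy layer with the smaller sigma initialisation might look like this:

```python
import numpy as np

def factorized_noise(size, rng):
    # f(x) = sign(x) * sqrt(|x|), the noise transform from the Noisy Nets paper
    x = rng.standard_normal(size)
    return np.sign(x) * np.sqrt(np.abs(x))

class NoisyLinear:
    """Factorised noisy linear layer, numpy sketch for illustration only."""

    def __init__(self, in_features, out_features, sigma_init=0.1, seed=0):
        rng = np.random.default_rng(seed)
        bound = 1.0 / np.sqrt(in_features)
        self.w_mu = rng.uniform(-bound, bound, (out_features, in_features))
        self.b_mu = rng.uniform(-bound, bound, out_features)
        # sigma_init = 0.1 here rather than the paper's 0.5, per the description
        self.w_sigma = np.full((out_features, in_features),
                               sigma_init / np.sqrt(in_features))
        self.b_sigma = np.full(out_features, sigma_init / np.sqrt(in_features))
        self.rng = rng
        self.reset_noise()

    def reset_noise(self):
        # Factorised noise: one vector per input, one per output
        eps_in = factorized_noise(self.w_mu.shape[1], self.rng)
        eps_out = factorized_noise(self.w_mu.shape[0], self.rng)
        self.w_eps = np.outer(eps_out, eps_in)
        self.b_eps = eps_out

    def forward(self, x, noisy=True):
        w = self.w_mu + (self.w_sigma * self.w_eps if noisy else 0.0)
        b = self.b_mu + (self.b_sigma * self.b_eps if noisy else 0.0)
        return x @ w.T + b
```

A smaller sigma means less exploration noise injected through the weights, which can help in environments where random actions are costly (like driving off the track).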
This is a really cool project, but there are a lot of tools for better training. You can use programs like Dolphin Memory Engine, the Lua core of Dolphin, or any other memory watcher to take data addresses directly from the game. Some very useful inputs would be vehicle speed (for obvious reasons), mini-turbo charge (so it can actually know how much drifting is needed to get the boost), race completion (a much better value to use as a reward than the minimap head), and many other values related to how the vehicle is moving. Also, from what I can tell, there are some basic restrictions on movement, such as no turning outside of holding the drift button and no holding neutral during a hop, which certainly simplify things but are also some of the reasons why it has to re-hop to realign so much. And it would probably help to use the right vehicle for the track lol.
Thanks for the advice, I'm new to using dolphin so was unaware of some of the useful tools out there. I may look to use memory values instead as it could definitely speed up training as you say. The action space this AI uses is very simplified, with just 5 actions. Reducing the action space massively reduces training time, hence why I tried to limit it as much as possible without drastic performance loss. If memory values turn out to train much faster as I'd expect, I can look to increase the action space.
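The video doesn't list the exact 5 actions, so the mapping below is purely an illustrative guess at what a simplified Mario Kart action space could look like (accelerate, steer left/right, drift left/right):

```python
# Hypothetical 5-action discrete action space. The actual actions used in
# the video are not specified; these button combinations are an assumption.
ACTIONS = {
    0: {"A": True},                            # accelerate straight
    1: {"A": True, "Stick": -1.0},             # steer left
    2: {"A": True, "Stick": 1.0},              # steer right
    3: {"A": True, "B": True, "Stick": -1.0},  # drift left
    4: {"A": True, "B": True, "Stick": 1.0},   # drift right
}

def action_to_inputs(action_id):
    """Map a discrete action index to a full controller state dict."""
    inputs = {"A": False, "B": False, "Stick": 0.0}
    inputs.update(ACTIONS[action_id])
    return inputs
```

With a DQN-style agent the network outputs one Q-value per entry in this table, so every action removed directly shrinks the output layer and the exploration problem.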
I would be curious to see whether there's any transfer learning to other tracks. As far as I understand, one needs to re-tune the AI to new tracks as otherwise it can't perform, but maybe the re-tuning is quicker than training from scratch due to it having learned the basics of Mario Kart.
Yeah I definitely agree. My best guess would be that it would struggle on new tracks, but still be much faster than retraining from scratch as you say. I am considering attempting to train an agent on multiple tracks, so it will be interesting to see, if trained on say 4 tracks, how well it would generalize to other tracks.
@@aitango The limiting case could be leave-one-out cross-validation, where you train the AI on all except one track, and then see how well it generalizes to the single left-out track. Could perhaps be useful to test how well Mario Kart is "learnable" through this method? (Though the training would take a lot of compute...)
@@tailcalled Yeah that would be a really interesting test to do, and would definitely show how well agents generalize between tracks. As you mention, it would take a huge amount of compute, as it took a long time on a single track!
Great project, and really interesting to see how you solved the checkpointing! I'm also currently working on an implementation based on actor-critics. How did you detect driving offroad or crashing? Also, what hardware did you train your model on, and is your environment sped up, or does it learn by playing the game in real time?
Thanks, great to hear! I used the difference in the position of Funky Kong's face on the minimap between frames: if the average distance between frames over the last 8 frames was under a threshold, I would end the episode. It actually worked really robustly! I trained the model on a standard new desktop computer with a single RTX 3070. The environment runs in real time, with the AI experiencing the environment at the same rate as a human would.
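The minimap-based crash detection described above can be sketched in a few lines. The 8-frame window comes from the reply; the distance threshold and pixel coordinates are assumptions:

```python
from collections import deque
import math

class StuckDetector:
    """Ends the episode when the minimap icon has barely moved over the
    last `window` frames, per the heuristic described above. The
    threshold value here is an illustrative assumption."""

    def __init__(self, window=8, threshold=2.0):
        self.window = window
        self.threshold = threshold
        self.positions = deque(maxlen=window + 1)

    def update(self, x, y):
        """Feed the icon's pixel position; returns True if the kart looks stuck."""
        self.positions.append((x, y))
        if len(self.positions) <= self.window:
            return False  # not enough history yet
        dists = [math.dist(self.positions[i], self.positions[i + 1])
                 for i in range(len(self.positions) - 1)]
        return sum(dists) / len(dists) < self.threshold
```

This works as a crash/offroad proxy because both cases show up the same way on the minimap: the icon stops making progress.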
@@aitango Nice, thanks, that's good to know. Since I'm aiming to do some transfer learning to let the model generalize further, I'm hoping that this will speed up the training process :)
If you're already using Dolphin for this, I'd suggest switching to the Lua core of Dolphin that they use for TASing, because it lets you access the memory values a lot more easily, and you could feed those to the AI too.
Yeah that would definitely be worth a try, interesting to see if memory values or pixel values works better.
@@aitango Memory values will almost certainly work better.
@@vabold_ if he wanted to be a pixel purist you probably could find a way to extract most of that data from the minimap icon + torch thing
@@gumbo64 You can't derive the multiple velocity vectors from pixels because the only visible updates are when all of them are combined.
@@vabold_ what are the multiple velocity vectors? isn't it just normal x and z velocity + maybe drifting stuff but its always drifting so it should be learnable right
So this is where it all started. Like the 100 hours literally felt like a standard casual player with basic knowledge playing. If you showed that to someone who didn’t know what AI was, they would think it’s a real player. Great video!
Really glad you enjoyed it! I'd be really curious if people could tell the difference between this and a human playing!
One thing I've noticed is the AI would probably do much better if it were rewarded for the time spent holding a drift and for which level of boost it gets from releasing it. If you're still reading these comments, I'm curious whether that's something you've implemented in the reward function since?
This is super cool
I'm curious to see if an AI would learn snaking with enough training in Mario Kart DS
Really cool
What are the benefits of using checkpoints instead of other methods, for example being on the track and moving forward in the right direction?
Checkpoints are a good option mostly for convenience and robustness, as they are quite simple. They do, however, have the downside of the AI learning where the checkpoints are rather than how to drive, causing it to generalise less. Being on the track and moving in the right direction would be better; however, this information was quite difficult to acquire. Using the work of the TAS community in Mario Kart, in later videos I was able to access some variables such as the speed, which worked much better. "Going in the right direction" turned out to be quite problematic though: I originally tried making my own system, but it ended up being too inaccurate to reliably reward the AI. Using Mario Kart's internal position tracking had problems too, as it also wasn't accurate enough to use on a frame-by-frame basis.
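A checkpoint reward of the kind discussed above can be sketched as a list of target positions that the kart's minimap coordinates must pass through in order. The coordinates and radius below are made-up illustrative values, not taken from the video:

```python
import math

class CheckpointReward:
    """Minimal checkpoint-based reward sketch: +1 each time the kart's
    minimap position comes within `radius` of the next checkpoint.
    Checkpoint coordinates and radius are illustrative assumptions."""

    def __init__(self, checkpoints, radius=10.0):
        self.checkpoints = checkpoints
        self.radius = radius
        self.next_idx = 0  # checkpoints must be hit in order

    def step(self, x, y):
        target = self.checkpoints[self.next_idx]
        if math.dist((x, y), target) < self.radius:
            # Advance to the next checkpoint, wrapping at the lap boundary
            self.next_idx = (self.next_idx + 1) % len(self.checkpoints)
            return 1.0
        return 0.0
```

Requiring checkpoints in order is what makes this robust: driving backwards or cutting across the infield earns nothing. The downside, as the reply notes, is that the agent can memorise the checkpoint layout of one track rather than learning to drive in general.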
Interesting video! Although I would suggest that you fast-forward through the training part since seeing Funky Kong go from the start to the point where he resets is kinda boring
Glad you think so! In future I might do like 2x speed or something and focus on the more interesting bits
Quick question, as I don't really have any knowledge: the other day I was watching how the AI bot for Rocket League was made, and they used multiple instances of the game and changed the game speed. Is there any way you could use this method to speed up the learning?
The Wii emulator sadly doesn't really support multiple instances... but I may have found a workaround (maybe new video soon :))
Did you set up your encoder so that it was equivariant with respect to mirroring the screen? If you do that, then it should generalize to mirror mode trivially. Also, have you tried supervised pre-training, where you train it to match a TAS / human run first and only then use the DQN objective?
What encoder are you referring to? The image is just fed into a convolutional neural network. The input is a greyscale pixel array, created simply by converting down a larger RGB pixel array.
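The greyscale conversion described in that reply can be sketched without any image library. The 84x84 output size is an assumption borrowed from the standard Atari DQN setup, not stated in the video:

```python
import numpy as np

def preprocess_frame(rgb, out_h=84, out_w=84):
    """Convert an RGB frame (H, W, 3, uint8) to a small greyscale float
    array, roughly as described above. Output size is an assumption."""
    # Standard luminance weights for RGB -> greyscale
    grey = rgb.astype(np.float32) @ np.array([0.299, 0.587, 0.114],
                                             dtype=np.float32)
    h, w = grey.shape
    # Crude nearest-neighbour downsample via index selection
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    small = grey[rows][:, cols]
    return (small / 255.0).astype(np.float32)
```

Shrinking and greyscaling the frame like this drastically reduces the input dimensionality the CNN has to deal with, at the cost of throwing away colour cues.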
Also, supervised pre-training is something I'm looking into! Definitely potential there.
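The mirror-equivariance idea from the question above doesn't require a special encoder; one cheap approximation is to average the Q-values of the frame and its horizontal flip, with left/right actions swapped. Everything here is an assumption for illustration: the `q_fn` interface, the 5-action space, and the permutation.

```python
import numpy as np

# Hypothetical permutation swapping left<->right actions in a 5-action
# space of [straight, left, right, drift-left, drift-right].
MIRROR_PERM = [0, 2, 1, 4, 3]

def mirror_averaged_q(q_fn, frame):
    """Average Q-values over a greyscale frame and its left-right mirror.

    `q_fn` maps a 2-D frame to a 1-D array of per-action Q-values."""
    q = q_fn(frame)
    q_flipped = q_fn(frame[:, ::-1])  # flip the frame horizontally
    # Re-align mirrored actions before averaging
    return 0.5 * (q + q_flipped[MIRROR_PERM])
```

By construction this estimator gives mirrored states mirrored value estimates, which is the property that would let an agent transfer to mirror mode for free.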
CPU: Wins by cheating
AI: Wins by practicing more than any human will be able to
I don't know, I've seen some humans who have spent an absurd amount of time playing Mario Kart haha
@@aitango
Humans need to sleep at some point
what resources would be good to learn how to do this myself?
How come you used the Flame Runner over the Spear?
I actually wonder if the Spear's poor drift would make it less likely for the AI to crash into a wall or go offroad
I was considering seeing how this AI performed on other tracks, so I wanted the AI to learn with the Flame Runner, but the Spear definitely would've been better for Luigi Circuit
Hi, nice project. Could you please share the code??
Thanks! The code is coming at some point, perhaps in a few videos' time. I'll need to do a video to go along with it, because the code alone won't let you run the AI, as it requires some setup.
@@aitango great! I'd love to give this a try on some other game. Not sure which one yet.
please share the code
Currently the code is quite a mess, as I'm still working on this project and hoping to make some more videos on it. Once I finish the project I'll neaten up the code a little and open-source it.
@@aitango awesome. thanks a lot