AI Learns MARIO KART WII
- Published 15 Jul 2022
- In this video, an AI uses Machine Learning (Reinforcement Learning) to play the Mario Kart Wii time trials on Luigi Circuit, playing as Funky Kong with the Bowser Bike / Flame Runner, using just pixel input.
To give some technical details, this uses Rainbow DQN arxiv.org/abs/1710.02298. All parameters are the same as in the original paper, except for sigma (for noisy nets), which was set to 0.1 instead of 0.5.
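The sigma tweak mentioned above applies to the noisy linear layers from the Noisy Nets component of Rainbow. As a rough, numpy-only sketch (not the video's actual implementation, which presumably uses a deep learning framework), a factorised noisy layer with the smaller sigma initialisation might look like this:

```python
import numpy as np

def factorized_noise(size, rng):
    # f(x) = sign(x) * sqrt(|x|), the noise transform from the Noisy Nets paper
    x = rng.standard_normal(size)
    return np.sign(x) * np.sqrt(np.abs(x))

class NoisyLinear:
    """Factorised noisy linear layer, numpy sketch for illustration only."""

    def __init__(self, in_features, out_features, sigma_init=0.1, seed=0):
        rng = np.random.default_rng(seed)
        bound = 1.0 / np.sqrt(in_features)
        self.w_mu = rng.uniform(-bound, bound, (out_features, in_features))
        self.b_mu = rng.uniform(-bound, bound, out_features)
        # sigma_init = 0.1 here rather than the paper's 0.5, per the description
        self.w_sigma = np.full((out_features, in_features),
                               sigma_init / np.sqrt(in_features))
        self.b_sigma = np.full(out_features, sigma_init / np.sqrt(in_features))
        self.rng = rng
        self.reset_noise()

    def reset_noise(self):
        # Factorised noise: one vector per input, one per output
        eps_in = factorized_noise(self.w_mu.shape[1], self.rng)
        eps_out = factorized_noise(self.w_mu.shape[0], self.rng)
        self.w_eps = np.outer(eps_out, eps_in)
        self.b_eps = eps_out

    def forward(self, x, noisy=True):
        w = self.w_mu + (self.w_sigma * self.w_eps if noisy else 0.0)
        b = self.b_mu + (self.b_sigma * self.b_eps if noisy else 0.0)
        return x @ w.T + b
```

A smaller sigma means less exploration noise injected through the weights, which can help in environments where random actions are costly (like driving off the track).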
This is a really cool project, but there are a lot of tools for better training. You can use programs like Dolphin Memory Engine, the Lua core of Dolphin, or any other memory watcher to take data addresses directly from the game. Some very useful inputs would be vehicle speed (for obvious reasons), mini-turbo charge (so it can actually know how much drifting is needed to get the boost), race completion (a much better value to use as a reward than the minimap head), and many other values related to how the vehicle is moving. Also, from what I can tell, there are some basic restrictions on movement, such as no turning outside of holding the drift button and no holding neutral during a hop, which certainly simplify things but are also some of the reasons why it has to re-hop to realign so much. And it would probably help to use the right vehicle for the track lol.
Thanks for the advice, I'm new to using dolphin so was unaware of some of the useful tools out there. I may look to use memory values instead as it could definitely speed up training as you say. The action space this AI uses is very simplified, with just 5 actions. Reducing the action space massively reduces training time, hence why I tried to limit it as much as possible without drastic performance loss. If memory values turn out to train much faster as I'd expect, I can look to increase the action space.
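The video doesn't list the exact 5 actions, so the mapping below is purely an illustrative guess at what a simplified Mario Kart action space could look like (accelerate, steer left/right, drift left/right):

```python
# Hypothetical 5-action discrete action space. The actual actions used in
# the video are not specified; these button combinations are an assumption.
ACTIONS = {
    0: {"A": True},                            # accelerate straight
    1: {"A": True, "Stick": -1.0},             # steer left
    2: {"A": True, "Stick": 1.0},              # steer right
    3: {"A": True, "B": True, "Stick": -1.0},  # drift left
    4: {"A": True, "B": True, "Stick": 1.0},   # drift right
}

def action_to_inputs(action_id):
    """Map a discrete action index to a full controller state dict."""
    inputs = {"A": False, "B": False, "Stick": 0.0}
    inputs.update(ACTIONS[action_id])
    return inputs
```

With a DQN-style agent the network outputs one Q-value per entry in this table, so every action removed directly shrinks the output layer and the exploration problem.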
I would be curious to see whether there's any transfer learning to other tracks. As far as I understand, one needs to re-tune the AI to new tracks as otherwise it can't perform, but maybe the re-tuning is quicker than training from scratch due to it having learned the basics of Mario Kart.
Yeah I definitely agree. My best guess would be that it would struggle on new tracks, but still be much faster than retraining from scratch as you say. I am considering attempting to train an agent on multiple tracks, so it will be interesting to see, if trained on say 4 tracks, how well it would generalize to other tracks.
@@aitango The limiting case could be leave-one-out cross-validation, where you train the AI on all except one track, and then see how well it generalizes to the single left-out track. Could perhaps be useful to test how well Mario Kart is "learnable" through this method? (Though the training would take a lot of compute...)
@@tailcalled Yeah that would be a really interesting test to do, and would definitely show how well agents generalize between tracks. As you mention, it would take a huge amount of compute, as it took a long time on a single track!
Great project, and really interesting to see how you solved the checkpointing! I'm also currently working on an implementation based on actor-critics. How did you detect driving offroad or crashing? Also, what hardware did you train your model on, and is your environment sped up, or does it learn by playing the game in real time?
Thanks, great to hear! I used the difference in the position of Funky Kong's face on the minimap between frames: if the average distance between frames over the last 8 frames was under a threshold, I would end the episode. It actually worked really robustly! I trained the model on a standard new desktop computer with a single RTX 3070. The environment runs in real time, with the AI experiencing the environment at the same rate as a human would.
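The minimap-based crash detection described above can be sketched in a few lines. The 8-frame window comes from the reply; the distance threshold and pixel coordinates are assumptions:

```python
from collections import deque
import math

class StuckDetector:
    """Ends the episode when the minimap icon has barely moved over the
    last `window` frames, per the heuristic described above. The
    threshold value here is an illustrative assumption."""

    def __init__(self, window=8, threshold=2.0):
        self.window = window
        self.threshold = threshold
        self.positions = deque(maxlen=window + 1)

    def update(self, x, y):
        """Feed the icon's pixel position; returns True if the kart looks stuck."""
        self.positions.append((x, y))
        if len(self.positions) <= self.window:
            return False  # not enough history yet
        dists = [math.dist(self.positions[i], self.positions[i + 1])
                 for i in range(len(self.positions) - 1)]
        return sum(dists) / len(dists) < self.threshold
```

This works as a crash/offroad proxy because both cases show up the same way on the minimap: the icon stops making progress.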
@@aitango Nice, thanks, that's good to know. Since I'm aiming to do some transfer learning to let the model generalize further, I'm hoping that this will speed up the training process :)
If you're already using Dolphin for this, I'd suggest switching to the Lua core of Dolphin that they use for TASing, because it lets you access the memory values a lot more easily, and you could feed those to the AI too.
Yeah that would definitely be worth a try, interesting to see if memory values or pixel values works better.
@@aitango Memory values will almost certainly work better.
@@vabold_ if he wanted to be a pixel purist you probably could find a way to extract most of that data from the minimap icon + torch thing
@@gumbo64 You can't derive the multiple velocity vectors from pixels because the only visible updates are when all of them are combined.
@@vabold_ what are the multiple velocity vectors? isn't it just normal x and z velocity + maybe drifting stuff but its always drifting so it should be learnable right
So this is where it all started. Like the 100 hours literally felt like a standard casual player with basic knowledge playing. If you showed that to someone who didn’t know what AI was, they would think it’s a real player. Great video!
Really glad you enjoyed it! I'd be really curious if people could tell the difference between this and a human playing!
One thing I've noticed is the AI would probably do much better if it were rewarded for the time spent holding a drift and for which level of boost it gets from releasing it. If you're still reading these comments, I'm curious whether that's something you've implemented in the reward function since?
This is super cool
I'm curious to see if an AI would learn snaking with enough training in Mario Kart DS
Really cool
What are the benefits of using checkpoints instead of other methods, for example being on the track and moving forward in the right direction?
Checkpoints are a good option mostly for convenience and robustness, as they are quite simple. They do, however, have the downside of the AI learning where the checkpoints are rather than how to drive, causing it to generalise less. Being on the track and moving in the right direction would be better; however, this information was quite difficult to acquire. Using the work of the TAS community in Mario Kart, in later videos I was able to access some variables such as the speed, which worked much better. "Going in the right direction" turned out to be quite problematic though: I originally tried making my own system, but it ended up being too inaccurate to reliably reward the AI. Using Mario Kart's internal position tracking had problems too, as it also wasn't accurate enough to use on a frame-by-frame basis.
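A checkpoint reward of the kind discussed above can be sketched as a list of target positions that the kart's minimap coordinates must pass through in order. The coordinates and radius below are made-up illustrative values, not taken from the video:

```python
import math

class CheckpointReward:
    """Minimal checkpoint-based reward sketch: +1 each time the kart's
    minimap position comes within `radius` of the next checkpoint.
    Checkpoint coordinates and radius are illustrative assumptions."""

    def __init__(self, checkpoints, radius=10.0):
        self.checkpoints = checkpoints
        self.radius = radius
        self.next_idx = 0  # checkpoints must be hit in order

    def step(self, x, y):
        target = self.checkpoints[self.next_idx]
        if math.dist((x, y), target) < self.radius:
            # Advance to the next checkpoint, wrapping at the lap boundary
            self.next_idx = (self.next_idx + 1) % len(self.checkpoints)
            return 1.0
        return 0.0
```

Requiring checkpoints in order is what makes this robust: driving backwards or cutting across the infield earns nothing. The downside, as the reply notes, is that the agent can memorise the checkpoint layout of one track rather than learning to drive in general.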
Interesting video! Although I would suggest that you fast-forward through the training part since seeing Funky Kong go from the start to the point where he resets is kinda boring
Glad you think so! In future I might do like 2x speed or something and focus on the more interesting bits
Quick question, as I don't really have any knowledge: the other day I was watching how the AI bot for Rocket League was made, and they used multiple instances of the game and changed the game speed. Is there any way you could use this method to speed up the learning?
The Wii emulator sadly doesn't really support multiple instances... but I may have found a workaround (maybe new video soon :))
Did you set up your encoder so that it was equivariant with respect to mirroring the screen? If you do that, then it should generalize to mirror mode trivially. Also, have you tried supervised pre-training, where you train it to match a TAS / human run first and only then use the DQN objective?
What encoder are you referring to? The image is just fed into a convolutional neural network. The input is a greyscale pixel array, created simply by converting down a larger RGB pixel array.
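The greyscale conversion described in that reply can be sketched without any image library. The 84x84 output size is an assumption borrowed from the standard Atari DQN setup, not stated in the video:

```python
import numpy as np

def preprocess_frame(rgb, out_h=84, out_w=84):
    """Convert an RGB frame (H, W, 3, uint8) to a small greyscale float
    array, roughly as described above. Output size is an assumption."""
    # Standard luminance weights for RGB -> greyscale
    grey = rgb.astype(np.float32) @ np.array([0.299, 0.587, 0.114],
                                             dtype=np.float32)
    h, w = grey.shape
    # Crude nearest-neighbour downsample via index selection
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    small = grey[rows][:, cols]
    return (small / 255.0).astype(np.float32)
```

Shrinking and greyscaling the frame like this drastically reduces the input dimensionality the CNN has to deal with, at the cost of throwing away colour cues.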
Also, supervised pre-training is something I'm looking into! Definitely potential there.
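The mirror-equivariance idea from the question above doesn't require a special encoder; one cheap approximation is to average the Q-values of the frame and its horizontal flip, with left/right actions swapped. Everything here is an assumption for illustration: the `q_fn` interface, the 5-action space, and the permutation.

```python
import numpy as np

# Hypothetical permutation swapping left<->right actions in a 5-action
# space of [straight, left, right, drift-left, drift-right].
MIRROR_PERM = [0, 2, 1, 4, 3]

def mirror_averaged_q(q_fn, frame):
    """Average Q-values over a greyscale frame and its left-right mirror.

    `q_fn` maps a 2-D frame to a 1-D array of per-action Q-values."""
    q = q_fn(frame)
    q_flipped = q_fn(frame[:, ::-1])  # flip the frame horizontally
    # Re-align mirrored actions before averaging
    return 0.5 * (q + q_flipped[MIRROR_PERM])
```

By construction this estimator gives mirrored states mirrored value estimates, which is the property that would let an agent transfer to mirror mode for free.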
CPU: Wins by cheating
AI: Wins by practicing more than any human will be able to
I don't know, I've seen some humans who have spent an absurd amount of time playing Mario Kart haha
@@aitango
Humans need to sleep at some point
what resources would be good to learn how to do this myself?
How come you used the Flame Runner over the Spear?
I actually wonder if the Spear's poor drift would make it less likely for the AI to crash into a wall or go offroad
I was considering seeing how this AI performed on other tracks, so I wanted the AI to learn with the Flame Runner, but the Spear definitely would've been better for Luigi Circuit
Hi, nice project. Could you please share the code??
Thanks! The code is coming at some point, perhaps in a few videos' time. I'll need to do a video to go along with it, because the code alone won't let you run the AI, as it requires some setup.
@@aitango great! I'd love to give this a try on some other game. Not sure which one yet.
please share the code
Currently the code is quite a mess, as I'm still working on this project and hoping to make some more videos on it. Once I finish the project I'll neaten up the code a little and open-source it.
@@aitango awesome. thanks a lot