AI Learns to DESTROY pensioner AIs (Mario Kart Wii)
- Published 19 Jun 2024
- An AI races against Mario Kart Wii's own AIs, and shows them what next gen AI has to offer. This video makes many improvements over my last Mario Kart video, "AI Learns to beat other AIs".
If you have any questions I love to answer them!
00:00 Introduction
1:20 New Rewards
5:16 Training Starts
7:12 20 Hours Training
9:17 Final AI
Very good progress, a shame that it still got obsessed with dodging speed boosts XD
Thanks! I was a little sad about that too, but I’m sure I can fix it in the next version!
@AI Tango yay! Next version!
I think franx is the ultimate representation of a person digging his own grave
@@redyoshi793 why?
Maybe you could give it a reward for holding a wheelie for longer, so that it can get more of that speed boost instead of randomly jumping
To help with over-fitting, maybe you could play with items on. Have Funky just spam his use-item button at all times
Great vid!
I think that's a good idea, wheelies are definitely a problem. Spamming items isn't ideal, but it's definitely the most realistic solution, so I might give it a go
With items on the "reset on low speed" rule would be pointless, if not counterproductive. Any hit with an item would trigger it. That rule limits the AI in the first place by disincentivizing the AI to even consider shortcuts, so getting rid of it might be a good thing.
@@aitango I think giving it one more input would probably be best. Let it decide when to throw items. Make sure red and green are distinguishable with the greyscale method you are using if you plan to do this though
@@AxidoDE maybe it could be a limit of how long it can go below a certain speed, with it allowing for recovery from items
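A minimal sketch of that suggestion, assuming the training script can read the kart's speed every frame. The threshold and grace period are placeholder values, not the ones used in the video, and `LowSpeedReset` is a hypothetical helper: the episode only resets once the kart has stayed slow for a sustained stretch, so a brief item hit or bump does not end it.

```python
class LowSpeedReset:
    """Hypothetical reset rule: end the episode only after the kart has been
    below `min_speed` for more than `grace_frames` consecutive frames."""

    def __init__(self, min_speed=50.0, grace_frames=90):
        self.min_speed = min_speed        # km/h, placeholder value
        self.grace_frames = grace_frames  # 90 frames is about 1.5 s at 60 fps
        self.slow_frames = 0

    def should_reset(self, speed_kmh):
        if speed_kmh < self.min_speed:
            self.slow_frames += 1
        else:
            self.slow_frames = 0          # any recovery clears the counter
        return self.slow_frames > self.grace_frames
```

The grace period trades off between tolerating item hits and wasting frames on an agent that is genuinely stuck; it would need tuning against how long a spin-out actually keeps the kart slow.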
Something that could help is giving a reward based on the distance driven between checkpoints. This way the AI will try to minimise how far it drives while still achieving the same goal, which should result in tighter turns
That's a really interesting idea, I hadn't thought of focusing on distance instead of just time between checkpoints
The fastest route between two checkpoints is not necessarily the shortest one. Basic cornering theory: you start wide, cut in, then end wide. If you take the shortest route (hugging the inner edge), you will either be going too fast to finish the corner or going slow enough to make the turn but ultimately slower than the competition.
@@52flyingbicycles it's literally just a simple optimization problem, speed vs distance, whilst optimizing both. Just a system of equations
@@noahwhipkey6262 doesn't sound like there's anything "just" in what you said. Seems real complicated to the uninitiated mind
@@stilliving meh. just a bunch of matrices
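One way to express the distance idea as a shaping term, assuming (x, z) kart positions can be sampled from the emulator. This is a hypothetical sketch, not something shown in the video: compare the straight-line distance between two checkpoints with the distance actually driven, so wobbling or hopping lowers the score.

```python
import math


def path_length(points):
    """Total distance driven along a sequence of (x, z) positions."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def efficiency_reward(points):
    """Hypothetical shaping term: straight-line distance between the first and
    last point (e.g. two checkpoints) divided by the distance actually driven.
    1.0 means a perfectly straight path; any detour lowers the value."""
    driven = path_length(points)
    if driven == 0:
        return 0.0
    return math.dist(points[0], points[-1]) / driven
```

As the cornering reply points out, the shortest path is not the fastest one, so a term like this would only make sense as a small addition to a time-based reward rather than a replacement for it.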
Hey Tango, this is amazing! I couldn't have imagined so much progress in such a short amount of time. I really think your contribution is extremely valuable to the MKWii community and beyond.
Thanks so much! Feels great to hear comments like this and really makes me keep wanting to produce good content!
Very important improvement: give it the option to turn without drifting, since it hops around a lot for alignment, which you can only avoid by giving it the option not to drift, and maybe even give it a penalty if it starts drifting without getting the mini turbo (not quite sure about that one tho)
That’s a very valid point, it should definitely help with all the hopping and is something I will add to the next version. Rather than a penalty for not getting a turbo, I will give a penalty for dropping below 75km/h since spam hopping will cause this and I already have access to speed. Plus if it needs to do a single hop, this shouldn’t invoke the penalty
Yeah. Even though handling is considered a weak stat in MKW, vehicles still benefit from the ability to turn without drifting. It’s also possible the “wheelie” gene gets killed early by locking in and driving straight into the off-road, whereas small adjustments from repeatedly hopping produce a lot of early offspring due to the ability to maneuver around the other CPUs
to combat the overfitting more you might wanna spread the spawn positions more, since these spawn positions were pretty close
or if you want to experiment with more tracks i think modding in some snes mario circuit variants could make it easier since all those tracks have the same color scheme making it easier for the ai to carry over visual understanding
Yeah, spreading out the starts should help give the AI a wider range of experiences. True, if I can find tracks with similar features, such as the family of Mario Circuit tracks, it will probably allow learning between tracks much faster
Awesome! I really enjoyed seeing some of the code/output of variables you are working with for rewards. Other people probably gave the same feedback on the last episode but I'm very glad to see you included it :) Nice video!
Good to know! Thank you very much!
Insane to think that a bot is playing at that level, great job! I would suggest something like a penalty for multiple drifts in a certain timeframe (like multiple in one second) to help with the rehopping and incentivizing the wheelie with no turn
Thank you so much! My next video doesn't include this, but future videos might, as I think this is the most foolproof way to stop the spam hopping.
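A sketch of the "too many drifts in one second" penalty, assuming the script can tell on each frame whether a new hop/drift input started. `HopSpamPenalty` and its default values are hypothetical: it keeps the frame numbers of recent hops in a sliding window and returns a negative reward while the count exceeds a limit.

```python
from collections import deque


class HopSpamPenalty:
    """Hypothetical penalty: while more than `max_hops` hops have started within
    the last `window` frames, return `penalty`; otherwise return 0.0."""

    def __init__(self, window=60, max_hops=1, penalty=-1.0):
        self.window = window        # 60 frames is about 1 second at 60 fps
        self.max_hops = max_hops
        self.penalty = penalty
        self.hop_frames = deque()   # frame numbers of recent hops, oldest first

    def step(self, frame, hop_started):
        if hop_started:
            self.hop_frames.append(frame)
        # forget hops that have fallen out of the sliding window
        while self.hop_frames and frame - self.hop_frames[0] >= self.window:
            self.hop_frames.popleft()
        return self.penalty if len(self.hop_frames) > self.max_hops else 0.0
```

Note that the penalty is paid on every frame while the window is "over budget", so the magnitude should be small relative to the checkpoint rewards.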
Great improvement to the last video!
Thanks, always love to hear feedback!
Liked and Subbed. Can't wait for the next Mario Kart Wii AI video!
In terms of what you could improve in the next iteration (from the perspective of a computer science theory major with an emphasis on computational learning theory): typically, the more assumptions about how we "think" an AI should perform that we program into our models, the worse they perform. Chess and Go AIs are good examples of this. The best Go AI wasn't given any information other than the rules of the game, and it outperformed heuristic-based models that were given some human assumptions about strategy.
Essentially, you will likely not achieve significantly better results by looking at the ways this current iteration under-performed and trying to remedy them one-by-one. Have you looked at any of the current research on racing game AIs? I can send you a couple papers that might help a lot.
Thanks! Some papers would help a lot, I'm curious what others did. I could make a far simpler reward function, however I lean towards heavily descriptive reward functions because, despite potentially being less optimal in terms of final performance, they can speed up training significantly, which I need since I only have a single desktop computer. There's definitely a better balance I need to find though
this circuit never seems that long except in that video 🤣
I feel like you should try to reward the AI for taking smooth racing lines, as that will encourage the AI to not jump back and forth. I don't know how easy that would be, but it might help the AI look more human-like and improve its cornering times in general
I've been wanting to figure out a reward for taking good lines, I just haven't been able to come up with a good solution. Originally I tried using the game's race completion variable, giving it a higher reward for taking fewer steps to get to the same location (i.e. taking better lines); however, the game's variable was just too unreliable for this to really work
It'd be interesting to reward the AI based on the amount of time it could go straight forward in a wheelie; it would get rid of the random hopping problem
In my most recent video I actually included something similar to this. The AI got increasingly more reward for going at higher speeds between 84 and 97 km/h, giving it an incentive to wheelie for longer, since that is the range of speeds the AI travels at while wheelieing
Loving the videos! I’m curious about some of the more technical aspects of the RL algorithm you’re using. Specifically, you kind of imply you’re using something like a Deep Q Network when you mention that you’re feeding in image frames. Is this what you’re using?
Do you think you could use data from your emulator to more efficiently learn this task with another algorithm (a model-based alg instead of model-free)? Like by feeding in the position of your agent, speed, other players, walls, etc. instead of just an image or stacked frames?
Glad to hear it! I'm using an algorithm called Rainbow DQN, which is a very advanced version of a Deep Q Network with many improvements. In theory I could use data from the emulator to use a model-based algorithm, however retrieving data such as speed and race completion takes substantial effort. I've gotten reasonably good at it now, but still takes me some time to do. One idea I'd love to explore more is model based algorithms where the model is learnt rather than given, however these often require significant computational resources, but provide amazing results
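For context, Rainbow DQN combines several extensions to DQN (double Q-learning, prioritized replay, dueling networks, multi-step returns, distributional RL and noisy nets). As an illustration of just one of those pieces, here is the multi-step target in its plain scalar form; Rainbow itself applies it to a return distribution rather than a single value, and this is not the author's code.

```python
def n_step_return(rewards, gamma, bootstrap_value, done):
    """Multi-step target:
    G = r_0 + gamma*r_1 + ... + gamma^(n-1)*r_(n-1) + gamma^n * V(s_n)
    `bootstrap_value` stands in for the target network's estimate at s_n.
    The bootstrap term is dropped if the episode ended within the n steps."""
    g = 0.0
    for k, r in enumerate(rewards):
        g += (gamma ** k) * r
    if not done:
        g += (gamma ** len(rewards)) * bootstrap_value
    return g
```

Using several real rewards before bootstrapping tends to propagate reward (such as a checkpoint bonus) back to earlier frames faster than a one-step target does.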
I like that bots are included in the training. This adds a great amount of variation to prevent overfitting. It's nice to see that the AI can still perform even after bumping or getting a draft. Unfortunate that this variation is kinda lost once the AI improves to the point of frontrunning the whole race
Yeah the other racers do provide a lot of variation in the input, but once the AI starts frontrunning it's just a time trial at that point. I might add some more start locations in future videos so that the AI is forced to race against others at all points of the track. The only reason I've avoided this so far is because getting all the savestates takes a while!
This is an incredibly interesting project. As someone who spends a lot of time optimizing this game, I would love to see how far this is able to be pushed. It's unfortunate that the AI cancels wheelies so often. After the first time it makes sense, since Funky's acceleration while in a wheelie is not very good, so you are usually better off drifting for a mini turbo to get back to max speed. I wonder what would be a good way to teach the AI that you want to be in a wheelie for max speed as much as possible and use mini turbos and drifting to reach that max speed.
Yooo tas snoop! Used to watch your taf montages
Thanks! It's fun to hear the opinions of experts such as yourself! I've been looking into rewards for going over 88 km/h, as this should encourage the AI to hold the wheelie to pick up the reward. The results of that will be shown in the next video!
Let's go, the TAS community found this channel!
I kinda feel instead of getting a reward for the time between checkpoints, it would be more effective if it got a reward based on the current race time at each checkpoint or when it finishes a lap but I'm not entirely sure.
Also after this ai gets good at mkwii, it would be very interesting to see what would happen if it played another mario kart game like 8 and how much skill would carry over if any.
From what I can tell the AI literally just works with image input.
The different art style of mario kart 8 alone might be enough to completely baffle it. It might not have any idea of what to do.
That said, just learning how to differentiate what is road and what is not is an ability it might be able to pick up generally, regardless of art style, so as you say it may be interesting to see.
In the game Perfect Dark, you could play with customizable bots. For example, you could pick the grudge bot, who would attack the last person to kill him. Or slapper bot, who only used his fists in combat. It would be fun to have similar options for each AI opponent in Mario Kart.
That does sound cool! The game would be more interesting if the built in AIs had more personality than just being so homogeneous
What great progress! I was wondering if you could provide updates about how the AI’s best completion times compare to the world record on each course with each video.
Thanks! I might add something like that in for reference in future videos to give some perspective
You should definitely do it so that it checks how much speed it's gaining, and the more speed it gains the bigger of a reward you give it
Do this for next ai video:
Do a side by side of what we see and what the ai sees
Show the neural network of the ai
I also love this content Keep it UPPP
Thanks, I love to hear it! I definitely plan on showing what the AI sees, however the neural network is a bit problematic since it's huge (three convolutional layers with 32, 64 and 64 channels and a fully connected layer with 512 neurons)
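To give a sense of how "huge" that is, here is a rough parameter count. The reply only states the channel counts and the 512-neuron layer; the kernel sizes, strides, 84x84 input and 4 stacked frames below are assumptions borrowed from the classic Nature-DQN layout, and the output head is a plain one-value-per-action layer rather than Rainbow's dueling/distributional head.

```python
def conv_out(size, kernel, stride):
    """Output width/height of an unpadded ('valid') convolution."""
    return (size - kernel) // stride + 1


def dqn_param_count(n_actions, input_size=84, in_channels=4):
    """Weights + biases for an assumed layout:
    conv 32@8x8/4 -> conv 64@4x4/2 -> conv 64@3x3/1 -> FC 512 -> n_actions."""
    layers = [(32, 8, 4), (64, 4, 2), (64, 3, 1)]  # (channels, kernel, stride), assumed
    total, size, ch = 0, input_size, in_channels
    for out_ch, k, s in layers:
        total += ch * out_ch * k * k + out_ch      # conv weights + biases
        size = conv_out(size, k, s)
        ch = out_ch
    flat = ch * size * size                        # 64 * 7 * 7 = 3136 for an 84x84 input
    total += flat * 512 + 512                      # fully connected hidden layer
    total += 512 * n_actions + n_actions           # output layer, one value per action
    return total
```

Under these assumptions the network has roughly 1.7 million parameters, and almost all of them (about 1.6 million) sit in the layer connecting the flattened convolution output to the 512 neurons, which is why drawing the network node by node is impractical.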
This is pretty darn good, but I ofc have one main criticism: you can't just train it on 3 tracks and have it be able to figure out what to do, like what you can do with the trackmania AIs. I'm guessing that here that would require some object detection module running at the same time though.
I don't think object detection is required for this form of AI; in fact, Deep Reinforcement Learning (what I'm using) has been used before to teach a single AI to play 57 different Atari games to above human standard. The main issue I currently have is limited compute resources to train the AI, as learning multiple tracks can take a huge amount of training
needs to be punished for cancelling wheelies and not getting a miniturbo
I'm currently looking into the solution of extra rewards for speed of over 88km/h and penalties for under 78km/h. A drift is 84km/h, so this will encourage the AI to get mini turbos (they give speeds of over 100km/h), but when the AI keeps cancelling wheelies the speed can drop into the 60s-70s, so it will be punished. In addition, wheelies cause the speed to reach up to 97km/h, so it will be rewarded for holding the wheelie.
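The thresholds in that reply translate into a very small shaping function. In this sketch the 88 and 78 km/h cut-offs come from the reply, while the bonus and penalty magnitudes are placeholders and `speed_shaping` is a hypothetical name.

```python
def speed_shaping(speed_kmh, bonus_above=88.0, penalty_below=78.0,
                  bonus=1.0, penalty=-1.0):
    """Per-frame shaping term: bonus above 88 km/h (mini-turbos exceed 100,
    wheelies reach about 97), penalty below 78 km/h (repeated wheelie cancels
    drop into the 60s-70s). A plain drift at about 84 km/h gets neither."""
    if speed_kmh > bonus_above:
        return bonus
    if speed_kmh < penalty_below:
        return penalty
    return 0.0
```

Leaving the ordinary drift speed in a neutral band means the agent is not punished for drifting, only for failing to convert drifts into mini-turbos or for losing speed.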
God I love AI, except I will always prefer natural learning AIs
Glad to hear it! What do you mean natural learning AIs?
It would be awesome to see an AI playing in teams that tries to make the other team finish in worse positions instead of trying to beat everyone
It would be interesting to see it just hunt down the enemy team, would be entertaining
i like this. what's the song at the beginning of the training montage though? banger for sure
Glad to hear it! That song is called Good Times by Patrick Patrikios
I feel like doing this on Time Trials would be really cool
Check out my other videos! Before doing AI vs AI, all my videos were time trial based! They didn't have mushrooms, but apart from that it was a time trial!
I'm curious, is it possible to save what the AI has learned and have it pick up training later? I'm assuming this is running for 50 hours straight, but I could be wrong.
If it's possible, I think it would be interesting if a final agent from another video could pick up where it left off to become even stronger.
It is running for 50 hours straight generally, but I am able to save training and continue later. I plan on exploring this idea soon! The only reason I haven’t in the past is because I keep changing the rewards drastically, which makes it difficult for the AI to use what it knows.
This is not funky kong, this is flashy kong!
So the fundamentals of teaching here: make sure you as the teacher prioritise the right motivators with your students and give them the amount of time and encouragement they need to flesh out their understanding. Literally we are using AI to relearn how to teach things... if only we got the chemicals from our family and friends and communities that we get from watching this AI grow in the exact same way...
Interesting point. Reinforcement Learning does show some interesting things about teaching, since agents are solely driven by rewards. This of course does sometimes make things go horribly wrong, which I have to think quite hard about to avoid!
Fun, but the other racers may as well not be there; they are invisible to the AI as far as it's concerned and only serve to slow down the start of the race with unknown variables that make it reset itself. The only thing it's (unknowingly) learning is how to manipulate the enemy AI from a fixed starting position until it can get into first place reliably and quickly, at which point the rest of the track is static learning. It helps that he's heavier too and can bump most of the racers to reduce RNG jitter.
Yeah the AI really doesn't care about the other racers, they are just kind of obstacles. They also add a lot of noise to the input compared to when the AI is in first and everything looks the same. In many videos I used multiple starting positions to try and alleviate this, but it's still a tough issue. It would need quite a lot of varied positions to fix it
The speed reward is giving it a hard time with upcoming corners. Not sure what would help with that
Hey have you heard of test data? You should try taking the final AI and running it on a new course, or new starting position that it did not train on. I'm very interested in how this goes! Keep improving!
By the way, huge fan of this project. Would love a tutorial video on how to run this code so we can play around with the reward function! Could even make this open source on github 👀👀
I have! I actually did in a previous video of mine! The AI did kind of OK, but really wasn't great, indicating there was some level of overfitting going on, but I am hoping to improve this over time. I am currently working on making the code accessible, however the current version is very hacky and unconventional, so even if I open sourced it in its current form it probably wouldn't be usable to anyone but me. I am hoping to make a new version which is faster and easier to use, so that is coming soon!
@@aitango that's awesome! I'm glad I subscribed to you :)
Great work on this!! I apologize if you already answered this one, but by chance have you released the code anywhere? I’d love to take a peek and see what’s under the hood
Thanks! I haven't released the code yet, however I am looking to do so reasonably soon! I was thinking of doing this as a 1000 subscriber special, so if you want to see it be sure to subscribe! I would likely do a whole video on it, since the way it's set up makes it very difficult to get working just from the code, as it requires some work on the Dolphin emulator side.
so crazy idea, give the ai all the wii buttons and some way of selecting multiple (maybe there’s some cutoff for the value to decide if it’s pressed or not, and shaking or turning the wiimote processed in some other way), and let it loose in multiple games. would take longer to train but maybe it would learn a more general understanding of how to play video games on the wii. maybe 3d understanding would conflict with 2d but i’d guess there’s a lot of shared understanding between games in each category.
i think if you had multiple instances of dolphin running, and if you could programmatically pause and start them and send stuff like inputs and savestate loading to the right one, you could have that switch between multiple games without manual input
Making a general Wii AI would be really interesting, but sadly it can't be done by me alone as this would require a LOT of compute. We're talking 1 billion frames plus, whereas these AIs are usually trained for between 5 and 25 million. Perhaps a company like DeepMind could take a shot at it though, or I just need to buy a load of computers haha. Using multiple instances of Dolphin is something I've looked into. It's possible, just a real pain to set up, and it would take ages to set up again for every new AI I make, so I've stayed away from it so far.
Mentioned this on the last video, if you're using a CNN have you tried giving it a horizontally symmetric kernel so it will function identically in mirror mode?
I am using a CNN, but haven't tried horizontally symmetric kernels. Would be interesting to see though. Furthermore, not using a horizontally symmetric kernel but training the AI on both the mirror and non-mirror version of a track could be an interesting way of preventing overfitting
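The mirror-training idea can be sketched as a data-augmentation step: flip each stored frame left-to-right and swap left/right actions, keeping the reward unchanged. The action names below are hypothetical (the video does not list the agent's action set), and note that HUD elements such as the minimap get flipped as well, which may not match what the game actually shows in Mirror mode.

```python
# Hypothetical discrete action set; the real agent's action list is not given.
MIRROR_ACTION = {
    "straight": "straight",
    "left": "right",
    "right": "left",
    "drift_left": "drift_right",
    "drift_right": "drift_left",
    "wheelie": "wheelie",
}


def mirror_transition(frame, action):
    """Flip a greyscale frame (list of pixel rows) left-to-right and swap
    left/right actions, producing a mirrored training sample."""
    flipped = [list(reversed(row)) for row in frame]
    return flipped, MIRROR_ACTION[action]
```

This is the "train on both versions" option rather than a horizontally symmetric kernel: the network stays unconstrained, but it sees every situation in both orientations.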
ur voice is so calming
Thanks!
So this is how Skynet starts, huh?
I imagine the first scene of the film would be it playing Mario Kart haha
You should add item use and enemy hit. That way it can take shortcuts and it could be rewarded for its place.
I'm working on it!
I would like to know more about how you are reading the memory values from the game. I've been wanting to do this for a while but I couldn't find a way to read memory from Dolphin. Also, how are you figuring out which memory addresses hold which values, like speed and race completion?
The code I show in the video gives most of how to access dolphin's memory values, just after installing the dolphin memory engine library. Finding the specific memory addresses is really difficult though
I’m very impressed, I don’t know if you could optimize it much more.
Glad to hear it! I tried my best, but after finishing I think I needed something to encourage wheelies
@@aitango yeah man encouraging wheelies, wait do you have a control set for the ai to do wheelie?
Edit: when will you make the ai learn how to rocket start?
Yeah I can make the AI do wheelies! Rocket starting is a little odd since it’s so different to the rest of the game, it can quite heavily stunt the progress of the ai
For 1:50 how did you get the value 17964696? Is there a wiki page or spreadsheet somewhere with this info?
Much to my dismay, there is not! I had to find these values myself, and it can take some substantial effort. Even using a fairly smart search to look through memory, I still had trouble finding them
@@aitango Insane. Is there at least a place that lists all the variables that exist somewhere in MKW's memory?
@tyler clark Yeah I can’t really find anything either. I tried using the same words he uses in the video as well with dorks and can’t find any info about this anywhere. I also couldn’t find comments in the previous video mentioning the lap completion variable; I assume the people who commented that would know where to find this info
I'm wondering, to prevent over-fitting, could you just change the map every x generations, while keeping the same ai. Might take more time to train, but then it would be suited to a variety of tracks.
On an old video (doing time trials rather than vs other AI), I actually trained the AI on multiple tracks. Its hard to say if it prevented overfitting, but it definitely helped the issue. Took ages to train though!
Subbed
I was wondering about something. Does this use a navmesh that gives higher point values based on apexes of turns?
No this does not use a Navmesh!
@@aitango whoahhhh now I'm hyper curious
would it be possible to make the Ai "see" the map on the bottom right, or rather the vision cone of that map, and make a decision based on that information? It might keep it from trying to turn left when the track obviously goes to the right
The bottom right is currently in the image the AI is given, but it's probably too low resolution to make much out. It would be interesting to see how it would do with just the map though; as you say, it should simplify turning significantly, even if it loses some precision by not knowing exactly how close it is to walls and things
Would it be possible to minimize hopping by allowing turns that aren't drifts?
I think that’s something I’m looking to add in the next version as most of the AIs I’ve made have had a problem with alignment due to not being able to just slightly turn
Time trials and let it use mushrooms
Assign negative weight to using mushrooms that is only outweighed by a good shortcut
If I was to add mushrooms, that is a good idea. Otherwise the AI would likely just spam them all at the start unless sufficiently deterred
Are you running this at 1.0x emulation speed or different for training?
This runs at 2x speed during training. I wish I could get it faster, but dolphin sadly wasn’t built for this type of thing
@@aitango Is this because of the framerate cap limit or because your PC struggles to get past 2x speed?
@@RMED24 It's actually running the AI algorithm. Currently my setup runs one frame of the game, then does one training iteration and repeat. Another problem as well though is how quickly dolphin can accept input, as I'm not sure if it can accept greater than 60 unique inputs per second
new thought
find a texture pack that removes textures and just shows the ai what is road (green), offroad (red), boost panel (yellow) and maybe some other stuff like that
that will probably give it a better understanding of where it is going, idk if this exists, but its a thought
I've had a few comments suggesting this, and I may look into mods/texture packs which do something like this as I think it would be a really interesting comparison to view
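If such a texture pack existed, turning frames into class labels would be straightforward. This sketch assumes the flat colours suggested in the comment (the exact RGB values are hypothetical) and assigns each pixel to the nearest class colour, which tolerates small shading or compression differences.

```python
# Hypothetical class colours a "flat" texture pack might use (RGB).
CLASS_COLOURS = {
    "road": (0, 255, 0),
    "offroad": (255, 0, 0),
    "boost": (255, 255, 0),
}


def classify_pixel(rgb):
    """Map an RGB pixel to the nearest class colour (squared Euclidean distance)."""
    return min(
        CLASS_COLOURS,
        key=lambda name: sum((a - b) ** 2 for a, b in zip(rgb, CLASS_COLOURS[name])),
    )


def segment(frame):
    """Convert a frame (rows of RGB tuples) into rows of class labels."""
    return [[classify_pixel(px) for px in row] for row in frame]
```

Because the classes are chosen by colour rather than brightness, this also sidesteps the concern raised earlier in the thread about red and green becoming indistinguishable after greyscale conversion.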
now it needs to learn how to use items, how to attack other racers and defend itself from them, plus using shrooms and stars for shortcuts. this would be hard to program
I've avoided them so far, but its definitely possible, would just require a huge amount more training. If I had google's servers to myself I definitely would haha
Really nice video and well done with your AIs. I don't mean this as a critique of you, but I still lean towards the view that this type of AI hasn't really learned anything. It's just forced to act towards a predetermined goal that you have set rules for. Take for example, as a simplified thought experiment, that you want an AI to reach a goal which we represent as the alphabet, ABCD... etc. If we "reward" getting to B, C, D etc. and punish A, D, B..., eventually, even by random picks, we will reach the goal given enough time. Is that really learning though?
To me, the "ultimate" AI learning would be to allow the AI to learn a game through a somewhat simple tutorial where it gets to experience different concepts of the game for a limited amount of time, and then the AI goes through the actual game putting all the pieces together to overcome both new and previously seen obstacles. Obviously this has to be done without 1000000 tries "learning" each level. Haven't seen anything getting close to that so far. (And I'm not saying that we should be there yet or that your videos aren't impressive and entertaining regardless.)
Just for curiosity's sake, why not add an object detection model to help provide more information for the AI? One model for object detection could detect important information about the track, such as the location of the character itself and speed boosts on the track. Then somehow combine that information as rewards for the movement AI.
Note that i think it might be better for other games such as new super mario bros wii
That's a very interesting idea that definitely has potential, especially for New Super Mario Bros. as you mention. I will have to look through the RL literature and see if anyone has done something like this before
Is there a way you could discourage the constant hopping?
I’ll have to check the speeds again, but I believe I could use another speed threshold. If the constant hopping causes the speed to go below 84km/h (the speed of a drift), I could give it a negative reward causing it to avoid that behaviour
@@aitango or you could allow it to turn normally to align drifts
Once you have an AI learn the track really well, what happens when you drop that same AI onto a new track entirely? Does its existing knowledge carry over and help it at all?
Also, what happens if you tie its current race position into the reward variable? So it gets the highest reward for staying in first place. Could you have it train in multiplayer mode against copies of itself, to accelerate the learning process? Like, one is still in "learning mode" and the opponents are locked at the current "final agent" AI. Do you think that would affect the learning process at all?
I explore this a little in my video "Mario Kart Wii but its an AI". Current race position in the reward would help at the start, but once it reaches first it probably wouldn't do much. It could be an interesting way to accelerate the early stages though. Multiplayer is an interesting idea, as I am looking for ways to speed things up. The only issue is resetting the AI: currently it dies whenever the speed drops below a certain amount, but it would be unclear when to reset with multiple agents. If I had a good solution to that problem though, it would definitely be good! I'm not sure how having one in learning mode and the others as the final agent would affect learning, but it would definitely be interesting to see
@@aitango Keep in mind that the opponent AI in this case would be advanced enough to more likely pass the learning AI, and keep changing race position. It might still be useful in that case.
As far as when to reset it, it doesn't seem to make any more stupid mistakes at that point; any time it slows down too much after a while appears to be when it's trying to use a shortcut, and even then it gives up after enough penalties. I think that once you get the AI to a point where you can train it against multiples of itself, you don't need to have low speed as the death condition. At that point, you might want to change it to either "Speed = 0" or when the spin-out animation plays, because that would mean it hit something, and you can start teaching it exclusively to avoid obstacles/threats. Doing it this way also means that you can make the opponent AI spam the item button and see if you can teach your still-learning AI to try and evade enemy items.
How long do you reckon a human has to play before they can get first place on courses on the hardest difficulty?
That's an interesting question that I haven't really thought about! It would vary hugely based on the player, but maybe anywhere from 25 hours to 250 hours is my guess! I'm curious to see what others think of that too
Couldn't you use a few different characters on the same track to prevent overfitting, rather than manipulating spawns, since each driver has different stats?
That is an interesting idea, it would force it to be much more general. I've considered using different vehicles as well such as the mach bike, for similar reasons. Perhaps using different spawns/vehicles/characters would create a very general agent
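A minimal sketch of randomizing spawns, characters and vehicles per episode (a form of domain randomization). The pools below are placeholders: the savestate names are hypothetical, and the characters and vehicles are just examples from the game, not a set used in the video.

```python
import random

# Placeholder pools; real savestate names and combos would come from the setup.
SPAWNS = ["start_grid_a", "start_grid_b", "mid_lap_1", "mid_lap_2"]
CHARACTERS = ["Funky Kong", "Daisy", "Rosalina"]
VEHICLES = ["Flame Runner", "Mach Bike"]


def sample_episode_config(rng):
    """Pick a spawn, character and vehicle independently for each new episode."""
    return {
        "spawn": rng.choice(SPAWNS),
        "character": rng.choice(CHARACTERS),
        "vehicle": rng.choice(VEHICLES),
    }
```

Passing in a seeded `random.Random` keeps runs reproducible. In practice every combination would need its own savestate, which is the same setup cost the author mentions for extra start locations.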
Just thinking, I saw it spam hopping, which could be a sign of 2 things. It could be trying to align a drift, even when it isn't necessary or it could be a sign of overfitting
I definitely think it uses the spam hopping to some extent to align for drifts. What do you mean by a sign of overfitting?
@@aitango it could be that spam hopping randomly happened to a successful AI so others try to replicate it
bro I can't see funky kong and dont remember WOTFI 2022😂😂
Hey if you get a friend who has a PC they'd be willing to use for AI training you could do 2 tracks at once.
edit: Actually, what about training with CTGP's item rain? The chaos could help the AI get better.
Also, has the AI learnt to use items and if not will it?
I wish I had a friend who would let me do that to their PC haha! Training on item rain would be interesting as it is just complete chaos, plus it would be fun to see it learn to pick up bullets and stars and things. I haven't done items yet, but it is something I really want to do in the future. It has quite a few technical challenges which make it difficult, but if I can solve those I'll be straight on it!
I find it suspicious that the AI does really well and then, as soon as it reaches a new PB, completely forgets what a road is and immediately attacks the nearest wall. This does suggest that it is using its node formulas to encode a sort of memory of the track, through trial and error, rather than actually responding to its environment in the moment.
For learning a single track, especially at the start the network is likely to overfit, but that isn't really avoidable given the AI's limited experience. The hope however is that as it gets further through the track and explores more different areas, it will struggle to overfit, and be forced to learn a more general policy
@@aitango Wouldn't it have a really hard time switching to a completely different generalist approach, when everything it knows is the "memorization" approach? Intuitively this feels like teaching the AI to build a very specific traffic-light intersection, then urging it to optimize that intersection, and hoping that in this optimization it will suddenly invent the roundabout. I would expect it to stick to its known paradigm.
You should try having this AI play a T1 mogi in competitive Mario Kart Wii
Maybe after about 10000 hours training :)
Lol
At some point, even though it adds a huge layer of complexity, you've got to teach it to use items
It's on my list, but I'm working up to that one slowly as there's a chance it'll take like a year to train
Is it possible to teach it using some human races as well, so it learns things it's unlikely to figure out by itself?
It's definitely possible, there's a fairly simple method called Imitation Experience Replay which basically just allows me to give the AI access to some human gameplay. This could drastically help the AI explore in the right direction
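The reply names Imitation Experience Replay without describing it, so the following is only a generic sketch of the underlying idea, not the method the author uses: keep a fixed set of human transitions in the replay buffer alongside the agent's own experience, and make sure every training batch contains some of each. The class name `DemoMixedReplayBuffer` and the `demo_fraction` parameter are hypothetical.

```python
import random
from collections import deque


class DemoMixedReplayBuffer:
    """Hypothetical sketch: replay buffer mixing agent experience with human demos.

    Each sampled batch draws roughly `demo_fraction` of its transitions from
    the fixed human demonstrations and the rest from the agent's own recent
    experience. Transitions can be any objects, e.g.
    (state, action, reward, next_state, done) tuples.
    """

    def __init__(self, capacity, demo_transitions, demo_fraction=0.25, seed=None):
        self.agent = deque(maxlen=capacity)    # oldest agent transitions are evicted first
        self.demos = list(demo_transitions)    # human transitions are never evicted
        self.demo_fraction = demo_fraction
        self.rng = random.Random(seed)

    def add(self, transition):
        """Store one transition produced by the agent itself."""
        self.agent.append(transition)

    def sample(self, batch_size):
        """Return a shuffled batch containing both demo and agent transitions."""
        n_demo = min(int(round(batch_size * self.demo_fraction)), len(self.demos))
        n_agent = min(batch_size - n_demo, len(self.agent))
        batch = self.rng.sample(self.demos, n_demo) + self.rng.sample(list(self.agent), n_agent)
        self.rng.shuffle(batch)
        return batch
```

Because the human transitions are never evicted, the agent keeps seeing examples of good driving (for example a shortcut it would rarely stumble into by exploring randomly) even after millions of its own frames have passed through the buffer.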
I bet you can encourage it to actually stay on the road by rewarding the color of the road on both sides of the character
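A minimal sketch of this suggestion, assuming the AI sees a greyscale frame stored as a list of pixel rows (values 0 to 255) and that road pixels fall in a known grey range. The function name, the sampling offset and the grey range below are all hypothetical and would have to be tuned for the actual track.

```python
def road_side_reward(frame, kart_x, kart_y, offset=20, road_min=90, road_max=140, bonus=0.05):
    """Hypothetical shaping reward: small bonus if the pixels at `offset` to the
    left and right of the kart both fall in the assumed road-grey range."""
    height = len(frame)
    width = len(frame[0]) if height else 0

    def looks_like_road(x, y):
        # Pixels outside the frame count as "not road" (avoids Python's
        # negative-index wraparound picking up a pixel from the other side).
        if not (0 <= x < width and 0 <= y < height):
            return False
        return road_min <= frame[y][x] <= road_max

    left = looks_like_road(kart_x - offset, kart_y)
    right = looks_like_road(kart_x + offset, kart_y)
    return bonus if (left and right) else 0.0
```

Keeping the bonus small relative to the progress reward matters here: if it were large, the AI could learn to sit still in the middle of the road rather than race.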
Why not reuse the model from last time? It could help speed up the process and prevent overfitting
Typically I would, and that is something I will do in the future. The only reason I didn't is that, while the knowledge is transferable, I keep changing the reward function, and these AIs typically struggle to transfer between tasks with different rewards. When I stop messing around with the reward function, it'll definitely be something I'll do
@@aitango I didn't know that changing the reward was harder for the AI. The more you know
Cool video, but your audio mixing could use some work; the music was much louder than your commentary.
Yeah I noticed that, I think in future videos I'll try and balance it out more
next time make it use items
It's on my to-do list, but will take quite a while to train!
I do wonder how many hours it would take for it to understand items
No idea, probably a lot though given how many different interactions there are to learn, such as getting starred, red shelled, blocking shells with items, the list goes on. I'm just hoping to get the AI to win a race with items in, let alone use them effectively!
@@aitango Yeah, I'd imagine it's quite a while; it's very interesting that it's even a possibility though
Do Delfino Square next
Perhaps I can make that happen
Last Mario Kart Video, huh?
Add higher rewards than 1 or -1, like -2 for losing a position in the game and -4 for getting last
I typically try to keep the rewards in quite a small range, because they can add up quite quickly and neural networks are designed to predict values relatively close to 0. For example, a reward of 0.5 might not seem like a lot, but if it gets 0.5 every frame, it will end up predicting a value of 47.6, which is way too high. For one-off rewards such as finishing the race, though, I do use higher values such as +12 for finishing.
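The reason a small per-frame reward turns into a large predicted value is the discounted sum the network learns to estimate: a constant reward r every step with discount factor gamma adds up to r * (1 - gamma^n) / (1 - gamma) over n steps, which tends to r / (1 - gamma) for a long episode. The discount factor used in the video isn't stated in the thread, so the sketch below simply assumes gamma = 0.99, a common choice.

```python
def discounted_return(reward_per_step, gamma, steps=None):
    """Discounted sum of a constant reward received every step.

    With steps=None this is the infinite-horizon limit r / (1 - gamma);
    otherwise it is the finite geometric series r * (1 - gamma**steps) / (1 - gamma).
    """
    if steps is None:
        return reward_per_step / (1 - gamma)
    return reward_per_step * (1 - gamma ** steps) / (1 - gamma)


# Assuming gamma = 0.99 (not confirmed in the thread):
print(discounted_return(0.5, 0.99))  # roughly 50
```

Under that assumption 0.5 per frame approaches a value of about 50, the same order as the 47.6 quoted above; the exact figure depends on the discount factor and episode length actually used. It also shows why a one-off +12 for finishing is modest by comparison: it is paid once rather than every frame.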
So basically you made an ai with a praise kink and told it to play mario kart?
Why does it keep jumping tho? :P
My guess is partially for alignment, and partially to avoid getting wheelie bumped, but that's just speculation
@@aitango Yeah it's very likely that's what's causing it since turning in a wheelie immediately reduces speed. The main challenge now is creating an AI that learns how to get proper alignments.
Hi, nice improvement. Could you please share this code too?
Thanks! I will need to do a video when I release the code as it requires quite a lot of setup rather than just the source code. Was planning to do it as a 1000 subscriber special or something like that
Impressive, now stop "playing" with the AI. 🤣
Can the AI "see" the time?
The timer is in the region of the screen the AI can see, so yes. No idea if it ever actually uses it though
@@aitango OK, I hope it doesn't just learn what to output at which times...
Using multiple start locations will prevent this as it will start in different places with different times. Plus looking at the final agent it completes the track in a variety of different times
@@aitango So it can't just learn the 4 (there are 4 starting positions, right?) variations?
I know
What do you know?
@@aitango look below
Your so looks like its in last
This is weak. An optimal AI bunny hops so fast it's like a hovercraft
Code:
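import time

import dolphin_memory_engine as dme
import numpy as np

SPEED_ADDR = 17964696
# NOTE: the original post uses the same address for race completion as for
# speed, which is almost certainly a mistake; the correct address isn't given.
RACE_COMPLETION_ADDR = 17964696

dme.hook()
print(dme.is_hooked())


class Reward:
    def __init__(self):
        self.last_comp = 0
        self.check = 1.05        # next race-completion checkpoint
        self.check_inc = 0.05    # spacing between checkpoints
        self.check_timer = time.time()

    def reward_function(self):
        """Return (reward, terminal) for the current frame."""
        speed = dme.read_float(SPEED_ADDR)
        race_completion = dme.read_float(RACE_COMPLETION_ADDR)

        # Base reward proportional to speed.
        reward = speed / 2200

        # End the episode early if the kart drops below a minimum speed.
        if speed < 37:
            return -1, True

        # Reward progress along the track since the previous frame.
        if self.last_comp != 0:
            reward += (race_completion - self.last_comp) * 100
        self.last_comp = race_completion

        # Race finished.
        if race_completion > 3.99:
            return 10, True

        # Small bonus for staying above a target speed.
        if speed > 100:
            reward += 0.1

        # Bonus for reaching the next checkpoint quickly.
        if race_completion > self.check:
            reward += 1 / (time.time() - self.check_timer)
            self.check += self.check_inc
            self.check_timer = time.time()

        return reward, False


def pick_spawn_location():
    """Pick one of four spawn locations with equal probability.

    Call this after reward_function returns terminal=True. (In the original
    this ran inside reward_function under `if terminal:`, which was never
    True at that point.) How the spawn is applied isn't shown in the post.
    """
    ran = np.random.random()
    if ran < 0.25:
        return 1
    elif ran < 0.5:
        return 2
    elif ran < 0.75:
        return 3
    else:
        return 4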
I find it interesting how the AI was able to recognise the shortcut. Maybe it was deciding where to go based on the positions of the walls?
I think with this course, one way the AI could've picked up the layout is that grey is good, and red and white stripes mean it needs to turn. I wonder what it actually did
True, given that it tried to go through the gap that does sound like a plausible explanation!