AI Learns PvP in Old School RuneScape (Reinforcement Learning)
- Published 22 Jun 2024
- This showcases how a neural network was trained to PvP in Old School RuneScape (osrs) using deep reinforcement learning (machine learning).
Check out the code on GitHub to train and test your own models on a simulated version of the game: github.com/Naton1/osrs-pvp-re....
0:00 - Intro
0:52 - Overview
1:18 - Real Game Impact
2:05 - Observations & Actions
2:38 - Rewards
3:50 - Training Simulation
5:08 - Training Session Statistics
6:29 - Network Architecture (High-Level)
6:44 - Extra Technical Details
7:26 - How To Use
10:29 - Outro
10:45 - Gameplay Footage
Thanks for sharing. Interesting project despite the potential for abuse. Imagine if Jagex introduced NPCs with this behaviour!
Do you have the model checkpoints saved between the 50-95%? Would be a great tool for people to practice and level up their pvp skills by practicing against the different model checkpoints.
Yeah - running a training job by default saves all model checkpoints. There's also a built-in tool to generate elo ratings for each model checkpoint so you can compare how good they are relative to each other. Would be interesting to explore ways to practice PvP in this way!
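For anyone curious how rating checkpoints against each other works in principle, the standard Elo update from pairwise match results is only a few lines. A minimal sketch in Python - the function names and K-factor here are illustrative, not the project's actual code:

```python
def expected_score(rating_a, rating_b):
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def update_elo(rating_a, rating_b, score_a, k=32):
    """Update both ratings after one match; score_a is 1 (A wins),
    0 (A loses), or 0.5 (draw)."""
    e_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - e_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - e_a))
    return new_a, new_b

# A 1200-rated checkpoint beats a 1000-rated one: the favorite gains
# fewer points than it would for beating an equal opponent.
new_a, new_b = update_elo(1200.0, 1000.0, 1.0)
```

Running many matches between every pair of checkpoints and iterating this update converges to a ranking that reflects their relative strength.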
Very cool project. This is something I've always wanted to try. Thanks for sharing!
This is great, I imagine the custom action space you define is important to get this working so well. At the beginning of the project did you start with more complex action spaces (e.g. click inventory slot x, left-click opponent)? I'm curious to know how the tradeoff of complexity vs. performance played out
Thanks! I think simplifying the action space helped a lot. The more you can simplify your problem, the easier it tends to be to learn. It'd be interesting to explore pixel-based actions, but I think it'd be incredibly challenging to learn since the action space size would massively increase.
I didn't do much experimentation with lower-level actions like that, since I don't think being able to say "left-click opponent" or "click inventory slot" would add value over the higher-level actions defined here.
Excellent video and really cool project
That's my Naton!
super interesting!
This is so cool!
this is, in my opinion, the best explanation of how reinforcement learning works that doesn't rely on mathematical terminology! also great video and project! really interesting to see (:
Hey, I'd love to hear if you had any issues with defining your rewards. Did it take a while for you to come across a set of rewards that "work"? Or did it sort of just work on the first try? In addition, I'd love a further video explaining the rationale behind your definition of the observation space and action space.
For the rewards, I started with a simple win/loss reward at first, and nothing else. It kind of worked, but it wasn't great - I think it had trouble figuring out specifically which actions were good and which were bad. The next obvious reward idea was a damage reward/penalty, which I found significantly sped up learning. I also added in those prayer rewards, which I think helped a bit, but I honestly didn't do a lot of experimentation with that. It also took a while to come up with good scales for each reward: at first the damage rewards were too low, then I had them too high, and finally I found a sweet spot that seemed to work well for this. In short, lots of experimentation + intuition.
I could go on and on about the observation/action space and why I chose those haha. A video or something more in-depth on that could be interesting!
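To make the reward discussion above concrete, here's a sketch of what a shaped per-tick reward along those lines could look like. All constants and field names are hypothetical - as described, finding good scales took real experimentation:

```python
# Hypothetical reward scales - values like these took tuning to get right.
WIN_REWARD = 1.0       # terminal reward/penalty for winning/losing the fight
DAMAGE_SCALE = 0.01    # reward per hitpoint dealt (penalty per hitpoint taken)
PRAYER_SCALE = 0.05    # small bonus for having the correct overhead prayer up

def tick_reward(damage_dealt, damage_taken, correct_overhead, fight_over, won):
    """Shaped reward for a single game tick."""
    reward = DAMAGE_SCALE * (damage_dealt - damage_taken)
    if correct_overhead:
        reward += PRAYER_SCALE
    if fight_over:
        reward += WIN_REWARD if won else -WIN_REWARD
    return reward

# Dealing a 20 while taking nothing gives a small positive signal long before
# the fight resolves - that dense feedback is what speeds up learning versus
# a pure win/loss reward.
r = tick_reward(damage_dealt=20, damage_taken=0,
                correct_overhead=False, fight_over=False, won=False)
```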
@@Naton1 I would personally love to see that video! Do you also have a discord by any chance that you wouldn't mind me adding? I find RL to be so challenging because there could just be so many things that could go wrong, so always love to connect with people who have done something in this space!
I do - discord is naton2.
This is fascinating! If the model only has a time-horizon of one tick, does it have much predictive power, or is it mostly reactive?
Also, how do action combos work? For example I noticed “food” and “karambwan” are defined as separate actions, so the model must be aware that multiple actions can be performed in the same tick. The model must assign scores to *sets* of actions? And the require_all/require_none configs define mutually exclusive actions?
Yes, it effectively only has a time-horizon of one tick - almost all game state is available in the latest tick. The main useful information would be the opponent's attack styles/overhead prayer usage to help with prediction, as you mention. To help give the model the ability to predict, a set of observations is given for the % usage of each attack style/overhead prayer for the player/target throughout the whole fight, and for the last 5 game ticks. This, combined with equipment bonuses, is relatively effective in helping the model predict attacks/overheads. However, it could be interesting to test with longer memory in various ways to improve prediction capability.
"action combos" work by each action type being a separate action head in a multi-discrete action space. So every game tick it can pick between like 10 different actions, and can use multiple of these at once (such as eating and karambwan). The require_all/require_none configs can define mutually exclusive actions as you mention, but are generally used for action parameterization - example being if using a magic attack, select the spell type to use.
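A rough sketch of what a multi-discrete action space with per-head masking can look like - the head names, options, and mask layout here are illustrative, not the project's actual definitions:

```python
import random

# Hypothetical action heads; each head is an independent choice every game
# tick, so the agent can e.g. eat food AND a karambwan in the same tick.
ACTION_HEADS = {
    "attack":    ["none", "melee", "ranged", "magic"],
    "overhead":  ["none", "protect_melee", "protect_ranged", "protect_magic"],
    "food":      ["no", "yes"],
    "karambwan": ["no", "yes"],
}

def select_actions(choose, masks):
    """Pick one option per head, restricted to the allowed options.
    require_all/require_none-style constraints reduce to masks like these."""
    actions = {}
    for head, options in ACTION_HEADS.items():
        allowed = [opt for opt, ok in zip(options, masks[head]) if ok]
        actions[head] = choose(head, allowed)
    return actions

# Example: magic attacks are masked out this tick, while both eating
# actions remain available simultaneously.
masks = {
    "attack":    [True, True, True, False],
    "overhead":  [True, True, True, True],
    "food":      [True, True],
    "karambwan": [True, True],
}
# A real policy network would score each head's options; random stands in here.
actions = select_actions(lambda head, allowed: random.choice(allowed), masks)
```

The key property is that the policy outputs one choice per head every tick, rather than one choice from a single flat (and combinatorially larger) action list.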
nice!
Really nice work! Awesome use of RSPS
Thank you man! Your project with Pokémon was awesome too! I watched your video awhile back and it inspired me to use a novelty reward here too!
Nice work! Is there a way to use this on my own rsps? Would be awesome to have people try to beat it for a tournament with rewards!
That would be really cool! It definitely could be adapted to other rsps’s, but would require a non-trivial amount of work to do so. You’d have to essentially re-create the environment and provide all inputs (definitely doable, but would be an effort).
Extremely cool :)
Interesting
Very impressive! Did you make any attempts at using screen images as observations?
I didn't experiment with this for a few reasons. Primarily, my goal was to train the best model possible - without any restrictions - and feeding it the observations directly would perform better than having to learn them from pixels. Secondly, it would require significantly more compute to render the game clients for every agent in the training simulation. At times during training there were over 1000 agents playing across multiple simulations, and the compute required for this would have been far more than was accessible to me.
I do think it could be interesting to explore. Perhaps couple the screen image as observations with using mouse/keyboard actions too to have it learn a more 'human' experience.
Don't lie and say you're not giving the script out, because 99% you are, to friends or high-paying people.
You're the guy beezyR or whatever it was. You've been seen abusing this in the wildy before, so don't lie.
I can promise I've never given the plugin out to anyone! And I'm not sure who that is
So cool! Is there a way to not play against the script but to be the script user in elvarg?
Good question - there is! I've added a command you can type in-game to make the agent 'play' in the logged in account via ::enableagent. There isn't much documentation on this, but I'll link the code. An example would be ::enableagent model=FineTunedNh envParams=accountBuild=PURE,randomizeGear=true. The main caveat here is the command has to select the account build/gear so you can't use your own gear setup (as how it's implemented right now).
github.com/Naton1/osrs-pvp-reinforcement-learning/blob/master/simulation-rsps/ElvargServer/src/main/java/com/github/naton1/rl/command/EnableAgentCommand.java
@Naton1 This is amazing. Thanks!
@Naton1 So I just tried what you said. I followed the steps in the video. Loaded up the pure gear setup, walked up to the bot, and typed: ::enableagent model=FineTunedNh envParams=accountBuild=PURE, randomizeGear=true. I got a message that the agent is enabled, but nothing happened. What am I doing wrong?
If all the accounts used were banned, why did you blur the username?
The opponents names are blurred
What type of gpus were you running this on, or were you using cpu only? This is an awesome project, very impressive.
Thanks! This was mostly CPU-bound for rollout collection, but I did use a GPU as well.
CPU: Ryzen 3950x
GPU: RTX 4060 TI 16GB VRAM
Genuinely an amazing project. As someone who only knows some Python, what would you recommend as resources for improving? I'd love to reach a point where I'm able to create things like this.
One thought I had watching this, from the game perspective: switching overheads to Smite on the opponent's off-tick amps up the war of attrition to a point that humans would be nearly unable to keep up.
Great video :)
Thank you! To be honest, just working on a bunch of personal projects in topics that you find interesting is the best way to learn in my opinion. If you're not interested, you won't want to do it.
It actually does have the ability to do this! The agent can choose to use smite instead of overhead prayers when it's attacking the next tick. Smite unfortunately isn't available in LMS so I disabled it for the model trained throughout this video, but support for this has already been added. Can be nice in places like PvP Arena.
congrats on destroying LMS
he loved the game so much he made sure nobody else could
This is sick. You're gonna get big
Amazing project - just seeing this amazing piece of content now. Curious what learning resources you used along your journey and what background you had coming into this project? It seems like you have some professional machine learning experience as well as a great understanding of RuneScape's systems. I just started machine learning programming at my software engineering job and would love to one day attain the level of knowledge required to make a project like this from scratch! Amazing!
Thanks! To be honest, I don't have professional machine learning experience. This is all self-taught (and a few courses from back in college). I had always been interested in machine learning so I spent a ton of time learning how to use it in a project like this. I do work professionally as a software engineer though.
The best way to learn, in my opinion (and everyone learns differently - this is just how I learn), is to just apply it to a project that you're interested in. You can read a ton of books and take a ton of courses, but you won't truly learn it until you apply it to a real project.
I do have a lot of experience with RuneScape though - been playing on and off since 2005 - and have written my own "third-party" client before.
Holy shit
So cool!! Can you make a video where you go through the technical details with all the tricks you used to make RL work? Like which policy, do you use optimistic exploration or did you craft some nifty things yourself? Am absolutely thrilled to learn more without having to read the code 😊
Can this work for basic woodcutting skills?
Woodcutting is so simple it's not too helpful to apply this kind of thing there. The best actions can be computed through simple logic and no machine learning is needed.
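For contrast, a complete woodcutting "policy" fits in a handful of hand-written rules - the state fields below are hypothetical, but this is essentially the whole decision problem:

```python
def next_action(state):
    """A trivial rule-based woodcutter; no machine learning required."""
    if state["inventory_full"]:
        return "bank_logs"
    if not state["tree_nearby"]:
        return "walk_to_trees"
    if not state["chopping"]:
        return "chop_tree"
    return "wait"  # already chopping; nothing to do

# A full inventory always takes priority, regardless of the other state.
action = next_action({"inventory_full": True,
                      "tree_nearby": True,
                      "chopping": False})
```

PvP, by comparison, is adversarial and partially unpredictable, which is why it benefits from learned behavior rather than fixed rules.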
Hi, I worked on an OSRS computer vision personal project, though I don't have experience in reinforcement learning. Do you use any sort of computer vision, or is it code injection? Because I see your AI switching prayers without switching to the prayer tab - how does it know the position of potions, etc.? I would love to know, thanks!
Yeah it hooks directly into the game client to perform higher level actions. The agent will choose things like use protect from melee, drink a combat potion, and eat food.
@Naton1 Thank you for the reply. Do you have a Discord, bud? I have a few questions and would appreciate it if you could get in touch with me. I'm not going to ask you about your plugin or anything; I just want to develop something specific, but without having access to the client. I'm familiar with PyTorch computer vision, but I want to learn some reinforcement learning and have some questions.
@roofyosrs3513 I do, my discord is naton2.
@Naton1 I added you, bro
Such a cool project
I love this! Need more than a like button, we need a love button.
Thanks for sharing and masterfully done - I'm currently studying robotics as part of my postgraduate in AI.
I was wondering if someone had managed to simulate rs for RL!
What's your day job? Do you do any ML? been keeping up with the project since you first posted and blows me away. This is fantastic work.
Appreciate it! I’ve been working professionally as a software engineer since I graduated from school two years ago. No ML or anything as a day job.
What an incredible project.
I'm not sure it was the best idea to let people run their own models without having to write a single line of Python. Now LMS has 4-5 of them a game. I'm a big fan of open source, but I think you have an inherent responsibility when you release stuff like this not to let others harm the integrity of the game with your work. And this has harmed it, and will continue to.
It doesn't actually matter where you think it came from. This is just inevitable, unfortunately.
@BigDaddyWes The people abusing it would never have programmed something like this themselves; he did the heavy lifting for them.
@BigDaddyWes Right. This guy was working on this, and there are definitely at least 10 others doing the same.
Hey, can I reach out to you about a way to use this on other RuneScape private servers to kill gamblers and such?
This is actually a very good project, well done!
Do you have a Discord channel?
Glad this didn't blow up and you use this little clout to start up your dream YouTuber coding career! Stay off our 20-year-old game
Actually, OSRS is only 11 years old
You've been on RSPSs/OSRS for years abusing this. NH staking, taking people's GP while they don't even realise you aren't clicking. Rat.
I have never used this outside of LMS/PvP Arena, and it was just created in the last year!
@Naton1 Stop lying
was this a solo project?
Yes it was!