AI Learns PvP in Old School RuneScape (Reinforcement Learning)

  • Published 22 Jun 2024
  • This showcases how a neural network was trained to PvP in Old School RuneScape (osrs) using deep reinforcement learning (machine learning).
    Check out the code on GitHub to train and test your own models on a simulated version of the game: github.com/Naton1/osrs-pvp-reinforcement-learning.
    0:00 - Intro
    0:52 - Overview
    1:18 - Real Game Impact
    2:05 - Observations & Actions
    2:38 - Rewards
    3:50 - Training Simulation
    5:08 - Training Session Statistics
    6:29 - Network Architecture (High-Level)
    6:44 - Extra Technical Details
    7:26 - How To Use
    10:29 - Outro
    10:45 - Gameplay Footage

COMMENTS • 70

  • @dexthefish96
    @dexthefish96 7 days ago +1

    Thanks for sharing. Interesting project despite the potential for abuse. Imagine if Jagex introduced NPCs with this behaviour!

  • @SchlakRS
    @SchlakRS 8 days ago +1

    Do you have the model checkpoints saved between 50-95%? It would be a great tool for people to level up their PvP skills by practicing against the different model checkpoints.

    • @Naton1
      @Naton1 6 days ago

      Yeah - running a training job saves all model checkpoints by default. There's also a built-in tool to generate Elo ratings for each model checkpoint, so you can compare how good they are relative to each other. Would be interesting to explore ways to practice PvP in this way!
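
      A minimal sketch of the Elo idea behind such a checkpoint-rating tool (hypothetical code with a stubbed match result, not the repository's actual implementation):

        import itertools
        import random

        def expected(r_a: float, r_b: float) -> float:
            # Elo-predicted probability that checkpoint A beats checkpoint B
            return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

        def update(r_a: float, r_b: float, score_a: float, k: float = 32.0):
            # score_a: 1.0 win, 0.5 draw, 0.0 loss for A
            e_a = expected(r_a, r_b)
            return r_a + k * (score_a - e_a), r_b + k * ((1.0 - score_a) - (1.0 - e_a))

        ratings = {name: 1200.0 for name in ("ckpt_50pct", "ckpt_75pct", "ckpt_95pct")}
        for a, b in itertools.combinations(ratings, 2):
            for _ in range(100):
                score_a = random.choice([0.0, 0.5, 1.0])  # stand-in for a simulated fight
                ratings[a], ratings[b] = update(ratings[a], ratings[b], score_a)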

  • @jacksonwaschura3549
    @jacksonwaschura3549 4 months ago

    Very cool project. This is something I've always wanted to try. Thanks for sharing!

  • @EdanMeyer
    @EdanMeyer 4 months ago +2

    This is great, I imagine the custom action space you define is important to get this working so well. At the beginning of the project did you start with more complex action spaces (e.g. click inventory slot x, left-click opponent)? I'm curious to know how the tradeoff of complexity vs. performance played out

    • @Naton1
      @Naton1 4 months ago +1

      Thanks! I think simplifying the action space helped a lot. The more you can simplify your problem, the easier it tends to be to learn. It'd be interesting to explore pixel-based actions, but I think it'd be incredibly challenging to learn since the action space size would massively increase.
      I didn't do much experimentation with lower-level actions like that, since I don't think being able to say 'left-click opponent' or 'click inventory slot' would add value over the higher-level actions defined here.
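
      To make that tradeoff concrete, a rough size comparison using Gymnasium-style spaces (the head sizes are illustrative, not the project's actual configuration):

        # Illustrative only: a handful of high-level action heads versus
        # clicking any pixel on the classic 765x503 fixed-size game canvas.
        from gymnasium import spaces

        high_level = spaces.MultiDiscrete([4, 5, 2, 2, 3])  # attack/spell/eat/karambwan/prayer
        print(int(high_level.nvec.prod()))  # 240 joint actions per tick

        pixel_clicks = spaces.Discrete(765 * 503)
        print(pixel_clicks.n)               # 384,795 possible click targets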

  • @Poibos
    @Poibos 2 months ago

    Excellent video and really cool project

  • @tivia4929
    @tivia4929 4 months ago

    That's my Naton!

  • @PurpleGod
    @PurpleGod 2 months ago

    super interesting!

  • @howuhh8960
    @howuhh8960 4 months ago

    This is so cool!

  • @ArcadeZMC
    @ArcadeZMC 4 months ago +5

    This is, in my opinion, the best explanation of how reinforcement learning works that doesn't rely on mathematical terminology! Also great video and project! Really interesting to see (:

  • @iamonyourside1398
    @iamonyourside1398 3 months ago

    Hey, I'd love to hear if you had any issues with defining your rewards. Did it take a while for you to come across a set of rewards that "work"? Or did it sort of just work on the first try? In addition, I'd love a further video explaining the rationale behind your definition of the observation space and action space.

    • @Naton1
      @Naton1 3 months ago

      For the rewards, I started with a simple win/loss reward at first, and nothing else. It kind of worked, but it wasn't great - I think it had trouble figuring out specifically which actions were good and which were bad. The next obvious reward idea was a damage reward/penalty, which I found significantly sped up learning. I also added in those prayer rewards, which I think helped a bit, but I honestly didn't do a lot of experimentation with that. It also took a while to come up with good scales for each reward - at first the damage rewards were too low, then I had them too high, and finally I found a sweet spot that seemed to work well for this. In short, lots of experimentation + intuition.
      I could go on and on about the observation/action space and why I chose those haha. A video or something more in-depth on that could be interesting!
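
      As a rough illustration of that shaping, a sketch with placeholder scales (the actual values were tuned by experimentation, per the reply above; the prayer bonus is omitted):

        # Sketch of the reward shaping described above; the constants are
        # placeholders, not the tuned values from the project.
        WIN_BONUS = 1.0      # terminal win/loss reward
        DAMAGE_SCALE = 0.01  # per-hitpoint reward for damage dealt, penalty for damage taken

        def tick_reward(damage_dealt: int, damage_taken: int,
                        won: bool | None = None) -> float:
            # won is None while the fight is still in progress
            reward = DAMAGE_SCALE * (damage_dealt - damage_taken)
            if won is not None:
                reward += WIN_BONUS if won else -WIN_BONUS
            return reward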

    • @iamonyourside1398
      @iamonyourside1398 3 months ago

      @@Naton1 I would personally love to see that video! Do you also have a discord by any chance that you wouldn't mind me adding? I find RL to be so challenging because there could just be so many things that could go wrong, so always love to connect with people who have done something in this space!

    • @Naton1
      @Naton1 3 months ago

      I do - discord is naton2.

  • @mike-ny1zg
    @mike-ny1zg 2 months ago

    This is fascinating! If the model only has a time-horizon of one tick, does it have much predictive power, or is it mostly reactive?
    Also, how do action combos work? For example I noticed “food” and “karambwan” are defined as separate actions, so the model must be aware that multiple actions can be performed in the same tick. The model must assign scores to *sets* of actions? And the require_all/require_none configs define mutually exclusive actions?

    • @Naton1
      @Naton1 2 months ago

      Yes, it effectively only has a time horizon of one tick - almost all game state is available in the latest tick. The main useful information beyond that would be the opponent's attack styles/overhead prayer usage, to help with prediction as you mention. To help give the model the ability to predict, a set of observations is given for the % usage of each attack style/overhead prayer for the player/target throughout the whole fight, and for the last 5 game ticks. This, combined with equipment bonuses, is relatively effective in helping the model predict attacks/overheads. However, it could be interesting to test longer memory in various ways to improve prediction capability.
      "Action combos" work by each action type being a separate action head in a multi-discrete action space. So every game tick it can pick between roughly 10 different actions, and can use multiple of them at once (such as eating and karambwan). The require_all/require_none configs can define mutually exclusive actions as you mention, but are generally used for action parameterization - for example, if using a magic attack, select the spell type to use.

  • @lazaraslong
    @lazaraslong 4 months ago

    nice!

  • @peterwhidden
    @peterwhidden 4 months ago +6

    Really nice work! Awesome use of RSPS

    • @Naton1
      @Naton1 4 months ago +6

      Thank you man! Your project with Pokémon was awesome too! I watched your video a while back and it inspired me to use a novelty reward here too!

  • @yomusiko
    @yomusiko 3 months ago

    Nice work! Is there a way to use this on my own rsps? Would be awesome to have people try to beat it for a tournament with rewards!

    • @Naton1
      @Naton1 3 months ago

      That would be really cool! It definitely could be adapted to other RSPSs, but it would require a non-trivial amount of work. You'd essentially have to re-create the environment and provide all the inputs (definitely doable, but it would be an effort).

  • @chairwood
    @chairwood 4 months ago

    Extremely cool :)

  • @damendoyeee
    @damendoyeee 2 months ago

    Interesting

  • @badnam3189
    @badnam3189 4 months ago +1

    Very impressive! Did you make any attempts at using screen images as observations?

    • @Naton1
      @Naton1 4 months ago +3

      I didn't experiment with this for a few reasons. Primarily, my goal was to train the best model possible - without any restrictions - and feeding it the observations directly would perform better than having to learn them from pixels. Secondly, it would require significantly more compute to render the game clients for every agent in the training simulation. At times during training there were over 1000 agents playing across multiple simulations, and the compute required for this would have been far more than was accessible to me.
      I do think it could be interesting to explore, though. Perhaps couple screen-image observations with mouse/keyboard actions to have it learn a more 'human' experience.

  • @Pinkgloves322
    @Pinkgloves322 3 months ago +2

    Don’t lie and say ur not giving the script out because 99% you are to friends or high paying people.
    You’re the guy beezyR or whatever it was. You’ve been seen abusing this in wildy before so dont lie.

    • @Naton1
      @Naton1 3 months ago

      I can promise I've never given the plugin out to anyone! And I'm not sure who that is.

  • @mucahiddemiry5258
    @mucahiddemiry5258 3 months ago

    So cool! Is there a way to not play against the script but to be the script user in Elvarg?

    • @Naton1
      @Naton1 3 months ago +1

      Good question - there is! I've added a command you can type in-game to make the agent 'play' on the logged-in account via ::enableagent. There isn't much documentation on this, but I'll link the code. An example would be ::enableagent model=FineTunedNh envParams=accountBuild=PURE,randomizeGear=true. The main caveat is that the command has to select the account build/gear, so you can't use your own gear setup (as it's implemented right now).
      github.com/Naton1/osrs-pvp-reinforcement-learning/blob/master/simulation-rsps/ElvargServer/src/main/java/com/github/naton1/rl/command/EnableAgentCommand.java

    • @mucahiddemiry5258
      @mucahiddemiry5258 3 months ago

      @@Naton1 This is amazing. Thanks!

    • @mucahiddemiry5258
      @mucahiddemiry5258 3 months ago

      @@Naton1 So I just tried what you said. I followed the steps in the video. Loaded up the pure gear setup, walked up to the bot, and typed: ::enableagent model=FineTunedNh envParams=accountBuild=PURE, randomizeGear=true. I got a message that the agent is enabled, but nothing happened. What am I doing wrong?

  • @OfficialMastercape
    @OfficialMastercape 2 months ago +1

    If all the accounts used were banned, why did you blur the username?

    • @Naton1
      @Naton1 2 months ago

      The opponents' names are blurred.

  • @Rainingson
    @Rainingson 3 months ago

    What type of gpus were you running this on, or were you using cpu only? This is an awesome project, very impressive.

    • @Naton1
      @Naton1 3 months ago +1

      Thanks! This was mostly CPU-bound for rollout collection, but I did use a GPU as well.
      CPU: Ryzen 3950X
      GPU: RTX 4060 Ti (16 GB VRAM)

  • @JrViiYt
    @JrViiYt 3 months ago

    Genuinely an amazing project. As someone who only knows some Python, what would you recommend as resources for improving? I'd love to reach a point where I'm able to create things like this.
    One thought I had watching this, from the game perspective: switching overheads to smite on the opponent's off-tick amps up the war of attrition to a point that humans would be nearly unable to match.
    Great video :)

    • @Naton1
      @Naton1 3 months ago +1

      Thank you! To be honest, just working on a bunch of personal projects in topics that you find interesting is the best way to learn in my opinion. If you're not interested, you won't want to do it.
      It actually does have the ability to do this! The agent can choose to use smite instead of overhead prayers when it's attacking the next tick. Smite unfortunately isn't available in LMS so I disabled it for the model trained throughout this video, but support for this has already been added. Can be nice in places like PvP Arena.

  • @currentcommerce4774
    @currentcommerce4774 9 days ago +1

    congrats on destroying LMS
    he loved the game so much he made sure nobody else could

  • @skrillmurray4317
    @skrillmurray4317 3 months ago +1

    This is sick. You're gonna get big

  • @JohnSmith-yr4vi
    @JohnSmith-yr4vi 17 days ago

    Amazing project, just seeing this amazing piece of content now. Curious as to what learning resources you used along your journey and what background you had coming into this project? Seems like you have some professional machine learning experience as well as a great understanding of RuneScape's systems. I just started machine learning programming at my software engineering job and would love to one day attain the level of knowledge required to make a project like this from scratch! Amazing!

    • @Naton1
      @Naton1 13 days ago +1

      Thanks! To be honest, I don't have professional machine learning experience. This is all self-taught (plus a few courses from back in college). I had always been interested in machine learning, so I spent a ton of time learning how to use it in a project like this. I do work professionally as a software engineer though.
      The best way to learn, in my opinion (and everyone learns differently - this is just how I learn), is to just apply it to a project that you're interested in. You can read a ton of books and take a ton of courses but won't truly learn it until you apply it to a real project.
      I do have a lot of experience with RuneScape though - been playing on and off since 2005 - and have written my own "third-party" client before.

  • @xGod305
    @xGod305 3 months ago

    Holy shit

  • @CHRISTICAUTION
    @CHRISTICAUTION 1 month ago

    So cool!! Can you make a video where you go through the technical details and all the tricks you used to make RL work? Like which policy algorithm? Do you use optimistic exploration, or did you craft some nifty things yourself? I'm absolutely thrilled to learn more without having to read the code 😊

  • @kingcoconut3697
    @kingcoconut3697 28 days ago

    Can this work for basic woodcutting skills?

    • @Naton1
      @Naton1 27 days ago

      Woodcutting is so simple that it's not too helpful to apply this kind of thing there. The best actions can be computed through simple logic; no machine learning is needed.
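
      For contrast, the kind of 'simple logic' that already covers woodcutting (a toy sketch; the action names are hypothetical, not a real client API):

        # A fixed rule covers woodcutting, so there is nothing for a
        # learned policy to improve on.
        def woodcut_tick(inventory_full: bool, is_chopping: bool) -> str:
            if inventory_full:
                return "drop_logs"   # or bank them
            if not is_chopping:
                return "click_tree"
            return "wait"            # already chopping; nothing to do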

  • @roofyosrs3513
    @roofyosrs3513 3 months ago

    Hi, I worked on an OSRS computer vision personal project, though I don't have experience in reinforcement learning. Do you use any sort of computer vision, or is there code injection? Because I see your AI switching prayers without switching to the prayer tab - how does it know the position of potions, etc.? I would love to know, thanks.

    • @Naton1
      @Naton1 3 months ago

      Yeah it hooks directly into the game client to perform higher level actions. The agent will choose things like use protect from melee, drink a combat potion, and eat food.

    • @roofyosrs3513
      @roofyosrs3513 3 months ago

      @@Naton1 Thank you for the reply. Do you have a Discord, bud? I have a few questions and would appreciate it if you could get in touch with me. I'm not going to ask you about your plugin or anything - I just want to develop something specific, but without having access to the client. I'm familiar with PyTorch computer vision, but I want to learn some reinforcement learning and have some questions.

    • @Naton1
      @Naton1 3 months ago

      @@roofyosrs3513 I do, my discord is naton2.

    • @roofyosrs3513
      @roofyosrs3513 2 months ago

      @@Naton1 I added you bro

  • @kell7689
    @kell7689 9 days ago

    Such a cool project

  • @sallyjones5231
    @sallyjones5231 4 months ago

    I love this! Need more than a like button, we need a love button.

  • @Eyedwiz
    @Eyedwiz 3 months ago

    Thanks for sharing and masterfully done - I'm currently studying robotics as part of my postgraduate in AI.
    I was wondering if someone had managed to simulate rs for RL!

  • @chrisc.6601
    @chrisc.6601 4 months ago

    What's your day job? Do you do any ML? I've been keeping up with the project since you first posted and it blows me away. This is fantastic work.

    • @Naton1
      @Naton1 4 months ago

      Appreciate it! I’ve been working professionally as a software engineer since I graduated from school two years ago. No ML or anything as a day job.

  • @iggynub
    @iggynub 3 months ago +1

    What an incredible project.

  • @sol12498
    @sol12498 2 months ago +13

    I'm not sure it was the best idea to let people run their own models without having to write a single line of Python. Now LMS has 4-5 of them a game. I'm a big fan of open source, but I think you have an inherent responsibility when you release stuff like this to not let others harm the integrity of the game with your work. And this has, and will continue to.

    • @BigDaddyWes
      @BigDaddyWes 1 month ago +4

      It doesn't actually matter where you think it came from. This is just inevitable, unfortunately.

    • @currentcommerce4774
      @currentcommerce4774 9 days ago +1

      @@BigDaddyWes venes & pakis would never program something like this, he did the heavy lifting for them.

    • @l0lan00b3
      @l0lan00b3 6 days ago

      @@BigDaddyWes Right. This guy was working on this, and there are definitely at least 10 others doing the same.

  • @somebodynothing8028
    @somebodynothing8028 1 month ago

    Hey, can I reach out to you about a way to use this on other RuneScape private servers to kill gamblers and bullshit?

  • @Snoootz
    @Snoootz 2 months ago

    This is actually a very good project, well done!
    Do you have a Discord channel?

  • @350dpi
    @350dpi 6 days ago

    Glad this didn't blow up and you use this little clout to start up your dream YouTuber coding career! Stay off our 20-year-old game

    • @tivia4929
      @tivia4929 5 days ago +1

      Aktually osrs is only 11 years old

  • @Pinkgloves322
    @Pinkgloves322 3 months ago +1

    You've been on rsps's/osrs for years abusing this. Nh staking, taking ppl's gp while they don't realise u ain't even clicking. Rat.

    • @Naton1
      @Naton1 3 months ago

      I have never used this outside of LMS/PvP Arena, and it was just created in the last year!

    • @Pinkgloves322
      @Pinkgloves322 2 months ago +1

      @@Naton1 stop lying

  • @Teo-uw7mh
    @Teo-uw7mh 4 months ago +1

    was this a solo project?

    • @Naton1
      @Naton1 4 months ago +2

      Yes it was!