Unity ML-Agents 1.0+ - Self-Play explained

  • Published 21 Aug 2020
  • ML-Agents in Unity implements self-play, a mechanism for an agent to play against past versions of itself. This expands the possibilities of reinforcement learning, because the opponent, and with it the difficulty of the environment and its rewards, always scales with the agent's own abilities. I really believe self-play to be one of the key ideas in machine learning / artificial intelligence, and I hope this video gave you some insight into the idea. A sketch of the training configuration that enables it follows the links below.
    GitHub link: github.com/Sebastian-Schuchma...
    I have created a Discord Channel for everybody wanting to learn ML-Agents. It's a place where we can help each other out, ask questions, share ideas, and so on. You can join here: / discord
    Support me on Patreon: www.patreon.com/user?u=25285137
    Keep in touch (Twitter): / sebastianschuc7
  • Science & Technology
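
    If you want to try this yourself, self-play is switched on per behavior in the trainer configuration file. Below is a sketch of what that section can look like, using the option names documented for recent ML-Agents releases (exact names have shifted slightly between versions, so check the docs for your release); MyBehavior is a placeholder for your agent's actual Behavior Name.

        behaviors:
          MyBehavior:                               # placeholder; must match the Behavior Name set in Unity
            trainer_type: ppo
            self_play:
              save_steps: 20000                     # snapshot the current policy every 20k steps
              team_change: 100000                   # trainer steps before the learning team switches
              swap_steps: 10000                     # steps between swapping in a different past opponent
              window: 10                            # keep the 10 most recent snapshots as opponents
              play_against_latest_model_ratio: 0.5  # half the games are against the newest snapshot
              initial_elo: 1200.0                   # starting Elo rating (this is the default)

    A larger window or a lower play_against_latest_model_ratio means a more varied pool of opponents, which tends to produce a more robust final policy at the cost of noisier, slower training.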

COMMENTS • 22

  • @Prokage 3 years ago +3

    You are contributing so much to my Master's thesis, thank you so much. This is brilliant work. Your teaching ability is awesome!

  • @epjm_ 3 years ago

    Great tutorial!!!! Thank you so much for all the ML-Agents videos; they've really been helping me out. Just joined the Discord too, so I can't wait to see all the new things about ML-Agents I'm going to get to learn there.

  • @nuttaphoomboonmee2163 3 years ago

    I've been searching for your YouTube channel for hours... remembering your channel name is as hard as understanding your videos, but I DO LOVE IT

  • @BramOuwerkerk 3 years ago

    Great video! I'll definitely check the project and Discord Server out

  • @ziquaftynny9285 3 years ago +1

    I was already subscribed :)

  • @alexanderyau6347 2 years ago

    Good explanation

  • @leo8755 2 years ago +1

    Awesome video!
    Could you please elaborate on the policies?
    How are they handled and used?
    If there are 2 agents and each has a policy, what is the purpose of the stack of past policies, and how do the agents use them?
    Thanks!

  • @robosergTV 3 years ago

    very cool

  • @lunli4435 2 years ago

    Thanks for your wonderful work. Where can I find the detailed training code?

  • @philip9611 3 years ago +1

    Hi man, appreciate the tutorials! Just wondering though, why was your starting Elo 1200? Does this affect training in any way?

    • @philip9611 3 years ago

      Also, how much of a jump in Elo is good?
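
      For reference: 1200 is simply the default of the initial_elo option in the self_play section of the trainer config, borrowed from the usual chess convention; the starting value only sets the scale of the reported curve and does not change the learning problem. After each game the rating is adjusted with the standard Elo update (S is 1 for a win, 0.5 for a draw, 0 for a loss; K is a fixed gain factor):

          $E = \frac{1}{1 + 10^{(R_{\mathrm{opp}} - R)/400}}, \qquad R' = R + K\,(S - E)$

      There is no magic number for a good jump: what signals healthy training is a steadily climbing Elo, i.e. the current policy keeps beating its own past snapshots.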

  • @adelAKAdude 1 year ago

    Great video and great project; I really enjoyed going through it, very insightful.
    I need some help: I have been training my agent for almost a month now (as in training, stopping, and trying something else), and I can't seem to reach the max level. The best agent I trained simply stops me from winning but rarely goes for the win itself; if it has two in a row and I have two in a row, it blocks me instead of winning. I reach this point within the first ~700k steps, and if I train longer the agent goes nuts and stops acting like an AI at all; it plays more like I just place 1, then 2, then 3.
    I would appreciate any advice, or anything you did under the hood with your agents.
    You also mentioned in the video that your AI isn't perfect; any idea why? Can't I reach a perfect player with self-play?

  • @nikhilsharma2236 3 years ago

    Hello, can I use Unreal as well for training my models, or is Unity the only option since everyone is using it? Please do tell me.

  • @MZXD 2 years ago

    Interesting. So how can this apply to even more agents, for example having 6 agents all racing each other? For training, would you suggest just a 1v1, so 2 cars racing each other, or a 3v3, even though the goal of the game is for an individual to win and not for a team to win?

  • @ForsmanTomas 3 years ago

    The reason win-or-lose rewards don't work has been on my mind since I saw WarGames when I was 9. If you want the AI to understand that a draw is preferred, there has to be a reward for that. As you point out, being the player who forces a draw is only rewarding if you go second. In WarGames the AI learns that it's more rewarding not to play, even though there is no reward for that, or AFAIK even an option for it. I figured a time reward (not losing for x time) could make that happen, since that was the driving factor behind the Cold War. It would mean that taking time to make a move is rewarding if a loss is expected to be likely.
    In reality, human players know that tic-tac-toe should end in a draw, so winning or drawing is the same; the goal is only to not lose.
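
    One way to make that preference explicit is a terminal reward that places a draw strictly between a loss and a win, optionally with a small per-step bonus for the buying-time idea; the values below are an illustrative sketch, not what ML-Agents assigns by default:

        $r_T = \begin{cases} +1 & \text{win} \\ 0 & \text{draw} \\ -1 & \text{loss} \end{cases} \qquad \text{optionally plus } r_t = +\epsilon \text{ per step while a loss looks likely}$

    Self-play rewards are usually kept zero-sum (one player's gain is the other's loss), and a draw worth 0 to both sides preserves that.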

  • @mikailkotan5246 3 years ago +1

    Hi, I have done what you said, but when I try to train I see this error, please help me :)
    WARNING [trainer.py:240] Your environment contains multiple
    teams, but PPOTrainer doesn't support adversarial games. Enable self-play to
    train adversarial games.
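
    For anyone hitting the same warning: it names the fix itself. The trainer already detects the two teams (the distinct Team Ids in the agents' Behavior Parameters); what's missing is a self_play section under your behavior in the trainer configuration, roughly like this (option names as in recent ML-Agents releases; TicTacToe is a placeholder, use your own Behavior Name):

        behaviors:
          TicTacToe:            # placeholder; must match your Behavior Name
            trainer_type: ppo
            self_play:
              save_steps: 20000
              swap_steps: 10000
              window: 10
              play_against_latest_model_ratio: 0.5

    With that section present, the trainer wraps PPO in the self-play (ghost) trainer and the warning goes away.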

  • @yyyrational 3 years ago

    I have a general question here... Do you use C# or Python in Unity? Sorry, I am very new to the topic...

  • @skinnyboystudios9722 3 years ago

    How many GTX 3090s do you need to train your own AlphaZero?

  • @CGoni 3 years ago

    Hello, thank you for this useful video. I am a subscriber who wants to apply ML-Agents to a game. I am wondering if there is any way, or an example, of applying learning inside the game itself.

    • @Norbingel 3 years ago

      By this, do you mean in-game AI learning and improving? I've been wondering about this as well. Is ML something you do outside the formal game build, so that once you have the build the AI's learning is fixed, or is it something the agents can keep developing while in the game itself?

  • @meiyiluan7177 3 years ago

    100,000 dollars!! 🍻