How I Made the Best AI Snake

  • Published Nov 12, 2024

COMMENTS • 36

  • @bikramjeetdasgupta 8 months ago +8

    Bro I really thought I was watching a youtuber with at least 300K subs... This is gonna surely blow up if the youtube algorithm supports you.. keep it up👌

    • @IDontModWTFz 8 months ago +2

      I didn't even notice the subs... That's mental!

  • @ramsprojects8375 5 months ago +3

    Nice content! Keep it up. Please enable subtitles for users like me who prefer subtitled videos.

    • @TwoPointCode 5 months ago +1

      Thank you for the compliment and the tip! I didn’t realize the subtitles were disabled. I just enabled the auto generated ones and will go through them soon to make sure they’re accurate.

  • @TwoPointCode 5 months ago +3

    Something I want to address due to some comments:
    While working on this project I watched both Code Bullet's and AlphaPhoenix's videos. Code Bullet first tried to create an AI using Q-learning but was running into many issues with the snake not eating the apple, so he decided to turn this problem into a searching/maze-finding problem. He then began only using searching algorithms and had the snake follow these paths. Because he turned to pathfinding, I don't consider the snake he made to use AI and instead consider it to fall under the category of automated snake games. For example, he brings up an interesting use case of maze solving in his video: GPS. Even though a GPS gives you directions, most people wouldn't consider it an AI. I see it the same way with the snake he made. The searching algorithm gives directions, and the snake automatically follows this path, which is why I consider it automated.
    In AlphaPhoenix's video, he also approaches this problem as a pathfinding problem. At 2:50 in his video, he states that "[His] goal is to make an algorithm that fills the board every single game in as few moves as possible." He solves it in a brute-force way: whenever a problem pops up, he adds more conditions and rules for that specific case. The final solution he came up with is very impressive, but again I don't consider the snake he made to be an AI, as it did not learn and correct from past experiences to make its decisions; instead, it was made to make certain moves based on different rules or conditions. Because of this, I believe it should also fall under the category of automated snake games.
    Both of the videos mentioned above are very interesting and were very helpful in giving ideas and insight into what would and wouldn't work, but the final solutions to this problem were very different.

  • @BeanDev1 8 months ago +4

    Great work

  • @Rasil1 8 months ago +1

    nice one man, this vid is in the algorithm, that's why I was recommended this

  • @sergioaugustoangelini9326 8 months ago +1

    That's a wonderful solution! Great job!

  • @sotofpv 8 months ago

    Amazing, so happy to have found you, looking forward to your future videos given how well done your first video is :)

  • @rodrigorila1605 8 months ago +3

    really cool AI

  • @krysdoran 8 months ago

    You really explained this in a simple, comprehensible way!!

  • @ArminPaslar 8 months ago +1

    You deserve so much more

  • @maxencehm1764 2 months ago +2

    hi, the video was great, hope you will make others. I have a few questions:
    - how are the snakes evolving? The only way I know is with a genetic algorithm, but here there is only one snake
    - how did the snake choose which direction to take?
    - what did you use to make the graphics?

    • @TwoPointCode 2 months ago

      Thank you!
      The reinforcement learning algorithm I used is PPO. PPO has two networks. One network is called the value network and it predicts the sum of future rewards that will be earned from an observation. The other is called the policy network and it outputs probabilities for each move given an observation. So, in this case, probabilities for the moves up, down, left, and right are outputted from the policy network. While training the AI the current observation, selected move, reward, and updated observation after the move are saved. The value network is then used to predict the future rewards of both saved observations. If the predicted new observation’s future reward plus the reward gained for the action is less than the predicted future reward of the original observation, then the action must not have been the best, so the probability of selecting that action given that observation is decreased. If it’s larger, then it was a good move and the probability for that move increases. If it’s the same, then it stays the same. This is normally done after a set number of steps and in batches of steps to avoid large, unusual updates. This is all repeated until you stop the training.
      After training, the policy network is given an observation and the action with the highest probability is used.
      For the graphics: while showing the training progress I used OpenCV in Python, but towards the end of the video, when I was showing the final model, I actually created an in-browser environment using JavaScript, CSS, and HTML.
      I tried to keep this comment a reasonable length while keeping the important information in it so I didn’t want to go too far into the details. If you have any questions, feel free to ask!
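
A rough illustration of the update described in this reply. This is not the author's actual code: it assumes PyTorch, a flattened observation, and a single unbatched transition, and it omits the batching, generalized advantage estimation, and entropy bonus a full PPO implementation would include.

import torch
import torch.nn as nn

OBS_SIZE, N_ACTIONS = 400, 4  # e.g. a flattened 20x20 board; moves: up, down, left, right

# Policy network: move probabilities.  Value network: predicted future reward.
policy_net = nn.Sequential(nn.Linear(OBS_SIZE, 128), nn.ReLU(), nn.Linear(128, N_ACTIONS))
value_net = nn.Sequential(nn.Linear(OBS_SIZE, 128), nn.ReLU(), nn.Linear(128, 1))
optimizer = torch.optim.Adam(
    list(policy_net.parameters()) + list(value_net.parameters()), lr=3e-4)

def ppo_step(obs, action, reward, next_obs, done, old_log_prob, gamma=0.99, clip=0.2):
    """One simplified PPO update for a single saved transition."""
    # "Advantage": the reward gained plus the predicted future reward of the new
    # observation, minus the predicted future reward of the original observation.
    with torch.no_grad():
        target = reward + gamma * value_net(next_obs) * (1.0 - done)
    advantage = target - value_net(obs).detach()

    # Probability of the chosen move now vs. when it was taken; PPO clips this
    # ratio so a single update cannot change the policy too drastically.
    log_prob = torch.distributions.Categorical(logits=policy_net(obs)).log_prob(action)
    ratio = torch.exp(log_prob - old_log_prob)
    policy_loss = -torch.min(ratio * advantage,
                             torch.clamp(ratio, 1.0 - clip, 1.0 + clip) * advantage)

    # Train the value network toward the observed outcome.
    value_loss = (value_net(obs) - target).pow(2)

    optimizer.zero_grad()
    (policy_loss + 0.5 * value_loss).sum().backward()
    optimizer.step()

In practice the saved (observation, action, reward, next observation) tuples are collected for a number of steps and the update is run over minibatches of them, which, together with the clipping, is the "avoid large, unusual updates" part the reply mentions.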

  • @Me-0063 3 months ago +1

    I thought for a second about reward systems and came up with this:
    1) Reward for eating the apple
    2) Reward based on distance to the apple
    3) Small punishment for each move
    4) Bigger punishment for each move past the minimum number of moves it would take the snake to eat the apple. Could possibly grow the punishment exponentially
    5) Punishment for death. Should always be the biggest number
    This prompts the snake to get closer and eat apples as fast as it can, acting a bit like a pathfinding algorithm thanks to point 4, whilst not compromising how long the snake lasts. I understand that this might not work if the AI is too shortsighted, preferring small rewards now instead of big rewards later, but I think there is a way to counteract that.
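
A rough sketch of the reward scheme this comment proposes. All constants and helper arguments here are made up for illustration; the rewards actually used in the video may be quite different.

def compute_reward(ate_apple: bool, died: bool,
                   prev_dist: float, dist_to_apple: float,
                   moves_since_apple: int, min_moves_to_apple: int) -> float:
    reward = 0.0
    if ate_apple:
        reward += 10.0                            # 1) reward for eating the apple
    reward += 0.1 * (prev_dist - dist_to_apple)   # 2) reward for moving closer to the apple
    reward -= 0.01                                # 3) small punishment for each move
    wasted = moves_since_apple - min_moves_to_apple
    if wasted > 0:
        reward -= 0.02 * (1.5 ** wasted)          # 4) growing punishment for extra moves
    if died:
        reward -= 100.0                           # 5) death is always the biggest punishment
    return reward

The shortsightedness worry at the end is usually handled by the algorithm's discount factor rather than by the reward function itself: a discount factor close to 1 makes rewards far in the future weigh almost as much as immediate ones.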

  • @desredes519 8 months ago +3

    Great video

  • @gchd1232 8 months ago +2

    Love the video! Could you teach an AI how to play something like billiards next time?
    Also: How long did it take to program this AI without the training part?

    • @TwoPointCode 8 months ago +2

      Thank you! It would be interesting to see two AIs learn to play billiards against each other….. I’ll have to look into it. If I were to strictly talk about the time I spent programming this, it would be multiple days’ worth of straight coding, but the final code is actually quite small. The reason it took so long is the lack of information around this and the amount of trial and error I had to go through to get this AI to the level it’s at. If I include the time I’ve spent training different models and testing different things, then I’ll tell you that I started this project mid-November, so 4 months straight. Again, the reason for that is how long it takes to train new models.

  • @Peledcoc 8 months ago +1

    Great video!

  • @palmero7606 8 months ago +1

    Amazing Video.💪🏼

  • @sotofpv 8 months ago +2

    Oooh just thought of a genuine question. When you change the grid size, are you adding more inputs to the network? Needing to retrain it?

    • @sotofpv 8 months ago

      I think you answered my question at around minute 25:20 hehe

    • @TwoPointCode 8 months ago

      Yeah, good question! That final snake I showed with the differing grid sizes was actually a single model trained on a 20x20 input. The observation space was basically the same, but during training a random board size under 20x20 was picked each game, and 1’s were used to fill the rest of the board so it appeared to be that smaller size.
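
A rough sketch of that padding trick. The use of 1's to fill unused cells comes from the reply; the placement in one corner and everything else here is illustrative, not the author's code.

import numpy as np

MAX_SIZE = 20  # the model was always given a 20x20 observation

def padded_observation(board: np.ndarray) -> np.ndarray:
    """Embed a smaller board (e.g. 12x12) into the 20x20 grid the network
    expects, filling the unused cells with 1's so they read as blocked."""
    h, w = board.shape
    obs = np.ones((MAX_SIZE, MAX_SIZE), dtype=board.dtype)
    obs[:h, :w] = board  # the real board occupies one corner; the rest stays 1
    return obs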

  • @stevencowmeat 8 months ago

    Aw man, went to find another vid by you and this is the only one. RIP, hope you continue making vids. Solid music choice as well. Also, is your logo generated by DALL-E? Idc if it is, it just looks like its style.

  • @lucasgaperez 8 months ago +4

    3 subs, what the fuck

  • @petr-heinz 8 months ago +1

    How can the AI know when it's gonna get punished if it doesn't get previous frames as input? Does it have a dedicated counter input for the limit?

    • @TwoPointCode 8 months ago +1

      Good question! Some things I had to brush over/simplify for the sake of the video, and that was one of the things I decided to leave out. In the observation space the AI is also given a value for how many remaining moves it has; once that value hits 0, the game ends and it is punished for running out of moves. This is done so the AI can learn why it is being punished and learn to avoid letting that value reach 0.
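
Roughly what that might look like. The names and numbers below are illustrative, not taken from the video.

import numpy as np

MOVE_LIMIT = 100  # illustrative; typically refilled when an apple is eaten

def build_observation(board: np.ndarray, moves_left: int) -> np.ndarray:
    # The remaining-move count is part of the observation, so the network can
    # see how close it is to the limit and learn to avoid letting it hit 0.
    return np.append(board.flatten(), moves_left / MOVE_LIMIT)

def out_of_moves(moves_left: int) -> tuple[bool, float]:
    # When the counter reaches 0 the game ends with a punishment.
    if moves_left <= 0:
        return True, -100.0  # done, punishment (value is made up)
    return False, 0.0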

  • @mafiawerbung 8 months ago +2

    64x64 when?

    • @TwoPointCode 8 months ago +1

      Maybe….. but it would be quite some time from now…..

  • @gabboman92 8 months ago

    hey do you have a fedi presence? mastodon n stuff

  • @montageofchips 4 months ago +1

    A little too slow, but overall a good video
