Deep Q-Learning/Deep Q-Network (DQN) Explained | Python Pytorch Deep Reinforcement Learning

  • Published Jul 25, 2024
  • This tutorial contains step by step explanation, code walkthru, and demo of how Deep Q-Learning (DQL) works. We'll use DQL to solve the very simple Gymnasium FrozenLake-v1 Reinforcement Learning environment. We'll cover the differences between Q-Learning vs DQL, the Epsilon-Greedy Policy, the Policy Deep Q-Network (DQN), the Target DQN, and Experience Replay. After this video, you will understand DQL.
    Want more videos like this? Support me here: www.buymeacoffee.com/johnnycode
    GitHub Repo: github.com/johnnycode8/gym_so...
    Part 2 - Add Convolution Layers to DQN: • Get Started with Convo...
    Reinforcement Learning Playlist: • Gymnasium (Deep) Reinf...
    Resources mentioned in video:
    How to Solve FrozenLake-v1 with Q-Learning: • How to Use Q-Learning ...
    Need help installing the Gymnasium library? • Install Gymnasium (Ope...
    Solve Neural Network in Python and by hand: • How to Calculate Loss,...
    00:00 Video Content
    01:09 Frozen Lake Environment
    02:16 Why Reinforcement Learning?
    03:12 Epsilon-Greedy Policy
    03:55 Q-Table vs Deep Q-Network
    06:51 Training the Q-Table
    10:10 Training the Deep Q-Network
    14:49 Experience Replay
    16:03 Deep Q-Learning Code Walkthru
    29:49 Run Training Code & Demo
  • Science & Technology

COMMENTS • 85

  • @johnnycode
    @johnnycode  6 months ago +8

    Please like and subscribe if this video is helpful for you 😀
    Check out my DQN PyTorch for Beginners series where I explain DQN in much greater detail and show the whole process of implementing DQN to train Flappy Bird: ua-cam.com/video/arR7KzlYs4w/v-deo.html

    • @hrishikeshh
      @hrishikeshh 5 months ago

      Subscribed. You have really understood the concept of DQN. I was trying to implement DQN for a hangman game (guessing characters to finally guess the word in fewer than 6 attempts), and your explanation helped drastically.
      However, I need your opinion regarding the input size and dimensions for the hangman game, since in this FrozenLake game the grid is predefined.
      What could we do in the case of a word-guessing game where each word has a different length? Thanks for the help in advance.

    • @johnnycode
      @johnnycode  5 months ago +1

      Hi, not sure if this will work, but try this:
      - Use a vector of size 26 to represent the guessed/unguessed letters: 0 = available to use for guessing, 1 = correct guess, -1 = wrong guess.
      - As you'd mentioned, words are variable length, but maybe set a fixed max length of say 15, so use a vector of size 15 to represent the word: 0 = unrevealed letter, 1 = revealed letter, -1 = unused position.
      Concatenate the 2 vectors into 1 for the input layer. Try training with really short words and a fixed max length of maybe 5, so you can quickly see if it works or not.
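      For reference, here is a minimal sketch of that encoding in PyTorch (the sizes, letter indexing, and network shape are illustrative assumptions on my part, not code from the video):

      import torch
      import torch.nn as nn

      MAX_WORD_LEN = 15   # assumed fixed maximum word length
      NUM_LETTERS = 26

      def encode_state(secret_word, guessed_letters, revealed_mask):
          # Letter vector: 0 = available to guess, 1 = correct guess, -1 = wrong guess
          letters = torch.zeros(NUM_LETTERS)
          for ch in guessed_letters:
              letters[ord(ch) - ord('a')] = 1.0 if ch in secret_word else -1.0
          # Word vector: 0 = unrevealed letter, 1 = revealed letter, -1 = unused position
          word = torch.full((MAX_WORD_LEN,), -1.0)
          for i in range(min(len(secret_word), MAX_WORD_LEN)):
              word[i] = 1.0 if revealed_mask[i] else 0.0
          # Concatenate the 2 vectors into 1 input for the DQN
          return torch.cat([letters, word])              # shape: (41,)

      # Policy network: one Q-value per letter action (a-z)
      policy_net = nn.Sequential(
          nn.Linear(NUM_LETTERS + MAX_WORD_LEN, 64),
          nn.ReLU(),
          nn.Linear(64, NUM_LETTERS),
      )

      state = encode_state("apple", guessed_letters={"a", "z"},
                           revealed_mask=[True, False, False, False, False])
      q_values = policy_net(state)                       # 26 Q-values, one per possible guess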

    • @hrishikeshh
      @hrishikeshh 5 months ago +2

      @@johnnycode Thanks man, the concatenation advice worked. You are really good.

    • @johnnycode
      @johnnycode  5 months ago

      @@hrishikeshh Great job getting your game to train! 🎉

  • @user-zw7pd5io3e
    @user-zw7pd5io3e 21 days ago +2

    The best teachers are those who teach a difficult lesson simply. Thank you.

  • @bagumamartin
    @bagumamartin 2 months ago +4

    Johnny, you explained what I've been trying to wrap my head around for 9 months in a few minutes. Keep up the good work.

  • @thefall0190
    @thefall0190 6 months ago +3

    Thank you for making this video. Your explanations were clear👍 , and I learned a lot. Also, I find your voice very pleasant to listen to.

  • @user-rh9bn1zc5q
    @user-rh9bn1zc5q 6 months ago

    Thanks for this tutorial. It helped me understand DQN.

  • @drm8164
    @drm8164 7 months ago

    You are a great teacher, thank you so much and Merry Christmas 2023

  • @fawadkhan8905
    @fawadkhan8905 6 months ago

    Wonderful Explanation!

  • @johngrigoriadis
    @johngrigoriadis 6 months ago +3

    These videos on the new Gymnasium version of Gym are great. ❤ Could you do a video about the bipedal walker environment?

  • @johndowling1861
    @johndowling1861 3 days ago

    This was a really well explained example

  • @codelock69
    @codelock69 7 months ago

    awesome video!

  • @peterhpchen
    @peterhpchen 1 month ago

    Excellent!

  • @kimiochang
    @kimiochang 2 months ago +1

    Thanks again for your good work to help me understand reinforcement learning better.

    • @johnnycode
      @johnnycode  2 months ago

      You’re welcome, thanks for all the donations😊😊😊

  • @user-ks2kc9qz3d
    @user-ks2kc9qz3d 2 months ago +1

    Thank you very much for your reply.

  • @koka-lf8ui
    @koka-lf8ui 1 month ago

    Thank you so much. Can you please implement an environment with DQN that shows the forgetting problem?

  • @dylan-652
    @dylan-652 7 months ago

    the goat

  • @nimo9503
    @nimo9503 5 months ago

    Thank you for making this video. May I ask a question about the Q-network: why did you set the input size of the network to 16 inputs at 5:19 rather than 1 input that represents the state index only?
    I think 1 input would be enough.

    • @johnnycode
      @johnnycode  5 months ago +1

      Hi, good observation. You can change the input to 1 node holding 0-15, rather than 16 nodes of 0 or 1, and the training will still work.
      Currently, the trained Q-values will not work if we were to reconfigure the locations of the ice holes, because we are not passing the map configuration into the network or reconfiguring the map during training. I was thinking of encoding the map configuration into the 16 nodes, but I left that out of the video to keep things simple. I hope this answers why I had 16 nodes instead of 1.
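      To make the two options concrete, here is a tiny sketch (illustrative only, not the exact code from the video) of the 16-node one-hot encoding versus a single scalar input:

      import torch

      NUM_STATES = 16  # 4x4 FrozenLake

      def one_hot_state(state_idx):
          # 16 inputs: a 1 at the agent's current square, 0 everywhere else
          x = torch.zeros(NUM_STATES)
          x[state_idx] = 1.0
          return x

      def scalar_state(state_idx):
          # Alternative 1-node input: the raw state index (scaling to 0-1 helps training)
          return torch.tensor([state_idx / (NUM_STATES - 1)])

      print(one_hot_state(5))   # 1.0 in position 5, zeros elsewhere
      print(scalar_state(5))    # tensor([0.3333])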

    • @nimo9503
      @nimo9503 5 months ago

      @@johnnycode Thank you for this quick reply, which I wasn't expecting.
      This makes sense to me.

  • @rickyolal
    @rickyolal 1 month ago

    Hey Johnny! Thanks so much for these videos! I have a question: is it possible to apply this algorithm to a continuous action space? For example, selecting a number in the range [0, 120] as an action, or should I investigate other algorithms?

    • @johnnycode
      @johnnycode  1 month ago

      Hi, DQN only works on discrete actions. Try a policy gradient type algorithm. My other video talks about choosing an algorithm: ua-cam.com/video/2AFl-iWGQzc/v-deo.html

  • @envelopepiano2453
    @envelopepiano2453 7 months ago

    Sorry, may I ask how I can find the max_step used in the training of every episode? How do I know that the max number of actions is 200?

    • @johnnycode
      @johnnycode  7 months ago +1

      You actually have to add a few lines of code to enable the max step truncation. Like this:
      import gymnasium as gym
      from gymnasium.wrappers import TimeLimit
      env = gym.make("FrozenLake-v1", map_name="8x8")
      env = TimeLimit(env, max_episode_steps=200)  # truncate each episode after 200 steps

    • @envelopepiano2453
      @envelopepiano2453 7 months ago

      @@johnnycode Thank you very much! Sorry, may I ask another question: if my agent now has a health state = 16 and every step costs one health point, how can I let the agent know this? I mean, let the agent also consider the health state during DQN training. Can you give me some thoughts? Sorry, I'm a newbie, and this may be a stupid question. Thanks.

    • @johnnycode
      @johnnycode  7 months ago

      You are talking about changing the reward/penalty scheme; that is something you have to change in the environment. You have to make a copy of the FrozenLake environment and then make your code changes. If you are on Windows, you can find the file here: C:\Users\\.conda\envs\gymenv\Lib\site-packages\gymnasium\envs\toy_text\, find frozen_lake.py
      For example, I made some visual changes to the FrozenLake environment in the following video; however, I did not modify the reward scheme: ua-cam.com/video/1W_LOB-0IEY/v-deo.html

  • @ProjectOfTheWeek
    @ProjectOfTheWeek 7 months ago +1

    Thanks for the video! Can you make an example with a Battleships game? I'm trying, but the action (e.g. position 12) is the same as the new state (12) 😢

    • @johnnycode
      @johnnycode  7 months ago

      I actually never played Battle Ship (I'm assuming the board game?) before :D
      Do you have a link to the environment?

    • @ProjectOfTheWeek
      @ProjectOfTheWeek 7 months ago

      @@johnnycode I don't have the environment finished yet

  • @user-ks2kc9qz3d
    @user-ks2kc9qz3d 2 months ago

    Please tell me how to create the environment in the first place, since I would like to create an environment like your 4 x 4 but with other images.

  • @DEVRAJ-np2og
    @DEVRAJ-np2og 20 days ago

    How do I start learning reinforcement learning? I know pandas, numpy, matplotlib, and basic ML algorithms.

  • @AfitQADirectorate
    @AfitQADirectorate 4 months ago

    Great video, thanks. I am from a cyber security background. Do you have any idea about Network Attack Simulator (NASim), which also uses Deep Q-Learning and OpenAI Gym? If you don't, can you guide me on where to find tutorials for it? I have checked YouTube for weeks but couldn't find any. Thanks!

    • @johnnycode
      @johnnycode  4 months ago

      Are you referring to networkattacksimulator.readthedocs.io/ ?
      Did you try the tutorial in the documentation? What are you looking to do?

  • @henriquefantato9804
    @henriquefantato9804 6 months ago

    Great video! Maybe the next env to try is Mario?

    • @johnnycode
      @johnnycode  6 months ago

      Thanks, I’ll consider it.

  • @fernandomaroli8481
    @fernandomaroli8481 5 months ago

    Any chance you can show us how to use Keras and RL on Tetris?

    • @johnnycode
      @johnnycode  5 months ago

      Thanks for the interest, but I'm no RL expert and totally not qualified to give private lessons on this subject. I'm just learning myself and making the material easier to understand for others. As for Tetris, I have not tried it but may attempt it in the future.

  • @ProjectOfTheWeek
    @ProjectOfTheWeek 7 months ago +1

    I don't quite understand, because if you change the positions of the holes, the trained model will no longer be able to find the reward, right? What is the purpose of Q-Learning then?

    • @johnnycode
      @johnnycode  7 months ago

      Q-Learning provides a general way to find the "most-likely-to-succeed" set of actions to the goal. You are correct that the trained model only works on a specific map. In order for the model to solve (almost) any map, the agent has to be trained on as many map layouts as possible. The input to the neural network will probably need to include the map layout.
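      As an illustration of that last idea, here is a minimal sketch of an input that includes the map layout (the -1/0/1 encoding and the use of env.unwrapped.desc are assumptions for this sketch, not code from the video):

      import torch

      NUM_STATES = 16  # 4x4 FrozenLake

      def encode_position(state_idx):
          # One-hot of the agent's current square
          pos = torch.zeros(NUM_STATES)
          pos[state_idx] = 1.0
          return pos

      def encode_map(desc):
          # desc is the 4x4 layout, e.g. env.unwrapped.desc with byte values b'S', b'F', b'H', b'G'
          values = {b'S': 0.0, b'F': 0.0, b'H': -1.0, b'G': 1.0}
          return torch.tensor([values[bytes(c)] for row in desc for c in row])

      def encode_input(state_idx, desc):
          # 32 inputs: agent position + map layout, so the network can learn
          # a policy that depends on where the holes actually are
          return torch.cat([encode_position(state_idx), encode_map(desc)])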

    • @ProjectOfTheWeek
      @ProjectOfTheWeek 7 months ago

      @@johnnycode Do you know of AI approaches other than Q-Learning? I wouldn't like to pass in the layout, since that would be like 'cheating' (talking about the Battleships board game, for example).

    • @johnnycode
      @johnnycode  6 months ago

      @TutorialesHTML5 I'm not sure what other learning algorithm would work for Battleship, but Deep Q-Learning should work. Your "input" to the neural network would be your Target Grid, i.e. the shots fired/missed/hit. When there is a hit, the model should learn that the next highest chance of hitting is one of the adjacent squares. This video is for understanding the underlying algorithm; you might want to use a Reinforcement Learning library like what I show in this video: ua-cam.com/video/OqvXHi_QtT0/v-deo.html

    • @ProjectOfTheWeek
      @ProjectOfTheWeek 6 months ago

      @@johnnycode Yes, but every game changes the boats' positions... will it work? If you like, make a video... 😂😊

    • @johnnycode
      @johnnycode  6 months ago

      I might give it a shot, but no guarantees 😁

  • @codelock69
    @codelock69 7 months ago

    Just curious how you learned all of this? Did you just read the documentation or watch other videos?

    • @johnnycode
      @johnnycode  7 months ago +2

      These 2 resources were helpful to me:
      huggingface.co/learn/deep-rl-course/unit3/deep-q-algorithm
      pytorch.org/tutorials/intermediate/reinforcement_q_learning.html

  • @clashwithdheeraj1599
    @clashwithdheeraj1599 5 months ago +1

    for i in range(1000):
        print("thank you")

  • @JJGhostHunters
    @JJGhostHunters 2 months ago

    Hi Johnny...Do you know where I can find an example of applying a DQN to the Taxi-V3 environment?

    • @johnnycode
      @johnnycode  1 month ago

      The Taxi env can be solved with regular q-learning. If you take my code from the Frozen Lake video and swap in Taxi, it should work.

    • @JJGhostHunters
      @JJGhostHunters 1 month ago

      @@johnnycode Hi Johnny... I made it work with Q-Learning; however, I want to replace the Q-table with a simple DQN. It seems like this should be possible. I tried searching and even asked ChatGPT for help but cannot quite get it to work.

    • @johnnycode
      @johnnycode  1 month ago

      How did you encode the input to the policy/target network?

    • @JJGhostHunters
      @JJGhostHunters 1 month ago

      @@johnnycode Can I send you my code? It is a very short script that attempts to use a CNN to solve the Taxi-V3 problem.

    • @johnnycode
      @johnnycode  1 month ago

      Sorry, I can’t review your code. If you have specific questions, I can try to answer them.

  • @sosukeyuto2199
    @sosukeyuto2199 13 days ago

    Hi there. Thanks for the video guide. Super interesting and useful for better understanding the topic. I'm trying to implement tabular Q-Learning and Deep Q-Learning in the context of the Atari environments (especially Pitfall), but I'm struggling to understand how to handle the number of possible states. I must use the RAM observation space, but there a state is represented by an array of 128 cells, each of type uint8 with values that range from 0 to 255. So I cannot know a priori the exact number of states, since changing just one cell's value results in a different state. Have you got any suggestions, or do you know of a guide to better understand how to manage this environment?

    • @johnnycode
      @johnnycode  13 days ago

      Tabular Q-Learning cannot solve Atari games. You have to use Deep Q-Learning (DQN) or another type of advanced algorithm.
      For Pitfall, you should use rgb or grayscale observations instead of ram. Watch my video that explains some basics of how to use DQN on Atari games: ua-cam.com/video/qKePPepISiA/v-deo.html
      However, Pitfall is probably too difficult to start with. You should try training non-Atari environments first. For example, my series on training Flappy Bird: ua-cam.com/video/arR7KzlYs4w/v-deo.html

    • @sosukeyuto2199
      @sosukeyuto2199 13 days ago

      @@johnnycode I wish I could change to another environment, but this is a university project, so I need to stick with this one.
      For tabular I thought the same thing, but my professor suggested that I could do it using a map (which accepts arrays or tuples as the index) instead of a matrix, and whenever a state has not been initialized before, initialize it to the default.
      As for DQN, I will happily check out your video to see if I can figure this out. Thank you.

    • @sosukeyuto2199
      @sosukeyuto2199 13 days ago

      And RAM is another constraint I have to keep for the observation space.

    • @johnnycode
      @johnnycode  12 days ago +1

      I don't mean to challenge your professor, but it is impossible to solve Pitfall with tabular Q-Learning. With 128 RAM cells that can each hold 256 values, there are 256 to the power of 128 possible states, which no computer memory can hold. You need to use a neural network-based algorithm like DQN.
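      For the DQN route with the ram observation, here is a rough sketch of how the 128-byte observation could be fed into the policy network (the layer sizes and scaling are my assumptions, not from the video):

      import numpy as np
      import torch
      import torch.nn as nn

      RAM_SIZE = 128     # Atari ram observation: 128 cells, each 0-255
      NUM_ACTIONS = 18   # full Atari action set; in practice use env.action_space.n

      policy_net = nn.Sequential(
          nn.Linear(RAM_SIZE, 256),
          nn.ReLU(),
          nn.Linear(256, 256),
          nn.ReLU(),
          nn.Linear(256, NUM_ACTIONS),   # one Q-value per action
      )

      # Scale the raw uint8 RAM bytes to [0, 1] before passing them to the network
      ram_obs = np.random.randint(0, 256, size=RAM_SIZE, dtype=np.uint8)  # stand-in for a real observation
      state = torch.tensor(ram_obs, dtype=torch.float32) / 255.0
      q_values = policy_net(state)       # greedy action = q_values.argmax()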

    • @sosukeyuto2199
      @sosukeyuto2199 12 days ago

      @@johnnycode I agree with you. I had the same thought about it, and I even spent the last two days trying to train it with the tabular approach. I managed to make it work (in terms of the structure of the table and writing the Q-values), but of course I had just some episodes in which it maintained a reward of 0 and a lot of other episodes where the reward was negative (and it could never collect a single treasure).
      So I don't know, maybe he wants me to do it anyway and demonstrate that it can't be implemented like that. I will ask for a meeting with him and see what comes out.
      Thank you so much for your time and your considerations. When I finish with the tabular version I will start the DQL with a CNN. If I have some doubts, would you mind if I still ask you some things? I don't mean to disturb you further, but I found this discussion and your videos about the topic really useful for approaching the problem.

  • @World-Of-Mr-Motivater
    @World-Of-Mr-Motivater 7 days ago

    Sir, I have one doubt.
    We are training the policy network first and then copying it as the target network.
    Then we let our agent go through the policy network and update it based on the target network.
    But you also mentioned that the target network uses the DQN formula.
    I am totally confused, sir. Can you give it in crisp steps?

    • @johnnycode
      @johnnycode  7 days ago +1

      My video series on implementing DQN to train Flappy Bird has much more detailed explanations of the end-to-end process, check it out: ua-cam.com/video/arR7KzlYs4w/v-deo.html
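      In the meantime, here is a rough sketch of the loop in crisp steps (a minimal illustration with made-up sizes and hyperparameters, not the exact code from either video):

      import copy
      import torch
      import torch.nn as nn

      # 1. Build the policy network, then copy it to create the target network
      policy_net = nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 4))  # FrozenLake-sized example
      target_net = copy.deepcopy(policy_net)

      gamma = 0.9
      optimizer = torch.optim.Adam(policy_net.parameters(), lr=1e-3)

      def train_step(state, action, reward, next_state, terminated):
          # 2. The policy network estimates Q(state, action) for the action the agent took
          q_estimate = policy_net(state)[action]
          # 3. The target network supplies the training target via the DQN formula:
          #    target = reward + gamma * max_a' Q_target(next_state, a')
          with torch.no_grad():
              q_target = torch.tensor(float(reward))
              if not terminated:
                  q_target = q_target + gamma * target_net(next_state).max()
          # 4. Only the policy network's weights are updated toward that target
          loss = nn.functional.mse_loss(q_estimate, q_target)
          optimizer.zero_grad()
          loss.backward()
          optimizer.step()

      # 5. Every so often (e.g. every few episodes), sync the target network:
      #    target_net.load_state_dict(policy_net.state_dict())

      Experience Replay just means steps 2-4 run on a batch of past transitions sampled from memory instead of only the latest one.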

    • @World-Of-Mr-Motivater
      @World-Of-Mr-Motivater 7 days ago

      @@johnnycode ok sir thanks a lot

    • @World-Of-Mr-Motivater
      @World-Of-Mr-Motivater 6 days ago

      @@johnnycode Sir, can I use the deep Q-network to generate the pixel positions of stones in a snake game, where the stones act as obstacles?
      Please guide me, sir.

    • @johnnycode
      @johnnycode  5 days ago

      @World-Of-Mr-Motivater See if this video answers your question: ua-cam.com/video/AoGRjPt-vms/v-deo.html

  • @user-ks2kc9qz3d
    @user-ks2kc9qz3d 2 months ago +1

    Please give me the source code for this DQN algorithm, and thank you for this explanation.

    • @johnnycode
      @johnnycode  2 months ago +1

      github.com/johnnycode8/gym_solutions

    • @user-ks2kc9qz3d
      @user-ks2kc9qz3d 2 months ago

      @@johnnycode Thank you very much for your reply. I work on Colab (Python online) since I have hardware constraints, and I don't understand how to use the source code you sent me to build other implementations on top of it. Please explain how to do this, perhaps in a short video... Thanks a million once again.

    • @johnnycode
      @johnnycode  2 months ago

      Here are some suggestions:
      How to modify this code for MountainCar: ua-cam.com/video/oceguqZxjn4/v-deo.html
      You might be interested in using a library like Stable Baselines3: ua-cam.com/video/OqvXHi_QtT0/v-deo.html

  • @ElisaFerrari-q5i
    @ElisaFerrari-q5i 22 days ago

    How can we solve the problem using a single DQN instead of 2?

    • @johnnycode
      @johnnycode  22 days ago

      You can use the same policy network in place of the target network. Training results may be worse than with 2 networks.

    • @ElisaFerrari-q5i
      @ElisaFerrari-q5i 6 days ago

      And which is the better solution: the single DQN or the Q-Learning method?

    • @johnnycode
      @johnnycode  5 days ago

      @@ElisaFerrari-q5i DQN is the advanced version of Q-Learning.

  • @user-ks2kc9qz3d
    @user-ks2kc9qz3d 2 months ago