Reinforcement Learning Jump Start | Complete Deep Learning Course

  • Published 21 Nov 2024

COMMENTS • 37

  • @MachineLearningwithPhil
    @MachineLearningwithPhil  5 years ago +7

    This content is sponsored by my Udemy courses. Level up your skills by learning to turn papers into code. See the links in the description.
    Time stamps for all the modules:
    Intro 00:00:00
    Intro to Deep Q Learning 00:01:30
    How to Code Deep Q Learning in Tensorflow 00:08:56
    Deep Q Learning with Pytorch Part 1: The Q Network 00:52:03
    Deep Q Learning with Pytorch part 2: Coding the Agent 01:06:21
    Deep Q Learning with Pytorch part 3: Coding the main loop 01:28:54
    Intro to Policy Gradients 01:46:39
    How to Beat Lunar Lander with Policy Gradients 01:55:01
    How to Beat Space Invaders with Policy Gradients 02:21:32
    How to Create Your Own Reinforcement Learning Environment Part 1 02:34:41
    How to Create Your Own Reinforcement Learning Environment Part 2 02:55:39
    Fundamentals of Reinforcement Learning 03:08:20
    Markov Decision Processes 03:17:09
    The Explore Exploit Dilemma 03:23:02
    Reinforcement Learning in the Open AI Gym: SARSA 03:29:19
    Reinforcement Learning in the Open AI Gym: Double Q Learning 03:39:56
    Conclusion 03:54:07

  • @theencryptedpartition4633
    @theencryptedpartition4633 22 days ago +11

    Anyone else wake up to this 😅?

    • @atomicdad3920
      @atomicdad3920 18 days ago

      Yeah that was a wild jump from what I fell asleep to

    • @mandoreforger6999
      @mandoreforger6999 16 days ago

      Yes😂😂😂😂

    • @wayne_._
      @wayne_._ 12 days ago

      I put on someone making leather shoes, didn't expect programming.

    • @roganblack4113
      @roganblack4113 9 days ago

      Me too

    • @St3rnbergStudi0z
      @St3rnbergStudi0z 5 days ago

      Went to sleep to pokemon, woke up to this 💀

  • @alvarorodriguez1592
    @alvarorodriguez1592 5 years ago +4

    Hi newcomer! Don't be scared off by the 4-hour-long video!! It's really just several lessons concatenated, the first one containing a whole program in 52 minutes!
    You also have Phil's GitHub in the video description if you prefer to study the code and come back to the video only when you're having a hard time figuring something out.
    Thank you Phil for such substantial content!

  • @anantasin
    @anantasin 5 years ago +1

    This is the best lecture on RL ever! Thank you so much!

  • @thelaconicguy
    @thelaconicguy 4 years ago +2

    Hey Phil, I want to thank you for sharing such good content for free. I have one question for you. Are you planning to do a series on imitation learning techniques for continuous action and state space? An overview of how to achieve this task will also be great.

    • @MachineLearningwithPhil
      @MachineLearningwithPhil  4 years ago +1

      I hadn't planned on it but I can add it to the list

    • @thelaconicguy
      @thelaconicguy 4 years ago +1

      @@MachineLearningwithPhil Thanks! That would be great of you.

  • @seth8141
    @seth8141 5 years ago +1

    Wow awesome Phil. I'll take a look someday XD

  • @fadop3156
    @fadop3156 5 years ago +1

    Thank you so much!

  • @RedShipsofSpainAgain
    @RedShipsofSpainAgain 5 years ago +1

    Hey Phil, do you have a video on how to set up your virtual environment for these tutorials? Conda/pip/gym/PyTorch/TensorFlow packages, plus linter and IntelliSense in Visual Studio Code? Thanks

  • @tamirtsogbayar3912
    @tamirtsogbayar3912 7 months ago

    Hello Phil,
    I appreciate your great videos.
    I'm planning to develop an AI game bot for Dota 2, based on the method DeepMind used for their StarCraft bot, but I still have no idea how to start or what the components are. Could you help me with that, please?

  • @RedShipsofSpainAgain
    @RedShipsofSpainAgain 5 years ago +1

    This is a great vid Phil, thank you! BTW, at 2:44 I know it's just an example, but are those ballpark salaries accurate? Amazon for $350,000?!?

    • @MachineLearningwithPhil
      @MachineLearningwithPhil  5 years ago

      hah! Nope, I just pulled them out of thin air. Glassdoor indicates starting compensation of around $170,000 with stocks included.

  • @anirban123321
    @anirban123321 5 years ago +1

    Thanks for the tutorials, they really helped. I saw this tutorial on YouTube and went on to get your intro to RL course at O'Reilly. I am really enjoying the course, especially how you create simple quizzes on the study material, which makes it much easier to understand the subject.
    I have a question regarding the "maze running robot" topic.
    The actionSpace is {'U': (-1,0), 'D': (1,0), 'L': (0,-1), 'R': (0,1)}
    and the maze is of size (6,6) in (x,y) coordinates. The "state" variable holds (x,y) coordinates. However, when we define the function isAllowedMove(self, state, action), we take y, x = state (essentially reversing x and y). I am not able to understand why I need to invert maze[x,y] to maze[y,x]?
    Rgds,
    Anirban

    • @MachineLearningwithPhil
      @MachineLearningwithPhil  5 years ago +1

      Great question. Sorry for the delayed reply, I was out of town and this comment escaped me.
      It's because the x coordinate represents the columns and the y coordinate represents the row. In a right handed coordinate system, x is on the horizontal axis, and y is on the vertical axis. Hence, x is the column and y is the row. Since the indexing of numpy is row, column, we have to switch the two indices. I really should have used i and j instead of x and y to avoid confusion.
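      A minimal sketch of the point (a hypothetical reconstruction, not the exact course code; the wall position and the demo call at the end are made up for illustration):

      ```python
      import numpy as np

      maze = np.zeros((6, 6), dtype=int)  # maze[row, column]; 0 = free, 1 = wall
      maze[2, 4] = 1                      # a wall at row 2 (y = 2), column 4 (x = 4)

      # Actions move the agent by (row_delta, column_delta): U/D change the row
      actionSpace = {'U': (-1, 0), 'D': (1, 0), 'L': (0, -1), 'R': (0, 1)}

      def isAllowedMove(state, action):
          # state is an (x, y) pair: x = column, y = row. NumPy indexes arrays
          # as [row, column], so we unpack in reverse to get (row, column).
          x, y = state
          dy, dx = actionSpace[action]
          newY, newX = y + dy, x + dx
          if not (0 <= newY < 6 and 0 <= newX < 6):  # stay on the grid
              return False
          return maze[newY, newX] == 0               # index as [row, column]

      print(isAllowedMove((4, 1), 'D'))  # moving down into the wall -> False
      ```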

  • @Gottii92
    @Gottii92 5 years ago +1

    I rewatched this video and didn't really understand why we need 2 NNs at 5:55, or what "eliminating bias in the estimates of the actions" means 🤔

    • @MachineLearningwithPhil
      @MachineLearningwithPhil  5 years ago +2

      Good questions! We need 2 neural networks because if we use 1 we are effectively chasing a moving target. We use the same network to learn the value of states as well as to choose the actions. The bias comes in because we are taking a max over actions, which implicitly biases the estimates.
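      A minimal PyTorch sketch of the two-network setup described above (my own illustration, not the code from the video; network sizes and hyperparameters are arbitrary):

      ```python
      import copy
      import torch
      import torch.nn as nn

      n_states, n_actions, gamma = 4, 2, 0.99
      # Online network: chooses actions and receives gradient updates
      q_online = nn.Sequential(nn.Linear(n_states, 64), nn.ReLU(),
                               nn.Linear(64, n_actions))
      # Target network: a frozen copy used only for bootstrap targets,
      # so learning doesn't chase a moving target
      q_target = copy.deepcopy(q_online)
      optimizer = torch.optim.Adam(q_online.parameters(), lr=1e-3)

      def td_update(state, action, reward, next_state, done):
          with torch.no_grad():
              # The max over actions is the source of the bias Phil mentions:
              # it systematically overestimates action values
              next_q = q_target(next_state).max(dim=1).values
              target = reward + gamma * next_q * (1 - done)
          q_pred = q_online(state).gather(1, action.unsqueeze(1)).squeeze(1)
          loss = nn.functional.mse_loss(q_pred, target)
          optimizer.zero_grad()
          loss.backward()
          optimizer.step()

      # Every C learning steps, sync the frozen copy:
      # q_target.load_state_dict(q_online.state_dict())
      ```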

  • @anubhav2198
    @anubhav2198 5 years ago

    Thank you sooo much 👍

  • @Gottii92
    @Gottii92 5 years ago

    Hello, did you look into my problem?

    • @MachineLearningwithPhil
      @MachineLearningwithPhil  5 years ago +1

      I haven't forgotten you :) I'm working on it now. I finished up another project with a DQN and made some improvements that may benefit you. I'll do a new video on that this weekend, and will initiate a pull request on your repo if I get it working.

    • @MachineLearningwithPhil
      @MachineLearningwithPhil  5 years ago +1

      OK, I've gotten it to run on my local machine with some improvements. I've forked the repo and sent in a pull request with some suggestions on how to push the project forward. Let me know what you think!

    • @Gottii92
      @Gottii92 5 years ago +1

      @@MachineLearningwithPhil Hello man, it's really exciting that you looked into my project. At the moment, as I write this, I'm executing/training it myself. Your answer was very long, so I'll probably have to read it several times while trying different things. Is there some way to chat with you directly? For example on Discord; that would make things easier if you're not busy.
      I'll try having a smaller action space; I've already tried it with 32x32, since the environment itself is somewhat generic.
      I might also try totally rebuilding the environment and other ways of approaching my bigger problem.
      If you've heard of tf.agents, they offer a policy gradient agent; I tried that on my environment but also didn't get a very good result :D
      I don't fully understand policy gradients yet either, to be honest.
      Sorry for my messy structure; I'm not very experienced with programming in a team, or with git/GitHub.
      I propose you make a Discord server; combined with Twitter, your website, and YouTube you could probably reach more people, and it takes like 2 minutes to set up.
      It's somewhat unpleasant to write in the YouTube comment section :P
      If you add me on Discord under the tag "Gotti#0140" I could probably communicate with you better.
      Thank you for looking into my project and for the pull request!

    • @MachineLearningwithPhil
      @MachineLearningwithPhil  5 years ago +1

      I can set up a discord server, no problem. I'll get to that later this weekend. We can collab on the project and maybe something cool will come of it. Thanks!

    • @Gottii92
      @Gottii92 5 years ago

      @@MachineLearningwithPhil nice 😅

  • @portiseremacunix
    @portiseremacunix 4 years ago

    Great course, though TF is now TF2...

  • @liangyumin9405
    @liangyumin9405 5 years ago +1

    Could you tell me your development environment? I use Win10 & Python 3.7 (Anaconda) but I cannot install all the gym environments.... [cry]

    • @MachineLearningwithPhil
      @MachineLearningwithPhil  5 years ago

      I'm running Ubuntu 18.04 and Python 3.6.7. Which environments are giving you issues?

    • @liangyumin9405
      @liangyumin9405 5 years ago

      @@MachineLearningwithPhil gym doesn't support Python 3.7 very well....

    • @MachineLearningwithPhil
      @MachineLearningwithPhil  5 years ago +1

      @@liangyumin9405 You can do conda create -n NewEnvironment python=3.6
      Then activate the environment and try installing gym to see if it works.
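      If the install succeeds, a quick Python sanity check (my suggestion, not from this thread) is to load one of the lightweight classic-control environments:

      ```python
      # Run inside the activated conda env after `pip install gym`
      import gym

      env = gym.make('CartPole-v0')  # classic-control env, no extra dependencies
      obs = env.reset()              # old gym API: reset() returns the observation
      print(env.action_space, env.observation_space)
      env.close()
      ```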

    • @liangyumin9405
      @liangyumin9405 5 years ago +2

      @@MachineLearningwithPhil Using a py3.6 virtual env may be a good idea~ Thank you!