Reinforcement Learning with Stable Baselines 3 - Introduction (P.1)

  • Published Feb 4, 2022
  • Welcome to a tutorial series covering how to do reinforcement learning with the Stable Baselines 3 (SB3) package. The objective of the SB3 library is to be to reinforcement learning what sklearn is to general machine learning (see the minimal usage sketch after the links below).
    Text-based tutorial and sample code: pythonprogramming.net/introdu...
    Neural Networks from Scratch book: nnfs.io
    Channel membership: / @sentdex
    Discord: / discord
    Reddit: / sentdex
    Support the content: pythonprogramming.net/support...
    Twitter: / sentdex
    Instagram: / sentdex
    Facebook: / pythonprogramming.net
    Twitch: / sentdex
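
A minimal sketch of the sklearn-like workflow the description refers to, assuming the classic Gym API (gym < 0.26) used at the time of the video; the environment id and timestep count are illustrative, not from the video:

    import gym
    from stable_baselines3 import PPO

    env = gym.make("CartPole-v1")
    model = PPO("MlpPolicy", env, verbose=1)   # pick an algorithm and policy, much like picking an sklearn estimator
    model.learn(total_timesteps=10_000)        # the "fit" step

    obs = env.reset()
    done = False
    while not done:
        action, _states = model.predict(obs)   # the "predict" step
        obs, reward, done, info = env.step(action)
        env.render()
    env.close()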

COMMENTS • 90

  • @AakashKumar-gt9ip
    @AakashKumar-gt9ip 2 years ago +115

    By the way, when you were comparing the models you were still using env.step(env.action_space.sample()),
    which is why they were almost the same and didn't look like they were learning.

    • @AakashKumar-gt9ip
      @AakashKumar-gt9ip 2 years ago +10

      For anyone wondering how to get the predicted action, the text-based tutorial has the correct code (see also the sketch after this thread), but it is:
      action, _states = model.predict(obs)

    • @SimonEliasen123
      @SimonEliasen123 2 years ago +4

      @@AakashKumar-gt9ip Hahaha, this is hilarious, but also so close to the reality of developing with reinforcement learning ;-)

    • @fus3n
      @fus3n 2 years ago +8

      dude yea I was like whattt why isn't he using predicted actions

    • @mikehoops
      @mikehoops 2 years ago

      Yeah, was puzzled as I was watching the video how come he didn’t correct it

    • @jindy94
      @jindy94 2 years ago

      I was wondering the same thing!! Thank you for clarifying :)
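
A minimal sketch of the fix this thread describes, assuming the classic Gym API (gym < 0.26) used in the video and a model already trained with model.learn(); the point is simply to query the trained policy instead of sampling random actions when evaluating:

    obs = env.reset()
    done = False
    while not done:
        action, _states = model.predict(obs)        # ask the trained model for an action
        obs, reward, done, info = env.step(action)  # instead of env.step(env.action_space.sample())
        env.render()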

  • @ashu-
    @ashu- 2 years ago +12

    I have been using Stable Baselines 2 for the last year or so for my work and it's super convenient: the docs are great, great examples for custom envs, etc. It's a great library.

  • @Mutual_Information
    @Mutual_Information 2 years ago +16

    This is very useful. I'm working on an RL video series myself (the theory side, so no overlap here) and I was just looking for prebuilt RL algos. Stable Baselines 3 is by far the most complete/well-tested suite I've come across. This really makes a big difference - thanks!
    Also, it's nice to see that super technical coverage like this can yield 1M+ followers. Awesome.

    • @arminneashrafi2846
      @arminneashrafi2846 10 months ago

      Hi, I love your work! Keep up the amazing videos.
      Love from Iran.

  • @amogh3275
    @amogh3275 2 years ago

    Honestly loving this series, I hope you make an in-depth tutorial series on this. Thanks

  • @hendrixkid2362
    @hendrixkid2362 2 years ago +2

    Your videos always inspire me to continue working on my own projects!!!

  • @enriquesnetwork
    @enriquesnetwork 2 years ago

    wow, great video! really can't wait for the rest to come out and learn more.
    Thanks for all the info you provide us!

  • @vernonvkayhypetuttz
    @vernonvkayhypetuttz 1 year ago

    SentDex you're a legend, brother. The thought of implementing these using deep learning libraries alone - instant grey hair! Thank you

  • @alishbakhan1084
    @alishbakhan1084 2 years ago

    I had so much fun learning with you.... can't wait to follow you again after completing my web project

  • @thetiesenvy4859
    @thetiesenvy4859 2 years ago

    Even without watching it yet: thanks for your good work and content, sentdex

  • @tytobieola2766
    @tytobieola2766 2 years ago

    Happy New Year sentdex, I was learning machine learning during the lockdown & had no idea about the field. You teach so well

  • @Shaunmcdonogh-shaunsurfing
    @Shaunmcdonogh-shaunsurfing 2 years ago

    Awesome. Can’t wait for the next one

  • @Djellowman
    @Djellowman 2 years ago

    Looking forward to the next one!

  • @ahmarhussain8720
    @ahmarhussain8720 1 year ago

    awesome video, learned a lot, keep up the good work

  • @pfrivik
    @pfrivik 2 years ago

    LETS GOOOOOO THIS IS EXACTLY WHAT I WANTED THANK YOU SO MUCH

  • @arthurflores4585
    @arthurflores4585 2 years ago

    Thank you, these video tutorials will be a big help for my thesis. I'm going to support you.
    I have many doubts; I hope this series can resolve them.

  • @KennTollens
    @KennTollens 2 years ago

    Thank you for this tutorial. I am just getting into AI. It is over my head immediately, but your overview of the parts such as observation and agent were helpful for the bigger picture.

  • @OhItsAnthony
    @OhItsAnthony 2 years ago +12

    If you're following along using a Conda environment and the Lunar Lander environment gives you an error (namely "module 'gym.envs.box2d' has no attribute 'LunarLander'"), then I found that you also need to install two other packages, swig and box2d-py:
    conda install -c conda-forge swig box2d-py

    • @djbroake9810
      @djbroake9810 2 years ago +1

      conda install swig then pip install gym[box2d] worked for me.

  • @VaibhavSingh-lf6ps
    @VaibhavSingh-lf6ps 2 years ago

    Thanks for introducing Stable Baselines 3,
    and yeah, sometimes we forget to use the model!

  • @DaZMan772
    @DaZMan772 2 years ago +2

    This is really interesting and new to me! You mentioned going over creating custom environments in future videos which sounds like exactly what I am eager to know next so I’m really looking forward to that video! Is there anything I should educate myself on in the meantime?

  • @0OTheIDaveO0
    @0OTheIDaveO0 2 years ago +25

    I think you were still getting random results because you still had the .sample method call in the rendered tests for A2C and PPO. They learned, but you did not use the trained model for testing.

    • @amaressa1924
      @amaressa1924 6 months ago

      I was just going to point out the same!!

  • @markd964
    @markd964 2 years ago

    Great series as always...needs the next step, developing asynchronous (multiprocessing) models, eg: PPO into Asynchronous-PPO (APPO) on custom environments...Thx

  • @ebrahimpichka
    @ebrahimpichka 2 years ago

    Looking forward to the next episodes. BTW, at the end you were still using random actions after training the model.

  • @yashwanth9549
    @yashwanth9549 2 years ago

    Please add more videos about reinforcement learning

  • @DasJonski
    @DasJonski 2 years ago

    Am I the only one trying to clean the screen from dust looking like a fool at the term explanations? Anyways, great video Harrison, really enjoy your videos!

  • @AIdreamer_AIdreamer
    @AIdreamer_AIdreamer 3 months ago

    Can you please talk about how we use RL to model and optimize satellite networks and HAPs (high-altitude platforms)?
    How do we control the direction and angle of a projector embedded in a HAP or UAV so that it directs its light beams towards a special area of interest on the Earth?

  • @pfrivik
    @pfrivik 2 years ago

    How often will these videos be released?? I'm so excited to start watching and keep watching the series!!

    • @sentdex
      @sentdex  2 years ago +2

      Close to daily if not daily for 4 parts. Haven't written a p5 yet so no idea there, but should be everything up to custom envs pretty quick.

  • @criscanto7040
    @criscanto7040 2 years ago

    Awesome

  • @furkank5614
    @furkank5614 2 years ago

    It seems the garage has finally turned into a studio =)

  • @ddos87
    @ddos87 2 years ago

    you're such a beauty man

  • @oguzhanoguz8890
    @oguzhanoguz8890 2 years ago

    Little heads-up for the next video if you can explore it: the saving and loading of an SB3 model depends on the "deterministic" flag. Sometimes, when using the eval procedure given in SB3, even if you saved the model in a deterministic manner you get unstable results. Can you explore that too? Thanks, great video.
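
A minimal sketch of the flag that comment refers to, assuming a trained SB3 model; deterministic=True makes predict() return the policy's most likely action instead of sampling from it, which is usually what you want for evaluation:

    action, _states = model.predict(obs, deterministic=True)  # greedy action instead of a sampled one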

  • @wwooo62023jk
    @wwooo62023jk 2 years ago

    So expect your class!

  • @coolkaran1234
    @coolkaran1234 2 years ago +1

    By when do you think you are going to have the whole series out? It might be very helpful for my research and masters.

    • @sentdex
      @sentdex  2 years ago +5

      Next 3 parts will come pretty quick, just need to review them and release pretty much, probably ~ close to daily

    • @martinsosmucnieks8515
      @martinsosmucnieks8515 2 years ago

      @@sentdex Whaaaat? That is so cool! I wanted to get into Stable Baselines earlier but had a hard time and didn't know what to try and do. Loving this series!!!🥳 Thank you very much for making them!

    • @coolkaran1234
      @coolkaran1234 2 years ago

      @@sentdex That's awesome man, you are awesome. My research group focuses on using deep RL to control drones and underwater vehicles, and we use Stable Baselines for that. Since I am new to the group, I need to catch up, so this will be incredibly helpful!! Thanks!!

  • @bluedade2100
    @bluedade2100 1 year ago +1

    What does the variable episodes represent here?

  • @noorwertheim2515
    @noorwertheim2515 2 years ago

    Could this algorithm also be used for multi-agent multi-objective environments?

  • @connorvaughan356
    @connorvaughan356 2 years ago

    Very excited for this series. I'm following along and when the lunar lander game displays, it plays incredibly quickly. Probably 4-5 times faster than in the video. Does anyone know how to adjust the speed at which the game plays?
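
A minimal sketch (an assumption, not from the video) of one way to slow the rendered episode down, by sleeping briefly each frame of the classic Gym render loop:

    import time
    import gym

    env = gym.make("LunarLander-v2")
    obs = env.reset()
    done = False
    while not done:
        obs, reward, done, info = env.step(env.action_space.sample())
        env.render()
        time.sleep(1 / 60)  # roughly 60 frames per second; increase the sleep to slow it further
    env.close()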

  • @vaibhavkumar642
    @vaibhavkumar642 2 years ago

    💥💥💥

  • @narayanbandodker5482
    @narayanbandodker5482 2 years ago

    At first I thought this was part 3 of a series and I had missed something. Then I read the description and found out that's the package's name. Big dumb moment

  • @ahmedyamany5065
    @ahmedyamany5065 2 years ago

    Thanks in advance. My issue with Stable Baselines 3 is the installation; I got many errors last month whether installing the package on Windows or Ubuntu.

  • @davidcristobal7152
    @davidcristobal7152 2 years ago

    Don't you have to define a neural model? I mean, what if you have an image as an input? Does Stable Baselines automagically assume the neural network to pass the values of the observations through?
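
A minimal sketch of how SB3 handles this, assuming an environment with image observations (the env id here is only an example and needs gym[box2d]); passing "CnnPolicy" instead of "MlpPolicy" makes SB3 build a convolutional feature extractor for you, and custom architectures can be supplied through the policy_kwargs argument:

    import gym
    from stable_baselines3 import PPO

    env = gym.make("CarRacing-v0")   # observations are 96x96x3 images
    model = PPO("CnnPolicy", env)    # SB3 constructs a CNN suited to image inputs
    model.learn(total_timesteps=10_000)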

  • @luisbarba9532
    @luisbarba9532 6 months ago

    can SB3 be extended to pettingzoo and used for MARL?

  • @Veptis
    @Veptis 2 years ago

    I have watched a bunch of videos about what reinforcement learning can do, but I gave up on the Steve Brunton series. Perhaps I'll watch this series instead and understand how the learning is done; everything I've done so far has been just gradient-based learning. And I don't know if reinforcement learning applies to language - maybe in a conversational setting.
    I have a game from my childhood, Mirror's Edge mobile edition, which you can no longer buy as EA removed it from the store instead of updating it. As it essentially just has 6 discrete inputs, I could see how it can be learned. But the levels are limited, so it might overfit easily. And rewards can't just be time, as that requires success in the first place.

  • @EnglishRain
    @EnglishRain 2 years ago

    What does one use this for IRL?

  • @michpo1445
    @michpo1445 8 months ago

    "Your environment must inherit from the gymnasium.Env class cf." - can you address this error?

  • @walterwang5996
    @walterwang5996 1 year ago

    I have a small question: why does A2C only use one "MlpPolicy" in stable_baselines3? It actually has two networks, am I right? Thanks.
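
A minimal sketch that may help answer this, under the assumption that inspecting the policy object is enough: "MlpPolicy" names a single actor-critic policy class, but internally SB3 builds both a policy (actor) head and a value (critic) head, which printing the policy makes visible:

    from stable_baselines3 import A2C

    model = A2C("MlpPolicy", "CartPole-v1")
    print(model.policy)   # shows the shared extractor plus separate policy/value heads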

  • @ReOp14
    @ReOp14 9 months ago

    I'm at the start of the tutorial, after adding env.render(): why is it not rendering anything when I run the code? I'm running python=3.9 on a Windows machine w/ conda

    • @ReOp14
      @ReOp14 9 months ago +1

      Alright I found a fix by restarting my pc and downgrading to gym==0.25.0

  • @poomchantarapornrat5685
    @poomchantarapornrat5685 1 year ago

    What operating system do you use to run these on?

  • @andreamaiellaro6581
    @andreamaiellaro6581 1 year ago

    I followed all the instructions, but when I try to run the notebook I get an error on the step function; it raises NotImplementedError.

  • @rgel3762
    @rgel3762 2 years ago

    Have you considered Unity + ML-Agents? Why not go that way?

  • @pythonocean7879
    @pythonocean7879 2 years ago

    ❤️ for ❤️

  • @shreeshaaithal-
    @shreeshaaithal- 2 years ago +1

    Then can you say how I can make gym play Valorant 😅 Can we do this with gym, or can it play Call of Duty: Cold War?

  • @user-yw5jc1fi2l
    @user-yw5jc1fi2l 2 years ago +1

    You still used the random sample for testing.

  • @sanjaydokula5140
    @sanjaydokula5140 2 years ago

    I see that yours is using the cuda device; how do I make mine use the cuda device instead of cpu?

  • @bluedade2100
    @bluedade2100 1 year ago

    Guys, is anyone having problems installing/running Stable Baselines on a MacBook? I can't run it on either MacBook or Linux.

  • @sharmakartikeya
    @sharmakartikeya 2 years ago

    Hi Harrison sir,
    I live in India and conversion from USD to INR is quite expensive. Is there any way to get a discount?

    • @sentdex
      @sentdex  2 years ago

      Send me an email to harrison@pythonprogramming.net

  • @PerfectNight123
    @PerfectNight123 2 years ago

    Does anybody know how to train the model using GPU? I tried changing the model parameter to device='cude', but it's still using cpu device when learning.

    • @adomet2123
      @adomet2123 11 months ago

      Did you find a way?
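
A minimal sketch (an assumption, not from the video) of pointing SB3 at the GPU; SB3's device argument defaults to "auto", which already picks CUDA when a CUDA-enabled PyTorch build is installed, so the usual culprits are a CPU-only PyTorch install or a misspelled device string:

    import torch
    from stable_baselines3 import PPO

    print(torch.cuda.is_available())                            # must print True for GPU training to be possible
    model = PPO("MlpPolicy", "LunarLander-v2", device="cuda")   # note the spelling: "cuda"
    print(model.device)                                         # should report a cuda device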

  • @randywelt8210
    @randywelt8210 2 years ago

    10:40 I don't get the reward calculation. Also, what is a step - just the next frame?

    • @EctoMorpheus
      @EctoMorpheus 2 years ago

      A step is indeed one frame. The reward is defined by the environment, and in the case of LunarLander it's some function of the fuel spent and the distance to the landing area. You typically get a reward every frame, and then maybe a large (negative) one once the episode ends.

    • @randywelt8210
      @randywelt8210 2 years ago

      @@EctoMorpheus So why not use an accelerometer + gyro reward? The fuel reward does not make much sense to me. Anyway, thanks for the clarification.
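
A minimal sketch of what the exchange above describes, assuming the classic Gym API from the video's era: each env.step() call advances one frame and returns a per-step reward defined by the environment (for LunarLander, a function of distance and velocity to the pad, fuel use, and a large bonus or penalty when the episode ends):

    import gym

    env = gym.make("LunarLander-v2")
    obs = env.reset()
    done, total_reward = False, 0.0
    while not done:
        obs, reward, done, info = env.step(env.action_space.sample())  # one step = one frame here
        total_reward += reward                                         # rewards accumulate over the episode
    print(total_reward)
    env.close()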

  • @monlewi1976
    @monlewi1976 2 years ago

    wow

  • @Stinosko
    @Stinosko 2 years ago

    Hello 👋👋👋

  • @migarsormrapophis2755
    @migarsormrapophis2755 2 years ago +1

    YouTube: 2 Comments
    Meanwhile I count five

    • @sentdex
      @sentdex  2 years ago +1

      math and programming hard.

  • @rverm1000
    @rverm1000 1 year ago

    Coding along, it doesn't work - at least not in Google Colab.

  • @piyushjaininventor
    @piyushjaininventor 2 years ago

    You are still taking random actions.

  • @whoisabishag3433
    @whoisabishag3433 2 years ago

    Timestamps
    [ 00:01:22 ] ... : just pip install

  • @karthikbharadhwaj9488
    @karthikbharadhwaj9488 2 years ago

    Hey sentdex, actually in the env.step() method you passed env.action_space.sample() instead of model.predict()!!!!! @sentdex

    • @DarkOceanShark
      @DarkOceanShark 1 year ago +1

      Yeah, I am glad at least you noticed that.