Reinforcement Learning with Stable Baselines 3 - Introduction (P.1)
- Published Feb 4, 2022
- Welcome to a tutorial series covering how to do reinforcement learning with the Stable Baselines 3 (SB3) package. The objective of the SB3 library is to be for reinforcement learning what sklearn is for general machine learning.
Text-based tutorial and sample code: pythonprogramming.net/introdu...
Neural Networks from Scratch book: nnfs.io
Channel membership: / @sentdex
Discord: / discord
Reddit: / sentdex
Support the content: pythonprogramming.net/support...
Twitter: / sentdex
Instagram: / sentdex
Facebook: / pythonprogramming.net
Twitch: / sentdex
By the way, when you were comparing the models you were still using env.step(env.action_space.sample())
Which is why they were almost the same and didn’t look like they were learning
For anyone wondering how to get the predicted action, the text-based tutorial has the correct code but it is:
action, _states = model.predict(obs)
@@AakashKumar-gt9ip Hahaha, this is hilarious, but also so close to the reality of developing with reinforcement learning ;-)
dude yea I was like whattt why isn't he using predicted actions
Yeah, was puzzled as I was watching the video how come he didn’t correct it
I was wondering the same thing!! Thank you for clarifying :)
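The fix described in this thread can be sketched as follows. This is a minimal sketch, not the code from the video: `run_episode` is a hypothetical helper, and it assumes a Gymnasium-style environment (`reset()` returns `(obs, info)`, `step()` returns five values). The older `gym` releases from around the time of the video return four values from `step()`, so the unpacking would need adjusting there. The only SB3 call assumed is `model.predict(obs)`, which returns an `(action, state)` pair.

```python
def run_episode(env, model=None, max_steps=1000):
    """Play one episode and return the total reward.

    With model=None the agent acts randomly (the mistake pointed out above);
    with a trained model the action comes from model.predict(obs).
    Assumes a Gymnasium-style env: reset() -> (obs, info),
    step() -> (obs, reward, terminated, truncated, info).
    """
    obs, _info = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        if model is None:
            # Random baseline: ignores anything the model has learned.
            action = env.action_space.sample()
        else:
            # Use the trained policy; the second return value is the
            # recurrent state, unused for feed-forward policies.
            action, _states = model.predict(obs)
        obs, reward, terminated, truncated, _info = env.step(action)
        total_reward += float(reward)
        if terminated or truncated:
            break
    return total_reward
```

With SB3 this might be called as `run_episode(env, model)` after `model.learn(...)`, and as `run_episode(env)` for the random baseline, so the two can actually be compared.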
I have been using Stable Baselines 2 for the last year or so for my work and it's super convenient: the docs are great, with great examples for custom envs etc. It's a great library.
This is very useful. I'm working on an RL video series myself (the theory side, so no overlap here) and I was just looking for prebuilt RL algos. Stable Baselines 3 is by far the most complete/well tested suite I've come across. This really makes a big difference - thanks!
Also, it's nice to see that super technical coverage like this can yield 1M+ followers. Awesome.
Hi, I love your work! Keep up the amazing videos.
Love from Iran.
Honestly loving this series, I hope you make an in-depth tutorial series on this. Thanks
Your videos always inspire me to continue working on my own projects!!!
wow, great video! really can't wait for the rest to come out and learn more.
Thanks for all the info you provide us!
Sentdex, you're a legend, brother. The thought of implementing these using deep learning libraries alone: instant grey hair! Thank you
I had so much fun learning with you.... can't wait to follow you again after completing my web project
Even without watching it: thanks for your good work and content, sentdex
Happy New Year sentdex, I was learning machine learning during the lockdown and had no idea about the field. You teach so well
Awesome. Can’t wait for the next one
Looking forward to the next one!
awesome video, learned a lot, keep up the good work
LETS GOOOOOO THIS IS EXACTLY WHAT I WANTED THANK YOU SO MUCH
Thank you, these video tutorials will be of big help to my thesis. I am going to support you.
I have many doubts; I hope this can resolve them.
Thank you for this tutorial. I am just getting into AI. It is over my head immediately, but your overview of the parts such as observation and agent were helpful for the bigger picture.
If you're following along using a Conda environment and the Lunar Lander environment gives you an error (namely "module 'gym.envs.box2d' has no attribute 'LunarLander'") then I found that you need to also install two other packages; swig and box2d-py:
conda install -c conda-forge swig box2d-py
conda install swig then pip install gym[box2d] worked for me.
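After running either install command above, a quick way to confirm that the Box2D pieces are actually importable is a check like the one below. This is a sketch under assumptions: `missing_modules` is a hypothetical helper, `Box2D` is the import name that `box2d-py` provides, and `gym` is the package used in the video (newer setups may use `gymnasium` instead). It only handles top-level module names.

```python
import importlib.util

def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [n for n in names if importlib.util.find_spec(n) is None]

if __name__ == "__main__":
    # "Box2D" is the import name provided by box2d-py; "gym" is the package
    # used in the video (an assumption for newer setups).
    missing = missing_modules(["gym", "Box2D"])
    print("missing:", missing if missing else "nothing")
```

If `Box2D` shows up as missing, the LunarLander attribute error quoted above is the likely result.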
Thanks for introducing Stable Baselines 3,
and yeah, sometimes we forget to use the model!
This is really interesting and new to me! You mentioned going over creating custom environments in future videos which sounds like exactly what I am eager to know next so I’m really looking forward to that video! Is there anything I should educate myself on in the meantime?
I think you were still getting random results because you still had the .sample method call in the rendered tests for A2C and PPO. They learned, but you did not use the trained model for testing.
I was just going to point out the same!!
Great series as always...needs the next step, developing asynchronous (multiprocessing) models, eg: PPO into Asynchronous-PPO (APPO) on custom environments...Thx
Looking forward to the next episodes. BTW, at the end you were still using random actions after training the model.
Please add more videos about reinforcement learning
Am I the only one trying to clean the screen from dust looking like a fool at the term explanations? Anyways, great video Harrison, really enjoy your videos!
Can you please talk about how we use the RL to model and optimize satellite networks and HAP( high altitude platforms)??
How we control the direction and angle of a projector embedded into HAP or UAV so that it directs its light beams towards an special area of interest on the Earth??
How often will these videos be released?? I'm so excited to start watching and keep watching the series!!
Close to daily, if not daily, for 4 parts. Haven't written a p5 yet, so no idea there, but everything up to custom envs should be out pretty quick.
Awesome
It seems garage has finally turned into a studio =)
youre such a beauty man
Little heads-up for the next video, if you can explore it: the saving and loading of an SB3 model depends on the "deterministic" flag. Sometimes, when using the eval procedure provided in SB3, you get unstable results even if you saved the model in a deterministic manner. Can you explore that too? Thanks, great video
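For context on the flag raised here: in SB3, `deterministic` is an argument to `model.predict` (and to the `evaluate_policy` helper) rather than something stored when a model is saved. The toy sketch below is not SB3 code; `choose_action` is a hypothetical stand-in that shows the general idea for a discrete policy. The deterministic choice is the most probable action, while the stochastic one is a sample, so repeated evaluations can differ unless the flag is set.

```python
import random

def choose_action(probs, deterministic, rng=random):
    """Pick an action index from a list of action probabilities."""
    if deterministic:
        # Always the most probable action: same input, same output.
        return max(range(len(probs)), key=probs.__getitem__)
    # Sample in proportion to the probabilities: output varies run to run.
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Under this reading, unstable evaluation results after loading would point at stochastic action selection (or a stochastic environment) rather than at the save file itself, though that is an assumption about the commenter's setup.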
So expect your class!
by when do you think you are going to have the whole series out? it might be very helpful for my research and masters
next 3 parts will come pretty quick, just need to review them and release pretty much, probably ~ close to daily
@@sentdex Whaaaat? That is so cool! I wanted to get into stable baselines earlier but had a hard time and didnt know what to try and do. Loving this series!!!🥳 Thank you very much for making them!
@@sentdex thats awesome man, you are awesome, my research group focusses on using DeepRL to control Drones and underwater vehicles and we use stable baselines for that, since i am new to the group, I need to catch up, this will be incredibly helpful!! Thanks!!
What does the variable episodes represent here?
Could this algorithm also be used for multi-agent multi-objective environments?
Very excited for this series. I'm following along and when the lunar lander game displays, it plays incredibly quickly. Probably 4-5 times faster than in the video. Does anyone know how to adjust the speed at which the game plays?
did you find any method to do so?
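Nobody in the thread gives an answer, but one generic way to slow playback, assuming the loop calls `env.render()` once per step, is to pause between frames. `FramePacer` below is a hypothetical helper, not part of gym or SB3, and the frame rate you pick is up to you.

```python
import time

class FramePacer:
    """Sleep just enough between frames to hold a target frame rate."""

    def __init__(self, fps):
        self.frame_time = 1.0 / fps
        self._last = time.perf_counter()

    def wait(self):
        # Sleep for whatever is left of this frame's time budget.
        remaining = self.frame_time - (time.perf_counter() - self._last)
        if remaining > 0:
            time.sleep(remaining)
        self._last = time.perf_counter()
```

In a render loop this would look like creating `pacer = FramePacer(fps=30)` once and calling `pacer.wait()` right after each `env.render()`.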
💥💥💥
At first I thought it is a part 3 of a series and missed something. Then I read the description to find out that's the package's name. Big dumb moment
Thanks in advance. My issue with Stable Baselines 3 is the installation: I got many errors last month, whether installing the package on Windows or Ubuntu.
Don't you have to define a neural model? I mean, what if you have an image as an input? Does Stable Baselines automagically assume the neural network to pass the values of the observations through?
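On the question above: as far as I know, the policy string is what selects the network. `"MlpPolicy"` builds a small fully connected network sized from the environment's observation and action spaces, `"CnnPolicy"` is meant for image observations, and `"MultiInputPolicy"` is for dict observations. The helper below is hypothetical and only illustrates that mapping; SB3 does not pick the policy for you, and the three-dimensional-shape test is a simplifying assumption.

```python
def pick_policy_name(observation_shape=None, is_dict=False):
    """Suggest an SB3 policy string for a given observation layout.

    Hypothetical helper: SB3 does not choose for you, the string is
    passed explicitly, e.g. PPO("CnnPolicy", env).
    """
    if is_dict:
        return "MultiInputPolicy"   # dict observations
    if observation_shape is not None and len(observation_shape) == 3:
        return "CnnPolicy"          # image-like (height, width, channels)
    return "MlpPolicy"              # flat feature vectors
```

For LunarLander, whose observation is a flat vector of 8 numbers, this gives `"MlpPolicy"`, which matches what the video uses.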
can SB3 be extended to pettingzoo and used for MARL?
I have watched a bunch of videos about what reinforcement learning can do, but I gave up on the Steve Brunton series. Perhaps I'll watch this series instead and understand how the learning is done; everything I have done so far has been just gradient-based learning. And I don't know if reinforcement learning applies to language. Maybe in a conversational setting.
I have a game from my childhood: Mirror's Edge mobile edition, which you can no longer buy, as EA removed it from the store instead of updating it. As it essentially just has 6 discrete inputs, I could see how it can be learned. But the levels are limited, so it might overfit easily. And the reward can't just be time, as that requires success in the first place.
What does one use this for IRL?
"Your environment must inherit from the gymnasium.Env class cf." can you address this error?
I have a small question: why A2C only uses one "MlpPolicy" in Stable_baselines3? Actually, it has two networks, am I right? Thanks.
Im at the start of the tutorial after adding the env.render().. why is it that its not rendering anything when I run the code? I'm running python=3.9 on a windows machine w/ conda
Alright I found a fix by restarting my pc and downgrading to gym==0.25.0
What operating system do you use to run these on?
I followed all the instructions, but when I try to run the notebook I get an error on the step function; it tells me: raise NotImplementedError.....
Have you considered unity+mlagents? Why not to go that way?
❤️ for ❤️
Then can u say how can I make gym to play valorant game 😅 can we do this with gym or can it play call of duty: cold war
You still used the random sample for testing.
I see that yours is using cuda device, how do i make mine use cuda device instead of cpu?
Guys, is anyone having problems installing/running Stable Baselines on a MacBook? I can't run it on either MacBook or Linux
Hi Harrison sir,
I live in India and conversion from USD to INR is quite expensive. Is there any way to get a discount?
Send me an email to harrison@pythonprogramming.net
Does anybody know how to train the model using GPU? I tried changing the model parameter to device='cude', but it's still using cpu device when learning.
Did you find the way ?
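A note on the GPU questions above: the device string PyTorch expects is `"cuda"`, not `"cude"`, and SB3's default `device="auto"` already uses the GPU when PyTorch can see one. The sketch below checks what PyTorch reports; `pick_device` is a hypothetical helper, and it falls back to `"cpu"` when PyTorch is not installed.

```python
def pick_device():
    """Return "cuda" if PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch not installed at all
    return "cuda" if torch.cuda.is_available() else "cpu"

if __name__ == "__main__":
    print("device:", pick_device())
```

The result can be passed as `device=pick_device()` when constructing the model. If this prints `cpu` on a machine with an NVIDIA GPU, the installed PyTorch is likely a CPU-only build, and no SB3 setting will change that.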
10:40 I don't get the reward calculation. Also, what is a step? Just the next frame?
A step is indeed one frame. The reward is defined by the environment, and in the case of LunarLander it's some function of the fuel spent and the distance to the landing area. You typically get a reward every frame, and then maybe a large (negative) one once the episode ends.
@@EctoMorpheus So why not use an accelerometer + gyro reward? The fuel reward does not make much sense to me. Anyway, thanks for the clarification.
wow
Hello 👋👋👋
YouTube: 2 Comments
Meanwhile I count five
math and programming hard.
Coding along, it doesn't work. At least not in Google Colab
You are still taking random actions.
Timestamps
[ 00:01:22 ] ... : just pip install
Hey Sentdex, Actually in env.step() method you have passed the env.action_space.sample() instead of model.predict() !!!!! @sentdex
yeah I am glad at least you noticed that.