Hello again, Stable-Baselines (SB3) maintainer here ;)
Nice video showing the variability of results in RL.
For what you are doing, you should take a look at SB3 callbacks, especially CheckpointCallback and EvalCallback to save the best model automatically.
And to have a fairer comparison, you can always check the tuned hyperparameters from the RL Zoo.
Hi Antonin, any chance you can share an example of exporting an SB3 model to C++? I found the PPO-in-C++ example, but I am currently mostly working with a DQN structure. Thanks!
Hello Antonin, when I'm running SB3's DQN/A2C algorithms I see very little GPU utilization; nvidia-smi shows around 15%. I have seen many others with the same problem, but all have gone unanswered. Is this a problem on our end, is it something inherent in SB3/PyTorch, or is the incremental online aspect of RL the problem? I just got access to a Tesla V100, so I'm kind of sad that I get literally zero speedup using it. I don't really have any experience profiling my application, but maybe that's something I have to learn. What do you reckon?
Can you do a tutorial on the CheckpointCallback thing?
For some reason, this is not only helpful but actually entertaining. Usually, I barely comment, but this is some good stuff :)
Only P.2 and I already love this series so much! Thanks for all your great content man!
This is cool, looking forward to this series.
Side note:
To create those folders, instead of an if clause, you can simply use makedirs with exist_ok=True.
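That is, something along these lines (directory names are just examples):

```python
import os

# exist_ok=True makes the call a no-op when the directory already exists,
# so no `if not os.path.exists(...)` guard is needed
os.makedirs("models/PPO", exist_ok=True)
os.makedirs("logs", exist_ok=True)
os.makedirs("models/PPO", exist_ok=True)  # calling again is harmless
```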
Quick question, how are you training on a CUDA device? I have a GPU installed, but I'm training on the CPU.
I would really like to know this!
I like this series a lot. I have tried to learn to use ML-Agents with Unity, but getting started is quite hard. Yes, there are official examples in the ML-Agents repo, but getting something new to work is hard for me. Hopefully there will be a new book for this scene; there are some, but those are quite old.
SB3 makes it so easy that a video about saving the model is longer than the initial intro and first steps.
Great video sentdex!! Will you also talk about multi-agent reinforcement learning?
What you said about the jumping and rolling robot dog got me thinking: do you have a way to force a minimum-exertion policy? This is a guiding principle in most living organisms: only use the minimal effort (energy) that will produce the desired result. While you likely don't care about energy conservation in that way, it could filter out behavior patterns like the one you mentioned.
Can't wait for the custom environment video. If I may suggest, please do a trading environment as an example. Thanks.
your wait is now already over ;) ua-cam.com/video/uKnjGn8fF70/v-deo.html
Thanks for the content. Can you show us how to customize actor and critic network architecture?
Hey, thank you so much for such a good explanation. I wanted to ask how you are using the GPU, because whenever I run the same code it says "Using cpu device". I have an NVIDIA GeForce GTX 1650 card.
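For context, SB3 selects the device through PyTorch, so "Using cpu device" on a machine with an NVIDIA card usually means a CPU-only PyTorch build was installed. A quick sketch for checking (the commented device="cuda" line assumes the usual SB3 model constructor):

```python
import torch

# SB3 picks "cuda" automatically when PyTorch can see the GPU
print(torch.cuda.is_available())  # False -> CPU-only build or driver problem
print(torch.version.cuda)         # None for a CPU-only build

# To be explicit, you can also pass the device when creating the model:
# model = PPO("MlpPolicy", env, device="cuda")
```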
Wow...one video per day ...that's super cool 😎 👌 👍
Stable Baselines 3 is very helpful!
Say you partially train a model for 50,000 steps. Once it's finished, is it possible to reload that same trained model and continue training it for an additional 20,000 steps? I have a partially trained DQN that isn't performing as well as it should, and I would like to continue the training, but I am not sure if that is possible or if I will just have to train an entirely new model.
Looking forward to the custom environment.
Well explained... please make some videos on machine translation from scratch: loading huge data, training, testing, etc.
Thank you so much for this
Awesome move forward ⚡
Hey, thanks for the video. I was wondering if I can ask some questions around loading a model here or would you prefer somewhere else?
Hello, I'm trying to use PPO and A2C for my discrete-box environment. I have set negative rewards in order to teach the agent to avoid impossible operations in my environment. Most of the training time is spent learning to avoid those operations with negative rewards. Is there a method to directly "tell" the agent (inside the agent itself) to avoid those operations, instead of spending training time? Thanks in advance.
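One common alternative is action masking: instead of punishing invalid actions with a negative reward, you give them zero probability so the agent can never select them (sb3-contrib's MaskablePPO implements this idea for PPO). A conceptual sketch in plain NumPy, not SB3 code:

```python
import numpy as np

def masked_sample(logits, valid_mask, rng=None):
    """Sample an action, giving invalid actions zero probability."""
    if rng is None:
        rng = np.random.default_rng()
    logits = np.where(valid_mask, logits, -np.inf)  # kill invalid actions
    probs = np.exp(logits - logits.max())           # numerically stable softmax
    probs /= probs.sum()
    return rng.choice(len(logits), p=probs)

logits = np.array([1.0, 2.0, 0.5, 3.0])
mask = np.array([True, False, True, False])  # actions 1 and 3 are impossible
action = masked_sample(logits, mask)
assert action in (0, 2)  # masked actions can never be sampled
```

The mask would come from your environment (e.g. a method returning which operations are currently legal), so the policy only ever distributes probability over valid actions.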
I used save code similar to this, but it gave me several curves in TensorBoard, and the reward curves are not even close; they have big gaps. I am wondering why. Also, since my training costs time, can I train and then save the model after some episodes? Thanks.
Can you load and continue training instead of starting from scratch again?
I have an issue after loading a trained agent: model.predict(obs) always outputs the same action, even though the agent was not doing that at all during learning.
Great video!
How do I save the most optimized version of the model? I understood that "rollout/ep_rew_mean" should be as high as possible and "train/value_loss" as low as possible, so how do I get or save the best model when this happens? Any idea?
When I do this, I don't get the rollout/ep_len_mean or rollout/ep_rew_mean graphs, but instead a bunch of time graphs...
Thank you!
Hello, is it possible to use a DQN model with TensorBoard?
Hey, great videos as always :) Shouldn't you use predict_step instead of predict for faster execution?
Possibly, first I'm hearing about it, but I'm certainly no SB3 expert. Try it and let us know the results!
🤚Can I have a question: is there going to be ML Agents RL tutorial as well❓It could be a nice sequel to SB3 series 😀
thanks
Bro, please help me. I would like to learn reinforcement learning. I'm good at machine learning (supervised and unsupervised) and deep learning, but I don't know where to start with reinforcement learning. Please guide me through my journey, bro. Where should I start?
Probably should have commented last video, but I keep on getting this error on Mac OS: 2022-05-23 16:22:46.979 python[10741:91952] Warning: Expected min height of view: () to be less than or equal to 30 but got a height of 32.000000. This error will be logged once per view in violation.
I tried to resolve it using some stuff on Stack Overflow, but the same thing happens. This error, I believe, prevents me from running more than one episode: it will run one episode in the environment and then crash :/
Hey Sentdex, actually in the env.step() method you passed env.action_space.sample() instead of model.predict()!
Can we get a tutorial on custom policies?
I'm having really good results with TRPO under sb3-contrib.
Managed to hit a reward mean of 209 with only 600k steps.
Another side note:
When dealing with files and directories (basically anything path-related), it is recommended to use pathlib instead of os.
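For example, a small sketch with illustrative paths:

```python
from pathlib import Path

models_dir = Path("models") / "PPO"            # composes paths with "/"
models_dir.mkdir(parents=True, exist_ok=True)  # replaces os.makedirs

model_path = models_dir / "100000.zip"
print(model_path.suffix)   # ".zip"
print(model_path.exists())
```

As far as I know, SB3's save/load also accept path-like objects, so no str() conversion is needed.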
why?
"nvidia-smi" nice flex :P
what, those lil guys? ;)
I like to check while recording sometimes bc if the GPU 0 hits 100%, it will cause a lot of jitter/lag in the recording. There's a real reason, I promise :D
I wonder how they play Atari games so well if they can't land on the Moon properly.
Hello again
Hii can I ask you something :
import os
import gymnasium as gym
import gym.envs.registration
import pybullet_envs
import rl_zoo3.gym_patches
from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize
from stable_baselines3 import PPO
apply_api_compatibility=True
# Note: pybullet is not compatible yet with Gymnasium
# you might need to use `import rl_zoo3.gym_patches`
# and use gym (not Gymnasium) to instantiate the env
# Alternatively, you can use the MuJoCo equivalent "HalfCheetah-v4"
vec_env = DummyVecEnv([lambda: gym.make("HalfCheetah-v4")])
# Automatically normalize the input features and reward
vec_env = VecNormalize(vec_env, norm_obs=True, norm_reward=True,
clip_obs=10.)
model = PPO("MlpPolicy", vec_env)
model.learn(total_timesteps=2000)
# Don't forget to save the VecNormalize statistics when saving the agent
log_dir = "/tmp/"
model.save(log_dir + "ppo_halfcheetah")
stats_path = os.path.join(log_dir, "vec_normalize.pkl")
env.save(stats_path)
# To demonstrate loading
del model, vec_env
# Load the saved statistics
vec_env = DummyVecEnv([lambda: gym.make("HalfCheetah-v4")])
vec_env = VecNormalize.load(stats_path, vec_env)
# do not update them at test time
vec_env.training = False
# reward normalization is not needed at test time
vec_env.norm_reward = False
# Load the agent
model = PPO.load(log_dir + "ppo_halfcheetah", env=vec_env)
And I got this :
Traceback (most recent call last):
File "d:\download baru\import gymnasium as gym 2.py", line 27, in <module>
env.save(stats_path)
^^^
NameError: name 'env' is not defined
What should I do? Thanks
You never initialized env?
@@cashmoney5202 It's from stable baselines3. So what should I do?
@@andhikaindra5427 You are using a variable named "vec_env", not "env" (which is normally used). To fix this error, it is enough to change "env.save(stats_path)" to "vec_env.save(stats_path)".