Saving and Loading Models - Stable Baselines 3 Tutorial (P.2)

  • Published 21 Oct 2024

COMMENTS • 56

  • @AntoninRaffin2
    @AntoninRaffin2 2 years ago +56

    Hello again, Stable-Baselines (SB3) maintainer here ;)
    Nice video showing the variability of results in RL.
    For what you are doing, you should take a look at SB3 callbacks, especially CheckpointCallback and EvalCallback to save the best model automatically.
    And to have a more fair comparison, you can always check the tuned hyperparameters from the RL Zoo.

    • @ILoveMattBellamy
      @ILoveMattBellamy 2 years ago

      Hi Antonin, any chance you could share an example of exporting an SB3 model to C++? I found the PPO-in-C++ example, but I'm currently working mostly with a DQN structure. Thanks!

    • @robmarks6800
      @robmarks6800 2 years ago

      Hello Antonin, when I'm running SB3's DQN/A2C algorithms I see very little GPU utilization; using nvidia-smi I get around 15%. I have seen many others with the same problem, but all have gone unanswered. Is this some problem on our end, is it something inherent to SB3/PyTorch, or is the incremental online aspect of RL the problem? I just got access to a Tesla V100, so I'm kind of sad that I get literally zero speedup from using it. I don't really have any experience profiling my application, but maybe that's something I have to learn. What do you reckon?

    • @ApexArtistX
      @ApexArtistX 10 months ago

      Can you do a tutorial on the CheckpointCallback thing?
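
A minimal sketch of the CheckpointCallback mentioned in the maintainer's comment above, assuming the PPO/LunarLander-v2 setup from this series (save_freq, save_path, and name_prefix are illustrative):

    import gym
    from stable_baselines3 import PPO
    from stable_baselines3.common.callbacks import CheckpointCallback

    env = gym.make("LunarLander-v2")
    model = PPO("MlpPolicy", env, verbose=1)

    # Saves a checkpoint every save_freq steps instead of looping and calling model.save() by hand
    checkpoint_callback = CheckpointCallback(
        save_freq=10_000,
        save_path="models/PPO",
        name_prefix="ppo_lunarlander",
    )
    model.learn(total_timesteps=200_000, callback=checkpoint_callback)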

  • @bluebox6307
    @bluebox6307 1 year ago +4

    For some reason, this is not only helpful but actually entertaining. Usually, I barely comment, but this is some good stuff :)

  • @renkeludwig7405
    @renkeludwig7405 2 years ago +4

    Only P.2 and I already love this series so much! Thanks for all your great content man!

  • @fuba44
    @fuba44 2 years ago +7

    This is cool, looking forward to this series.

  • @HT79
    @HT79 2 years ago +25

    Side note:
    To create those folders, instead of an if clause, you can simply use makedirs with exist_ok = True
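
For reference, a sketch of that tip (models_dir is an illustrative name, not necessarily the one used in the video):

    import os

    models_dir = "models/PPO"
    os.makedirs(models_dir, exist_ok=True)  # creates the folder tree; no error if it already exists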

  • @PerfectNight123
    @PerfectNight123 2 years ago +6

    Quick question: how are you training on the CUDA device? I have a GPU installed, but I'm training on the CPU device.

    • @anotherKyle
      @anotherKyle 2 months ago

      I would really like to know this!
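
For context, SB3 selects the GPU automatically when PyTorch can see one (the log then reads "Using cuda device"); a minimal sketch of how to check and force the device, assuming a CUDA-enabled PyTorch install:

    import torch
    from stable_baselines3 import PPO

    print(torch.cuda.is_available())  # False usually means PyTorch was installed without CUDA support

    # The device can also be set explicitly; SB3 defaults to "auto"
    model = PPO("MlpPolicy", "LunarLander-v2", device="cuda")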

  • @harrivayrynen
    @harrivayrynen 2 years ago +3

    I like this series a lot. I have tried to learn to use ML-Agents with Unity, but getting started is quite hard. Yes, there are official examples in the ML-Agents repo, but getting something new to work is hard for me. Hopefully there will be a new book on this scene; there are some, but they are quite old.

  • @satyamedh
    @satyamedh 8 months ago

    SB3 makes it so easy that a video about saving the model is longer than the initial intro and first steps.

  • @giacomocarfi5015
    @giacomocarfi5015 2 years ago +2

    Great video, sentdex!! Will you also talk about multi-agent reinforcement learning?

  • @ytsks
    @ytsks 2 years ago

    What you said about the jumping and rolling robot dog got me thinking: do you have a way to force a minimum-exertion policy? This is a guiding principle in most living organisms: only use the minimal effort (energy) that will produce the desired result. While you likely don't care about energy conservation in that way, it could filter out behavior patterns like the one you mentioned.

  • @yawar58
    @yawar58 2 years ago

    Can't wait for the custom environment video. If I may suggest, please do a trading environment as an example. Thanks.

    • @sentdex
      @sentdex  2 years ago

      Your wait is already over ;) ua-cam.com/video/uKnjGn8fF70/v-deo.html

  • @anasalrifai2217
    @anasalrifai2217 2 years ago +1

    Thanks for the content. Can you show us how to customize the actor and critic network architecture?
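
Not covered in this video, but for reference, SB3 exposes the actor/critic layer sizes through policy_kwargs; a sketch with illustrative sizes (newer SB3 versions take net_arch as a dict, older ones wrap it in a list):

    from stable_baselines3 import PPO

    # Separate hidden layers for the policy (pi) and value function (vf) networks
    policy_kwargs = dict(net_arch=dict(pi=[128, 128], vf=[128, 128]))
    model = PPO("MlpPolicy", "LunarLander-v2", policy_kwargs=policy_kwargs, verbose=1)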

  • @MuazRazaq
    @MuazRazaq 2 years ago

    Hey, thank you so much for such a good explanation. I wanted to ask how you are using the GPU, because whenever I run the same code it says "Using CPU device". I have an NVIDIA GeForce GTX 1650 card.

  • @jyothishmohan5613
    @jyothishmohan5613 2 years ago +1

    Wow... one video per day... that's super cool 😎 👌 👍

  • @serta5727
    @serta5727 2 years ago

    Stable Baselines 3 is very helpful!

  • @carlanjackson1490
    @carlanjackson1490 1 year ago +1

    Say you partially train a model for, say, 50,000 steps. Once it's finished, is it possible to reload that same trained model and continue training it for an additional 20,000 steps? I have a partially trained DQN, but it's not performing as well as it should, and I would like to continue the training. I'm just not sure whether that is possible or whether I'll have to train an entirely new model.
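
Continuing training from a saved model is possible; a minimal sketch using DQN.load and reset_num_timesteps=False (the save paths are illustrative):

    import gym
    from stable_baselines3 import DQN

    env = gym.make("LunarLander-v2")

    # Load the partially trained model and keep training it
    model = DQN.load("models/DQN/50000", env=env)
    model.learn(total_timesteps=20_000, reset_num_timesteps=False)  # keep the timestep counter going
    model.save("models/DQN/70000")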

  • @georgebassemfouad
    @georgebassemfouad 2 years ago +1

    Looking forward to the custom environment.

  • @sudhansubaladas2322
    @sudhansubaladas2322 2 years ago

    Well explained... Please make a video on machine translation from scratch: loading huge datasets, training, testing, etc.

  • @rohitchan007
    @rohitchan007 2 years ago

    Thank you so much for this

  • @nnpy
    @nnpy 2 years ago

    Awesome move forward ⚡

  • @sarvagyagupta1744
    @sarvagyagupta1744 2 years ago

    Hey, thanks for the video. I was wondering if I could ask some questions about loading a model here, or would you prefer somewhere else?

  • @alessandrocoppelli3056
    @alessandrocoppelli3056 4 months ago

    Hello, I'm trying to use PPO and A2C for my discrete-box environment. I have set negative rewards in order to teach the agent to avoid impossible operations in my environment, and most of the training time is spent learning to avoid those operations. Is there a method to directly "tell" the agent (inside the agent itself) to avoid those operations, instead of spending training time on it? Thanks in advance.
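
One technique that addresses this directly (not shown in the video) is action masking via MaskablePPO from sb3-contrib; a sketch, with LunarLander-v2 standing in for the custom environment and a placeholder mask function:

    import gym
    import numpy as np
    from sb3_contrib import MaskablePPO
    from sb3_contrib.common.wrappers import ActionMasker

    def mask_fn(env):
        # Placeholder: mark every action as valid; a real env would flag impossible operations as False
        return np.ones(env.action_space.n, dtype=bool)

    env = ActionMasker(gym.make("LunarLander-v2"), mask_fn)
    model = MaskablePPO("MlpPolicy", env, verbose=1)
    model.learn(total_timesteps=10_000)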

  • @walterwang5996
    @walterwang5996 10 months ago

    I used save code similar to this, but it gave me several curves in TensorBoard, and the reward curves are not even close; they have big gaps. I am wondering why. Also, since training costs me a lot of time, can I train it and save the model after some episodes? Thanks.

  • @ApexArtistX
    @ApexArtistX 11 months ago

    Can you load and continue training instead of starting from scratch again?

  • @scudice
    @scudice 1 year ago

    I have an issue after loading a trained agent: model.predict(obs) always outputs the same action, even though the agent was not doing that at all during learning.

  • @arjunkrishna5790
    @arjunkrishna5790 2 years ago

    Great video!

  • @sarc007
    @sarc007 2 years ago

    How do you save the most optimized version of the model? I understood that "ep_rew_mean" (tag: rollout/ep_rew_mean) should be as high as possible and "value_loss" (tag: train/value_loss) as low as possible, so how do you get or save the best model when that happens? Any idea?
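
One way to do this is the EvalCallback mentioned in the maintainer's comment: it periodically evaluates the agent on a separate environment and keeps the weights with the best mean reward; a sketch with illustrative paths and frequency:

    import gym
    from stable_baselines3 import PPO
    from stable_baselines3.common.callbacks import EvalCallback

    env = gym.make("LunarLander-v2")
    eval_env = gym.make("LunarLander-v2")

    # best_model.zip in best_model_save_path is rewritten whenever the mean evaluation reward improves
    eval_callback = EvalCallback(
        eval_env,
        best_model_save_path="models/PPO/best",
        log_path="logs/eval",
        eval_freq=10_000,
    )
    model = PPO("MlpPolicy", env, verbose=1)
    model.learn(total_timesteps=200_000, callback=eval_callback)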

  • @MegaBd23
    @MegaBd23 7 months ago

    When I do this, I don't get the rollout/ep_len_mean or rollout/ep_rew_mean graphs, but instead a bunch of time graphs...

  • @enriquesnetwork
    @enriquesnetwork 2 years ago

    Thank you!

  • @KaikeCastroCarvalho
    @KaikeCastroCarvalho 7 months ago

    Hello, is it possible to use a DQN model with TensorBoard?

  • @konkistadorr
    @konkistadorr 2 years ago

    Hey, great videos as always :) Shouldn't you use predict_step instead of predict for faster execution?

    • @sentdex
      @sentdex  2 years ago

      Possibly; that's the first I'm hearing of it, but I'm certainly no SB3 expert. Try it and let us know the results!

  • @marcin.sobocinski
    @marcin.sobocinski 1 year ago

    🤚 Can I ask a question: is there going to be an ML-Agents RL tutorial as well❓ It could be a nice sequel to the SB3 series 😀

  • @mehranzand2873
    @mehranzand2873 2 years ago

    Thanks

  • @unknown3.3.34
    @unknown3.3.34 2 years ago +2

    Bro, please help me. I would like to learn reinforcement learning. I'm good at machine learning (supervised and unsupervised) and deep learning, but I don't know where to start with reinforcement learning. Please guide me through my journey, bro. Where should I start?

  • @Nerdimo
    @Nerdimo 2 years ago

    Probably should have commented on the last video, but I keep getting this error on macOS: 2022-05-23 16:22:46.979 python[10741:91952] Warning: Expected min height of view: () to be less than or equal to 30 but got a height of 32.000000. This error will be logged once per view in violation.
    I tried to resolve it with some stuff from Stack Overflow, but the same thing happens. This error, I believe, prevents me from running more than one episode: it will run one episode in the environment and then crash :/

  • @karthikbharadhwaj9488
    @karthikbharadhwaj9488 2 years ago

    Hey sentdex, actually in the env.step() call you have passed env.action_space.sample() instead of model.predict()!
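
For reference, an inference loop that uses the trained policy rather than random actions might look like this (old Gym step API, illustrative model path):

    import gym
    from stable_baselines3 import PPO

    env = gym.make("LunarLander-v2")
    model = PPO.load("models/PPO/100000", env=env)

    episodes = 5
    for ep in range(episodes):
        obs = env.reset()
        done = False
        while not done:
            action, _states = model.predict(obs)  # policy action instead of env.action_space.sample()
            obs, reward, done, info = env.step(action)
            env.render()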

  • @vaizerdgrey
    @vaizerdgrey 2 years ago

    Can we have a tutorial on custom policies?

  • @raven9057
    @raven9057 2 years ago

    I'm having really good results with TRPO under sb3-contrib.

    • @raven9057
      @raven9057 2 years ago

      Managed to hit a reward mean of 209 with only 600k steps.
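
For anyone curious, TRPO lives in the sb3-contrib package and can be swapped in much like the built-in algorithms; a sketch, assuming the LunarLander-v2 environment from this series:

    from sb3_contrib import TRPO

    model = TRPO("MlpPolicy", "LunarLander-v2", verbose=1)
    model.learn(total_timesteps=600_000)
    model.save("models/TRPO/600000")  # path is illustrative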

  • @ruidian8157
    @ruidian8157 2 years ago

    Another side note:
    When dealing with files and directories (basically anything path-related), it is recommended to use pathlib instead of os.
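
For reference, the pathlib equivalent of the makedirs call is:

    from pathlib import Path

    models_dir = Path("models/PPO")
    models_dir.mkdir(parents=True, exist_ok=True)  # creates parent dirs too; no error if it already exists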

  • @raven9057
    @raven9057 2 years ago +1

    "nvidia-smi" nice flex :P

    • @sentdex
      @sentdex  2 years ago +1

      What, those lil guys? ;)
      I like to check while recording sometimes because if GPU 0 hits 100%, it will cause a lot of jitter/lag in the recording. There's a real reason, I promise :D

  • @RafaParkoureiro
    @RafaParkoureiro 10 months ago

    I wonder how they play Atari games so well if they can't land on the moon properly.

  • @Stinosko
    @Stinosko 2 years ago

    Hello again

  • @niklasdamm6900
    @niklasdamm6900 1 year ago

    22.12.22 15:00

  • @andhikaindra5427
    @andhikaindra5427 1 year ago +1

    Hi, can I ask you something:
    import os
    import gymnasium as gym
    import gym.envs.registration
    import pybullet_envs
    import rl_zoo3.gym_patches
    from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize
    from stable_baselines3 import PPO
    apply_api_compatibility=True
    # Note: pybullet is not compatible yet with Gymnasium
    # you might need to use `import rl_zoo3.gym_patches`
    # and use gym (not Gymnasium) to instantiate the env
    # Alternatively, you can use the MuJoCo equivalent "HalfCheetah-v4"
    vec_env = DummyVecEnv([lambda: gym.make("HalfCheetah-v4")])
    # Automatically normalize the input features and reward
    vec_env = VecNormalize(vec_env, norm_obs=True, norm_reward=True,
    clip_obs=10.)
    model = PPO("MlpPolicy", vec_env)
    model.learn(total_timesteps=2000)
    # Don't forget to save the VecNormalize statistics when saving the agent
    log_dir = "/tmp/"
    model.save(log_dir + "ppo_halfcheetah")
    stats_path = os.path.join(log_dir, "vec_normalize.pkl")
    env.save(stats_path)
    # To demonstrate loading
    del model, vec_env
    # Load the saved statistics
    vec_env = DummyVecEnv([lambda: gym.make("HalfCheetah-v4")])
    vec_env = VecNormalize.load(stats_path, vec_env)
    # do not update them at test time
    vec_env.training = False
    # reward normalization is not needed at test time
    vec_env.norm_reward = False
    # Load the agent
    model = PPO.load(log_dir + "ppo_halfcheetah", env=vec_env)
    And I got this:
    Traceback (most recent call last):
    File "d:\download baru\import gymnasium as gym 2.py", line 27, in
    env.save(stats_path)
    ^^^
    NameError: name 'env' is not defined
    What should I do? Thanks

    • @cashmoney5202
      @cashmoney5202 1 year ago

      You never initialized env?

    • @andhikaindra5427
      @andhikaindra5427 1 year ago +1

      @cashmoney5202 It's from Stable Baselines3. So what should I do?

    • @Berserker_WS
      @Berserker_WS 1 year ago

      @andhikaindra5427 You are using a variable named "vec_env" and not "env" (which is the name normally used). To fix this error, it is enough to change "env.save(stats_path)" to "vec_env.save(stats_path)".