Deep RL Bootcamp Lecture 6: Nuts and Bolts of Deep RL Experimentation

  • Published 1 Jul 2024
  • Instructor: John Schulman (OpenAI)
    Lecture 6, Deep RL Bootcamp, Berkeley, August 2017
    Nuts and Bolts of Deep RL Experimentation

COMMENTS • 15

  • @SinaEbrahimi-ee3fq 25 days ago

    Awesome talk!
    Still very relevant!

  • @mansurZ01 4 years ago +51

    1:12 Outline
    1:36 Approaching New Problems
    2:00 When you have a new algorithm
    4:50 When you have a new task
    6:21 POMDP design
    9:31 Run baselines
    10:56 Run algorithms reproduced from paper with more samples than stated
    13:00 Ongoing development and tuning
    13:18 Don't be satisfied if it works
    14:50 Continually benchmark your code
    15:25 Always use multiple random seeds
    17:10 Always be ablating
    18:21 Automate experiments
    19:17 Question on frameworks for tracking experiment results
    19:47 General tuning strategies for RL
    19:58 Standardizing data
    22:17 Generally important hyperparameters
    25:10 General RL Diagnostics
    26:15 Policy Gradient strategies
    26:21 Entropy
    27:02 KL
    28:07 Explained variance
    29:41 Policy initialization
    30:21 Q-learning strategies
    31:27 Miscellaneous advice
    35:00 Questions
    35:21 How long to wait until deciding whether code works or not
    36:18 Unit tests
    37:35 What algorithm to choose
    39:28 Recommendations on older textbooks
    40:27 Comment on evolution strategies and the OpenAI blog post on it
    43:49 Favorite hyperparameter search framework

  • @TheAIEpiphany 3 years ago +9

    I love John's presenting style; he's super positive and enthusiastic. Great tips, thank you!

  • @agarwalaksbad 6 years ago +10

    This is a super useful lecture. Thanks, John!

  • @FalguniDasShuvo 1 year ago

    Wow! I love how simply John conveys great ideas. Very interesting lecture!

  • @ProfessionalTycoons 5 years ago

    This was a great talk.

  • @cheeloongsoon9090 6 years ago +2

    What a number to end the video, 44:44.

  • @BahriddinAbdiev 6 years ago +2

    We (3 students) are exploring DQN and several of its variants, i.e. Double DQN, Double Dueling DQN, Prioritized Experience Replay, etc. There is one thing we are all seeing: even when it converges, if you run it long enough, at some point it diverges again. Is this normal, or should it converge and then stay there or keep improving? Cheers!

    • @alexanderyau6347 5 years ago

      Hi, I think it's normal, but I don't know how it comes about. Maybe the model learned too much and became stupid, LOL.

    • @yoloswaggins2161 5 years ago +7

      No, this is not supposed to happen. I've seen it happen for a couple of reasons, but the most common is people scaling by a standard deviation that gets very close to 0 because the data is too similar.
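
      A minimal sketch of that failure mode and the usual guard, assuming returns or targets are standardized with NumPy; the function name and epsilon value here are illustrative, not from the lecture:

          import numpy as np

          def standardize(x, eps=1e-8):
              # If a batch of returns is nearly constant, x.std() approaches 0
              # and dividing by it blows up the scaled values, which can make
              # a previously converged DQN diverge. The small eps keeps the
              # denominator bounded away from zero.
              x = np.asarray(x, dtype=np.float64)
              return (x - x.mean()) / (x.std() + eps)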

  • @zhenghaopeng6633 4 years ago +1

    Hi there! Can I upload this lecture to Bilibili, a popular YouTube-like video site in China? Many students there would like access to this insightful talk! Thanks!

  • @georgeivanchyk9376 4 years ago

    If you cut out all the times he said 'ah', the video would be half as long.