Deep RL Bootcamp Lecture 6: Nuts and Bolts of Deep RL Experimentation

  • Published 1 Jul 2024
  • Instructor: John Schulman (OpenAI)
    Lecture 6, Deep RL Bootcamp, Berkeley, August 2017
    Nuts and Bolts of Deep RL Experimentation

COMMENTS • 15

  • @SinaEbrahimi-ee3fq 25 days ago

    Awesome talk!
    Still very relevant!

  • @mansurZ01 4 years ago +51

    1:12 Outline
    1:36 Approaching New Problems
    2:00 When you have a new algorithm
    4:50 When you have a new task
    6:21 POMDP design
    9:31 Run baselines
    10:56 Run algorithms reproduced from paper with more samples than stated
    13:00 Ongoing development and tuning
    13:18 Don't be satisfied if it works
    14:50 Continually benchmark your code
    15:25 Always use multiple random seeds
    17:10 Always be ablating
    18:21 Automate experiments
    19:17 Question on frameworks for tracking experiment results
    19:47 General tuning strategies for RL
    19:58 Standardizing data
    22:17 Generally important hyperparameters
    25:10 General RL Diagnostics
    26:15 Policy Gradient strategies
    26:21 Entropy
    27:02 KL
    28:07 Explained variance
    29:41 Policy initialization
    30:21 Q-learning strategies
    31:27 Miscellaneous advice
    35:00 Questions
    35:21 How long to wait until deciding whether code works or not
    36:18 Unit tests
    37:35 What algorithm to choose
    39:28 Recommendations on older textbooks
    40:27 Comment on evolution strategies and the OpenAI blog post on it
    43:49 Favorite hyperparameter search framework

  • @TheAIEpiphany 3 years ago +9

    I love John's presenting style; he's super positive and enthusiastic. Great tips, thank you!

  • @agarwalaksbad 6 years ago +10

    This is a super useful lecture. Thanks, John!

  • @FalguniDasShuvo 1 year ago

    Wow! I love how simply John conveys great ideas. Very interesting lecture!

  • @ProfessionalTycoons 5 years ago

    This was a great talk.

  • @cheeloongsoon9090 6 years ago +2

    What a number to end the video, 44:44.

  • @BahriddinAbdiev 6 years ago +2

    We (3 students) are exploring DQN and several of its variants, i.e. Double DQN, Double Dueling DQN, Prioritized Experience Replay, etc. There is one thing we are all seeing: even when it converges, if you run it long enough, at some point it diverges again. Is this normal, or should it converge and then stay there or keep improving? Cheers!

    • @alexanderyau6347 5 years ago

      Hi, I think it's normal, but I don't know how it comes about. Maybe the model learned too much and became stupid, LOL.

    • @yoloswaggins2161 5 years ago +7

      No, this is not supposed to happen. I've seen it happen for a couple of reasons, but the most common is people scaling by a standard deviation that gets very close to 0 because the data is too similar.
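
      A minimal sketch of that failure mode and the usual guard, assuming returns or targets are standardized with NumPy; the function name and epsilon value here are illustrative, not from the lecture:

          import numpy as np

          def standardize(x, eps=1e-8):
              # If a batch of returns is nearly constant, x.std() approaches 0
              # and dividing by it blows up the scaled values, which can make
              # a previously converged DQN diverge. The small eps keeps the
              # denominator bounded away from zero.
              x = np.asarray(x, dtype=np.float64)
              return (x - x.mean()) / (x.std() + eps)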

  • @zhenghaopeng6633 4 years ago +1

    Hi there! Can I upload this lecture to Bilibili, a popular YouTube-like video site in China? Many students there would like access to this insightful talk! Thanks!

  • @georgeivanchyk9376 4 years ago

    If you cut out all the times he said 'ah', the video would be half as long.