Stanford CS234: Reinforcement Learning | Winter 2019 | Lecture 3 - Model-Free Policy Evaluation
- Published Jul 3, 2024
- For more information about Stanford’s Artificial Intelligence professional and graduate programs, visit: stanford.io/ai
Professor Emma Brunskill, Stanford University
stanford.io/3eJW8yT
Professor Emma Brunskill
Assistant Professor, Computer Science
Stanford AI for Human Impact Lab
Stanford Artificial Intelligence Lab
Statistical Machine Learning Group
To follow along with the course schedule and syllabus, visit: web.stanford.edu/class/cs234/i...
0:00 Introduction
3:32 Dynamic Programming for Policy Evaluation
5:53 Dynamic Programming Policy Evaluation
15:27 First-Visit Monte Carlo (MC) On Policy Evaluation
23:44 Every-Visit Monte Carlo (MC) On Policy Evaluation
26:02 Incremental Monte Carlo (MC) On Policy Evaluation, Running Mean
27:35 Check Your Understanding: MC On Policy Evaluation
32:14 MC Policy Evaluation
34:30 Monte Carlo (MC) Policy Evaluation Key Limitations
37:35 Monte Carlo (MC) Policy Evaluation Summary
39:40 Temporal Difference Learning for Estimating V
48:08 Check Your Understanding: TD Learning
56:30 Check Your Understanding For Dynamic Programming MC and TD Methods, Which Properties Hold?
Thanks to everyone who made it possible to upload this video.
This moves really quickly. The first half of the book is covered in just three lectures, which corresponds to digesting 60 pages per week.
What book? Sutton and Barto?
@@DimanjanDahal yeah that book
A master's in computing from Stanford requires 45 units. A full-time student is expected to complete it in 1-2 years, a part-time student in 3-5.
Does anyone know how many units this course is worth? Wondering how many of these courses someone could complete at the same time given that half of the S&B textbook is covered in what looks like two weeks! X.X
When we talk about Monte Carlo, once we evaluate V^(pi)(s), do we have to evaluate every possible policy and then pick the best one in order to find the best policy? I'm a bit confused about how to do control here.
thanks :D
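For anyone with the same question: a minimal Python sketch of first-visit MC policy evaluation (not the lecture's exact pseudocode; `sample_episode` is a hypothetical function that rolls out one episode under the given policy and returns (state, reward) pairs). For control you don't enumerate every policy; the usual approach is to estimate Q^pi instead of V^pi and then improve the policy greedily, alternating evaluation and improvement.

```python
from collections import defaultdict

def first_visit_mc(sample_episode, policy, num_episodes=1000, gamma=0.9):
    """Estimate V^pi(s) by averaging returns from each state's first visit."""
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    for _ in range(num_episodes):
        episode = sample_episode(policy)  # list of (state, reward) pairs
        # record the first time step at which each state is visited
        first_visit = {}
        for t, (s, _) in enumerate(episode):
            first_visit.setdefault(s, t)
        # compute the return G_t from every time step, working backwards
        G = 0.0
        returns_at = [0.0] * len(episode)
        for t in reversed(range(len(episode))):
            _, r = episode[t]
            G = r + gamma * G
            returns_at[t] = G
        # accumulate the return from each state's first visit only
        for s, t in first_visit.items():
            returns_sum[s] += returns_at[t]
            returns_count[s] += 1
    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}
```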
Hi,
Can anyone please explain how the Monte Carlo method should be implemented in the real world, where we have no model of the environment?
The professor explains that we repeat an experiment over and over again and average over all the values.
But in some cases it's not possible to gain insight into the environment. Suppose we are sending a rover to Europa, a moon of Jupiter. We would have no time to carry out experiments in such cases...
Also, let's assume we can carry out the experiment.
Suppose the experiment is living in this world and history repeats itself.
However, the conditions are changing all the time. How can we calculate the values in such cases?
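On the changing-conditions point: one standard answer is the incremental MC update from the running-mean slide, but with a constant step size. A fixed alpha forms an exponentially weighted average of returns, so recent returns count more and the estimate can track a non-stationary environment. A minimal sketch (the function name is mine, not from the lecture):

```python
def incremental_update(v, G, alpha=0.1):
    """Move the current value estimate v toward the sampled return G.

    With a constant alpha, older returns are exponentially down-weighted,
    which is what you want when the environment drifts over time.
    """
    return v + alpha * (G - v)
```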
43:34 should that V^{pi}(s_{t}) be approximated over s instead of s'?
I also think it should be s; s' denotes s_{t+1}
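For reference, a hedged Python sketch of the TD(0) update being discussed (not the lecture's exact code). The point of the thread: the bootstrap target uses V(s_{t+1}), i.e. s', but the value being updated is V(s_t):

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    """One TD(0) step: update V[s] toward the bootstrapped target."""
    target = r + gamma * V[s_next]     # target bootstraps from s' = s_{t+1}
    V[s] = V[s] + alpha * (target - V[s])  # but the estimate updated is V(s_t)
    return V
```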
In the Mars rover example, why does it remain in s_2 when it takes action a_1 at state s_2??
Because the dynamics are stochastic, taking an action still leaves some probability that the robot remains in the current state :D
I take "action" to go to work... but my body decides to sleep :D
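A tiny sketch of what the stochastic dynamics mean in code (names are mine, not from the lecture): the next state is sampled from P(s' | s, a), and that distribution can put positive probability on staying in s_2:

```python
import random

def step(s, probs_next, rng=random):
    """Sample the next state from P(s' | s, a).

    probs_next maps each candidate next state to its probability,
    e.g. {"s2": 0.5, "s3": 0.5} -- the rover may stay put.
    """
    states = list(probs_next)
    weights = [probs_next[ns] for ns in states]
    return rng.choices(states, weights=weights)[0]
```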
Can anyone explain to me the difference between a trajectory and an episode?
A trajectory is the specific path taken to a termination state, while an episode is just the term we give to a single "run" in this case. The term episode is broader; it is just the name of some process. In the case of the Mars rover, the trajectory is the sequence of state, action, and next-state pairs, while the episode is the whole process from the start state to the termination state. Again, it's not specific, it's just what we call a process.
episode = trajectory?
No, it's a simulation: you can see it as a sequence of states, actions, and rewards until you reach a terminal state. That is an episode.
@@flecart then what's a trajectory?
@@jeffreyalidochair In motion planning, I think episode = trajectory.