Stanford CS234: Reinforcement Learning | Winter 2019 | Lecture 3 - Model-Free Policy Evaluation
- Published Jul 3, 2024
- For more information about Stanford’s Artificial Intelligence professional and graduate programs, visit: stanford.io/ai
Professor Emma Brunskill, Stanford University
stanford.io/3eJW8yT
Professor Emma Brunskill
Assistant Professor, Computer Science
Stanford AI for Human Impact Lab
Stanford Artificial Intelligence Lab
Statistical Machine Learning Group
To follow along with the course schedule and syllabus, visit: web.stanford.edu/class/cs234/i...
0:00 Introduction
3:32 Dynamic Programming for Policy Evaluation
5:53 Dynamic Programming Policy Evaluation
15:27 First-Visit Monte Carlo (MC) On Policy Evaluation
23:44 Every-Visit Monte Carlo (MC) On Policy Evaluation
26:02 Incremental Monte Carlo (MC) On Policy Evaluation, Running Mean
27:35 Check Your Understanding: MC On Policy Evaluation
32:14 MC Policy Evaluation
34:30 Monte Carlo (MC) Policy Evaluation Key Limitations
37:35 Monte Carlo (MC) Policy Evaluation Summary
39:40 Temporal Difference Learning for Estimating V
48:08 Check Your Understanding: TD Learning
56:30 Check Your Understanding For Dynamic Programming MC and TD Methods, Which Properties Hold?
Thanks to everyone who made it possible to upload this video.
This moves really quickly. The first half of the book is covered in just three lectures, which corresponds to digesting 60 pages per week.
What book? Sutton and Barto?
@@DimanjanDahal yeah that book
A master's in computing from Stanford requires 45 units. A full-time student is expected to complete it in 1-2 years, a part-time student in 3-5.
Does anyone know how many units this course is worth? Wondering how many of these courses someone could complete at the same time given that half of the S&B textbook is covered in what looks like two weeks! X.X
When we talk about Monte Carlo, once we evaluate V^(pi)(s), do we have to evaluate every possible policy and then pick the best one in order to find the best policy? I'm a bit confused about how to do control here.
thanks :D
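For anyone with the same question: a minimal Python sketch of first-visit MC policy evaluation (not the lecture's exact pseudocode; `sample_episode` is a hypothetical function that rolls out one episode under the given policy and returns (state, reward) pairs). For control you don't enumerate every policy; the usual approach is to estimate Q^pi instead of V^pi and then improve the policy greedily, alternating evaluation and improvement.

```python
from collections import defaultdict

def first_visit_mc(sample_episode, policy, num_episodes=1000, gamma=0.9):
    """Estimate V^pi(s) by averaging returns from each state's first visit."""
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    for _ in range(num_episodes):
        episode = sample_episode(policy)  # list of (state, reward) pairs
        # record the first time step at which each state is visited
        first_visit = {}
        for t, (s, _) in enumerate(episode):
            first_visit.setdefault(s, t)
        # compute the return G_t from every time step, working backwards
        G = 0.0
        returns_at = [0.0] * len(episode)
        for t in reversed(range(len(episode))):
            _, r = episode[t]
            G = r + gamma * G
            returns_at[t] = G
        # accumulate the return from each state's first visit only
        for s, t in first_visit.items():
            returns_sum[s] += returns_at[t]
            returns_count[s] += 1
    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}
```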
Hi,
Can anyone please explain how the Monte Carlo method should be implemented in the real world, where we have no model of the environment?
The professor explains that we repeat an experiment over and over again and average over all the values.
But in some cases it's not possible to gain insight into the environment. Suppose we are sending a rover to Europa, a moon of Jupiter. We would have no time to carry out experiments in such cases...
Also, let's assume we can carry out the experiment.
Suppose the experiment is living in this world and history repeats itself.
However, the conditions are changing all the time. How can we calculate the values in such cases?
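On the changing-conditions point: one standard answer is the incremental MC update from the running-mean slide, but with a constant step size. A fixed alpha forms an exponentially weighted average of returns, so recent returns count more and the estimate can track a non-stationary environment. A minimal sketch (the function name is mine, not from the lecture):

```python
def incremental_update(v, G, alpha=0.1):
    """Move the current value estimate v toward the sampled return G.

    With a constant alpha, older returns are exponentially down-weighted,
    which is what you want when the environment drifts over time.
    """
    return v + alpha * (G - v)
```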
43:34 should that V^{pi}(s_{t}) be approximated over s instead of s'?
I also think it should be s; s' denotes s_{t+1}
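For reference, a hedged Python sketch of the TD(0) update being discussed (not the lecture's exact code). The point of the thread: the bootstrap target uses V(s_{t+1}), i.e. s', but the value being updated is V(s_t):

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    """One TD(0) step: update V[s] toward the bootstrapped target."""
    target = r + gamma * V[s_next]     # target bootstraps from s' = s_{t+1}
    V[s] = V[s] + alpha * (target - V[s])  # but the estimate updated is V(s_t)
    return V
```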
In the Mars rover example, why does it remain in s_2 when it takes action a_1 at state s_2??
Because the dynamics are stochastic, taking an action still leaves some probability that the robot remains in the current state :D
I take "action" to go to work... but my body decides to sleep :D
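A tiny sketch of what the stochastic dynamics mean in code (names are mine, not from the lecture): the next state is sampled from P(s' | s, a), and that distribution can put positive probability on staying in s_2:

```python
import random

def step(s, probs_next, rng=random):
    """Sample the next state from P(s' | s, a).

    probs_next maps each candidate next state to its probability,
    e.g. {"s2": 0.5, "s3": 0.5} -- the rover may stay put.
    """
    states = list(probs_next)
    weights = [probs_next[ns] for ns in states]
    return rng.choices(states, weights=weights)[0]
```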
Can anyone explain to me the difference between a trajectory and an episode?
A trajectory is the specific path taken to a termination state, while an episode is just the term we give to a single "run" in this case. The term episode is broader; it is just the name of some process. In the case of the Mars rover, the trajectory is the sequence of state, action, and next-state pairs, while the episode is the whole process from the start state to the termination state. Again, it's not specific, it's just what we call a process.
episode = trajectory?
No, it's a simulation: you can see it as a sequence of states, actions, and rewards until you reach a terminal state. That is an episode.
@@flecart then what's a trajectory?
@@jeffreyalidochair In motion planning, I think episode = trajectory.