Stanford CS234: Reinforcement Learning | Winter 2019 | Lecture 3 - Model-Free Policy Evaluation

  • Published Jul 3, 2024
  • For more information about Stanford’s Artificial Intelligence professional and graduate programs, visit: stanford.io/ai
    Professor Emma Brunskill, Stanford University
    stanford.io/3eJW8yT
    Professor Emma Brunskill
    Assistant Professor, Computer Science
    Stanford AI for Human Impact Lab
    Stanford Artificial Intelligence Lab
    Statistical Machine Learning Group
    To follow along with the course schedule and syllabus, visit: web.stanford.edu/class/cs234/i...
    0:00 Introduction
    3:32 Dynamic Programming for Policy Evaluation
    5:53 Dynamic Programming Policy Evaluation
    15:27 First-Visit Monte Carlo (MC) On Policy Evaluation
    23:44 Every-Visit Monte Carlo (MC) On Policy Evaluation
    26:02 Incremental Monte Carlo (MC) On Policy Evaluation, Running Mean
    27:35 Check Your Understanding: MC On Policy Evaluation
    32:14 MC Policy Evaluation
    34:30 Monte Carlo (MC) Policy Evaluation Key Limitations
    37:35 Monte Carlo (MC) Policy Evaluation Summary
    39:40 Temporal Difference Learning for Estimating V
    48:08 Check Your Understanding: TD Learning
    56:30 Check Your Understanding: For Dynamic Programming, MC, and TD Methods, Which Properties Hold?

COMMENTS • 17

  • @user-cx5ni7me6l • 2 years ago • +29

    Thanks to everyone who made it possible to upload this video.

  • @robensonlarokulu4963 • 1 year ago • +9

    This is real quick. The first half of the book is covered in just three lectures, which corresponds to digesting about 60 pages per week.

    • @DimanjanDahal • 1 year ago • +4

      What book? Sutton and Barto?

    • @flecart • 1 year ago • +2

      @@DimanjanDahal yeah that book

    • @prhc • 7 days ago

      A master's in computing from Stanford requires 45 units. A full-time student is expected to finish in 1-2 years, a part-time student in 3-5.
      Does anyone know how many units this course is worth? Wondering how many of these courses someone could complete at the same time, given that half of the S&B textbook is covered in what looks like two weeks! X.X

  • @NguyenAn-kf9ho • 14 days ago

    When we talk about Monte Carlo and evaluate V^(pi)(s), do we have to evaluate every possible policy and then pick the best one in order to find the best policy? I'm a bit confused about how to do control here.
    Thanks :D
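    (On the evaluation half of the question above, here is a minimal first-visit Monte Carlo sketch in the spirit of the 15:27 segment; the state names, rewards, and episode format are illustrative, not the lecture's code. Control, covered later in the course, alternates evaluation with greedy policy improvement rather than enumerating all policies.)

```python
from collections import defaultdict

def first_visit_mc(episodes, gamma=0.9):
    """First-visit Monte Carlo evaluation of V^pi.

    Each episode is a list of (state, reward) pairs; only the return from
    the first visit to a state within an episode enters its average."""
    returns = defaultdict(list)
    for ep in episodes:
        # Compute the return G_t at every time step, working backwards.
        G, Gs = 0.0, [0.0] * len(ep)
        for t in range(len(ep) - 1, -1, -1):
            G = ep[t][1] + gamma * G
            Gs[t] = G
        # Record the return only at each state's first visit.
        seen = set()
        for t, (s, _) in enumerate(ep):
            if s not in seen:
                seen.add(s)
                returns[s].append(Gs[t])
    return {s: sum(g) / len(g) for s, g in returns.items()}

V = first_visit_mc([[("A", 0.0), ("B", 1.0)]])  # one two-step episode
```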

  • @mohammadrezanargesi2439 • 1 year ago • +1

    Hi,
    Can anyone please explain how the Monte Carlo method should be implemented in the real world, where we have no model of the environment?
    The professor explains that we repeat an experiment over and over again and average over all the values.
    But in some cases it's not possible to gain insight into the environment. Suppose we are sending a rover to Europa, a moon of Jupiter: we would have no time to carry out experiments in such cases...
    Also, let's assume we can carry out the experiment.
    Suppose the experiment is living in this world and history repeats itself.
    However, the conditions are changing all the time. How can we calculate the values in such cases?

  • @zonghaoli4529 • 1 year ago

    43:34 should that V^{pi}(s_{t}) be approximated over s instead of s'?

    • @shaozefan8268 • 6 months ago

      I also think it should be s; s' denotes s_{t+1}.
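      (The commenters' point can be checked directly: the TD(0) backup updates the value of the current state s_t, and the next state s_{t+1} only appears inside the target. A minimal sketch; state names and the step size are illustrative, not the lecture's code.)

```python
def td0_update(V, s, r, s_next, alpha=0.5, gamma=0.9):
    """One TD(0) backup: V(s_t) is moved toward the target
    r + gamma * V(s_{t+1}); only the entry for s changes."""
    V[s] += alpha * (r + gamma * V[s_next] - V[s])
    return V

V = {"s1": 0.0, "s2": 0.0, "s3": 0.0}
td0_update(V, "s2", 1.0, "s3")  # observed transition (s_t=s2, r=1, s_{t+1}=s3)
```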

  • @MengLi-yw7ix • 2 months ago

    In the Mars rover example, why does it remain in s_2 when it takes action a_1 in state s_2?

    • @NguyenAn-kf9ho • 21 days ago • +1

      Due to stochasticity: taking an action still leaves some probability that the robot remains in its current state :D
      I take the "action" to go to work... but my body decides to sleep :D
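      (The reply above can be simulated: under stochastic dynamics, the same (state, action) pair yields different next states across samples. A toy sketch; the 0.5 stay probability and integer state encoding are assumptions for illustration, not the lecture's exact Mars-rover numbers.)

```python
import random

def step(state, action, p_stay=0.5, rng=random):
    """With probability p_stay the rover stays put despite the action;
    otherwise the action moves it one cell (states encoded as ints)."""
    if rng.random() < p_stay:
        return state  # the "go to work" action, but the body sleeps
    return state + 1 if action == "right" else state - 1

rng = random.Random(0)
outcomes = [step(2, "right", rng=rng) for _ in range(10_000)]
frac_stayed = outcomes.count(2) / len(outcomes)  # close to p_stay
```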

  • @namluong4647 • 5 months ago

    Can someone explain the difference between a trajectory and an episode?

    • @yaboidaggerlirette2391 • 4 months ago • +1

      A trajectory is the specific path taken to a terminal state, while an episode is just the term we give to a single "run" in this case. "Episode" is the broader term: it names one complete process. In the case of the Mars rover, the trajectory is the sequence of (state, action, next state) pairs, while the episode is the process from the start state to the terminal state. Again, it's not specific; it's just what we call a process.
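      (A rollout sketch matching the reply above: the recorded sequence of (state, action, reward) tuples is the trajectory, and one complete run from the start state to the terminal state is an episode. The state layout, rewards, and function names are illustrative assumptions, not the lecture's code.)

```python
def generate_episode(policy, start=0, terminal=3):
    """Follow the policy from the start state to the terminal state,
    recording the trajectory as (state, action, reward) tuples."""
    trajectory, s = [], start
    while s != terminal:
        a = policy(s)
        s_next = s + 1 if a == "right" else max(0, s - 1)
        r = 1.0 if s_next == terminal else 0.0
        trajectory.append((s, a, r))
        s = s_next
    return trajectory

episode = generate_episode(lambda s: "right")
# [(0, 'right', 0.0), (1, 'right', 0.0), (2, 'right', 1.0)]
```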

  • @jeffreyalidochair • 1 year ago • +1

    episode = trajectory?

    • @flecart • 1 year ago • +2

      No, it's a simulation: you can see it as a sequence of states, actions, and rewards until you reach a terminal state. That is an episode.

    • @jeffreyalidochair • 1 year ago • +1

      @@flecart then what's a trajectory?

    • @ernestbonnah1489 • 1 year ago • +3

      @@jeffreyalidochair In motion planning, I think episode = trajectory.