Markov Decision Processes 2 - Reinforcement Learning | Stanford CS221: AI (Autumn 2019)

  • Published 2 Jun 2024
  • For more information about Stanford’s Artificial Intelligence professional and graduate programs, visit: stanford.io/2Zv1JpK
    Topics: Reinforcement learning, Monte Carlo, SARSA, Q-learning, Exploration/exploitation, function approximation
    Percy Liang, Associate Professor & Dorsa Sadigh, Assistant Professor - Stanford University
    onlinehub.stanford.edu/
    Associate Professor Percy Liang
    Associate Professor of Computer Science and Statistics (courtesy)
    profiles.stanford.edu/percy-l...
    Assistant Professor Dorsa Sadigh
    Assistant Professor in the Computer Science Department & Electrical Engineering Department
    profiles.stanford.edu/dorsa-s...
    To follow along with the course schedule and syllabus, visit:
    stanford-cs221.github.io/autu...

COMMENTS • 8

  • @albert2266
    @albert2266 1 month ago

    Just to clarify a concept: I think the claim at 7:29 isn't quite right, because the value function shouldn't simply be equal to the Q-value. The value function is the expected utility over all possible actions at a given state, so it should be an expectation over Q_pi rather than just equal to Q_pi, since Q_pi is the expected utility for a given action at a given state. Please correct me if I'm wrong.
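
    A hedged note on the relationship this comment questions, using the standard definitions of V_pi and Q_pi rather than the lecture's slides: for a fixed deterministic policy, the state value does equal the Q-value of the action the policy actually takes, while for a stochastic policy it is an expectation over the policy's action distribution.

    ```latex
    % Standard definitions (not quoted from the lecture):
    % deterministic policy \pi -- the value equals the Q-value of the chosen action
    V_\pi(s) = Q_\pi\bigl(s, \pi(s)\bigr)
    % stochastic policy -- expectation over the policy's action distribution
    V_\pi(s) = \sum_{a} \pi(a \mid s)\, Q_\pi(s, a)
    ```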

  • @aojing
    @aojing 2 months ago

    A legacy question from the last lecture (MDP-1) is still hovering around 2: what is the transition function for this class? Is it a function of the action?
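
    For reference, in standard MDP notation (not quoted from the lecture) the transition function does take the action as an argument; a minimal statement of the definition:

    ```latex
    % Transition function of an MDP: probability of landing in s' after
    % taking action a in state s -- so yes, it is a function of the action.
    T(s, a, s') = \mathbb{P}\bigl(s_{t+1} = s' \mid s_t = s,\; a_t = a\bigr),
    \qquad \sum_{s'} T(s, a, s') = 1 \quad \text{for every } (s, a).
    ```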

  • @henkjekel4081
    @henkjekel4081 1 year ago

    Yeah, you really need to be having an episode to play this game.

  • @black-sci
    @black-sci 3 months ago

    Somehow the lecture left me confused in the end. Maybe I should rewatch.

  • @JumbyG
    @JumbyG 1 year ago +2

    I think there may be a typo at 28:27: it states that Q_pi is (4+8+16)/3, but I believe it should be (4+8+12)/3? Please correct me if I am wrong.

    • @seaotterlabs1685
      @seaotterlabs1685 1 year ago +2

      I think it should be (4+8+16)/3, as I believe their last run has four 4 values.

    • @endoumamoru3835
      @endoumamoru3835 5 months ago

      He is calculating the sum of all rewards per episode: the first sum was 4 since only one reward was present, the next was 8 with two rewards, and then the next was 16 with four rewards.
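
      A minimal sketch of the Monte Carlo estimate being debated here, assuming the three per-episode returns mentioned in the thread (4, 8, and 16); the values are taken from the replies above, not independently verified against the slide at 28:27.

      ```python
      # Monte Carlo estimate of Q_pi(s, a): average the total return observed
      # over episodes that start by taking action a in state s.
      # The returns below (4, 8, 16) are the numbers discussed in this thread.

      def monte_carlo_q_estimate(returns):
          """Average of observed episode returns from a given (state, action)."""
          return sum(returns) / len(returns)

      episode_returns = [4, 8, 16]  # per-episode sums of rewards along each run
      q_hat = monte_carlo_q_estimate(episode_returns)
      print(q_hat)  # (4 + 8 + 16) / 3 = 9.33...
      ```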

  • @Moriadin
    @Moriadin 15 days ago

    Not as good as the previous lecture; harder to follow.