Stanford CS234: Reinforcement Learning | Winter 2019 | Lecture 2 - Given a Model of the World
- Published Jul 3, 2024
- For more information about Stanford’s Artificial Intelligence professional and graduate programs, visit: stanford.io/ai
Professor Emma Brunskill, Stanford University
stanford.io/3eJW8yT
Professor Emma Brunskill
Assistant Professor, Computer Science
Stanford AI for Human Impact Lab
Stanford Artificial Intelligence Lab
Statistical Machine Learning Group
To follow along with the course schedule and syllabus, visit: web.stanford.edu/class/cs234/i...
0:00 Introduction
2:55 Full Observability: Markov Decision Process (MDP)
3:55 Recall: Markov Property
4:50 Markov Process or Markov Chain
5:53 Example: Mars Rover Markov Chain Transition Matrix, P
12:06 Example: Mars Rover Markov Chain Episodes
13:05 Markov Reward Process (MRP)
14:37 Return & Value Function
16:32 Discount Factor
18:23 Example: Mars Rover MRP
23:19 Matrix Form of Bellman Equation for MRP
26:52 Iterative Algorithm for Computing Value of a MRP
33:29 MDP Policy Evaluation, Iterative Algorithm
34:44 Policy Evaluation: Example & Check Your Understanding
36:39 Practice: MDP 1 Iteration of Policy Evaluation, Mars Rover Example
50:48 MDP Policy Iteration (PI)
55:44 Delving Deeper into Policy Improvement Step
Thank you for sharing the content
Could the common or particularly good questions from Piazza be posted somewhere for reference?
25:47
Conjecture: inverse exists if gamma in [0,1), and fails to exist if gamma=1.
Easy to check for 1 or 2 state systems.
True: for gamma < 1 the matrix I − γP is strictly diagonally dominant, hence invertible.
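The conjecture can be checked numerically. A minimal sketch, assuming a made-up 2-state transition matrix and reward vector (not the lecture's Mars rover numbers): for gamma < 1 the Bellman equation V = R + γPV solves to V = (I − γP)⁻¹R, while at gamma = 1 the matrix I − P is singular, since each of its rows sums to zero.

```python
import numpy as np

# Toy 2-state Markov reward process (illustrative numbers only):
# row-stochastic transition matrix P and reward vector R.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
R = np.array([1.0, 0.0])

def analytic_value(P, R, gamma):
    """Solve V = R + gamma * P @ V, i.e. V = (I - gamma*P)^{-1} R."""
    n = P.shape[0]
    return np.linalg.solve(np.eye(n) - gamma * P, R)

print(analytic_value(P, R, 0.5))           # finite values for gamma < 1
print(np.linalg.det(np.eye(2) - 1.0 * P))  # ~0: I - P is singular at gamma = 1
```

At gamma = 1 the rows of I − P sum to zero, so the all-ones vector is in its null space and `np.linalg.solve` would raise `LinAlgError`.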
The discount factors forming a geometric progression has a natural interpretation in finance, and I believe that is where it comes from rather than pure mathematics (though it does have useful mathematical properties). It comes down to interest: if I earn 1 now and the interest rate is 10%, then after one year it is worth 1.1. Equivalently, earning 1 a year from now is worth about 0.909 today. Since interest rates tend to fall in the 10-25% ballpark, that gives rough gamma values of around 0.8 to 0.9. A gamma of 0.5 would mean the reward doubles in the following time step. Compounding this over time is exactly what makes it a geometric progression. It also implies that a reward of 1 this year can be leveraged (collect interest) over the following years, which seems like a reasonable way to think about learning from experience early on. This is just my understanding, though, and might be biased.
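As a quick arithmetic check of the interest-rate reading above (the 10% rate is just an illustrative assumption): a reward of 1 received one step from now is worth 1/(1+r) today, so gamma = 1/(1+r), and a reward t steps out is worth gamma**t today.

```python
# Present-value interpretation of the discount factor.
# The 10% interest rate is an illustrative assumption, not a fixed choice.
r = 0.10                 # interest rate per time step
gamma = 1 / (1 + r)      # ~0.909: value today of 1 unit received next step
present_values = [gamma ** t for t in range(4)]  # geometric progression
```

Each entry of `present_values` is the worth today of a unit reward t steps in the future; the geometric decay is exactly the GP the comment describes.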
We said that if the policy is deterministic we can simplify the value function to V^π_k(s) = r(s, π(s)) + γ Σ_{s' ∈ S} p(s' | s, π(s)) V^π_{k−1}(s'), but how can we write max_a Q(s, a) >= V(s) when the policy is deterministic and we can choose just one action?
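A minimal sketch of the deterministic-policy update above, on a made-up 2-state, 2-action MDP (not the lecture's Mars rover numbers). Note that max_a Q^π(s, a) >= V^π(s) holds even for a deterministic policy, because the max ranges over every action, including a = π(s), for which Q^π(s, π(s)) = V^π(s).

```python
import numpy as np

# Iterative policy evaluation for a deterministic policy pi:
#   V_k(s) = r(s, pi(s)) + gamma * sum_{s'} p(s'|s, pi(s)) * V_{k-1}(s')
# Tiny illustrative MDP: action 0 always moves to state 0, action 1 to state 1.
P = {0: np.array([[1.0, 0.0], [1.0, 0.0]]),   # P[a][s, s'] = p(s'|s, a)
     1: np.array([[0.0, 1.0], [0.0, 1.0]])}
R = {0: np.array([0.0, 0.0]),                 # R[a][s] = r(s, a)
     1: np.array([0.0, 1.0])}
gamma, pi = 0.9, [1, 1]                       # deterministic policy: always act 1

V = np.zeros(2)
for _ in range(1000):
    V_new = np.array([R[pi[s]][s] + gamma * P[pi[s]][s] @ V
                      for s in range(2)])
    converged = np.max(np.abs(V_new - V)) < 1e-12
    V = V_new
    if converged:
        break

# Q(s, a) for every action, not just pi(s); the max over a includes
# a = pi(s), so max_a Q(s, a) >= Q(s, pi(s)) = V(s).
Q = np.array([[R[a][s] + gamma * P[a][s] @ V for a in (0, 1)]
              for s in range(2)])
```

Here the policy is already greedy, so max_a Q(s, a) equals V(s); for a non-greedy policy the inequality is strict at some state, which is exactly what the policy improvement step exploits.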
How is the return different from the value function? And how can the return differ from the value function when the process is not stochastic (both being sums of rewards)?
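One way to see the distinction: the return is the (possibly random) discounted sum of rewards along a single episode, while the value function is the expected return. A minimal sketch with made-up numbers: an episode that terminates immediately with reward 1 or 0, each with probability 0.5. Every sampled return is 0 or 1, but the value is 0.5; only in a deterministic process do the two coincide.

```python
import random
random.seed(0)

# One-step stochastic episode (illustrative numbers): reward 1 with
# probability 0.5, reward 0 otherwise, then terminate.  Each return is
# 0.0 or 1.0, but the value function is the expected return, 0.5.
returns = [1.0 if random.random() < 0.5 else 0.0 for _ in range(100_000)]
value_estimate = sum(returns) / len(returns)  # Monte Carlo estimate -> ~0.5
```

Individual returns never equal 0.5 here, yet their average converges to the value; with a deterministic transition and reward, every return would equal the value exactly.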
What is the tool Prof. Emma is using for the presentation and annotation? It looks really helpful.
Beamer? I guess
@@gravitas8297 Does Beamer allow annotation? I thought it was a LaTeX class for making presentations. I wanted to know the annotation tool she is using on the iPad. That would be really helpful.
@@adityanarendra5886 Err I haven't tried that sorry :(
Does anybody understand how she got to the second step of the equation at 1:11:56?
We don't care about a or a'. Suppose BV_k >= BV_j, with a' the action achieving the maximum in BV_j; when a_j = a, we get BV_j evaluated at a_j = a.
47:13 Someone just asked what I wanted to! 😂