Stanford CS234: Reinforcement Learning | Winter 2019 | Lecture 4 - Model Free Control

  • Published 2 Oct 2024
  • For more information about Stanford’s Artificial Intelligence professional and graduate programs, visit: stanford.io/ai
    Professor Emma Brunskill, Stanford University
    onlinehub.stanf...
    Professor Emma Brunskill
    Assistant Professor, Computer Science
    Stanford AI for Human Impact Lab
    Stanford Artificial Intelligence Lab
    Statistical Machine Learning Group
    To follow along with the course schedule and syllabus, visit: web.stanford.ed...

COMMENTS • 9

  • @odycaptain • 2 years ago +8

    Thank you for the course

  • @RUBOROBOT • 1 year ago +6

    In the monotonic ε-greedy policy improvement theorem, why do we introduce the (1-ε)/(1-ε) factor instead of just using the (1-ε) that is already there? This step seems unnecessary and confusing, since the original (1-ε) cancels with the new (1-ε) denominator and is therefore never used.

    • @zonghaoli4529 • 1 year ago +2

      The proof around 33:03 is, to be honest, rather cumbersome; the intermediate transformations do not really take you anywhere. What matters is that V^{pi_{i+1}} takes a max of Q over actions, which is always greater than or equal to the average of Q over actions under the previous policy, which is what V^{pi_{i}} gives you. It is the greedy action that ensures the monotonic improvement.

    • @michaelbondarenko4650 • 11 months ago +1

      Interestingly, they didn't fix the proof in the 2023 class either

    • @ZinzinsIA • 10 months ago

      The (1 - eps) / (1 - eps) factor is there to show that we did not change the value of the sum: multiplying by it lets us rewrite 1 - eps another way, namely as sum_a pi(a|s) - eps. Indeed, sum_a pi(a|s) - eps = (1 - eps + eps/|A| + (|A| - 1)/|A| * eps) - eps = 1 - eps, where the expression in parentheses is the sum of the action probabilities by definition of the epsilon-soft policy. Be careful that the slide puts the epsilon inside the sum over a, which is not correct; we do not sum epsilon over all possible actions. With that simplification and the fact that max_a Q_pi(s, a) is at least any weighted average of Q_pi(s, a) over actions, you get the result. What is interesting about this "cumbersome" writing is that the simplified expression is exactly V_pi(s), so you have shown a policy improvement. You can also check the Sutton & Barto RL book, where the proof is phrased a little differently but uses the same idea (p. 101 of the 2018 edition); a worked version of the step is sketched right after this thread.
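
To make the step concrete, here is a worked version of the inequality in LaTeX, following the Sutton & Barto presentation cited above; pi is any epsilon-soft policy and pi' is the epsilon-greedy policy with respect to Q^pi (a sketch of the standard argument, not a transcription of the slide):

\begin{align*}
Q^{\pi}(s, \pi'(s))
  &= \sum_{a} \pi'(a \mid s)\, Q^{\pi}(s, a) \\
  &= \frac{\epsilon}{|\mathcal{A}|} \sum_{a} Q^{\pi}(s, a)
     + (1 - \epsilon) \max_{a} Q^{\pi}(s, a) \\
  &\ge \frac{\epsilon}{|\mathcal{A}|} \sum_{a} Q^{\pi}(s, a)
     + (1 - \epsilon) \sum_{a} \frac{\pi(a \mid s) - \epsilon/|\mathcal{A}|}{1 - \epsilon}\, Q^{\pi}(s, a) \\
  &= \sum_{a} \pi(a \mid s)\, Q^{\pi}(s, a) \;=\; V^{\pi}(s).
\end{align*}

The weights (pi(a|s) - eps/|A|) / (1 - eps) are nonnegative and sum to 1 for any epsilon-soft pi, so the max dominates their weighted average; that is where the inequality comes from. The (1 - eps) factor itself cancels, but introducing it exposes the expression as a convex combination of Q-values, which answers the question above.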

  • @zonghaoli4529 • 1 year ago

    26:23 I think the reason people got a bit confused and obtained different answers is that they forgot the essence of MC policy evaluation: it starts only once a full episode has completed. In this case, therefore, G_{i,t} for all visited state-action pairs (s3,a1), (s2,a2), and (s1,a1) is 1, as gamma is zero. Then just follow the pseudocode for MC and you get the right answer (a minimal sketch of the procedure is included after this thread). If you are doing TD, where policy evaluation happens immediately without waiting for the entire episode to finish, I think the first student's answer was correct.

    • @takihasan8310 • 1 month ago

      No, look at the example carefully: the gamma there is 1.
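
For reference, here is a minimal sketch of first-visit Monte Carlo Q-evaluation in Python along the lines described above. The function name and the episode layout (a list of (state, action, reward) tuples) are illustrative assumptions, not taken from the lecture:

from collections import defaultdict

def first_visit_mc_q_evaluation(episodes, gamma=1.0):
    """Estimate Q(s, a) from complete episodes with first-visit Monte Carlo.

    Each episode is a list of (state, action, reward) tuples. Updates are
    applied only after an episode has terminated, which is the point made
    in the comment above about why MC and TD can give different answers.
    """
    returns_sum = defaultdict(float)   # cumulative first-visit returns per (s, a)
    returns_count = defaultdict(int)   # number of first visits per (s, a)
    q = defaultdict(float)             # current Q estimates

    for episode in episodes:
        # Compute the return G_t at every time step by walking backwards.
        g = 0.0
        returns = [0.0] * len(episode)
        for t in reversed(range(len(episode))):
            _, _, reward = episode[t]
            g = reward + gamma * g
            returns[t] = g

        # Record the first time step at which each (state, action) appears.
        first_visit = {}
        for t, (state, action, _) in enumerate(episode):
            first_visit.setdefault((state, action), t)

        # Average the first-visit returns to update Q.
        for (state, action), t in first_visit.items():
            returns_sum[(state, action)] += returns[t]
            returns_count[(state, action)] += 1
            q[(state, action)] = returns_sum[(state, action)] / returns_count[(state, action)]

    return q

# Hypothetical episode in the spirit of the example discussed above:
# episode = [("s3", "a1", 0.0), ("s2", "a2", 0.0), ("s1", "a1", 1.0)]
# q = first_visit_mc_q_evaluation([episode], gamma=1.0)  # every G_t here equals 1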
