CS885 Module 2: Maximum Entropy Reinforcement Learning

  • Published 11 Dec 2024

COMMENTS • 15

  • @Miaumiau3333
    @Miaumiau3333 1 year ago

    This is SO GOOD, very clear and straight to the point

  • @viraatchandra8498
    @viraatchandra8498 3 years ago

    best video out there for the SQL (soft Q-learning) math, particularly some of the derivations that aren't properly explained in the paper

  • @datascience_with_yetty
    @datascience_with_yetty 4 years ago +1

    You’re a great teacher. Please do more videos. Thanks

  • @InquilineKea
    @InquilineKea 2 years ago

    I AM THE LIVING EMBODIMENT OF THIS

  • @mrbeancanman
    @mrbeancanman 3 years ago +1

    at 36:36 (SAC), why do we need to use a network to approximate softmax(Q_w / λ) for the policy? (Can we not just use it directly?)
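
    (A sketch of the usual SAC answer, using the SAC paper's notation rather than anything stated in this video — α for the temperature the comment writes as λ, π_φ for the policy network, Q_w for the critic. For continuous actions the softmax policy

        \pi(a|s) \propto \exp( Q_w(s,a) / \alpha ), \qquad Z(s) = \int \exp( Q_w(s,a) / \alpha ) \, da

    has an intractable normalizer Z(s), and we also need a policy we can cheaply sample from and differentiate through. SAC therefore fits a tractable network π_φ by minimizing the KL projection

        \pi_{\text{new}} = \arg\min_{\pi' \in \Pi} D_{KL}\!\left( \pi'(\cdot|s) \,\middle\|\, \exp( Q_w(s,\cdot)/\alpha ) / Z(s) \right),

    which in practice becomes the objective J_\pi(\phi) = E_{s \sim D,\, a \sim \pi_\phi} [ \alpha \log \pi_\phi(a|s) - Q_w(s,a) ]. With a small discrete action set one can indeed use the softmax of Q directly, which is essentially soft Q-learning.)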

  • @phanindraparashar8930
    @phanindraparashar8930 3 years ago

    Wow!!!! Amazing. It really helped me a lot. Can you do a video on Option Critic and Hierarchical Reinforcement Learning?

  • @miroslavkosanic2917
    @miroslavkosanic2917 4 years ago

    Hi, I wanted to ask about 17:31: if the sum in the denominator over all actions in any state is equal to 1 (which would follow from the definition of a stochastic policy), wouldn't that mean we are actually just dividing the numerator by 1, so that the softmax isn't needed in theory and is only there for practical/implementation reasons?

    • @miroslavkosanic2917
      @miroslavkosanic2917 4 years ago

      And also, shouldn't we have used that same condition, that the policy's probabilities over the different actions in a state sum to 1, in the objective function?

    • @astaragmohapatra9
      @astaragmohapatra9 3 years ago +1

      @@miroslavkosanic2917 The softmax values over all actions combined sum to 1, not the denominator. Not entirely sure of this

    • @yueying9083
      @yueying9083 2 years ago +1

      The sum of π over actions should be 1, so it is a constrained optimization and the derivation should use a Lagrangian (worked out below).
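
      Worked out, assuming the per-state maximum-entropy objective from the lecture with temperature α (notation chosen here, not copied from the slides):

          \max_\pi \; \sum_a \pi(a|s) \, [\, Q(s,a) - \alpha \log \pi(a|s) \,] \quad \text{s.t.} \quad \sum_a \pi(a|s) = 1

          \mathcal{L} = \sum_a \pi(a|s) [ Q(s,a) - \alpha \log \pi(a|s) ] + \lambda \Big( 1 - \sum_a \pi(a|s) \Big)

          \frac{\partial \mathcal{L}}{\partial \pi(a|s)} = Q(s,a) - \alpha \log \pi(a|s) - \alpha - \lambda = 0
          \;\Rightarrow\; \pi(a|s) = \frac{ \exp( Q(s,a)/\alpha ) }{ \sum_{a'} \exp( Q(s,a')/\alpha ) }

      So the denominator is a sum of exponentiated Q-values, not a sum of probabilities: it is generally not 1, and it is exactly the normalizer that forces the resulting π(·|s) to sum to 1.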

  • @thomashirtz
    @thomashirtz 3 years ago

    This course is amazing! Very nice job
    (I was struggling with soft policies; now I think I understand much more :) )
    I have some quick questions though:
    - at 16:15 you say that the objective is concave, but why is that?
    - at 25:30 we try to prove that Q grows for any pair (a,s). I can understand that the sum of Q(a,s) over actions will grow, but I am confused about the "any pair" part: if at one state the Q value was overestimated and we then adjust it as we learn, the Q value of that action will decrease and the Q values of the other actions will increase, no? I am just confused about how the Q values can monotonically increase, even for bad actions
    - at 31:00 I am very confused about how repeatedly applying the derivation can give Q^{pi+1}
    - at 31:39, where does the epsilon come from? (I didn't see it anywhere before)
    Thanks again for posting this class :)

    • @astaragmohapatra9
      @astaragmohapatra9 3 years ago +1

      1) Because you can see the derivative has the form dJ/dpi = y − (mx + 1), which decreases as pi grows, so the objective integrates to something like −x², hence concave (a fuller derivation is sketched below)
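
      Spelled out, assuming the same per-state objective with temperature α (the reply's y, m, and x correspond roughly to Q, α, and log π):

          J(\pi) = \sum_a \pi(a|s) [ Q(s,a) - \alpha \log \pi(a|s) ]

          \frac{\partial J}{\partial \pi(a|s)} = Q(s,a) - \alpha \log \pi(a|s) - \alpha, \qquad
          \frac{\partial^2 J}{\partial \pi(a|s)^2} = -\frac{\alpha}{\pi(a|s)} < 0

      The Hessian is diagonal with strictly negative entries for π(a|s) > 0, so J is concave in π, which is the claim at 16:15.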

  • @wunkewldewd
    @wunkewldewd 3 years ago +1

    Thanks for the video! But I think slide 3 is slightly wrong, or at least phrased confusingly: in some cases the optimal policy is not deterministic. For example, in rock-paper-scissors the optimal policy is stochastic.

    • @DhruvaKartik
      @DhruvaKartik 2 years ago

      For single-agent problems where we are trying to maximize the expectation of a reward, there will always exist a deterministic optimal policy. Rock-paper-scissors is a two-player zero-sum game, so it does not fall under the standard reinforcement learning setup.
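
      A quick way to see the first sentence, assuming Q* denotes the optimal action-value function of the MDP: for any stochastic policy π and any state s,

          \sum_a \pi(a|s) \, Q^*(s,a) \;\le\; \max_a Q^*(s,a),

      so the deterministic policy that picks \arg\max_a Q^*(s,a) is never worse. Randomization starts to pay off when an opponent adapts to your policy (as in rock-paper-scissors) or when an entropy bonus is added to the objective, which is the modification this lecture studies.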

  • @AKNiloy
    @AKNiloy 2 years ago

    Sir, is this an undergraduate course or a graduate course?