DeepMind x UCL RL Lecture Series - Model-free Control [6/13]

Google DeepMind

Views: 20,421

Published 20 May 2024
Research Scientist Hado van Hasselt covers prediction algorithms for policy improvement, leading to algorithms that can learn good behaviour policies from sampled experience.
Slides: dpmd.ai/modelfreecontrol
Full video lecture series: dpmd.ai/DeepMindxUCL21
Science and technology

COMMENTS • 13

@i0dollas 2 years ago ⁺¹³
I really appreciate Hado's teaching style as demonstrated in the past 2 videos. There were some things I struggled to understand about the core concepts of various reinforcement learning aspects and math formulations, and some of these confusing concepts were introduced in the first lectures.
I realize that without working through a lot of examples myself or really thinking about all the implications, I'd never have a strong intuition for the material, and that would be an impediment to learning further material.
But Hado's consistent review, and his focus on explaining important nuances that are not self-evident from just looking at the formulations, really helped me build a robust, clear intuition for these concepts, the type of intuition necessary to play around with them in a novel way.
@Shubham_Chaudhary 2 years ago ⁺⁷
What an amazing lecture from the inventor of Double Q-learning!
@unknown3.3.34 2 years ago
Can I take this course, bro?
I'm planning to start this. I would like to learn RL, but I'm afraid that I might take up the wrong course.
Can you suggest some good courses on RL?
@lucabommarito5922 4 months ago ⁺¹
59:50
Classical Q learning: 99% of gamblers quit right before they win big
@marcin.sobocinski 1 year ago ⁺²
I wish there was a practical part of those lectures with some coding in Python 😀
@chentao4686 2 years ago ⁺²
Thanks for such a wonderful lecture! 6/13
@roro4787 1 year ago ⁺¹
I'm here now!
@jmybll 1 year ago
Does anyone know if there are different versions of the Double DQN algorithm? From my point of view there could be a couple of different ways to implement the idea of breaking the overestimation. For example, for updating q you could choose either of the following update rules:
r + \gamma q'(s', \argmax_{a'} q(s', a')), as in the video
r + \gamma q(s', \argmax_{a'} q'(s', a')), choosing the action according to q' but evaluating it with q
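The two targets written above can be compared side by side in a small tabular sketch. This is only an illustration, not code from the lecture: the table values, the function names, and the nested-list layout `q[state][action]` are all assumptions made for the example. A plain single-estimator Q-learning target is included for contrast.

```python
def argmax(values):
    """Return the index of the largest entry (first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def target_select_q_eval_qprime(q, q_prime, r, s_next, gamma):
    """First rule: pick the action with q, evaluate it with q'."""
    a_star = argmax(q[s_next])
    return r + gamma * q_prime[s_next][a_star]


def target_select_qprime_eval_q(q, q_prime, r, s_next, gamma):
    """Second rule: pick the action with q', evaluate it with q."""
    a_star = argmax(q_prime[s_next])
    return r + gamma * q[s_next][a_star]


def target_single_estimator(q, r, s_next, gamma):
    """Classical Q-learning target: q both selects and evaluates."""
    return r + gamma * max(q[s_next])


# Hypothetical two-state, two-action tables, chosen only for illustration.
q = [[0.0, 0.0], [1.0, 3.0]]
q_prime = [[0.0, 0.0], [2.0, 0.5]]
r, s_next, gamma = 1.0, 1, 0.9

print(target_select_q_eval_qprime(q, q_prime, r, s_next, gamma))
print(target_select_qprime_eval_q(q, q_prime, r, s_next, gamma))
print(target_single_estimator(q, r, s_next, gamma))
```

In both double-estimator targets the table that selects the action is not the table that scores it, which is the decoupling the comment refers to; the single-estimator target uses the same table for both steps.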
@graham8316 2 years ago
The cancellation at 1:25:00 seems strange to me. If our policies \mu and \pi are choosing different actions (because they're different policies), then wouldn't we get different A_t's and then not be able to cancel them?
Edit: if anyone else is confused by this, we're literally finding the probability of \tau under \mu and under \pi, so A_t is just A_t and we're finding the probability of it occurring, so we can do all the cancellation.
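The cancellation described in the edit can be written out as follows. This is a sketch under assumed notation that is not taken from the video: p denotes the environment dynamics, T the final step of the episode, and \tau_t the observed trajectory A_t, R_{t+1}, S_{t+1}, ..., S_T starting from S_t.

```latex
\frac{p(\tau_t \mid \pi)}{p(\tau_t \mid \mu)}
  = \frac{\prod_{k=t}^{T-1} \pi(A_k \mid S_k)\, p(R_{k+1}, S_{k+1} \mid S_k, A_k)}
         {\prod_{k=t}^{T-1} \mu(A_k \mid S_k)\, p(R_{k+1}, S_{k+1} \mid S_k, A_k)}
  = \prod_{k=t}^{T-1} \frac{\pi(A_k \mid S_k)}{\mu(A_k \mid S_k)}
```

The same observed actions A_k appear in the numerator and the denominator, so the dynamics terms are identical and cancel; only the ratio of policy probabilities remains.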
@mofra 2 years ago
Can someone please explain again the requirement on alpha at 39:38?
And why is 1/t OK, i.e. why does it satisfy the requirement?
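Assuming the requirement in question is the standard Robbins-Monro step-size condition (an assumption, since the slide itself is not reproduced here), it can be stated and checked for \alpha_t = 1/t as follows.

```latex
\sum_{t=1}^{\infty} \alpha_t = \infty
\qquad \text{and} \qquad
\sum_{t=1}^{\infty} \alpha_t^2 < \infty

% For \alpha_t = 1/t:
\sum_{t=1}^{\infty} \frac{1}{t} = \infty \quad \text{(harmonic series diverges)},
\qquad
\sum_{t=1}^{\infty} \frac{1}{t^2} = \frac{\pi^2}{6} < \infty
```

Read informally, the first condition says the steps stay large enough in total to move the estimate arbitrarily far from its starting value, and the second says they shrink fast enough for the noise to average out.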
@raunakkbanerjee9016 2 years ago
This one is serious stuff..
@mofra 2 years ago ⁺¹
WHAT DOES THE I NOTATION MEAN? I() like in 16:16
@viktorkm9 2 years ago ⁺²
I(TRUE) = 1
I(FALSE) = 0
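As a small sketch of how such an indicator can be used, the snippet below builds a greedy policy as a vector of indicator values, one per action. The function names and the example action values are hypothetical, and whether this matches the exact expression shown at 16:16 is an assumption.

```python
def indicator(condition):
    """I(condition): 1 if the condition is true, 0 otherwise."""
    return 1 if condition else 0


def greedy_policy(q_values):
    """Probability of each action under a greedy policy: I(a == argmax_a q)."""
    best = max(range(len(q_values)), key=lambda a: q_values[a])
    return [indicator(a == best) for a in range(len(q_values))]


# Hypothetical action values for a single state.
print(greedy_policy([0.1, 0.7, 0.2]))
```

The greedy policy puts probability 1 on the highest-valued action and 0 on the rest, which is exactly what the indicator encodes.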
