Reinforcement Learning 10: Classic Games Case Study

  • Published 15 Jun 2024
  • David Silver, Research Scientist, discusses classic games as part of the Advanced Deep Learning & Reinforcement Learning Lectures.
  • Science & Technology

COMMENTS • 33

  • @LuisYu · 5 years ago · +10

    Amazing, high-quality lectures. I especially enjoyed the attention, memory, and AlphaZero talks.

  • @stevecarson7031 · 2 years ago

    Thank you so much for this series of lectures!

  • @samagrasharma7755 · 5 years ago · +1

    Two lectures (CNN and RNN) are missing from this series. Can anyone tell me if they are available online?

  • @Kingstanding23 · 4 years ago · +3

    A Nash equilibrium sounds like what happens on roads where traffic evens itself out amongst all the roads towards some destination. When a new road is built, nothing really changes because the traffic just redistributes itself to a new equilibrium.
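The traffic analogy above can be sketched in a few lines. This is a toy model, not from the lecture: the two latency functions are made up, and drivers simply drift toward the faster route until travel times equalise, i.e. no single driver gains by switching.

```python
# Toy Nash-equilibrium traffic model (all numbers hypothetical).
# Route A's travel time grows as f (its flow); route B's grows as 2*(total - f).
# Best-response dynamics: traffic shifts toward whichever route is cheaper.

def equilibrium_split(total: float = 1.0, steps: int = 10_000) -> float:
    """Return the fraction of traffic on route A after the drift settles."""
    f = 0.5 * total  # start with an even split
    for _ in range(steps):
        cost_a = f                   # latency on route A
        cost_b = 2 * (total - f)     # latency on route B
        f += 0.001 * (cost_b - cost_a)  # drivers move to the cheaper route
    return f

# At equilibrium f = 2*(1 - f), i.e. f = 2/3: both routes take equal time,
# so no driver can improve their travel time by switching routes.
f = equilibrium_split()
```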

  • @TheGreatBlackBird · 3 years ago · +1

    Shouldn't there also be a reward present in the TD error at 42:30 and 50:25?
    edit: OK, it's explained a bit more in the 2015 lecture that this version assumes no intermediate reward
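The simplification the comment refers to can be sketched as follows (the numeric values are hypothetical): the general TD(0) error is r + γ·V(s′) − V(s), and in the self-play game setting with no intermediate reward and γ = 1, the reward term drops out on every non-terminal step.

```python
# Sketch of the TD error discussed above (hypothetical values).
# General TD(0): delta = r + gamma * V(s') - V(s).
# With no intermediate reward (r = 0, gamma = 1) it reduces to V(s') - V(s).

def td_error(v_s: float, v_s_next: float,
             reward: float = 0.0, gamma: float = 1.0) -> float:
    """TD(0) error; defaults match the no-intermediate-reward setting."""
    return reward + gamma * v_s_next - v_s

# Non-terminal step: only the value estimates drive the update.
print(round(td_error(v_s=0.4, v_s_next=0.6), 3))                # 0.2
# Terminal step: the game outcome (+1 for a win) enters as the reward,
# and the value of the terminal state is 0.
print(round(td_error(v_s=0.6, v_s_next=0.0, reward=1.0), 3))    # 0.4
```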

  • @helinw · 3 years ago · +1

    Did David do another RL course in 2018? Or just one lecture?

    • @ShortVine · 3 years ago · +1

      I was thinking the same and searched a lot, but I think he did just one lecture in 2018.

  • @alexanderyau6347 · 5 years ago · +4

    I can comment now. See you again David.

  • @johangras3522 · 5 years ago · +6

    Is it possible to access the course slides?

    • @TuhinChattopadhyay · 3 years ago

      @@Sigmav0 Link not working

    • @Sigmav0 · 3 years ago · +2

      @@TuhinChattopadhyay The slides have been moved to www.davidsilver.uk/wp-content/uploads/2020/03/games.pdf
      Hope this helps!

    • @TuhinChattopadhyay · 3 years ago

      @@Sigmav0 Got it... many thanks

    • @Sigmav0 · 3 years ago · +1

      @@TuhinChattopadhyay No problem! 👍

    • @dojutsu6861 · 3 years ago

      @@Sigmav0 These slides are from an older UCLxDeepMind lecture series led primarily by David Silver. They do not include content on the newer AlphaZero models. Do you by any chance know if the updated slides are available online?

  • @jakubbielan4784 · 5 years ago · +3

    Does anyone know the exact hardware used to train AlphaGo Zero?

  • @domino14 · 2 years ago · +2

    The level of computer play in Scrabble is not superhuman. Quackle beats Maven, and the best humans can go 50-50 with Quackle over a long series.

  • @Dina_tankar_mina_ord · 5 years ago

    I would love to see how DeepMind would build a city on its own in Cities: Skylines, and see how its optimization would create the best and most efficient layout in real time. Maybe we could learn a lot from that.

  • @mohammadkhan5430 · 3 years ago · +2

    I love him; how sad that the room is empty.

  • @KayzeeFPS · 4 years ago · +1

    Here's a link to the same video but with the slides visible: ua-cam.com/video/N1LKLc6ufGY/v-deo.html

  • @yidingyu2739 · 5 years ago · +1

    Why so many empty seats?

    • @yoloswaggins2161 · 5 years ago · +5

      This stuff isn't on the exam

    • @matveyshishov · 5 years ago · +2

      The number of people is lower with later lectures for some reason.

    • @markdonald4538 · 5 years ago

      @@matveyshishov stupid ppl

  • @julioandresgomez3201 · 5 years ago

    Despite the success of AlphaZero nets in several games, I feel a better starting point is playing some number of games with humans. Only then, when it has grasped some basic basics (by itself, not forcibly inserted by hand), let it play against itself. This way it could accomplish in thousands of self-play games what would take millions of self-play games from scratch, due to the total randomness and cluelessness of the first games. It's not the absolute-zero approach, but it has no handcrafted "artificial" parameters either. It learns from its own games all the way.

    • @Avandale0 · 4 years ago · +1

      Playing against humans takes considerably more time than running simulations, so playing millions of games by itself is still faster than playing 100 games against humans. Given that a game of Go takes around an hour, you'd have finished 3 games with a human in the time it took AlphaZero to reach human-level play.
      The same goes for chess, when you realise it took AlphaZero 4 hours to reach a level higher than Stockfish...
      It should be clear from these examples that one of the particularities of AlphaZero is the speed at which it learns. Playing humans here both defeats the purpose of self-learning and actually wastes time.

  • @omarcusmafait7202 · 5 years ago

    Why does nobody take notes?

    • @yoloswaggins2161 · 5 years ago · +1

      Not on the exam

    • @Sigmav0 · 5 years ago · +4

      @William Davis Sure... In primary school...

    • @vijayabhaskarj3095 · 5 years ago · +4

      Because the slides and the lectures are both available online; I would rather listen carefully in class first.