Not going to lie - I was fooled right up until the magnetic chess board! Can't put anything past Schmidhuber
Academics now have to use meme knowledge and tactics to get their papers noticed. What a time to be alive.
starting strong, upside-down characters in an academic paper. high tier memer
@Dmitry Akimov Lighten up a bit, these people just want recognition for their work and using catchy titles and more light-hearted introductions draws attention. It's not really their fault when it's what they're incentivized to do, something something reward-action.
@dmitry I don't think it's going to happen. There are so many research papers that if you want to get noticed, you need to stand out.
@Dmitry Akimov ok boomer
One of the funniest 3 minutes in the field! I was seriously laughing out loud 😂
skip to 4:08 if you don't want memes
Sorry if something is wrong, I'm not a specialist in RL.
It is a kind of dynamic programming: the agent remembers its previous experience (commands) and acts according to its observations and experience. Experience comes from episodes (positive and negative; they are like palps, i.e. feelers). The longer an episode (the more steps), the bigger the horizon. So, compute the mean reward over episodes and demand a little bit more (one standard deviation more). What does it mean to demand more? As I understood it, keep and develop further only the successful episodes, and cut off the negative ones (palps).
Let's call the agent f, the observations s, the reward r, the demand d, and the actions a. At each step of experience generation, a = f(s, d). Then later, once the reward is known, f is updated such that f(s, r) is pulled towards a.
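A minimal sketch of that hindsight update in Python, assuming a hypothetical PyTorch policy f(observation, command) -> action logits and a two-component command (return and horizon), roughly in the spirit of the experiments paper; the names and signature are illustrative, not the paper's code.

```python
import torch
import torch.nn.functional as F

def hindsight_update(f, optimizer, episode):
    """episode: list of (obs, action, reward) tuples from one rollout."""
    observations, actions, rewards = zip(*episode)
    loss = 0.0
    for t in range(len(episode)):
        # In hindsight, the "demand" at step t is what was actually achieved:
        # the return obtained from here on, over the remaining horizon.
        achieved_return = float(sum(rewards[t:]))
        remaining_horizon = float(len(episode) - t)
        command = torch.tensor([achieved_return, remaining_horizon])
        logits = f(torch.as_tensor(observations[t], dtype=torch.float32), command)
        # Pull f(s, r) towards the action a that was actually taken at this step.
        loss = loss + F.cross_entropy(logits.unsqueeze(0), torch.tensor([actions[t]]))
    optimizer.zero_grad()
    (loss / len(episode)).backward()
    optimizer.step()
```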
interesting new perspective on how to do RL ☺️
during the first few minutes I was like "hmm, I don't think that's gonna work" LOL
Thank you for the video!
One thing I don't understand, though, is why the first paper says you must use RNNs for non-deterministic environments, yet in the experiments paper they just stack a few frames for the VizDoom example without any RNNs.
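For what it's worth, frame stacking is a common substitute for recurrence. Here's a minimal, generic sketch (not the paper's code) of keeping the last k observations and concatenating them, so a feed-forward policy sees a little history without an RNN:

```python
from collections import deque
import numpy as np

class FrameStacker:
    """Keep the last k frames and concatenate them along the channel axis."""
    def __init__(self, k=4):
        self.frames = deque(maxlen=k)

    def reset(self, first_frame):
        # At episode start, fill the buffer with copies of the first frame.
        for _ in range(self.frames.maxlen):
            self.frames.append(first_frame)
        return np.concatenate(self.frames, axis=-1)

    def step(self, frame):
        self.frames.append(frame)
        return np.concatenate(self.frames, axis=-1)
```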
My cursor, hovering, hovering over the downvote icon - "This guy totally neither read nor understood the paper..." Finally, he says "Just kidding!" and actually reviews the paper.
Gotcha 😉
If you have 2 actions, A and B, and you explore/train so that an input of desired reward 0 produces action A, how does that help you do the right thing with an input of desired reward 1 (i.e. select action B)?
I guess ideally you would learn both, or at least recognize that you now want a different reward, so you should probably do a different action.
@YannicKilcher Is it possible to explain in more concrete terms? The idea is to sample actions better than randomly, but it seems hand-wavy to say that optimizing the probability distribution for one input will make the output distribution for another input good. Then again, I guess that's exactly what a neural net tries to do.
what a great video, thanks!
Can't you do the same by simply adding some logic to the function where the actions are chosen?
If you have a network that outputs expected values, you can just choose the actions whose expected value matches what you want.
The value function has a hard-coded horizon (until the end of the episode), whereas UDRL can deal with any horizon.
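To make that concrete, a hedged sketch (assumed names and shapes, same hypothetical policy interface as in the sketch above): because the command carries both a desired return and a desired horizon, the same trained network can be queried for different horizons at evaluation time, while a value function bakes the episode-end horizon in.

```python
import torch

def act(policy, obs, desired_return, desired_horizon):
    # The command says both how much return we want and in how many steps.
    command = torch.tensor([desired_return, desired_horizon], dtype=torch.float32)
    logits = policy(torch.as_tensor(obs, dtype=torch.float32), command)
    return int(torch.argmax(logits))

# e.g. ask the same trained network for a return of 10 within 20 steps,
# or 50 within 200 steps, without retraining.
```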
Negative 5 billion billion trillion is a pretty bad reward.
Pronounced "Lara"?
This is just a generalization of goal-conditioned imitation learning, no?
Or maybe that's just a special case of ⅂ꓤ ;)
Hi, can you do a video on Capsule networks also? Thank you :)
Btw, I love your videos.
he already did it ^^
ua-cam.com/video/nXGHJTtFYRU/v-deo.html