Can a Reinforcement Learning Agent Learn with NO Rewards? Intrinsic Curiosity Coding Tutorial

  • Published 21 Aug 2024

COMMENTS • 24

  • @elijahberegovsky8957
    @elijahberegovsky8957 2 years ago

    First I’ve gotta say thank you for making this video. I’ve just read the paper, enjoyed it immensely, and wanted to find an implementation. And bang! here you are, with an in-depth guide on making it work.
    Also, please, do Never Give Up as well!

  • @softerseltzer
    @softerseltzer 2 years ago

    Nice! Some weekend activity, thanks!

  • @amegatron07
    @amegatron07 2 years ago

    Thank you very much for giving an example of how to implement ICM. I'm looking forward to trying it myself, and to making my own further experiments with it. One tip, perhaps: as a strong adherent of separation of concerns, I believe it would be better to focus less on the parts of the code that are less relevant to the core topic, and just take already-written components. I believe that would save a lot of time :)

  • @TaganMorgul
    @TaganMorgul 2 years ago

    Thank you very much for such a detailed ICM explanation! I tried to implement it some time ago, but with gym envs like CartPole or LunarLander I found it didn't perform as expected, probably due to the absence of the "state encoding" part, which I now think is a very important part of the work. I also didn't use A3C for my experiments, but rather A2C. In the end, I found that the "Random Network Distillation" algorithm works much better for the same purpose and is also free of the "TV on the wall" defect that ICM has.
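    The Random Network Distillation idea this comment mentions can be sketched in a few lines: a fixed, randomly initialised target network is never trained, while a predictor network is trained to imitate it on visited states; the predictor's error is the exploration bonus, which shrinks for familiar states. The network shapes and names below are illustrative, not the paper's architecture — a minimal NumPy sketch:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def make_net(in_dim, hidden, out_dim, rng):
        """Random two-layer network: params as (W1, b1, W2, b2)."""
        return (rng.normal(0, 0.5, (in_dim, hidden)), np.zeros(hidden),
                rng.normal(0, 0.5, (hidden, out_dim)), np.zeros(out_dim))

    def forward(params, x):
        W1, b1, W2, b2 = params
        h = np.tanh(x @ W1 + b1)
        return h @ W2 + b2

    # Fixed, randomly initialised target network: never trained.
    target = make_net(4, 32, 8, rng)
    # Predictor network: trained to imitate the target on visited states.
    predictor = make_net(4, 32, 8, np.random.default_rng(1))

    def intrinsic_reward(state):
        """RND bonus: prediction error of the predictor vs. the frozen target."""
        err = forward(predictor, state) - forward(target, state)
        return float(np.mean(err ** 2))

    def train_predictor(params, state, lr=1e-2):
        """One manual gradient-descent step on the MSE toward the target."""
        W1, b1, W2, b2 = params
        h = np.tanh(state @ W1 + b1)
        out = h @ W2 + b2
        grad_out = 2 * (out - forward(target, state)) / out.size
        gW2 = np.outer(h, grad_out)
        grad_h = (grad_out @ W2.T) * (1 - h ** 2)
        gW1 = np.outer(state, grad_h)
        return (W1 - lr * gW1, b1 - lr * grad_h, W2 - lr * gW2, b2 - lr * grad_out)

    # A state visited many times becomes "boring": its bonus shrinks.
    s = rng.normal(size=4)
    before = intrinsic_reward(s)
    for _ in range(500):
        predictor = train_predictor(predictor, s)
    after = intrinsic_reward(s)
    ```

    Because the target is a function of the state only, there is no stochastic transition for the predictor to chase, which is why RND sidesteps the noisy-TV trap the comment alludes to.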

  • @qiluo6299
    @qiluo6299 2 years ago

    This is a great video, thanks for sharing!

  • @orsimhon133
    @orsimhon133 1 year ago +1

    Hi Phil, thank you very much for this tutorial!
    As I understand the ICM, the inverse model should be trained together with the encoder NN (which we do not use here) in order to teach the encoder which parts of the state are controllable by the agent.
    So if we don't need the encoder here, we also don't need the inverse model, do we?
    Looking forward to some answers, thanks again!

    • @akashvyas7715
      @akashvyas7715 2 months ago

      I was thinking the same thing. Did you try removing the inverse model?
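    The wiring this thread is debating can be made concrete. In ICM, the encoder phi maps observations to features; the inverse model predicts the action from phi(s) and phi(s'), and its loss is what shapes the encoder toward agent-controllable features; the forward model predicts phi(s') from phi(s) and the action, and its error is the curiosity bonus. All dimensions and weight names below are hypothetical, and no training loop is shown — just the forward passes:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    OBS_DIM, FEAT_DIM, N_ACTIONS = 8, 16, 4

    # Hypothetical parameter shapes, for illustration only.
    W_enc = rng.normal(0, 0.3, (OBS_DIM, FEAT_DIM))               # encoder phi
    W_inv = rng.normal(0, 0.3, (2 * FEAT_DIM, N_ACTIONS))         # inverse model
    W_fwd = rng.normal(0, 0.3, (FEAT_DIM + N_ACTIONS, FEAT_DIM))  # forward model

    def phi(obs):
        """Encoder: raw observation -> feature vector."""
        return np.tanh(obs @ W_enc)

    def inverse_model(f_s, f_s2):
        """Predict which action led from phi(s) to phi(s'). Backprop through
        this loss is what trains the encoder to keep controllable features."""
        logits = np.concatenate([f_s, f_s2]) @ W_inv
        e = np.exp(logits - logits.max())
        return e / e.sum()

    def forward_model(f_s, action_onehot):
        """Predict phi(s') from phi(s) and the chosen action."""
        return np.concatenate([f_s, action_onehot]) @ W_fwd

    def intrinsic_reward(obs, action, obs2, eta=0.5):
        """ICM curiosity bonus: forward-model error in feature space."""
        f_s, f_s2 = phi(obs), phi(obs2)
        pred = forward_model(f_s, np.eye(N_ACTIONS)[action])
        return eta * 0.5 * float(np.sum((pred - f_s2) ** 2))

    s, s2 = rng.normal(size=OBS_DIM), rng.normal(size=OBS_DIM)
    r_i = intrinsic_reward(s, 2, s2)
    ```

    Seen this way, the commenters' point holds together: if the encoder is dropped (features are just the raw observation), the inverse model has nothing left to shape, so only the forward model is strictly needed for the bonus.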

  • @61Marsh
    @61Marsh 2 years ago +2

    I worked on this last year and ended up developing it, but my full solution never quite lived up to my expectations. I always wondered if I had implemented it correctly; time to verify against yours. Thanks.

  • @bobingstern4448
    @bobingstern4448 2 years ago +2

    Hey, I was working on a genetic NEAT-like algorithm, but I don't know how to cross over two neural networks with different topologies. Is there a procedure for doing this, or do you just choose a random one when this happens?

    • @royvivat113
      @royvivat113 2 years ago

      If you look at the NEAT paper, it explains specifically how to do it. It has to do with keeping track of the topological history, I believe.
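      The "topological history" the reply refers to is NEAT's global innovation numbers: every new connection gene gets one, so genes from parents with different topologies can be aligned by number. Matching genes are inherited randomly from either parent; disjoint and excess genes come from the fitter parent. A minimal sketch, with genomes simplified to dicts mapping innovation number to connection weight:

      ```python
      import random

      def neat_crossover(fit_genome, weak_genome, rng=random.Random(0)):
          """NEAT-style crossover. Genomes map a gene's global innovation
          number to its connection weight. Matching genes (present in both
          parents) are inherited randomly from either parent; disjoint and
          excess genes are taken from the fitter parent only."""
          child = {}
          for innov, weight in fit_genome.items():
              if innov in weak_genome and rng.random() < 0.5:
                  child[innov] = weak_genome[innov]  # matching gene, weaker parent
              else:
                  child[innov] = weight              # matching, disjoint, or excess
          return child

      parent_a = {1: 0.5, 2: -0.3, 4: 0.9}   # the fitter parent
      parent_b = {1: 0.1, 3: 0.7, 4: -0.2}
      child = neat_crossover(parent_a, parent_b)
      ```

      Here gene 3 is dropped because it exists only in the less fit parent, while gene 2 survives from the fitter one, so the child's topology is always at least that of the fitter parent.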

  • @leo.y.comprendo
    @leo.y.comprendo 2 years ago

    I was just reading about this!

  • @WilliamChen-pp3qs
    @WilliamChen-pp3qs 21 days ago

    How would it perform compared with HER (hindsight experience replay)?

  • @yualan2158
    @yualan2158 1 year ago

    First of all, I have to thank you for making this video. I made the necessary modifications to apply it to the "MountainCar-v0" problem, which is a truly sparse-reward environment. However, it doesn't work. Can you check whether the code succeeds in this environment? Thanks!

  • @mehranzand2873
    @mehranzand2873 2 years ago

    thanks a lot

  • @tanerylmaz8340
    @tanerylmaz8340 1 year ago

    Hello there,
    Can we save the trained model in this example? And is it then possible to test the model we trained in another environment? How would we do that? That way we could see the success and performance of the trained model more clearly. Could you help?
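    The usual PyTorch pattern for the first half of this question is to save the network's `state_dict` and rebuild the same architecture before loading. The `ActorCritic` class below is a hypothetical stand-in for whatever network the tutorial builds, not the video's actual code:

    ```python
    import torch
    import torch.nn as nn

    # Hypothetical stand-in for the tutorial's network class.
    class ActorCritic(nn.Module):
        def __init__(self, obs_dim=4, n_actions=2):
            super().__init__()
            self.shared = nn.Linear(obs_dim, 32)
            self.pi = nn.Linear(32, n_actions)  # policy head
            self.v = nn.Linear(32, 1)           # value head

        def forward(self, x):
            h = torch.relu(self.shared(x))
            return self.pi(h), self.v(h)

    model = ActorCritic()
    torch.save(model.state_dict(), "actor_critic.pt")   # weights only, not the class

    # Later, or in another script: rebuild the architecture, then load.
    restored = ActorCritic()
    restored.load_state_dict(torch.load("actor_critic.pt"))
    restored.eval()   # switch to evaluation mode for testing
    ```

    On the second half of the question: reloading into a *different* environment only works if the new environment has the same observation and action dimensions; otherwise the saved layer shapes will not match and `load_state_dict` will raise an error.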

  • @tsunamio7750
    @tsunamio7750 2 years ago +1

    I'm pretty sure everything you said could be compacted into fewer words, with fewer domain-specific terms. At some points I can follow you, but the jargon is exploding in my face.

  • @tsunamio7750
    @tsunamio7750 2 years ago

    Feature vector, feature map... we have so many terms.

  • @chadmcintire4128
    @chadmcintire4128 2 years ago

    This seems really similar to the entropy term in SAC.

  • @sounakmojumder5689
    @sounakmojumder5689 1 month ago

    Hi, did anyone run this in Google Colab? Is there any problem with spawning?