The Evolution of AlphaGo to MuZero

Connor Shorten

Published Jan 16, 2020
This video covers the progression from AlphaGo to AlphaGo Zero to AlphaZero, and the latest algorithm, MuZero. These algorithms from the DeepMind team have gone from superhuman Go performance to playing 57 different Atari games. Hopefully this video helps explain how they are related!
Thanks for watching! Please Subscribe!
Paper Links:
AlphaGo: www.nature.com/articles/natur...
AlphaGo Zero: www.nature.com/articles/natur...
AlphaZero: arxiv.org/abs/1712.01815
MuZero: arxiv.org/abs/1911.08265
Science & Technology

COMMENTS • 23

@connorshorten6311 4 years ago ⁺⁴
1:04 Timeline Overview
2:24 AlphaGo
6:15 AlphaGo Zero
10:05 AlphaZero
11:40 MuZero
13:54 Concluding Overview
@koozdra 4 years ago ⁺⁴
I'm so excited about MuZero. Thanks for the great video.
@connorshorten6311 4 years ago
Really cool algorithm! Thank you so much!!
@SteveOmohundro 4 years ago ⁺³
Excellent summary! Thank you. It's so exciting to see this progression. It really shows Sutton's "Bitter Lesson" where dropping handcrafted structure improved performance. MuZero looks like a very clean general architecture. It's great that the choice of hidden state representation can be driven by its ability to predict the reward signal. It will be very interesting to see how well that works for real-world tasks.
@connorshorten6311 4 years ago ⁺¹
I agree, it looks really interesting! I think the evolution of these papers is interesting in relation to Sutton's "The Bitter Lesson" because it shows how the research evolved toward dropping handcrafted structures. Without the steps from AlphaGo to AlphaZero, I don't think they would ever have thought of something like MuZero. I can't wait to see MuZero applied to some kind of robotic control task!
@memoai7276 4 years ago ⁺¹
Very nice explanation! Thank you
@connorshorten6311 4 years ago
Thank you!!
@cleo1488 3 years ago ⁺³
Where do they go after MuZero?
@NextFuckingLevel 4 years ago ⁺²
Thank you, very entertaining
@connorshorten6311 4 years ago ⁺¹
Thank you!!
@RibhuLahiri 4 years ago ⁺³
Hey Henry! Are you planning to bring back the AI weekly review videos anytime soon? They were a lifesaver. Great video as always!
@connorshorten6311 4 years ago ⁺¹
Hey Ribhu, thank you! Thanks for the interest in the AI weekly update series. I am working on bringing it back soon!
@RibhuLahiri 4 years ago
Looking forward to it. Cheers mate!
@simonstrandgaard5503 4 years ago ⁺¹
Very interesting journey.
@connorshorten6311 4 years ago
Really enjoyed making videos about these papers! Excited to see what follows MuZero!
@colmmoore2409 5 days ago
Hey Connor, is there any place to learn Go based on what AlphaGo does? Can it teach?
@faysoufox 4 years ago
Nice video. You could also put a link to your Medium article here.
@connorshorten6311 4 years ago
Thank you!!
@DavenH 3 years ago
Around 3:04 I lost you. The partial derivatives of the action selector function: how is that differentiable when the rewards are discrete (and binary win/lose)? I guess P(expected reward|action) is differentiable with some kernel density estimator... still very fuzzy to me.
@mim8312 3 years ago
I think that too many people are focusing on the game, which I also follow, as if this were an ordinary player. Since I have significant knowledge, and since I believe that Hawking and Musk were right, I am really anxious about the self-taught nature of this AI.
This particular AI (and its more generalized, even more recent variant MuZero) is not the worrisome thing, although it has obvious potential applications in military logistics, military strategy, etc. The really scary part is how fast these were developed after AlphaGo debuted.
We are not creeping up on the goal of human-level intelligence. We are likely to shoot past that goal amazingly soon without even realizing it, if things continue progressing as they have.
The early, true AIs will also be narrow and not very competent or threatening, even if they become "superhuman" in intelligence. They will also be harmless idiot savants at first.
Upcoming Threat to Humanity.
The scary thing is the fact that computer speed (and thereby, probably eventually AI intelligence) doubles about every year, and will likely double faster when super-intelligent AIs start designing chips, working with quantum computers as co-processors, etc. How fast will our AIs progress to such levels that they become indispensable -- while their utility makes hopeless any attempts to regulate them or retroactively impose restrictions on beings that are smarter than their designers?
At first, they may have only base functions, like the reptilian portion of our brain. However, when will they act like Nile crocodiles and react to any threat with aggression? Ever gone skinny dipping with Nile crocodiles?
I fear that very soon, before we realize it, we will all be doing the equivalent of skinny dipping with Nile crocodiles, because of how fast AIs will develop by the time that the children born today reach their teens or middle age. Like crocodiles that are raised by humans, AIs may like us for a while. I sure hope that lasts.
In Jurassic Park, I believe the quote was that someone did not stop to think if they should but thought only if they could, or words to that effect. As the announcer in Jeopardy said about a program that was probably not really an advanced AI long ago, I, for one, welcome our future AI overlords.
@MusikPiratCH 3 years ago
Concerning chess, I really doubt AlphaZero could beat Stockfish under TCEC conditions (as shown by Stockfish in TCEC 18-20 vs Lc0)! All conditions were in favor of AlphaZero! :P
@gsm1 4 years ago
Can you post the Python implementation of MuZero?
