AlphaGo - How AI mastered the hardest boardgame in history

  • Published 12 Nov 2017
  • In this episode I dive into the technical details of the AlphaGo Zero paper by Google DeepMind.
    This AI system uses Reinforcement Learning to beat the world's Go champion using only self-play, a remarkable display of clever engineering on the path to stronger AI systems.
    DeepMind Blogpost: deepmind.com/blog/alphago-zer...
    AlphaGo Zero paper: storage.googleapis.com/deepmi...
    If you want to support this channel, here is my patreon link:
    / arxivinsights --- You are amazing!! ;)
    If you have questions you would like to discuss with me personally, you can book a 1-on-1 video call through Pensight: pensight.com/x/xander-steenbr...
  • Science & Technology

COMMENTS • 113

  • @alonamaloh
    @alonamaloh 6 years ago +18

    I've been programming board game engines for 25 years and I've followed the development of CNNs to play go quite closely. This video is a really good description of the AlphaGo Zero paper, with very clear explanations. Well, the explanation of MCTS was completely wrong, but other than that this video was great. I'll make sure to check out more from this channel.

  • @shamimhussain396
    @shamimhussain396 6 years ago +9

    We humans run simulations in our heads all the time because sometimes simple intuitions are not enough... So I guess it isn't surprising that including Monte Carlo Tree Search always drastically improves performance, no matter how good the value function estimates are, even with the help of deep learning... The question is how to search more efficiently and also how to build an efficient model...

  • @AlessandroOrlandi83
    @AlessandroOrlandi83 4 years ago

    Thank you for taking the time to explain it so well. It's still difficult for me since I'm not familiar with the subject yet, but you really did a good job of presenting it clearly!

  • @kkkkjjjj4517
    @kkkkjjjj4517 6 years ago +168

    7:24 Your explanation of MCTS is not correct. For one instance of a simulation: it picks the top move recommended by the network (greedy) most of the time, with random moves some of the time (epsilon). Then it walks into that move and repeats the same, playing it out to completion. Then it backs up and keeps track of the win-vs-visit ratio for each state, as shown in the picture. It repeats this whole process 1600 times. As it performs these walkthroughs it trains the networks and updates the values. So the more often a state is seen, the closer it statistically converges to the optimal value. MCTS runs to completion; it's not a depth-pruning algorithm. Temporal Difference stops somewhere in the middle, and this was not used in AGZ. The MCTS algorithm is discussed by David Silver in his lecture #8 towards the end.

    • @ArxivInsights
      @ArxivInsights  6 years ago +87

      I checked the paper and you are indeed correct! The MCTS they use doesn't always play out every thread until the very end of the game (they use value thresholds for early stopping), but I did misinterpret the meaning of the '1600 simulations'. Thanks for pointing this out!

    • @andreys7944
      @andreys7944 5 years ago +2

      Do I get this right: with an average depth of ~300 moves and no early stopping, that would be ~1600*300 network queries just for the first move?

    • @edeneden97
      @edeneden97 5 years ago

      So it plays out, and if it wins, the moves it picked are trained towards 1 and the value towards 1? And if it loses, everything towards 0?

    • @DiapaYY
      @DiapaYY 5 years ago

      AGZ doesn't really use classic MCTS, as it doesn't use rollouts; it doesn't play the game out to the end.

    • @philippheinrich7461
      @philippheinrich7461 4 years ago

      So I go down the search tree (based on an algorithm that takes exploration and other things into account) until I reach a leaf node.
      I put the leaf-node position into my neural net and get a policy and an evaluation as a result. The policy adds new leaf nodes to my current position,
      and the value function gives me an evaluation of my current position.
      Is this correct?
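
For readers of this thread, here is a minimal Python sketch of the AlphaGo Zero-style search being described: a leaf is expanded with the policy head's priors and the value head's estimate is backed up along the path, instead of a random rollout being played to the end. The `Game` interface, the `net(state)` function returning (move priors, value) and the exploration constant are hypothetical stand-ins, not DeepMind's actual code.

```python
import math

C_PUCT = 1.5  # exploration constant (assumed value, not taken from the paper)

class Node:
    def __init__(self, prior):
        self.prior = prior      # P(s, a): prior probability from the policy head
        self.visits = 0         # N(s, a)
        self.value_sum = 0.0    # W(s, a)
        self.children = {}      # move -> Node

    def q(self):                # Q(s, a): mean value of this subtree
        return self.value_sum / self.visits if self.visits else 0.0

def select_child(node):
    """Pick the child maximizing Q + U (the PUCT selection rule from the paper)."""
    total_visits = sum(c.visits for c in node.children.values())
    def score(child):
        u = C_PUCT * child.prior * math.sqrt(total_visits) / (1 + child.visits)
        return child.q() + u
    return max(node.children.items(), key=lambda kv: score(kv[1]))

def simulate(root_state, root, game, net):
    """One 'simulation': descend to a leaf, expand it with the network,
    and back the value estimate up the path -- no rollout to the end of the game."""
    path, state, node = [root], root_state, root
    while node.children:                          # selection
        move, node = select_child(node)
        state = game.next_state(state, move)
        path.append(node)
    priors, value = net(state)                    # expansion + evaluation
    for move in game.legal_moves(state):
        node.children[move] = Node(priors.get(move, 0.0))
    for n in reversed(path):                      # backup, flipping sign each ply
        n.visits += 1
        n.value_sum += value
        value = -value

def search(root_state, game, net, n_sims=1600):
    root = Node(prior=1.0)
    for _ in range(n_sims):
        simulate(root_state, root, game, net)
    # Root visit counts serve as the improved move probabilities.
    return {m: c.visits for m, c in root.children.items()}
```

At the root, the visit counts accumulated over the ~1600 simulations are what the system turns into its improved move probabilities and, during self-play, into the training target for the policy head.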

  • @noranta4
    @noranta4 6 years ago +55

    This is a valuable explanation, this channel is a great discovery

    • @ArxivInsights
      @ArxivInsights  6 years ago +4

      noranta4 thanks man, just started two weeks ago ;) More vids coming up :p

  • @clrajapaksha
    @clrajapaksha 6 years ago +2

    You explained technical stuff very clearly. Thanks Arxiv Insights

  • @antonystringfellow5152
    @antonystringfellow5152 6 years ago +3

    Clearest and most informative video I've seen on AlphaGo. Thanks!

  • @SantoshGupta-jn1wn
    @SantoshGupta-jn1wn 6 years ago +44

    Your explanation skills are fantastic! I like how he has an outline at the beginning of his video; a very simple thing, yet very effective when it comes to teaching a subject, and yet so few educational videos do that.
    If I were to figure out the paper by myself, it would have taken me personally ~2x longer.
    Subscribed.

  • @elishaishaal7958
    @elishaishaal7958 1 year ago

    Thank you! This is one of the clearest and most concise explanations of any paper I've found thus far.

  • @augustopertence2804
    @augustopertence2804 6 years ago +7

    Best explanation I found about AlphaGo Zero

  • @Sl4ab
    @Sl4ab 2 years ago

    It's very clear, thank you! I can't wait to discover the other videos :)

  • @Hyrtsi
    @Hyrtsi 3 years ago +1

    Excellent explanation, thanks!! I'm going to make my own 9x9 AlphaGo Zero version.

  • @dankelly
    @dankelly 1 year ago

    Awesome explanation! (And your greenscreen work looks great!)

  • @ericfeuilleaubois40
    @ericfeuilleaubois40 6 years ago

    Damn great video! Carry on! Makes it very easy to get into these advanced subjects :)

  • @myj313
    @myj313 5 years ago

    Great summary of the paper! Thank you :)

  • @arijit07
    @arijit07 4 years ago

    This is the best video regarding the AlphaGo paper. Just amazing!!!

  • @2000chinadragon
    @2000chinadragon 6 years ago

    Fantastic explanation! Few people balance simplicity with thoroughness as well as you do.

    • @ArxivInsights
      @ArxivInsights  6 years ago +1

      That's the goal indeed, thx for the feedback :)

  • @shafu0x
    @shafu0x 5 years ago +1

    Thank you for this great explanation!

  • @SiavashFahimi
    @SiavashFahimi 5 years ago +1

    Thank you, finally I found a good video on this paper.

  • @welcomeaioverlords
    @welcomeaioverlords 4 years ago

    Excellent video, thanks for making it!

  • @daehankim2437
    @daehankim2437 6 years ago

    This helps a lot for those who need insights into machine learning trends :)

  • @siskon912
    @siskon912 5 years ago

    Great explanation. Thank you!

  • @Leibniz_28
    @Leibniz_28 4 years ago

    Excellent explanation, thanks

  • @Moonz97
    @Moonz97 5 years ago

    Loving your channel!

  • @siddarthc7091
    @siddarthc7091 1 year ago

    the transition 'dhkk' hits hard

  • @guitarchessplayer
    @guitarchessplayer 6 years ago +2

    Thanks for the great explanation! I'm still wondering how AlphaGo Zero learns that certain moves are obviously bad, like playing in the corner for example, without playing a game to the end?

  • @SreeramAjay
    @SreeramAjay 6 years ago

    Wow, this is really a great explanation

  • @diracsea2774
    @diracsea2774 6 years ago

    Excellent Presentation

  • @matrixmoeniaclegacy
    @matrixmoeniaclegacy 4 years ago

    Thank you for this valuable explanation!
    I just want to request that you highlight more clearly the parts of the images you are talking about, e.g. in the diagrams you show. This would make it easier to follow!

  • @alaad1009
    @alaad1009 5 months ago

    Excellent video

  • @PALYGAP
    @PALYGAP 6 years ago

    A little question on the AlphaGo Zero MCTS. The Monte Carlo aspect of the AlphaGo Zero MCTS seems to be gone, AFAIK. I can't see random numbers or random choices in that MCTS. It seems to have been replaced by the CNN calculating the probability of a board position leading to victory. What's your take on it?

  • @curiousalchemist
    @curiousalchemist 6 years ago +16

    Brilliant - thanks for this! Really enjoyed watching, and I think it conveys all the right takeaways from the paper.
    Just a quick point: is there any chance you could quieten down the background music for your next video? It was slightly distracting and I think it detracted a bit from your great explanation!
    Merry Christmas!

    • @ArxivInsights
      @ArxivInsights  6 years ago +6

      Thanks a lot, great to hear :) And for the background music: I got the same feedback from a few different people! This was my first video (every other video you'll find on my channel has this fixed) :p

  • @brahimelmssilha7234
    @brahimelmssilha7234 6 years ago

    Maaaan, you are doing great work, keep it up!

  • @LOGICZOMBIE
    @LOGICZOMBIE 2 years ago

    GREAT WORK

  • @johnvonhorn2942
    @johnvonhorn2942 4 years ago +1

    Xander, you look like "The Hoff" (David Hasslehoff) and that's a great look!

  • @Bippy55
    @Bippy55 1 year ago

    10 Nov 2022 - I just discovered this video and your channel. Fantastic explanation of what is, granted, a difficult subject to even tackle. Did you mention what kind of computer hardware the newest AlphaGo system uses? I assume it's a mainframe of some type. Also, I wonder if the system can decide in advance to play a peaceful game or a highly combative game? I have seen games where there were very few prisoners taken off the board, otherwise called a peaceful game. Still, there is a winner nonetheless. Anyway, bravo for an excellent video.

  • @seleejanegaelebale9192
    @seleejanegaelebale9192 4 years ago

    Thanks for the impressive explanation; where can I find the source code?

  • @yolomein415
    @yolomein415 4 years ago +1

    How is the value representation trained?

  • @ruanjiayang
    @ruanjiayang 4 years ago

    I am wondering what the output "policy vector" is like in the neural network

  • @RomeoKienzler
    @RomeoKienzler 4 years ago

    7:27 You said "certain depth"; did you mean "certain width"? Btw, I'd say this is one of the very best channels on the DL topic I've ever seen! Thanks so much!

  • @bhargav7476
    @bhargav7476 2 years ago +1

    that's some giga chad jaw u have there

  • @zzewt
    @zzewt 3 months ago

    This is cool, but after the third random jumpscare sound I couldn't pay attention to what you were saying--all I could think about was when the next one would be. Gave up halfway through since it was stressing me out

  • @PesarTarofi
    @PesarTarofi 6 years ago +3

    Can't wait for this thing to perform in SC2

  • @generichuman_
    @generichuman_ 1 year ago

    The part I don't understand is how they dispense with rollouts in MCTS. It seems like that is the only way to get a ground-truth value (by reaching a terminal state) which can then be propagated back up the chain. If you reach a non-terminal state, you're backpropagating a value from the policy network, which won't have useful values until it can be trained on useful values from the tree search. It seems like it's pulling itself up by its bootstraps. Is it the case that the true values come from the odd time that a simulation reaches a terminal state? Or am I missing something fundamental?

  • @davidm.johnston8994
    @davidm.johnston8994 6 years ago

    Interesting video :-)

  • @railgunpat3170
    @railgunpat3170 4 years ago

    Wow, I see some mistakes, and I also haven't watched many of your videos yet, but I find this channel is definitely underrated.

  • @petercruz1688
    @petercruz1688 6 years ago

    Danger Will Robinson!!!

  • @columbus8myhw
    @columbus8myhw 6 years ago +6

    You should open this up to community captioning

  • @einemailadressenbesitzerei8816
    @einemailadressenbesitzerei8816 3 years ago

    What I'm interested in is how the CNN works. What is the old target/label, and what is the new target/label? How does it update the label during training? So what is the prediction and what is the target? I mean, the CNN depends on these. OK, the prediction is randomized in the beginning. The network says it's a win, but in the end it was a loss, so it can update the weights. But I don't understand it in detail. I mean, it needs to play the whole game before it can update the weights. Does it update every output (p, v) for every position played in this game? And somehow it plays the same position a lot of times to update the output.
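
As a side note for this question, here is a sketch of the training targets as the paper describes them: when a self-play game ends with outcome z (+1 or -1), every position from that game is stored together with the MCTS visit-count distribution pi and that single outcome z; the value head is trained towards z and the policy head towards pi. A minimal illustration in Python with NumPy (the argument names are hypothetical):

```python
import numpy as np

def agz_loss(policy_logits, value, pi, z):
    """Per-position loss from the paper: (z - v)^2 - pi . log p  (plus L2 weight decay).
    policy_logits: raw network scores over the fixed 19*19 + 1 move slots
    value: scalar prediction in [-1, 1] for the player to move
    pi: MCTS visit-count distribution recorded for this position
    z: final game outcome from that player's perspective (+1 win, -1 loss)"""
    shifted = policy_logits - np.max(policy_logits)     # numerically stable log-softmax
    log_p = shifted - np.log(np.sum(np.exp(shifted)))
    value_loss = (z - value) ** 2
    policy_loss = -np.dot(pi, log_p)
    return value_loss + policy_loss
```

So the weights are only updated against the true outcome once a game has finished, every position from that game shares the same z, and when the same board position shows up in later games it simply contributes additional (pi, z) samples.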

  • @timleonard2668
    @timleonard2668 6 years ago

    Is it hard to implement this algorithm myself? Could I create a superhuman Go player on, say, a 7x7 board with just my laptop?
    How big could I make the board using just a normal laptop?

    • @ArxivInsights
      @ArxivInsights  6 years ago

      There are a ton of open-source implementations on GitHub: github.com/topics/alphago-zero but I know that many people are having issues reproducing the full strength of DeepMind's version. I don't know if the 'interesting game mechanics' of Go also emerge on a small board like 7x7, but I would guess that you can definitely train a decent model on a laptop for such a small game state. Additionally, you could also apply the algorithm to chess, which has a much smaller branching factor so it's easier to train, although again I think that in order to get decent results you would have to throw some cloud computing power into the mix :)

  • @zee1645
    @zee1645 6 years ago

    Did you guys teach AlphaGo how to beat security systems yet? And take over the stock market and all the nuclear launch codes?

  • @tenacityisthekey
    @tenacityisthekey 3 years ago +1

    Does anybody know if the shape of the output layer changes for every phase of the game? In the video, he explains that the network produces a probability distribution over possible moves, and the number of possible moves is dynamic. Does that mean the output layer's dimension is also dynamic? If so, how is it achieved? Can anyone help me understand? Thanks!

    • @dshin83
      @dshin83 1 year ago

      No, the output layer shape is static. You need to zero out the illegal moves from the output and then renormalize the probabilities to sum to 1.
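
A small sketch of what that masking step could look like (hypothetical names, not DeepMind's code):

```python
import numpy as np

def legal_move_probs(policy_output, legal_mask):
    """policy_output: network probabilities over the fixed 19*19 + 1 move slots.
    legal_mask: 1.0 for moves that are legal in the current position, else 0.0."""
    masked = policy_output * legal_mask              # zero out illegal moves
    total = masked.sum()
    if total == 0:                                   # degenerate case: uniform over legal moves
        return legal_mask / legal_mask.sum()
    return masked / total                            # renormalize to sum to 1
```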

  • @robertkinslow8953
    @robertkinslow8953 3 years ago

    Ok. So how do you play and what is the idea of it?

  • @thiru42
    @thiru42 4 years ago +1

    The history (extra 7 layers) is also used to identify ko (kind of similar to threefold repetition in Chess)

  • @LaurentLaborde
    @LaurentLaborde 3 years ago

    I'm confused, as your explanation contradicts the points you mentioned in the introduction.

  • @karFLY1
    @karFLY1 6 years ago +1

    Keep on! It's great.

  • @truehurukan
    @truehurukan 4 years ago

    Thank you very much for the effort to educate the uninformed about the mechanisms behind AlphaGo, which many take for a "beast", a "terminator-like machine"... To simplify, I would say that the Go champion effectively played against 50,000 professional Go players -> no chance to win at all, just as Kasparov failed to win against 50,000 human amateur players in the last decade. For me it is the massively parallel processes and recursive functions that beat the champion; technically beaten, this is definitely NOT intelligence but MASSIVELY PARALLEL processes clustered on thousands of CPUs and GPUs (floating-point operations).

  • @jakekim1357
    @jakekim1357 4 years ago

    Yo, this video is dope, it's super fire. Just letting you know I'm a dan player.
    I want to know more about this; I hope you're my school teacher.

  • @arnavrawat9864
    @arnavrawat9864 5 years ago

    What if, instead of self-play training, the AI is trained on match data from a previously trained AlphaZero AI?

    • @ArxivInsights
      @ArxivInsights  5 years ago

      You could use that to speed up training in the beginning for version 2.0, but eventually performance will saturate and you won't do better. And if you're building a version 2.0 you're hoping to do better than 1.0, so bootstrapping on gameplay that is worse than what you want to achieve doesn't really make sense. Similarly, AlphaGo Zero got better than AlphaGo by NOT bootstrapping on human games...

  • @andresnet1827
    @andresnet1827 3 years ago

    Do AlphaFold 2 when the paper comes out)

  • @dhrumilbarot1431
    @dhrumilbarot1431 6 years ago

    Epic👌👌👌👌

  • @vornamenachname906
    @vornamenachname906 2 years ago

    11:39 Two of the four "very popular moves that stood for thousands of years" were dismissed by AlphaGo after 50 hours of training.

  • @keylllogdark
    @keylllogdark 5 years ago +30

    my brain feels sexually abused after watching this video...

  • @funkyderrick3589
    @funkyderrick3589 4 years ago

    Very nice video! Put the microphone closer to you so we don't have that annoying reverb, please.

  • @XChen-te7hk
    @XChen-te7hk 5 years ago

    7:37 "... to play about 1600 simulations for every single board evaluation ... " I have a question. How do they do this? Even if it's not 19*19*17, let's say it's just 19*19*2 around 700, there would be (2**700) "board evaluation"s (maybe less if there are some illegal board states, but not much less). How could they even just play one simulation for so many "board evaluation"s? Guess I'm missing something...

    • @ArxivInsights
      @ArxivInsights  5 years ago +1

      To build out the search tree, potential actions are sampled from the policy network (both for white and black moves) (so this hugely constrains the tree rollout to the most-likely / best moves). And then they also do pruning according to the value network, so whenever a part of the search tree results in a very low chance of winning (below some threshold according to the value network) it is discarded and not explored further. Combining these two approaches they build out the search tree and finally decide what move to play at the top of the tree by looking at the average value estimates for each of the tree's branches.
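
For reference, the selection rule from the paper that implements this policy-guided narrowing picks, at each step down the tree, the action maximizing

$$ Q(s,a) \;+\; c_{\text{puct}} \, P(s,a) \, \frac{\sqrt{\sum_b N(s,b)}}{1 + N(s,a)} $$

where P(s,a) is the policy prior, N(s,a) the visit count and Q(s,a) the mean value of the subtree, so moves the policy network considers unlikely are rarely expanded in the first place.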

  • @fyrerayne8882
    @fyrerayne8882 2 years ago

    🔥🧠🔥

  • @RASKARZ34
    @RASKARZ34 4 years ago

    +1 sub

  • @these2menrgannadoit
    @these2menrgannadoit 4 years ago

    *Guitar Noise*

  • @thekitchfamily
    @thekitchfamily 5 years ago +2

    So not really AI, just number crunching using statistical analysis (Monte Carlo tree).

    • @ArxivInsights
      @ArxivInsights  5 years ago +1

      Well, it uses deep neural nets (value estimate + policy net) + self-play training (Reinforcement Learning) to make the Monte Carlo Tree Search tractable on the exponentially scaling Go search space. So yes it's number crunching, but that's what AI is all about...

  • @confucamus3536
    @confucamus3536 5 years ago

    so really it's just one big ass flow chart, if this then that

  • @muschas1
    @muschas1 4 years ago

    well, basically like humans acquire skills ... from scratch.
    cool

  • @MilesBellas
    @MilesBellas 5 years ago +71

    distracting music

  • @uncledevin700
    @uncledevin700 5 years ago

    It's too difficult to explain how great AlphaGo is to people who don't know how to play weiqi.

  • @IBMua
    @IBMua 6 years ago

    Anybody know wtf they use a move history for? Aside from blowing up learning and computation many times over? Seems like nonsense.

    • @ArxivInsights
      @ArxivInsights  6 years ago +1

      Ihor Menshykov yeah, I had the same thought at first. Apparently including the history lets the network learn a form of attention over the important/active parts of the game. But I agree that theoretically it shouldn't really be necessary... See the Reddit Q&A for more details!

  • @angloland4539
    @angloland4539 9 months ago

    😊

  • @dougdevine27
    @dougdevine27 6 years ago +1

    Good info but you should consider losing those annoying and jarring scene transition guitar strums+kick drum sounds. They detract very much from the presentation.

    • @ArxivInsights
      @ArxivInsights  6 years ago +1

      dougdevine27 haha very true, this was my first video :p I removed them in all my other content ;) Unfortunately, once uploaded, YouTube doesn't let you change anything anymore..

  • @420_gunna
    @420_gunna 6 years ago +1

    You're...gwern? What?

  • @nightmareTomek
    @nightmareTomek 1 year ago

    Nice video. But your sound effects and music are VERY loud. Maybe normalize a bit?

  • @ophello
    @ophello 4 years ago +1

    Get rid of that sound effect. It’s weird and jarring. Do a graphical transition instead.

  • @briandecker8403
    @briandecker8403 5 years ago

    So, a big spreadsheet and not AI - got it.

  • @saifufuerte3349
    @saifufuerte3349 3 years ago

    You explain shi@ too complicated

  • @amird1889
    @amird1889 5 years ago

    I am not getting a single thing out of this.
    Sorry, badly made.