[Classic] Playing Atari with Deep Reinforcement Learning (Paper Explained)

  • Published 13 Jun 2024
  • #ai #dqn #deepmind
    After the initial success of deep neural networks, especially convolutional neural networks on supervised image processing tasks, this paper was the first to demonstrate their applicability to reinforcement learning. Deep Q Networks learn from pixel input to play seven different Atari games and outperform baselines that require hand-crafted features. This paper kicked off the entire field of deep reinforcement learning and positioned DeepMind as one of the leading AI companies in the world.
    OUTLINE:
    0:00 - Intro & Overview
    2:50 - Arcade Learning Environment
    4:25 - Deep Reinforcement Learning
    9:20 - Deep Q-Learning
    26:30 - Experience Replay
    32:25 - Network Architecture
    33:50 - Experiments
    37:45 - Conclusion
    Paper: arxiv.org/abs/1312.5602
    Abstract:
    We present the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning. The model is a convolutional neural network, trained with a variant of Q-learning, whose input is raw pixels and whose output is a value function estimating future rewards. We apply our method to seven Atari 2600 games from the Arcade Learning Environment, with no adjustment of the architecture or learning algorithm. We find that it outperforms all previous approaches on six of the games and surpasses a human expert on three of them.
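The core mechanics the abstract describes - Q-learning targets computed from sampled transitions stored in a replay memory - can be sketched in a few lines. This is a simplified illustration, not the paper's implementation: the paper trains a convolutional network on stacked raw frames, while the linear Q-function and the hyperparameter values below are stand-in assumptions to keep the sketch self-contained.

```python
import random
from collections import deque

import numpy as np

# Hyperparameter values here are illustrative assumptions, not the paper's.
GAMMA = 0.99          # discount factor
BUFFER_SIZE = 10_000  # replay memory capacity N
BATCH_SIZE = 32       # minibatch size per gradient step
LR = 0.01             # learning rate


class LinearQ:
    """Q-function approximator. The paper uses a CNN over stacked frames;
    a linear model stands in here to keep the sketch dependency-light."""

    def __init__(self, state_dim, n_actions):
        self.w = np.zeros((state_dim, n_actions))

    def q_values(self, states):
        # states: (batch, state_dim) -> (batch, n_actions)
        return states @ self.w

    def update(self, states, actions, targets):
        # One SGD step on the squared Bellman error of the taken actions.
        idx = np.arange(len(actions))
        errors = self.q_values(states)[idx, actions] - targets
        grad = np.zeros_like(self.w)
        for i, a in enumerate(actions):
            grad[:, a] += errors[i] * states[i]
        self.w -= LR * grad / len(actions)


replay = deque(maxlen=BUFFER_SIZE)  # stores (s, a, r, s_next, done)


def store(s, a, r, s_next, done):
    replay.append((s, a, r, s_next, done))


def train_step(model):
    if len(replay) < BATCH_SIZE:
        return
    batch = random.sample(replay, BATCH_SIZE)
    s, a, r, s2, done = (np.array(x) for x in zip(*batch))
    # y = r for terminal transitions, else r + gamma * max_a' Q(s', a')
    targets = r + GAMMA * model.q_values(s2).max(axis=1) * (1.0 - done)
    model.update(s, a, targets)
```

Sampling minibatches uniformly from the replay memory decorrelates consecutive frames and reuses each transition in many updates, which is the main stability trick the paper introduces over plain online Q-learning.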
    Authors: Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller
    Links:
    YouTube: / yannickilcher
    Twitter: / ykilcher
    Discord: / discord
    BitChute: www.bitchute.com/channel/yann...
    Minds: www.minds.com/ykilcher
    Parler: parler.com/profile/YannicKilcher
    LinkedIn: / yannic-kilcher-488534136
    If you want to support me, the best thing to do is to share out the content :)
    If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
    SubscribeStar (preferred to Patreon): www.subscribestar.com/yannick...
    Patreon: / yannickilcher
    Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
    Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
    Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
    Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n
  • Science & Technology

COMMENTS • 54

  • @aa-xn5hc
    @aa-xn5hc 3 years ago +33

    Totally love your historical paper reviews

  • @kumarsubham2078
    @kumarsubham2078 3 years ago +3

    Thanks for the historical papers series, Yannic. Great explanation of the content, with plenty of citations of related work. It helps in understanding the evolution of DL. Hope to see more coming soon!

  • @mahermokhtar
    @mahermokhtar A year ago +4

    I literally watched 1000s of videos and couldn't fully understand DRL until I watched this one. Very impressive, detailed explanation. Thank you for it.

  • @alexwhb122
    @alexwhb122 3 years ago

    Absolutely love your videos! Thank you for making these. I've learned a lot!

  • @bikrammajhi3020
    @bikrammajhi3020 A year ago

    I am loving it. Thank you so much. YOU DESERVE A MILLION SUBSCRIBERS. HOPE YOU GET THERE SOON.

  • @DefiantElf
    @DefiantElf 3 years ago +4

    Thanks for the great explanation! Regarding sticky actions (29:05), I think those were proposed later, in the paper "Revisiting the Arcade Learning Environment..." by Machado et al. in 2018, to add stochasticity to the Atari problem

  • @sebastianrada4107
    @sebastianrada4107 3 months ago

    What a great video! Please keep doing this kind of content 😀

  • @MrjbushM
    @MrjbushM 3 years ago

    Thanks, very useful for those of us learning deep learning! I love the classic papers series

  • @chris--tech
    @chris--tech 3 years ago +3

    Recently I have been painfully learning RL; I didn't understand what's happening in DQN until I watched your videos. Thanks a lot.

  • @genesisevolution4243
    @genesisevolution4243 A year ago

    Damn! This was exactly what I wanted to learn!! Thank you so much...

  • @snehalraj6898
    @snehalraj6898 3 years ago

    This was really awesome! Thanks

  • @coderboy4683
    @coderboy4683 3 years ago

    I came to understand the paper, and I realised a lot of things in RL that I used to find very difficult. Awesome explanation, sir. Thank you.

  • @dark808bb8
    @dark808bb8 3 years ago +1

    Great video! I just coded a DQN-type neural net to play Othello. It has only fully connected layers, with a 64-dim input vector and a 64-dim output vector. I hope to do some experiments with it in the future.

  • @PyTechVision
    @PyTechVision 2 years ago

    Thanks for the great explanation.

  • @zerorusher
    @zerorusher 6 months ago +1

    It's November 2023 and you hear the magic name everybody is talking about: 20:52

  • @heyrmi
    @heyrmi 3 years ago +11

    AlphaGo did for RL what AlexNet did for DL.
    David Silver got me interested in this field. Though I am a beginner, I too want to contribute to this field.
    Thanks for covering this.

    • @TheThirdLieberkind
      @TheThirdLieberkind 3 years ago

      I wouldn't entirely agree with this. In my opinion, AlphaGo presented very few novel ideas, but it packaged four clever networks together into something very practical - something reinforcement learning hadn't had before.
      AlphaZero, on the other hand, did have a couple of major novel ideas, but even then, debatably, its authors were not the inventors of those ideas.
      In my opinion, most of the Alpha projects, while more practically impressive than most research projects, did not invent the network architectures, but rather improved them and were able to throw a massive amount of compute at them.

    • @Rhannmah
      @Rhannmah 3 years ago

      @@TheThirdLieberkind Having the AI play against itself and learn from that was pretty novel, and definitely at the core of the success of AlphaGo.

    • @danielguffey
      @danielguffey 3 years ago +1

      @@Rhannmah Wasn't RL founded with self-play in checkers?

    • @Rhannmah
      @Rhannmah 3 years ago

      @@danielguffey Was it? I thought it was trained on human play.

    • @danielguffey
      @danielguffey 3 years ago

      @@Rhannmah "The Samuel Checkers-playing Program was among the world's first successful self-learning programs"

  • @CHINNOJISANTOSHKUMARNITAP
    @CHINNOJISANTOSHKUMARNITAP 6 months ago

    Thanks for the explanation. Can I expect a video on Rainbow DQN?

  • @RinkuYadav-pn4jo
    @RinkuYadav-pn4jo 9 months ago

    Yeahh... nice review... thanks

  • @MMc9081
    @MMc9081 3 years ago +1

    @Yannic - Great video as always; it really helped me get a grip on the basics of RL.
    Just wondering though, did you mean to have adverts throughout the video? Up to now I have only seen them at the beginning, and maybe at the end too, I can't remember. But this video had one at the start and then three during. I appreciate you need to generate some income from these videos (and you deserve it), but having the adverts during the video is very off-putting. Would you consider having several at the start instead (if possible)?

    • @YannicKilcher
      @YannicKilcher  3 years ago +1

      Thanks for the feedback. I turned them on in the middle during this video just to see the effect, but I agree they're annoying.

  • @utku_yucel
    @utku_yucel 3 years ago +1

    Thanks!

  • @michelprins
    @michelprins 5 months ago

    thx great video

  • @marekdziubinski850
    @marekdziubinski850 A year ago

    Nice joystick you’ve got there, Yannic 😂. But seriously, I enjoy your work - thank you for the contributions 😊

  • @jesschil266
    @jesschil266 3 years ago

    Hi Yannic! Love your videos so much! But there is one thing I am not clear about: is y_i equal to the Q function approximated at the (i-1)th iteration, i.e. by the previous weights of the neural network? Best

    • @YannicKilcher
      @YannicKilcher  3 years ago

      It's the target value, so yes, the Q value to approximate
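(In the paper's notation, the target for iteration i is indeed built with the network parameters of the previous iteration, θ_{i-1}; for terminal transitions it reduces to just the reward r:)

```latex
y_i \;=\; \mathbb{E}_{s' \sim \mathcal{E}}\!\left[\, r + \gamma \max_{a'} Q(s', a'; \theta_{i-1}) \,\middle|\, s, a \right]
```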

  • @foodmart5122
    @foodmart5122 4 months ago +1

    What does he mean by LaTeX savagery around 2:30?

  • @davidromero1373
    @davidromero1373 A year ago

    Which program do you use on your iPad to make those annotations outside the margins of the papers?

  • @billykotsos4642
    @billykotsos4642 3 years ago

    niceeeee

  • @mikhailkhlyzov6205
    @mikhailkhlyzov6205 3 years ago +3

    what happened in Pong? C'mon, David!

  • @ThinkTank255
    @ThinkTank255 A year ago

    Does anyone know what he is talking about at 2:10? LaTeX savagery???

    • @TruMystery
      @TruMystery 8 months ago

      did you understand??

  • @HappyDancerInPink
    @HappyDancerInPink 3 years ago +8

    What would you replace LaTeX with? Surely not Word?😂

    • @herp_derpingson
      @herp_derpingson 3 years ago +7

      Markdown with MathJax. Or just use Jupyter Notebooks with inline code.

    • @snippletrap
      @snippletrap 3 years ago +4

      @@herp_derpingson Exactly. Paperswithcode and distill.pub are already moving in this direction. No reason papers can't be interactive.

    • @SuperEmanuel98
      @SuperEmanuel98 3 years ago

      Surely there are alternatives, but the thing is that everyone knows LaTeX, so it is easy to collaborate and it is fast. Getting math formulas done quickly and looking good is easy. LaTeX has some quirks, but it is not hard to work around and fix such things. There are alternatives, but nothing comes close.

  • @iliasp4275
    @iliasp4275 3 years ago

    ai lob yiu

  • @JoaoVitor-mf8iq
    @JoaoVitor-mf8iq 3 years ago +2

    Savagery is ok if it doesn't decrease the quality of the research; formatting is so boring...

  • @sui-chan.wa.kyou.mo.chiisai
    @sui-chan.wa.kyou.mo.chiisai 3 years ago

    y13 really ooold paper

  • @lawchakra7813
    @lawchakra7813 3 years ago +1

    I can't share this gold mine of content with anyone. I don't know anybody who would be interested in all this.

    • @42nb
      @42nb 3 years ago +2

      But you can always find someone in this community later on, just stay interested :D
