[Classic] Playing Atari with Deep Reinforcement Learning (Paper Explained)
- Published 13 Jun 2024
- #ai #dqn #deepmind
After the initial success of deep neural networks, especially convolutional neural networks on supervised image processing tasks, this paper was the first to demonstrate their applicability to reinforcement learning. Deep Q Networks learn from pixel input to play seven different Atari games and outperform baselines that require hand-crafted features. This paper kicked off the entire field of deep reinforcement learning and positioned DeepMind as one of the leading AI companies in the world.
OUTLINE:
0:00 - Intro & Overview
2:50 - Arcade Learning Environment
4:25 - Deep Reinforcement Learning
9:20 - Deep Q-Learning
26:30 - Experience Replay
32:25 - Network Architecture
33:50 - Experiments
37:45 - Conclusion
Paper: arxiv.org/abs/1312.5602
Abstract:
We present the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning. The model is a convolutional neural network, trained with a variant of Q-learning, whose input is raw pixels and whose output is a value function estimating future rewards. We apply our method to seven Atari 2600 games from the Arcade Learning Environment, with no adjustment of the architecture or learning algorithm. We find that it outperforms all previous approaches on six of the games and surpasses a human expert on three of them.
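The training loop the abstract describes (store transitions, sample minibatches, regress Q toward a bootstrapped one-step target) can be sketched in a few lines. This is an illustrative pure-Python skeleton, not the paper's CNN; the names, buffer size, and discount factor are assumptions.

```python
import random
from collections import deque

GAMMA = 0.99                          # discount factor (illustrative choice)
replay_buffer = deque(maxlen=10_000)  # experience replay memory

def q_target(reward, next_q_values, done):
    """One-step Q-learning target: r + gamma * max_a' Q(s', a'),
    or just r if the episode terminated."""
    if done:
        return reward
    return reward + GAMMA * max(next_q_values)

# Store a transition (s, a, r, s', done), then sample a minibatch.
# In the paper, each s is a stack of preprocessed frames.
replay_buffer.append((0, 1, 1.0, 2, False))
batch = random.sample(replay_buffer, k=1)
```

In the full algorithm, the `q_target` value is regressed against the network's current prediction `Q(s, a)` with gradient descent on the squared error.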
Authors: Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller
Links:
UA-cam: / yannickilcher
Twitter: / ykilcher
Discord: / discord
BitChute: www.bitchute.com/channel/yann...
Minds: www.minds.com/ykilcher
Parler: parler.com/profile/YannicKilcher
LinkedIn: / yannic-kilcher-488534136
If you want to support me, the best thing to do is to share out the content :)
If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar (preferred to Patreon): www.subscribestar.com/yannick...
Patreon: / yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n
Category: Science & Technology
Totally love your historical paper reviews
Thanks for the historical papers series, Yannic. Great explanation of the contents, with plenty of citations of related happenings. It helps me understand the evolution of DL. Hope to see more coming soon!
I have literally watched 1000s of videos and I couldn't fully understand DRL until I watched this one. Very impressive, detailed explanation. Thank you for it.
Same here!
Absolutely love your videos! Thank you for making these. I've learned a lot!
I am loving it. Thank you so much. YOU DESERVE A MILLION SUBSCRIBERS. HOPE YOU GET THERE SOON.
Thanks for the great explanation! Regarding sticky actions (29:05), I think those were proposed later, in the paper "Revisiting the Arcade Learning Environment..." by Machado et al. (2018), to add stochasticity to the Atari problem.
What a great video! Please keep doing this kind of content 😀
Thanks, very useful for those of us learning deep learning! I love the classic papers series.
Recently I have been learning RL painfully; I didn't understand what was happening in DQN until I watched your videos. Thanks a lot.
Damn! this was exactly what I wanted to learn!! Thank you so much...
This was really awesome! Thanks
I came to understand the paper, and I realised a lot of things in RL that I used to find very difficult. Awesome explanation, sir. Thank you.
Great video! I just coded a DQN-type neural net to play Othello. It has only fully connected layers, with a 64-dim input vector and a 64-dim output vector. I hope to do some experiments with it in the future.
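A setup like the one described in this comment (fully connected layers, 64 board cells in, 64 Q-values out) can be sketched in pure Python; the layer widths, weight initialization, and all names below are illustrative assumptions, not the commenter's actual code.

```python
import random

def linear(x, weights, bias):
    """One fully connected layer: y = Wx + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def relu(x):
    return [max(0.0, v) for v in x]

random.seed(0)
H = 128  # hidden width, an arbitrary choice
W1 = [[random.uniform(-0.1, 0.1) for _ in range(64)] for _ in range(H)]
b1 = [0.0] * H
W2 = [[random.uniform(-0.1, 0.1) for _ in range(H)] for _ in range(64)]
b2 = [0.0] * 64

board = [0.0] * 64  # one input per Othello square
q_values = linear(relu(linear(board, W1, b1)), W2, b2)
```

In practice a framework like PyTorch would replace the hand-rolled matmul, and illegal moves would be masked out before taking the argmax over `q_values`.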
Thanks for great explanation.
It's November 2023 and you hear the magic name everybody is talking about: 20:52
AlphaGo did for RL what AlexNet did for DL.
David Silver got me interested in this field. Though I am a beginner, I too want to contribute to this field.
Thanks for covering this.
I wouldn't entirely agree with this; in my opinion, AlphaGo presented very few novel ideas, but was able to package four clever networks together into something very practical, which reinforcement learning hadn't had before.
AlphaZero, on the other hand, did have a couple of major novel ideas, but even then, debatably, they were not the inventors of those ideas.
In my opinion, most of the Alpha projects, while more practically impressive than most research projects, did not invent the network architectures, but rather improved them and were able to unload a massive amount of compute onto them.
@@TheThirdLieberkind Having the AI play against itself and learn from that was pretty novel, and definitely at the core of AlphaGo's success.
@@Rhannmah Wasn't RL founded with self-play in checkers?
@@danielguffey Was it? I thought it was trained on human play.
@@Rhannmah "The Samuel Checkers-playing Program was among the world's first successful self-learning programs"
Thanks for the explanation. Can I expect a video on Rainbow DQN?
Yeah... nice review, thanks!
@Yannic - Great video as always and really helped me get a grip on the basics of RL.
Just wondering, though: did you mean to have adverts throughout the video? Up to now I have only seen them at the beginning, and maybe the end too, I can't remember. But this video had one at the start and then three during. I appreciate that you need to generate some income from these videos (and you deserve it), but having the adverts during the video is very off-putting. Would you consider having several at the start instead (if possible)?
Thanks for the feedback. I turned them on in the middle during this video just to see the effect, but I agree they're annoying.
Thanks!
Thanks, great video!
Nice joystick you’ve got there, Yannic 😂. But seriously, I enjoy your work - thank you for the contributions 😊
Hi Yannic! Love your videos so much! But there is one thing I am not clear about: is y_i equal to the Q function approximated at the (i-1)th iteration, i.e. using the neural network weights from the previous iteration? Best
It's the target value, so yes, the Q value to approximate
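For reference, in the paper's notation the target at iteration $i$ is indeed computed from the previous iteration's weights $\theta_{i-1}$:

$$y_i = \mathbb{E}_{s' \sim \mathcal{E}}\!\left[\, r + \gamma \max_{a'} Q(s', a'; \theta_{i-1}) \,\middle|\, s, a \right],$$

and the current weights $\theta_i$ are fit by minimizing the squared error

$$L_i(\theta_i) = \mathbb{E}_{s,a \sim \rho(\cdot)}\!\left[ \left( y_i - Q(s, a; \theta_i) \right)^2 \right],$$

treating $y_i$ as a fixed constant when differentiating.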
What does he mean by LaTeX savagery around 2:30?
Which program do you use on your iPad to make those annotations outside the margins of the papers?
niceeeee
what happened in Pong? C'mon, David!
Does anyone know what he is talking about at 2:10? LaTeX savagery???
Did you understand??
What would you replace LaTeX with? Surely not Word?😂
Markdown with MathJax. Or just use Jupyter Notebooks with inline code.
@@herp_derpingson Exactly. Papers with Code and distill.pub are already moving in this direction. There's no reason papers can't be interactive.
Surely there are alternatives, but the thing is that everyone knows LaTeX, so it is easy to collaborate and it is fast. Getting math formulas done quickly, and looking good, is easy. LaTeX has some quirks, but they are not hard to work around and fix. I would say there are alternatives, but nothing comes close.
I love you
Savagery is OK if it doesn't decrease the quality of the research; formatting is so boring...
2013, a really old paper
I can't share this gold-mine content with anyone. I don't know anybody who would be interested in all this.
But you can always find someone in this community later on; just stay interested :D