Hi, can someone please explain to me how the model is predicting in this sequence of code when it hasn't been trained yet? I'd really appreciate it. Thanks!! if np.random.rand()
Hey there, so the epsilon tells us when the agent is ready to exploit Q-values instead of exploring the map. The main idea is: 1) We specify an exploration rate "epsilon," which we set to 1 in the beginning. This is the fraction of steps that we'll take randomly. In the beginning, this rate must be at its highest value, because we don't know anything about the values in the Q-table, which means we need to do a lot of exploration by randomly choosing our actions. 2) We generate a random number. If this number > epsilon, then we do "exploitation" (this means we use what we already know to select the best action at each step). Otherwise, we do exploration. The idea is that we must have a big epsilon at the beginning of training the Q-function, then reduce it progressively as the agent becomes more confident at estimating Q-values. Here is a nice graph of this idea: cdn-media-1.freecodecamp.org/images/1*9StLEbor62FUDSoRwxyJrg.png Hope that helped! :D
@@lefos99 Hi, thanks for helping me out! I understand that (at least I think so), but I don't understand how the model can predict if it hasn't been trained. At what point is the model learning from the D values and able to "exploit"? I'm from more of a C background, but I don't get how it's learning until the next block of code, where it does "Experience Replay".
@@mattgoralka3941 Oh okay, now I see your question. Well, it depends on the reinforcement learning technique you use. For example, if you use simple Q-learning, you just create a matrix (rows for states, columns for actions). There are plenty of concepts involved that I cannot explain in just one YouTube comment. A really good and simple tutorial is this: simoninithomas.github.io/Deep_reinforcement_learning_Course/#syllabus In this tutorial you will find not only mathematical explanations but also explanations with examples in simple games. Check it out! ;)
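The epsilon-greedy schedule described above can be written in a few lines of Python (a hedged sketch: the environment step is omitted, and the Q-table size, decay rate, and epsilon floor are all made-up values):

```python
import numpy as np

def choose_action(q_table, state, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return int(rng.integers(q_table.shape[1]))  # random action (explore)
    return int(np.argmax(q_table[state]))           # best known action (exploit)

rng = np.random.default_rng(0)
n_states, n_actions = 16, 4
q_table = np.zeros((n_states, n_actions))

epsilon, decay, min_epsilon = 1.0, 0.995, 0.05
for episode in range(1000):
    action = choose_action(q_table, state=0, epsilon=epsilon, rng=rng)
    # ... take the action, observe the reward, update q_table ...
    epsilon = max(min_epsilon, epsilon * decay)  # become greedier over time
```

Early episodes are almost all exploration; by the end, epsilon has decayed to its floor and most steps exploit the Q-table.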
Hey Siraj, I don't think any of us can beat the Dota 2 bot that OpenAI just unveiled. Those guys really deserve a shout-out.
This is a high quality video and I'm sure a lot of people can tell you put a lot of effort into these.
Finally a YouTuber I always wanted to watch. Fast-paced, cool, and great information.
Hi Siraj, could you include pseudocode of algorithms you talk about? I think it is crucial to be able to implement algorithms you learn about (ie "What I cannot code myself, I do not understand"). Explaining pseudocode is a great way to communicate algorithms in a clear, complete, and non-ambiguous way.
Just saw this now; your jokes were KILLING IT back then
Thanks Siraj. Can't wait for the Super Mario Bros Bot. I enjoyed your videos in the deep learning ND. Cheers your effort is appreciated.
OMG, I googled "Q Learning with Neural Network" a few months back without realising how important it was.
haha awesome
I can't feel my brain anymore
oh no
Tell me about it :/
Hey did u know? The brain can't feel pain, it senses the pain of the body and transmits it but it can't feel the pain...
..It can get hurt tho and get brain damage
@@randomorange6807 but what if you punch a brain? does it feel pain?
One of your best vids Siraj!!
thanks Brandon!
Great improvement, brother. I'm sorry, but the previous videos were not good. Nice tutorial and intuition, although I do recommend watching DeepMind's reinforcement learning tutorial before jumping into practical applications.
I don't understand 10% of what you say but your videos are just epic! Please keep posting them often :)
you explained it so well. thank you
You should do a video comparing this with NEAT, which is popular for this same use case.
Siraj, I'm a huge fan of your YouTube channel and I truly admire the way you taught yourself ML. I'm in my final year of undergrad, and I was thinking of not pursuing a master's degree right now. Any advice on what resources to use to teach myself ML, or how to get some industry-level exposure?
Thanks in advance 😉
thanks rhea! see the ML subreddit
Very nice information and rhythm. Subscribed!
Awesome and optimized explanations in all the videos! Thanks a lot!!
Your videos are amazing, thanks.
thank you Siraj for your awesome content, you really made learning fun and easier!
Hi Siraj, could you have a video mention the OpenAI bot that beat a pro gamer at Dota 2 a few days ago? It's great that you released this video so close to this current event
That was an awesome explanation. Thanks.
Hey Siraj great job on the videos. :) what do you think of the dota 2 ai that beat a pro player?
Smart guy, talented teacher.
Great, I do really love your way of explaining!! thanks
I like this interesting video more than the previous, purely theoretical one;
more humor is better.
For me, the most important questions now are:
1. For machine learning beginners training models, which is better: buying a GPU graphics card, or buying Amazon cloud GPU hours?
2. Tips on configuring a deep learning environment.
3. Tips on programming/development skills.
I love you man, I always wanted to do this myself
Thanks a lot Siraj! This video provided a great insight on applications of Q learning and RL. Are there any programming assignments (that includes a dataset) for this?
Very nice! Do you have a video with more detail on Q learning? Would be interesting to see how the Q matrix evolves over play of a simple game.
Hi Siraj, I'm going to do a path-planning project to navigate a robot with Q-learning. What is the minimum hardware required for this? Do we need a GPU, or will a Core i5 PC with only a CPU be enough?
Hi Siraj. Have you thought about using Capsules (CapsNet), i.e., not having a MaxPooling layer?
Hey Siraj could you expand on this topic and explain how Sethbling's MarI/O program works?
I believe Siraj already has a video on genetic-evolution decision making, if I'm not mistaken. Doesn't Seth explain it pretty in-depth, though? He talks about everything from the math to how he programmed it, with Perl I think.
genetic algo vid coming this week (similar to what he used)
Siraj Raval Hey Siraj, in a previous stream you mentioned that learning this kind of thing (neural networks / machine learning) is best done on the internet. I was wondering, for a near-complete beginner (minor experience with Processing.js), where would you suggest I start off? (I'm 15 and want to get into this field as soon as possible.)
icancto
Did you read the NEAT paper? If not, I'd recommend it, because it's actually really smart and comprehensible. NEAT is not just picking the best randomly generated genomes; it uses a crossover mechanism which makes sure that only connections with a similar "purpose" inside the neural net are crossed over. It can intelligently cross over neural networks of different topologies, which are created through mutation, starting from minimal networks. That way it improves the weights AND selects the ideal topology of the neural nets.
Comparing NEAT to backpropagation doesn't make any sense, because its purpose is to be used when you can't use backpropagation. MarI/O is a good example of this: what target data would you use for backpropagation there? ;-)
he used lua
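The crossover-by-innovation-number idea from the NEAT discussion above can be sketched minimally. This is an illustrative fragment, not full NEAT: genes are plain `(innovation, weight, enabled)` tuples, and the fitter parent is simply assumed to be `parent_a`:

```python
import random

def crossover(parent_a, parent_b, rng):
    """NEAT-style crossover: genes with matching innovation numbers are
    inherited randomly from either parent; disjoint/excess genes are kept
    from the fitter parent (assumed here to be parent_a)."""
    genes_b = {g[0]: g for g in parent_b}   # index parent_b by innovation number
    child = []
    for gene in parent_a:                   # iterate over the fitter parent
        match = genes_b.get(gene[0])
        if match is not None and rng.random() < 0.5:
            child.append(match)             # matching gene, taken from parent_b
        else:
            child.append(gene)              # matching (from a) or disjoint/excess
    return child

rng = random.Random(42)
a = [(1, 0.5, True), (2, -0.3, True), (4, 0.9, True)]
b = [(1, 0.7, True), (3, 0.1, True), (4, -0.2, False)]
child = crossover(a, b, rng)
```

Innovation numbers are what make this alignment possible across different topologies: gene 2 exists only in `a` and gene 3 only in `b`, yet genes 1 and 4 still line up.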
RIP Chester.
Hi Siraj, I love your teaching style, and I'm a member of Udacity's Deep Learning Foundation program, in which you are an instructor. My doubt is: can we use deep Q-learning in situations where the input isn't images or pixels? If yes, can you tell me how? I have read that instead of a table (state × action) we can use a neural network to build the Q-function. Can you explain this, or if possible, make a video about it?
Cool video. Thanks.
But how do you adjust this for a certain purpose (like collecting all the coins, getting the lowest score, or speedrunning)?
Great video, Siraj, thanks!
But I don't get something: how do you input 4 game screens?
Do you combine them into one input?
That part where he says hello world it's siraj... I'm replaying it again and again coz it's soo funny xD
I understand that a convolutional neural network can be used to simplify the state from an array of pixels to a smaller collection of values, but how does the algorithm use a deep network to approximate the Q-function? 8:19
Thank you!
I still don't understand how we can store these Qs. Wouldn't they contain a quadrillions of states and actions for a pretty simple game? Seems pretty inefficient, so I would love to know where I'm wrong in my understanding of Q learning. Is there some generalization in place or what?
You can store all possible actions for all possible states in a matrix for a simple game like tic-tac-toe. However, as you say, that's impossible for more complex games, which is why we use a neural network that replaces this matrix: it takes the pixels of the screen as input (the state) and outputs an action. After training, it is supposed to give the optimal action for any state we give as input.
great answer pierre
Thanks for the reply, this clarified it for me. Much thanks ^^
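Pierre's point about the matrix version can be sketched in a few lines: for a small game, Q is literally a (state × action) array updated with the Bellman rule. The sizes, learning rate, and discount below are illustrative:

```python
import numpy as np

# Tabular Q-learning for a tiny game: states and actions are small enough
# to enumerate, so Q fits in a (state x action) matrix, as described above.
n_states, n_actions = 9, 4          # made-up sizes for illustration
Q = np.zeros((n_states, n_actions))

alpha, gamma = 0.1, 0.9             # learning rate, discount factor

def update(s, a, reward, s_next):
    """One Bellman update of the Q-table."""
    Q[s, a] += alpha * (reward + gamma * Q[s_next].max() - Q[s, a])

update(s=0, a=1, reward=1.0, s_next=2)
```

For pixel inputs, this table becomes astronomically large, which is exactly where the neural-network replacement comes in.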
The way this works is by approximating an optimal Q function. A Q function is a function of state and action, so Q(s,a). Q*(s,a) is the optimal Q function. This is great for games with few states, but because of combinatorics, it does not scale to games with hundreds of thousands of states, such as video games. To accommodate this, we approximate Q* by using a parameterized Q function, Q(s,a,Theta), where Theta is a set of parameters that we need to optimize to bring us to approximating Q*. A type of function that's excellent at iteratively approximating functions through parameters is a neural network. So that's where Deep Q learning comes in, optimizing a neural network to approximate Q*.
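The parameterized function Q(s, a, Theta) described above can be illustrated with the smallest possible example: a single linear layer whose weight matrix plays the role of Theta. Sizes are made up, and a real DQN uses convolutional and dense layers trained by gradient descent; this only shows the shape of the idea:

```python
import numpy as np

# A tiny parameterized Q function Q(s, a, theta): one linear layer mapping
# a state vector to one Q-value per action. Optimizing theta is what brings
# the function closer to Q*.
rng = np.random.default_rng(0)
state_dim, n_actions = 8, 4                     # made-up sizes
theta = rng.normal(scale=0.1, size=(state_dim, n_actions))

def q_values(state, theta):
    """Return the vector of Q-values for all actions in this state."""
    return state @ theta                        # shape: (n_actions,)

state = rng.normal(size=state_dim)
best_action = int(np.argmax(q_values(state, theta)))
```

Producing all actions' Q-values in one forward pass (rather than one call per action) is the standard trick from the DQN paper.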
Can Q-learning be used for solving classification problems? If so, how? Could you explain, or make a video on this topic? It would be very helpful.
Thanks for the great video. I'm still confused about how this algorithm can generalize to any game. Is the generalization of the algorithm different from the generalization of a specific AI program? Since the inputs and labels (or controls/buttons, whatever) are fixed in a game, I don't think you can make an AGI with just this algorithm.
I am starting a deep learning course at university this semester. And maybe I can do a homework project. There is a mobile game from my childhood: Mirror's Edge mobile which launched on iOS and Windows Phone in like 2011 but is no longer available. If I somehow find a way to emulate the game on a computer and get either frames or game state values and manage to give it one of four different inputs per frame, I might try and teach a network to play the game. I also want to have it beat levels really fast and explore speedrunning this way.
Nice video!
Are u coming to Pune or Mumbai ??
Mumbai
Siraj Raval when and where ??
Can ur fans meet you?
Hey Siraj, can you help me understand this: in SethBling's video, the bot learned to play a Mario level, but he didn't test the learning on new data or levels. Isn't this overfitting? I mean, the bot just learned that level by trial and error.
7:46 Well, I don't think the pooling layer is used to make the network insensitive to the locations of objects in an image. The convolutional layer can already do that, since the convolution operation is a pixel window moving from location to location until all locations are covered under the set stride. The pooling layer is used to semantically merge similar features into one: in the max-pooling example used in this video, you can see the image is partitioned into 4 parts, and in each part the max value is preserved. That max value can semantically represent a feature in that region. It's more like image compression where we preserve the key features of the object in the image. Feeding this pooled image into the neural net can be more efficient.
The videos of David Silver from DeepMind are worth watching; that might be the best reinforcement learning course on the web.
So I am working on an AI for a hidden-information game (for the sake of simplicity, you can think of poker). Optimal play would actually be a Nash equilibrium problem, where each action is taken some percentage of the time. Would the proper way to build an AI for this be to use a random number generator and scale the frequency of each action to its Q-value?
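Scaling each action's frequency by its Q-value, as the question suggests, is close to softmax (Boltzmann) exploration. A minimal sketch, where the Q-values and temperature are made-up numbers:

```python
import numpy as np

def softmax_policy(q_values, temperature=1.0, rng=None):
    """Sample an action with probability proportional to exp(Q / T), so
    better actions are played more often but never deterministically --
    one simple way to get a mixed strategy out of Q-values."""
    rng = rng or np.random.default_rng()
    prefs = np.exp((q_values - q_values.max()) / temperature)  # stable softmax
    probs = prefs / prefs.sum()
    return rng.choice(len(q_values), p=probs), probs

q = np.array([1.0, 0.5, -0.2])   # hypothetical Q-values for 3 actions
action, probs = softmax_policy(q, temperature=0.5, rng=np.random.default_rng(1))
```

One caveat: a softmax over Q-values gives a mixed strategy, but it does not by itself converge to a Nash equilibrium; for poker-like games, algorithms such as counterfactual regret minimization are the usual tool.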
And now OpenAI beats humans in Dota 2 1v1 matchups
stuff is moving fast
Sir I don't know who you are, but you totally blew me away with your comment. It is very rare to come across an individual who did us(the viewers on the Internet) a huge help in debunking certain methodologies in machine learning. I would love to see more of your writings. Folks at Vicarious are of a different breed I believe, maybe it is because of their influence by the Redwood Neuroscience Institute. It would certainly be a privilege if you consider my request. Truly humbled. Thanks Sir. I hop
Can't find the links to the winner and runner up. Great series of videos!
Can you give some insights for Deep Q Learning in Mobile Networking?
When are you forming a mid 90s boy band with machine learning themed ballads?
At 8:08, what's the input_shape supposed to be? The challenge code and what you show are different...
thank you for deep q video game video
first, but seriously nice vid Siraj you are amazing at what you do !
What is the difference between static and dynamic dataset? Can you elaborate more?
I cannot understand your videos. How should I start learning?
Start by learning basic calculus, statistics, and linear algebra. Once you understand the basics, learning advanced concepts is not that hard.
No, TensorFlow and most of the other libraries handle almost all of the higher-level math. All you'll need, buddy, is to learn basic object orientation and then move into ML techniques. Don't fret: most of the complex math has been solved; all you'll need to do is creatively implement it. Trust me, it gets very easy once you learn the flow. If you're interested in advanced topics where you want to build your own ML algorithm, then learning linear algebra, with an emphasis on higher-dimensional linear algebra, will help greatly.
Like others have mentioned, having a math and some machine learning background helps in understanding these faster-paced videos. Another thing you can do is look in the description, read some of the blogs on the topics under "learning resources," and then come back and watch the video again; it should make more sense.
A great place to start is Coursera's class on machine learning. It's free and a solid intro to the core concepts.
From there, there are plenty of step-by-step tutorials on YouTube. Sentdex has a great channel with lots of content; check him out.
Tip: if you're trying to start, don't start with Siraj. Start with someone slower (possibly the Udemy machine learning micro-degree), as Siraj is very fast; awesome for expanding your understanding, but hard to start learning with.
hey siraj, can you make a full tutorial on reinforcement learning? thanks siraj
Question: why do pooling layers make the network spatially invariant? Don't they just compress information? I thought convolutional layers did that, and the model already has those.
Max pooling compresses information, but it's lossy. On the first pooling operation you lose a pixel or two of position information. On a final pooling operation you might effectively be taking the max across the entire image.
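The lossiness described above is easy to see in code: after one 2×2 max pool, each value's exact position within its block is gone. The example image below is made up:

```python
import numpy as np

def max_pool_2x2(img):
    """2x2 max pooling: keep only the largest value in each 2x2 block."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

img = np.array([[1, 3, 2, 0],
                [4, 2, 1, 5],
                [0, 1, 8, 2],
                [3, 2, 1, 7]])
pooled = max_pool_2x2(img)   # 4x4 -> 2x2; positions within each block are lost
```

The 8, for example, survives, but whether it sat in the top-left or bottom-right corner of its block is no longer recoverable; stacking pools widens this positional blur until it can span the whole image.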
Hey Siraj, fantastic work. I'm a Unity developer, so how can I integrate this functionality into games I've already coded? Best wishes for future videos.
Thanks for the video =)
So with a discrete Markov process, there will always be some reward function R, because getting the reward depends only on the states and the actions we take. Thus, our AI can learn Q simply by playing?
The only thing I'm impressed by is his creativity
Just a piece of advice, and I hope you see this: never speak while showing text! (I remember Vsauce saying this in a video too.)
But really, either show text and read it, or show images / yourself while talking; displaying text while saying something different is really hard to follow.
If you want to talk about a part of the text, try to darken everything but the line you're talking about; otherwise we won't know where to stop, or whether to listen to you or read. (At least that's what most educational YouTubers I follow do, and it works quite well.)
Especially when you're talking about such complicated subjects (and at such a pace), I think that's important!
Hope it'll be useful somehow; thanks for the vid!
great point thanks
Excellent point
If advice were good, it wouldn't be free...
i just pause.
Hey Siraj, DeepMind also works on a StarCraft 2 Learning Environment. I would love to see a video about it :)
And what happens to the reward functions? Are they the same for all these games?
Probably score? Did you get an answer?
Hi Siraj, is there any way we can train a machine learning model with a raw text file and properly arranged data from the text file in a .csv file? So that when we input a new text file, it automatically converts that text file into the .csv format, with the columns and rows we used as training data. Is this even possible?
Hey Siraj! Great stuff! it could be really cool if you would combine Recurrent Neural Network and Deep Q-network = DRQN in a video! Thanks!
The memes were distracting; I was too busy laughing to learn anything.
Bill Nye of Computer Science
Kanye of Code
Beyonce of Neural Networks
Usain Bolt of Learning
Chuck Norris of Python
Jesus Christ of Machine Learning
thanks Sandzz
I copied it from your channel description...I don't deserve that "thanks"
Can you show the Mario game actually running? It throws an error in my notebook. I'm using Python 3.6, so maybe it's a translation issue?
hey siraj
I have a 4 node raspberry pi cluster computer, can I use it to train this Mario game?
This is ultimate !! A game bot !! Thanks a lot Siraj ! When are you heading to India for a meet-up ?
thanks! Sept 1 delhi one-way ticket. i'll figure things out from there
Too fast. I need a longer video :(
Set the playback speed to 0.5 :)
more to come
How would reinforcement learning work on a game with a town hub? One that requires mouse clicks to go into a dungeon, eg, Diablo, MMOs.
At 5:15 you say that the further in the future a reward is, the more uncertain we are of it? I didn't get it; can you explain with an example?
Been waiting so long for this! Haven't even watched it, but I know it's going to be great already
edit: confused, but not disappointed :D
How do you get the computer to play the game by itself and read the screen?
The link to the paper:
web.stanford.edu/class/psych209/Readings/MnihEtAlHassibis15NatureControlDeepRL.pdf
So this version is without a NN; at what point do you need a NN?
I'm mostly a hardware guy; how do I get into AI and algorithms?
A video about Q learning for video games on actual games (without opengym) would be great
will consider thanks
Siraj Raval Awesome, thank you.
Video uploaded Aug 2017 and it's only 9:46 long? Autolike from me :)
Wait, where my brain at?
Siraj, all your videos are awesome.
Could you make a video about temporal-difference learning, which was introduced by Professor Sutton?
I also ask you to make another one about General Game Players and Monte Carlo Tree Search.
Thanks
Yes a video on TD Learning would be wonderful
Yeah! Especially if its differences from and similarities with reinforcement learning were pointed out.
I think TD learning is just an extension of backpropagation. It's pretty fascinating
Wouldn't it work better if you trained a variational autoencoder on the screen data to capture the important patterns, then trained the deep-Q model on the encoded screen? That way the VAE can learn a lot of info about how the world works even when rewards are scarce. I would use a bottleneck that's about 1/4 the dimensions of the image, with say 3 layers. Leave the shrinking down from convolutional layers to dense layers for the deep-Q.
I don't know anything about this topic yet. But why don't you submit something along this line for the coding challenge for this week?
hmm good thought . an autoencoder could work well.
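To make the suggestion above concrete, here's a shape-level sketch of that pipeline. Everything here is hypothetical: the screen size, bottleneck size, and action count are made up, and the random weights are untrained stand-ins for a real trained VAE encoder and deep-Q head. It only shows the dataflow (screen → latent code → Q-values), not the training.

```python
import numpy as np

rng = np.random.default_rng(0)

SCREEN_DIM = 32 * 32            # hypothetical flattened grayscale screen
BOTTLENECK = SCREEN_DIM // 4    # ~1/4 of the input dims, as suggested above
N_ACTIONS = 6                   # hypothetical controller actions

# Untrained random weights standing in for a trained VAE encoder
# and a deep-Q head.
W_enc = rng.normal(0.0, 0.01, (SCREEN_DIM, BOTTLENECK))
W_q = rng.normal(0.0, 0.01, (BOTTLENECK, N_ACTIONS))

def encode(screen):
    """Autoencoder bottleneck stand-in: screen -> compact latent code."""
    return np.tanh(screen @ W_enc)

def q_values(latent):
    """Q-network reads only the encoded screen, never raw pixels."""
    return latent @ W_q

screen = rng.random(SCREEN_DIM)
latent = encode(screen)         # 1024 pixels -> 256-dim code
q = q_values(latent)            # one Q-value per action
action = int(np.argmax(q))     # greedy action from the compressed state
```

The point is that the Q-network's input is 4x smaller, so rewards only have to shape the small head while the encoder learns structure from the screens themselves.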
Hey siraj , can you please share the link to code by winner and runner up on lda challenge , i know i am pretty late but i would really appreciate if you could help
we all miss Chester!!😢
Awesome video :) But it reminded me of Jian-Yang's Hotdog/Not Hotdog app :D
That was a classifier, i.e. SUPERVISED learning
Is it possible to do that with games like Overwatch?
Where can I find the winner of the stock prediction challenge?
Could you guys give me any hint on how I can approach the Pong game to build a model where I can apply Q-learning? (I have all the information necessary, like ball x and y position, player x and y position, ball speed, etc.) I'm struggling with this :_:
Nice
and now OpenAI is beating the Dota 2 world champions in 5v5
Siraj Raval is the neurotransmitter of generation Z
How do I define when to give a reward to the bot?
Super Siraj AI. Who do you think is correct regarding the future of AI, Elon or Zuck?
Elon, because he's aware of the danger that AI might cause to the human race if we lose control over it.
If you know AI, then you won't think of AI as a danger.
Edgar Vega As soon as its intelligence starts increasing exponentially, we won't be able to keep up with it and understand it. Everything we don't understand is dangerous at some point (I'm referring to AGI and ASI)
elon. we do need some regulation.
Siraj you have not accepted English subs in MoI #6 :(
just did thanks
Wow Nice
Hi, can someone please explain to me how the model is predicting in this sequence of code when it hasn't been trained yet? I'd really appreciate it. Thanks!!
if np.random.rand() <= epsilon:
Hey there, so the epsilon tells us when the agent is ready to exploit Q-values instead of exploring the map.
The main idea is:
1) We specify an exploration rate "epsilon," which we set to 1 in the beginning. This is the rate of steps that we'll do randomly. In the beginning, this rate must be at its highest value, because we don't know anything about the values in the Q-table. This means we need to do a lot of exploration, by randomly choosing our actions.
2) We generate a random number. If this number > epsilon, then we will do “exploitation” (this means we use what we already know to select the best action at each step). Else, we’ll do exploration.
The idea is that we must have a big epsilon at the beginning of the training of the Q-function. Then, reduce it progressively as the agent becomes more confident at estimating Q-values.
Here is a nice graph of this idea: cdn-media-1.freecodecamp.org/images/1*9StLEbor62FUDSoRwxyJrg.png
Hope that helped! :D
@@lefos99 Hi, thanks for helping me out! I understand that (at least I think), but I don't understand how the model can predict if it hasn't been trained. At what point is the model learning the Q-values and becoming able to "exploit"? I'm from more of a C background, but I don't get how it's learning until the next block of code, where it does "Experience Replay."
@@mattgoralka3941 Oh okay, now I see your question. Well, it depends on the reinforcement learning technique you use.
For example, if you use simple Q-learning, you just create a matrix (rows for states and columns for actions). There are plenty of concepts involved that I cannot explain in just one YouTube comment.
A reallllly good and simple tutorial is this: simoninithomas.github.io/Deep_reinforcement_learning_Course/#syllabus
In this tutorial you will find not only mathematical explanation but also explanation with examples in simple games.
Check this out! ;)
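To make the matrix idea above concrete, here's a tiny tabular Q-learning sketch. The corridor environment and all constants are invented for illustration; the one real piece is the Bellman update that fills in the state-by-action matrix:

```python
import numpy as np

# Tiny 1-D corridor: states 0..4, reward only for reaching state 4.
# Rows of Q are states, columns are actions (0 = left, 1 = right).
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.5, 0.9   # learning rate and discount (made-up values)

def step(s, a):
    """Move left or right, clipped to the corridor; reward at the end."""
    s2 = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
    r = 1.0 if s2 == n_states - 1 else 0.0
    return s2, r

rng = np.random.default_rng(0)
for _ in range(500):                      # episodes
    s = 0
    while s != n_states - 1:
        a = int(rng.integers(n_actions))  # pure exploration for simplicity
        s2, r = step(s, a)
        # Q-learning update: nudge Q[s, a] toward reward + discounted
        # best value of the next state.
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2

policy = Q.argmax(axis=1)  # greedy policy: move right in every
                           # non-terminal state once training converges
```

Even with purely random behavior, the off-policy update fills the matrix so that the greedy policy walks straight to the reward.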
Your videos are becoming hard to understand.
nearing end of course, this is advanced stuff. going to get easier starting next week.
Yep! Thx.
Do I have to learn calculus to learn deep learning?
Pretty much
Sounds like q learning for investments