This content is sponsored by my Udemy courses. Level up your skills by learning to turn papers into code. See the links in the description.
Thank you very much for your content! I can't seem to find the "from paper to code" course in the description or directly on your Udemy profile. Is it not out yet?
Deep Q Learning:
www.udemy.com/course/deep-q-learning-from-paper-to-code/?couponCode=DQN-AUG-2021
Actor Critic Methods:
www.udemy.com/course/actor-critic-methods-from-paper-to-code-with-pytorch/?couponCode=AC-AUG-2021
Natural Language Processing from First Principles:
www.udemy.com/course/natural-language-processing-from-first-principles/?couponCode=NLP1-AUG-2021
This guy is so good he doesn't even need autocomplete.
Thanks A LOT for making this tutorial!
Coming from a non-CS background, coding is always a bottleneck for me, but this video helped me get past that phase with ease.
Thank you for your clean tutorials, hope you'll make a new one on non-stationary environments soon.
Hey Phil, first of all thanks for the tutorial.
I have two questions regarding some differences between this code and your code on github.
1) Why does your DQN implementation on GitHub convert the action_batch into a tensor after sampling, while this implementation doesn't? Is it because Pong requires a multi-dimensional input, which you eventually flatten in the network code, whereas here the input is a vector so we don't need to do that?
2) On GitHub you use the max operation in this step: q_next = self.q_next.forward(states_).max(dim=1)[0], but here you use T.max(q_next, dim=1)[0]. Why is that?
Also, I have trouble getting the network to converge consistently; it reaches an average score of about 100 to 150 around the 200-game mark and then drops off.
Hi! Thanks for this tutorial Phil, it's nice to see such a compact DQN implementation. I have a question: when you convert arrays to tensors, is there any reason you wrap the arrays in lists? For example, "state = T.tensor([observation])..." instead of "state = T.tensor(observation)..."? I ask because a warning gets printed saying that doing it "the list way" is extremely slow.
It's not always necessary, but when you're working with CNNs they need a batch dimension.
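For reference, a minimal sketch of what that wrapper does and how to avoid the slow-conversion warning (assuming the observation is a NumPy float32 vector and torch is imported as T, as in the video):

import numpy as np
import torch as T

observation = np.zeros(8, dtype=np.float32)             # stand-in for an env observation

state_slow = T.tensor([observation])                    # list wrapper adds a batch dim: shape [1, 8]
state_a = T.tensor(np.array([observation]))             # same shape, converts via one ndarray (no warning)
state_b = T.tensor(observation).unsqueeze(0)            # same shape, adds the batch dim explicitly

print(state_slow.shape, state_a.shape, state_b.shape)   # all torch.Size([1, 8])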
Thanks a lot Dr. Phil, please make some videos about multi agent reinforcement learning ❤️❤️🌹🌹
Great suggestion but will take some time.
@@MachineLearningwithPhil Thank you so much for your attention 🙏🙏❤️🌹
Exceptionally clear presentation!! Pure genius! Will definitely take the course
Remember watching this 4 years ago and not understanding anything. We've come a long way.
This is my master's degree savior.
Thanks for making this tutorial. I just have a tiny question: why do we set q_next[terminal_batch] = 0.0? The question may be a bit stupid. Sorry for being a newbie :)
The terminal state has no future value, because no future rewards follow it.
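To make that concrete, a small runnable sketch with made-up numbers (gamma, the rewards, and the terminal mask are arbitrary here):

import torch as T

gamma = 0.99
reward_batch = T.tensor([1.0, -1.0, 0.5, 2.0])
q_next = T.rand(4, 3)                                    # Q(s', a') estimates for a batch of 4
terminal_batch = T.tensor([False, True, False, True])

q_next[terminal_batch] = 0.0                             # a terminal state has no future value
q_target = reward_batch + gamma * T.max(q_next, dim=1)[0]
print(q_target)                                          # terminal rows equal the reward alone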
The video is very good, I hope there will be a version with Chinese subtitles
Hey Phil! I'm looking forward to seeing a video where you show us how to define our own environment! All the tutorials around use gym, but I'd like to try reinforcement learning on some personal projects!
Start here
ua-cam.com/video/vmrqpHldAQ0/v-deo.html
Hi Phil! Does this code run on the current versions of torch and gym? Thank you for your work on this video!
Thank you, Prof Phil. Very helpful! Can you expand on what target networks do? I was reading the paper "Human-level control through deep reinforcement learning", which talks about a target network. It's not clear to me what it is or what the advantages of creating it are. Thank you in advance.
Target networks help to stabilize training. Using the same network to generate data and evaluate data at each time step results in chasing a moving target. The target network changes more slowly, so it provides a more stable target.
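This tutorial deliberately leaves the target network out, but for anyone curious, here is a minimal sketch of the idea (the network, the sync interval, and the variable names are all placeholders, not the course code):

import copy
import torch.nn as nn

q_eval = nn.Linear(8, 4)                    # stand-in for the online Q-network
q_target = copy.deepcopy(q_eval)            # slow-moving copy used for the bootstrapped targets

replace_every = 1000                        # hypothetical sync interval, in learning steps
for step in range(5000):
    # ... compute the loss with q_target providing the next-state values ...
    if step % replace_every == 0:
        q_target.load_state_dict(q_eval.state_dict())   # hard update: copy the online weights

Soft (Polyak) updates are the other common option: blend a small fraction of the online weights into the target every step instead of copying them all at once.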
Hi, thank you. I just have a request: could you do this in Colab? The model loading and saving part in Colab is a bit messy, and you could guide us through it.
Great tutorial, thanks. As a small criticism, please move slightly away from the microphone when coughing.
Sorry about that.
Hi! Why do you use Q_eval.forward(state) instead of Q_eval(state)? I read that it's not good because the hooks aren't invoked, although I have no clue what hooks are.
Thanks for the tutorial!
Hello Phil, I couldn't find the repo; please point me to where I can find it.
Amazing, and perfect timing too. I was looking at your older code for my project and you just gave us the better version. My only issue is that my environment returns a matrix (an image). How do I modify your code to get it to work?
Make the output the shape of an image
Sounds as though you need PyTorch convolutional layers at the front end of the Q network if you have image- or video-based inputs. I suspect you may also need to stack a few observations together if you expect to detect motion from video.
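Not part of this video's code, but roughly what that front end looks like, assuming 84x84 grayscale frames stacked 4 deep in the style of the DQN paper (the layer sizes follow the paper; the class name is made up):

import torch as T
import torch.nn as nn

class ConvQNetwork(nn.Module):
    def __init__(self, n_actions, stack=4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(stack, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
        )
        self.fc = nn.Sequential(
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),      # 7x7 feature map assumes 84x84 inputs
            nn.Linear(512, n_actions),
        )

    def forward(self, state):
        x = self.conv(state)
        return self.fc(x.flatten(start_dim=1))

net = ConvQNetwork(n_actions=6)
q_values = net(T.rand(1, 4, 84, 84))                    # one stacked observation -> [1, 6]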
Thank you for this amazing content, Phil!
Is there a possibility of a tutorial on multi-agent DQN?
I know there is a tutorial on A3C, but in some cases DQN is more suitable for grid-world environments than A3C.
Thanks for the great suggestion Rabee. I'll add it to the list!
Very well. Thank you.
Thanks Phil for an amazing tutorial!
Very nice tutorial. Why do you make a separate memory array for every element (state, new state, reward, etc.)? Couldn't you just make one overall memory array and store named tuples in the form (state, action, reward, new_state, done)?
Yup, that's another way to do it. I use the named arrays because it's easier (for me) to keep track of where everything is stored.
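For comparison, a minimal sketch of the named-tuple approach from the question (the class and field names are illustrative, not from the video):

import random
from collections import deque, namedtuple

Transition = namedtuple('Transition', ('state', 'action', 'reward', 'new_state', 'done'))

class TupleReplayBuffer:
    def __init__(self, max_size=100000):
        self.memory = deque(maxlen=max_size)            # old transitions fall off automatically

    def store(self, *args):
        self.memory.append(Transition(*args))

    def sample(self, batch_size):
        batch = random.sample(self.memory, batch_size)
        return Transition(*zip(*batch))                 # regroup into per-field tuples

The trade-off is mostly ergonomic: per-field NumPy arrays make it trivial to hand whole batches to torch, while the tuple version keeps each transition together.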
Thank you for this tutorial!
Hey Phil, how can I solve local-minimum problems in PPO?
I'm trying to solve Lunar Lander with a PPO agent (with and without an entropy bonus), but my agent gets stuck in a local minimum.
I really appreciate your videos and I'm using them to improve my skills!!
Thanks!!
Yeah, it needs some hyperparameter tuning. Shoot me an email and I'll help you sort it out.
@@MachineLearningwithPhil Ok, thank you Phil!
Please don't tell people "you don't need any exposure to deep learning, etc." This is why people jump from project to project without understanding: they get excited.
In fairness, you don't need exposure to deep learning to follow this tutorial. However, I can agree it may have been a little misleading, as people may have assumed this was a top-down, easy-to-digest intro video where it would all be explained in simple terms.
Thanks, excellent tutorial!
Is this working for anyone else??? My result is different from yours.
I got an average of around -300 to -500.
Does it run well for other people??
Hey, how would I go about saving/loading this model? I adapted your network for a different game.
life-saving video
Hi Phil, any reason you are using the forward() method on your neural net instead of calling it directly as Q_eval(), i.e. using __call__()? I believe calling forward() directly is generally unsafe, since there's potentially some necessary magic involving hooks going on under the surface that you might miss.
I was unaware, thanks for the heads up. I'll just use the call from now on.
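A tiny sketch of the difference, for anyone else wondering (a plain nn.Linear standing in for the Q-network):

import torch as T
import torch.nn as nn

net = nn.Linear(8, 4)
state = T.rand(1, 8)

out_call = net(state)                 # preferred: __call__ runs any registered hooks, then forward()
out_fwd = net.forward(state)          # works, but silently skips those hooks

print(T.allclose(out_call, out_fwd))  # True here, since no hooks are registered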
How can we save the deep Q model after the full run of training episodes? Thank you.
Thank you, Dr. Phil, for an amazing video. When I try to run this on Colab, I get this error: "expected scalar type Float but found Double" at either the 18th or 23rd line of main**.py. I am trying it on the CartPole environment and I have also tried to change the observation (line 16) to float32, but it didn't work.
By the way, to answer this: that error usually means the observation reached the network as float64 (Double); casting the state inside forward sidesteps it:
def forward(self, state):
    state = state.to(torch.float32)   # cast incoming float64 observations to match the float32 weights
    x = F.relu(self.fc1(state))
    x = F.relu(self.fc2(x))
    actions = self.fc3(x)
    return actions
Great video! It helped me a lot with my bachelor thesis. I'm working on a private project now where the agent needs to predict an x_action between -1.0 and 1.0 and a y_action between -1.0 and 1.0. How can I manage the action indices in the learn() method if I have multiple floats that describe one action? Or do I need a completely different model for that? Thanks in advance :)
If it's a continuous action try something like DDPG or TD3
Amazing! I love your video.
Hey Phil! You are a class apart from others in explaining all these topics. I have a request for you: since reinforcement learning takes a lot of time when implemented on real-world problems, wouldn't it be good to move your videos toward newer techniques like imitation learning, GANs, etc.?
Thanks a lot Phil, I am a big fan of yours, by the way.
Can you make some videos about PPO and imitation learning?
This video is very helpful. Did something change with the store_transition function? I am getting an array mismatch saying the requested array would exceed the maximum number of dimensions of 1.
If you're using the latest version of gym, the API has changed. Reset returns observation and info and the step function returns observation, reward, done, truncated, info
Ah, can you just take the first element of the reset output now with the new API? Also, I just bought your course; this tutorial was very helpful.
Yup, you can discard the debug info.
Hey, how did you resolve the error?
@@saifal-wahaibi6448 you can just take the first element of that output. I can’t remember if I did it by indexing or doing .item
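For anyone landing here, a minimal sketch of the training loop under the newer API (assuming gym >= 0.26 with Box2D installed; older versions return the old shapes):

import gym

env = gym.make('LunarLander-v2')
observation, info = env.reset()                          # reset now returns (observation, info)

done = False
while not done:
    action = env.action_space.sample()                   # stand-in for agent.choose_action(observation)
    observation_, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated                       # combine the two flags into the old 'done'
    observation = observation_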
A question: is it really necessary to make terminal_batch a tensor? Since we only use it to zero the q-values for terminal states in q_next, couldn't we also use an np.array? Is that correct?
Hi Phil. Just found this channel, nice :) I may be wrong, but I think there may be a problem in the learn process: mem_counter is never reset, so once it hits the batch size it will learn every time the learn function is called.
Nope, functioning as intended.
That is intended. As he explains in the course, this is because at first there is no information in the state memories, since they have just been initialized. So the agent needs to run through X transitions (where X is your batch size) at a minimum before it can start to properly learn. After that, it's never supposed to stop learning :)
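A stripped-down sketch of that guard (the attribute names mirror the video's style, but the class here is just a stand-in):

import numpy as np

class ReplayGuardDemo:
    def __init__(self, mem_size=1000, batch_size=64):
        self.mem_size, self.batch_size, self.mem_cntr = mem_size, batch_size, 0

    def learn(self):
        if self.mem_cntr < self.batch_size:
            return None                                  # skip until one full batch is stored
        max_mem = min(self.mem_cntr, self.mem_size)
        return np.random.choice(max_mem, self.batch_size, replace=False)

agent = ReplayGuardDemo()
print(agent.learn())                                     # None: memory still zero-initialized
agent.mem_cntr = 500
print(agent.learn()[:5])                                 # indices drawn only from filled slots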
Can you explain why we don't need to call the forward function of DeepQNetwork explicitly?
E.g. we define def forward(...), but never write forward() ourselves.
PyTorch takes care of calling the function. If you name it anything other than forward, it won't work. Check the PyTorch docs if you want to learn more.
Genius. Appreciate you 💜
I really appreciate this simple agent walkthrough. I find it easy to digest compared to other courses I've seen, and it doesn't try to explain the math behind it TOO much, which is pretty nice for novices.
My concern, though, is that because our agent is learning at every step of every episode, it is also decaying epsilon at every step. This leads to a much more rapid and unpredictable descent of epsilon (since each episode has a varying number of steps) over the lifetime of the agent versus other agents I have seen (full decay by episode 15-25).
Is this intentional? If so, could you elaborate on why we would want epsilon to be fully decayed within 5% of the agent's training time?
Good question. It turns out the epsilon decay schedule isn't super critical to learning, at least in my experience. You can get away with a rapid decay as long as epsilon is left sufficiently large to allow exploration. If it were going all the way to zero (which you should never do unless you want to evaluate performance), then such an aggressive schedule would be a problem for sure.
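For concreteness, this is roughly what the per-step linear schedule works out to with the video's default values (1.0 start, 0.01 floor, 5e-4 decrement):

eps, eps_min, eps_dec = 1.0, 0.01, 5e-4

steps_to_min = int((eps - eps_min) / eps_dec)            # 1980 learning steps
for _ in range(steps_to_min):
    eps = max(eps - eps_dec, eps_min)                    # linear decay, clipped at the floor

print(round(eps, 4))                                     # 0.01 -- only a handful of Lunar Lander episodes in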
Love your videos. I have a question, though: if I want to use the same code on games with pixels as the observation space, how do I do that? I am getting multiple errors while trying to implement breakout-V2.
See my video "ai learns to play pong"
I followed along and built the TensorFlow 2 version of this yesterday and it ran great. I haven't been able to get the PyTorch version to ever get above 0. I've scoured the code looking for bugs and I've tried every combination of hyperparameters. Has something changed in PyTorch that needs to be reflected in this code? My version is 1.13.1.
Not that I'm aware of. Shoot me an email with a link to your GitHub. phil@neuralnet.ai
Did you manage to get it working?
Hi, Dr. Phil. Great work on the deep Q network implementation and demo. I have been following your tutorials for a while. I am currently building a DQN for a "multi-agent" collection, meaning there is more than one agent in the system but we treat them all as a single collection. Correspondingly, the state (agent1, agent2, ...) and action (action1, action2, ...) are collections describing the whole system. The trick is that we don't know the number of agents for sure, which makes it hard to define n_actions (if 1 agent has 8 actions, 2 would have 64). Does the DQN framework still apply here? If it does, could you give me some suggestions on how to modify this framework? Thanks in advance!!!
Thanks for the help.
I have a question. Say one trains a model and saves its model state for later use. How would one go about loading the model state and testing the agent? I've tried coding something (following what I found on the internet: in a nutshell, loading the model state, switching to eval mode, then selecting actions greedily under torch.no_grad()). The agent does pretty well at the end of training (learning was expected), but when I try testing (for instance, to show others its performance), it performs horribly. Can anybody help me?
Are you sure you're loading the best parameters?
Hi Phil, is it correct that epsilon already reaches the eps_min of 0.01 after only 11 episodes? Does that mean we have almost no exploration anymore after 11 episodes?
Mostly correct. Only 1% of actions will be exploratory but that's sufficient for learning.
Hi Phil,
I just found your channel and I really like your content.
Do you think reinforcement learning has a future compared to text mining and image recognition?
I think we'll see more applications of RL to those other fields. None of them will get us close to AGI.
Nice video. Well explained.
Glad it helped
ValueError: maximum supported dimension for an ndarray is 32, found 10000 ... after writing out all the code from this video. What might be the issue here?
What version of numpy?
Why did you set the epsilon to 1?
Great tutorial!
Can you make a video that builds a DQN from scratch using Numpy?
the line "self.state_memory[index] = state" in the store_transition() function is giving "ValueError: setting an array element with a sequence. The requested array would exceed the maximum number of dimension of 1."
my code didn't work and then i copy-and-pasted your code, it's still getting the same error. why is this?
Because the latest version of gym changed the interface. Reset now returns observation and info, and step returns observation, reward, done, truncated, info.
@@MachineLearningwithPhil Ah, okay. I changed the env line to "observation, _ = env.reset()" and everything works now. Thank you!
Does anyone have issues trying to load checkpoints after training? When I load the checkpoints, my graph doesn't plot properly; it keeps a score of -21 at all episodes.
Nice!
Great content
is the udemy course done with pytorch or tensorflow?
Pytorch
How did you know to use [8] as input dims?
It's the default for lunar lander.
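Rather than hard-coding it, you can also read it off the environment (assuming gym with Box2D installed):

import gym

env = gym.make('LunarLander-v2')
print(env.observation_space.shape)    # (8,)  position, velocity, angle, angular velocity, leg contacts
print(env.action_space.n)             # 4     noop, fire left engine, fire main engine, fire right engine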
I tried implementing this exactly, but it just gets worse and worse. It's hovering at around a -500 average score; it seems to just press as many buttons as possible and stay up in the air as soon as epsilon reaches its minimum. Any thoughts?
Are you decaying epsilon and over time?
Not entirely sure what you were trying to say, but yes, epsilon is decreasing over time.
Could you tell me what version each library is supposed to be at, so that I can better recreate your setup?
Hi Mr Phil, I have some issues with your code from the previous video with tf2. I used it for CartPole-v0 and FrozenLake-v0 from gym. For CartPole it did very well, but for FrozenLake it was very, very weak. I don't know why.
BTW, in your code, in the body of the build_dqn function, you didn't use input_dims; why?
Regarding the input dims, they're inferred by Keras.
Can you define poor performance for FrozenLake? In my course we get a 70% win rate using regular Q learning.
@@MachineLearningwithPhil In which of your courses? I got a 70% win rate without a neural network. I expected much more with your tf2 code from the previous video but got under 10%. It was great for CartPole.
Why do we use neural networks? What are their use cases and limitations?
@@MachineLearningwithPhil I don't know exactly; I'm a beginner in reinforcement learning. I expected it could help the agent learn better. Deep neural networks need a lot of data; I know that is one of their limitations.
Neural nets are designed to work for large / continuous state spaces. They don't handle the small discrete ones very well. Tabular Q learning is far better suited for an environment like the frozen lake.
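For contrast, tabular Q-learning for FrozenLake fits in a few lines; a rough sketch against the older gym API the commenter is using (the hyperparameters are guesses, not the course's values):

import gym
import numpy as np

env = gym.make('FrozenLake-v0')
Q = np.zeros((env.observation_space.n, env.action_space.n))
alpha, gamma, eps = 0.1, 0.99, 1.0

for episode in range(5000):
    state, done = env.reset(), False
    while not done:
        if np.random.random() < eps:
            action = env.action_space.sample()           # explore
        else:
            action = np.argmax(Q[state])                 # exploit
        state_, reward, done, info = env.step(action)
        Q[state, action] += alpha * (reward + gamma * np.max(Q[state_]) - Q[state, action])
        state = state_
    eps = max(eps - 1e-3, 0.01)                          # decay exploration per episode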
Wondering how you got PyTorch to recognize np.bool for self.terminal_memory; it raised an error for me. I had to change the dtype to np.uint8.
Older versions of PyTorch used np.uint8. The newer (1.4) version requires np.bool and throws an error with np.uint8
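If you're juggling versions, np.bool_ (with the underscore) is the safer spelling, since the bare np.bool alias was deprecated and later dropped in some NumPy releases. A quick sketch:

import numpy as np
import torch as T

mem_size = 100
terminal_memory = np.zeros(mem_size, dtype=np.bool_)     # portable across NumPy versions

terminal_batch = T.tensor(terminal_memory[:32])          # comes out as a torch.bool tensor
print(terminal_batch.dtype)                              # torch.bool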
I thought Q and Q* used different networks, but that doesn't seem to be the case here. Am I wrong?
I omit the use of the target network in this tutorial. Hence the "simple" part of the title. It's intended to be the simplest implementation that actually works in a non trivial sense.
@@MachineLearningwithPhil You did a nice job! I am wondering if you have a similar video using two different networks for Q and Q*. Do you have such a thing?
ua-cam.com/video/a5XbO5Qgy5w/v-deo.html
@@MachineLearningwithPhil I am not familiar at all with Keras or Tensorflow. Do you have the equivalent with Pytorch?
If you check out my github (linked in description), the repo for my course is there. You can see the PyTorch equivalent.
what does "fc1" stands for ?
First fully connected layer
@@MachineLearningwithPhil thanks
eps_dec = 5e-4, and each time learning happens it subtracts eps_dec from the current epsilon, so starting from 1 it should print epsilon values 1, 0.9995, 0.9990, 0.9985, etc. That's not what happens when I run main_py for lunar_lander. Why is that? It shrinks as follows: 0.99, 0.95, 0.89, 0.84, etc., which looks like a decrease of about 0.05 each time.
The decrement happens each time step; the print is at the end of every episode.
@@MachineLearningwithPhil Oh hahah my bad, thanks Phil!
Next time use font size 22 at least.
best
At 33:54 "is our children learn..., is our agent learning" funny
nice
Dude, start using an IDE from this millennium, omg.
Finally, here I see the plotLearning function for the first time... I don't know how many videos I watched without knowing what that function was or why I couldn't use it. Now I finally know: you made it yourself. Next time, please always put a reference link under your videos for any functions you use that aren't part of the standard packages; otherwise it makes no sense to follow the tutorial. I'm saying this for next time, because I'm a beginner and I can't tell whether a function is part of a package or not unless you explain it.
Good tutorial. But man, if you could just open your mouth when you speak! I had to enable subtitles just to understand what you're saying, and half the time the subtitles were wrong because they couldn't understand you either!