This video on reinforcement learning has reinformed my learning of reinforcement learning. Thank you.
I love these videos on reinforcement learning. GIVE ME MORE!!!!
If you want to make copying lines easier, I have mapped the Visual Studio keybinds in VS Code/VSCodium to cut down on this. Also, Ctrl+Alt+Up/Down enables multi-cursor editing, letting you edit multiple lines at once; Escape exits.
Heck ya, been looking forward to this! Thanks big man!
This is a great episode! How you develop your own observation concept is super interesting. I guess the reward concept in the next episode will be even better!!
I am getting an error even though I copied the code from the written tutorial. The error says: "AssertionError: The observation returned by the `reset()` method does not match the given observation space". How do I resolve this?
Try to change the dtype of each numpy array to "np.float32".
I came back and I believe I found an issue. When self.img is set to "uint8", it causes the arrays to be cast as that (I think) instead of np.float32. When I set every self.img to float32 and cast the observation to float32 using astype(), it ran without issues. I don't know if it's an OS thing or how my numpy module is configured, but it could cause issues.
Had the same issue, thanks for sharing!
Thanks!! changing the observation to float32 worked for me
Awesome content as always. You are the greatest tech teacher!
This is just awesome man! Thank u for tutorial!!
Getting the following error for check_env:
AssertionError: The observation returned by the `reset()` method does not match the given observation space
Solved using the following:
# self.observation_space = spaces.Box(low=-500, high=500,
#                                     shape=(5+SNAKE_LEN_GOAL,), dtype=np.float32)
self.observation_space = spaces.Box(low=-500, high=500,
                                    shape=(5+SNAKE_LEN_GOAL,), dtype=np.float64)
Hi! Thanks for the solution :) do you also happen to know why it works (my knowledge of coding and SB3 is the bare minimum still)?
@@Ecxify Hi sorry for the late reply :) I think the issue was with the OS I am running on, so this is probably the fix
@@udik1 Now it's my turn to apologize haha. Thanks ! :)
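For anyone landing on this thread later, here is a minimal sketch of why the dtype mismatch happens. The variable names are assumptions modeled on the tutorial's snake observation, not the actual code:

```python
import numpy as np

# Building the observation from plain Python ints gives an integer dtype,
# which will not match a Box declared with dtype=np.float32.
head_x, head_y, apple_delta_x, apple_delta_y, snake_length = 256, 256, -30, 40, 3
observation = np.array([head_x, head_y, apple_delta_x, apple_delta_y, snake_length])

# Fix: cast the observation to the dtype declared in the observation space.
observation = observation.astype(np.float32)
assert observation.dtype == np.float32
```

Either cast every returned observation with astype(np.float32), or declare the space with the dtype your arrays actually have (e.g. np.float64); the two just have to agree.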
I would like to point out that there is no sanity check of the current action against the previous action in the env code: for example, when the snake is moving right, the action "left" should not be allowed, only right, up, and down. Without this check, randomly sampled actions will make the snake suddenly flip from right to left, collide with itself, set done = True, and return reward = -10. The model may then hit the -10 exit very often, which could be detrimental to learning? I don't know, but if training runs long enough, the model should reach this conclusion by itself: never take the action opposite to the current direction of motion 😀😀😀😀
when is part 10 of the "Neural Networks from Scratch in Python" series?
We need a code bullet colab with this.
the only two good programmers on youtube lolz
Thanks for the videos! It’s super helpful to see someone run through the entire process when starting out.
What are your thoughts on imitation learning? Is there a similar library that you could demo?
Is Harrison an avid iRacer?
Cheers for the videos man, really enjoying this series. 👍
I dabble a bit :P. Used to do DEs IRL for many years and did some racing pre-pandemic, just at the club level. After that it's been all iRacing since. You can find me under my name there. Maybe also of interest: ua-cam.com/video/dTW_PQyUHXA/v-deo.html
You want to have the rendering done separately and not within the step function. When training the model, the number of epochs (steps performed) needed to achieve good accuracy is usually very high (10K+ depending on the task complexity), so the number of computations in each step must be kept to a minimum. Rendering is only really used for visualization purposes and has no impact on the agent or its environment, so rendering within the step function is a big no-no!
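As a rough sketch of that separation (the class and its internals are hypothetical placeholders, not the tutorial's actual env):

```python
import numpy as np

class SnakeEnvSketch:
    """Hypothetical env skeleton: step() does no drawing at all."""

    def __init__(self, render_mode=None):
        self.render_mode = render_mode

    def step(self, action):
        # ...update the snake/apple state and compute the reward (omitted)...
        observation = np.zeros(10, dtype=np.float32)
        reward, done, info = 0.0, False, {}
        return observation, reward, done, info  # no rendering here

    def render(self):
        # Only draw when explicitly asked to, e.g. while watching evaluation.
        if self.render_mode == "human":
            pass  # cv2.imshow(...) / pygame drawing would go here
```

During training you simply never call render(), so the hot loop stays as cheap as possible.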
This tutorial series was much needed for me, thank you so much! In your script you define self.action_space and self.observation_space, but I don't see them used anywhere else, which confuses me. Do you really have to use the gym spaces to feed the algorithm? Because I have a lot of trouble with those things...
Ah ok, I found the answer (copied from a forum):
The observation_space defines the structure of the observations your environment will be returning. Learning agents usually need to know this before they start running, in order to set up the policy function. Some general-purpose learning agents can handle a wide range of observation types: Discrete, Box, or pixels (which is usually a Box(0, 255, [height, width, 3]) for RGB pixels).
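A small illustration of the space types mentioned above. This is a sketch and assumes gymnasium (or legacy gym) is installed; the shapes are made up:

```python
import numpy as np
try:
    from gymnasium import spaces  # newer API
except ImportError:
    from gym import spaces        # legacy API

action_space = spaces.Discrete(4)  # e.g. left/right/up/down
vector_obs = spaces.Box(low=-500, high=500, shape=(5,), dtype=np.float32)
pixel_obs = spaces.Box(low=0, high=255, shape=(64, 64, 3), dtype=np.uint8)

# The agent uses these declarations to size its networks, and check_env
# verifies that what reset()/step() return actually lives in the space.
sample = vector_obs.sample()
assert vector_obs.contains(sample)
```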
Great video! I have one question I would greatly appreciate if anyone could help me with.
If my observation space is a 5x6 grid (list of lists; [[None] * 6] * 5) plus an int, how would I make the spaces.Box? And how should I best structure the grid, and how should I return the observation?
Thanks for any help!
Hi sentdex, I tried making a custom env but ran into a problem: I wasn't able to use an image as the observation. If you have any info on how this is done, that would be great. Love your videos, by the way; you single-handedly taught me ML.
Is it worth trying to use RL to see if it can make a supervised learning regression model better? or does that not make sense
Copy and paste on Win10. check_env fails with the error: The observation returned by the `reset()` method does not match the given observation space
If anyone else has this issue, I fixed it by changing dtype=np.float32 to dtype=np.float64 in self.observation_space
@@Jamspell thank you! it worked for me
@@Jamspell thank you so much!
I really wouldn't mind GitHub Copilot, although I think it might make you skip or forget to mention some things you otherwise wouldn't. But then again, I'd prefer it speed-wise. (Bias: Copilot user here 😛)
Amazing series
Try multiple selection for the tedious.self part.
Finally you are the best
That laugh could be a meme 3:08
First comment! Dude, your content is awesome just as it is. Don't use Copilot, it only follows the rhythm in your head; for me the progression of the coding is excellent!
learn some hotkeys like ctrl+d
By using the past actions, isn't that making this implementation stop being markovian? Since you are using the past...?
Hi, I get an error when executing the code: the reset function used by PPO's MlpPolicy needs a seed argument. This is the error:
TypeError: SnakeEnv.reset() got an unexpected keyword argument 'seed'
I tried giving the reset definition a default argument seed=None, like this:
def reset(self, seed=None):
    ...
but now I have this value error: ValueError: too many values to unpack (expected 2)
Can you help me? Thanks for the video!
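That second error matches the newer gym/gymnasium API, where reset() must return an (observation, info) tuple. A hedged sketch (class name and observation contents are hypothetical):

```python
import numpy as np

class SnakeEnvSketch:
    """Sketch of the newer reset() signature, not the tutorial's full env."""

    def reset(self, seed=None, options=None):
        # ...re-initialize snake, apple, score here (omitted)...
        observation = np.zeros(10, dtype=np.float32)
        info = {}  # auxiliary diagnostics dict required by the new API
        return observation, info  # two values: fixes "expected 2"

obs, info = SnakeEnvSketch().reset(seed=42)
assert obs.shape == (10,) and info == {}
```

So accept seed (and options) in reset(), and return the info dict alongside the observation.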
Can you guys show an agent that actually learned something? I mean an agent that really plays the game and doesn't go into the wall. An actual real-world case would be: I have some working strategy that I provide to the agent, and gym learns only improvements to my strategy. Is such a simple example available anywhere?
7:49 why did no one comment :o (I don't know it). Btw, using Ctrl+D (select next occurrence) and multiple cursors would make the self. parts much more convenient :D
Hi, when’s nnfs coming back?
Dunno on the series. Book is done.
This is super informative! Did you study/plan a lot prior to this video or is this you going off the cuff? Either way, impressive work.
Haha, I spotted the observation bug as soon as you copied it and thought it was going to be a nightmare to catch.
Every time I try to load a trained snake env model from the .zip, I get a NotImplementedError on the env.render() line.
"There you go you cell phone watchers" 😂
This no longer works since gym was changed in v0.26. And the !pip install gym==0.21 fallback command is broken too.
gym is replaced by gymnasium, do import gymnasium as gym
I'm having this problem... after importing gymnasium it asks for a seed parameter, and I couldn't figure out how to handle it. Any help would be very much appreciated.
Moving beyond OpenAI's pre-built envs and creating a custom environment is the most difficult part for me.
Thanks a lot
Quick guess about making the model learn better: Small reward for moving towards the apple (and possibly a small punishment for moving away).
That sounds like you're micromanaging it a bit too much instead of making it learn that it should move toward the apple. A ticking clock that punishes being alive without eating an apple would be better, though it shouldn't stack with death.
@@sevret313 I think both of our suggestions could work. Would be interesting to see which one works better.
@@Brysett Your suggestion would 100% work, and quite effectively I would assume. My problem with it is that you're telling it what to do instead of letting it learn the environment itself.
So p4 is already recorded and I'll spoil it a bit. I initially used the Euclidean distance to the apple to essentially punish for distance. This wound up having hilariously bad effects.
After an adjustment, this is what I ended up with, but there's still tons of room for improvement.
Typically in the fields of data science and ML, you want to be as hands-off as possible when engineering something to meet an objective (i.e., don't data snoop); that's how I've always been taught. You come up with a theory and you test it; you do not do things like directly reward getting closer to the apple. You want the agent to just learn to get the apple because that scores points. You wouldn't naturally reward just getting close to the apple, since that's not a reward in the game.
In this environment, I am confident that after millions of steps the agent could learn this with the settings used in this video, or something close. BUT, my findings thus far with RL, and from talking with people who use RL IRL for real things... it's all reward hacking and tweaking, and you have to throw out how much you wish it weren't like that heheh. Snake is such an easy environment to learn that it may not matter much here, but yeah, any actually hard problem is going to require tricks like rewarding for getting closer to the apple (or something like this).
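One common variant of the distance-based shaping discussed above is to reward the *change* in Euclidean distance rather than the raw distance, so the agent is nudged toward the apple without a constant penalty for simply being far away. A hypothetical sketch (function names, coordinates, and reward magnitudes are made up for illustration):

```python
import math

def distance(head, apple):
    return math.dist(head, apple)  # Euclidean distance (Python 3.8+)

def shaped_reward(prev_head, head, apple, ate_apple, died):
    if died:
        return -10.0
    if ate_apple:
        return 10.0
    # Small dense signal: positive when we moved closer, negative when away.
    return 0.1 * (distance(prev_head, apple) - distance(head, apple))

print(shaped_reward((0, 0), (0, 10), (0, 20), False, False))  # moved closer: 1.0
```

This is exactly the kind of hand-tuned trick the comment describes: it usually speeds up learning, but it also risks reward hacking if the shaping term dominates the true objective.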
I'm trying a few different things, but it seems the snake just doesn't like going left. Always right, then either up or down... but always to the right.
With the indentation can't you just tell Python to use 2 spaces?
Python doesn't care if it is 2 or 4 spaces; it just has to be consistent. But you can adjust the code editor or formatter to fix the indentation.
Can you make a video on custom policy?
could you make a simpler custom environment version, this snake game is too confusing as a start.
2:10 google is listening
Hello again 👋👋👋👋
Nice. I'm surprised you don't use Vim. I find any mouse movements to slow my fingers down.
hey-o! Thanks!
Hello, I'm very new to Stable Baseline 3 and was wondering if it would be possible to use this alongside AirSim within Unreal Engine?
I don't see why not, but I don't know either of those. I used it with Isaac Sim from NVIDIA's Omniverse and there weren't any official pairings. I am also planning to use SB3 for an IRL bot training next; it should be possible to add this to just about anything you want.
@@sentdex Okay and thanks for the speedy response
does this man know that he is being recorded?))))
20.12.22 16:00
Personally, especially for people learning the language and tools, I strongly discourage the use of Copilot. The reason is that you become dependent on Copilot for the answers and autocorrect, and many people just picking up the language can't write a single line of code without assistance. I have real-world experience within the last 3-6 months to draw on for this conclusion. Normally when I mentor someone, we start with IDLE, but that would be silly with the amount of content that you have put together. A full IDE such as VS Code or PyCharm is great for speed and line-by-line explanation, and if you were to stop every few lines and explain the use of each line, you would have taken my argument away from me. In conclusion, Copilot is a great tool to add to your toolbox / dev environment if you are iterating over things very quickly, but it is a very poor teaching/tutorial tool.
This isn't a real custom environment. Do a web-browser game environment.
Has anyone tried Stable Baselines 3 with a NES game (e.g., Mario Bros.)? Compared to the prior Stable Baselines, it appears that scenario files no longer work.
2nd yay