This video on reinforcement learning has reinformed my learning of reinforcement learning. Thank you.
I love these videos on reinforcement learning. GIVE ME MORE!!!!
If you want to make copying lines easier, I have mapped the Visual Studio keybinds in VS Code/VSCodium to cut down on this. Also, Ctrl+Alt+Up/Down enables multi-cursor editing, letting you edit multiple lines at once; Escape exits.
Heck ya, been looking forward to this! Thanks big man!
This is a great episode! How you develop your own observation concept is super interesting. I guess the reward concept in the next episode will be even better!!
I am getting an error even though I copied the code from the written tutorial. The error says: "AssertionError: The observation returned by the `reset()` method does not match the given observation space". How do I resolve this?
Try to change the dtype of each numpy array to "np.float32".
I came back and I believe I found an issue. When self.img is set to "uint8", it causes the arrays to be cast as that (I think) instead of np.float32. When I set every self.img to float32 and cast the observation to float32 using astype(), it ran without issues. I don't know if it's an OS thing or how my numpy module is configured, but it could cause issues.
Had the same issue, thanks for sharing!
Thanks!! changing the observation to float32 worked for me
Awesome content as always. You are the greatest tech teacher!
This is just awesome man! Thank u for tutorial!!
Getting the following error for check_env:
AssertionError: The observation returned by the `reset()` method does not match the given observation space
Solved using the following:
# self.observation_space = spaces.Box(low=-500, high=500,
#                                     shape=(5+SNAKE_LEN_GOAL,), dtype=np.float32)
self.observation_space = spaces.Box(low=-500, high=500,
                                    shape=(5+SNAKE_LEN_GOAL,), dtype=np.float64)
Hi! Thanks for the solution :) do you also happen to know why it works (my knowledge of coding and SB3 is the bare minimum still)?
@@Ecxify Hi sorry for the late reply :) I think the issue was with the OS I am running on, so this is probably the fix
@@udik1 Now it's my turn to apologize haha. Thanks ! :)
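For anyone landing on this thread later, here is a minimal sketch of why the dtype mismatch happens. The variable names are assumptions modeled on the tutorial's snake observation, not the actual code:

```python
import numpy as np

# Building the observation from plain Python ints gives an integer dtype,
# which will not match a Box declared with dtype=np.float32.
head_x, head_y, apple_delta_x, apple_delta_y, snake_length = 256, 256, -30, 40, 3
observation = np.array([head_x, head_y, apple_delta_x, apple_delta_y, snake_length])

# Fix: cast the observation to the dtype declared in the observation space.
observation = observation.astype(np.float32)
assert observation.dtype == np.float32
```

Either cast every returned observation with astype(np.float32), or declare the space with the dtype your arrays actually have (e.g. np.float64); the two just have to agree.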
I would like to point out that there is no sanity check of the current action against the previous action in the env code: for example, when the snake is moving right, the action "left" should not be allowed, only right, up, and down. Without this check, randomly sampled actions will make the snake suddenly flip from right to left, collide with itself, set done = True, and return reward = -10. The model may then hit the -10 exit very often, which could be detrimental to learning? I don't know, but if training runs long enough, the model should reach this conclusion by itself: never take the action opposite to the current direction of motion 😀😀😀😀
when is part 10 of the "Neural Networks from Scratch in Python" series?
We need a code bullet colab with this.
the only two good programmers on youtube lolz
Thanks for the videos! It’s super helpful to see someone run through the entire process when starting out.
What are your thoughts on imitation learning? Is there a similar library that you could demo?
Is Harrison an avid iRacer?
Cheers for the videos man, really enjoying this series. 👍
I dabble a bit :P. Used to do DEs IRL for many years and did some racing pre-pandemic, just at the club level. After that it's been all iRacing since. You can find me under my name there. Maybe also of interest: ua-cam.com/video/dTW_PQyUHXA/v-deo.html
You want to have the rendering done separately and not within the step function. When training the model, the number of epochs (steps performed) needed to achieve good accuracy is usually very high (10K+ depending on the task complexity), so the number of computations in each step must be kept to a minimum. Rendering is only really used for visualization purposes and has no impact on the agent or its environment, so rendering within the step function is a big no-no!
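As a rough sketch of that separation (the class and its internals are hypothetical placeholders, not the tutorial's actual env):

```python
import numpy as np

class SnakeEnvSketch:
    """Hypothetical env skeleton: step() does no drawing at all."""

    def __init__(self, render_mode=None):
        self.render_mode = render_mode

    def step(self, action):
        # ...update the snake/apple state and compute the reward (omitted)...
        observation = np.zeros(10, dtype=np.float32)
        reward, done, info = 0.0, False, {}
        return observation, reward, done, info  # no rendering here

    def render(self):
        # Only draw when explicitly asked to, e.g. while watching evaluation.
        if self.render_mode == "human":
            pass  # cv2.imshow(...) / pygame drawing would go here
```

During training you simply never call render(), so the hot loop stays as cheap as possible.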
This tutorial series was much needed for me, thank you so much! In your script you define self.action_space and self.observation_space, but I don't see them used anywhere else, which confuses me. Do you really have to use the gym spaces to feed the algorithm? Because I have a lot of trouble with those things...
Ah ok, I found the answer (copied from a forum):
The observation_space defines the structure of the observations your environment will be returning. Learning agents usually need to know this before they start running, in order to set up the policy function. Some general-purpose learning agents can handle a wide range of observation types: Discrete, Box, or pixels (which is usually a Box(0, 255, [height, width, 3]) for RGB pixels).
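A small illustration of the space types mentioned above. This is a sketch and assumes gymnasium (or legacy gym) is installed; the shapes are made up:

```python
import numpy as np
try:
    from gymnasium import spaces  # newer API
except ImportError:
    from gym import spaces        # legacy API

action_space = spaces.Discrete(4)  # e.g. left/right/up/down
vector_obs = spaces.Box(low=-500, high=500, shape=(5,), dtype=np.float32)
pixel_obs = spaces.Box(low=0, high=255, shape=(64, 64, 3), dtype=np.uint8)

# The agent uses these declarations to size its networks, and check_env
# verifies that what reset()/step() return actually lives in the space.
sample = vector_obs.sample()
assert vector_obs.contains(sample)
```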
Great video! I have one question I would greatly appreciate if anyone could help me with.
If my observation space is a 5x6 grid (list of lists; [[None] * 6] * 5) plus an int, how would I make the spaces.Box? And how should I best structure the grid, and how should I return the observation?
Thanks for any help!
Hi sentdex, I tried making a custom env but ran into a problem: I wasn't able to use an image as the observation. If you have any info on how this is done, that would be great. Love your videos, by the way; you single-handedly taught me ML.
Is it worth trying to use RL to see if it can make a supervised learning regression model better? or does that not make sense
Copy and paste on Win10. check_env fails with the error: The observation returned by the `reset()` method does not match the given observation space
If anyone else has this issue, I fixed it by changing dtype=np.float32 to dtype=np.float64 in self.observation_space
@@Jamspell thank you! it worked for me
@@Jamspell thank you so much!
I really wouldn't mind GitHub Copilot, although I think it might make you skip or forget to mention some things you otherwise wouldn't. But then again, I'd prefer it speed-wise. (Bias: Copilot user here 😛)
Amazing series
Try multiple selection for the tedious.self part.
Finally you are the best
That laugh could be a meme 3:08
First comment! Dude, your content is awesome just as it is. Don't use Copilot, it only follows the rhythm in your head; for me the progression of the coding is excellent!
learn some hotkeys like ctrl+d
By using the past actions, isn't that making this implementation stop being markovian? Since you are using the past...?
Hi, I get an error when executing the code: the reset function used by PPO's MlpPolicy needs a seed argument. This is the error:
TypeError: SnakeEnv.reset() got an unexpected keyword argument 'seed'
I tried giving the reset definition a default argument seed=None, like this:
def reset(self, seed=None):
    ...
but now I have this value error: ValueError: too many values to unpack (expected 2)
Can you help me? Thanks for the video!
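That second error matches the newer gym/gymnasium API, where reset() must return an (observation, info) tuple. A hedged sketch (class name and observation contents are hypothetical):

```python
import numpy as np

class SnakeEnvSketch:
    """Sketch of the newer reset() signature, not the tutorial's full env."""

    def reset(self, seed=None, options=None):
        # ...re-initialize snake, apple, score here (omitted)...
        observation = np.zeros(10, dtype=np.float32)
        info = {}  # auxiliary diagnostics dict required by the new API
        return observation, info  # two values: fixes "expected 2"

obs, info = SnakeEnvSketch().reset(seed=42)
assert obs.shape == (10,) and info == {}
```

So accept seed (and options) in reset(), and return the info dict alongside the observation.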
Can you guys show an agent that actually learned something? I mean an agent that really plays the game and doesn't go into the wall. An actual real-world case would be: I have some working strategy that I provide to the agent, and gym learns only improvements to my strategy. Is such a simple example available anywhere?
7:49 why did no one comment :o (I don't know it). Btw, using Ctrl+D (select next occurrence) and multiple cursors would make the self. parts much more convenient :D
Hi, when’s nnfs coming back?
Dunno on the series. Book is done.
This is super informative! Did you study/plan a lot prior to this video or is this you going off the cuff? Either way, impressive work.
Haha, I spotted the observation bug as soon as you copied it and thought it was going to be a nightmare to catch.
Every time I try to load a trained snake env model from the .zip, I get a NotImplementedError on the env.render() line.
"There you go you cell phone watchers" 😂
This no longer works since gym was changed in v0.26. And the !pip install gym==0.21 fallback command is broken too.
gym is replaced by gymnasium, do import gymnasium as gym
I'm having this problem... after importing gymnasium it asks for a seed parameter, and I couldn't figure out how to handle it. Any help would be very much appreciated.
Moving beyond OpenAI's pre-built envs and creating a custom environment is the most difficult part for me.
Thanks a lot
Quick guess about making the model learn better: Small reward for moving towards the apple (and possibly a small punishment for moving away).
That sounds like you're micromanaging it a bit too much instead of making it learn that it should move toward the apple. A ticking clock that punishes being alive without eating an apple would be better, though it shouldn't stack with death.
@@sevret313 I think both of our suggestions could work. Would be interesting to see which one works better.
@@Brysett Your suggestion would 100% work, and quite effectively I would assume. My problem with it is that you're telling it what to do instead of letting it learn the environment itself.
So p4 is already recorded and I'll spoil it a bit. I initially used the Euclidean distance to the apple to essentially punish for distance. This wound up having hilariously bad effects.
After an adjustment, this is what I ended up with, but there's still tons of room for improvement.
Typically in the fields of data science and ML, you want to be as hands-off as possible when engineering something to meet an objective (i.e., don't data snoop); that's how I've always been taught. You come up with a theory and you test it; you do not do things like directly reward getting closer to the apple. You want the agent to just learn to get the apple because that scores points. You wouldn't naturally reward just getting close to the apple, since that's not a reward in the game.
In this environment, I am confident that after millions of steps the agent could learn this with the settings used in this video, or something close. BUT, my findings thus far with RL, and from talking with people who use RL IRL for real things... it's all reward hacking and tweaking, and you have to throw out how much you wish it weren't like that heheh. Snake is such an easy environment to learn that it may not matter much here, but yeah, any actually hard problem is going to require tricks like rewarding for getting closer to the apple (or something like this).
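One common variant of the distance-based shaping discussed above is to reward the *change* in Euclidean distance rather than the raw distance, so the agent is nudged toward the apple without a constant penalty for simply being far away. A hypothetical sketch (function names, coordinates, and reward magnitudes are made up for illustration):

```python
import math

def distance(head, apple):
    return math.dist(head, apple)  # Euclidean distance (Python 3.8+)

def shaped_reward(prev_head, head, apple, ate_apple, died):
    if died:
        return -10.0
    if ate_apple:
        return 10.0
    # Small dense signal: positive when we moved closer, negative when away.
    return 0.1 * (distance(prev_head, apple) - distance(head, apple))

print(shaped_reward((0, 0), (0, 10), (0, 20), False, False))  # moved closer: 1.0
```

This is exactly the kind of hand-tuned trick the comment describes: it usually speeds up learning, but it also risks reward hacking if the shaping term dominates the true objective.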
I'm trying a few different things, but it seems the snake just doesn't like going left. Always right, then either up or down... but always to the right.
With the indentation can't you just tell Python to use 2 spaces?
Python doesn't care if it is 2 or 4 spaces; it just has to be consistent. But you can adjust the code editor or formatter to fix the indentation.
Can you make a video on custom policy?
could you make a simpler custom environment version, this snake game is too confusing as a start.
2:10 google is listening
Hello again 👋👋👋👋
Nice. I'm surprised you don't use Vim. I find any mouse movements to slow my fingers down.
hey-o! Thanks!
Hello, I'm very new to Stable Baseline 3 and was wondering if it would be possible to use this alongside AirSim within Unreal Engine?
I don't see why not, but I don't know either of those. I used it with Isaac Sim from NVIDIA's Omniverse and there weren't any official pairings. I am also planning to use SB3 for an IRL bot training next; it should be possible to add this to just about anything you want.
@@sentdex Okay and thanks for the speedy response
does this man know that he is being recorded?))))
20.12.22 16:00
Personally, especially for people learning the language and tools, I strongly discourage the use of Copilot. The reason is that you become dependent on Copilot for the answers and autocorrect, and many people just picking up the language can't write a single line of code without assistance. I have real-world experience within the last 3-6 months to draw on for this conclusion. Normally when I mentor someone, we start with IDLE, but that would be silly with the amount of content that you have put together. A full IDE such as VS Code or PyCharm is great for speed and line-by-line explanation, and if you were to stop every few lines and explain the use of each line, you would have taken my argument away from me. In conclusion, Copilot is a great tool to add to your toolbox / dev environment if you are iterating over things very quickly, but it is a very poor teaching/tutorial tool.
This isn't a real custom environment. Do a web-browser game environment.
Has anyone tried Stable Baselines 3 with a NES game (e.g., Mario Bros.)? Compared to the prior Stable Baselines, it appears that scenario files no longer work.
2nd yay