Robert Cowher - DevOps, Python, AI
United States
Joined May 25, 2011
I'm a DevOps engineer who enjoys playing with Python & AI.
Zombies & RL - Part 8 - Converting to OpenAI Gym
In this series on Zombies and Reinforcement Learning, we're going to build a custom Zombie game with Pygame, convert that game into an OpenAI gym environment, and use double Q learning with PyTorch to train an autonomous agent to play it. In part 8, we'll be starting the conversion to an OpenAI gym interface.
Starter project can be downloaded here -
github.com/bobcowher/youtube-zombie-shooter-starter
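As a rough preview of what that conversion looks like, here is a minimal sketch assuming the gymnasium package (the maintained successor to OpenAI gym) and a hypothetical game object; the class, method names, and spaces are illustrative and not the starter repo's actual code:

```python
import gymnasium as gym
import numpy as np
from gymnasium import spaces

class ZombieShooterEnv(gym.Env):
    """Hypothetical wrapper exposing the Pygame zombie game as a Gym env."""

    def __init__(self, game):
        super().__init__()
        self.game = game  # assumed handle to the Pygame game object
        # Example spaces: a handful of discrete actions, an RGB screen observation.
        self.action_space = spaces.Discrete(5)
        self.observation_space = spaces.Box(
            low=0, high=255, shape=(84, 84, 3), dtype=np.uint8)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        obs = self.game.reset()  # assumed: restarts the game and returns a frame
        return obs, {}

    def step(self, action):
        obs, reward, done = self.game.step(action)  # assumed game-side API
        return obs, reward, done, False, {}  # terminated=done, truncated=False
```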
Views: 14
Videos
Zombies & RL - Part 7 - Adding Sounds
14 views · 2 hours ago
In this series on Zombies and Reinforcement Learning, we're going to build a custom Zombie game with Pygame, convert that game into an OpenAI gym environment, and use double Q learning with PyTorch to train an autonomous agent to play it. In part 7, we'll be adding sounds to our game. Starter project can be downloaded here - github.com/bobcowher/youtube-zombie-shooter-starter
Zombies & RL - Part 6 - Loot Drops
22 views · 2 hours ago
In this series on Zombies and Reinforcement Learning, we're going to build a custom Zombie game with Pygame, convert that game into an OpenAI gym environment, and use double Q learning with PyTorch to train an autonomous agent to play it. In part 6, we'll be building out the health and loot drop functionality. Starter project can be downloaded here - github.com/bobcowher/youtube-zombie-shooter-...
Zombies & RL - Part 5 - Hunting Zombies
8 views · 2 hours ago
In this series on Zombies and Reinforcement Learning, we're going to build a custom Zombie game with Pygame, convert that game into an OpenAI gym environment, and use double Q learning with PyTorch to train an autonomous agent to play it. In this video, we're going to build out the logic to move zombies towards the player and allow the player to shoot the zombies. Starter project can be downloa...
Zombies & RL - Part 4 - Bullets!
13 views · 4 hours ago
In this series on Zombies and Reinforcement Learning, we're going to build a custom Zombie game with Pygame, convert that game into an OpenAI gym environment, and use double Q learning with PyTorch to train an autonomous agent to play it. In this video, we're going to code out the Bullet classes and the basic functionality for the shotgun. Starter project can be downloaded here - github.com/bob...
Zombies & RL - Part 3 - Player Controls
8 views · 4 hours ago
In this series on Zombies and Reinforcement Learning, we're going to build a custom Zombie game with Pygame, convert that game into an OpenAI gym environment, and use double Q learning with PyTorch to train an autonomous agent to play it. This video will cover the main loop and basic player controls. Starter project can be downloaded here - github.com/bobcowher/youtube-zombie-shooter-starter
Zombies & RL - Part 2 - Player and Zombies
19 views · 4 hours ago
In this series on Zombies and Reinforcement Learning, we're going to build a custom Zombie game with Pygame, convert that game into an OpenAI gym environment, and use double Q learning with PyTorch to train an autonomous agent to play it. This video will cover the creation of the zombie and player classes. Starter project can be downloaded here - github.com/bobcowher/youtube-zombie-shooter-starter
Zombies & RL - Custom Gym Environments with Pygame - Part 1 - Intro & Main Loop
19 views · 4 hours ago
In this series on Zombies and Reinforcement Learning, we're going to build a custom Zombie game with Pygame, convert that game into an OpenAI gym environment, and use double Q learning with PyTorch to train an autonomous agent to play it. The first video is an intro and we'll start the main loop. Starter project can be downloaded here - github.com/bobcowher/youtube-zombie-shooter-starter
Autonomous Code Generation with ChatGPT and Python
128 views · 14 hours ago
In this video we're going to use the OpenAI API to automatically write code that passes a set of tests. This was a proof of concept to see how a code-writing agent might work. github.com/bobcowher/youtube-openai-dev-agent-1-starter
Training AI models remotely with RunPod + Tensorboard
77 views · 1 day ago
In this video, we're going to walk through building a demo Python script, deploying that script to RunPod on a GPU-enabled instance, and getting logs back with TensorBoard.
Cracking /etc/shadow passwords with C++ (yescrypt)
487 views · 2 months ago
Learn to crack modern yescrypt passwords in Linux's /etc/shadow file with C++. This is a great exercise for Ethical Hacking students. github.com/bobcowher/youtube-password-cracker-shadow-starter
Building a Web App with RAG & ChatGPT - Part 4 - Querying the Model
164 views · 2 months ago
This is part 4 of a series on building a web app with RAG(Retrieval Augmented Generation) and ChatGPT. In this video, we'll build the text search and ask methods. Starter code: github.com/bobcowher/youtube-rag-web-starter/tree/main Completed code: github.com/bobcowher/youtube-rag-web Original series by Daniel Bourke: ua-cam.com/video/qN_2fnOPY-M/v-deo.html Human nutrition text: pressbooks.oer.h...
Building a Web App with RAG & ChatGPT - Part 3 - Building the File Processor
201 views · 2 months ago
This is part 3 of a series on building a web app with RAG(Retrieval Augmented Generation) and ChatGPT. In this video, we'll start building the RAG model & file processor. Starter code: github.com/bobcowher/youtube-rag-web-starter/tree/main Human nutrition text: pressbooks.oer.hawaii.edu/humannutrition2/open/download?type=pdf Precalculus textbook: www.opentextbookstore.com/precalc/
Building a Web App with RAG & ChatGPT - Part 2 - Building the Web Interface
228 views · 3 months ago
This is part 2 of a series on building a web app with RAG(Retrieval Augmented Generation) and ChatGPT. In this video, we'll start by building the framework for the web interface. Starter code: github.com/bobcowher/youtube-rag-web-starter/tree/main Human nutrition text: pressbooks.oer.hawaii.edu/humannutrition2/open/download?type=pdf Precalculus textbook: www.opentextbookstore.com/precalc/
Building a Web App with RAG & ChatGPT - Part 1 - Intro
473 views · 3 months ago
We're going to build a RAG(Retrieval Augmented Generation) application with ChatGPT. This is based on the work done by Daniel Bourke under - ua-cam.com/video/qN_2fnOPY-M/v-deo.html For the starter files for this video, visit - github.com/bobcowher/youtube-rag-web-starter/tree/main
Templating CloudFormation with Python (plus C++ comparison)
104 views · 3 months ago
Templating CloudFormation with Python (plus C++ comparison)
Robotic Arm Manipulation with Human Experiences & HRL - Part 9 - Training the Meta Agent
124 views · 3 months ago
Robotic Arm Manipulation with Human Experiences & HRL - Part 9 - Training the Meta Agent
Robotic Arm Manipulation with Human Experiences & HRL - Part 10 - The End
109 views · 3 months ago
Robotic Arm Manipulation with Human Experiences & HRL - Part 10 - The End
Robotic Arm Manipulation with Human Experiences & HRL - Part 8 - Building the Meta Agent
83 views · 3 months ago
Robotic Arm Manipulation with Human Experiences & HRL - Part 8 - Building the Meta Agent
Robotic Arm Manipulation with Human Experiences & HRL - Part 7 - Training the Agent
114 views · 3 months ago
Robotic Arm Manipulation with Human Experiences & HRL - Part 7 - Training the Agent
Robotic Arm Manipulation with Human Experiences & HRL - Part 6 - Building the Agent
125 views · 3 months ago
Robotic Arm Manipulation with Human Experiences & HRL - Part 6 - Building the Agent
Robotic Arm Manipulation with Human Experiences & HRL - Part 5 - Building the Model
106 views · 4 months ago
Robotic Arm Manipulation with Human Experiences & HRL - Part 5 - Building the Model
Robotic Arm Manipulation with Human Experiences & HRL - Part 4 - Collecting Human Experiences
225 views · 4 months ago
Robotic Arm Manipulation with Human Experiences & HRL - Part 4 - Collecting Human Experiences
Robotic Arm Manipulation with Human Experiences & HRL - Part 2 - Setting up the Environment
183 views · 4 months ago
Robotic Arm Manipulation with Human Experiences & HRL - Part 2 - Setting up the Environment
Robotic Arm Manipulation with Human Experiences & HRL - Part 3 - Building the Replay Buffer
130 views · 4 months ago
Robotic Arm Manipulation with Human Experiences & HRL - Part 3 - Building the Replay Buffer
Robotic Arm Manipulation with Human Experiences & HRL - Part 1 - Intro (Advanced)
497 views · 4 months ago
Robotic Arm Manipulation with Human Experiences & HRL - Part 1 - Intro (Advanced)
Solving Mazes with Reinforcement Learning - Part 11 - Results & Next Steps
207 views · 4 months ago
Solving Mazes with Reinforcement Learning - Part 11 - Results & Next Steps
Solving Mazes with Reinforcement Learning - Part 10 - Building the Intrinsic Curiosity Module
111 views · 4 months ago
Solving Mazes with Reinforcement Learning - Part 10 - Building the Intrinsic Curiosity Module
Solving Mazes with Reinforcement Learning - Part 9 - Building the Test Method
84 views · 4 months ago
Solving Mazes with Reinforcement Learning - Part 9 - Building the Test Method
Solving Mazes with Reinforcement Learning - Part 8 - Building the Train Method
67 views · 5 months ago
Solving Mazes with Reinforcement Learning - Part 8 - Building the Train Method
For questions about the videos, or just to come talk about robots and AI, join me on Discord - discord.gg/dnhsk3pD2V. If you like what I'm doing here, don't forget to subscribe. Enjoy the series.
Great! Thanks a lot 💖
fantastic video
Is this a continuing series on training AI models?
Glad you're enjoying the videos. This one isn't part of a series, but if you look through my channel you'll find AI training videos centered around robots, maze solving, and even Atari games like Breakout. Have fun!
@robertcowher Sincerely amazing. Are these tutorials? Can we get a code-along? I want to make an AI for a project that uses fundamental analysis.
The YouTube algorithm led me to your video. As a learning dev who uses ChatGPT to co-code, I find this concept incredible. Like in all jobs, people feel they are irreplaceable, but I always say it just takes time or money. This will be the future, and for the good devs it will be so rewarding.
Great video! appreciate all the work you put in.
Glad you enjoyed the series :)
The training stopped after ~1500 epochs due to the RAM limit (16 GB). Is there any way to prevent that?
shiet I put memory_capacity = 1000000 :))
Yes. The easy answer is to reduce the buffer size. It’s what uses most of the memory. Keep in mind the buffer is there for stability so reducing it too much can have a negative impact on training. Play with that value and see where it gets you.
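A rough back-of-the-envelope calculation makes the RAM issue concrete, assuming Atari-style 84x84x4 uint8 frame stacks stored for both state and next state (the actual buffer layout may differ):

```python
bytes_per_obs = 84 * 84 * 4                 # ~28 KB per stacked uint8 observation
bytes_per_transition = 2 * bytes_per_obs    # state + next_state dominate the cost
capacity = 1_000_000                        # the memory_capacity mentioned above

total_gb = capacity * bytes_per_transition / 1024**3
print(f"~{total_gb:.0f} GB")                # ~53 GB -- far beyond 16 GB of RAM
```

Dropping capacity to around 100,000 brings that estimate under 6 GB, which is why shrinking the buffer is the quickest fix.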
I am following this series, but I don't know why the Arcade Learning Environment usually goes blank and shows nothing. After restarting the computer, I got the game once, and then it showed nothing on later runs. Do you have any ideas?
Are you in the training or testing phase? If you're in training, it should stay blank. If you're in testing, check the render mode.
@@robertcowher I was in the testing phase. I don't know if this is because of the code or the hardware on my laptop... On the first run after restarting the laptop, ALE can show Breakout, but the second time it shows nothing.
@@robertcowher Ah, I fixed it. It was because of env.ale.*; I have to use env.unwrapped.ale.* instead.
@@NhanLe-lq7xs Nice!
Amazing!!
Hi, I want to ask how I can make a real robotic arm and teach it this instead of a simulation. I mean, I need to work with a real-world robotic arm. Can you guide me, please?
So what you're describing is a career journey, not a single project, and I'm not there yet but I can describe the path. If you wanted to do something like this in the real world, your path would be 1) Learn ROS, to control real world robots. 2) Learn how to physically build a robot arm. 3) Learn how to simulate your robot arm in ROS and Gazebo. 4) Learn how to port your simulated arm into a Gym-like environment for reinforcement learning. 5) Update this approach to work with your new environment and train the robot in simulation. 6) Take that trained model and move to your physical robot to complete training. If you're ready to start working with real robots, Antonio Brandi's Udemy courses are a great place to start - www.udemy.com/course/robotics-and-ros-2-learn-by-doing-manipulators
@@robertcowher Thanks a lot sir. It means a lot 🙏🙏🙏🙏😊😊
Hello, is there a corresponding paper for this case? Or can you make a video to explain its environment, such as state space, action space, and reward function?
Great video. Could you please make a video on manipulation using VLM
I haven't played with VLM's before but it sounds like an interesting concept. Is there a specific environment or challenge you'd like to see solved?
Hello, thank you for the lesson. I'm getting an error that says "undefined reference to 'crypt' collect2: error: ld returned 1 exit status".
In your g++ command, make sure "-lcrypt" is set. Missing that could cause the error you're describing. I'm guessing you're either using VSCode without my tasks.json, or using another IDE.
The multi-objective single policy is really interesting, and I'm looking forward to seeing the progress, no matter how good/bad it is. Can you at least say in advance how you assign the rewards? Are there different rewards for different tasks? How do you set the task execution order? Many thanks for all those wonderful videos.
So for now, I'm not even worrying about task execution order, just whether or not it can take an environment with multiple goals and accomplish them at all, and letting the environment hand out one reward per successfully completed task. I ended up taking a long break from the problem, but I'm going to pick it up in the next week or so and take another swing at it. Sometimes it helps to just..not work on something for a little while and look at it with fresh eyes.
Hey Robert. In the playlist this video (part 2) comes after part 3. Probably you should swap them. Thanks again for your videos
Good catch! Fixed.
It's like you read my mind, hhhhh. I have been trying to make this, and boom, you post it.
Glad to help ;) I had a lot of fun with this one.
I hope you can create a tutorial video on creating a simulation environment and writing code for env and tasks.
I'm not building one specific to robots, but your comment sparked an idea and I've been working on a custom Pygame environment. At some point in the next month I'm hoping to put out a full series on building the game from scratch, converting it into a gym environment, and then training an agent on it.
After following all the courses with you, I have gained a lot of help. I have a few questions: First, I have successfully trained all tasks using your dataset except for the hinge_cabinet task, which is not working, and I don't know why. Second, why did you use PyCharm before, but in this series use VSCode? What is the difference between the two? Third, could you later release a tutorial on how to import and define my own robot and environment for training?
Great questions. 1) For hinge cabinet, what behavior are you seeing? Have you tried deleting the weights file for that network and retraining? 2) You can use whatever IDE you're most comfortable with for most of my videos. I switched over to VSCode full time because I needed something that supported Python, Jupyter notebooks, C++, and ROS and integrated well with ChatGPT. JetBrains (the company that makes PyCharm) has solutions for all of those problems, but it pushes you into their paid tier, and they bundle some of those capabilities into different products (PyCharm vs. CLion). 3) That's something I'm currently trying to figure out. When I do, I'll definitely make a video :)
I'm very excited to receive your reply, thank you very much, and I look forward to your new work @@robertcowher
Such a great series. I have been looking for this content for a long time. Thank you so much for your efforts.
Glad you're enjoying the videos. I've also had a hard time finding good robotics-specific RL content. I may not be the best person to fill that gap, but I'm definitely having the most fun with it :)
Thank you for your description. After setting the `last_action` for the first goal, you already obtain the next state, which could be used as input for the policy of the second task. Instead of reusing the action from task 1, wouldn’t it be better to sample a fresh action based on the task-specific policy for task 2, given the last state from task 1? This new action should be more appropriate for task 2.
Good observation, and the answer is simple enough. The environment returns different observation shapes for some of the goals. You can absolutely take the last state, instead of last action, and generate a new action from the new policy, but it will be the wrong shape for some of the models and cause the agent to crash. I've been playing with solving that problem other ways(for example, padding the observations so all goals/models have the same shape), but that didn't make this implementation.
@@robertcowher Thank you, it is now clear to me. You talked about your omni agent in the last video; for this agent, you might have a large observation space consisting of the observation spaces of all objects in the environment.
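For readers curious about the padding idea mentioned above, a minimal sketch (the array sizes here are made up; the real goal observations differ):

```python
import numpy as np

def pad_observation(obs, target_len):
    """Zero-pad a 1-D observation so every goal shares one input shape."""
    padded = np.zeros(target_len, dtype=np.float32)
    padded[:len(obs)] = obs
    return padded

# Hypothetical example: goals with 46- and 59-feature observations both become length 64.
obs_a = pad_observation(np.random.randn(46), 64)
obs_b = pad_observation(np.random.randn(59), 64)
```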
With this approach, do you collect manipulation data first and then train the model on this collected dataset? Also, another question: is it only manipulation data, or is there some visual input too?
Great question. I'm not collecting visual data for this one, but the observations include a numerical representation of the environment. What you'll find a bit later in the course is that I'm using a hybrid approach, starting live SAC training with a buffer of experiences, and weighting the buffer towards picking up more of those live experiences for the first few hundred episodes.
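One way that hybrid sampling could look, as a hedged sketch (the buffer objects, the 75/50 split, and the 300-episode warmup are assumptions, not the course's exact values):

```python
import random

def sample_batch(demo_buffer, live_buffer, batch_size, episode, warmup_episodes=300):
    """Mix human-demo and live transitions, biasing toward live data early on."""
    live_fraction = 0.75 if episode < warmup_episodes else 0.5  # assumed schedule
    n_live = min(int(batch_size * live_fraction), len(live_buffer))
    n_demo = batch_size - n_live
    batch = random.sample(live_buffer, n_live) + random.sample(demo_buffer, n_demo)
    random.shuffle(batch)
    return batch
```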
Make a video on Decision Mamba - reinforcement learning that uses the Mamba network.
I haven't tried MAMBA, but I'll check it out, and make a video if it looks interesting. Thanks for the tip!
YESSSS!!!! Now it works!
Hey, thanks for the problem report. I wouldn't have noticed for weeks. :)
Totally underrated video series. I cannot stop watching it. Anyway, if I understood correctly, we need a _super_ policy which selects one of the sub-policies for accomplishing a complex task (e.g. open microwave, turn on stove, close microwave, etc.), right?
Correct. Though in this case, the super policy is going to be static (i.e. looping through a list of goals and selecting the appropriate sub-policy). We'll get there in the next few videos :)
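A hedged sketch of what such a static super policy might look like; the goal names, sub-policy API, and success signal are placeholders, not the series' actual code:

```python
goal_sequence = ["open_microwave", "turn_on_stove", "close_microwave"]  # example goals

def run_episode(env, sub_policies, max_steps_per_goal=500):
    """Loop through a fixed goal list, handing control to the matching sub-policy."""
    obs, _ = env.reset()
    for goal in goal_sequence:
        policy = sub_policies[goal]                 # pre-trained policy for this goal
        for _ in range(max_steps_per_goal):
            action = policy.select_action(obs)      # assumed sub-policy interface
            obs, reward, terminated, truncated, info = env.step(action)
            if info.get("goal_achieved"):           # assumed success signal
                break
```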
I strongly encourage everyone to generate their own data, but I've also posted my dataset to HuggingFace here - huggingface.co/datasets/robertcowher/farama-kitchen-sac-hrl-youtube/tree/main
Top videos! Did I understand correctly? You took 30,000 interactions using the joystick?
Great question, and yes, but there's some nuance. Every timestep is considered a memory so continuous joystick actions tend to generate a lot of them very quickly. My average was right around 200 timesteps per task once I got good at it, so filling a 30K memory buffer takes about 150 successful completions of that task. I experimented with human memory buffers from 20K to 60K for various tasks and found 30K to be a good minimum buffer size to succeed at all tasks. To that end, the can_sample method we've coded here looks for batch_size(64) * 500, or 32,000. You could tweak that multiplier and experiment with less.
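The sizing arithmetic from that reply, written out (the can_sample signature below is a guess at the code's shape, not a copy of it):

```python
batch_size = 64
timesteps_per_success = 200        # rough average once piloting the arm is smooth
successes_per_task = 150

buffer_needed = timesteps_per_success * successes_per_task  # 30,000 memories
can_sample_threshold = batch_size * 500                     # 32,000

def can_sample(buffer_len, batch_size=64, multiplier=500):
    """Only start sampling once the buffer holds enough human experience."""
    return buffer_len >= batch_size * multiplier
```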
I was going to do this at the end of the series, but I went ahead and pinned a comment with my data set. I still recommend doing some of your own data generation to get the full process, but it's there to save you some time.
For questions about the videos, or just to come talk about robots and AI, join me on Discord - discord.gg/dnhsk3pD2V
Interestingly amazing tutorials. Looking forward to more of this. Thanks
Glad you enjoyed the series :) Any specific environments you'd like to see solved? I'm already working on object sorting with a virtual Emika Panda robot arm, but that's been a challenge so far.
Why are we only using the forward model instead of both the forward and inverse models of the ICM?
I wish you could get a bunch of views
Hey, me too :) Honestly though, YouTube helps me feel like there's a clear end goal when building these projects, and that's a big part of why I'm doing it. If it's useful to a few people, I've hit my goals.
keep posting
If you enjoyed these, you might like the new series I'm working on. It's a similar robotic arm in a much more complex environment with a human experience component. ua-cam.com/video/ma6fbvy77Uo/v-deo.html
Hi I just followed your tutorial and it worked! Thank you for making it clear and easy to follow. If I wanted to change the rewards policy in the future (such as rewarding proximity to the ball or longevity) would I be able to? Or is the reward policy pretty much set in place?
Glad to hear it, and great question. What you're describing is an observation wrapper. They're part of the gym library and are here for exactly these kinds of changes. An observation wrapper lets you "wrap" the environment and then add your custom logic on top of it. My recommendation would be to get a wrapper working for something really simple (say, doubling the reward when the agent hits a ball) to make sure you've got the wiring right, and then get into the logic of what you really want to do. They sound complicated, but a basic observation wrapper can be 6 lines of code. Here's the doc to follow -> gymnasium.farama.org/api/wrappers/observation_wrappers/
Also worth noting - An observation wrapper is what I should have used instead of processing the observation in the training loop.
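A minimal sketch of the "double the reward" sanity check described above, assuming the gymnasium API (gymnasium also ships dedicated RewardWrapper and ObservationWrapper base classes for exactly these cases):

```python
import gymnasium as gym

class DoubleRewardWrapper(gym.Wrapper):
    """Sanity-check wrapper: pass everything through, but double the reward."""

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        return obs, reward * 2.0, terminated, truncated, info

# Hypothetical usage: env = DoubleRewardWrapper(gym.make("ALE/Breakout-v5"))
```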
Please provide a GitHub repo for this code.
I get this error for the forward method of the ActorNetwork class, any idea how I can solve it? `RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x128 and 46x256)`
That error usually means that the output of one layer doesn't match the input of the next layer. Check for typos in the layer input/output sizes, and if nothing jumps out at you, start printing out the variables and their sizes as you go through the forward method in the network. One of them is likely going to be a different size than expected and that's your culprit.
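An illustrative version of that print-the-shapes debugging pattern; the layer names and sizes below are made up, not the series' ActorNetwork:

```python
import torch
import torch.nn as nn

class TinyActor(nn.Module):
    """Toy stand-in for an actor network, with shape checks printed as it runs."""
    def __init__(self, obs_dim=46, hidden=256, act_dim=9):
        super().__init__()
        self.fc1 = nn.Linear(obs_dim, hidden)
        self.fc2 = nn.Linear(hidden, hidden)
        self.mu = nn.Linear(hidden, act_dim)

    def forward(self, state):
        print("input:", state.shape)      # e.g. torch.Size([1, 46])
        x = torch.relu(self.fc1(state))
        print("after fc1:", x.shape)      # must match fc2.in_features
        x = torch.relu(self.fc2(x))
        return self.mu(x)

# TinyActor()(torch.zeros(1, 46))  # a mismatched layer size surfaces immediately
```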
I'm just learning Python and was looking for something like this. Thank you for the video.
Absolutely. Glad you enjoyed it.
Hi, I keep getting the below error after a few iterations. I have combed through the code; it's the same as yours, and ChatGPT was not able to resolve the error. Can you please help? line 97, in train: state_b, action_b, reward_b, done_b, next_state_b = self.memory.sample( File "C:\Users\asitr\Downloads\agent.py", line 37, in sample return [torch.cat(items).to(self.device) for items in batch] File "C:\Users\asitr\Downloads\agent.py", line 37, in <listcomp> return [torch.cat(items).to(self.device) for items in batch] TypeError: expected Tensor as element 29 in argument 0, but got tuple
Would you mind posting your code to GitHub so I can do a quick comparison?
@@robertcowher I have posted my GitHub link 3 times now and it keeps getting deleted. Can I give it to you in any other way?
I am getting the below error when I try to run atari_breakout.py. Have you encountered this? x = torch.Tensor(x) ValueError: expected sequence of length 210 at dim 1 (got 3)
That usually means you've either typo'd a dimension or a batch size, or missed one of the torch.unsqueeze lines. General advice - 1) Go through it line by line and make sure you didn't miss anything. 2) Print out what's being passed through the model. Look at the actual output and see if it makes the error make more sense. 3) Plug the whole block of code + the error into ChatGPT. It's really good at spotting these kinds of issues.
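A small illustration of point 2 and the unsqueeze advice; the frame size here is assumed, not the project's exact preprocessing:

```python
import numpy as np
import torch

frame = np.zeros((84, 84), dtype=np.float32)  # stand-in for a preprocessed Atari frame
x = torch.tensor(frame)
x = x.unsqueeze(0)   # add a channel dimension -> (1, 84, 84)
x = x.unsqueeze(0)   # add a batch dimension   -> (1, 1, 84, 84)
print(x.shape)       # verify the shape before handing it to the network
```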
Content on this topic seems to be quite sparse on internet. Your videos have helped me a lot. Thanks for sharing the knowledge !!!
Glad you're enjoying them, and I've noticed the same trying to learn this stuff myself. There's plenty of robotics content(ROS2, etc) and plenty of information on RL, but it's hard to find information on the intersection of those two topics. I'm working on getting SAC working with more complex tasks(like object sorting), as well as using human experience(piloting the robot by hand) to bootstrap more quickly. I'll post that project if I can get it running.
I've had a few people ask about a place to talk about these projects, compare notes, and get help, so I'm starting a Discord server. Please consider this a general space to talk about your "autonomous agent" projects. I'm happy to help you get the things I've built working, and I'd love to see what you're building. Join at discord.gg/dnhsk3pD2V.
Hi, thanks for the great videos. Despite following the videos exactly (I also added "observation = next_observation"), when I run main.py, the score does not increase and converges flat to a value of around 0.4-0.5. I've given it enough training time, but the model isn't improving. I can't find what's wrong; can you help or advise me?
Out of curiosity, how much training time did you give it, and did you put your code in a GitHub repo somewhere? I don't mind taking a quick look at it. If you'd rather share that information outside of the YouTube comment section, I just set up a Discord server for discussion on these projects, and AI/autonomous agents more generally. You're welcome to join there for assistance as well. discord.gg/dnhsk3pD2V
Robert, can you please tell us the versions of the Python, TensorFlow, Keras, and keras-rl modules you use? It would be even cooler if you posted how you install these libraries (a simple pip install keras-rl, or an install through git clone). Great videos and explanations, but hard to reproduce on current versions of the libraries.
It would be even cooler if I read the video description before asking questions. Python: 3.9.16, gym: 0.21.0, tensorflow: 2.10.0, ale-py: 0.7.5, keras-rl2: 1.0.5
@@Ankara_pharao No problem at all (and sorry for the late reply). In newer videos, I'm including versions, and my requirements.txt, in the video, because you're not the only one to follow up and ask. There's also a Discord now for Q&A - discord.gg/dnhsk3pD2V - and if you haven't watched anything of mine in a while, I just finished posting a series on maze-solving with SAC, and another on robotic arms with TD3.
Excellent tutorial. I've been following along and have got the model to produce logs, as well as save and load checkpoints. However, when looking at my TensorBoard, rather than a real-time graph, it consists of a long series of scores with a single data point in each. The only other notable difference I see is that at the top of your graph there is a string of hyperparameters (after score 0), which I don't have on mine. Any ideas what may be happening on my side to treat the data points individually rather than as a graph?
So, the string of hyperparameters is mostly for my own convenience. I tend to run lots of experiments, and they all take 10+ hours, so I forget what params I ran them with if I don't put together some kind of output. I'm able to reproduce something similar to what you're describing by removing the "global_step=i" variable from writer.add_scalar. That's the part that actually tells it which "step" it's on, and lets it show data over time. That line should end up looking something like this - writer.add_scalar(f"Score - {episode_identifier} alpha={alpha} - beta={beta} - batch_size={batch_size} - Critic AdamW - l1={layer1_size} l2={layer2_size} noise={starting_noise}", score, global_step=i). Something else you'll notice, I always go directly to the "Scalars" tab, and set smoothing to 0.95 or so. The "Time Series" tab has a bunch of other options I don't try to mess with.
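For reference, a minimal, self-contained example of the global_step behavior being described (the log directory, tag, and scores are placeholders):

```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="runs/demo")     # placeholder log directory
for i in range(100):
    score = float(i)                            # stand-in for the real episode score
    # Without global_step, every point lands on step 0; with it, you get a curve.
    writer.add_scalar("Score - demo-run", score, global_step=i)
writer.close()
```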
Looks like this will not work in Python 3.12. The package gymnasium (previously gym) is not available for Python 3.12.
Apologies for missing this back when you posted it but yes, most of my code is running on 3.10 or 3.11, and I haven't tested for compatibility any later. I should probably start including compatible python versions for new projects.
Hi, can you please share the code?
Hi, thanks for the videos. Can you share the code?
My GitHub is a bit disorganized, but yes. Please note that this was the initial implementation I based the videos on, so it should be very close but may not quite be line for line. github.com/bobcowher/duelling-dqn-breakout-pytorch