Robert Cowher - DevOps, Python, AI
Cracking /etc/shadow passwords with C++ (yescrypt)
Learn to crack modern yescrypt passwords in Linux's /etc/shadow file with C++. This is a great exercise for Ethical Hacking students.
github.com/bobcowher/youtube-password-cracker-shadow-starter
188 views

Videos

Building a Web App with RAG & ChatGPT - Part 4 - Querying the Model
49 views · 16 hours ago
This is part 4 of a series on building a web app with RAG(Retrieval Augmented Generation) and ChatGPT. In this video, we'll build the text search and ask methods. Starter code: github.com/bobcowher/youtube-rag-web-starter/tree/main Completed code: github.com/bobcowher/youtube-rag-web Original series by Daniel Bourke: ua-cam.com/video/qN_2fnOPY-M/v-deo.html Human nutrition text: pressbooks.oer.h...
Building a Web App with RAG & ChatGPT - Part 3 - Building the File Processor
68 views · 21 hours ago
This is part 3 of a series on building a web app with RAG(Retrieval Augmented Generation) and ChatGPT. In this video, we'll start building the RAG model & file processor. Starter code: github.com/bobcowher/youtube-rag-web-starter/tree/main Human nutrition text: pressbooks.oer.hawaii.edu/humannutrition2/open/download?type=pdf Precalculus textbook: www.opentextbookstore.com/precalc/
Building a Web App with RAG & ChatGPT - Part 2 - Building the Web Interface
108 views · 14 days ago
This is part 2 of a series on building a web app with RAG(Retrieval Augmented Generation) and ChatGPT. In this video, we'll start by building the framework for the web interface. Starter code: github.com/bobcowher/youtube-rag-web-starter/tree/main Human nutrition text: pressbooks.oer.hawaii.edu/humannutrition2/open/download?type=pdf Precalculus textbook: www.opentextbookstore.com/precalc/
Building a Web App with RAG & ChatGPT - Part 1 - Intro
103 views · 14 days ago
We're going to build a RAG(Retrieval Augmented Generation) application with ChatGPT. This is based on the work done by Daniel Bourke under - ua-cam.com/video/qN_2fnOPY-M/v-deo.html For the starter files for this video, visit - github.com/bobcowher/youtube-rag-web-starter/tree/main
Templating CloudFormation with Python (plus C++ comparison)
78 views · 14 days ago
Templating CloudFormation with Python & C++.
Robotic Arm Manipulation with Human Experiences & HRL - Part 9 - Training the Meta Agent
84 views · a month ago
This is part 9 of a video series on solving long-horizon tasks in the Franka Kitchen gym environment with Pytorch, SAC, Human Experience Replay, and a Hierarchical Model Structure(similar to HRL). In this video, we'll be training the meta agent. For the data set used on this project, visit huggingface.co/datasets/robertcowher/farama-kitchen-sac-hrl-youtube/tree/main For the completed code base ...
Robotic Arm Manipulation with Human Experiences & HRL - Part 10 - The End
67 views · a month ago
This is the end of a video series on solving long-horizon tasks in the Franka Kitchen gym environment with Pytorch, SAC, Human Experience Replay, and a Hierarchical Model Structure(similar to HRL). For the data set used on this project, visit huggingface.co/datasets/robertcowher/farama-kitchen-sac-hrl-youtube/tree/main For the completed code base used in this project, visit github.com/bobcowher...
Robotic Arm Manipulation with Human Experiences & HRL - Part 8 - Building the Meta Agent
44 views · a month ago
This is part 8 of a video series on solving long-horizon tasks in the Franka Kitchen gym environment with Pytorch, SAC, Human Experience Replay, and a Hierarchical Model Structure(similar to HRL). In this video, we'll be building the first part of the meta agent. For the data set used on this project, visit huggingface.co/datasets/robertcowher/farama-kitchen-sac-hrl-youtube/tree/main To install Py...
Robotic Arm Manipulation with Human Experiences & HRL - Part 7 - Training the Agent
56 views · a month ago
This is part 7 of a video series on solving long-horizon tasks in the Franka Kitchen gym environment with Pytorch, SAC, Human Experience Replay, and a Hierarchical Model Structure(similar to HRL). In this video, we'll be training the individual agents. For the data set used on this project, visit huggingface.co/datasets/robertcowher/farama-kitchen-sac-hrl-youtube/tree/main To install Pytorch wi...
Robotic Arm Manipulation with Human Experiences & HRL - Part 6 - Building the Agent
78 views · a month ago
This is part 6 of a video series on solving long-horizon tasks in the Franka Kitchen gym environment with Pytorch, SAC, Human Experience Replay, and a Hierarchical Model Structure(similar to HRL). In this video, we'll be building the first half of the Agent class. To install Pytorch with CUDA on your platform - pytorch.org/get-started/locally/ For the base SAC implementation I used - github.com...
Robotic Arm Manipulation with Human Experiences & HRL - Part 5 - Building the Model
62 views · a month ago
This is part 5 of a video series on solving long-horizon tasks in the Franka Kitchen gym environment with Pytorch, SAC, Human Experience Replay, and a Hierarchical Model Structure(similar to HRL). In this video, we'll be building the model class. To install Pytorch with CUDA on your platform - pytorch.org/get-started/locally/ For the base SAC implementation I used - github.com:pranz24/pytorch-s...
Robotic Arm Manipulation with Human Experiences & HRL - Part 4 - Collecting Human Experiences
153 views · a month ago
This is part 4 of a video series on solving long-horizon tasks in the Franka Kitchen gym environment with Pytorch, SAC, Human Experience Replay, and a Hierarchical Model Structure(similar to HRL). In this video, we'll be building the controller class and collecting some human experience data. To install Pytorch with CUDA on your platform - pytorch.org/get-started/locally/ For the game controlle...
Robotic Arm Manipulation with Human Experiences & HRL - Part 2 - Setting up the Environment
95 views · a month ago
This is part 2 of a video series on solving long-horizon tasks in the Franka Kitchen gym environment with Pytorch, SAC, Human Experience Replay, and a Hierarchical Model Structure(similar to HRL). In this video, we'll be installing the prerequisites and setting up the environment wrapper. To install Pytorch with CUDA on your platform - pytorch.org/get-started/locally/ For the game controller I'...
Robotic Arm Manipulation with Human Experiences & HRL - Part 3 - Building the Replay Buffer
75 views · a month ago
This is part 3 of a video series on solving long-horizon tasks in the Franka Kitchen gym environment with Pytorch, SAC, Human Experience Replay, and a Hierarchical Model Structure(similar to HRL). In this video, we'll be building the replay buffer class. To install Pytorch with CUDA on your platform - pytorch.org/get-started/locally/ For the game controller I'm using - Logitech - F310 Gaming Pa...
Robotic Arm Manipulation with Human Experiences & HRL - Part 1 - Intro (Advanced)
231 views · a month ago
Solving Mazes with Reinforcement Learning - Part 11 - Results & Next Steps
132 views · 2 months ago
Solving Mazes with Reinforcement Learning - Part 10 - Building the Intrinsic Curiosity Module
71 views · 2 months ago
Solving Mazes with Reinforcement Learning - Part 9 - Building the Test Method
60 views · 2 months ago
Solving Mazes with Reinforcement Learning - Part 8 - Building the Train Method
43 views · 2 months ago
Solving Mazes with Reinforcement Learning - Part 7 - The Parameter Update Method
64 views · 2 months ago
Solving Mazes with Reinforcement Learning - Part 6 - Starting the Agent Class
53 views · 2 months ago
Solving Mazes with Reinforcement Learning - Part 5 - Building the Buffer Class
47 views · 2 months ago
Solving Mazes with Reinforcement Learning - Part 4 - Building the Actor Model
76 views · 2 months ago
Solving Mazes with Reinforcement Learning - Part 3 - Building the Critic Model
67 views · 2 months ago
Solving Mazes with Reinforcement Learning - Part 2 - Building the Maze
69 views · 2 months ago
Solving Mazes with Reinforcement Learning - Part 1 - A Quick Intro
189 views · 2 months ago
How I use VSCode with Python - Keyboard Shortcuts, Conda, ChatGPT, and Tensorboard
131 views · 4 months ago
Robotic Arm Manipulation with Reinforcement Learning - Part 5 - Building the main loop
472 views · 7 months ago
Robotic Arm Manipulation with Reinforcement Learning - Part 6 - Testing & Experimentation
575 views · 7 months ago

COMMENTS

  • @bleah4321
    @bleah4321 1 day ago

    Hello. Thank you for the lesson. I'm getting an error that says "undefined reference to 'crypt'; collect2: error: ld returned 1 exit status".

  • @texwiller7577
    @texwiller7577 5 days ago

    The multi-objective single policy approach is really interesting and I'm looking forward to seeing the progress, no matter how good/bad it is. Can you at least give a preview of how you assign the rewards? Are there different rewards for different tasks? How do you set the task execution order? Many thanks for all those wonderful videos.

  • @texwiller7577
    @texwiller7577 7 days ago

    Hey Robert. In the playlist, this video (part 2) comes after part 3. You should probably swap them. Thanks again for your videos.

  • @Patrick-wn6uj
    @Patrick-wn6uj 10 days ago

    It's like you read my mind, haha. I have been trying to make this, and boom, you post it.

    • @robertcowher
      @robertcowher 7 days ago

      Glad to help ;) I had a lot of fun with this one.

  • @WeiLi-s8x
    @WeiLi-s8x 12 days ago

    I hope you can create a tutorial video on creating a simulation environment and writing code for env and tasks.

  • @WeiLi-s8x
    @WeiLi-s8x 12 days ago

    After following all the courses with you, I have gained a lot of help. I have a few questions. First, I have successfully trained all tasks using your dataset except for the hinge_cabinet task, which is not working, and I don't know why. Second, why did you use PyCharm before, but VSCode in this series? What is the difference between the two? Third, could you later release a tutorial on how to import and define my own robot and environment for training?

    • @robertcowher
      @robertcowher 12 days ago

      Great questions. 1) For hinge cabinet, what behavior are you seeing? Have you tried deleting the weights file for that network and retraining? 2) You can use whatever IDE you're most comfortable with for most of my videos. I switched over to VSCode full time because I needed something that supported Python, Jupyter notebooks, C++, and ROS and integrated well with ChatGPT. Jetbrains(the company that makes Pycharm) has solutions for all of those problems but it pushes you into their paid tier and they bundle some of those capabilities into different products(PyCharm v.s. CLion). 3) That's something I'm currently trying to figure out. When I do, I'll definitely make a video :)

    • @WeiLi-s8x
      @WeiLi-s8x 11 days ago

      I'm very excited to receive your reply, thank you very much, and I look forward to your new work @robertcowher

  • @tonyho6882
    @tonyho6882 21 days ago

    Such a great series. I have been looking for this content for a long time. Thank you so much for your efforts.

    • @robertcowher
      @robertcowher 21 days ago

      Glad you're enjoying the videos. I've also had a hard time finding good robotics-specific RL content. I may not be the best person to fill that gap, but I'm definitely having the most fun with it :)

  • @karamdaaboul6528
    @karamdaaboul6528 22 days ago

    Thank you for your description. After setting the `last_action` for the first goal, you already obtain the next state, which could be used as input for the policy of the second task. Instead of reusing the action from task 1, wouldn’t it be better to sample a fresh action based on the task-specific policy for task 2, given the last state from task 1? This new action should be more appropriate for task 2.

    • @robertcowher
      @robertcowher 22 days ago

      Good observation, and the answer is simple enough. The environment returns different observation shapes for some of the goals. You can absolutely take the last state, instead of the last action, and generate a new action from the new policy, but it will be the wrong shape for some of the models and cause the agent to crash. I've been playing with solving that problem in other ways (for example, padding the observations so all goals/models have the same shape), but that didn't make this implementation.
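      A minimal sketch of that padding idea, assuming flat NumPy observations; pad_observation and max_obs_dim are illustrative names, not code from the series:

        import numpy as np

        def pad_observation(obs, target_dim):
            # Zero-pad a 1-D observation so every goal/model sees the same shape.
            obs = np.asarray(obs, dtype=np.float32)
            padded = np.zeros(target_dim, dtype=np.float32)
            padded[:obs.shape[0]] = obs
            return padded

        # e.g. observations of size 17 and 23 both become size 30
        max_obs_dim = 30
        a = pad_observation(np.ones(17), max_obs_dim)
        b = pad_observation(np.ones(23), max_obs_dim)
        assert a.shape == b.shape == (30,)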

    • @karamdaaboul6528
      @karamdaaboul6528 21 days ago

      @@robertcowher Thank you, it is now clear to me. You talked about your omni agent in the last video; for this agent, you might have a large observation space consisting of the observation spaces of all objects in the environment.

  • @scott089
    @scott089 22 days ago

    With this approach, do you collect manipulation data first and then train the model on the collected dataset? Another question I have: is it only manipulation data, or some visual input too?

    • @robertcowher
      @robertcowher 22 days ago

      Great question. I'm not collecting visual data for this one, but the observations include a numerical representation of the environment. What you'll find a bit later in the course is that I'm using a hybrid approach, starting live SAC training with a buffer of experiences, and weighting the buffer towards picking up more of those live experiences for the first few hundred episodes.
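      As a rough illustration of that hybrid sampling idea (not the code from the series), a batch can be drawn from two buffers with an adjustable live fraction; sample_batch and live_fraction are assumed names:

        import random

        def sample_batch(human_buffer, live_buffer, batch_size, live_fraction):
            # Mix human-demo and live transitions in one training batch.
            # How live_fraction is scheduled over the first few hundred episodes is up to you.
            n_live = min(int(batch_size * live_fraction), len(live_buffer))
            n_human = batch_size - n_live
            batch = random.sample(live_buffer, n_live) + random.sample(human_buffer, n_human)
            random.shuffle(batch)
            return batch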

  • @Patrick-wn6uj
    @Patrick-wn6uj a month ago

    Make a video on decision mamba - reinforcement learning that uses mamba network

    • @robertcowher
      @robertcowher a month ago

      I haven't tried MAMBA, but I'll check it out, and make a video if it looks interesting. Thanks for the tip!

  • @texwiller7577
    @texwiller7577 a month ago

    YESSSS!!!! Now it works!

    • @robertcowher
      @robertcowher a month ago

      Hey, thanks for the problem report. I wouldn't have noticed for weeks. :)

  • @texwiller7577
    @texwiller7577 a month ago

    Totally underestimated video series. I cannot stop watching it. Anyway, if I understood correctly, we need a _super_ policy, which selects one of the sub policies for accomplishing a complex task (e.g open microwave, turn on stove, close microwave, etc.), right?

    • @robertcowher
      @robertcowher a month ago

      Correct. Though in this case, the super policy is going to be static(i.e. looping through a list of goals and selecting the appropriate sub-policy). We'll get there in the next few videos :)

  • @robertcowher
    @robertcowher a month ago

    I strongly encourage everyone to generate their own data, but I've also posted my dataset to HuggingFace here - huggingface.co/datasets/robertcowher/farama-kitchen-sac-hrl-youtube/tree/main

  • @texwiller7577
    @texwiller7577 a month ago

    Top videos! Did I understand correctly? You took 30,000 interactions using the joystick?

    • @robertcowher
      @robertcowher a month ago

      Great question, and yes, but there's some nuance. Every timestep is considered a memory so continuous joystick actions tend to generate a lot of them very quickly. My average was right around 200 timesteps per task once I got good at it, so filling a 30K memory buffer takes about 150 successful completions of that task. I experimented with human memory buffers from 20K to 60K for various tasks and found 30K to be a good minimum buffer size to succeed at all tasks. To that end, the can_sample method we've coded here looks for batch_size(64) * 500, or 32,000. You could tweak that multiplier and experiment with less.
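      A sketch of the can_sample gate described above, assuming a simple list-backed buffer; the multiplier of 500 is the knob mentioned in the reply:

        class ReplayBuffer:
            def __init__(self, sample_multiplier=500):
                self.memories = []
                self.sample_multiplier = sample_multiplier  # lower this to start training sooner

            def can_sample(self, batch_size):
                # With batch_size=64 and the default multiplier, training waits for 32,000 memories.
                return len(self.memories) >= batch_size * self.sample_multiplier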

    • @robertcowher
      @robertcowher a month ago

      I was going to do this at the end of the series, but I went ahead and pinned a comment with my data set. I still recommend doing some of your own data generation to get the full process, but it's there to save you some time.

  • @robertcowher
    @robertcowher a month ago

    For questions about the videos, or just to come talk about robots and AI, join me on Discord - discord.gg/dnhsk3pD2V

  • @moshoodolawale3591
    @moshoodolawale3591 2 months ago

    Interestingly amazing tutorials. Looking forward to more of this. Thanks

    • @robertcowher
      @robertcowher 2 months ago

      Glad you enjoyed the series :) Any specific environments you'd like to see solved? I'm already working on object sorting with a virtual Emika Panda robot arm, but that's been a challenge so far.

  • @WilliamChen-pp3qs
    @WilliamChen-pp3qs 2 months ago

    why are we only using the forward model instead of both forward and inverse model of ICM?

  • @Patrick-wn6uj
    @Patrick-wn6uj 2 months ago

    I wish you could get a bunch of views

    • @robertcowher
      @robertcowher 2 months ago

      Hey, me too :) Honestly though, YouTube helps me feel like there's a clear end goal when building these projects, and that's a big part of why I'm doing it. If it's useful to a few people, I've hit my goals.

  • @Patrick-wn6uj
    @Patrick-wn6uj 2 months ago

    keep posting

    • @robertcowher
      @robertcowher a month ago

      If you enjoyed these, you might like the new series I'm working on. It's a similar robotic arm in a much more complex environment with a human experience component. ua-cam.com/video/ma6fbvy77Uo/v-deo.html

  • @EthanTownsend-t7o
    @EthanTownsend-t7o 2 months ago

    Hi I just followed your tutorial and it worked! Thank you for making it clear and easy to follow. If I wanted to change the rewards policy in the future (such as rewarding proximity to the ball or longevity) would I be able to? Or is the reward policy pretty much set in place?

    • @robertcowher
      @robertcowher 2 months ago

      Glad to hear it, and great question. What you're describing is an observation wrapper. They're part of the gym library and are there for exactly these kinds of changes. An observation wrapper lets you "wrap" the environment, and then add your custom logic on top of it. My recommendation would be to get a wrapper working for something really simple (say, doubling the reward when the agent hits a ball) to make sure you've got the wiring right, and then getting into the logic of what you really want to do. They sound complicated, but a basic observation wrapper can be 6 lines of code. Here's the doc to follow -> gymnasium.farama.org/api/wrappers/observation_wrappers/
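      A minimal gymnasium wrapper sketch along those lines (strictly speaking, the reward-doubling wiring check belongs in a RewardWrapper, while observation tweaks go in an ObservationWrapper); the class names and the CartPole environment are just placeholders:

        import numpy as np
        import gymnasium as gym

        class DoubleReward(gym.RewardWrapper):
            # The quick "wiring check": double every reward.
            def reward(self, reward):
                return 2.0 * reward

        class FloatObservation(gym.ObservationWrapper):
            # Tiny observation tweak: cast observations to float32.
            def observation(self, observation):
                return np.asarray(observation, dtype=np.float32)

        env = FloatObservation(DoubleReward(gym.make("CartPole-v1")))
        obs, info = env.reset()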

    • @robertcowher
      @robertcowher 2 months ago

      Also worth noting - An observation wrapper is what I should have used instead of processing the observation in the training loop.

  • @peterpettigrew5972
    @peterpettigrew5972 3 months ago

    Please provide a GitHub repo for this code.

  • @mobina3017
    @mobina3017 3 months ago

    I get this error for the forward method of the ActorNetwork class, any idea how I can solve it? `RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x128 and 46x256)`

    • @robertcowher
      @robertcowher 3 months ago

      That error usually means that the output of one layer doesn't match the input of the next layer. Check for typos in the layer input/output sizes, and if nothing jumps out at you, start printing out the variables and their sizes as you go through the forward method in the network. One of them is likely going to be a different size than expected and that's your culprit.
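      One way to apply that shape-printing advice, with a stand-in network (TinyActor and its sizes are made up for illustration):

        import torch
        import torch.nn as nn

        class TinyActor(nn.Module):
            def __init__(self, obs_dim=46, hidden=256, n_actions=4):
                super().__init__()
                self.fc1 = nn.Linear(obs_dim, hidden)
                self.fc2 = nn.Linear(hidden, n_actions)

            def forward(self, state):
                print("input:", state.shape)    # last dim must equal obs_dim (46 here)
                x = torch.relu(self.fc1(state))
                print("after fc1:", x.shape)    # last dim must equal hidden (256 here)
                return self.fc2(x)

        TinyActor()(torch.zeros(1, 46))  # a (1, 128) input here would reproduce the mat1/mat2 error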

  • @Paranoid_mp3
    @Paranoid_mp3 4 months ago

    I'm just learning Python and was looking for something exactly like this. Thanks for the video.

    • @robertcowher
      @robertcowher 4 months ago

      Absolutely. Glad you enjoyed it.

  • @asitrath8838
    @asitrath8838 4 months ago

    Hi, I keep getting the below error after a few iterations. I have combed through the code and it's the same as yours, and ChatGPT was not able to resolve the error. Can you please help?

      line 97, in train
        state_b, action_b, reward_b, done_b, next_state_b = self.memory.sample(
      File "C:\Users\asitr\Downloads\agent.py", line 37, in sample
        return [torch.cat(items).to(self.device) for items in batch]
      File "C:\Users\asitr\Downloads\agent.py", line 37, in <listcomp>
        return [torch.cat(items).to(self.device) for items in batch]
      TypeError: expected Tensor as element 29 in argument 0, but got tuple

    • @robertcowher
      @robertcowher 4 months ago

      Would you mind posting your code to GitHub so I can do a quick comparison?

    • @asitrath8838
      @asitrath8838 4 months ago

      @robertcowher I have posted my GitHub link 3 times now and it keeps getting deleted. Can I give it to you in any other way?

  • @asitrath8838
    @asitrath8838 4 months ago

    I am getting the below error when I am trying to run atari_breakout.py. Have you encountered this? x = torch.Tensor(x) ValueError: expected sequence of length 210 at dim 1 (got 3)

    • @robertcowher
      @robertcowher 4 months ago

      That usually means you've either mistyped a dimension or a batch size, or missed one of the torch.unsqueeze lines. General advice - 1) Go through it line by line and make sure you didn't miss anything. 2) Print out what's being passed through the model. Look at the actual output and see if it makes the error make more sense. 3) Plug the whole block of code + the error into ChatGPT. It's really good at spotting these kinds of issues.
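      For reference, this is roughly what the unsqueeze step looks like for a preprocessed (grayscale, resized) frame; the 84x84 size is just an example:

        import torch

        frame = torch.rand(84, 84)            # a single preprocessed grayscale frame
        x = frame.unsqueeze(0).unsqueeze(0)   # add batch and channel dims
        print(x.shape)                        # torch.Size([1, 1, 84, 84])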

  • @SachinJadhav-p8w
    @SachinJadhav-p8w 5 months ago

    Content on this topic seems to be quite sparse on the internet. Your videos have helped me a lot. Thanks for sharing the knowledge!!!

    • @robertcowher
      @robertcowher 5 months ago

      Glad you're enjoying them, and I've noticed the same trying to learn this stuff myself. There's plenty of robotics content(ROS2, etc) and plenty of information on RL, but it's hard to find information on the intersection of those two topics. I'm working on getting SAC working with more complex tasks(like object sorting), as well as using human experience(piloting the robot by hand) to bootstrap more quickly. I'll post that project if I can get it running.

  • @robertcowher
    @robertcowher 6 months ago

    I've had a few people ask about a place to talk about these projects, compare notes, and get help, so I'm starting a Discord server. Please consider this a general space to talk about your "autonomous agent" projects. I'm happy to help you get the things I've built working, and I'd love to see what you're building. Join at discord.gg/dnhsk3pD2V.

  • @박보형학생공과대학기
    @박보형학생공과대학기 6 months ago

    Hi, thanks for the great videos. Despite following the videos exactly (I also added "observation = next_observation"), when I run main.py, the score does not increase and converges flat to a value of around 0.4, 0.5. I've given it enough training time, but the model isn't improving. I can't find what's wrong, can you help or advise me?

    • @robertcowher
      @robertcowher 6 months ago

      Out of curiosity, how much training time did you give it, and did you put your code in a GitHub repo somewhere? I don't mind taking a quick look at it. If you'd rather share that information outside of the YouTube comment section, I just set up a Discord server for discussion on these projects, and AI/autonomous agents more generally. You're welcome to join there for assistance as well. discord.gg/dnhsk3pD2V

  • @183lucrido_ase
    @183lucrido_ase 6 months ago

    Robert, can you please tell us the versions of the Python, TensorFlow, Keras, and keras-rl modules you use? It would be even cooler if you posted how you install these libraries (a simple pip install keras-rl, or an install through git clone). Great videos and explanations, but they're hard to reproduce on current versions of the libraries.

    • @183lucrido_ase
      @183lucrido_ase 6 months ago

      It would be even cooler if I read the video description before asking questions. Python 3.9.16, gym 0.21.0, tensorflow 2.10.0, ale-py 0.7.5, keras-rl2 1.0.5.

    • @robertcowher
      @robertcowher 2 months ago

      @@183lucrido_ase No problem at all(and sorry for the late reply). In newer videos, I'm including versions, and my requirements.txt, in the video, because you're not the only one to follow up and ask. There's also a discord now for Q&A. discord.gg/dnhsk3pD2V, and if you haven't watched anything of mine in a while, I just finished posting a series on Maze-solving with SAC, and another on robotic arms with TD3.

  • @WaltWhite71100
    @WaltWhite71100 6 months ago

    Excellent tutorial. I've been following along and have got the model to produce logs, as well as save and load checkpoints. However, when looking at my TensorBoard, rather than a real-time graph, my TensorBoard consists of a long series of scores with a single data point in each. The only other notable difference I see is that on the top of your graph there is a string of hyperparameters (after score 0) which I don't have on mine. Any ideas what may be happening on my side to treat the data points individually rather than as a graph?

    • @robertcowher
      @robertcowher 6 months ago

      So, the string of hyperparameters is mostly for my own convenience. I tend to run lots of experiments, and they all take 10+ hours, so I forget what params I ran them with if I don't put together some kind of output. I'm able to reproduce something similar to what you're describing by removing the "global_step=i" variable from writer.add_scalar. That's the part that actually tells it which "step" it's on, and lets it show data over time. That line should end up looking something like this - writer.add_scalar(f"Score - {episode_identifier} alpha={alpha} - beta={beta} - batch_size={batch_size} - Critic AdamW - l1={layer1_size} l2={layer2_size} noise={starting_noise}", score, global_step=i). Something else you'll notice, I always go directly to the "Scalars" tab, and set smoothing to 0.95 or so. The "Time Series" tab has a bunch of other options I don't try to mess with.
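      A stripped-down version of that logging pattern, assuming PyTorch's SummaryWriter and a made-up tag:

        from torch.utils.tensorboard import SummaryWriter

        writer = SummaryWriter()                  # logs to ./runs by default
        for i in range(1000):
            score = float(i)                      # stand-in for the episode score
            # global_step is what lets TensorBoard draw a curve instead of isolated points
            writer.add_scalar("Score - demo-run alpha=0.001", score, global_step=i)
        writer.close()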

  • @Mutrino
    @Mutrino 6 months ago

    Looks like this will not work in Python 3.12. The package gymnasium (previously gym) is not available for Python 3.12.

    • @robertcowher
      @robertcowher 2 months ago

      Apologies for missing this back when you posted it but yes, most of my code is running on 3.10 or 3.11, and I haven't tested for compatibility any later. I should probably start including compatible python versions for new projects.

  • @homataha5626
    @homataha5626 6 months ago

    Hi can you please share the code?

  • @homataha5626
    @homataha5626 6 months ago

    Hi, thanks for the videos. Can you share the code?

    • @robertcowher
      @robertcowher 6 months ago

      My GitHub is a bit disorganized, but yes. Please note that this was the initial implementation I based the videos on, so it should be very close but may not quite be line for line. github.com/bobcowher/duelling-dqn-breakout-pytorch

  • @davebostain8588
    @davebostain8588 7 months ago

    I have been looking for a course that shows how to develop the code to play Atari Pong and Breakout. Hopefully this will be it. I have a couple of programs that will train, but I can't get the display to work (Colab). Many are old. With this recent course, I am hoping I can get it all to work. I will update this after I complete. I am optimistic...

    • @robertcowher
      @robertcowher 7 months ago

      There's going to be a lot here that's not relevant to the problem you're trying to solve, but this course by Escape Velocity Labs has several examples of UI/Gym-based code in Colab. The approach he takes is to generate videos of the agent playing and then play those. www.udemy.com/course/advanced-rl-pg. Another approach would be to train the agent in Colab and then download the trained network and run it locally. Either way, good luck! I've also got a series on robotic control you might enjoy.
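      One way to get episode videos out of a headless Colab session (not necessarily the approach used in that course) is gymnasium's RecordVideo wrapper; CartPole and the trigger schedule here are placeholders, and video encoding needs moviepy installed:

        import gymnasium as gym
        from gymnasium.wrappers import RecordVideo

        # render_mode="rgb_array" lets the wrapper capture frames without a display.
        env = gym.make("CartPole-v1", render_mode="rgb_array")
        env = RecordVideo(env, video_folder="videos", episode_trigger=lambda ep: ep % 10 == 0)

        obs, info = env.reset()
        done = False
        while not done:
            obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
            done = terminated or truncated
        env.close()  # mp4 files end up in ./videos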

    • @davebostain8588
      @davebostain8588 7 months ago

      Thanks for the reply. I have taken one of this guy's courses before and the software worked, as I remember, so I may try it. I think I will still do your course for the code along exercise. Where can I access your series on robotic control? BTW, I took Mike Cohen's DUDL course. It was amazing. I am occasionally in contact with him asking him to do an RL course, but no luck so far!

    • @robertcowher
      @robertcowher 7 months ago

      ​@@davebostain8588 If Mike Cohen ever puts out an RL course, I'll be first in line. First robot arm video is here - ua-cam.com/video/z1Lnlw2m8dg/v-deo.html I'm still working on more complex tasks, like object sorting & assembly, but this technique gets you to a robot arm that can open doors and(if you make a few tweaks) pick up blocks successfully.

    • @davebostain8588
      @davebostain8588 7 months ago

      Great, thanks again Robert. I got through the first two Breakout videos and everything works; so far so good

    • @davebostain8588
      @davebostain8588 6 months ago

      @@robertcowher I got the model to work but the training in PyCharm was really slow. I transferred the code to Colab and am training with a GPU and it is much faster. I trained for 9500 epochs, stopped the training, loaded the saved model and recommenced training. However, the model acts like it is starting with new random weights. At 9500 epochs, the reward was about -2.5, but when I loaded the model it started back at -5.0. Have you ever seen that before? I had the same symptom with another DQN program on Pong. I have used the same save/load model procedure on Acrobat and other Gym agents and it worked as expected. I'm puzzled...

  • @Mansi_V_Jain
    @Mansi_V_Jain 7 months ago

    Hey, will I be able to replicate this on Jupyter notebook? Or Google colab? Or do I have to use pycharm

    • @robertcowher
      @robertcowher 7 months ago

      You can use whatever editor you're most comfortable with. My advice: If you're relatively new, follow along with the exact same tools I'm using. If you've been working with Python for a while and you're comfortable making changes to support your workflow, roll on with your favorite IDE :) I still follow that pattern when learning new languages/tools. One practical concern is that these agents do take a very long time to train, and something like Pycharm is going to make it easy to have separate train and test files, to validate as you go.

    • @Mansi_V_Jain
      @Mansi_V_Jain 7 months ago

      Hey, I reached the training step, and it’s been almost 2 hours, and it hasn’t printed ‘10 epochs’ yet. My laptop supports intel iRIS Xe Graphics.

    • @robertcowher
      @robertcowher 7 months ago

      If you print out “device” is it actually running on your GPU?
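      A quick device check along those lines, plus the usual CUDA-build install hint (the exact pip command depends on your CUDA version; see pytorch.org/get-started/locally/):

        import torch

        # If this prints "cpu", PyTorch was installed without (or can't see) GPU support.
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        print(device)

        # A CUDA-enabled build is typically installed with something like:
        #   pip install torch --index-url https://download.pytorch.org/whl/cu121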

    • @Mansi_V_Jain
      @Mansi_V_Jain 7 months ago

      @@robertcowher no it was running on my cpu😅

    • @robertcowher
      @robertcowher 7 months ago

      ​@@Mansi_V_Jain That would do it. I know with an Nvidia GPU I had to install a specific version of Torch with CUDA support, after installing CUDA. Not sure what that looks like for IRIS. It's a little more expert friendly, but you could always look at something like www.runpod.io/. You can load it up with a $10 credit balance and rent a machine with an RTX 3070 for $0.14/hour. I've done that a few times when I didn't want to have my desktop tied up for a couple of days for training. A mid-tier gaming machine is also a good investment for this kind of work, if you want an excuse to buy one. Anytime you're training on images, you need a decent GPU and plenty of RAM, so get something expandable.

  • @akashvyas7715
    @akashvyas7715 7 months ago

    Hi Robert, when will you release the next video?

    • @robertcowher
      @robertcowher 7 months ago

      I’m planning to post the next one this weekend.

    • @akashvyas7715
      @akashvyas7715 7 months ago

      Thanks, @@robertcowher. After watching your video, I'm also performing a lifting task using DDPG, but it's not converging, so now I'm playing with hyperparameters. I'm hopeful that your implementation will work.

    • @robertcowher
      @robertcowher 7 months ago

      I spent about a month playing around with regular DDPG and it just wasn't robust enough for these kinds of problems. It would converge on something weird like tapping the floor next to the block, but not picking it up, or the robot would just randomly start spinning and then lock into that pattern. Switching over to TD3 let me successfully open doors and pick up blocks. I'm trying to get one working on a sorting task now, but that's been quite a bit harder. If you manage to get this code working on something more complex, a repo link would be greatly appreciated :) I just posted parts 5 and 6, which should get you to the end of the project. Good luck!

    • @WaltWhite71100
      @WaltWhite71100 6 months ago

      @robertcowher I've set up the examples you've been demonstrating in this series and really appreciate your walk-through. I have it working with varying degrees of success, depending on different things I try. Sometimes the score converges around 220, other times less than half that. I'd love to talk in more depth about the training, checkpointing, testing, rewards shaping, and the robosuite environment. Aside from these comments, is there a community or place (e.g. slack, discord, email or other) where people can discuss/share/compare notes?

    • @robertcowher
      @robertcowher 6 months ago

      ​@@WaltWhite71100 Glad to hear it's(mostly) working for you. Sometimes lowering the learning rate or increasing exploration can improve stability, but your mileage may vary. I'm actually trying to get this code working on a harder problem(object sorting, vs object grasping) and it's not scaling well so far, but if I can figure it out, I'll post another video. As for discussion, I honestly haven't thought about it, but I like the idea of setting up a Discord. I'll take some time over the weekend to see if there's a group it makes sense to glom onto, and if not, I may set something up. Thanks for the idea :)

  • @akashvyas7715
    @akashvyas7715 7 months ago

    Please, make a video on how to take image observation as feedback.

    • @robertcowher
      @robertcowher 7 months ago

      Will do! I'm going to try to get the second half of this series up in the next couple of weeks. Should be able to pivot and work on it then.

  • @SaschaRobitzki
    @SaschaRobitzki 9 months ago

    Instead of img.unsqueeze(0).unsqueeze(0) you can do img[(None,)*2].
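    A quick check that the two spellings are equivalent:

      import torch

      img = torch.rand(84, 84)
      a = img.unsqueeze(0).unsqueeze(0)   # explicit: add two leading dims
      b = img[(None,) * 2]                # same thing via None (new-axis) indexing
      assert a.shape == b.shape == torch.Size([1, 1, 84, 84])
      assert torch.equal(a, b)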

    • @robertcowher
      @robertcowher 9 months ago

      Thanks! I'll give it a shot.

  • @4amalreadyy
    @4amalreadyy 10 months ago

    Hey, I really need help with something. My model is not showing any results even after 6000 episodes. Its average is still around -4.5. Can you please help me? Is there any way to contact you?

    • @robertcowher
      @robertcowher 10 months ago

      Sure. So a few thoughts - 1) If your model isn't learning at all, double check the sections with the loss function and the reward structure. 2) If you haven't yet, plug the entire section you're having trouble with into ChatGPT and explain the problem to it. I've worked out several complex coding problems by using ChatGPT as a pair programming assistant. 3) I recommend spending the most time on steps 1 and 2, because these problems are where you really start to learn. If all else fails, I've got the project published to GitHub here. You can always run a diff and see if a typo or something simple like that is off - github.com/bobcowher/duelling-dqn-breakout-pytorch

    • @4amalreadyy
      @4amalreadyy 10 months ago

      @@robertcowher Found a possible typo. In the repo, inside agent.py, on Line 67, you wrote "-1" for the argument "dim" of the torch.argmax() function. But on your part 5 video you use "1" for that argument. Is it possible that the agent wasn't using the correct output of the neural network because of that?
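      For what it's worth, for the 2-D (batch, n_actions) output used here, dim=-1 and dim=1 select the same axis, so that difference alone shouldn't change behaviour:

        import torch

        q_values = torch.rand(32, 4)  # (batch, n_actions)
        assert torch.equal(torch.argmax(q_values, dim=-1),
                           torch.argmax(q_values, dim=1))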

    • @4amalreadyy
      @4amalreadyy 10 months ago

      @robertcowher I've cloned your repo and trained an agent for 1000 epochs, using the hyperparameter values you left there, but it still has an average of -4. Is this just bad luck?

    • @robertcowher
      @robertcowher 10 months ago

      @4amalreadyy Well now I'm curious. I'm going to try clearing the weights and re-training the network over the weekend. I do see a difference between what I've got locally and what I originally pushed to GitHub, so I may have made a tweak I forgot to commit. I'll also go ahead and add Tensorboard and give you an idea of how long it's expected to train.

    • @4amalreadyy
      @4amalreadyy 10 months ago

      @robertcowher Yes please, that would be wonderful. Can we keep in touch? I've been training the agent for hours on end with different hyperparameters and I never got to any significant result. Please help us, I really wanted to put this to work. If there's a way to keep in touch, please tell me.

  • @Hi-hl1ci
    @Hi-hl1ci 10 months ago

    Hi, thank you for the video. By any chance, did you post your code somewhere? Thanks

  • @4amalreadyy
    @4amalreadyy 10 months ago

    Could you please share how many epochs the model you tested at 14:40 was trained on?

    • @robertcowher
      @robertcowher 10 months ago

      That took just over a million epochs. That's one of the reasons tracking statistics is important; training can easily run for 12+ hours. I did a lot of tweaking, making sure it was trending upwards, and then doing something else for a few hours or leaving it overnight.

    • @4amalreadyy
      @4amalreadyy 10 months ago

      @@robertcowher Thank you for the quick response! I'm intrigued by the tweaks you mentioned that helped in efficiently training the model for over a million epochs. Could you elaborate on the specific adjustments you made and how they contributed to speeding up the training process? Any insights would be greatly appreciated!

    • @robertcowher
      @robertcowher 10 months ago

      @4amalreadyy A lot of it is included in the final code I put in this video: number of layers, the choice of a duelling DQN (vs. single), quantity of dropout, etc. A few things:
      - As the network trains, you may want to tweak the learning rate. Starting with a higher learning rate in the beginning (0.001 or 0.0001) will help the network converge faster, but falling back to a lower learning rate can help it find a more specific optimal policy later in training.
      - Since I built this, I've found that sometimes smaller networks are more stable and train faster, and this might have worked just fine with hidden layers of 300 or 400 neurons. Something to test out and play with.
      - Tensorboard is much better than the hacked-together graphing class I built here.
      This course isn't specifically on reinforcement learning, but it helped me gain a much better understanding of how these networks work, and how different hyperparameters help learning - www.udemy.com/course/deeplearning_x/
      I'm currently working on training a (virtual) robotic control arm with DDPG and I plan to do another walkthrough video when I get it working. It should be a fun one, and will include a Tensorboard demo.

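      One way to apply the learning-rate tweak described in the reply above, assuming a standard PyTorch optimizer (the model here is only a stand-in):

        import torch

        model = torch.nn.Linear(10, 4)                     # stand-in for the Q-network
        optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

        def set_learning_rate(optimizer, lr):
            # Change the learning rate mid-training without rebuilding the optimizer.
            for group in optimizer.param_groups:
                group["lr"] = lr

        set_learning_rate(optimizer, 1e-4)                 # e.g. after the score plateaus
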
    • @4amalreadyy
    @4amalreadyy 10 months ago

    Hi again, I've been experimenting with the model and noticed that, despite having a fast GPU, the training seems slower than expected. I came across other PyTorch-based models, like one for a snake game, which show impressive results in just 10 minutes of training without using a dueling network architecture AND while rendering the game as it trains. I've been training my model for over 3 hours, but the average score is still hovering around -4.0. I made sure to follow your agent.py exactly as in your video. Is there something I might be missing that's bottlenecking the training? I'm relatively new to this and would greatly appreciate any guidance or tips you might have to speed up the training process. Thank you for your time! @robertcowher

  • @4amalreadyy
    @4amalreadyy 10 months ago

    The video cuts off abruptly; I'm still getting this error: mat1 and mat2 shapes cannot be multiplied (64x1024 and 3136x1024)

    • @robertcowher
      @robertcowher 10 months ago

      Thank you for pointing that out. Change state_value1 on line 26 to state_value3 to fix that error. That whole AtariNet file should look like this:

        import torch
        import torch.nn as nn

        class AtariNet(nn.Module):

            def __init__(self, nb_actions=4):
                super(AtariNet, self).__init__()
                self.relu = nn.ReLU()
                self.conv1 = nn.Conv2d(1, 32, kernel_size=(8, 8), stride=(4, 4))
                self.conv2 = nn.Conv2d(32, 64, kernel_size=(4, 4), stride=(2, 2))
                self.conv3 = nn.Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1))
                self.flatten = nn.Flatten()
                self.dropout = nn.Dropout(p=0.2)
                self.action_value1 = nn.Linear(3136, 1024)
                self.action_value2 = nn.Linear(1024, 1024)
                self.action_value3 = nn.Linear(1024, nb_actions)
                self.state_value1 = nn.Linear(3136, 1024)
                self.state_value2 = nn.Linear(1024, 1024)
                self.state_value3 = nn.Linear(1024, 1)

            def forward(self, x):
                x = torch.Tensor(x)
                x = self.relu(self.conv1(x))
                x = self.relu(self.conv2(x))
                x = self.relu(self.conv3(x))
                x = self.flatten(x)
                state_value = self.relu(self.state_value1(x))
                state_value = self.dropout(state_value)
                state_value = self.relu(self.state_value2(state_value))
                state_value = self.dropout(state_value)
                state_value = self.relu(self.state_value3(state_value))
                state_value = self.dropout(state_value)
                action_value = self.relu(self.action_value1(x))
                action_value = self.dropout(action_value)
                action_value = self.relu(self.action_value2(action_value))
                action_value = self.dropout(action_value)
                action_value = self.relu(self.action_value3(action_value))
                action_value = self.dropout(action_value)
                output = state_value + (action_value - action_value.mean())
                return output

            def save_the_model(self, weights_filename='models/latest.pt'):
                # Take the default weights filename(latest.pt) and save it
                torch.save(self.state_dict(), weights_filename)

            def load_the_model(self, weights_filename='models/latest.pt'):
                try:
                    self.load_state_dict(torch.load(weights_filename))
                    print(f"Successfully loaded weights file {weights_filename}")
                except:
                    print(f"No weights file available at {weights_filename}")

    • @4amalreadyy
      @4amalreadyy 10 months ago

      That solved it! Thank you so much! @robertcowher

    • @henrybarthelemy2692
      @henrybarthelemy2692 6 months ago

      @robertcowher There are no errors being thrown, but conceptually, why does the code in this comment include dropout on the action and state values after the 3rd layer (for overfitting), while the video code does not?

    • @robertcowher
      @robertcowher 6 months ago

      ​@@henrybarthelemy2692 You're very observant, and the answer is way less interesting than I'd like it to be. Because I build these projects ahead of time, and then re-type them for the videos, it's easy to miss a line and the code I have in my repo(and thus what I pasted) won't 100% match what I have in the videos. Conceptually though, they both work because the full multi-layer dropout wasn't necessary to solve this particular problem. You'll find a lot of this is a matter of experimentation, and I'd encourage you to experiment with more or less dropout and see how it affects training. If you'd like to continue the conversation, I actually launched a Discord server this morning, and you're welcome to join. discord.gg/dnhsk3pD2V.

  • @thoughtslibrary
    @thoughtslibrary 11 months ago

    thank you bro

  • @gabrieldumas_
    @gabrieldumas_ a year ago

    Love your videos man, keep going!! Cheers from Brazil!!

    • @robertcowher
      @robertcowher a year ago

      Thanks! I just started another series on Atari Breakout using Pytorch. I plan to upload them over the next week.

  • @molayy5956
    @molayy5956 a year ago

    Hi Robert, I am currently in the process of training my model to learn how to play Pong using the reinforcement learning framework from the previous two parts of this series. I had one question regarding loading the weights after the 1000000 steps have finished. Is it possible to re-train the model based on the weights saved to the checkpoint file? I am asking this because if I find the model's performance to not be adequate after the 1000000 steps, I don't want to have to train the model completely from scratch as I am not using CUDA GPU acceleration.

    • @robertcowher
      @robertcowher a year ago

      Yes! The Try/Catch on line 105 should load the checkpoint file and let the model pick back up where you left it. It took me a couple of days to train this one and I came back to it several times. Something you may need to tweak is the nb_steps_warmup value and the value_max in the LinearAnnealedPolicy. nb_steps_warmup is going to have your agent take 50,000 random actions before it starts learning again, so you want to set that to 0 or close to 0 if you're starting from a mostly trained model.

    • @robertcowher
      @robertcowher a year ago

      The value_max and value_min on the LinearAnnealedPolicy are a ratio of how often the agent will go to the model for an action v.s. taking a random action. A "1" represents 100% random responses, where a 0 represents 100% responses from the agent. If you're starting from pre-trained, you probably want that max to be 0.2 or 0.3. You want to leave some randomness in for training purposes, but you don't want your mostly trained agent to start out picking 100% random value. If this answer helped you, do me a favor and give the video a thumbs up :)
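      A sketch of those two knobs in keras-rl2 terms, assuming the usual DQNAgent setup (the network layout and numbers below are placeholders, not the exact ones from the video):

        from rl.agents.dqn import DQNAgent
        from rl.policy import LinearAnnealedPolicy, EpsGreedyQPolicy
        from rl.memory import SequentialMemory
        from tensorflow.keras.models import Sequential
        from tensorflow.keras.layers import Flatten, Dense

        nb_actions = 6
        model = Sequential([Flatten(input_shape=(4, 84, 84)),
                            Dense(256, activation='relu'),
                            Dense(nb_actions, activation='linear')])

        # Resuming from a mostly-trained checkpoint: keep some exploration (0.3 -> 0.1)
        # and skip the long random warmup.
        policy = LinearAnnealedPolicy(EpsGreedyQPolicy(), attr='eps',
                                      value_max=0.3, value_min=0.1,
                                      value_test=0.05, nb_steps=100000)
        memory = SequentialMemory(limit=1000000, window_length=4)
        dqn = DQNAgent(model=model, nb_actions=nb_actions, memory=memory,
                       nb_steps_warmup=0, policy=policy)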

    • @molayy5956
      @molayy5956 a year ago

      @robertcowher I have a follow-up question regarding the dependency versions you are using for this project. For some reason, when I try running the dqn.test function, I receive a "too many values to unpack (expected 4)" error. The versions of my dependencies are as follows: tensorflow (2.12), gym (0.26.2), ale-py (0.8.1), atari (0.2.9), keras-rl2 (1.0.5). I can run the dqn.test function on gym (0.12.5 and 0.9.5) using 'Pong-v1' and the render window pops up with no errors; however, nothing happens (just a frame of the environment). When I use gym (0.25.2), which worked for the classic control games, I receive the 'render(mode) is deprecated' error even though I pass the render mode into the gym.make() function as the error suggests. I have tried so many different combinations, but nothing has let me test my model successfully. Do you have any suggestions?

    • @robertcowher
      @robertcowher a year ago

      @@molayy5956 gym - 0.21.0, tensorflow: 2.10.0, ale-py: 0.7.5, keras-rl2: 1.0.5, and I don't have atari installed. I'll add these to the video description as well.

    • @molayy5956
      @molayy5956 a year ago

      @robertcowher Thanks, though I am unable to install gym (0.21.0); it's failing to build the wheel. I have a feeling it is due to incompatibility with OpenCV. What version of OpenCV do you have?