TinyRL: Can AI Learn to Swing Up a Real Pendulum? | DigiKey

  • Published 3 Aug 2024
  • Reinforcement learning (RL) is a form of machine learning that involves training agents to interact with an environment in order to maximize cumulative rewards. In this video, we teach an AI to swing up a pendulum using real hardware and RL.
    A write-up of the project can be found here: www.digikey.com/en/maker/proj...
    An RL agent learns to interact with its environment through trial and error. Shawn writes an Arduino sketch that controls a stepper motor and reads the position of an encoder attached to the pendulum. The goal is to train an agent to swing up the pendulum on its own.
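    The description doesn't spell out the serial protocol, but the PC-side half of such an interface might look like the minimal sketch below. The port name, command framing, and PendulumLink class are illustrative assumptions, not the project's actual code.
```python
# Minimal sketch of a PC-side serial link to the Arduino (hypothetical protocol).
# Assumes pyserial is installed; "M<deg>" moves the stepper, "R" reads the encoder.
import serial


class PendulumLink:
    def __init__(self, port: str = "/dev/ttyACM0", baud: int = 115200):
        self.ser = serial.Serial(port, baud, timeout=1.0)

    def move_motor(self, degrees: float) -> None:
        # Command the stepper to rotate by the given number of degrees.
        self.ser.write(f"M{degrees:.2f}\n".encode())

    def read_state(self) -> tuple[float, float]:
        # Request the pendulum angle and angular velocity from the encoder.
        self.ser.write(b"R\n")
        angle, velocity = self.ser.readline().decode().strip().split(",")
        return float(angle), float(velocity)
```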
    Intro to Reinforcement Learning video: • Introduction to Reinfo...
    Hyperparameter Optimization video: • Hyperparameter Optimiz...
    To accomplish this, the Arduino is connected to a computer running the Farama gymnasium and Stable Baselines3 frameworks. These frameworks take in the observations, have the agent guess an action, and tell the Arduino what action to take. The agent is updated using the proximal policy optimization (PPO) algorithm found in Stable Baselines3.
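    In outline, that setup amounts to wrapping the hardware in a custom gymnasium environment and handing it to Stable Baselines3's PPO implementation. A skeleton of that loop follows; the observation layout and hyperparameters are assumptions, and the hardware I/O is stubbed out.
```python
import gymnasium as gym
import numpy as np
from gymnasium import spaces
from stable_baselines3 import PPO


class RealPendulumEnv(gym.Env):
    """Skeleton of a hardware-backed environment; real I/O goes through the Arduino."""

    def __init__(self):
        # Assumed observation: [cos(theta), sin(theta), angular velocity].
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(3,), dtype=np.float32)
        # Final version of the project used three discrete stepper moves (-10, 0, +10 deg).
        self.action_space = spaces.Discrete(3)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        obs = np.zeros(3, dtype=np.float32)  # would be read from the encoder
        return obs, {}

    def step(self, action):
        # Send the action to the Arduino, then read back the new state (stubbed here).
        obs = np.zeros(3, dtype=np.float32)
        reward, terminated, truncated = 0.0, False, False
        return obs, reward, terminated, truncated, {}


env = RealPendulumEnv()
model = PPO("MlpPolicy", env, verbose=1)   # PPO from Stable Baselines3
model.learn(total_timesteps=100_000)
model.save("pendulum_swingup_ppo")
```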
    Initially, Shawn tried to perform a full swing-up and balance with a continuous action space. This proved too difficult for the agent: the round-trip time to and from the Arduino, plus the time spent on model updates, was too long to balance the pendulum reliably. To reduce the scope, the action space was made discrete (+10 deg, 0 deg, -10 deg), and the episode ended when the pendulum reached the top below a particular speed. If the pendulum moved too fast near the top, it was considered to have “crashed,” and a penalty was applied.
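    A hedged sketch of what that discrete action set and reward/termination logic could look like; the thresholds and reward values here are illustrative, not the project's exact numbers.
```python
import numpy as np

# Discrete action index -> stepper move in degrees (from the description above).
ACTIONS_DEG = {0: -10.0, 1: 0.0, 2: +10.0}


def reward_and_done(theta, theta_dot, upright_tol=0.1, max_upright_speed=1.0):
    """theta = 0 at the upright position (rad); theta_dot in rad/s.
    Thresholds and reward values are assumptions for illustration."""
    upright = abs(theta) < upright_tol
    if upright and abs(theta_dot) < max_upright_speed:
        return 10.0, True              # reached the top slowly enough: success, end episode
    if upright and abs(theta_dot) >= max_upright_speed:
        return -10.0, True             # moving too fast near the top: "crash" penalty
    return float(np.cos(theta)), False  # shaping term: closer to upright is better
```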
    Once the agent successfully learned how to perform the swing-up, it was deployed to the Arduino. To perform the deployment, the critic portion of the actor-critic model in the PPO agent was stripped away, and the remaining actor model (3-layer dense neural network) was optimized using Edge Impulse. The model was then deployed to an ESP32S3 to perform the swing-up without any input from the computer.
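    The actor-extraction step can be pictured roughly as follows. The attribute names (mlp_extractor.policy_net, action_net) match Stable Baselines3's default MlpPolicy, but the saved-model name and the ONNX export path are assumptions; in the video, the optimization and ESP32-S3 deployment are handled through Edge Impulse's own tooling.
```python
import torch as th
from stable_baselines3 import PPO

model = PPO.load("pendulum_swingup_ppo")   # hypothetical saved model name


class ActorOnly(th.nn.Module):
    """Keep only the actor half of the PPO actor-critic policy for deployment."""

    def __init__(self, policy):
        super().__init__()
        self.features = policy.features_extractor            # flattens the observation
        self.policy_net = policy.mlp_extractor.policy_net    # actor hidden layers
        self.action_net = policy.action_net                  # logits over the 3 discrete actions

    def forward(self, obs):
        return self.action_net(self.policy_net(self.features(obs)))


actor = ActorOnly(model.policy).eval()
dummy_obs = th.zeros(1, *model.observation_space.shape)
# Export to ONNX as one possible interchange format for downstream optimization.
th.onnx.export(actor, dummy_obs, "pendulum_actor.onnx",
               input_names=["obs"], output_names=["action_logits"])
```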
    Product Links:
    STEVAL-EDUKIT01 - www.digikey.com/en/products/d...
    Seeed Studio XIAO ESP32S3 - www.digikey.com/en/products/d...
    Related Videos:
    • Introduction to Reinfo...
    • Exploring Reinforcemen...
    Related Project Links:
    www.digikey.com/en/maker/proj...
    www.digikey.com/en/maker/proj...
    Learn more:
    Maker.io - www.digikey.com/en/maker
    DigiKey’s Blog - TheCircuit www.digikey.com/en/blog
    Connect with DigiKey on Facebook / digikey.electronics
    And follow us on X (formerly Twitter) / digikey
    00:00 - Introduction
    01:10 - Hardware overview
    03:00 - Modifying the pendulum tower
    04:20 - Arduino communication interface
    04:49 - Overview of reinforcement learning
    06:17 - Reward function
    08:32 - Agent actor-critic deep neural network
    09:33 - Hyperparameter optimization overview
    09:51 - Agent training with Python
    14:57 - Troubleshooting an agent that does not learn
    16:46 - Reduce scope to just swing up and use discrete action space
    18:03 - Train simpler agent
    18:22 - Deploy agent to ESP32
    19:56 - Test agent on the pendulum
    20:46 - Conclusion and further areas of research
  • Science & Technology

COMMENTS • 9

  • @MeanGeneHacks 7 months ago +3

    Very cool you were able to get these results. Getting RL to work on real hardware is notoriously difficult.

  • @TonyHammitt 7 months ago +1

    Back in the dark ages (early 90s), we'd use much smaller neural networks because our computers were about what you have there as the microcontroller. It's a whole new game now with all of these tools. Will be fun to play with, that's for sure.

  • @mostafanfs 7 months ago +2

    Shawn, how do you know everything?! It's very cool that you've mastered all this, and not fair at the same time.

  • @chrisBruner 7 months ago +1

    Wow, fantastic project. I wonder if a NN training a fuzzy logic controller might work better.

  • @eCMastermind 1 month ago +1

    🎉

  • @AdityaMehendale 7 months ago +1

    I can imagine that the allowable latency is a function of the natural time constant of the system. If you are having issues with fine-tuning the latency, would you consider (as a proof of concept) slowing down the natural system? To achieve this, you could, for example, reduce the eccentricity of the pendulum while keeping the moment of inertia constant. To slow it down 4x, you might need to increase the inertia 16x. A pragmatic way to achieve this would be to add a counterweight against the pendulum, so the mass (inertia) increases but the restoring force decreases.
    It would still be an "inverted" pendulum, only far less aggressive.
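    (The scaling in this suggestion follows from the pendulum's characteristic time constant; a brief sketch, with I the moment of inertia about the pivot, m the mass, and d the pivot-to-centre-of-mass distance:)
```latex
% Characteristic time scale of a pendulum (hanging or inverted) about its pivot:
%   tau ~ sqrt(I / (m g d))
\[
  \tau \;\sim\; \sqrt{\frac{I}{m\,g\,d}}
  \qquad\Longrightarrow\qquad
  \tau' = 4\tau \;\;\text{requires}\;\; \frac{I'}{m'\,g\,d'} \;=\; 16\,\frac{I}{m\,g\,d}.
\]
% A counterweight raises I while lowering the effective restoring torque m g d,
% so the 16x ratio can be reached without a 16x increase in inertia alone.
```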

  • @Hellboy-ce9tm 7 months ago +1

    Is the current limit on the stepper driver set too high?

    • @fdavidcamaya 6 months ago

      The potentiometer on the driver of the stepper motor.