TinyRL: Can AI Learn to Swing Up a Real Pendulum? | DigiKey

  • Published 3 Aug 2024
  • Reinforcement learning (RL) is a form of machine learning that involves training agents to interact with an environment in order to maximize cumulative rewards. In this video, we teach an AI to swing up a pendulum using real hardware and RL.
    A write-up of the project can be found here: www.digikey.com/en/maker/proj...
    An RL agent learns to interact with its environment through trial and error. Shawn writes an Arduino sketch that controls a stepper motor and reads the position of an encoder attached to the pendulum. The goal is to train an agent to swing up the pendulum on its own.
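    The description doesn't spell out the serial protocol, but the PC-side half of such an interface might look like the minimal sketch below. The port name, command framing, and PendulumLink class are illustrative assumptions, not the project's actual code.
```python
# Minimal sketch of a PC-side serial link to the Arduino (hypothetical protocol).
# Assumes pyserial is installed; "M<deg>" moves the stepper, "R" reads the encoder.
import serial


class PendulumLink:
    def __init__(self, port: str = "/dev/ttyACM0", baud: int = 115200):
        self.ser = serial.Serial(port, baud, timeout=1.0)

    def move_motor(self, degrees: float) -> None:
        # Command the stepper to rotate by the given number of degrees.
        self.ser.write(f"M{degrees:.2f}\n".encode())

    def read_state(self) -> tuple[float, float]:
        # Request the pendulum angle and angular velocity from the encoder.
        self.ser.write(b"R\n")
        angle, velocity = self.ser.readline().decode().strip().split(",")
        return float(angle), float(velocity)
```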
    Intro to Reinforcement Learning video: • Introduction to Reinfo...
    Hyperparameter Optimization video: • Hyperparameter Optimiz...
    To accomplish this, the Arduino is connected to a computer running the Farama gymnasium and Stable Baselines3 frameworks. These frameworks take in the observations, have the agent guess an action, and tell the Arduino what action to take. The agent is updated using the proximal policy optimization (PPO) algorithm found in Stable Baselines3.
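    In outline, that setup amounts to wrapping the hardware in a custom gymnasium environment and handing it to Stable Baselines3's PPO implementation. A skeleton of that loop follows; the observation layout and hyperparameters are assumptions, and the hardware I/O is stubbed out.
```python
import gymnasium as gym
import numpy as np
from gymnasium import spaces
from stable_baselines3 import PPO


class RealPendulumEnv(gym.Env):
    """Skeleton of a hardware-backed environment; real I/O goes through the Arduino."""

    def __init__(self):
        # Assumed observation: [cos(theta), sin(theta), angular velocity].
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(3,), dtype=np.float32)
        # Final version of the project used three discrete stepper moves (-10, 0, +10 deg).
        self.action_space = spaces.Discrete(3)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        obs = np.zeros(3, dtype=np.float32)  # would be read from the encoder
        return obs, {}

    def step(self, action):
        # Send the action to the Arduino, then read back the new state (stubbed here).
        obs = np.zeros(3, dtype=np.float32)
        reward, terminated, truncated = 0.0, False, False
        return obs, reward, terminated, truncated, {}


env = RealPendulumEnv()
model = PPO("MlpPolicy", env, verbose=1)   # PPO from Stable Baselines3
model.learn(total_timesteps=100_000)
model.save("pendulum_swingup_ppo")
```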
    Initially, Shawn tried to perform a full swing-up and balance with a continuous action space. This proved too difficult for the agent: the round-trip time to and from the Arduino, plus the time spent on model updates, was too long to balance the pendulum reliably. To reduce the scope, the action space was made discrete (+10 deg, 0 deg, -10 deg), and the episode ended when the pendulum reached the top below a particular speed. If the pendulum moved too fast near the top, it was considered to have “crashed,” and a penalty was applied.
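    A hedged sketch of what that discrete action set and reward/termination logic could look like; the thresholds and reward values here are illustrative, not the project's exact numbers.
```python
import numpy as np

# Discrete action index -> stepper move in degrees (from the description above).
ACTIONS_DEG = {0: -10.0, 1: 0.0, 2: +10.0}


def reward_and_done(theta, theta_dot, upright_tol=0.1, max_upright_speed=1.0):
    """theta = 0 at the upright position (rad); theta_dot in rad/s.
    Thresholds and reward values are assumptions for illustration."""
    upright = abs(theta) < upright_tol
    if upright and abs(theta_dot) < max_upright_speed:
        return 10.0, True              # reached the top slowly enough: success, end episode
    if upright and abs(theta_dot) >= max_upright_speed:
        return -10.0, True             # moving too fast near the top: "crash" penalty
    return float(np.cos(theta)), False  # shaping term: closer to upright is better
```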
    Once the agent successfully learned how to perform the swing-up, it was deployed to the Arduino. To perform the deployment, the critic portion of the actor-critic model in the PPO agent was stripped away, and the remaining actor model (3-layer dense neural network) was optimized using Edge Impulse. The model was then deployed to an ESP32S3 to perform the swing-up without any input from the computer.
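    The actor-extraction step can be pictured roughly as follows. The attribute names (mlp_extractor.policy_net, action_net) match Stable Baselines3's default MlpPolicy, but the saved-model name and the ONNX export path are assumptions; in the video, the optimization and ESP32-S3 deployment are handled through Edge Impulse's own tooling.
```python
import torch as th
from stable_baselines3 import PPO

model = PPO.load("pendulum_swingup_ppo")   # hypothetical saved model name


class ActorOnly(th.nn.Module):
    """Keep only the actor half of the PPO actor-critic policy for deployment."""

    def __init__(self, policy):
        super().__init__()
        self.features = policy.features_extractor            # flattens the observation
        self.policy_net = policy.mlp_extractor.policy_net    # actor hidden layers
        self.action_net = policy.action_net                  # logits over the 3 discrete actions

    def forward(self, obs):
        return self.action_net(self.policy_net(self.features(obs)))


actor = ActorOnly(model.policy).eval()
dummy_obs = th.zeros(1, *model.observation_space.shape)
# Export to ONNX as one possible interchange format for downstream optimization.
th.onnx.export(actor, dummy_obs, "pendulum_actor.onnx",
               input_names=["obs"], output_names=["action_logits"])
```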
    Product Links:
    STEVAL-EDUKIT01 - www.digikey.com/en/products/d...
    Seeed Studio XIAO ESP32S3 - www.digikey.com/en/products/d...
    Related Videos:
    • Introduction to Reinfo...
    • Exploring Reinforcemen...
    Related Project Links:
    www.digikey.com/en/maker/proj...
    www.digikey.com/en/maker/proj...
    Learn more:
    Maker.io - www.digikey.com/en/maker
    DigiKey’s Blog - TheCircuit www.digikey.com/en/blog
    Connect with DigiKey on Facebook / digikey.electronics
    And follow us on X (formerly Twitter) / digikey
    00:00 - Introduction
    01:10 - Hardware overview
    03:00 - Modifying the pendulum tower
    04:20 - Arduino communication interface
    04:49 - Overview of reinforcement learning
    06:17 - Reward function
    08:32 - Agent actor-critic deep neural network
    09:33 - Hyperparameter optimization overview
    09:51 - Agent training with Python
    14:57 - Troubleshooting an agent that does not learn
    16:46 - Reduce scope to just swing up and use discrete action space
    18:03 - Train simpler agent
    18:22 - Deploy agent to ESP32
    19:56 - Test agent on the pendulum
    20:46 - Conclusion and further areas of research
  • Science & Technology

COMMENTS • 9

  • @MeanGeneHacks 7 months ago +3

    Very cool you were able to get these results. Getting RL to work on real hardware is notoriously difficult.

  • @TonyHammitt 7 months ago +1

    Back in the dark ages (early 90s), we'd use much smaller neural networks because our computers were about what you have there as the microcontroller. It's a whole new game now with all of these tools. Will be fun to play with, that's for sure.

  • @mostafanfs 7 months ago +2

    Shawn, how do you know everything?! It's very cool that you've mastered all this, and not fair at the same time.

  • @chrisBruner 7 months ago +1

    Wow, fantastic project. I wonder if a NN training a fuzzy logic controller might work better.

  • @eCMastermind 1 month ago +1

    🎉

  • @AdityaMehendale 7 months ago +1

    I can imagine that the allowable latency is a function of the natural time constant of the system. If you are having issues with fine-tuning the latency, would you consider (as a proof of concept) slowing down the natural system? To achieve this, you could, for example, reduce the eccentricity of the pendulum while keeping the moment of inertia constant. To slow it down 4x, you might need to increase the inertia 16x. A pragmatic way to achieve this would be to add a counterweight against the pendulum, so the mass (inertia) increases but the restoring force decreases.
    It would still be an "inverted" pendulum, only far less aggressive.
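    (The scaling in this suggestion follows from the pendulum's characteristic time constant; a brief sketch, with I the moment of inertia about the pivot, m the mass, and d the pivot-to-centre-of-mass distance:)
```latex
% Characteristic time scale of a pendulum (hanging or inverted) about its pivot:
%   tau ~ sqrt(I / (m g d))
\[
  \tau \;\sim\; \sqrt{\frac{I}{m\,g\,d}}
  \qquad\Longrightarrow\qquad
  \tau' = 4\tau \;\;\text{requires}\;\; \frac{I'}{m'\,g\,d'} \;=\; 16\,\frac{I}{m\,g\,d}.
\]
% A counterweight raises I while lowering the effective restoring torque m g d,
% so the 16x ratio can be reached without a 16x increase in inertia alone.
```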

  • @Hellboy-ce9tm 7 months ago +1

    Is the current limit on the stepper driver set too high?

    • @fdavidcamaya 6 months ago

      The potentiometer on the driver of the stepper motor.