Support my work on making tutorials and guides on Patreon!
www.patreon.com/hardware_ai
I'm trying on this project too
Glad to see more information about Isaac gym
I hope your channel gets bigger!
Hello, I am working on this project too, and I have made some progress so far. I hope that we can discuss it and collaborate on this project.
I hope so too!
@@tastlk6351 sry for late reply
I found legged_gym and tried to insert my own model, but it doesn't work well.
This is a really hard project.
@@Hardwareai Can you please tell me something, is it possible to fast forward the learning process to extreme speeds, for example, 30 days would be equivalent to 30,000 years of learning?
0:01
Hello and welcome to my channel, Hardware.ai. This is my first video after trading East for West and moving to Switzerland. It is also going to be the least technical of my recent videos, and it will have none of my usual distracting hand-waving, since I'm still setting up the studio and the green screen. So please sit back and relax while I tell you about reinforcement-learning-powered robots taking over the world.
0:34
Nope, that's not happening anytime soon.
0:43
[Music]
0:50
Since I finished my TinyML course series, I have wanted to focus a bit more on robotics and publish some of last year's projects that I was quietly working on. You may remember that I made a few videos about Bittle, a robotic dog from Petoi: I discussed how to write a custom driver for it and perform teleoperation, and also how to do mapping with lidar and/or cameras. Subscribers to my Twitter knew that I was exploring reinforcement learning for Otto and Bittle.
1:27
Here is where my perfectionism came into play. Reinforcement learning is notoriously hard for real-world problems and, in my humble perfectionist opinion, I did not achieve stellar results and thus did not have material to share.
1:45
Well, watching sentdex's series on training Bittle, I realized that even the paths to the final project, and the experiments, however unsuccessful they might be, are interesting and useful to other people. Worst case, people can just learn from me how not to do reinforcement learning for quadruped robots. Plus, being in academia now has taught me a thing or two about failures in scientific research having value of their own. Spoiler alert, though: it wasn't a complete failure.
2:22
First of all, if you haven't watched sentdex's videos, do watch them. He did a great job explaining many of the basic things that I won't be focusing on in this video.
2:33
I was using NVIDIA Isaac Gym as the simulation environment for the experiments. It's fresh off the development bench and is in fact still in beta. But because it fully utilizes NVIDIA GPU capabilities for simulation, it is possible to keep most elements of your training pipeline, namely the environment, the agents, and the products of their interaction, on the GPU as tensors, which speeds up training by a lot.
3:07
I tried OpenAI Gym before, and while it was possibly the thing that inspired the NVIDIA team to create Isaac Gym, Isaac Gym can now be strongly recommended over OpenAI's Gym. Speed of training means a great deal when testing different reward functions, verifying the correctness of your environment and robot model, and so on. It could very well be the difference between success and never getting past the point where your robot just flies across the environment like a crazed chickadee.
3:46
[Music]
3:52
[Music]
4:02
For my first try, I adapted one of the examples NVIDIA shipped with the first version of Isaac Gym, the Ant walker, to Otto. It uses the PPO algorithm, which stands for Proximal Policy Optimization, an actor-critic method. It is one of the most commonly used baselines for new reinforcement learning tasks, and its variants have also been used to train a robot hand to solve a Rubik's Cube and to win Dota 2 against professional players. So it's a good place to get started.
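For reference, the core of PPO is its clipped surrogate objective, which limits how far a single update can move the policy away from the one that collected the data. A minimal sketch in plain Python follows. The function name `ppo_clip_objective` and the sample numbers are made up for illustration, and the clip range of 0.2 is only a commonly used default, not a value taken from the video or from the Isaac Gym examples.

```python
def ppo_clip_objective(ratios, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (to be maximized).

    ratios:     new_policy_prob / old_policy_prob for each sampled action
    advantages: advantage estimate for each sampled action
    clip_eps:   clip range (0.2 is an assumed, commonly used default)
    """
    total = 0.0
    for r, a in zip(ratios, advantages):
        # Keep the probability ratio inside [1 - eps, 1 + eps].
        clipped_r = max(1.0 - clip_eps, min(1.0 + clip_eps, r))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(r * a, clipped_r * a)
    return total / len(ratios)


# Hypothetical batch of three samples.
ratios = [1.5, 0.5, 1.0]
advantages = [1.0, -1.0, 2.0]
objective = ppo_clip_objective(ratios, advantages)
```

In the first sample the ratio 1.5 is clipped to 1.2, so the policy gains nothing from pushing that action's probability further in a single update.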
4:42
Experimenting with simpler robots also allowed me to get the hang of creating somewhat complex URDF robot descriptions in Phobos, a more or less WYSIWYG ("what you see is what you get") editor that works as a Blender plugin. It was a success, and I was really happy to see that the virtual Otto learned a walking gait resembling the walking gait of a normal Otto.
5:23
After a slight nudge from an Aussie friend of mine, I went on to tackle a more challenging task: teaching a quadruped robot how to walk, first in simulation and then, ideally, using sim-to-real to transfer the learned knowledge to an actual physical robot.
5:43
Creating a URDF for Bittle wasn't a cakewalk, but after some trial and error I was able to create a URDF with 3D models reverse-engineered by a third-party developer, and I shared it on GitHub for people to build on my work. I'm happy to see it was starred quite a few times and used by other people, including sentdex.
6:08
Reinforcement-learning-algorithm-wise, the first thing I tried was adopting the same Ant walker approach. It did not work well, or at all. What is different about Bittle, apart from the inherently greater complexity that comes from having more joints, is that its default pose is not stable. Changing the initial position of the joints, a.k.a. the starting pose, however, just brought different but still unsatisfying gaits, like slight jumping on the knees, walking, still on the knees, and a lot, a lot of falling.
6:45
Among the things that I tried was also tweaking the reward function to incentivize upright movement in a specific direction and staying above a certain height. That just brought more jumping.
7:03
In hindsight, it seems the model was hopelessly overfitting when the only thing it was incentivized to learn was movement in a specific direction for the longest time possible without dying, or being reset in this case, mostly from falling or turning on its back. I mean, in the end, as often happens with reinforcement learning algorithms, it wasn't wrong. Perhaps jumping on its knees was the best way to move in a specific direction for the longest time possible without accidents. It's just not exactly what I wanted from it.
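To make the problem concrete, here is a rough sketch of the kind of reward described above: forward progress, a bonus for staying above a height threshold, and a bonus for not being reset. This is not the code from the video. The function names, weights, and thresholds are all hypothetical and chosen only to illustrate the point.

```python
def naive_reward(forward_velocity, body_height, min_height=0.05,
                 w_forward=1.0, w_height=0.5, alive_bonus=0.1):
    """Reward for moving in one direction while staying above a height.

    All weights and thresholds are hypothetical illustration values.
    """
    height_term = w_height if body_height > min_height else 0.0
    return w_forward * forward_velocity + height_term + alive_bonus


def should_reset(body_height, up_z, fall_height=0.02):
    """Reset ("die") when the body has fallen or flipped on its back.

    up_z is the z component of the body's up vector in the world frame;
    a negative value means the robot is upside down.
    """
    return body_height < fall_height or up_z < 0.0


# Two made-up states with the same forward speed: hopping on the knees
# (body slightly above the threshold) and a proper walk (body higher).
knee_hopping = naive_reward(forward_velocity=0.3, body_height=0.06)
proper_walk = naive_reward(forward_velocity=0.3, body_height=0.09)
```

Both states receive exactly the same reward, so nothing in this formulation tells the optimizer to prefer the walk over the hop.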
7:44
With the second version of Isaac Gym, NVIDIA released the code for quadruped walking for ANYmal robots, and by comparing my old code with it I immediately realized what the missing piece of the puzzle was. Instead of formulating the reward function as just movement in a specific direction for the longest time possible without dying, in order to avoid overfitting they formulated the reward function for ANYmal essentially as just the difference between random angular and linear velocity commands and the actual angular and linear velocities the robot was moving with after being given these commands.
8:32
That teaches the quadruped how to move in different directions and avoids the pitfalls of the previous approach, simply because jumping like a wounded cricket is no longer the best way to maximize the reward function, so a more generalized gait needs to be developed by the algorithm.
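The idea can be sketched as follows: sample a random velocity command for each episode, then reward the robot for matching it. The transcript only says the reward is based on the difference between commanded and actual velocities. The exponential shaping, the `sigma` value, the weights, and the command ranges below are assumptions for illustration, and the names `sample_command` and `tracking_reward` are hypothetical.

```python
import math
import random


def sample_command(rng, max_lin=1.0, max_yaw=1.0):
    """Random (vx, vy, yaw_rate) command; the ranges are assumed values."""
    return (rng.uniform(-max_lin, max_lin),
            rng.uniform(-max_lin, max_lin),
            rng.uniform(-max_yaw, max_yaw))


def tracking_reward(command, lin_vel_xy, yaw_rate,
                    sigma=0.25, w_lin=1.0, w_yaw=0.5):
    """Reward that peaks when the measured velocities match the command."""
    cmd_x, cmd_y, cmd_yaw = command
    # Squared tracking errors for linear (x, y) and angular (yaw) velocity.
    lin_err = (cmd_x - lin_vel_xy[0]) ** 2 + (cmd_y - lin_vel_xy[1]) ** 2
    yaw_err = (cmd_yaw - yaw_rate) ** 2
    # Map each error to (0, 1]: 1 for perfect tracking, near 0 for large errors.
    return w_lin * math.exp(-lin_err / sigma) + w_yaw * math.exp(-yaw_err / sigma)


rng = random.Random(0)
command = sample_command(rng)

# Perfect tracking earns the maximum reward (w_lin + w_yaw).
perfect = tracking_reward((0.5, 0.0, 0.0), (0.5, 0.0), 0.0)
# Standing still while commanded to move forward earns less.
standing = tracking_reward((0.5, 0.0, 0.0), (0.0, 0.0), 0.0)
```

Because the command changes between episodes, a single trick such as hopping forward no longer scores well across all of them.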
8:56
The ANYmal code could not be used as a drop-in replacement for Bittle, and I had to make quite a few tweaks with respect to the initial joint positions, angular and linear velocities, and the reward function. However, in the end it worked reasonably well. I wasn't able to get a perfect walking gait, but for that, I suppose, more research is needed.
9:21
Now for some final thoughts. When making the URDF model for Bittle, I was already contemplating how it would be possible to transfer the trained algorithm to real robots, to bridge the gap between simulation and reality. The code for ANYmal, while working for robots in simulation, takes many observations that won't be accessible on Bittle. The only two sensors that are available are an accelerometer and a gyroscope, which are actually combined in a single MPU unit.
9:55
I placed a virtual gyroscope and accelerometer in the center of the board when making the Bittle URDF, and this is where we can get rotation and speed values from the virtual accelerometer. The next thing to try is training an algorithm that takes these plus velocity commands and outputs angles or torques for the servos.
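As a rough sketch of what such a policy's input and output could look like, the snippet below builds an observation from IMU readings and a velocity command and passes it through a tiny two-layer network. Everything here is an assumption for illustration: the 9-value observation layout, the 16-unit hidden layer, the 8 outputs (one per leg servo), the angle limit, and the random, untrained weights.

```python
import math
import random


def build_observation(gyro, accel, command):
    """Concatenate gyroscope (3), accelerometer (3) and command (3) values."""
    return list(gyro) + list(accel) + list(command)


def dense(inputs, weights, biases, activation=None):
    """One fully connected layer: activation(W @ x + b)."""
    outputs = []
    for row, bias in zip(weights, biases):
        s = bias + sum(w * x for w, x in zip(row, inputs))
        outputs.append(activation(s) if activation else s)
    return outputs


def policy(observation, params, max_angle_rad=1.0):
    """Map an observation to target joint angles within +/- max_angle_rad."""
    (w1, b1), (w2, b2) = params
    hidden = dense(observation, w1, b1, math.tanh)
    raw = dense(hidden, w2, b2, math.tanh)  # each value lies in [-1, 1]
    return [max_angle_rad * r for r in raw]


def random_params(rng, n_in, n_hidden, n_out):
    """Untrained placeholder weights, used only to exercise the code."""
    def layer(rows, cols):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
        return weights, [0.0] * rows
    return layer(n_hidden, n_in), layer(n_out, n_hidden)


rng = random.Random(0)
params = random_params(rng, n_in=9, n_hidden=16, n_out=8)  # 8 outputs is an assumption
obs = build_observation(gyro=(0.0, 0.1, 0.0),
                        accel=(0.0, 0.0, 9.8),
                        command=(0.3, 0.0, 0.0))
angles = policy(obs, params)
```

A network of this assumed size needs 9 × 16 + 16 × 8 = 272 multiply-accumulate operations per control step.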
10:17
The speed at which all of this needs to be executed means that the neural network very likely needs to run on the edge, right on the Bittle main board. The standard NyBoard will not be sufficient, since it only has an ATmega328P chip, so a BiBoard with an ESP32 needs to be used. Fortunately, I have one loyal Bittle equipped with a BiBoard that followed me all the way to Switzerland.
I do need to start adding subtitles :)
Creating RL is going to continue to get faster & easier. It will become thinking like a robot not writing hundreds of lines of code. The T-800 will be here very soon & we'll need experts in edging & ball fondling at the ready.
What is "thinking like a robot"? xD
@@Hardwareai the task will be to set optimal reward functions. To do this you will need to think like your subject (robot). Gym lets you see what he is thinking (or not thinking about) & adjust your functions to suit. The way you arrived at the IMU solution.
Good job and congrats on relocation!
Thank you so much 😀
It is looking more and more like set poses, with their own reward, should also be added in the learning.
Well, I think what you're saying is that imitation learning can be used - that is true, although the setup is a bit cumbersome. What I want to experiment with is RL with feedback, as per deepmind.com/blog/article/learning-through-human-feedback
I quite like this video; it was very helpful. I downloaded your URDF and attempted to replicate Petoi Bittle in Isaac Gym; however, several of the body parts started floating for no apparent reason. The Petoi robot appears to be in a nearly standing position and moves strangely even when I modify its joint positions during training.
Hmmm. I did publish an example code, did you have a look? Does that work normally?
I hope bittle becomes a year long series.
It is likely to become a trilogy :)
Could you please edit the description to contain the git repo? At the moment, it says "WIP" but I am not sure what that means. Nice video, by the way.
Oh-oh, I do need to upload the code then. I'll put a reminder for myself.
Very good
Thanks!
Thanks for the great video. I was wondering where you got the PD gains from. Do we know for sure that Bittle uses a PD controller with those gains on the real robot? I didn't find any documentation on this
Great question! No, unfortunately PD gains are incorrect - they were tweaked to make it work in simulation, as there are quite a few other things needed to make it work on a real robot. Which project are you working on?
hallo sir, do you have a code for your pondo as the example of the video ?
What is the pondo? There is GH repository in the video description.
Strap an nRF24L01 on it, run the net on a big desktop GPU, add some random input delay in the gym to account for lag. I'm curious about the results.
Why is the nRF24L01 necessary? The ESP32 on BiBoard can be wirelessly connected to a PC and it already has an accelerometer/gyro.
Hi. Can you share the github link for the URDF?
It's in the video description.
Thank you for the useful video. I am also trying hard to build my own mobile robot car. Could you please give me some example Python code or a helpful reference? I want to reduce my experiment time.
Right. As mentioned in another comment, I'm wrapped up at the moment, but I put a reminder to clean and upload the code!
Sir, can we make a custom reinforcement learning environment with Isaac Gym?
Of course. This is actually the point of Isaac Gym.
@@Hardwareai can you make a video on that for us, please?
OMG, OMG, OMG! I'm one very excited Ozzy. If you & 100 followers all fail (partially succeed) then share your results in an easy to observe environment (Isaac Gym) you will have a powerful & successful team.
I was trying to pronounce "Aussie". Did I fail? xD
Hey lovely video, will you be sharing the code?
I apologise, I found the GitHub. Thank you for your hard work.
How did you actually transfer the model onto the BiBoard?
That was not done yet.