Don't know how to thank you for the recordings! They are really helping me with my education on RL. Thank you very much for the effort and for making these amazing lectures available to the public.
This is the first lecture everyone new to IRL should watch before any other lecture on UA-cam. It made me understand the other “very technical” lectures I’ve seen.
So clear, and a good starting point.
Thank you, professor! This has been a great starting point for IRL.
Why don't we want the optimized π to be better, i.e., have a higher R value, than the expert's trajectory?
The first assumption in IRL is that the expert policy π* (the one you're imitating) is optimal with respect to some reward function R*. Your current estimate of that reward is R_i; if a policy π does better than π* at optimizing R_i, then R_i != R*. Why? Because your original assumption was that no policy is better than π* when it comes to optimizing R*. So your estimate of R must be wrong, and you need to update it toward one under which the expert policy performs better than your current policy. This brings your estimate closer to R*.
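To make the update step above concrete, here is a minimal, hypothetical sketch in Python. It assumes a linear reward R(s) = w · φ(s) (a common IRL setup, not something stated in the lecture comment itself), where policies are summarized by their average feature counts. If the candidate policy outscores the expert under the current weights w, the weights are nudged toward the expert's features and away from the candidate's. All names (`feature_expectation`, `update_reward_weights`, `lr`) are illustrative, not from the source.

```python
def feature_expectation(trajectories):
    # Average feature vector over a set of trajectories,
    # where each trajectory is a list of per-state feature vectors phi(s).
    dim = len(trajectories[0][0])
    total = [0.0] * dim
    count = 0
    for traj in trajectories:
        for phi in traj:
            for k in range(dim):
                total[k] += phi[k]
            count += 1
    return [t / count for t in total]

def update_reward_weights(w, mu_expert, mu_candidate, lr=0.1):
    # If the candidate policy scores at least as high as the expert under the
    # current reward estimate w, that estimate must be wrong (the expert is
    # assumed optimal under the true R*), so move w to reward the expert's
    # feature counts more and the candidate's less.
    return [wk + lr * (me - mc)
            for wk, me, mc in zip(w, mu_expert, mu_candidate)]

# Toy usage: expert visits feature-0 states, candidate visits feature-1 states.
mu_E = feature_expectation([[[1.0, 0.0], [1.0, 0.0]]])   # -> [1.0, 0.0]
mu_C = feature_expectation([[[0.0, 1.0], [0.0, 1.0]]])   # -> [0.0, 1.0]
w = update_reward_weights([0.0, 0.0], mu_E, mu_C)        # -> [0.1, -0.1]
```

After the update, the expert's feature expectation scores higher than the candidate's under w, which is exactly the "update R_i so the expert wins again" step described above; in a full algorithm you would then re-solve for the optimal policy under the new reward and repeat.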
Brilliant Lecture. Thank you so much
Best Lecture
Thanks Professor