CS885 Lecture 17c: Inverse Reinforcement Learning

  • Published 11 Dec 2024

COMMENTS • 8

  • @nikhilchalla6658
    @nikhilchalla6658 1 year ago +1

    Don't know how to thank you for the recordings! It is really helping me with my education on RL. Thank you very much for the effort and for making the amazing lectures available to the public.

  • @datascience_with_yetty
    @datascience_with_yetty 4 years ago +2

    This is the first lecture everyone new to IRL should watch before any other lecture on YouTube. It made me understand the other “very technical” lectures I’ve seen.

  • @tvsrr1990
    @tvsrr1990 3 years ago +1

    So clear, and a good starting point.

  • @nathan_ca
    @nathan_ca 5 years ago

    Thank you, professor! This has been a great starting point for IRL.

  • @youssefkilani9177
    @youssefkilani9177 3 years ago

    Why don't we want the optimized π to be better, i.e., to have a higher R value, than the expert's trajectory?

    • @vrangaswamy1
      @vrangaswamy1 3 years ago +2

      The first assumption in IRL is that the expert policy π* (the one you're imitating) is optimal with respect to some reward function R*. Your current estimate of that reward is R_i; if your policy π does better than π* at optimizing R_i, then R_i != R*. Why? Because the original assumption was that no policy is better than π* when it comes to optimizing R*. So your estimate of R must be wrong, and you need to update it to one under which the expert policy performs better than your current policy. This brings your estimate closer to R* (a rough sketch of this update loop follows below).
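
      A minimal sketch of that reward-update loop, in the spirit of feature-matching IRL (Abbeel & Ng, 2004), not the lecture's exact algorithm. The helpers compute_optimal_policy and feature_expectations are hypothetical placeholders for a planner and a policy-evaluation routine:

      import numpy as np

      def irl_update_loop(mu_expert, compute_optimal_policy, feature_expectations,
                          n_iters=50, tol=1e-4):
          # mu_expert: empirical feature expectations of the expert policy pi*.
          # compute_optimal_policy(w): returns a policy optimal for R(s) = w . phi(s)  (hypothetical).
          # feature_expectations(pi): returns feature expectations of policy pi        (hypothetical).
          w = np.zeros_like(mu_expert)              # current reward estimate R_i
          for _ in range(n_iters):
              pi = compute_optimal_policy(w)        # best response to R_i
              mu = feature_expectations(pi)
              gap = mu_expert - mu                  # expert's advantage under R_i
              if np.linalg.norm(gap) < tol:         # pi matches the expert, so R_i explains pi*
                  break
              # If pi scores at least as well as pi* under R_i, then R_i cannot be R*;
              # move w toward the expert's feature direction so pi* scores higher again.
              w = gap / np.linalg.norm(gap)
          return w

      Each iteration plays out the argument above: find the best policy under the current reward guess, and if it is not distinguishable from the expert, keep the guess; otherwise update the reward so the expert looks better than the learner.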

  • @GoKotlinJava
    @GoKotlinJava 5 years ago

    Brilliant Lecture. Thank you so much

  • @fairuzshadmanishishir8171
    @fairuzshadmanishishir8171 4 years ago

    Best lecture.
    Thanks, Professor!