Actor-Critic Reinforcement for continuous actions!

  • Published 9 Jul 2021
  • Here's a link to the github repository of the actor-critic method I learned from:
    github.com/pytorch/examples/blob/master/reinforcement_learning/actor_critic.py
    patreon.com/thinkstr

COMMENTS • 11

  • @AM-dj4vp  1 year ago  +4

    very underrated video, literally the best explanation on actor/critic that I've seen. Good Job! and Thanks!

    • @Thinkstr  1 year ago  +2

      Hey, thanks for watching! These are fun to make and I learn a lot. I think my understanding has come a long way since I made this video, so I'll have to make another eventually.

  • @PeterIntrovert  2 years ago  +2

    This is rocket science to me, lol, but I get value from your videos anyway. I learn critical thinking from you. I think I understand and like the general idea. I hope you won't invent Skynet or something in the future. :D

    • @Thinkstr  2 years ago  +3

      Haha, thanks! If I ever invent skynet, I hope it's a NICE skynet.

  • @underlecht  1 year ago

    Great video! I was a bit mixed up about actor-critic, confusing which variable should be back-propagated through the loss function and which should not :D You seem to have done it right.

    • @Thinkstr  1 year ago

      Thanks for watching, I'm glad you liked it! I should be making more of these videos soon...

  • @aprameyandesikan3648  11 months ago  +1

    Hey, awesome video!! I had a question regarding how the model chooses the averages and standard deviations. The output is supposed to be continuous, so how does the model choose a continuous value for the two?

    • @Thinkstr  11 months ago

      Thanks for watching! I'm not sure I understand the question, but I think it's actually easier to make a neural network that outputs in a continuous range than in a discrete range (like categorization). After the actor produces the mean "mu" and standard deviation "sigma", it samples "epsilon" from a standard normal distribution and computes mu + sigma * epsilon; this is called the "reparameterization trick." sassafras13.github.io/images/2020-05-25-ReparamTrick-eqn2.png

    • @aprameyandesikan3648  11 months ago  +1

      Thanks! I think that answers my question. So you essentially take the continuous outputs of your network as the action itself, I presume, instead of categorisation, where the option with the highest probability is chosen?

    • @Thinkstr  11 months ago

      @@aprameyandesikan3648 Yes, exactly!

    • @aprameyandesikan3648  11 months ago  +1

      Awesome, thanks for taking your time to answer my questions! Keep up with the videos!
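The reparameterization trick discussed in the thread above can be sketched in a few lines of PyTorch. This is a minimal, hypothetical actor (the network shape, `obs_dim`, and `act_dim` are assumptions, not taken from the video or the linked repo): the network outputs a mean "mu" and a standard deviation "sigma", then samples a continuous action as mu + sigma * epsilon, with epsilon drawn from a standard normal. Because mu and sigma enter the sample arithmetically, gradients flow back through them.

```python
import torch
import torch.nn as nn

class ContinuousActor(nn.Module):
    """Hypothetical actor for a 1-D continuous action space."""

    def __init__(self, obs_dim=4, act_dim=1):
        super().__init__()
        self.body = nn.Linear(obs_dim, 32)
        self.mu_head = nn.Linear(32, act_dim)                 # predicts the mean "mu"
        self.log_sigma = nn.Parameter(torch.zeros(act_dim))   # learned log of "sigma"

    def forward(self, obs):
        h = torch.tanh(self.body(obs))
        mu = self.mu_head(h)
        sigma = self.log_sigma.exp()           # keep sigma positive
        epsilon = torch.randn_like(mu)         # epsilon ~ N(0, 1), no gradient needed
        action = mu + sigma * epsilon          # the reparameterization trick
        return action, mu, sigma

actor = ContinuousActor()
obs = torch.randn(1, 4)                        # a dummy observation
action, mu, sigma = actor(obs)
```

The sampled `action` is differentiable with respect to the network's parameters, so an actor-critic loss built from it can be back-propagated directly; `torch.distributions.Normal(...).rsample()` does the same thing internally.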