Reward Machines: Structuring Reward Function Specifications and Reducing Sample Complexity...

  • Published 12 Dec 2024

COMMENTS • 7

  • @brucebayley9430 · 2 years ago

    What diagramming software did you use for the LTL diagrams?

  • @erickarwa-0705 · 4 years ago

    This is quite informative.
    I have a question. I am developing a reinforcement learning algorithm for energy optimization. My reward is the inverse of the cost (1/c). I noticed that when I use the inverse of the squared cost (1/c²) instead, the agent performs better and reaches a lower global cost than with 1/c. Do you have an explanation for this?

    • @elsins9790 · 3 years ago

      You changed the magnitude of your reward structure, which can lead to more stable gradients. Often, rewards are clipped between -1 and 1 to accomplish this (see the numeric sketch after the comments).

  • @simonstrandgaard5503 · 4 years ago

    Great explanations.

  • @5ithofnov159 · 5 years ago

    Is a reward system how AI logistics works?

  • @vs7185 · 2 years ago

    Great presentation, thank you! I don't think HRL finds locally optimal solutions. In HRL, options and actions are jointly learned to maximize the overall reward (or minimize the number of steps in this problem).
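
A short numeric sketch of the reward-scaling exchange above (illustrative only, not from the presentation): it evaluates the two reward formulations from the question, 1/c and 1/c², on made-up cost values and applies the [-1, 1] clipping mentioned in the reply. The cost values and function names are assumptions.

    import numpy as np

    def reward_inverse(cost):
        """Reward as the inverse of the cost: r = 1 / c."""
        return 1.0 / cost

    def reward_inverse_squared(cost):
        """Reward as the inverse of the squared cost: r = 1 / c^2."""
        return 1.0 / cost**2

    def clipped(reward, low=-1.0, high=1.0):
        """Clip a reward to [low, high], a common trick for keeping gradient magnitudes stable."""
        return float(np.clip(reward, low, high))

    if __name__ == "__main__":
        costs = [0.5, 1.0, 2.0, 5.0, 10.0]  # hypothetical episode costs
        for c in costs:
            r1, r2 = reward_inverse(c), reward_inverse_squared(c)
            print(f"cost={c:5.1f}  1/c={r1:6.3f} (clipped {clipped(r1):5.2f})  "
                  f"1/c^2={r2:6.3f} (clipped {clipped(r2):5.2f})")

For costs above 1, the squared variant shrinks the reward much faster than 1/c, so expensive episodes are penalized relatively more harshly; that sharper reward landscape is one plausible reason the 1/c² agent settles on lower-cost behaviour.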