This is quite informative. I have a question. I am developing a reinforcement learning algorithm for energy optimization. My reward is the inverse of the cost (1/c). I noticed that when I use the inverse of the square of the cost (1/c²), the agent performs better and reaches a lower global cost than when I use just 1/c. Do you have a reason for this?
You changed the magnitude of your reward structure, which leads to more stable learning signals: 1/c² separates low-cost and high-cost outcomes more sharply than 1/c, so the gradient that distinguishes good actions from bad ones is stronger. Rewards are also often clipped between -1 and 1 to keep the scale under control.
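A minimal sketch of that point, with purely illustrative cost values (none of these numbers come from the talk or the question): squaring the cost in the denominator widens the gap between low-cost and high-cost outcomes, and clipping is one common way to keep the resulting reward scale bounded.

```python
import numpy as np

# Hypothetical costs observed by the agent.
costs = np.array([0.5, 1.0, 2.0, 4.0])

reward_inverse = 1.0 / costs        # 1/c   -> [2.0, 1.0, 0.5, 0.25]
reward_inv_sq = 1.0 / costs**2      # 1/c^2 -> [4.0, 1.0, 0.25, 0.0625]

# 1/c^2 spreads the rewards further apart, so the signal that separates
# low-cost from high-cost behaviour is sharper than with 1/c.
print(reward_inverse)
print(reward_inv_sq)

# A common stabilization trick: clip rewards to a fixed range.
print(np.clip(reward_inv_sq, -1.0, 1.0))
```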
Great presentation, thank you! I don't think HRL finds locally optimal solutions. In HRL, options and actions are jointly learned to maximize the overall reward (or minimize the number of steps in this problem).
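A rough sketch of that argument, with all names and values hypothetical: in SMDP-style Q-learning over options, the option-value update uses the reward accumulated over the option's entire execution, so option selection and the primitive-action policies it invokes are driven by the same overall return rather than by separate local objectives.

```python
import numpy as np

n_states, n_options = 10, 3
gamma, alpha = 0.99, 0.1
Q = np.zeros((n_states, n_options))  # option-values Q(s, o)

def smdp_update(s, o, cumulative_reward, k, s_next):
    """One SMDP Q-learning update after option o ran for k primitive steps.

    cumulative_reward is the discounted reward collected while the option's
    internal (primitive-action) policy executed, so the option-level values
    are trained against the overall return, not a local sub-goal reward.
    """
    target = cumulative_reward + (gamma ** k) * Q[s_next].max()
    Q[s, o] += alpha * (target - Q[s, o])
```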
What diagramming software did you use for the LTL diagrams?
Great explanations.
Is a reward system how AI logistics works?
No, just RL.