Grokking Deep Reinforcement Learning Chapter 3 examples - balancing immediate and long-term rewards

  • Published Feb 9, 2025
  • This chapter shows how to find optimal reinforcement learning policies for the simple slippery walk and frozen lake environments. It first computes the state-value function "V" for a given policy, then derives an improved policy from the action-value function "Q", which is itself computed from "V". It then uses the policy iteration and value iteration strategies to find optimal policies from scratch.
    References:
    Book:
    www.amazon.com...
    Project:
    github.com/mim...
    Code:
    github.com/mim...
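
The value iteration strategy mentioned in the description can be sketched roughly as follows. This is a minimal illustration, not the code from the linked repository: the helper `make_slippery_walk`, its transition probabilities (0.5 intended direction, 1/3 stay, 1/6 opposite direction), the reward of 1 for reaching the right end, and the discount factor are all assumptions chosen for the example.

```python
import numpy as np

def make_slippery_walk(n_nonterminal=5, p_forward=0.5, p_stay=1/3, p_backward=1/6):
    """Hypothetical 1-D walk: P[s][a] is a list of (prob, next_state, reward, done)."""
    n_states = n_nonterminal + 2              # two terminal states, one at each end
    left_end, right_end = 0, n_states - 1
    P = {}
    for s in range(n_states):
        P[s] = {}
        for a, direction in enumerate((-1, +1)):  # action 0 = left, action 1 = right
            if s in (left_end, right_end):
                # Terminal states loop onto themselves with no reward.
                P[s][a] = [(1.0, s, 0.0, True)]
                continue
            outcomes = []
            for prob, step in ((p_forward, direction), (p_stay, 0), (p_backward, -direction)):
                ns = s + step
                reward = 1.0 if ns == right_end else 0.0  # assumed: reward only at the right end
                done = ns in (left_end, right_end)
                outcomes.append((prob, ns, reward, done))
            P[s][a] = outcomes
    return P

def value_iteration(P, gamma=0.99, theta=1e-10):
    """Repeat the Bellman optimality backup until V changes by less than theta."""
    n_states, n_actions = len(P), len(P[0])
    V = np.zeros(n_states)
    while True:
        Q = np.zeros((n_states, n_actions))
        for s in range(n_states):
            for a in range(n_actions):
                for prob, ns, reward, done in P[s][a]:
                    # Do not bootstrap from terminal transitions.
                    Q[s, a] += prob * (reward + gamma * V[ns] * (not done))
        new_V = Q.max(axis=1)
        if np.max(np.abs(new_V - V)) < theta:
            # The greedy policy with respect to the converged Q.
            return new_V, Q.argmax(axis=1)
        V = new_V

P = make_slippery_walk()
V, policy = value_iteration(P)
print(np.round(V, 3))   # state values, terminal states stay at 0
print(policy)           # greedy action per state (1 = right)
```

Policy iteration differs mainly in that it alternates a full evaluation of the current policy with a greedy improvement step, whereas the sketch above folds both into a single max over Q on every sweep.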

COMMENTS • 1

  • @xodlxo 1 year ago +2

    I really appreciate your work!