Can a Reinforcement Learning Agent Learn with NO Rewards? Intrinsic Curiosity Coding Tutorial

  • Published 21 Aug 2024

COMMENTS • 24

  • @elijahberegovsky8957
    @elijahberegovsky8957 2 years ago

    First I’ve gotta say thank you for making this video. I’ve just read the paper, enjoyed it immensely, and wanted to find an implementation. And bang! here you are, with an in-depth guide on making it work.
    Also, please, do Never Give Up as well!

  • @softerseltzer
    @softerseltzer 2 years ago

    Nice! Some weekend activity, thanks!

  • @amegatron07
    @amegatron07 2 years ago

    Thank you very much for giving an example of how to implement ICM. I'm looking forward to trying it myself, and to making my own further experiments with it. One tip, perhaps: as a strong adherent of separation of concerns, I believe it would be better to focus less on the parts of the code that are less relevant to the core topic, and just take already-written components. I believe that would save a lot of time :)

  • @TaganMorgul
    @TaganMorgul 2 years ago

    Thank you very much for such a detailed ICM explanation! I tried to implement it some time ago, but with gym envs like CartPole or LunarLander I found it didn't perform as expected, probably due to the absence of the "state encoding" part, which I now think is a very important part of the work. I also didn't use A3C for my experiments, but rather A2C. In the end, I found that the "Random Network Distillation" algorithm works much better for the same purpose and is also free of the "TV on the wall" defect that ICM has.
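    The Random Network Distillation idea this comment mentions can be sketched in a few lines: a fixed, randomly initialised target network is never trained, while a predictor network is trained to imitate it on visited states; the predictor's error is the exploration bonus, which shrinks for familiar states. The network shapes and names below are illustrative, not the paper's architecture — a minimal NumPy sketch:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def make_net(in_dim, hidden, out_dim, rng):
        """Random two-layer network: params as (W1, b1, W2, b2)."""
        return (rng.normal(0, 0.5, (in_dim, hidden)), np.zeros(hidden),
                rng.normal(0, 0.5, (hidden, out_dim)), np.zeros(out_dim))

    def forward(params, x):
        W1, b1, W2, b2 = params
        h = np.tanh(x @ W1 + b1)
        return h @ W2 + b2

    # Fixed, randomly initialised target network: never trained.
    target = make_net(4, 32, 8, rng)
    # Predictor network: trained to imitate the target on visited states.
    predictor = make_net(4, 32, 8, np.random.default_rng(1))

    def intrinsic_reward(state):
        """RND bonus: prediction error of the predictor vs. the frozen target."""
        err = forward(predictor, state) - forward(target, state)
        return float(np.mean(err ** 2))

    def train_predictor(params, state, lr=1e-2):
        """One manual gradient-descent step on the MSE toward the target."""
        W1, b1, W2, b2 = params
        h = np.tanh(state @ W1 + b1)
        out = h @ W2 + b2
        grad_out = 2 * (out - forward(target, state)) / out.size
        gW2 = np.outer(h, grad_out)
        grad_h = (grad_out @ W2.T) * (1 - h ** 2)
        gW1 = np.outer(state, grad_h)
        return (W1 - lr * gW1, b1 - lr * grad_h, W2 - lr * gW2, b2 - lr * grad_out)

    # A state visited many times becomes "boring": its bonus shrinks.
    s = rng.normal(size=4)
    before = intrinsic_reward(s)
    for _ in range(500):
        predictor = train_predictor(predictor, s)
    after = intrinsic_reward(s)
    ```

    Because the target is a function of the state only, there is no stochastic transition for the predictor to chase, which is why RND sidesteps the noisy-TV trap the comment alludes to.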

  • @qiluo6299
    @qiluo6299 2 years ago

    This is a great video, thanks for sharing!

  • @orsimhon133
    @orsimhon133 1 year ago +1

    Hi Phil, thank you very much for this tutorial!
    As I understand the ICM, the inverse model should be trained together with the encoder NN (which we do not use here) in order to teach the encoder which parts of the state are controllable by the agent.
    So if we don't need the encoder here, we also don't need the inverse model, do we?
    Looking forward to some answers, thanks again!

    • @akashvyas7715
      @akashvyas7715 2 months ago

      I was thinking the same thing. Did you try removing the inverse model?
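    The wiring this thread is debating can be made concrete. In ICM, the encoder phi maps observations to features; the inverse model predicts the action from phi(s) and phi(s'), and its loss is what shapes the encoder toward agent-controllable features; the forward model predicts phi(s') from phi(s) and the action, and its error is the curiosity bonus. All dimensions and weight names below are hypothetical, and no training loop is shown — just the forward passes:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    OBS_DIM, FEAT_DIM, N_ACTIONS = 8, 16, 4

    # Hypothetical parameter shapes, for illustration only.
    W_enc = rng.normal(0, 0.3, (OBS_DIM, FEAT_DIM))               # encoder phi
    W_inv = rng.normal(0, 0.3, (2 * FEAT_DIM, N_ACTIONS))         # inverse model
    W_fwd = rng.normal(0, 0.3, (FEAT_DIM + N_ACTIONS, FEAT_DIM))  # forward model

    def phi(obs):
        """Encoder: raw observation -> feature vector."""
        return np.tanh(obs @ W_enc)

    def inverse_model(f_s, f_s2):
        """Predict which action led from phi(s) to phi(s'). Backprop through
        this loss is what trains the encoder to keep controllable features."""
        logits = np.concatenate([f_s, f_s2]) @ W_inv
        e = np.exp(logits - logits.max())
        return e / e.sum()

    def forward_model(f_s, action_onehot):
        """Predict phi(s') from phi(s) and the chosen action."""
        return np.concatenate([f_s, action_onehot]) @ W_fwd

    def intrinsic_reward(obs, action, obs2, eta=0.5):
        """ICM curiosity bonus: forward-model error in feature space."""
        f_s, f_s2 = phi(obs), phi(obs2)
        pred = forward_model(f_s, np.eye(N_ACTIONS)[action])
        return eta * 0.5 * float(np.sum((pred - f_s2) ** 2))

    s, s2 = rng.normal(size=OBS_DIM), rng.normal(size=OBS_DIM)
    r_i = intrinsic_reward(s, 2, s2)
    ```

    Seen this way, the commenters' point holds together: if the encoder is dropped (features are just the raw observation), the inverse model has nothing left to shape, so only the forward model is strictly needed for the bonus.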

  • @61Marsh
    @61Marsh 2 years ago +2

    I worked on this last year and ended up developing it, but my full solution never quite lived up to my expectations. I always wondered if I had implemented it correctly; time to verify against yours. Thanks.

  • @bobingstern4448
    @bobingstern4448 2 years ago +2

    Hey, I was working on a genetic NEAT-like algorithm, but I don't know how to cross over two neural networks with different topologies. Is there a procedure for doing this, or do you just choose a random one when this happens?

    • @royvivat113
      @royvivat113 2 years ago

      If you look at the NEAT paper, it explains specifically how to do it. It has to do with keeping track of the topological history, I believe.
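      The "topological history" the reply refers to is NEAT's global innovation numbers: every new connection gene gets one, so genes from parents with different topologies can be aligned by number. Matching genes are inherited randomly from either parent; disjoint and excess genes come from the fitter parent. A minimal sketch, with genomes simplified to dicts mapping innovation number to connection weight:

      ```python
      import random

      def neat_crossover(fit_genome, weak_genome, rng=random.Random(0)):
          """NEAT-style crossover. Genomes map a gene's global innovation
          number to its connection weight. Matching genes (present in both
          parents) are inherited randomly from either parent; disjoint and
          excess genes are taken from the fitter parent only."""
          child = {}
          for innov, weight in fit_genome.items():
              if innov in weak_genome and rng.random() < 0.5:
                  child[innov] = weak_genome[innov]  # matching gene, weaker parent
              else:
                  child[innov] = weight              # matching, disjoint, or excess
          return child

      parent_a = {1: 0.5, 2: -0.3, 4: 0.9}   # the fitter parent
      parent_b = {1: 0.1, 3: 0.7, 4: -0.2}
      child = neat_crossover(parent_a, parent_b)
      ```

      Here gene 3 is dropped because it exists only in the less fit parent, while gene 2 survives from the fitter one, so the child's topology is always at least that of the fitter parent.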

  • @leo.y.comprendo
    @leo.y.comprendo 2 years ago

    I was just reading about this!

  • @WilliamChen-pp3qs
    @WilliamChen-pp3qs 21 days ago

    How would it perform compared with HER (hindsight experience replay)?

  • @yualan2158
    @yualan2158 1 year ago

    First of all, I have to thank you for making this video. I made the necessary modifications to apply it to the "MountainCar-v0" problem, which is a truly sparse-reward environment. However, it doesn't work. Can you check whether the code succeeds in this environment? Thanks!

  • @mehranzand2873
    @mehranzand2873 2 years ago

    thanks a lot

  • @tanerylmaz8340
    @tanerylmaz8340 1 year ago

    Hello there,
    Can we save the trained model in this example? And is it then possible to test the model we trained in another environment? How would we do that? That way we could see the success and performance of the trained model more clearly. Could you help?
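    The usual PyTorch pattern for the first half of this question is to save the network's `state_dict` and rebuild the same architecture before loading. The `ActorCritic` class below is a hypothetical stand-in for whatever network the tutorial builds, not the video's actual code:

    ```python
    import torch
    import torch.nn as nn

    # Hypothetical stand-in for the tutorial's network class.
    class ActorCritic(nn.Module):
        def __init__(self, obs_dim=4, n_actions=2):
            super().__init__()
            self.shared = nn.Linear(obs_dim, 32)
            self.pi = nn.Linear(32, n_actions)  # policy head
            self.v = nn.Linear(32, 1)           # value head

        def forward(self, x):
            h = torch.relu(self.shared(x))
            return self.pi(h), self.v(h)

    model = ActorCritic()
    torch.save(model.state_dict(), "actor_critic.pt")   # weights only, not the class

    # Later, or in another script: rebuild the architecture, then load.
    restored = ActorCritic()
    restored.load_state_dict(torch.load("actor_critic.pt"))
    restored.eval()   # switch to evaluation mode for testing
    ```

    On the second half of the question: reloading into a *different* environment only works if the new environment has the same observation and action dimensions; otherwise the saved layer shapes will not match and `load_state_dict` will raise an error.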

  • @tsunamio7750
    @tsunamio7750 2 years ago +1

    I'm pretty sure everything you said could be compacted into fewer words, with fewer domain-specific terms. At some points I can follow you, but the jargon is exploding in my face.

  • @tsunamio7750
    @tsunamio7750 2 years ago

    Feature vector, feature map... we have so many terms.

  • @chadmcintire4128
    @chadmcintire4128 2 years ago

    This seems really similar to the entropy term in SAC.

  • @sounakmojumder5689
    @sounakmojumder5689 1 month ago

    Hi, did anyone run this in Google Colab? Is there any problem with spawning?