Rich Sutton, Toward a better Deep Learning

  • Published Feb 2, 2025

COMMENTS • 10

  • @mavenlin
    @mavenlin 4 months ago +8

    But how are we going to prevent interference with old data when we change the backbone?
    IMO, the major issue with a dynamic architecture is that when the fringe joins the backbone, it not only provides the capacity to deal with new data, it also changes the function mapping for all the old data. This change can be catastrophic, especially over a very long temporal sequence, where some early data may take a long time to appear again.
    I guess some form of information about the old data is still needed, e.g. a replay buffer or a Bayesian posterior over the weights.

    • @christopherbentley6647
      @christopherbentley6647 2 months ago

      No idea, but it sounds like the backbone never changes, it only grows

    • @mavenlin
      @mavenlin 2 months ago +1

      @christopherbentley6647 But when you grow, the newly added part will interfere with the function mapping for the old data, unless you choose not to activate it for the old data. But then how do you decide what to activate? And how do you evolve that decision "continually"?

    • @stevenkao4800
      @stevenkao4800 2 months ago

      I think the shadow weights are meant to address this. The shadow weights were initialized in accordance with their master weights, and hence they provide activations in a similar direction to the backbone. In some sense, the shadow weights have inherited some of the knowledge learned from the old data.
      ---
      One goal of continual learning is to keep the network from re-learning previously learned knowledge, so a replay buffer seems to contradict this goal.

    • @mavenlin
      @mavenlin 2 months ago

      @@stevenkao4800 I don't like replay buffers either. But if growing/pruning the network is ever going to work without replaying, there needs to be some form of theory that guarantees the retention of old information. I wonder whether the "similar direction" argument can be formalized into such a guarantee.

    • @stevenkao4800
      @stevenkao4800 2 months ago

      I think that even with today's standard neural networks, it is hard to give any theoretical guarantees about their knowledge or abilities. The most we can say is that they seem to work well most of the time.
      This is actually a strength of neural networks: their approximate nature makes them extremely flexible.
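[Editor's note] The interference worry raised at the top of this thread can be made concrete with a toy linear model. This is a minimal sketch, not anything from the talk; all names are illustrative. Growing the network adds a new unit whose outgoing weight, if nonzero, shifts the outputs on old inputs; initializing that weight to zero preserves the old mapping exactly at the moment of growth, though later training can still disturb it:

```python
import numpy as np

rng = np.random.default_rng(0)

# Old task: a linear map already learned on 2 input features (the "backbone").
X_old = rng.normal(size=(5, 2))
w_backbone = np.array([1.0, -2.0])
y_before = X_old @ w_backbone

# Grow the network: a third unit joins (the "fringe" joining the backbone).
# Old inputs now also activate the new unit.
x_new_unit = rng.normal(size=(5, 1))
X_grown = np.hstack([X_old, x_new_unit])

# Random init of the new weight: outputs on OLD data shift (interference).
w_random = np.concatenate([w_backbone, rng.normal(size=1)])
print(np.max(np.abs(X_grown @ w_random - y_before)) > 0.0)  # True

# Zero init of the new weight: old outputs are preserved exactly at growth time.
w_zero = np.concatenate([w_backbone, [0.0]])
print(np.allclose(X_grown @ w_zero, y_before))  # True
```

Zero-initializing a new unit's outgoing weight is one common way to grow without instantly changing the function; it does not, by itself, stop subsequent gradient steps on new data from moving the backbone weights, which is the harder part of the question.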

  • @hemig
    @hemig 4 months ago +3

    Great thinking. But doesn't this reverse regularization and tend to overfit?

    • @andrewferguson6901
      @andrewferguson6901 4 months ago +6

      Overfit continuously and you might end up somewhere
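[Editor's note] The replay buffer debated in the first thread is often implemented as a uniform sample of the data stream. A minimal reservoir-sampling sketch (illustrative, not from the talk; class and method names are assumptions):

```python
import random

class ReplayBuffer:
    """Keeps a uniform random sample of every item seen so far (reservoir sampling)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = []
        self.seen = 0

    def add(self, item):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
        else:
            # Keep the new item with probability capacity / seen, so every
            # item in the stream ends up stored with equal probability.
            j = random.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = item

    def sample(self, k):
        """Draw up to k stored items to mix into the current training batch."""
        return random.sample(self.items, min(k, len(self.items)))


buf = ReplayBuffer(capacity=10)
for t in range(1000):
    buf.add(t)
print(len(buf.items))      # 10
print(len(buf.sample(4)))  # 4
```

Mixing a few sampled old items into each new batch is the standard mitigation for the interference mavenlin describes; his objection stands, though, since it retains old information by storage rather than by any guarantee about the weights.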