Yoshua Bengio Guest Talk - Towards Causal Representation Learning

  • Published Dec 2, 2020
  • In this causalcourse.com guest talk, Yoshua Bengio discusses causal representation learning.
    Playlist link: • Yoshua Bengio Guest Ta...

COMMENTS • 20

  • @Ceelvain
    @Ceelvain 3 years ago +20

    OMG!
    I need Yoshua Bengio as a PhD advisor. I don't care that I already have a PhD.
    *Everything* he said is exactly what I've been thinking since I started learning ML a few years ago.
    This is exactly what I want to work on! It is exactly what I believe is the future of ML, AI, and humanity.
    What he frames in terms of generalization (and zero-shot learning), I used to frame in terms of discrete (symbolic) model learning.
    What he calls "compositionality" (in the sense of systematic generalization) or "independent mechanisms" is what I used to call "separability".
    That analogy with "System 1 and System 2" modes of reasoning is what I've had a hard time explaining to my fellow ML enthusiasts.
    Learning a causal model by computing the weights on a Bayesian network is also something I thought about. But it assumes we already have the right abstract representational variables (ball position, not pixels). Therefore I didn't think much about it afterwards.
    But the solution with an encoder that can pass the gradient back is very interesting (see the sketch after this thread). BTW, is the optimisation problem for determining the edges of the causal graph actually convex? Does it always converge with edges saturating to 0 or 1? I should probably read some of those papers.
    I would also like to point out that we already have very efficient tools to perform some logical reasoning, like SMT solvers. I think combining them with a learned causal model could produce really awesome results for long-term planning.
    This brings me to another hard issue I've had: hierarchical planning. We humans don't plan our every move according to our causal mental model; the whole model is far too complex. We plan the outline, then plan each segment according to the current situation as we observe it.
    How do we make those "probable shortcut actions" fit into the causal graph? How do we decide what level of detail is needed?

    • @amaizel
      @amaizel 2 years ago +3

      If you have a PhD, you can write a research proposal and try to do a postdoc with him as a supervisor.

    • @kemalbey271
      @kemalbey271 4 months ago

      What are you up to? Very inspiring writing, thank you.
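
    A minimal sketch, assuming a PyTorch-style setup, of the differentiable edge-belief idea the thread above asks about (the toy graph, data, and penalty weight are illustrative, not from the talk): each candidate edge gets a logit, a sigmoid turns it into an edge belief in (0, 1), and beliefs and mechanisms are trained jointly by gradient descent. The joint objective is non-convex in general; saturation toward 0 or 1 is an empirical tendency, helped here by a sparsity penalty, not a guarantee.

    ```python
    import torch

    n_vars = 3
    # One logit per candidate edge; a sigmoid maps it to an edge belief in (0, 1).
    edge_logits = torch.zeros(n_vars, n_vars, requires_grad=True)
    # Per-edge linear mechanisms, learned jointly with the graph structure.
    weights = torch.randn(n_vars, n_vars, requires_grad=True)
    opt = torch.optim.Adam([edge_logits, weights], lr=0.01)

    true_graph = torch.tensor([[0., 1., 0.],
                               [0., 0., 1.],
                               [0., 0., 0.]])  # toy ground truth: 0 -> 1 -> 2

    def soft_adj():
        # Soft adjacency matrix: edge beliefs with self-loops masked out.
        return torch.sigmoid(edge_logits) * (1 - torch.eye(n_vars))

    for step in range(2000):
        x = torch.randn(64, n_vars)      # stand-in for abstract causal variables
        y = x @ true_graph               # toy "effect" observations
        adj = soft_adj()
        pred = x @ (adj * weights)       # gradients flow through the edge beliefs
        # Fit term plus an L1 sparsity penalty that pushes spurious beliefs to 0.
        loss = ((pred - y) ** 2).mean() + 1e-2 * adj.sum()
        opt.zero_grad()
        loss.backward()
        opt.step()

    print(soft_adj().detach())  # learned edge beliefs; spurious edges shrink toward 0
    ```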

  • @LuyangLUO
    @LuyangLUO 3 years ago

    Thanks to Yoshua and Brady for this wonderful talk.

  • @connorshorten6311
    @connorshorten6311 3 years ago +15

    Amazing. Thank you for all your hard work on this channel; I'm learning a lot!

    • @hoaxuan7074
      @hoaxuan7074 3 years ago

      Yoshua was one of the first people to introduce ReLU to the world. If he had had an analog electronics background, he might have made an observation.
      Anyway, I had a brief exchange of emails with him about associative memory and random-projection-based neural nets. Then I got on his wrong side by saying I had found evolution-based algorithms to be okay for training nets. Though it's true I was using especially fast nets. I think the very simple evolution algorithm Continuous Gray Code optimization works well, and I got very good generalization behaviour, likely because the full training set is used at each step rather than batches. That might seem unmanageable at scale. However, federated learning is very simple using short sparse lists of mutations: each CPU has the full neural model and part of the training data (which can be private and local). Each CPU is sent the same short list of mutations and returns the cost for its part of the training data. The costs are summed, and if there is an improvement an accept message is sent to each CPU, else a reject message. (A toy sketch of this loop follows below.)
      Very little data is moving around. If you have a training set of 1 million images and 1000 CPU cores (e.g. 250 Raspberry Pi 4s), then each core only has to evaluate 1,000 images. Quite manageable. And the cost is only about the same as 1 or 2 high-end GPUs. Seems okay to me. Of course, especially fast neural networks would help.
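
      A toy sketch of the distributed accept/reject loop described above, assuming nothing beyond the comment itself: the Worker class, the stand-in cost function, and the Gaussian mutations are all illustrative (the Continuous Gray Code scheme the comment names is not reproduced here). Each worker holds the full model and a private data shard; only sparse (index, delta) lists and scalar costs move over the network.

      ```python
      import random

      N_PARAMS, N_WORKERS, MUTATIONS_PER_STEP = 1000, 4, 8

      class Worker:
          """Holds the full model plus a private shard of the training data."""
          def __init__(self, data_shard):
              self.params = [0.0] * N_PARAMS
              self.data = data_shard

          def cost(self, mutations):
              # Apply the sparse mutation list, score the local shard, then revert.
              for i, delta in mutations:
                  self.params[i] += delta
              mean = sum(self.params) / N_PARAMS
              c = sum((x - mean) ** 2 for x in self.data)  # toy stand-in cost
              for i, delta in mutations:
                  self.params[i] -= delta
              return c

          def accept(self, mutations):
              for i, delta in mutations:
                  self.params[i] += delta

      workers = [Worker([random.gauss(0, 1) for _ in range(250)])
                 for _ in range(N_WORKERS)]
      best = sum(w.cost([]) for w in workers)  # cost over the FULL training set

      for step in range(500):
          # Coordinator broadcasts one short sparse mutation list to every worker;
          # only (index, delta) pairs and scalar costs ever cross the network.
          muts = [(random.randrange(N_PARAMS), random.gauss(0, 0.1))
                  for _ in range(MUTATIONS_PER_STEP)]
          total = sum(w.cost(muts) for w in workers)
          if total < best:          # improvement: broadcast "accept"
              best = total
              for w in workers:
                  w.accept(muts)
          # otherwise: broadcast "reject"; nobody applies the mutations
      ```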

  • @DistortedV12
    @DistortedV12 3 years ago

    Someone make this guy live forever!

  • @InquilineKea
    @InquilineKea 2 years ago +1

    How can you decompose foundation/large language models into informationally independent pieces (modules, mechanisms), i.e. a factor graph? This presentation covers everything that is missing from the NN/DL hype; but if NN/DL makes it easier to create representations to perform logic on, then it complements this agenda.

  • @entrastic
    @entrastic 3 years ago +2

    It is a shame that there is no references slide. How am I supposed to find Kim et al. 2019 or Bengio et al. 2017?

  • @TM-su7vu
    @TM-su7vu 6 months ago

    Fantastic presentation. I know it's fairly old now, but is there any chance of the full presentation and/or a full citation list being shared?

  • @JTMoustache
    @JTMoustache 3 years ago +3

    Amazing talk, but all knowledge is verbalisable, even if the access might not be explicit. For example, it is possible to verbalise how to walk, even though it is implicit knowledge. And credit assignment is not limited to short causal chains at all: just look at addiction, where the brain is able to associate very complex sequences of behaviours with consumption. Or consider how humans understand drama; theory of mind can follow very long chains of actions and reactions during human interactions.

    • @Ceelvain
      @Ceelvain 3 years ago +1

      Humans are very good at post-hoc explanations. We make them up, and they seem logical, but they are in no way the actual knowledge that we have; this is confabulation. For instance, there's a psychology experiment where the experimenter shows pairs of pictures of faces to the subject, and the subject chooses which one is the most attractive. Then they go over the pairs of pictures again and the subject is asked why they chose this one. The trick is, some pictures have been swapped. Yet the subject is very much convinced it was their choice and comes up with an explanation that supports it, even if they actually made the opposite choice. Even more, when asked, they are certain that what they say is the absolute truth about their own choice.
      Therefore, we can conclude that what they verbalise is not the implicit knowledge of who is more attractive to them. It's a story the brain tricks you into believing in order to reduce cognitive dissonance.
      I'm not sure whether credit assignment works over long causal chains, though. I mean, the causal graph might have shortcuts.

  • @444haluk
    @444haluk 2 years ago +1

    Now I understand why I loved Bengio's lessons but hated his discussions and papers! He is talking about all the RIGHT concepts, but when he gets to the implementation phase, he thinks in terms of graphs, convolutions, vectors, and backprop. The human brain is a binary spiking machine (like Loihi 2). It can do everything he describes via causal links (place, grid, head-direction, and vector cells), memory (neocortex), and prediction (neocortex) under Hebbian learning rules: better, faster, more efficiently, and closer to how humans do it.
    Listen to everything he says except how he DOES it, folks, and you are going to be just fine.
    HTM solves all of this (in theory) by creating an n-dimensional space where the axes hold both different motivational gradients and "close"/"closed" states. They will probably try to solve it like a maze problem, so you can compose, avoid, or get close to a state. Ever wonder why we draw graphs the way we do? "Close" circles in distinct "positions", some "ways" to "go" there; sometimes there is no "way" to do it, and you cannot just "jump" to a conclusion. Maybe what we have in our mind is literally a sheet (grid cells), with what we want on one side (vectors) and where we are on the other, and life is literally a journey (place cells) from A to B. Think about it: why do you represent things in your mind the way you do?

    • @manojdude3
      @manojdude3 2 years ago

      Hey, what's HTM?

    • @444haluk
      @444haluk 2 years ago

      @@manojdude3 Search for the Numenta HTM School video playlist.

    • @manojdude3
      @manojdude3 2 years ago

      @@444haluk thanks 👍

    • @buh357
      @buh357 1 year ago

      Does HTM stand for Hierarchical Temporal Memory?

  • @mathtick
    @mathtick 3 years ago +1

    The talk seems mostly fairly obvious (if you are thinking about causal reasoning and understand composition, API semantics for "why", passing on modes, temporal learning, etc.). He skipped over all the interesting details of the things in the papers.
    Can someone clarify who the intended audience is, and whether any of this stuff is not obvious, or at least surprising, to young AI folks? Is everyone just learning deep regression? Or am I missing something? I am asking about *why this talk* and *who is the audience*, basically.
    The only surprising/interesting thing for me was that he associated catastrophic forgetting with bad causal factor decompositions.
    It was a good list of papers to have a look at for learning large causal networks, though.

    • @off4on
      @off4on 3 years ago

      I have the same feeling. I'm not sure who he is talking to.
      I know he's a big deal and whatnot, but this is a terrible talk.

    • @aiikaiik
      @aiikaiik 3 years ago +5

      At least it was quite informative for me. Hey, why don't you give your own talk, and let's see?