Can LLMs Reason & Plan? (Talk)

  • Published 28 Jan 2025

COMMENTS • 9

  • @julkiewitz
    @julkiewitz 9 months ago +2

    So basically an AlphaGo Master (?) architecture? Seems like AlphaGo Zero was kind of an appendix, in the sense that it just got rid of its planner-driven System 2 in favor of a hugely overgrown System 1. Now that's good enough against humans, who cannot possibly analyse that many possibilities either and who often revert to System 1 as well. But maybe that's actually an inferior architecture for generalization, at least until somebody actually makes progress on NN-driven System 2s.
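
A minimal sketch of the System-1-vs-System-2 contrast this comment describes, assuming a toy game and a deliberately weak heuristic standing in for a learned policy. This is illustrative only, not AlphaGo's actual architecture; all function names here are made up.

```python
# Toy Nim: remove 1-3 stones; whoever takes the last stone wins.
# "System 1" answers from the policy alone; "System 2" wraps the very
# same policy in a shallow lookahead search.

def legal_moves(stones):
    return [n for n in (1, 2, 3) if n <= stones]

def policy_score(stones, move):
    # Stand-in for a learned policy: greedily prefers big moves (imperfect).
    return move

def system1_move(stones):
    # System 1: one "forward pass", take the move the policy likes best.
    return max(legal_moves(stones), key=lambda m: policy_score(stones, m))

def system2_move(stones, depth=4):
    # System 2: depth-limited negamax over the same moves, falling back
    # to the policy's judgement at the search horizon.
    def value(s, d):
        if s == 0:
            return -1.0  # the previous player took the last stone and won
        if d == 0:
            return max(policy_score(s, m) for m in legal_moves(s)) / 3.0
        return max(-value(s - m, d - 1) for m in legal_moves(s))
    return max(legal_moves(stones), key=lambda m: -value(stones - m, depth - 1))

print(system1_move(5), system2_move(5))  # -> 3 1: the planner overrides the greedy policy
```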

  • @kulkarniankur
    @kulkarniankur 6 months ago

    I think you have shown that LLMs cannot reason or plan under your definition of planning. But they can compose essays, and doesn't the very act of composing an essay involve a kind of planning -- the organization of ideas, breaking them down into paragraphs, and expressing them through carefully chosen words? They seem to be doing planning, but in the domain of words and linguistically expressed ideas.

    • @szebike
      @szebike 5 months ago +1

      What you describe can be extracted statistically: given enough essay training data, you can learn where to put which words so the result looks like a convincing essay, without ever really thinking about creating an essay.
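
A toy illustration of this reply's point, assuming a tiny corpus and the simplest possible statistical model, a bigram counter. Real LLMs are vastly richer, but the principle of learning "where to put which words" with no explicit plan is the same.

```python
# Purely statistical next-word generation: count which word follows
# which, then sample. There is no outline and no goal, yet the output
# is locally essay-shaped.
import random
from collections import Counter, defaultdict

corpus = (
    "in conclusion the evidence suggests that planning matters . "
    "the evidence suggests that structure matters . "
    "in conclusion the structure suggests that evidence matters ."
).split()

# All the "knowledge" this model has: co-occurrence counts.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def generate(start, length=12):
    words = [start]
    for _ in range(length):
        options = follows[words[-1]]
        if not options:
            break
        # Sample proportionally to observed counts; no thinking involved.
        words.append(random.choices(list(options), weights=options.values())[0])
    return " ".join(words)

print(generate("in"))  # locally fluent, globally unplanned
```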

  • @BrianPeiris
    @BrianPeiris 8 months ago +1

    Prof. Rao, I've had a short discussion with Liron Shapira and we were wondering if you feel strongly enough about this argument to make a prediction about what GPT-5 *won't* be able to do. Assuming GPT-5 will just be a bigger transformer with more training data, more parameters, and better RLHF, could you predict that it still won't be able to solve your Randomized Mystery Blocksworld problems past, say, 10%? (A sketch of what that randomization involves follows this thread.)

    • @billykotsos4642
      @billykotsos4642 8 months ago

      does solving 10% of the problems make it impressive?

    • @BrianPeiris
      @BrianPeiris 8 months ago

      @billykotsos4642 Maybe not impressive, but it would be surprising. At 20:16, Rao shows that GPT-4 can only get to 2% on Randomized Mystery Blocksworld. Humans solve it at close to 100%. Going from 2% to 10% would at least be a bit of a signal that there's more to transformer-based LLMs than expected.

    • @billykotsos4642
      @billykotsos4642 6 days ago

      @BrianPeiris I wonder how o1 would fare here

    • @BrianPeiris
      @BrianPeiris 6 days ago

      @billykotsos4642 Indeed, or o3 for that matter. I would also like to see updated stats on the Blocksworld problems, but the ARC-AGI scores for o3 are pretty surprising. Chollet thinks that ARC-AGI-2 will bring the scores down considerably, though, so it's possible that Blocksworld is still a challenge.

    • @billykotsos4642
      @billykotsos4642 6 days ago +1

      @BrianPeiris I just had a look, and there is a new paper on arXiv by the author covering o1-preview. It seems there is a significant step up compared to LLMs (the paper calls o1-like models LRMs, 'Large Reasoning Models'). I need to go through the paper thoroughly though… planning is obviously something that simple LLMs, and even LRMs, can't do out of the box. It would also be great to see how the DeepSeek models fare on these benchmarks.
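
For readers wondering what the "Randomized Mystery Blocksworld" discussed in this thread involves: the idea is to take ordinary Blocksworld and consistently rename its actions and predicates to meaningless tokens, so performance cannot come from memorized surface vocabulary. Below is a minimal sketch of that randomization over a PDDL-style snippet; it is illustrative only, not the actual PlanBench code, and the helper names are made up.

```python
# Consistently replace every Blocksworld action/predicate name with a
# random token: the planning problem is unchanged, but an LLM can no
# longer lean on familiar words like "stack" or "on-table".
import random
import re
import string

BLOCKSWORLD = """(:action pick-up :precondition (and (clear ?x) (on-table ?x) (arm-empty))
                 :effect (and (holding ?x) (not (on-table ?x))))"""

NAMES = ["pick-up", "put-down", "stack", "unstack",
         "clear", "on-table", "arm-empty", "holding", "on"]

def random_token(rng, length=8):
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))

def mystify(pddl_text, seed=0):
    rng = random.Random(seed)
    mapping = {name: random_token(rng) for name in NAMES}
    # Longest names first, so "on-table" is rewritten before "on"; the
    # lookarounds stop "on" from matching inside words like "precondition".
    for name in sorted(mapping, key=len, reverse=True):
        pattern = r"(?<![\w-])" + re.escape(name) + r"(?![\w-])"
        pddl_text = re.sub(pattern, mapping[name], pddl_text)
    return pddl_text, mapping

obfuscated, mapping = mystify(BLOCKSWORLD)
print(obfuscated)  # same planning problem, unfamiliar vocabulary
print(mapping)     # the key needed to decode it back
```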