o3 Inference Time CoT Reasoning: How relevant is SFT and RL?

  • Published 22 Dec 2024

COMMENTS • 11

  • @GodbornNoven
    @GodbornNoven 12 hours ago +4

    Increasing inference-time compute just improves what is already there, multiplying the already present foundation. While interesting, it doesn't revolutionize the core concepts required for AGI: neuroplasticity, SNNs, hierarchical processing of concepts (words vs. sentences vs. abstract thoughts), better transfer learning, and methods to avoid catastrophic forgetting to create the potential for continuous learning.
    We don't want to just mindlessly throw compute at a problem; that's barbaric. Even though it works, there are naturally much better things we can work on.
    I think right now, instead of taking something we know works and expanding on it, it might be better to innovate. The improvements from increasing inference time won't really stop, but they follow a log scale. I think we ought to focus our efforts on finding new approaches instead, and on making all these new breakthroughs work together in an efficient and reliable way.

    • @mrd6869
      @mrd6869 9 hours ago +1

      Or you can avoid the comment section and build something better.
      I'll wait🤣

  • @luke.perkin.online
    @luke.perkin.online 14 hours ago +1

    There's test-time fine-tuning too, on nearest-neighbour examples. Very successful on ARC.
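
    A minimal sketch of what test-time fine-tuning on nearest-neighbour examples can look like. This is a generic illustration, not the specific ARC recipe: the `embed` function and the model's `train_step`/`predict` interface are assumed placeholders.

    ```python
    # Sketch: fine-tune a copy of the model on the k training examples nearest
    # to the test input, then predict. All interfaces here are assumptions.
    import copy

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def test_time_finetune(model, embed, train_inputs, train_targets,
                           test_input, k=8, steps=20):
        # Index the training examples in embedding space.
        train_vecs = np.stack([embed(x) for x in train_inputs])
        index = NearestNeighbors(n_neighbors=k).fit(train_vecs)

        # Retrieve the nearest neighbours of the test input.
        _, idx = index.kneighbors(embed(test_input).reshape(1, -1))
        neighbours = [(train_inputs[i], train_targets[i]) for i in idx[0]]

        # Fine-tune a throwaway copy so the base model stays untouched.
        local_model = copy.deepcopy(model)
        for _ in range(steps):
            for x, y in neighbours:
                local_model.train_step(x, y)   # assumed training API

        return local_model.predict(test_input)  # assumed inference API
    ```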

  • @SebastianFabricioMaidanaFariña
    @SebastianFabricioMaidanaFariña 10 hours ago

    I'm a researcher in Paraguay.
    Looking forward to meeting you.

  • @davidhurtado9922
    @davidhurtado9922 12 hours ago

    Just a random idea: would it be possible to train an AI in layers, where each layer trains using meta-analysis of the previous layers, so the system becomes "meta-trained"? I mean, similar to how humans use their brains: training useful neural pathways through repetition and meta-consciousness, allowing us to learn and refine each layer of consciousness.
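
    One way to read this is as greedy, stage-wise training, where each new stage is fit on top of the frozen stages before it and sees a summary of what they computed. A toy sketch under that reading; the staged loop, the regression task, and the choice of summary are purely illustrative assumptions.

    ```python
    # Toy stage-wise training: each stage is fit on the frozen output of the
    # stages before it. Purely illustrative, not a real meta-training method.
    import numpy as np

    rng = np.random.default_rng(0)

    # A small synthetic regression task.
    X = rng.normal(size=(200, 8))
    y = np.sin(X).sum(axis=1, keepdims=True)

    features = X
    for stage in range(3):
        # Train only the current stage (ordinary least squares); earlier stages
        # are frozen because their outputs are already baked into `features`.
        w, *_ = np.linalg.lstsq(features, y, rcond=None)
        prediction = features @ w
        print(f"stage {stage}: training MSE = {float(np.mean((y - prediction) ** 2)):.4f}")
        # Pass a nonlinear view of this stage's output forward as extra input
        # for the next stage to build on.
        features = np.concatenate([features, np.tanh(prediction)], axis=1)
    ```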

  • @CharlotteLopez-n3i
    @CharlotteLopez-n3i 12 hours ago

    Optimizing safety data across pre-training, fine-tuning, and reinforcement learning can reveal key dependencies and enhance o3's performance. Has anyone explored this in depth?

  • @msokokokokokok
    @msokokokokokok 11 hours ago

    I think what they do is a four-step process:
    1. Create synthetic data for a given task (say 1000 examples).
    2. Train a reward model to plan/generate synthetic outputs given the synthetic inputs.
    3. Train a policy to optimise its thinking on that task.
    4. Use the policy to generate thinking tokens to solve the task at test time.
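
    A toy, end-to-end version of those four steps on a throwaway arithmetic task. Everything here, from the task to the "reward model" and the "policy", is a crude stand-in for illustration, not anything OpenAI has described.

    ```python
    # Toy sketch of the four-step recipe: synthetic data -> reward model ->
    # policy that optimises its thinking -> thinking tokens at test time.
    import random

    random.seed(0)

    # Step 1: create synthetic data for a given task (here: small additions).
    def make_synthetic_data(n=1000):
        data = []
        for _ in range(n):
            a, b = random.randint(0, 99), random.randint(0, 99)
            data.append({"input": f"{a}+{b}=?", "target": str(a + b)})
        return data

    # Step 2: "train" a reward model on the synthetic pairs. Here it is just a
    # lookup that scores an output against the stored synthetic target.
    def train_reward_model(data):
        targets = {ex["input"]: ex["target"] for ex in data}
        def reward(task_input, output):
            return 1.0 if output.strip() == targets.get(task_input) else 0.0
        return reward

    # Step 3: "train" a policy to optimise its thinking: propose a few candidate
    # chains of thought per example and keep whichever the reward model scores
    # highest (a crude stand-in for actual RL).
    def train_policy(data, reward):
        def propose(task_input):
            a, b = task_input.rstrip("=?").split("+")
            return [
                f"think: {a} plus {b} is {int(a) + int(b)} -> answer {int(a) + int(b)}",
                f"think: guess -> answer {random.randint(0, 200)}",
            ]
        best = {}
        for ex in data:
            scored = [(reward(ex["input"], c.split("answer ")[-1]), c)
                      for c in propose(ex["input"])]
            best[ex["input"]] = max(scored)[1]
        return best

    # Step 4: use the policy's thinking tokens to solve a test-time task.
    data = make_synthetic_data()
    reward = train_reward_model(data)
    policy = train_policy(data, reward)
    print(policy[data[0]["input"]])  # selected thinking trace plus answer
    ```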

  • @zandrrlife
    @zandrrlife 13 hours ago +1

    First. A+ quality doc per usual. However, I'm still seeing cognitive dissonance around pretraining vs. SFT/RL. Pretrain for helpfulness? How is it helpful if the model learns the data instead of learning from it? Qwen's loss didn't really improve with QwQ post-training; the benefits only showed up at test time. Apparently OpenAI's o-series did, though. I also hear they had about 50 trillion tokens of synthetic data.
    Reasoning, and preference modeling in general: how is it not obvious that there is a significant KL-divergence gap between the two modes? It's insane to me that the solution is so obvious: hybrid/synthetic data with full pretraining coverage. That way we can decouple atoms, biases, etc. from the raw data, since procedural knowledge is what drives models. We know reasoning and values are encoded in the data; we can decouple this and make it more explicit.
    Especially for reasoning, the data scale you need to overcome pretraining inductive biases is, imo, GREAT. This has to be a pretraining thing. "Physics of Language Models" is simple and something I frequently refer to, but it highlights this intuition. We must close the distribution gap, otherwise all post-training is suboptimal. An anthropomorphic model trying to escape is a pretraining-data failure, not weird alien behavior from the model.
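
    To make the "KL-divergence gap" concrete, here is a toy calculation of the divergence between a base model's next-token distribution and a post-trained model's on the same context. The two distributions are invented purely to illustrate the quantity.

    ```python
    # Toy illustration of a distribution gap: KL divergence between a base
    # (pretrained) next-token distribution and a post-trained one.
    import numpy as np

    def kl_divergence(p, q, eps=1e-12):
        """KL(p || q) in nats for two discrete distributions."""
        p = np.asarray(p, dtype=float) + eps
        q = np.asarray(q, dtype=float) + eps
        p, q = p / p.sum(), q / q.sum()
        return float(np.sum(p * np.log(p / q)))

    # Made-up next-token probabilities over a tiny 5-token vocabulary.
    base_model   = [0.40, 0.30, 0.15, 0.10, 0.05]  # pretrained behaviour
    post_trained = [0.05, 0.10, 0.15, 0.30, 0.40]  # after SFT/RL

    print(f"KL(post-trained || base) = {kl_divergence(post_trained, base_model):.3f} nats")
    # A large value means post-training pushed the model far from what
    # pretraining made likely, which is the gap the comment points at.
    ```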

  • @GoronCityOfficialBoneyard
    @GoronCityOfficialBoneyard 5 hours ago

    The funny thing is, I work in AI safety research (not professionally), but the group running this channel basically breaks it: always go with a logic loop, always go basic, then move into more complex checks, since the higher-order checks lack understanding. Even if you train under chain of thought or other reasoning models, they still have to check the outputs, meaning you need a higher-order checksum system, which in turn needs abstract reasoning over the outputs.

  • @vrc5674
    @vrc5674 8 hours ago

    Are you aware of any research where they train LLMs on tableau reasoning steps, or teach LLMs to shortcut logical reasoning by applying (complex) learned laws/rules inherent in logical reasoning or boolean algebra? Sorta like applying De Morgan's laws, etc. to boolean-algebra expressions to simplify them down. In theory an LLM may be able to discover more complex logical expressions that can be simplified in a single step by encoding the expression into a vector and, in essence, searching a vector space for the simplification, rather than going through the painfully slow process of testing the validity of each individual logical argument. In this way, I wonder if an LLM can sorta "feel" its way to an answer and then work backward from the answer to test whether the answer is sound given the initial assertions. I feel this might be closer to how humans actually reason: we employ a sort of pseudo-logical reasoning that is very close to formal logic but quicker and less rigorous.
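
    For reference, the purely symbolic (non-LLM) version of the De Morgan-style simplification described above looks like this with sympy; the particular expression is just an example.

    ```python
    # Rule-based boolean simplification (De Morgan, contradiction elimination)
    # done symbolically with sympy rather than by a learned model.
    from sympy import symbols
    from sympy.logic.boolalg import And, Not, Or, simplify_logic

    a, b, c = symbols("a b c")

    # ~(a | b) | (a & ~a & c): De Morgan collapses the first term, and the
    # contradiction a & ~a eliminates the second.
    expr = Or(Not(Or(a, b)), And(a, Not(a), c))
    print(simplify_logic(expr))  # -> ~a & ~b
    ```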

  • @richsoftwareguy
    @richsoftwareguy 1 hour ago

    Happy to see that annoying intro HELLLOOO is gone... might be able to watch some videos now 👌