o3 Inference Reasoning: How to Build the Training Data Set

  • Published 26 Dec 2024

COMMENTS • 12

  • @code4AI
    @code4AI  18 hours ago

    With the automatic audio dubbing from YouTube/Google, you hear a synthetic voice in your regional language.
    To hear my original voice in English, switch to "Default" or "English" in the settings. Thank you.

  • @tokenranxomizsr
    @tokenranxomizsr 3 days ago +1

    I like your theme colors! As always, great presentation!

  • @CharlotteLopez-n3i
    @CharlotteLopez-n3i 2 days ago

    A how-to on building o3 training datasets is invaluable for implementing this in 7B LLMs. Insightful on o3 reasoning! Aligned SFT and RL training procedures are key. Thanks!

  • @KitcloudkickerJr
    @KitcloudkickerJr 3 days ago +3

    As far as I see it, o3 did not fail the test at 30:40. That question is ambiguous at best, and the answer it gave was logical. It was one of the 9 it slipped on, and it needs to be reevaluated.

    • @ramiroramirez8503
      @ramiroramirez8503 3 days ago

      I disagree. For an AGI, ambiguous questions like this shouldn't cause it to stumble. As humans, we have the ability to see through ambiguity and still come up with logical answers. That's the whole point of tests like the ARC AGI benchmark: to evaluate whether an AGI can reason abstractly and handle uncertainty the same way we do. If the question seems ambiguous, it's even more important for the AGI to step up and demonstrate its ability to interpret and respond intelligently, just like a human would in these contexts.

    • @KitcloudkickerJr
      @KitcloudkickerJr 3 days ago

      @ramiroramirez8503 Its answer was right but was marked wrong by humans. So who's not a general intelligence?

    • @wwkk4964
      @wwkk4964 3 days ago

      The question is stupid. We have 3 examples, and they cover 2 types of cases: a pair of points, and two pairs of points. Then a third set is presented to "generalize". It's literally asking us to presume that whatever happens with 2 coordinates is what will happen with 3. It's ridiculous, and I am happy that o3 didn't give a non-general solution to a not-so-well-posed problem.

  • @kaio0777
    @kaio0777 3 days ago

    Man, you are on top of things.

  • @davidwynter6856
    @davidwynter6856 3 days ago

    At 23:30 you say that the OpenAI o1 API has access to Python etc. to solve for mathematical correctness. My question is: how does o1 know to use Python in the cases where it would be an advantage for mathematical correctness, or for any other problem where a programmatic approach is an advantage?

  • @msokokokokokok
    @msokokokokokok 2 days ago

    @28:00 We should evolve the SFT CoT into an RL CoT by rewarding the response, instead of evolving the response generation. The current approach is <CoT> ... </CoT> <Response> ... </Response>, where the content under <CoT> is learnt by SFT and the content under <Response> is learnt by RL. We should instead learn the content under <CoT> by RL, and the content under <Response> should just be used to compute the reward, as a KL divergence from the optimal response.
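
    To make the idea above concrete, here is a minimal sketch (my own illustration, not from the video or the comment; the function names and tensor shapes are assumptions) of a response-based KL reward driving a REINFORCE-style update on the sampled chain of thought:

    ```python
    import torch
    import torch.nn.functional as F

    def response_kl_reward(response_logits: torch.Tensor,
                           optimal_response_logits: torch.Tensor) -> torch.Tensor:
        """Reward = -KL(optimal || model), computed over the response tokens.

        response_logits:         (seq_len, vocab) logits the model assigns to the
                                 response, conditioned on its sampled CoT.
        optimal_response_logits: (seq_len, vocab) logits of a reference "optimal"
                                 response (assumed to be available).
        """
        log_p_model = F.log_softmax(response_logits, dim=-1)
        p_optimal = F.softmax(optimal_response_logits, dim=-1)
        # F.kl_div expects log-probabilities as input and probabilities as target.
        kl = F.kl_div(log_p_model, p_optimal, reduction="batchmean")
        return -kl  # closer to the optimal response => higher reward

    def cot_reinforce_loss(cot_log_probs: torch.Tensor,
                           reward: torch.Tensor) -> torch.Tensor:
        """Plain REINFORCE on the CoT tokens only: the response itself is not a
        training target, it just scores the chain of thought that produced it."""
        return -(reward.detach() * cot_log_probs.sum())
    ```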

  • @saiderico
    @saiderico 2 days ago

    @code4AI, how do automated theorem proving programs work?
    I don't know much about this, but it seems to me that this is the direction to look in.
    1. Can an AI model replace such a program?
    2. Is it possible to translate a chain of thoughts into some formal logic and work with this data from a formal point of view?
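
    As a toy illustration of the second question (my own sketch, not from the video): a single informal reasoning step can be restated as a formal proposition and checked mechanically by a proof assistant such as Lean, which is roughly what autoformalization work tries to do for whole chains of thought.

    ```lean
    -- One informal step, "if x > 2 then x + 1 > 3", written as a Lean 4 statement.
    -- The `omega` tactic closes linear-arithmetic goals like this automatically.
    example (x : Nat) (h : x > 2) : x + 1 > 3 := by
      omega
    ```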

  • @wwkk4964
    @wwkk4964 3 days ago

    30:51 You are saying o3 was not intelligent because it noticed that whatever happens with 2 pairs and 1 pair cannot be generalized to 3 pairs? Come on. Imagine the case had x, y, z axes. Can we easily claim we know what will happen based on planar examples? It's ridiculous.