Reasoning LLMs battle: Qwen QwQ vs OpenAI o1 vs o1 mini vs Deepseek r1.

  • Published 8 Jan 2025

COMMENTS

  • @jeffwads 1 month ago +6

    Good informative video. A suggestion: a chart at the end with pass or fail for the models.

    • @YJxAI 1 month ago +1

      Good suggestion, thanks for that.

  • @madeniran 1 month ago +6

    For the Chinese models, try swapping the word Unicorn with Qilin or Kirin.
    They somewhat resemble a unicorn: a horned horse.

    • @YJxAI 1 month ago +1

      Hmm, should try this.

  • @TheDiamondHawkOfficial 1 month ago +1

    Thanks for the info, bro.

    • @YJxAI 1 month ago

      welcome bro :)

  • @wwkk4964 1 month ago +5

    Great content!

    • @YJxAI 1 month ago +2

      Thank you very much.

  • @ZachKang-c2p 1 month ago +2

    great video!

  • @emport2359 1 month ago +2

    Sick video man

    • @YJxAI 1 month ago +2

      thanks man : )

  • @tescOne 1 month ago

    LOL, that's the funniest thing. The actual "strawberry" model can perfectly guess how many r's are in "strawberry", but if you make it just a tiny bit more complicated, it fails as badly as before. @Chollet would laugh at this so much xD

    • @YJxAI 1 month ago

      😂

  • @iamboring2535 1 month ago +4

    Gemini 1121 got all the questions right except for the earnings problem and the unicorn SVG.

    • @YJxAI 1 month ago +2

      A video on it is coming up :). I had actually planned that, but then this reasoning mode dropped.

  • @sangeetanarendrasingh5416 1 month ago +1

    Did you write the prompts yourself or did you get them from someplace?

    • @YJxAI 1 month ago

      I picked them up from various exams. The earnings problem I made myself. When o1 was released and I tested it personally, it shattered my questions, so I came up with that one.
      Thanks for noticing.

  • @underTheStorm 1 month ago +1

    So which is best?

    • @YJxAI 1 month ago +1

      It's o1, but we also see that you might get away with o1-mini.
      1. o1 (good overall)
      2. o1-mini (good when you have a very specific issue)
      3. Deepseek r1 (could be cheaper than the two above, but the API release will tell)
      4. Qwen QwQ (the cheapest; Deepseek r1's API will tell if it retains that. Brings reasoning abilities to actually usable prices.)
      I hope that was helpful.
      :)

  • @JEHOASJY 1 month ago +1

    When OpenAI makes a breakthrough, other companies soon follow.

    • @Ahmadtayyem 1 month ago

      But OpenAI is not actually open! All models depend on Google's research on transformers, even ChatGPT.

    • @haroldpierre1726 1 month ago

      They have no moat!

  • @successahead5598 1 month ago +1

    Windsurf taking over

    • @YJxAI 1 month ago

      I built my first Android application with it yesterday. Tears 🥹

  • @harriehausenman8623 1 month ago +1

    Pretty much meaningless. Via the web interface, you never know what model version you get, and OpenAI especially is known for running A/B tests. So you have to use the API.
    And a temperature above 0 makes no sense for these kinds of tests.

    • @YJxAI 1 month ago +1

      I get you, bro, but the point is:
      API pricing is off the charts (o1-preview).
      And in most cases people will be using the ChatGPT version.
      Yes, there could be internal system prompt changes or hidden A/B tests (that is a downside, but it happens rarely).
      Known A/B tests are visible, so we know when they come.
      All in all, I get your point. I have thought about this and other things, like the factuality of models (you can watch my "Can you trust LLMs" video).
      I have some plans to take these into account, but
      if I am being honest, I am a little busy with something related to family. I will try to get it implemented ASAP.

    • @harriehausenman8623 1 month ago +1

      @YJxAI Makes sense! 😉 You could just use the example Python implementations for your tests. Just an idea.
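
A minimal sketch of what a scripted version of these tests might look like, producing the pass/fail chart suggested earlier in the thread. Everything here is an assumption for illustration: the names `expects_count`, `run_suite`, `format_chart`, and the two stub "models" are hypothetical, and no real API client is called. In practice each stub would be replaced by a function that sends the prompt to a pinned model version through the provider's API and returns the reply text.

```python
from typing import Callable, Dict, List, Tuple

# A case is (name, prompt, checker); the checker decides pass/fail from the reply text.
Case = Tuple[str, str, Callable[[str], bool]]


def expects_count(text: str, letter: str) -> Callable[[str], bool]:
    """Build a checker that passes when the reply contains the true letter count."""
    expected = str(text.lower().count(letter.lower()))
    # Crude substring match; a stricter checker would parse the final answer.
    return lambda reply: expected in reply


# Hypothetical cases modeled on the "strawberry" question from the thread.
CASES: List[Case] = [
    ("strawberry", "How many r's are in 'strawberry'?",
     expects_count("strawberry", "r")),
    ("two words", "How many r's are in 'strawberry raspberry'?",
     expects_count("strawberry raspberry", "r")),
]


def run_suite(models: Dict[str, Callable[[str], str]],
              cases: List[Case]) -> Dict[str, Dict[str, bool]]:
    """Ask every model every prompt and record pass/fail per case."""
    results: Dict[str, Dict[str, bool]] = {}
    for model_name, ask in models.items():
        results[model_name] = {
            case_name: check(ask(prompt)) for case_name, prompt, check in cases
        }
    return results


def format_chart(results: Dict[str, Dict[str, bool]]) -> str:
    """Render one line per model with pass/fail for each case."""
    lines = []
    for model_name, outcomes in results.items():
        cells = ", ".join(
            f"{case}: {'pass' if ok else 'fail'}" for case, ok in outcomes.items()
        )
        lines.append(f"{model_name} -> {cells}")
    return "\n".join(lines)


# Stub "models" standing in for real API calls (assumption, not a real client).
def stub_three(prompt: str) -> str:
    return "3"


def stub_six(prompt: str) -> str:
    return "6"


results = run_suite({"stub-three": stub_three, "stub-six": stub_six}, CASES)
print(format_chart(results))
```

Because the checkers compute the expected answer in code, the same suite can be rerun unchanged whenever a model version changes, which is the reproducibility the API-based approach is meant to give.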