Reasoning LLMs battle : Qwen QwQ vs OpenAI o1 vs o1 mini vs Deepseek r1.

YJxAI

3,109 views

Published 8 Jan 2025

COMMENTS

@jeffwads 1 month ago ⁺⁶
Good, informative video. A suggestion: add a chart at the end showing pass or fail for each model.
@YJxAI 1 month ago ⁺¹
Good suggestion, thanks for that.
@madeniran 1 month ago ⁺⁶
For the Chinese models, try swapping the word "unicorn" for "qilin" or "kirin".
They somewhat resemble a unicorn (a horned horse).
@YJxAI 1 month ago ⁺¹
Hmm, I should try this.
@TheDiamondHawkOfficial 1 month ago ⁺¹
Thanks for the info, bro.
@YJxAI 1 month ago
You're welcome, bro :)
@wwkk4964 1 month ago ⁺⁵
Great content!
@YJxAI 1 month ago ⁺²
Thank you very much.
@ZachKang-c2p 1 month ago ⁺²
Great video!
@emport2359 1 month ago ⁺²
Sick video, man.
@YJxAI 1 month ago ⁺²
Thanks, man :)
@tescOne 1 month ago
LOL, that's the funniest thing. The actual "strawberry" model can perfectly guess how many r's are in "strawberry", but if you make it just a tiny bit more complicated, it fails as badly as before. @Chollet would laugh at this so much xD
@YJxAI 1 month ago
😂
@iamboring2535 1 month ago ⁺⁴
Gemini 1121 got all the questions right except for the earnings problem and the unicorn SVG.
@YJxAI 1 month ago ⁺²
A video on it is coming up :). I had actually planned that, but then this reasoning mode dropped.
@sangeetanarendrasingh5416 1 month ago ⁺¹
Did you write the prompts yourself or did you get them from someplace?
@YJxAI 1 month ago
I picked them up from various exams. The earnings problem I made myself: when o1 was released and I tested it personally, it shattered my questions, so I came up with that one.
Thanks for noticing.
@underTheStorm 1 month ago ⁺¹
So which is best?
@YJxAI 1 month ago ⁺¹
It's o1, but we also see that you might get away with o1-mini.
1. o1 (good overall)
2. o1-mini (good when you have a very specific issue)
3. DeepSeek R1 (could be cheaper than those two, but the API release will tell)
4. Qwen QwQ (the cheapest; DeepSeek R1's API will tell whether it keeps that spot. It brings reasoning abilities to actually usable prices.)
I hope that was helpful.
:)
@JEHOASJY 1 month ago ⁺¹
When OpenAI makes a breakthrough, other companies soon follow.
@Ahmadtayyem 1 month ago
But OpenAI is not actually open! All the models depend on Google's research on transformers, even ChatGPT.
@haroldpierre1726 1 month ago
They have no moat!
@successahead5598 1 month ago ⁺¹
Windsurf taking over
@YJxAI 1 month ago
I built my first Android application with it yesterday. Tears 🥹
@harriehausenman8623 1 month ago ⁺¹
Pretty much meaningless. Via the web interface you never know which model version you get, and OpenAI especially is known for running A/B tests. So you have to use the API.
And a temperature above 0 makes no sense for these kinds of tests.
@YJxAI 1 month ago ⁺¹
I get you, bro, but here's the point.
API pricing is off the charts (o1-preview).
And in most cases people will be using the ChatGPT version.
Yes, there could be internal system prompt changes and hidden A/B tests (yeah, that is a downside, but it happens rarely).
Known A/B tests are visible, so we know when they come.
All in all, I get your point. I have thought about this and other things, like the factuality of models (you can watch my "Can you trust LLMs" video).
I have some plans to take these into account, but
to be honest, I am a little busy with something family-related. I will try to get it implemented ASAP.
@harriehausenman8623 1 month ago ⁺¹
@YJxAI Makes sense! 😉 You could just use the example Python implementations for your tests. Just an idea.