First one to hit
TEST KING! I do love to see it all, I find picking the mind of these beings to be highly lovely.
This is exactly the comparison I was looking for. No one else has it, so it's extremely valuable content. Now I don't have to spend a ton of time doing all the comparison testing and can just go with 7B/14B. Also very interesting to see that the 72B didn't do better than the much smaller 7B.
I would have said C is playing table tennis, because it is not fun for E to play alone...
E might be Forrest Gump.
I concluded the same tbh. Table tennis is a 2+ player game after all.
OOOOh The King Test I would not dare to miss that!
Everyone is claiming they beat Claude. 😅
Thanks for testing all these versions. 7B does seem good.
Test king!!!
0.5B is still useful for one thing - using it as a draft model. If you load a small draft model alongside a larger model, exllamav2 and llama.cpp can use the tiny model's predictions (speculative decoding) to greatly speed up the large model; the two models just need to share a tokeniser.
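If you want to try it, llama.cpp ships a speculative-decoding example; something like this should work (binary and flag names can vary between builds, and the GGUF filenames are just placeholders):

./llama-speculative -m qwen2.5-72b-instruct-q4_k_m.gguf -md qwen2.5-0.5b-instruct-q8_0.gguf --draft 8 -p "Write a binary search in Python"

The 72B still produces the final output; the 0.5B just drafts candidate tokens that the big model verifies in batches, so quality stays the same while generation gets faster.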
TEST KINGGGGUUU
Testking, it's why I watch
Nice! I really like these tests. Can you share the list of questions so I can try to reproduce the results? I've noticed that if you run the tests several times, the results may differ.
Test King ma man
Test king
Testking ✌️
Hear ye, hear ye! By the grace of His Majesty, I, thy humble servant, do bring tidings most wondrous. In mine inquiries, I have discovered a curious artifice known as "artificial intelligence," possessing powers most arcane. When presented with a task of great import, this sorcerous contraption doth summon forth the ability to inscribe upon parchment, as if by magic, through a mystical invocation they name "function calling in zed." Verily, 'tis a marvel to behold!
Translated: qwen2.5 coder can do workflow file creation in zed.
Test king ! 👑
test king!
Test King
6:23 while the 0.5B gets the answer right, its reasoning is completely wrong. In fact, it misunderstood the text, so it should be labeled a FAIL.
7:22 now this is impressive for 0.5B parameters.
TESTKING~~!
test king
I don’t want to write testking, but if it helps. 🎉 … my theory: we just don’t know what a butterfly should look like. 🤣🤣
TestKing!
TestKing :)
Test King
test king 🙂
Testking
Test king
king test for me 🎉
Test King is the best
Are you doing all your tests with temperature = 0? If not, you'll have randomness impacting the outputs, giving you inconsistent results.
Yes, but setting temperature to 0 may worsen the performance.
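For anyone who wants to test repeatability themselves, the Ollama HTTP API accepts a temperature in the options object (the model tag and prompt here are just examples):

curl http://localhost:11434/api/generate -d '{"model": "qwen2.5:7b", "prompt": "Which person is playing table tennis?", "options": {"temperature": 0}}'

At temperature 0 decoding is effectively greedy, so the same prompt should give the same answer on repeated runs.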
Great vid! Is your go-to coding combo still 3.5 Sonnet for code gen and Qwen 1.5B for code completion?
Yes, but I have recently switched to Supermaven for code completion. It's better and free.
@@AICodeKing alright, I'll give it a try 👀 thank you 👊
TestKing
Testking
test king :-)
C is playing table tennis. One cannot play table tennis alone.
You can play table tennis alone.
@@AICodeKing yep. I love playing table tennis alone. I score with every shot!
These are not good models; they seem to be focused on beating benchmarks instead of on reasoning.
Test king so you know
I have 64GB RAM and 12GB VRAM, can I run DeepSeek v2.5 locally?
LM Studio didn't work; it can offload to RAM after the VRAM is full.
Is there something that can also offload to SSD…? Can ollama do that?
Normally for a 7B model in ollama you need a minimum of 6GB VRAM, so DeepSeek 2.5 may not fit.
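One option worth knowing: llama.cpp (which ollama builds on) memory-maps GGUF files by default, so weights that don't fit in RAM get paged in from disk, just very slowly. You can also split layers between GPU and CPU with -ngl (the filename is a placeholder; raise or lower the layer count until it fits your 12GB):

./llama-cli -m deepseek-v2.5-q4_k_m.gguf -ngl 10 -p "Hello"

Anything not on the GPU runs on the CPU from RAM, and anything beyond RAM hits the SSD through the memory map.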
testking
test king
Planning on publishing your test bench results as a link or website? Do it before somebody else copies your idea 😅
On a side note, it would also be helpful to see the results side by side with other similar models, to compare and choose.
It's too much work. I am already exhausted from testing them. But I can do it. Good suggestion.
Do the 1.5B/7B coder models support FIM / are they usable for autocompletion?
They don't support FIM but can be used for autocompletion.
Test king😂
Testing
Testing, although I'm not sure this is a good use of video time. You could just do the tests and report on them.
Most people want to see the responses as well. That's why I do it like this.
Hmmmmm… 🧐
Third❤❤❤❤
Test
999
Second 😊
test king skipping
Lol Qwen2.5 is a mess.
test king(?)
I wonder in what kind of situations small models like 0.5B and 1.5B can be useful. From what I've seen, they're more like an AI toy than an AI model.
I think translation could be one strong suit. But I think there are some issues with that as well.
Test King
Test king
test king
Test king
TestKing
Testking
Testking
Really love your content, bro! As well as your calm and soothing way of doing it. You bring it in your own unique way… plus you make very helpful material!
test king
Testing
Test King
Test king
test king
Testking
Test king
Testking
Testking
Testking
But why not create a landing page for the testking, and only maintain the dynamic question list generated by an LLM?