I'm digging your videos man, you're doing really great work
Thanks @henryholloway5656!
Thanks for the review and commentary
Thanks!
That ending about the Adobe measuring tool was priceless - it's like the guy who picked a car lock with a tennis ball.
I've had only one session with Claude 2. But I've already found deficiencies that I never see in the chat agent based on GPT-4. Claude 2 contradicts itself within single responses. Furthermore, when I tried to nudge it in the right direction, it instead generated worse responses.
I'll mention also that I've come to doubt that multiple-choice questions are appropriate in testing AIs. For humans, performance on multiple-choice tests is predictive of performance on more open-ended tasks. We don't have a good basis for believing the same about agents based on large language models.
Thanks for sharing your perspective.
I agree with your point that it's unclear to what degree exams (particularly those designed with the constraints of human memory in mind) are predictive of broader performance for LLMs.
Really appreciate your excellent selection of excerpts from technical papers that I never even knew existed. Your interaction with the AI is also very instructive. I believe that it is now possible to upload up to five PDF files at once, as well as documents in several other formats. I thought that the 100k token limit was for the combination of input and response, but it may be only for input, whereas the token limit for the response is 4k.
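For what it's worth, when calling the model through the API the response length is capped separately by an explicit parameter, whatever the overall window covers. A minimal sketch, assuming the 2023 anthropic Python SDK and its (legacy) completions endpoint:

```python
# Minimal sketch (my assumption: the 2023 anthropic Python SDK with the
# legacy completions endpoint). The response cap is set explicitly via
# max_tokens_to_sample, independently of how long the prompt is.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.completions.create(
    model="claude-2",
    # A long document can go in the prompt, up to the 100k context window.
    prompt=f"{anthropic.HUMAN_PROMPT} Summarise: <long document here>{anthropic.AI_PROMPT}",
    max_tokens_to_sample=4000,  # explicit cap on the generated response
)
print(response.completion)
```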
Thanks @ram49967!
Thanks
You're most welcome.
Your videos are awesome. Can't wait for more.
Thanks @AkarshanBiswas - much appreciated.
really enjoy your walkthrough!
Thanks @JL-zI6ot!
8:22 Thanks for this insight
Thanks @juliangawrongsky9339.
Keep going!
Thanks for the encouragement!
awesome
Thanks!
Keep it up man!
Thanks! I'll do my best
Love your humor lol! gj!
Thanks @TheManinBlack9054!
Claude 2 is an awesome model, but it has serious problems with hallucinations. That could probably be fixed by giving it data access via web browsing, and for data files they need to do some fine-tuning to make sure Claude doesn't invent data that isn't found in the files provided. Providing references in Claude 2's responses would probably help. If I were Anthropic, I would treat this as the most pressing concern for Claude at the moment. On a positive note, I feel like for an estimated 174B parameters, Claude 2 comes very close to GPT-4 - which is said to be a mixture of experts, making it even more impressive.
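Something like this prompt pattern might help with invented file data in the meantime - my own sketch, not an Anthropic feature, assuming the anthropic Python SDK:

```python
# Hypothetical prompt pattern (my sketch, not an official Anthropic feature)
# for nudging Claude 2 to ground answers in a provided file and cite it.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

document = open("report.txt").read()  # hypothetical data file

prompt = (
    f"{anthropic.HUMAN_PROMPT} Here is a document:\n<doc>\n{document}\n</doc>\n"
    "Answer using ONLY the document above. Quote the exact supporting sentence "
    "for every claim, and reply 'not in the document' if the answer is missing.\n"
    f"Question: What does the report give as the main risk factor?{anthropic.AI_PROMPT}"
)

response = client.completions.create(
    model="claude-2",
    prompt=prompt,
    max_tokens_to_sample=500,
)
print(response.completion)
```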
Thanks for sharing your experience with the model!
Hope we never reach the end
I'm not entirely sure I understand your meaning, but I hope so too.
@@SamuelAlbanie1 12:46
For me, so far it's been very unwilling to return full code in its answers (longer code), which is quite annoying. And it won't even try refactoring larger codebases - it just returns summaries and pseudocode.
Thanks for sharing your experience. I've mainly used it for summarisation so far - less so for coding. It's interesting to hear that Claude 2 may be less well-suited for that use-case.
@@SamuelAlbanie1 For summarizing large texts it's been great, but a bit of a battle code-wise; also, the code quality is not as good as GPT-4's in most cases. At least in my experience. Though it sometimes has more interesting ideas as to what to do (i.e., reviewing/creating a ticket for given code).
I have yet to try Sourcegraph though; it looks like it may be a good competitor to Copilot Chat (which is pretty useless running on 3.5).
As much as I love my country and its people, AI unavailability makes me want to leave Russia.
Seems like it was planned to force Russia out of the AI race.
Have you experimented with SberBank GigaChat? (I saw it announced, but haven't tried it myself.)
That was an expected condition. A 👌 standard day.
I think that's a reasonable assessment at this time.
That was a habitual 📍 moment. A commonplace event.
Thanks for the question. I don't think we can read too much into the homogeneity of the context from the figure - it's primarily aimed at demonstrating that the loss continues to trend downwards.
That being said, intuitively it seems highly plausible that extending to a significantly larger context window may diminish the model's ability to pick up on details within the window (relative to an alternative that uses a similar compute budget but a more compact window). I think it's an open question though.
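To make the compute-budget intuition concrete, here's a rough back-of-the-envelope sketch under my own assumptions (counting only the quadratic self-attention matmuls and ignoring the linear projections, which scale only linearly in context length):

```python
# Back-of-the-envelope: how the quadratic self-attention term scales with
# context length at fixed model size. Assumptions are mine (not from the
# Claude 2 report): ~4 * n^2 * d FLOPs per layer per sequence for the
# QK^T and attention-weighted-V matmuls, ignoring linear projections.

def attn_flops(n_ctx: int, d_model: int) -> float:
    return 4.0 * n_ctx**2 * d_model

d_model = 8192  # hypothetical hidden size
for n_ctx in (8_000, 100_000):
    print(f"context {n_ctx:>7,}: ~{attn_flops(n_ctx, d_model):.2e} attention FLOPs/seq")

# Going from 8k to 100k context scales this term by (100/8)^2 ≈ 156x,
# which is why a fixed compute budget forces a trade-off between context
# length and capacity spent elsewhere.
```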