Autonomous Open Source LLM Evaluator (Ollama) - Full Guide

All About AI

4,800 views

Published Oct 4, 2024

COMMENTS • 18

@gunabalang9543 3 months ago ⁺²
I love the way you are using AI...
@MrSuntask 3 months ago
Livestream was great!
@Ms.Robot. 3 months ago
This is a useful idea 🎉
@ArseniyPotapov 3 months ago
I've built a similar system, but I noticed that the judge model sometimes hallucinates and gives high marks to obviously wrong solutions. I tried a jury of multiple judges (different big models). This improved judging quality but made it 8x slower. Also, with multiple judges you need to fuse their judgments into some consensus. It's just pretty slow, and all models do hallucinate.
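One simple way to fuse a jury's marks into a consensus is to take the median, so a single hallucinated high mark cannot drag the result up, and to flag cases where the judges disagree widely. This is only a minimal sketch and not the commenter's or the video's method: the function name `fuse_scores`, the judge labels, the 0 to 10 scale, and the `max_spread` threshold are all assumptions made for illustration.

```python
from statistics import median
from typing import Dict, Tuple


def fuse_scores(scores_by_judge: Dict[str, float],
                max_spread: float = 3.0) -> Tuple[float, bool]:
    """Fuse per-judge scores into (consensus, needs_review).

    consensus    -- median of the scores, robust to one outlier judge
    needs_review -- True when highest and lowest scores differ by more
                    than max_spread (a hypothetical threshold)
    """
    if not scores_by_judge:
        raise ValueError("at least one judge score is required")
    values = sorted(scores_by_judge.values())
    spread = values[-1] - values[0]
    return median(values), spread > max_spread


# Hypothetical marks: one judge rates a wrong solution far too high.
scores = {"judge_a": 9.0, "judge_b": 3.0, "judge_c": 4.0}
consensus, needs_review = fuse_scores(scores)
print(consensus, needs_review)  # 4.0 True
```

The median does nothing about the speed problem, since every judge still has to be queried, but it keeps the fusion step itself trivial and deterministic.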
@ArseniyPotapov 3 months ago ⁺¹
One of the problems almost all models suck at is the puzzle "a fox, a chicken and a sack of grain" (or the "wolf, goat and cabbage" problem). All models recognize that it's a classic puzzle, but only a few can give a coherent solution without weird glitches.
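For a puzzle like this, a model's answer can be checked with a small deterministic simulator instead of a judge model. The sketch below assumes the answer has already been parsed into a list of crossings, where each entry is the item the farmer carries or `None` for an empty trip. The function name `check_solution` and that move format are hypothetical, chosen only for this example.

```python
ITEMS = {"fox", "chicken", "grain"}
# Pairs that must never be left on a bank without the farmer.
FORBIDDEN = [{"fox", "chicken"}, {"chicken", "grain"}]


def check_solution(moves):
    """Return True if the crossings move everything across safely.

    moves -- list of item names, or None when the farmer crosses alone.
    """
    left, right = set(ITEMS), set()  # everything starts on the left bank
    farmer_on_left = True
    for cargo in moves:
        src, dst = (left, right) if farmer_on_left else (right, left)
        if cargo is not None:
            if cargo not in src:
                return False  # item is not on the farmer's bank
            src.remove(cargo)
            dst.add(cargo)
        farmer_on_left = not farmer_on_left
        # The bank the farmer just left is now unattended.
        unattended = right if farmer_on_left else left
        if any(pair <= unattended for pair in FORBIDDEN):
            return False
    return not farmer_on_left and right == ITEMS


classic = ["chicken", None, "fox", "chicken", "grain", None, "chicken"]
print(check_solution(classic))          # True
print(check_solution(["fox"]))          # False: chicken is left with grain
```

Turning a model's free-text answer into that move list is the hard part and is not covered here.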
@ProfessorCrumbs65 3 months ago
aya:35b blows everything out of the water. Not ten times better than ChatGPT but one hundred times better. It's slow, as it's a 35B model run locally, but I love it. Besides that, I use llama3 for most everyday tasks.
@thenarrowgate3063 3 months ago
In the bubble sort evaluation, all the models that were evaluated as wrong (Mistral, Codestral, etc.) had a syntax error on line 1 because the output text was included as a line of code. The code itself was sound in all of them. So it is not a proper eval: you need to check your code to see why it worked for a couple of models but not the others, because a simple syntax error that came from your code, not the LLM's, does not make for a proper eval. Other than that, it's a cool idea.
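The failure described above is what happens when a model's whole reply, prose included, is executed as code. A hedged sketch of one way to avoid it: pull out the first fenced code block from the reply before running anything, and fall back to the raw text only when no fence is present. The helper names `extract_code` and `is_valid_python` are made up for this example, and the evaluator in the video may work differently.

```python
import re

FENCE = "`" * 3  # three backticks, built this way to keep the example readable
# Opening fence, optional language tag, newline, body, closing fence.
FENCE_RE = re.compile(FENCE + r"[ \t]*[\w+-]*[ \t]*\r?\n(.*?)" + FENCE, re.DOTALL)


def extract_code(reply: str) -> str:
    """Return the body of the first fenced block, or the stripped reply."""
    match = FENCE_RE.search(reply)
    return match.group(1).strip() if match else reply.strip()


def is_valid_python(source: str) -> bool:
    """Check that the source at least parses, without executing it."""
    try:
        compile(source, "<model-output>", "exec")
        return True
    except SyntaxError:
        return False


# Hypothetical model reply: prose on line 1, code inside a fence.
reply = ("Here is the code:\n" + FENCE + "python\n"
         "print('hi')\n" + FENCE + "\nHope this helps!")

print(is_valid_python(reply))                # False: prose is not Python
print(is_valid_python(extract_code(reply)))  # True
```

Running the syntax check before the real test also separates "the model wrote broken code" from "the harness fed it garbage", which is the distinction the comment is asking for.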
@delta-gg 3 months ago
3:20 What if Kayley is a boy?
@tonywhite4476 3 months ago
May I ask what is your roadmap for this channel?
@j0hnny_R3db34rd 3 months ago ⁺¹
Yes, you may.
@Ms.Robot. 3 months ago
Who? Who are you?
@watchdog163 3 months ago ⁺²
Roadmap? This is not a crypto coin. 😂
@JohnDoe-zx8bu 3 months ago
What is the point of evaluating many models with a more powerful model if this is required for each problem? It would be much faster to just ask GPT-4 for the answer to the problem.
@TwoWayOrbitalStation 3 months ago
Because ChatGPT cannot be run locally. If you can evaluate what the best small local model is for a task, then you can use that model locally on your PC. If you have sensitive code or sensitive information, you don't want to pass it through ChatGPT, since OpenAI will take your data, so you run locally. Not to mention, running locally is completely free, whereas using the ChatGPT API is going to cost you.
The whole point of the test is basic test examples, so you can then pick the model to do a similar, more complex task.
@JohnDoe-zx8bu 3 months ago
@TwoWayOrbitalStation The issue I see is that results might differ for slightly changed tasks. That means you get the right result for the current task, but if you ask about a similar but different task, the answer might be wrong. So if you want to use small models locally, you need some different way to estimate results without ChatGPT.
@BinxNet 3 months ago ⁺¹
@JohnDoe-zx8bu In this system, ChatGPT (being the best model) provides a rough approximation of the “best” solutions from those provided, saving you tokens on getting it to provide its own lengthy results. You can also use this in a multi-agent loop with whatever LLM it picked to improve output entirely on the user side, with no additional tokens.
This is just where documentation and an understanding of what you’re asking of the LLMs come into play. If you’re allowing the blind to lead the blind, of course it’s going to be horrible. However, if you need a bit of help doing this One Thing you just can’t get right, then problem solved.
We are not at the stage of these tools being autonomous fix-alls for every problem.
I’ve seen many people saying things like “ChatGPT almost broke my computer because I tried to get it to help me do this thing”, but the reality is, THEY almost broke their computer because they had no fking idea what ChatGPT was telling them. Accountability is on the end user here to determine what is and is not a useful output, and how it can then be applied.
