How to evaluate upgrading your app to GPT-4o | LangSmith Evaluations - Part 18

LangChain

9,164 views

Published 12 May 2024
OpenAI recently released GPT-4o, which it reports brings significant improvements in latency and cost. Many users may wonder how to evaluate the effects of upgrading their app to GPT-4o. For example, what latency benefit can users expect, and are there any material differences in app performance after switching to the new model?
Decisions like this are often limited by the availability of quality evaluations. Here, we show the process of evaluating GPT-4o on an example RAG app with a 20-question eval set related to LangChain documentation. We show how regression testing in the LangSmith UI allows you to quickly pinpoint examples where GPT-4o improves or regresses relative to your current app.
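The comparison described above can be sketched in plain Python. This is a minimal illustration of the idea only and does not use the LangSmith API: the `RunResult` record, the `compare_runs` helper, and all numbers are hypothetical, and it assumes each run has already been scored by some evaluator.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RunResult:
    """One app run on one eval question (hypothetical record shape)."""
    example_id: str
    score: float      # e.g. 1.0 = answer judged correct, 0.0 = incorrect
    latency_s: float  # wall-clock seconds for the run


def compare_runs(baseline, candidate):
    """Pair runs by example_id and report per-example score changes."""
    base_by_id = {r.example_id: r for r in baseline}
    improved, regressed = [], []
    for cand in candidate:
        base = base_by_id.get(cand.example_id)
        if base is None:
            continue  # skip examples the baseline never ran
        if cand.score > base.score:
            improved.append(cand.example_id)
        elif cand.score < base.score:
            regressed.append(cand.example_id)
    return {
        "improved": improved,
        "regressed": regressed,
        # Negative means the candidate is faster on average.
        "mean_latency_delta_s": mean(r.latency_s for r in candidate)
        - mean(r.latency_s for r in baseline),
    }


# Made-up numbers for illustration only; not results from the video.
baseline = [RunResult("q1", 1.0, 4.0), RunResult("q2", 0.0, 6.0), RunResult("q3", 1.0, 5.0)]
candidate = [RunResult("q1", 1.0, 2.0), RunResult("q2", 1.0, 3.0), RunResult("q3", 0.0, 4.0)]
report = compare_runs(baseline, candidate)
print(report)
```

Listing the regressed example IDs, rather than only an aggregate score, is what makes it possible to inspect the specific questions where the new model does worse.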
GPT-4o docs:
openai.com/index/hello-gpt-4o/
LangSmith regression testing UI docs:
docs.smith.langchain.com/old/...
RAG evaluation docs:
docs.smith.langchain.com/old/...
Public dataset referenced in the video:
smith.langchain.com/public/ea...
Cookbook referenced in the video:
github.com/langchain-ai/langs...
Science & Technology

COMMENTS • 13

@ibbobud 26 days ago ⁺¹
Quick and to the point! Love the eval!
@MaybeTogether 26 days ago
Thank you. I instinctively started googling, because answer accuracy / answer quality is more significant to me
@learnbydoing6010 27 days ago ⁺¹
So fast. 🎉thank you.
@millingabani 26 days ago ⁺¹
You guys are awesome!
@ClarkNewlove 27 days ago
Nice. Thanks for sharing!
@Nairb932 26 days ago
Keep up the great work
@octaviusp 27 days ago
ahhaa, very fast reaction! great job
@BenitoMartin-dk7lj 26 days ago
Amazing!
@_arkadij 26 days ago
working fast cool
@calvin_banks_music 26 days ago
Did you make this graphic at 1:48 programmatically, or did you import it as an image from a different tool?
@luisguillermopardo7792 20 days ago
I have a script that uses pandas agents and tools to solve questions. I updated to gpt-4o, and the model has many issues using the tools compared with gpt-4. Do you know if we have to do any extra setup or something?
@sounakroy1933 19 days ago
For EDA?
@AI_by_AI_007 26 days ago
Google's team does AlphaFold and changes the world and Sam gives us NSFW tools….
