Regression Testing | LangSmith Evaluations - Part 15

  • Published 9 Jun 2024
  • Evaluations can accelerate LLM app development, but it can be challenging to get started. We've kicked off a new video series focused on evaluations in LangSmith.
    With the rapid pace of AI, developers often face a paradox of choice: how to choose the right prompt, and how to trade off LLM quality against cost. Evaluations can accelerate development with a structured process for making these decisions. But we've heard that it is challenging to get started, so we are launching a series of short videos explaining how to perform evaluations using LangSmith.
    This video focuses on Regression Testing, which lets a user highlight particular examples in an eval set that show improvement or regression across a set of experiments (a minimal code sketch of this workflow follows the links below).
    Blog: blog.langchain.dev/regression...
    LangSmith: smith.langchain.com/
    Documentation: docs.smith.langchain.com/eval...
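
    A minimal sketch of how running multiple experiments for regression comparison might look with the LangSmith Python SDK. The dataset name, target functions, models, and the exact-match evaluator are illustrative assumptions, not taken from the video; the side-by-side regression view itself lives in the LangSmith UI.

```python
# Minimal sketch (assumptions: a LangSmith dataset named "qa-eval-set" exists,
# and OPENAI_API_KEY / LANGCHAIN_API_KEY are set in the environment).
from langsmith.evaluation import evaluate
from langchain_openai import ChatOpenAI


def correctness(run, example) -> dict:
    # Toy evaluator: exact match against the reference answer.
    predicted = (run.outputs or {}).get("output", "")
    expected = (example.outputs or {}).get("answer", "")
    return {"key": "correctness", "score": int(predicted.strip() == expected.strip())}


def make_target(model_name: str):
    # Build a simple target function that answers with the given model.
    llm = ChatOpenAI(model=model_name)

    def target(inputs: dict) -> dict:
        return {"output": llm.invoke(inputs["question"]).content}

    return target


# Run one experiment per model over the same dataset; the LangSmith UI can
# then compare the experiments and highlight per-example improvements or
# regressions.
for model in ["gpt-3.5-turbo", "gpt-4o"]:
    evaluate(
        make_target(model),
        data="qa-eval-set",
        evaluators=[correctness],
        experiment_prefix=f"regression-test-{model}",
    )
```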

COMMENTS • 3

  • @MattJonesYT · 1 month ago · +3

    This is extremely useful, especially for agent systems where the rules have been written in a way that is over-fit to a particular LLM. I find crewai often has that problem: it works well with the LLM it was written for but produces nonsense with a different LLM.

  • @MattJonesYT · 1 month ago

    An extension of this idea would be running regressions on the prompt system of an agent as a whole to see how well it adapts to other LLMs. Make a matrix of how its prompts perform on the original LLM vs. new, out-of-sample LLMs (sketched below). If it immediately breaks on new LLMs, it is probably over-fit, and you can have an AI rewrite those prompts to be simpler, making the system more robust across different LLMs.
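
    A hedged sketch of the prompt-vs-LLM robustness matrix the comment describes: every combination of prompt and model is scored, and large drops on out-of-sample models flag over-fit prompts. The scoring function here is a hypothetical placeholder for whatever evaluation you actually run (for example, a LangSmith experiment averaged over an eval set).

```python
# Hypothetical sketch of a prompt-vs-LLM robustness matrix.
# `score_prompt_on_model` is a placeholder for an eval you supply that
# returns an average score in [0, 1] for a (prompt, model) pair.
from typing import Callable


def robustness_matrix(
    prompts: dict[str, str],
    models: list[str],
    score_prompt_on_model: Callable[[str, str], float],
) -> dict[str, dict[str, float]]:
    """Return {prompt_name: {model_name: avg_score}} for every combination."""
    return {
        name: {model: score_prompt_on_model(template, model) for model in models}
        for name, template in prompts.items()
    }


# A prompt that scores well only on the model it was tuned for is likely
# over-fit; low scores on out-of-sample models flag it for rewriting.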

  • @UtopIA-IAparaDevs · 1 month ago

    Thank you