Why Evals Matter | LangSmith Evaluations - Part 1

LangChain

6,311 views

Published 7 Apr 2024
With the rapid pace of AI, developers often face a paradox of choice: how to choose the right prompt, and how to trade off LLM quality against cost. Evaluations can accelerate development by giving these decisions a structured process. But we've heard that it is challenging to get started, so we are launching a series of short videos that explain how to perform evaluations using LangSmith.
This video lays out 4 main considerations for evaluation: (1) dataset, (2) evaluator, (3) task, (4) how to apply evaluation to improve your product (e.g., unit tests, A/B tests).
Getting started documentation:
docs.smith.langchain.com/eval...
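
The four considerations listed above can be sketched in plain Python. This is a minimal illustration only and does not use the LangSmith SDK: `Example`, `toy_app`, `exact_match`, `run_eval`, and the 0.6 threshold are all hypothetical names and values chosen for the example, and the "app" is a canned lookup standing in for an LLM call.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Example:
    """One dataset row: an input and its reference (ground-truth) answer."""
    question: str
    reference: str


# (1) Dataset: a tiny hand-curated set (assumed data, for illustration only).
DATASET: List[Example] = [
    Example("2 + 2", "4"),
    Example("capital of France", "Paris"),
    Example("3 * 3", "9"),
]


# (3) Task: a stand-in for the LLM application under test.
# A real task would call a model; this one returns canned answers,
# one of which is deliberately wrong.
def toy_app(question: str) -> str:
    canned = {"2 + 2": "4", "capital of France": "paris", "3 * 3": "6"}
    return canned.get(question, "")


# (2) Evaluator: a ground-truth judge that compares prediction and reference,
# ignoring case and surrounding whitespace.
def exact_match(prediction: str, reference: str) -> float:
    return 1.0 if prediction.strip().lower() == reference.strip().lower() else 0.0


def run_eval(
    task: Callable[[str], str],
    dataset: List[Example],
    evaluator: Callable[[str, str], float],
) -> float:
    """Run the task on every example and return the mean evaluator score."""
    scores = [evaluator(task(ex.question), ex.reference) for ex in dataset]
    return sum(scores) / len(scores)


# (4) Application: use the aggregate score as a unit-test-style gate.
mean_score = run_eval(toy_app, DATASET, exact_match)
passed = mean_score >= 0.6  # hypothetical acceptance threshold
```

Swapping `exact_match` for an LLM-as-judge function, or running `run_eval` on two candidate apps and comparing the means, would correspond to the reference-free and A/B testing modes mentioned in the video.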

COMMENTS • 4

@chaitanyagoel9837 8 days ago
🎯 Key points for quick navigation:
00:00 *🎥 Introduction to Evaluations*
- Introduction to the importance of evaluations for new models.
- Overview of public evaluations and the components involved.
00:54 *🧪 Evaluation Methods*
- Explanation of human evaluations and their structure.
- Comparative evaluation methods like Chatbot Arena.
- Different metrics used to interpret results, such as Elo scores.
02:44 *🔍 Personalized Testing*
- Discussion on the trend of personalized testing and evaluations.
- Methods to build and curate datasets for evaluations.
- Examples of user interactions and synthetic data generation.
04:05 *🤖 Evaluation Judges*
- Various types of judges for evaluations including humans and LLMs.
- Modes of evaluation, both reference-free and ground-truth based.
- Application of evaluations in different contexts like unit tests and A/B testing.
05:28 *🔧 Implementing Evaluations with LangSmith*
- Introduction to LangSmith platform for running evaluations.
- Overview of LangSmith features: dataset creation, evaluator definition, trace inspections.
- Future videos will explore detailed steps to build evaluations using LangSmith.
Made with HARPA AI
@andrianantenainaprincyraso7162 1 month ago
cool !
@aaronbiliyok4553 1 month ago
Hey Lance, good job... can you please share your slides?
@kareammohamad 2 months ago
Fine
