Evaluating LLMs and RAG Pipelines at Scale
- Published May 15, 2024
- Speaker: Eric O. Korman, Cofounder / Chief Science Officer, Striveworks
Large Language Models (LLMs) and their applications, such as Retrieval-Augmented Generation (RAG) pipelines, present unique evaluation challenges due to the often unstructured nature of their outputs. These challenges are compounded by the variety of moving parts and parameters involved, such as the choice of underlying LLM, prompt templates, document chunking strategies, and embedding models.
With the proliferation of available LLMs (both open and closed source), ML teams need processes that answer the question: which LLM and which parameters are best for my specific task and dataset?
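Selecting among these moving parts is, at its core, a search over pipeline configurations scored against a dataset. The sketch below illustrates the idea with a plain grid search; all names, the search space, and the toy scoring function are hypothetical and stand in for a real evaluation harness, not for any API discussed in the talk.

```python
from itertools import product

# Hypothetical search space over RAG pipeline parameters.
# Model and parameter names are illustrative only.
search_space = {
    "llm": ["model-a", "model-b"],
    "chunk_size": [256, 512],
    "embedding_model": ["embed-x", "embed-y"],
}

def evaluate(config, dataset):
    """Toy deterministic score so the sketch runs end to end.

    A real evaluation would execute the RAG pipeline on the dataset
    and score the outputs (e.g., exact match, or an LLM-as-judge).
    """
    score = config["chunk_size"] / 512
    if config["llm"] == "model-b":
        score += 0.1
    return score

def grid_search(search_space, dataset):
    """Exhaustively score every configuration; return the best one."""
    keys = list(search_space)
    best_config, best_score = None, float("-inf")
    for values in product(*search_space.values()):
        config = dict(zip(keys, values))
        score = evaluate(config, dataset)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score

best, score = grid_search(search_space, dataset=[])
```

In practice the scoring step is the hard part, which is exactly the gap an evaluation service is meant to fill.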
In this talk, we introduce Valor, our new open-source evaluation service. We demonstrate how Valor facilitates rigorous, real-world testing of these systems in production settings and how it can be integrated into existing LLMOps tech stacks.