Developing and Serving RAG-Based LLM Applications in Production
- Published Nov 29, 2024
- There are a lot of moving pieces when it comes to developing and serving LLM applications. This talk provides a comprehensive guide to developing retrieval-augmented generation (RAG) based LLM applications, with a focus on scale (embed, index, serve, etc.), evaluation (component-wise and overall), and production workflows. We’ll also explore more advanced topics, such as hybrid routing to close the gap between OSS and closed LLMs.
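To make the "scale" point concrete, here is a minimal sketch (not from the talk or the deck; the embedding model and the toy data are illustrative assumptions) of distributing the embedding step with Ray Data, where a single map_batches call spreads the work across however many workers the cluster has:

```python
# Minimal sketch of scaling the embedding step with Ray Data.
# The model name and the toy documents are illustrative assumptions.
import ray
from sentence_transformers import SentenceTransformer

class Embedder:
    def __init__(self):
        # Each worker actor loads its own copy of the embedding model.
        self.model = SentenceTransformer("all-MiniLM-L6-v2")

    def __call__(self, batch):
        # Embed a batch of text chunks and attach the vectors as a new column.
        batch["embedding"] = self.model.encode(list(batch["text"]))
        return batch

# Toy stand-in for a real corpus of document chunks.
chunks = ray.data.from_items([
    {"text": "Ray scales Python workloads across a cluster."},
    {"text": "RAG retrieves relevant context before generation."},
])

# One map_batches call distributes embedding over a pool of actors.
embedded = chunks.map_batches(Embedder, concurrency=2, batch_size=64)
embedded.show(2)
```

The same script runs unchanged on a laptop or a multi-node Ray cluster, which is the "minimal changes to existing code" point in the takeaways below.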
Takeaways:
• Evaluating RAG-based LLM applications is crucial for identifying and productionizing the best configuration.
• Developing your LLM application with scalable workloads involves minimal changes to existing code.
• Mixture of Experts (MoE) routing allows you to close the gap between OSS and closed LLMs (see the routing sketch below).
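As a hedged illustration only (not the talk's implementation; the scorer, threshold, and backend names are all hypothetical), hybrid routing can be as simple as a lightweight scorer that sends queries an OSS model can handle to the OSS backend and falls back to a closed model for the hard cases:

```python
# Minimal hybrid-routing sketch; not the talk's implementation.
# score_fn, the threshold, and the backend names are hypothetical.
def route(query: str, score_fn, threshold: float = 0.7) -> str:
    """Pick a backend: OSS if the scorer thinks it can handle the query."""
    return "oss-llm" if score_fn(query) >= threshold else "closed-llm"

def toy_scorer(query: str) -> float:
    # Placeholder heuristic: treat longer queries as harder.
    return 0.9 if len(query.split()) <= 8 else 0.4

print(route("What is RAG?", toy_scorer))  # -> oss-llm
print(route(
    "Compare three vector databases and justify a production choice for our team",
    toy_scorer,
))  # -> closed-llm
```

In production the scorer would typically be a learned classifier (e.g., over query embeddings) rather than a length heuristic, but the routing interface stays the same.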
Find the slide deck here: drive.google.c...
About Anyscale
---
Anyscale is the AI Application Platform for developing, running, and scaling AI.
www.anyscale.com/
If you're interested in a managed Ray service, check out:
www.anyscale.c...
About Ray
---
Ray is the most popular open source framework for scaling and productionizing AI workloads. From Generative AI and LLMs to computer vision, Ray powers the world’s most ambitious AI workloads.
docs.ray.io/en...
#llm #machinelearning #ray #deeplearning #distributedsystems #python #genai