Scaling LLM Inference: AWS Inferentia Meets Ray Serve on EKS | Ray Summit 2024
- Published Feb 8, 2025
- The race for efficient, scalable AI inference is on, and AWS is at the forefront with innovative solutions. This session showcases how to achieve high-performance, cost-effective inference for large language models like Llama2 and Mistral-7B using Ray Serve and AWS Inferentia on Amazon EKS.
Vara Bonthu and Ratnopam Chakrabarti will guide you through the intricacies of building a scalable inference infrastructure that bypasses GPU availability constraints. They'll demonstrate how the synergy between Ray Serve, AWS Neuron SDK, and Karpenter autoscaler on Amazon EKS creates a powerful, flexible environment for AI workloads. Attendees will explore strategies for optimizing costs while maintaining high performance, opening new possibilities for deploying and scaling advanced language models in production environments.
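As context for the Karpenter piece of the stack described above, the sketch below shows what a Karpenter `NodePool` targeting AWS Inferentia2 instances might look like on EKS. This is an illustrative fragment, not from the session itself; the resource name and the exact requirement values are assumptions.

```yaml
# Hypothetical NodePool letting Karpenter provision Inferentia2 (inf2) nodes
# for Neuron-based Ray Serve workloads. Names and values are illustrative.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: inferentia-inf2
spec:
  template:
    spec:
      requirements:
        # Restrict provisioning to the inf2 instance family (Inferentia2).
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["inf2"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default  # assumes an EC2NodeClass named "default" exists
```

With a pool like this in place, pods that request Neuron devices (e.g. via the `aws.amazon.com/neuron` resource) can trigger Karpenter to launch inf2 capacity on demand rather than relying on pre-provisioned GPU nodes.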
--
Interested in more?
Watch the full Day 1 Keynote: • Ray Summit 2024 Keynot...
Watch the full Day 2 Keynote: • Ray Summit 2024 Keynot...
Check out the Ray Summit Breakout sessions • Ray Summit 2024 - Brea...
--
🔗 Connect with us:
Subscribe to our YouTube channel: / @anyscale
Twitter: x.com/anyscale...
LinkedIn: / joinanyscale
Website: www.anyscale.com