Scaling LLM Inference: AWS Inferentia Meets Ray Serve on EKS | Ray Summit 2024

  • Published Feb 8, 2025
  • The race for efficient, scalable AI inference is on, and AWS is at the forefront with innovative solutions. This session showcases how to achieve high-performance, cost-effective inference for large language models like Llama 2 and Mistral 7B using Ray Serve and AWS Inferentia on Amazon EKS.
    Vara Bonthu and Ratnopam Chakrabarti will guide you through the intricacies of building a scalable inference infrastructure that bypasses GPU availability constraints. They'll demonstrate how the synergy between Ray Serve, AWS Neuron SDK, and Karpenter autoscaler on Amazon EKS creates a powerful, flexible environment for AI workloads. Attendees will explore strategies for optimizing costs while maintaining high performance, opening new possibilities for deploying and scaling advanced language models in production environments.
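    As a rough illustration of the Karpenter piece described above, the sketch below shows a hypothetical Karpenter NodePool (v1 API) that would let EKS provision Inferentia2 instances on demand for Neuron-based inference pods. All names, instance types, and limits here are illustrative assumptions, not taken from the session itself:

    ```yaml
    # Hypothetical NodePool for Inferentia2 capacity; values are illustrative.
    apiVersion: karpenter.sh/v1
    kind: NodePool
    metadata:
      name: inferentia-pool   # assumed name, not from the talk
    spec:
      template:
        spec:
          requirements:
            # Restrict provisioning to Inferentia2 instance types
            - key: node.kubernetes.io/instance-type
              operator: In
              values: ["inf2.xlarge", "inf2.8xlarge"]
            - key: karpenter.sh/capacity-type
              operator: In
              values: ["on-demand"]
          nodeClassRef:
            group: karpenter.k8s.aws
            kind: EC2NodeClass
            name: default      # assumes an EC2NodeClass named "default" exists
      limits:
        cpu: "256"             # cap total provisioned capacity for this pool
    ```

    Pods requesting the Neuron device resource (for example, `aws.amazon.com/neuron: 1` in their resource limits) would then trigger Karpenter to launch a matching inf2 node, which is how the autoscaling side of the architecture avoids pre-provisioning accelerator capacity.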
    --
    Interested in more?
    Watch the full Day 1 Keynote: • Ray Summit 2024 Keynot...
    Watch the full Day 2 Keynote • Ray Summit 2024 Keynot...
    Check out the Ray Summit Breakout sessions • Ray Summit 2024 - Brea...
    --
    🔗 Connect with us:
    Subscribe to our YouTube channel: / @anyscale
    Twitter: x.com/anyscale...
    LinkedIn: / joinanyscale
    Website: www.anyscale.com
