LoRAX: Serve 1000s of Fine-Tuned LLMs on a Single GPU - Travis Addair, Predibase, Inc.

The Linux Foundation

272 views

Published 12 Sep 2024
LoRAX (LoRA eXchange) is a new LLM inference system that lets users pack 1000s of fine-tuned "LoRA" adapters onto a single GPU, dramatically reducing the cost of serving compared with a dedicated deployment per fine-tuned model. LoRAX is open source, free to use commercially, and production-ready, with pre-built Docker images and Helm charts available for immediate download and use. In this talk, we'll introduce LoRAX and explore the key ideas that make it the most cost-effective and efficient way to serve fine-tuned LLMs in production, including:
- Dynamic Adapter Loading: each set of fine-tuned LoRA weights is loaded from storage just in time as requests come in at runtime, without blocking concurrent requests.
- Heterogeneous Continuous Batching: an extension to continuous batching that packs requests for different adapters together into the same batch, keeping latency and throughput nearly constant as the number of concurrent adapters grows.
- Adapter Exchange Scheduling: a fair scheduling policy that asynchronously prefetches and offloads adapters between GPU and CPU memory, and schedules request batching to optimize the aggregate throughput of the system.
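
The idea behind packing requests for different adapters into one batch can be illustrated with a small sketch. This is not LoRAX code or its API: the function names, the toy weights, and the adapter ids "a" and "b" are all hypothetical, and a real system would use a batched GPU kernel rather than a Python loop. It only shows the arithmetic, assuming the standard LoRA form where each request's output is the shared base projection plus a low-rank update (x A) B chosen by that request's adapter.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def matvec(x: List[float], w: Matrix) -> List[float]:
    """Multiply a row vector x (length d_in) by a matrix w (d_in x d_out)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def heterogeneous_batch_forward(
    batch: List[Tuple[str, List[float]]],
    base_weight: Matrix,
    adapters: Dict[str, Tuple[Matrix, Matrix]],
) -> List[List[float]]:
    """Run one layer over a batch whose rows may use different adapters.

    Every row shares base_weight; only the low-rank pair (A, B) differs.
    """
    outputs = []
    for adapter_id, x in batch:
        y = matvec(x, base_weight)        # shared base-model path
        a, b = adapters[adapter_id]       # per-request low-rank pair
        delta = matvec(matvec(x, a), b)   # rank-r update: (x A) B
        outputs.append([yi + di for yi, di in zip(y, delta)])
    return outputs


# Toy setup (all values are illustrative assumptions): identity base
# weight and two rank-1 adapters.
base_weight = [[1.0, 0.0], [0.0, 1.0]]
adapters = {
    "a": ([[1.0], [0.0]], [[0.0, 1.0]]),
    "b": ([[0.0], [1.0]], [[1.0, 0.0]]),
}
batch = [("a", [1.0, 2.0]), ("b", [1.0, 2.0]), ("a", [3.0, 0.0])]

outputs = heterogeneous_batch_forward(batch, base_weight, adapters)
print(outputs)  # [[1.0, 3.0], [3.0, 2.0], [3.0, 3.0]]
```

The first two rows carry the same input but produce different outputs because they select different adapters, while the base weight is shared by every row in the batch.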
