vLLM on Kubernetes in Production

Kubesimplify

2,429 views

Published Aug 20, 2024
vLLM is a fast and easy-to-use library for LLM inference and serving. In this video, we go through the basics of vLLM, how to run it locally, and then how to run it on Kubernetes in production with GPU-attached nodes via a DaemonSet. It includes a hands-on demo explaining vLLM deployment in production.
Blog post: opensauced.piz...
John McBride (@JohnCodes)
►►►Connect with me ►►►
► Kubesimplify: kubesimplify.c...
► Newsletter: saiyampathak.c...
► Discord: saiyampathak.c...
► Twitch: saiyampathak.c...
► YouTube: saiyampathak.c...
► GitHub: github.com/sai...
► LinkedIn: / saiyampathak
► Website: / saiyampathak
► Instagram: / saiyampathak
► / saiyampathak
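
The description mentions running vLLM on GPU-attached nodes via a DaemonSet. As a rough illustration only (not the manifest used in the video), the sketch below builds such a manifest as a Python dict and prints it as JSON, which `kubectl apply -f` accepts. The `gpu: "true"` node label, the object names, and the model ID are hypothetical; the `vllm/vllm-openai` image and the `nvidia.com/gpu` resource name assume the public vLLM image and the NVIDIA device plugin.

```python
import json


def vllm_daemonset(model: str,
                   gpu_label_key: str = "gpu",
                   gpu_label_value: str = "true") -> dict:
    """Build a DaemonSet manifest that runs one vLLM pod per labeled GPU node.

    The label key/value and the object names are hypothetical placeholders.
    """
    labels = {"app": "vllm"}
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": "vllm", "labels": labels},
        "spec": {
            # The selector must match the pod template labels.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # Schedule only onto nodes carrying the (assumed) GPU label.
                    "nodeSelector": {gpu_label_key: gpu_label_value},
                    "containers": [{
                        "name": "vllm",
                        # Assumes the public OpenAI-compatible server image.
                        "image": "vllm/vllm-openai:latest",
                        "args": ["--model", model],
                        "ports": [{"containerPort": 8000}],
                        # One GPU per pod; assumes the NVIDIA device plugin.
                        "resources": {"limits": {"nvidia.com/gpu": "1"}},
                    }],
                },
            },
        },
    }


if __name__ == "__main__":
    # The model ID is an illustrative assumption, not taken from the video.
    manifest = vllm_daemonset("mistralai/Mistral-7B-Instruct-v0.2")
    print(json.dumps(manifest, indent=2))
```

Saving the output to a file and running `kubectl apply -f` on it would be one way to try this on a test cluster.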

COMMENTS • 10

@JohnCodes 3 months ago ⁺³
Thanks for having me on, Saiyam!! It was a lot of fun to show you how we use vLLM at OpenSauced!! Happy to answer any questions people might have here!
@aireddy 1 month ago ⁺¹
This is an absolutely wonderful session for understanding how we can deploy LLMs in production on a Kubernetes cluster!!
@kubesimplify 1 month ago
@aireddy glad it was helpful!
@DaewonSuh 26 days ago
Thanks for the wonderful demo!
I was wondering why you deploy the vLLM pod through a DaemonSet rather than a Deployment.
With a DaemonSet, you can only run one pod per node, with that pod occupying a single GPU.
Considering that nodes usually have multiple GPUs attached, I am afraid that using a DaemonSet might leave a lot of GPUs idle.
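
To put numbers on the concern raised in this comment, here is a rough sketch of the arithmetic. The node and GPU counts are hypothetical, and it assumes exactly one DaemonSet pod per node. Note that a single pod may also request several GPUs, and vLLM has a `--tensor-parallel-size` option for spreading a model across them, so the idle count depends on how the pod is configured; the video's actual setup is not shown here.

```python
def idle_gpus(nodes: int, gpus_per_node: int, gpus_per_pod: int = 1) -> int:
    """GPUs left unused when a DaemonSet runs exactly one pod per node."""
    if gpus_per_pod > gpus_per_node:
        raise ValueError("a pod cannot request more GPUs than a node has")
    return nodes * (gpus_per_node - gpus_per_pod)


if __name__ == "__main__":
    # Hypothetical cluster: 4 nodes with 8 GPUs each.
    print(idle_gpus(nodes=4, gpus_per_node=8))                  # one GPU per pod
    print(idle_gpus(nodes=4, gpus_per_node=8, gpus_per_pod=8))  # pod takes all GPUs
```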
@umeshjaiswal5298 3 months ago
Thanks for this tutorial, Saiyam.
@kubesimplify 2 months ago
Glad it's useful. Are you building something with LLMs?
@shivangsharma1 1 month ago ⁺¹
Loved it...❤
@kubesimplify 29 days ago
Glad you found it useful!
@divyamchandel8734 1 month ago
Hi John / Saiyam. In the last part you mentioned, "In a lot of cases it could be cheaper."
In which cases is hosting it yourself cheaper, and in which is using OpenAI cheaper?
Does it just depend on the load we will have (RPD and max RPM)?
@matrix9083 1 month ago
OpenAI charges $0.50 per million tokens for GPT-3.5, for example. If you rent a GPU server for that same amount, you can generate tens or hundreds of millions of tokens in one hour, depending on which text generation model you choose. Models such as Mistral 7B, the Phi-3 series, Llama 3 8B, and Gemma 2B all deliver about the same results as GPT-3.5, if not better, and all fit on a GPU server that costs 44 cents per hour on RunPod (the A5000 GPU server, for example).
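
The comparison in this reply can be worked through as simple arithmetic. The sketch below uses only the two prices quoted in the comment ($0.50 per million tokens and $0.44 per GPU-hour); the 1,000 tokens-per-second throughput is a hypothetical figure, not a measurement. At those prices the break-even point is roughly 880,000 tokens per hour, or about 244 tokens per second of sustained load. Below that, the hourly GPU bill exceeds the per-token API bill, which is why load matters.

```python
def break_even_tokens_per_hour(api_price_per_million: float,
                               gpu_price_per_hour: float) -> float:
    """Tokens per hour at which a rented GPU costs the same as the API."""
    return gpu_price_per_hour / api_price_per_million * 1_000_000


def self_hosted_cost_per_million(gpu_price_per_hour: float,
                                 tokens_per_second: float) -> float:
    """Cost per million tokens on a rented GPU at a sustained throughput."""
    if tokens_per_second <= 0:
        raise ValueError("throughput must be positive")
    millions_per_hour = tokens_per_second * 3600 / 1_000_000
    return gpu_price_per_hour / millions_per_hour


if __name__ == "__main__":
    # Prices as quoted in the comment above.
    tokens = break_even_tokens_per_hour(0.50, 0.44)
    print(f"break-even: {tokens:,.0f} tokens/hour ({tokens / 3600:.0f} tokens/s)")

    # Hypothetical sustained throughput of 1,000 tokens/s.
    cost = self_hosted_cost_per_million(0.44, 1000)
    print(f"self-hosted: ${cost:.3f} per million tokens")
```

The GPU is billed by the hour whether or not it is busy, so these figures only hold while the server is kept loaded.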
