LLM inference optimization: Architecture, KV cache and Flash attention

  • Published 15 Nov 2024

COMMENTS • 5

  • @cliffordino · 1 month ago · +1

    Nicely done and very helpful! Thank you!! FYI, the stress is on the first syllable of "INference", not the second ("inFERence").

    • @yanaitalk · 1 month ago

      Copy that! Thank you😊

  • @johndong4754 · 1 month ago

    I've been learning about LLMs over the past few months, but I haven't gone into too much depth. Your videos seem very detailed and technical. Which one(s) would you recommend starting off with?

    • @yanaitalk · 1 month ago

      There are excellent courses from DeepLearning.ai on Coursera. To go even deeper, I recommend reading the technical papers directly, which gives you a deeper understanding.

    • @HeywardLiu · 1 month ago

      1. Roofline model
      2. Transformer arch. > bottleneck of attention > flash attention
      3. LLM inference can be divided into a prefill stage (compute-bound) and a decoding stage (memory-bound); see the roofline sketch below
      4. LLM serving: paged attention, radix attention
      If you want to optimize inference performance, this review paper is awesome: LLM Inference Unveiled: Survey and Roofline Model Insights
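
To make points 1 and 3 of the comment above concrete, here is a minimal roofline-style sketch in Python. The hardware numbers (A100-class peak FLOPs and HBM bandwidth), the 7B parameter count, and the 2048-token prompt length are illustrative assumptions, not figures from the video; only the weight GEMMs are counted and activations/KV traffic are ignored. The point is simply that a full-prompt prefill pass has far higher arithmetic intensity than a one-token decode pass.

```python
# Minimal roofline-style sketch: arithmetic intensity of prefill vs. decode.
# Assumptions (not from the video): A100-class GPU specs, 7B-parameter model
# in bf16, and only weight GEMMs counted (activations/KV cache ignored).

PEAK_FLOPS = 312e12          # assumed peak tensor-core throughput, FLOP/s
MEM_BW = 2.0e12              # assumed HBM bandwidth, bytes/s (~2 TB/s)
RIDGE = PEAK_FLOPS / MEM_BW  # FLOP/byte needed to be compute-bound

PARAMS = 7e9                 # assumed model size (parameters)
BYTES_PER_PARAM = 2          # bf16 weights


def arithmetic_intensity(batch_tokens: int) -> float:
    """FLOPs per byte of weights streamed for one forward pass.

    Each token costs roughly 2 * PARAMS FLOPs (one multiply-add per weight),
    while the weights are read from HBM once per pass no matter how many
    tokens share that pass.
    """
    flops = 2 * PARAMS * batch_tokens
    bytes_moved = PARAMS * BYTES_PER_PARAM
    return flops / bytes_moved


for name, tokens in [("prefill (2048-token prompt)", 2048),
                     ("decode (1 new token)", 1)]:
    ai = arithmetic_intensity(tokens)
    regime = "compute-bound" if ai > RIDGE else "memory-bound"
    print(f"{name:28s}: {ai:7.1f} FLOP/byte vs ridge {RIDGE:.0f} -> {regime}")
```

With these assumed numbers, prefill lands well above the ridge point (compute-bound) while single-token decode sits far below it (memory-bound), which is why KV caching and batching are the levers that matter for the decode stage.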