Horace He joined us to walk us through what one can do with native PyTorch when it comes to accelerating inference! Also, if you need some GPUs check out Hyperstack: console.hyperstack.cloud/?Influencers&Aleksa+Gordi%C4%87 who are sponsoring this video! :)
Wow, this presentation was excellent. Straight to the point. No over-complicating, no over-simplifying, no trying to sound smart by obscuring simple things. Thank you Horace!
It was so informative
Wow! It was very educational and practical!
I liked the graphics in the presentation!
Great job by both of you!
Thanks!
I love this guy so much, it's unreal
Super interesting talk!! Do you guys have any idea how the compilation-generated decoding kernel compares against custom kernels like Flash-Decoding or Flash-Decoding++?
One thing that was not super clear to me: are we loading the next weight matrix (assuming there is enough SRAM) while the previous matmul+activation is being computed?
Within each matmul, the loading of data from main memory into registers occurs at the same time as the values are being computed.
So the answer to your question is "no, but it also wouldn't help because the previous matmul/activation is already saturating the bandwidth"
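To make that bandwidth argument concrete, here is a rough back-of-envelope sketch in Python. The model size and bandwidth figures below are illustrative assumptions, not measurements from the talk: at batch size 1, every weight byte has to be streamed from HBM once per generated token, so the memory bandwidth alone caps the tokens/s.

# Back-of-envelope: decoding one token must read every weight byte from HBM,
# so latency is roughly model_bytes / bandwidth (assumed numbers, for illustration).
def max_tokens_per_second(num_params, bytes_per_param, hbm_bandwidth_bytes_per_s):
    model_bytes = num_params * bytes_per_param
    seconds_per_token = model_bytes / hbm_bandwidth_bytes_per_s
    return 1.0 / seconds_per_token

# Assumed: a 7B-parameter model on a GPU with ~2 TB/s of memory bandwidth.
print(max_tokens_per_second(7e9, 2.0, 2e12))   # fp16: ~143 tok/s upper bound
print(max_tokens_per_second(7e9, 0.5, 2e12))   # int4: ~571 tok/s upper bound

Since the matmul itself already keeps that bandwidth saturated, prefetching the next layer's weights would not buy anything extra.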
@Chhillee Thank you, that makes sense.
I didn't understand one of the points made. On a couple of occasions Horace mentions that we are loading all the weights (into the registers, I assume) with every token - that's also what the diagram shows at ua-cam.com/video/18YupYsH5vY/v-deo.html . Is that what's happening? Can the registers hold all the model weights at once? If that were the case, why do you need to load them every time instead of leaving them untouched? I hope that's not too stupid of a question.
This is a good question! The big problem is that GPUs do not have enough registers (i.e. SRAM) to load all the model weights at once. A GPU has on the order of megabytes of registers/SRAM, while the weights require 10s of gigabytes to store.
Q: But what if we used hundreds of chips to have enough SRAM to store the entire model? Would generation be much faster then?
A: Yes, and that's what we have with Groq :)
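A quick sanity check of those orders of magnitude (the capacities below are rough, illustrative assumptions, not exact specs from the talk):

# Rough orders of magnitude (illustrative assumptions, not exact specs):
sram_bytes    = 50e6       # on-chip SRAM/registers per GPU: tens of MB
weights_bytes = 7e9 * 2    # e.g. a 7B-parameter model in fp16: ~14 GB

print(weights_bytes / sram_bytes)  # ~280x too big to keep resident on one chip
# Hence the weights get re-streamed from HBM for every generated token,
# unless you shard the model across enough chips that SRAM alone holds it.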
@Chhillee Thanks!! I appreciate the answer. I assume the diagram has been simplified for clarity, then.
Regarding your question about why gpt-fast is faster than the CUDA version: kernel fusion. Merging multiple kernels into one is faster than running several hand-written ones.
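As a minimal illustration of that fusion idea (a toy sketch, not the actual gpt-fast kernels): torch.compile can fuse a chain of pointwise ops into a single generated kernel, so the intermediates never round-trip through global memory.

import torch

# Eager mode launches one kernel per op and writes each intermediate to memory.
def gelu_mul(x, y):
    return torch.nn.functional.gelu(x) * y

# torch.compile traces the function and, on GPU, emits one fused kernel for the
# pointwise chain: x and y are read once and the result is written once.
compiled = torch.compile(gelu_mul)

x = torch.randn(4096, 4096, device="cuda")
y = torch.randn(4096, 4096, device="cuda")
out = compiled(x, y)  # first call compiles; later calls reuse the fused kernel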
How does perplexity (PPL) look with int4 quants? Also, with GPTQ, how high is the tokens/s with gpt-fast?
Is there a Discord for this channel's community?
Yes sir! Please see the video description.
Awesome talk! Can Triton target TPUs?
Speculative decoding is a major factor, right? If so, it's not a very fair comparison...
None of the results use speculative decoding except the ones we specifically mentioned as using it. I.e., we hit ~200 tok/s with int4 without spec-dec, and 225 or so with spec-dec.
But CTranslate2, as I understand it, is still faster?
Wondering why larger batch sizes in general don't work well with torch.compile? ua-cam.com/video/18YupYsH5vY/v-deo.html