Great video as always
Thank you! Appreciate that!
agree
@@siddaarthprasanna Thank you so much!
Thank you for this video! Amazing explanation!
You’re welcome! Happy that you liked it.
Wow! What a beautiful explanation!!!!
Thank you so much!
Thank you for this video! Clearly explained!
Thank you!!
Clearly explained! Highly recommended.
Thank you, Larry!
It's a very good explanation and video in general as well!
Glad it was helpful!
@@jbhuang0604 Love all of the content on this channel! Thank you so much for doing this. I am spreading info about this great channel everywhere :))
Great explanation and animation!
Glad you liked it!
Really amazing video! May I ask what tools you use to create this video?
Thanks! The animations come from PowerPoint. I edit the video with Adobe Premiere Pro.
Awesome!
Thanks!
Combining this video with Umar Jamil's implementation is useful
That’s COOL!
Thanks for making this! The notation is a bit confusing @8:36: if S = Q.K^T and S = {x_1, x_2, ..., x_n}, then x_1, x_2, ... should be column vectors, but here:
m_0 = -inf
...
m_i = max(m_{i-1}, x_i)
they are handled as values, perhaps there's a missing outer loop? q_j where j goes through 1..N (if square matrices).
but then S would be S={x_{1,1}, x_{1,2},...x_{1,N}
x_{2,1}, x_{2,2},...x_{2,N}
....
x_{N,1}, x_{N,2},...x_{N,N}}
essentially I'm confused about whether O_N is a vector or a value.
Thanks again for this content, I really enjoyed it!
Thanks for the question. Yes, in general S is an N x N matrix, where N is the number of tokens.
When explaining the online softmax, we only look at the attention coming from one query vector and all key vectors, so S = q * K^T. Here the query vector is of size 1 x d_k and the key matrix is of size d_k x N, so the "matrix" S is just a 1 x N vector.
O_N is a vector of size 1 x d, where d is the dimension of the value vectors (it's a weighted average of the value vectors, where the weights come from the attention).
We only need to look at one query vector to understand the key idea of online softmax and FlashAttention. We can process multiple query vectors and key/value vectors at the same time in parallel (depending on the size of the on-chip SRAM).
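To make this single-query view concrete, here is a small NumPy sketch of the online softmax accumulation (just illustrative, not the code from the video; the names and the row-major K/V layout are my own choices):

import numpy as np

# Online softmax attention for a single query (illustrative sketch).
# q: (d_k,), K: (N, d_k), V: (N, d_v) -> output o: (d_v,), the 1 x d_v vector O_N above.
def online_softmax_attention(q, K, V):
    d_k = q.shape[0]
    m = -np.inf                   # running max of the scores x_1..x_i
    l = 0.0                       # running sum of exp(x_j - m)
    o = np.zeros(V.shape[1])      # running unnormalized weighted sum of value rows
    for i in range(K.shape[0]):
        x_i = q @ K[i] / np.sqrt(d_k)      # scalar score for key i
        m_new = max(m, x_i)
        alpha = np.exp(m - m_new)          # rescale old accumulators to the new max
        l = l * alpha + np.exp(x_i - m_new)
        o = o * alpha + np.exp(x_i - m_new) * V[i]
        m = m_new
    return o / l                  # softmax-weighted average of the value vectors

# Sanity check against the standard two-pass softmax attention
q, K, V = np.random.randn(8), np.random.randn(5, 8), np.random.randn(5, 4)
s = q @ K.T / np.sqrt(8)
w = np.exp(s - s.max()); w /= w.sum()
assert np.allclose(online_softmax_attention(q, K, V), w @ V)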
@@jbhuang0604 O_N is of shape 1 by d_v, thank you so much for this answer and making this video! You really made it click!
Thanks a lot!
@11:18 not HMB but HBM
Good catch! Clearly I was just making sure people are paying attention. :-p
Very very clear explanation! Thank you Professor, learned a lot! ps: May I ask what software you use to make the animation?
Thank you! It’s mostly morph transition from MS PowerPoint.
This video is so cool!!! May I ask how you make these fantastic slides? Do you use Google Docs, Beamer, or something else?
Thanks! I used PowerPoint. The animation comes from the morph transition.
@@jbhuang0604 Thx a lot!
Thank you for your video. I have a simple question.
The paper explained outer and inner loops, so is the order right at around 10:40?
Yes, good catch! In FlashAttention-1, the Q is in the inner loop and the KV is in the outer loop, so the partial outputs have to be repeatedly written to and read back from the HBM. In FlashAttention-2, they swap the order and put Q in the outer loop and KV in the inner loop, which avoids those repeated writes and parallelizes better (the partial outputs stay in the on-chip SRAM until a query block is finished).
I intentionally use the loop order from FlashAttention-2 to better illustrate how the partial results accumulate into the full output. I think it makes the core concept easier to understand.
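For anyone who wants to see the loop order concretely, here is a rough NumPy sketch of the FlashAttention-2 ordering (query blocks outside, key/value blocks inside); the block sizes and names are illustrative, and the real kernels keep the per-block accumulators in on-chip SRAM rather than plain arrays:

import numpy as np

# Blocked attention in the FlashAttention-2 loop order (illustrative, not the actual kernel).
# Outer loop: query blocks. Inner loop: key/value blocks. The running max m, running sum l,
# and partial output Oi for a query block stay "on chip" until the inner loop finishes,
# so each output block is written out exactly once.
def blocked_attention(Q, K, V, Bq=2, Bkv=2):
    N, d_k = Q.shape
    O = np.empty((N, V.shape[1]))
    for qs in range(0, N, Bq):                    # outer loop over query blocks
        Qi = Q[qs:qs + Bq]
        m = np.full(Qi.shape[0], -np.inf)         # running row-wise max
        l = np.zeros(Qi.shape[0])                 # running row-wise sum of exponentials
        Oi = np.zeros((Qi.shape[0], V.shape[1]))  # running unnormalized output block
        for ks in range(0, N, Bkv):               # inner loop over key/value blocks
            S = Qi @ K[ks:ks + Bkv].T / np.sqrt(d_k)
            m_new = np.maximum(m, S.max(axis=1))
            P = np.exp(S - m_new[:, None])        # unnormalized probabilities for this tile
            alpha = np.exp(m - m_new)             # rescale factor for the old accumulators
            l = l * alpha + P.sum(axis=1)
            Oi = Oi * alpha[:, None] + P @ V[ks:ks + Bkv]
            m = m_new
        O[qs:qs + Bq] = Oi / l[:, None]           # one write per query block ("to HBM")
    return O

# Sanity check against plain softmax attention
Q, K, V = np.random.randn(6, 4), np.random.randn(6, 4), np.random.randn(6, 3)
S = Q @ K.T / np.sqrt(4)
W = np.exp(S - S.max(axis=1, keepdims=True)); W /= W.sum(axis=1, keepdims=True)
assert np.allclose(blocked_attention(Q, K, V), W @ V)

With the FlashAttention-1 order (KV outside, Q inside), the roles of the two loops would be swapped, and m, l, and Oi would have to be re-read and re-written for every key/value block.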
Uncle Roger???
haha sorry, good explanation, cheers
Thank you! Cheers!