You make great videos with great explanations! I can listen to you all day
Very well illustrated! Thanks
Glad you liked it!
Thanks! Really nice explanation
Regarding Multi-headed attention: it wasn't until you listed the dimensions of the output heads that it became clear that you are splitting the input by the embedding dimension, d, across the different heads. This should have been made more explicit in your explanation. Regardless, I was looking for the answer to this question of how the input was split across the heads so thank you for this detailed explanation of how the multi-headed mechanism works.
Thanks, your comment made it clearer for me.
This is a great explanation.
Thank you for this video :)
great video! thanks
Hi, thanks for the amazing video. One question: I get d, q, k, and v, but I didn't get the notation W.
Thanks. So W is a learnable matrix used to compute q, k, and v. To get q, we use q = W_q x, and similarly for k and v:
q = W_q x
k = W_k x
v = W_v x
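A minimal PyTorch sketch of these projections (the variable names and dimensions here are illustrative, not from the video): each W is a learnable linear map, applied to the same input x to produce q, k, and v.

```python
import torch
import torch.nn as nn

d = 64                   # embedding dimension (illustrative)
x = torch.randn(10, d)   # 10 tokens, each a d-dimensional embedding

# W_q, W_k, W_v are learnable matrices; nn.Linear holds one weight matrix each
W_q = nn.Linear(d, d, bias=False)
W_k = nn.Linear(d, d, bias=False)
W_v = nn.Linear(d, d, bias=False)

q = W_q(x)   # q = W_q x
k = W_k(x)   # k = W_k x
v = W_v(x)   # v = W_v x
print(q.shape, k.shape, v.shape)  # each torch.Size([10, 64])
```

These weights start out randomly initialized and are updated by back-propagation during training, just like any other model parameter.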
@@Lesoleil370 thank you!
Great Explanation :)
Great explanation, thanks :). So the head number just tells me how many weight matrices I have for K, Q, and V?
Can you suggest some materials on how transformers can be applied to time-series data like EEG?
great great 👍💯
Thanks 🙏🏻
What is the difference between self-attention and multi-head self-attention? Are they the same, just with multiple heads instead of a single attention?
where do the W come from?
Good question! These W matrices are learnable parameters of the model. They could be initialized randomly if we build a transformer from scratch, and gradually updated and learned through back-propagation.
6:00 Does each attention head only process part of the embedded tokens?
Example: say there are 100 tokens and 2 attention heads. Does each head only process 50 tokens?
If yes, then how can each head understand the whole context of the sentence while it only consumes half of it?
That’s a great question. Multi-head attention splits the feature dimension, not the sequence dimension. That way, each head sees the entire sequence but works on a smaller feature size.
Example: the input is 100 tokens and each embedding vector is 256-dimensional. Then with 8 heads, each head will process tensors of size 100x32 (256 / 8 = 32).
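As a quick sanity check of these shapes (a sketch, not code from the video), here is how a 100x256 input is split across 8 heads along the feature dimension:

```python
import torch

seq_len, d_model, n_heads = 100, 256, 8
x = torch.randn(seq_len, d_model)

# Split the feature dimension (256) across heads, not the sequence (100)
head_dim = d_model // n_heads                       # 256 / 8 = 32
heads = x.view(seq_len, n_heads, head_dim).transpose(0, 1)

print(heads.shape)  # torch.Size([8, 100, 32]): every head sees all 100 tokens
```

Each head still attends over the full 100-token sequence; only the per-token feature vector it works with is smaller.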
@@PyMLstudio understood... Great explanation.. Thank you, Bro...
100x32, right? 256 split across 8 heads gives 32 dimensions per head.
thank you
Great video! Really enjoyed the explanation in a simple way, especially for the cross-attention. Any plan to explain some other concepts besides some Python? Would love it.
Thanks for the comment. Yes, my plan is to first cover transformers, then I’ll also cover more general machine learning and deep learning concepts, some of them with Python implementations.
How do we do the cross-attention mechanism if we have three features x, y, rag with sizes:
x.shape: torch.Size([8, 768])
y.shape: torch.Size([8, 512])
rag.shape: torch.Size([8, 768])
So, let's assume query=y, key=x, and value=rag for our explanation, but remember, you can adjust this configuration depending on your specific needs. Given these tensors, our first step is to ensure that the dimensions of the query, key, and value match for the attention mechanism to work properly. Since y has a different dimension (512) compared to x and rag (768), we need to project y to match the 768-dimension space of x and rag:
query_projector = nn.Linear(512, 768)
query_projected = query_projector(query)  # -> 8x768
With this projection, all three tensors (query_projected, key=x, value=rag) now share the same dimensionality (8x768), making them compatible with multi-head attention, where each head computes a dot product between query_projected and the keys, followed by a softmax and multiplication by the values.
Remember that the assignment of x/y/rag to query/key/value can change depending on your use case and where these tensors come from.
I hope this answers your question.
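Putting the steps above together, here is a self-contained PyTorch sketch (single-head for clarity; the tensor names and the query/key/value assignment are the illustrative ones from this thread, and a multi-head version would additionally split the 768 features across heads):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative tensors matching the shapes in the question
x = torch.randn(8, 768)     # key
y = torch.randn(8, 512)     # query (narrower: 512 dimensions)
rag = torch.randn(8, 768)   # value

# Project the query from 512 to 768 so all three widths match
query_projector = nn.Linear(512, 768)
query_projected = query_projector(y)            # -> 8x768

# Scaled dot-product attention: scores, softmax, weighted sum of values
scores = query_projected @ x.T / (768 ** 0.5)   # 8x8 attention scores
weights = F.softmax(scores, dim=-1)             # each row sums to 1
out = weights @ rag                             # -> 8x768
print(out.shape)  # torch.Size([8, 768])
```

The only structural requirement is that the query and key share a width for the dot product, and that the number of key rows matches the number of value rows.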
Thanks a lot sir @@PyMLstudio
When you find people who make science easy and enjoyable 🫡
This paper is amazing! Thank you very much for making this video. Quite good!