Transformers - Part 1 - Self-attention: an introduction

  • Published 30 Sep 2024

COMMENTS • 18

  • @wangkuanlee3548
    @wangkuanlee3548 2 years ago +7

    Superb explanation. This is the clearest explanation of the concept of weight in self-attention I have ever heard. Thank you so much.

  • @mar-a-lagofbibug8833
    @mar-a-lagofbibug8833 3 years ago +3

    Thank you for sharing.

  • @kencheligeer3448
    @kencheligeer3448 3 years ago +3

    A brilliant explanation of self-attention!!! Thank you.

  • @kacemichakdi3048
    @kacemichakdi3048 2 years ago +1

    Thank you for your explanation. I just didn't understand how we choose W_k and W_q?

    • @lennartsvensson7636
      @lennartsvensson7636  2 years ago +1

      These matrices contain learnable parameters that can be trained using standard techniques from deep learning (a minimal sketch follows this thread).

    • @kacemichakdi3048
      @kacemichakdi3048 2 years ago

      Thank you
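  A minimal sketch of the point in the reply above, assuming a PyTorch implementation with made-up names and sizes (nothing below comes from the video): W_Q and W_K are ordinary weight matrices, initialized randomly and updated by gradient descent together with the rest of the network.

      import torch

      torch.manual_seed(0)
      d_model, d_k, seq_len = 8, 4, 5            # made-up sizes

      # W_Q and W_K are plain learnable parameters, initialized randomly.
      W_Q = torch.nn.Parameter(0.1 * torch.randn(d_k, d_model))
      W_K = torch.nn.Parameter(0.1 * torch.randn(d_k, d_model))
      optimizer = torch.optim.SGD([W_Q, W_K], lr=0.1)

      x = torch.randn(seq_len, d_model)          # one toy input sequence
      q = x @ W_Q.T                              # queries, shape (seq_len, d_k)
      k = x @ W_K.T                              # keys,    shape (seq_len, d_k)
      scores = torch.softmax(q @ k.T / d_k**0.5, dim=-1)

      # Any differentiable loss on the attention pattern (here a dummy one that
      # pushes each word to attend to itself) gives gradients for W_Q and W_K,
      # which are then updated like every other weight in the network.
      loss = torch.nn.functional.mse_loss(scores, torch.eye(seq_len))
      loss.backward()
      optimizer.step()

  In a full Transformer these matrices typically sit inside linear layers of each attention block, but the training mechanism is exactly the same.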

  • @mustafakocakulak5895
    @mustafakocakulak5895 3 years ago +1

    Best explanation ever :) Thank you

  • @murkyPurple123
    @murkyPurple123 3 years ago +1

    Thank you

  • @piyushkumar-wg8cv
    @piyushkumar-wg8cv 1 year ago

    The intuition buildup was amazing; you clearly explained why we need learnable parameters in the first place and how they can help relate similar words. Thanks for the explanation.

  • @euisasriani_01
    @euisasriani_01 2 years ago

    Thank you for the great explanation. I still don't understand how to obtain W_q and W_k.

  • @po-yupaulchen166
    @po-yupaulchen166 2 years ago

    Great and clear explanation. One question about W_Q and W_K. Since z_1 = k_1^T q_3 = x_1^T (W_K^T W_Q) x_3, and W_K and W_Q are trainable matrices, could we just combine them into a single matrix W_KQ = W_K^T W_Q to reduce the number of parameters?

    • @lennartsvensson7636
      @lennartsvensson7636  2 years ago

      What you are suggesting should be possible as long as the matrices are square (see the sketch below).
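  A quick numerical check of the exchange above (a sketch with made-up shapes, not code from the video): with square matrices the two projections can be merged into a single matrix W_KQ = W_K^T W_Q without changing the attention scores; how much this saves depends on how d_k compares to d_model.

      import torch

      torch.manual_seed(0)
      d_model = d_k = 6                          # the square case from the reply
      W_Q = torch.randn(d_k, d_model)
      W_K = torch.randn(d_k, d_model)
      x1, x3 = torch.randn(d_model), torch.randn(d_model)

      # Score with separate matrices: k_1^T q_3
      score_separate = (W_K @ x1) @ (W_Q @ x3)

      # The same score with one combined matrix W_KQ = W_K^T W_Q
      W_KQ = W_K.T @ W_Q                         # shape (d_model, d_model)
      score_combined = x1 @ W_KQ @ x3

      print(torch.allclose(score_separate, score_combined))   # True

      # Parameter counts: 2 * d_k * d_model for the separate matrices versus
      # d_model ** 2 for the merged one; with d_k < d_model the factored form
      # acts as a low-rank (and smaller) parameterization of W_KQ.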

  • @andrem82
    @andrem82 1 year ago

    Best explanation of self-attention I've seen so far. This is gold.

  • @exxzxxe
    @exxzxxe 2 years ago

    A first-class explanation of self-attention, the best on YouTube.

  • @jhnflory
    @jhnflory 2 years ago

    Thanks for putting these videos together!

  • @prasadkendre149
    @prasadkendre149 1 year ago

    Grateful forever.

  • @ahmedb2559
    @ahmedb2559 1 year ago

    Thank you !

  • @prateekpatel6082
    @prateekpatel6082 8 months ago

    Pretty bad example. Even if we have trainable W_q and W_k, what if there was a new sentence where we had Tom and he? The W_Q would still make word 9 point to Emma and she.