Attention is all you need; Attentional Neural Network Models | Łukasz Kaiser | Masterclass

  • Published 3 Oct 2017
  • Łukasz Kaiser - Research Scientist at Google Brain - talks about attentional neural network models and the rapid developments in this young field. In his talk, he explains how such models "look at the past and generate the next word of the output" and how to train them. In a newer talk, Łukasz introduces a new efficient variant of the Transformer. Enjoy the video: pischool.link/transformer-update
    If you are a brilliant post-graduate Machine Learning engineer and want to practise on real-world projects, apply for Session 12 of the Pi School of AI, starting on November 21, 2022.
    Ten free grants are available, only for the most brilliant minds: pischool.link/sailk

COMMENTS • 73

  • @tylersnard
    @tylersnard 3 years ago +5

    I love how excited he is.

  • @autripat
    @autripat 3 years ago +38

    Starting @ 15:45, in well under 2 minutes, attention explained! Only a true master can do it. Love.

    • @Scranny
      @Scranny 3 years ago

      K is a matrix representing the T previously seen words and V is the matrix representing the full dictionary of words of the target language, right? But what are K and V exactly? What values do these matrices hold? Are they learned?
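
A minimal NumPy sketch of what this question is getting at (dimensions here are illustrative, not from the talk): in the Transformer, K and V are not a dictionary of the target language. Queries, keys, and values are all linear projections of the hidden states of the words being attended over, and it is the projection matrices W_q, W_k, W_v that are learned.

```python
import numpy as np

T, d_model, d_k = 5, 16, 8             # T seen words, model width, head width
rng = np.random.default_rng(0)

X = rng.normal(size=(T, d_model))      # hidden states of the T seen words
W_q = rng.normal(size=(d_model, d_k))  # learned parameters (random stand-ins here)
W_k = rng.normal(size=(d_model, d_k))
W_v = rng.normal(size=(d_model, d_k))

Q, K, V = X @ W_q, X @ W_k, X @ W_v    # queries, keys, values
scores = Q @ K.T / np.sqrt(d_k)        # similarity of each query to each key
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax
out = weights @ V                      # each row is a weighted average of values
print(out.shape)                       # (5, 8)
```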

  • @lmaes
    @lmaes 3 years ago +9

    The passion that he transmits is priceless

  • @yacinebenaffane6535
    @yacinebenaffane6535 4 years ago

    Nice explanation of position and multi-head...

  • @nsuryapa1
    @nsuryapa1 4 years ago +2

    Nice explanation!!!!

  • @elliotwaite
    @elliotwaite 5 years ago +1

    Great talk, Łukasz.

  • @itshgirish
    @itshgirish 4 years ago +7

    Great presentation, he's having fun explaining the bits... great camera work - it was more fun watching a moving cam than a boring still view.

  • @mrvishwjeetkumar
    @mrvishwjeetkumar 5 years ago +2

    very nice lecture... enjoyed it a lot.

  • @igorcherepanov4765
    @igorcherepanov4765 5 years ago +86

    "there is this guy, he never got his bachelor but he wrote most of these papers" - appreciation

    • @threeMetreJim
      @threeMetreJim 4 years ago +6

      Where experience and 'thinking outside the box' can beat education in some cases. He should be getting an 'honorary' bachelor's degree, if he hasn't already.

    • @MucciciBandz
      @MucciciBandz 4 years ago +14

      Excuse me? That's fake news! Even his LinkedIn profile says Duke 1998 (yes, it's the same Noam Shazeer from this exact paper)... "Noam Shazeer is an Engineer at Google. He graduated from Duke in 1998 with a double major in Mathematics and Computer Science"

    • @MrLacker
      @MrLacker 3 years ago +14

      I think he meant that Noam doesn't have a PhD. Noam does have a bachelor's degree, but he started working at Google pretty soon after graduating (literally decades ago) and has contributed to many important Google technologies in his time there. Noam was a Google old-timer back when I started working there in 2005.

  • @Marcos10PT
    @Marcos10PT 3 years ago +19

    This is the best explanation of attention I have seen so far! And I have been looking :)

    • @ksrajavel
      @ksrajavel 1 year ago +1

      Because he is one of the co-authors of the revolutionary paper that introduced it.

    • @kingleasen85
      @kingleasen85 1 year ago +1

      @@ksrajavel Revolutionary indeed. As per Google Scholar, the paper has over 72,000 citations in just 5 years. One of the most cited papers in the history of academia.

  • @mosicr
    @mosicr 5 years ago +43

    Great lecture. Best explanation of attention in just a few words.

  • @gilgarad1
    @gilgarad1 5 years ago +1

    Nice lecture. I enjoyed it

  • @jayantpriyadarshi9266
    @jayantpriyadarshi9266 3 years ago +1

    Great talk. Something very useful.

  • @HimanshuGhadigaonkar
    @HimanshuGhadigaonkar 3 years ago +1

    Best explanation!!

  • @ahmedb2559
    @ahmedb2559 1 year ago

    Thank you!

  • @kadamparikh8421
    @kadamparikh8421 3 years ago +2

    Great content in this video. Would love it if you had the multi-headed devil covered! Still, a great video to get the overall view...

  • @intelligenttrends8935
    @intelligenttrends8935 4 years ago +1

    Here I get it.
    Thank u

  • @rinkagamine9201
    @rinkagamine9201 5 years ago +1

    Can I somehow get the machine-produced texts which were shown at the beginning of the presentation?

  • @pankajtiwari12
    @pankajtiwari12 3 years ago

    great explanation!

  • @KartoffelnSalatMitAlles
    @KartoffelnSalatMitAlles 5 years ago +1

    What model is that at the beginning? Can I somehow get the machine-produced texts which were shown at the beginning of the presentation?

  • @CharlesVanNoland
    @CharlesVanNoland 9 months ago

    I just wish he hadn't stood right in front of what he was trying to show people, but I love his passion for explaining what he's talking about.

  • @FranckDernoncourt
    @FranckDernoncourt 3 years ago +5

    Thanks for sharing! It'd be great if the video could pay more attention to the slides though.

    • @pischool6210
      @pischool6210  3 years ago +8

      Thank you for your comment, Franck! You can download the slides here: picampus-school.com/open-day-2017-presentations-download/

    • @FranckDernoncourt
      @FranckDernoncourt 3 years ago

      @@pischool6210 perfect, thanks!

  • @louerleseigneur4532
    @louerleseigneur4532 2 years ago

    Thanks buddy

  • @rishabhshirke1175
    @rishabhshirke1175 4 years ago +2

    nothing beats the GPT-2 TL;DR summarization trick

  • @homeroni
    @homeroni 4 years ago

    Are the talks he is referring to (as the previous talks) available on YouTube?

    • @pischool6210
      @pischool6210  4 years ago +2

      Hello! Sure. You can find all the Masterclasses from our Open Day here 👉ua-cam.com/play/PLU3hjga27ZUiuL8V0CVlidBK27CDxWf-F.html

  • @vast634
    @vast634 3 years ago +10

    They should invent a device that can always tell the time of day when the user wants.

  • @brandomiranda6703
    @brandomiranda6703 3 years ago +2

    Where is the library he talks about for getting the details of training the DL "right"?

  • @TheGodSaw
    @TheGodSaw 6 years ago +13

    Is there a way to get the slides?

    • @pischool6210
      @pischool6210  6 years ago +10

      You can download them here: picampus-school.com/open-day-2017-presentations-download/

    • @khanzorbo
      @khanzorbo 5 years ago +1

      Pi School, I have just checked, and it seems the slides linked to the presentation are the "tensorflow workshop" ones; can you please double-check?

    • @pischool6210
      @pischool6210  5 years ago +4

      Dear Vladimir, have a look here: drive.google.com/file/d/0B8BcJC1Y8XqobGNBYVpteDdFOWc/view

  • @threeMetreJim
    @threeMetreJim 4 years ago +1

    "He didn't put a trophy into the suitcase because it was too small." is an ambiguous statement. "it" could refer to either the trophy or the suitcase. It seems like the answer is mainly decided on probability from past experience, rather than the intended (ambiguous) meaning, similar to a survey or experiment with too small a sample size. It is also possible that he didn't want to put a too small a trophy into the suitcase in case it ended up being jostled about too much, and became damaged; although that is a less likely, but still a possible explanation and would need a thought process to come to that conclusion, or some further context, to clarify the intended meaning. People on the Autistic spectrum (HFA / Asperger's) have that same problem when phrasing thoughts (ambiguous meaning), and are often misunderstood because of it. When a statement has two (or more) possible meanings, then it's probably unfair to judge the performance of a system in 'getting the answer right' as there isn't a definite correct answer to begin with, just a more likely one.
    A word for word translation, with grammatical correction applied would probably achieve a better result in a case like this. Google translate seems to somewhat agree.
    Original: He didn't put a trophy into the suitcase because it was too small
    Google translate: Er hat keine Trophäe in den Koffer gesteckt, weil er zu klein war.
    Back to english: He did not put a trophy in his suitcase because he was too small.
    Word for word translation (incorrect, but probably still understandable if you speak German): er nicht stellen ein Trophäe in das koffer da es was auch klein.
    Google translate of word to word to english (much better but still wrong - where did the 'also' come from?):he does not put a trophy in the suitcase as it is also small.

  • @nabinchaudhary73
    @nabinchaudhary73 2 years ago +1

    Does the embedding get trained, or do the key, query, and value get trained? I am confused, please help.
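
For what it's worth, a minimal PyTorch sketch (module and dimension choices are made up for illustration): the embedding table and the Q/K/V projections are all ordinary trainable parameters, learned jointly by backpropagation.

```python
import torch
import torch.nn as nn

class TinyAttention(nn.Module):
    def __init__(self, vocab=1000, d=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, d)     # trained
        self.w_q = nn.Linear(d, d, bias=False)  # trained
        self.w_k = nn.Linear(d, d, bias=False)  # trained
        self.w_v = nn.Linear(d, d, bias=False)  # trained

    def forward(self, tokens):
        x = self.embed(tokens)
        q, k, v = self.w_q(x), self.w_k(x), self.w_v(x)
        att = torch.softmax(q @ k.transpose(-2, -1) / x.shape[-1] ** 0.5, dim=-1)
        return att @ v

model = TinyAttention()
out = model(torch.randint(0, 1000, (1, 7)))  # batch of 1, 7 tokens
for name, p in model.named_parameters():
    print(name, tuple(p.shape), p.requires_grad)  # all True: everything is trained
```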

  • @RobertElliotPahel-Short
    @RobertElliotPahel-Short 3 years ago +1

    Math majors / graduate math students: skip to 15:36

  • @someone_518
    @someone_518 1 year ago +1

    ChatGPT gave me a link to this video :)

  • @ramyaneekashyap4356
    @ramyaneekashyap4356 4 years ago

    Is there any way I could get the PPTs for reference?

    • @pischool6210
      @pischool6210  4 years ago +2

      Hi, sure! You can download it here: picampus-school.com/open-day-2017-presentations-download/

    • @ramyaneekashyap4356
      @ramyaneekashyap4356 4 years ago

      @@pischool6210 thank you so much!!!!

  • @sajjadayobi688
    @sajjadayobi688 3 years ago +1

    Transformers learned translation without language dependency O_o

  • @josy26
    @josy26 4 years ago

    Slides?

    • @SubhamKumar-eg1pw
      @SubhamKumar-eg1pw 4 years ago +2

      drive.google.com/file/d/0B8BcJC1Y8XqobGNBYVpteDdFOWc/view

  • @alexandrogomez5493
    @alexandrogomez5493 10 months ago

    Assignment 6

  • @TheAIEpiphany
    @TheAIEpiphany 3 years ago +2

    47:55 "We tried it on images, it didn't work so well." 2020, Vision Transformer: am I a joke to you?

    • @souhamghosh8714
      @souhamghosh8714 3 years ago

      In ViT, it is clearly stated that a "small" dataset like ImageNet doesn't show promising results, but a larger dataset like JFT gives amazing results. So this may be a start, but it is far from perfection. Btw, I am not contradicting your statement. 😁 Also, JFT is not an open-source dataset (yet).

    • @TheAIEpiphany
      @TheAIEpiphany 3 years ago

      @@souhamghosh8714 True Google folks ^^

    • @souhamghosh8714
      @souhamghosh8714 3 years ago

      “Hi, I am from google, you know what i got, TPUs..more than you can imagine”😂

  • @IExSet
    @IExSet 1 year ago +1

    Strange thing: he mentions the term "attention" before explaining what it is. What is the EXACT meaning of this Query Key Value magic??? I suspect speakers just copy other people's thoughts mechanically, without understanding the real meaning of the operations!
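
For readers left with the same question, here is a tiny worked example (numbers invented for illustration) of what Query/Key/Value means mechanically: the output is a softmax-weighted average of the value rows, with weights set by how strongly the query matches each key.

```python
import numpy as np

q = np.array([1.0, 0.0])                  # one query
K = np.array([[1.0, 0.0],                 # key 0
              [0.0, 1.0]])                # key 1
V = np.array([[10.0, 0.0],                # value 0
              [0.0, 10.0]])               # value 1

scores = K @ q                            # [1.0, 0.0]: query matches key 0 best
w = np.exp(scores) / np.exp(scores).sum() # softmax -> [0.73, 0.27]
print(w @ V)                              # [7.31, 2.69]: mostly value 0
```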

  • @kingenking9303
    @kingenking9303 2 years ago

    The video image quality is too poor; you need to fix it.

  • @ShadowD2C
    @ShadowD2C 5 days ago

    Good video, but his and the camera's placements are suboptimal.

  • @uhmerikuhn
    @uhmerikuhn 2 years ago +5

    ...comes from Google - check. ...TensorFlow T-shirt - check. Most viewers therefore rate this lecture highly - check.
    This is very hand-wavy throughout, with essentially no rigor shown. There are many lectures/presentations online which actually explain the nuts and bolts and wider use cases of attention mechanisms. Maybe the title of this video should be something else, like "Our group's success with one use case (language translation) of attention." Frankly, the drive-by treatment of the technical details of the language translation case was almost terrible and should probably have been omitted.

    • @georgemaratos1122
      @georgemaratos1122 2 years ago +1

      which lectures do you like that explain attention mechanisms and their wider use?

  • @clray123
    @clray123 3 years ago +3

    The most I gather from this talk is that "attention" is a pretty terrible term. Something like "fuzzy lookup" or "matching" or "mapping" would have been much more descriptive, but oh well, which researcher thinks about terminology before unleashing it on the world?
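
The "fuzzy lookup" reading is in fact a reasonable mental model. A minimal sketch (function and variable names are invented): a Python dict does a hard lookup, attention does a soft one, and sharpening the scores recovers the hard lookup in the limit.

```python
import numpy as np

def soft_lookup(query, keys, values, sharpness=1.0):
    """Attention as a differentiable dictionary lookup."""
    scores = sharpness * (keys @ query)
    w = np.exp(scores - scores.max())   # numerically stable softmax
    w /= w.sum()
    return w @ values

keys = np.array([[1.0, 0.0], [0.0, 1.0]])
values = np.array([[5.0], [9.0]])
query = np.array([1.0, 0.0])

print(soft_lookup(query, keys, values, 1.0))    # ~[6.08]: a fuzzy blend
print(soft_lookup(query, keys, values, 100.0))  # ~[5.0]: behaves like dict[query]
```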

  • @aojing
    @aojing 5 years ago +11

    Can't believe this guy was one of the authors of the Transformer. He just cannot explain what he was doing!

    • @mauricet910
      @mauricet910 5 years ago

      I thought it was a really insightful talk. I'm preparing a talk about the Transformer myself, and this one was super inspiring :)

    • @haiyangsun8344
      @haiyangsun8344 5 years ago +7

      I also couldn't understand it... The architecture diagram is not very intuitive, and I was expecting some elaboration... However, the explanation was not clear...

    • @NicholasAmpazis
      @NicholasAmpazis 4 years ago +3

      If you don’t already know something about attention then it’s impossible to follow the presentation. Everything is explained very poorly...

    • @clray123
      @clray123 3 years ago +1

      His communication skills are like a runner who keeps tripping over his shoelaces.
      Unfortunately, it seems to be quite a common ailment of even "brilliant" coders (or shall I say, scientists) that they can't explain their ideas to others clearly in natural language. It's as if they have no model of someone else's knowledge, and they take so many things for granted that their attempts at "explanation" just sound like gobbledygook to those who expect to be taught something. That's why we have technical writers, teachers, popular science books, etc.

    • @clray123
      @clray123 1 year ago

      @Yancy Stevens Yes, to communicate you have to model in your head whoever you are communicating with: what they know, what they don't know, and foremost what they want to know. Otherwise it's just a fail, no matter how much knowledge you have.