*DeepMind x UCL | Deep Learning Lectures | 7/12 | Deep Learning for Natural Language Processing*
*My takeaways:*
*1. Plan for this lecture* 0:23
*2. Background: Deep learning and language* 3:03
2.1 Language applications use deep learning to very different extents 4:12
2.2 Why is deep learning such an effective tool for language processing 7:08
2.3 Understanding language: this is important for building language models 7:50
*3. The Transformer* 22:14
3.1 Distributed representation of words 23:40
3.2 Self-attention over word input embeddings 32:13
3.3 Multi-head self-attention 38:55
3.4 Feedforward layer 41:57
3.5 A complete Transformer block 42:23
3.6 Skip connections 42:38
3.7 Position encoding of words 46:02
3.8 Summary 50:58
*4. Unsupervised and transfer learning with BERT* 54:45
4.1 Problems in language 55:39
4.2 BERT 59:42
-Unsupervised learning
--Masked language model pretraining 1:02:05
--Next sentence prediction pretraining 1:05:55
-BERT fine-tuning 1:09:55
-BERT supercharges transfer learning 1:12:05
*7. Extract language-related knowledge from the environment* 1:13:55
-Grounded language learning at DeepMind: towards language understanding in a situated agent
*8. To conclude* 1:27:18
This is, hands down, the best explanation of Transformers!
Best explanation? Unfortunately, it was difficult for me to follow ...
Thank you very much for taking the time to prepare this incredible lecture series! #respectfrombrazil 🇧🇷
One of the best lectures in the series.
It's really informative, thank you. There is only one noticeable mistake: the insect in the picture is not a fruit fly :)
Thanks Felix! You're a great teacher. That's it.
Thank you so much for the very informative lecture!
Looks like Linus Sebastian is taking the lecture :D
Great lecture and big thanks to DeepMind for sharing this great content.
Thank you! This is a great series of lectures!
Is the picture at 37:12 correct? If we pass a small amount of the value of each of the other words, plus the value of the word "beetle", to the next layer, then to me the v term for the word "the" should be connected to lambda1, not the v term for the word "beetle". The same logic should apply to the other words and their lambdas.
I agree, there seems to be an issue with the arrows in that figure. Since the lambdas sum to 1, if the figure were right, v' would simply equal v_beetle.
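A minimal sketch of the point made in this exchange, assuming standard single-head scaled dot-product self-attention: each weight lambda_i comes from comparing the query of "beetle" with the key of word i, and it scales the value vector v_i of that same word i. The four-word sentence, the 2-d key/value/query vectors and the helper names (`softmax`, `dot`, `attend`) are all hypothetical choices for illustration, not taken from the lecture slides.

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(query, keys, values):
    """Single-query, single-head attention output.

    lambda_i = softmax(q . k_i / sqrt(d)) and v' = sum_i lambda_i * v_i,
    i.e. each lambda multiplies the value vector of its own word.
    """
    d = len(query)
    lambdas = softmax([dot(query, k) / math.sqrt(d) for k in keys])
    out = [0.0] * len(values[0])
    for lam, v in zip(lambdas, values):
        for j, x in enumerate(v):
            out[j] += lam * x
    return out, lambdas

# Hypothetical toy vectors; only the words "the" and "beetle" come from the thread.
words = ["the", "beetle", "drove", "off"]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
values = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0], [-1.0, 1.0]]
query_beetle = [0.0, 1.0]

v_prime, lambdas = attend(query_beetle, keys, values)
print("lambdas:", [round(l, 3) for l in lambdas], "sum:", round(sum(lambdas), 6))
print("v' for 'beetle':", [round(x, 3) for x in v_prime])

# The questioned reading of the figure: every lambda scales v_beetle.
# Because the lambdas sum to 1, this collapses back to v_beetle itself.
wrong = [sum(l * x for l in lambdas) for x in values[1]]
print("if every lambda scaled v_beetle:", [round(x, 3) for x in wrong])
```

With the correct wiring, v' mixes information from all four words; with every arrow pointing at v_beetle, the weighting has no effect and v' is just v_beetle, which is the contradiction the reply points out.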
An impressive amount of effort clearly went into preparing this lecture. Thanks for sharing the knowledge and research.
Amazing explanation of the Transformer, thanks so much
I got Covid from 15:28 lol
Great lectures btw, huge thanks to DeepMind and UCL!
Thank you for the amazing lecture. Why are there only feedforward mechanisms, and no feedback mechanisms, in language models? Would that make a difference? We process language both bottom-up and top-down. Our expectations of the world and our beliefs about people's intentions can influence how we process a sequence of sounds, just as top-down processes make us hallucinate certain aspects of vision. The skip connections let lower-level information feed upward, but they do not allow higher-level representations to influence lower-level ones, at least not at inference time. Would it be possible to have such a structure in Transformers? Would it help?
Superb Lecture...Thank you
Thanks for sharing knowledge!!
Amazing lecture!
Not easy to follow the exact steps with the visualization and explanation provided. I think more detail would be helpful.
The lecturer's explanations are great, but the slides do not reflect this. They are too poor.
I'm completely lost. Is this a graduate level course?
1:27:57 "We've reached the end of the lecture, because I urgently need to go now…"
Thanks!
Thank you for sharing the research.
Can anybody post the paper at the end where it says McClelland et al. 2019?
Excellent.
Thank you for this amazing tutorial. Well organised!!
Head of search
He who is first shall be last, or just seen as a twat 😁🤦🏻‍♂️🤣👍
FIRST