Lennart Svensson
Joined 29 Jun 2012
Guest lecture by Wayve, 2023
The video shows a guest lecture by Oleg Sinavski from Wayve. The lecture was part of my course "SSY340 -- Deep Machine Learning", which is given at Chalmers University of Technology. More details are provided below.
0:00 Self-driving vehicles and Software 2.0
4:14 Wayve's approach to self-driving vehicle control
7:45 Limitations of AV2.0
10:01 GAIA-1
16:35 LINGO
28:00 Questions to Oleg Sinavski
Title: Language and Video-Generative AI in Autonomous Driving
Abstract:
Recent advances in Large Language Models (LLMs) demonstrate the possibility of achieving human-level capabilities in generating explanations and reasoning using end-to-end text models. Meanwhile, explainability and reasoning remain significant challenges in the field of autonomous driving. Wayve's unique approach to end-to-end self-driving positions the company to make use of these recent developments. In this talk, we showcase Wayve’s latest advancements in language and video generative AI, present some techniques and discuss their current and future applications.
Bio:
Oleg Sinavski is currently a Principal Applied Scientist at Wayve, London. His research interests focus on applying advances in large language models to the fields of self-driving, reinforcement learning, planning, and simulation. Previously, Oleg worked at Brain Corp in San Diego, CA, as the VP of R&D, where he led research efforts in scalable robotic navigation. Earlier in his career, he specialized in neuromorphic computation and hardware and holds a Ph.D. in computational neuroscience.
Views: 850
Videos
Deep Generative Models, Stable Diffusion, and the Revolution in Visual Synthesis
Views: 2.5K · 2 years ago
We had the pleasure of having Professor Björn Ommer as a guest lecturer in my course SSY340, Deep machine learning at Chalmers University of Technology. Chapters: 0:00 Introduction 8:10 Overview of generative models 15:00 Diffusion models 19:37 Stable diffusion 26:10 Retrieval-Augmented Diffusion Models Abstract: Recently, deep generative modeling has become the most prominent paradigm for lear...
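As a hands-on complement (not part of the lecture), here is a minimal, hedged sketch of text-to-image generation with Stable Diffusion via the Hugging Face `diffusers` library; the model id, GPU assumption, and prompt are illustrative.

```python
# Illustrative sketch only: runs a pre-trained Stable Diffusion pipeline for
# text-to-image generation. The model id and CUDA device are assumptions.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")  # assumes a CUDA-capable GPU is available

# The pipeline handles text encoding, iterative denoising in latent space,
# and decoding the latent back to an image.
image = pipe("a photograph of an astronaut riding a horse").images[0]
image.save("astronaut.png")
```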
BERT: transfer learning for NLP
Views: 10K · 3 years ago
In this video we present BERT, which is a transformer-based language model. BERT is pre-trained in a self-supervised manner on a large corpus. After that, we can use transfer learning and fine-tune the model for new tasks and obtain good performance even with a limited annotated dataset for the specific task that we would like to solve (e.g., a text classification task). The original paper: arxiv....
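To make the transfer-learning recipe above concrete, here is a hedged sketch (not from the video) of fine-tuning a pre-trained BERT model for text classification with the Hugging Face `transformers` library; the model name, toy dataset, and hyperparameters are illustrative assumptions.

```python
# Minimal fine-tuning sketch: a pre-trained BERT encoder with a fresh
# classification head, trained on a tiny illustrative dataset.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

texts = ["great movie", "terrible plot"]          # placeholder data
labels = torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
outputs = model(**batch, labels=labels)           # loss is computed internally
outputs.loss.backward()                           # one gradient step as an example
optimizer.step()
```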
Transformers - Part 7 - Decoder (2): masked self-attention
Views: 20K · 4 years ago
This is the second video on the decoder layer of the transformer. Here we describe the masked self-attention layer in detail. The video is part of a series of videos on the transformer architecture, arxiv.org/abs/1706.03762. You can find the complete series and a longer motivation here: ua-cam.com/play/PLDw5cZwIToCvXLVY2bSqt7F2gu8y-Rqje.html Slides are available here: chalmersuniversity.box.com...
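For readers who prefer code, the following is a minimal sketch of masked (causal) self-attention; it is not taken from the slides and uses a row-vector convention (one token per row) with illustrative shapes.

```python
# Sketch of masked self-attention: each position may only attend to itself
# and earlier positions; future positions get a score of -inf before the softmax.
import math
import torch
import torch.nn.functional as F

def masked_self_attention(X, W_q, W_k, W_v):
    """X: (n, d_model); W_q, W_k: (d_model, d_k); W_v: (d_model, d_v)."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / math.sqrt(K.shape[-1])                 # (n, n) similarity scores
    n = scores.shape[0]
    causal = torch.tril(torch.ones(n, n, dtype=torch.bool))   # True on and below the diagonal
    scores = scores.masked_fill(~causal, float("-inf"))       # hide future tokens
    A = F.softmax(scores, dim=-1)                             # masked entries become exactly 0
    return A @ V

# Example with random data:
# X = torch.randn(5, 8)
# Y = masked_self_attention(X, torch.randn(8, 4), torch.randn(8, 4), torch.randn(8, 4))
```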
Transformer - Part 6 - Decoder (1): testing and training
Views: 9K · 4 years ago
This is the first of three videos about the transformer decoder. In this video, we focus on describing how the decoder is used during testing and training, since this is helpful for understanding how the decoder is constructed. The video is part of a series of videos on the transformer architecture, arxiv.org/abs/1706.03762. You can find the complete series and a longer motivation here: ...
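As an illustration of the test-time procedure only (not the lecture's code), the toy loop below generates tokens one at a time and feeds them back into the decoder; the `next_token_logits` function is a hypothetical stand-in for a trained transformer decoder.

```python
# Toy greedy autoregressive decoding loop. A real decoder would condition on the
# encoder output and the prefix; here the logits are random placeholders.
import numpy as np

VOCAB = ["<sos>", "jag", "gillar", "transformers", "<eos>"]

def next_token_logits(prefix_ids, rng):
    """Stand-in for a decoder forward pass over the current prefix."""
    return rng.normal(size=len(VOCAB))

def greedy_decode(max_len=10, seed=0):
    rng = np.random.default_rng(seed)
    ids = [0]                                # start from <sos>
    for _ in range(max_len):
        logits = next_token_logits(ids, rng)
        next_id = int(np.argmax(logits))     # greedy choice; beam search is a common alternative
        ids.append(next_id)
        if VOCAB[next_id] == "<eos>":        # stop when the end token is produced
            break
    return [VOCAB[i] for i in ids]

print(greedy_decode())
# At training time, by contrast, the whole (shifted) target sentence is fed in at
# once, and masked self-attention prevents each position from seeing the future.
```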
Transformer - Part 8 - Decoder (3): Encoder-decoder self-attention
Views: 9K · 4 years ago
This is the third video about the transformer decoder and the final video introducing the transformer architecture. Here we mainly learn about the encoder-decoder multi-head self-attention layer, used to incorporate information from the encoder into the decoder. It should be noted that this layer is also commonly known as the cross-attention layer. The video is part of a series of videos on the...
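A compact sketch of the layer (not copied from the slides; a row-vector convention and illustrative shapes are assumed): the queries come from the decoder states, while the keys and values come from the encoder output, which is what makes it cross-attention.

```python
# Sketch of encoder-decoder (cross-)attention.
import math
import torch
import torch.nn.functional as F

def cross_attention(dec_states, enc_output, W_q, W_k, W_v):
    """dec_states: (n_dec, d_model); enc_output: (n_enc, d_model)."""
    Q = dec_states @ W_q                        # queries from the decoder
    K = enc_output @ W_k                        # keys from the encoder output
    V = enc_output @ W_v                        # values from the encoder output
    scores = Q @ K.T / math.sqrt(K.shape[-1])   # (n_dec, n_enc)
    A = F.softmax(scores, dim=-1)               # each decoder position attends over the source
    return A @ V                                # (n_dec, d_v)
```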
Transformers - Part 5 - Transformers vs CNNs and RNNs
Views: 4.1K · 4 years ago
In this video, we highlight some of the differences between the transformer encoder and CNNs and RNNs. The video is part of a series of videos on the transformer architecture, arxiv.org/abs/1706.03762. You can find the complete series and a longer motivation here: ua-cam.com/play/PLDw5cZwIToCvXLVY2bSqt7F2gu8y-Rqje.html Slides are available here: chalmersuniversity.box.com/s/c2a64rz0hlp44pdouq9m...
Transformers - Part 4 - Encoder remarks
Views: 4.3K · 4 years ago
In this video we highlight a few properties of the transformer encoder. The video is part of a series of videos on the transformer architecture, arxiv.org/abs/1706.03762. You can find the complete series and a longer motivation here: ua-cam.com/play/PLDw5cZwIToCvXLVY2bSqt7F2gu8y-Rqje.html Slides are available here: chalmersuniversity.box.com/s/c2a64rz0hlp44pdouq9mc24msbz60xf2
Transformers - Part 2 - Self attention complete equations
Views: 9K · 4 years ago
In this video, we present the complete equations for self-attention. The video is part of a series of videos on the transformer architecture, arxiv.org/abs/1706.03762. You can find the complete series and a longer motivation here: ua-cam.com/play/PLDw5cZwIToCvXLVY2bSqt7F2gu8y-Rqje.html Slides are available here: chalmersuniversity.box.com/s/c2a64rz0hlp44pdouq9mc24msbz60xf2
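For reference, here is a minimal code sketch of the same equations. It is not the slides' code: it uses the row-vector convention common in implementations, whereas the slides appear to write the scores as K^T Q with tokens as columns, so the two forms differ only by transposes.

```python
# Scaled dot-product self-attention for a single head.
import math
import torch
import torch.nn.functional as F

def self_attention(X, W_q, W_k, W_v):
    """X: (n, d_model); W_q, W_k: (d_model, d_k); W_v: (d_model, d_v)."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v          # queries, keys, values
    scores = Q @ K.T / math.sqrt(K.shape[-1])    # scores[i, j] = q_i . k_j / sqrt(d_k)
    A = F.softmax(scores, dim=-1)                # row i: attention weights for token i
    return A @ V                                 # each output is a weighted sum of value vectors
```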
Transformers - Part 3 - Encoder
Views: 10K · 4 years ago
In this video, we present the encoder layer in the transformer. Important components of this presentation are multi-head attention, positional encodings, and the architecture of the encoder blocks that appear inside the encoder. The video is part of a series of videos on the transformer architecture, arxiv.org/abs/1706.03762. You can find the complete series and a longer motivat...
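As a small companion to the positional-encoding part, here is a sketch of the sinusoidal encodings from the original paper (not the slides' notation); an even model dimension is assumed.

```python
# Sinusoidal positional encodings, added to the token embeddings so that the
# otherwise permutation-invariant encoder can use word order.
import numpy as np

def positional_encoding(n_positions, d_model):
    assert d_model % 2 == 0, "this sketch assumes an even model dimension"
    pos = np.arange(n_positions)[:, None]         # (n, 1)
    i = np.arange(d_model // 2)[None, :]          # (1, d_model / 2)
    angles = pos / (10000 ** (2 * i / d_model))   # (n, d_model / 2)
    pe = np.zeros((n_positions, d_model))
    pe[:, 0::2] = np.sin(angles)                  # even dimensions
    pe[:, 1::2] = np.cos(angles)                  # odd dimensions
    return pe
```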
Transformers - Part 1 - Self-attention: an introduction
Views: 18K · 4 years ago
In this video, we briefly introduce transformers and provide an introduction to the intuition behind self-attention. The video is part of a series of videos on the transformer architecture, arxiv.org/abs/1706.03762. You can find the complete series and a longer motivation here: ua-cam.com/play/PLDw5cZwIToCvXLVY2bSqt7F2gu8y-Rqje.html Slides are available here: chalmersuniversity.box.com/s/c2a64r...
Flipping a Masters course
Views: 938 · 7 years ago
In this video we describe how we flipped a Masters course (SSY320) at Chalmers University of Technology during the fall of 2014. We also go through what the teacher and the students thought about the experience.
An introduction to flipped classroom teaching
Views: 1.4K · 7 years ago
In this video we explain the concept of flipped classroom teaching and discuss some of the arguments for why it is a good approach.
Generalized optimal sub-pattern assignment metric (GOSPA)
Views: 4K · 7 years ago
This video presents the GOSPA paper: A. S. Rahmathullah, Á. F. García-Fernández, and L. Svensson, “Generalized optimal sub-pattern assignment metric,” in 2017 20th International Conference on Information Fusion, Xi’an, P.R. China, Jul. 2017. arxiv.org/abs/1601.05585 The paper received the best paper award at the Fusion conference in 2017, and Matlab code to compute the GOSPA metric is availabl...
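The released implementation referenced above is in Matlab; as an illustrative re-implementation (not the official code), the sketch below computes the GOSPA distance for alpha = 2 using an optimal assignment with the pairwise distances cut off at c.

```python
# Hedged sketch of the GOSPA metric (alpha = 2). X holds ground-truth states,
# Y holds estimates; c is the cut-off distance and p the exponent.
import numpy as np
from scipy.optimize import linear_sum_assignment

def gospa(X, Y, c=10.0, p=2):
    """X: (m, d) array of ground-truth states, Y: (n, d) array of estimates."""
    X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
    m, n = len(X), len(Y)
    if m == 0 and n == 0:
        return 0.0
    if m == 0 or n == 0:                                   # only missed or false targets
        return ((c ** p / 2.0) * (m + n)) ** (1.0 / p)
    D = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)   # pairwise distances
    cost = np.minimum(D, c) ** p                                 # distances cut off at c
    rows, cols = linear_sum_assignment(cost)                     # optimal sub-pattern assignment
    localisation = cost[rows, cols].sum()
    cardinality = (c ** p / 2.0) * (m + n - 2 * len(rows))       # penalty for missed/false targets
    return (localisation + cardinality) ** (1.0 / p)

# Example: gospa(np.array([[0.0, 0.0], [10.0, 10.0]]), np.array([[0.5, 0.0]]))
```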
Very few people know these concepts well enough to give a detailed explanation with formulae. Thanks a ton. I had a lot of questions and this video helped resolve them.
A very clear and amazingly detailed explanation of such a complex topic. It would be nice to have more videos related to ML from you!
This is great
Does 5:45 - 8:15 refer to the old RNN training method? And hence the next video covers the real transformer decoder?
Thank you for your work, these are incredible videos. But there is one thing I didn't understand: during the training phase, the entire correctly translated sentence is given as input to the decoder, and masked self-attention is used to prevent the transformer from "cheating". How many times does this step happen? If it only happened once, then the hidden words would not be used during training. During the training phase, does backpropagation occur after each step, and does the mask then move, hiding fewer words?
Pretty bad example. Even if we have trainable Wq and Wk, what if there was a new sentence where we had Tom and he? The Wq will still make word 9 point to wmma and she.
Hi, great video! I have just one question about this. When we compute Z = K^T Q, where K^T is the transpose of K, we are doing Z = (W_K X)^T (W_Q X) = X^T W_K^T W_Q X. Now, calling M = W_K^T W_Q, we have Z = X^T M X. So why are we decomposing M into W_K^T W_Q? In the end we only use the product of W_K and W_Q, so why do we learn both separately and not just learn M directly? Thank you
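A small numerical sketch (not from the videos) of what the factorization buys: M = W_K^T W_Q has rank at most d_k and far fewer free parameters than a full d_model x d_model matrix, so the factorized form acts as a low-rank bottleneck; the sizes below are only illustrative.

```python
# Compare a full score matrix M with the factorized form W_K^T W_Q.
import numpy as np

d_model, d_k = 512, 64
rng = np.random.default_rng(0)
W_K = rng.normal(size=(d_k, d_model))
W_Q = rng.normal(size=(d_k, d_model))

M = W_K.T @ W_Q                              # (d_model, d_model), but low rank
print(np.linalg.matrix_rank(M))              # at most d_k = 64
print(M.size)                                # 262144 entries if M were learned directly
print(2 * d_k * d_model)                     # 65536 parameters in the factorized form
```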
Need an example with a batch being fed into this. What would the rows in a batch be? What would Y look like? Only then is it possible to really see how the masks work.
The first sentence of the video solved my problem 😅 "what enables us to parallelize calculation during training"
These videos are wonderful, thank you for putting in the work. Everything was communicated so clearly and thoroughly. My interpretation of the attention mechanism is that the result of the similarity (weight) matrix multiplied by the value matrix gives us an offset vector, which we then add to the value and normalize to get a contextualized vector. It's interesting in the decoder, we derive this offset from a value vector in the source language, add it to the target words and it is still somehow meaningful. I presume that it is the final linear layer which ensures that this resulting normalized output vector maps coherently to a discrete word in the target language. If we can do this across languages, I wonder if this can be done across modalities.
Thanks. That sounds like an accurate description of cross-attention (what I refer to as encoder-decoder attention). It can certainly be used across modalities and there are many papers describing just that. The common combination is probably images and language but images and point clouds, video and audio, and many other combinations can be found in the literature.
Great job Dr Lennart. Everyone should learn from you
The linear projection equations have operands in the wrong order.
Excellent, thank you!
Best video on masking!
Thank you for the nice explanation. I think you missed mentioning that, in order to get zeros from the masking with the softmax, you need to set the values (the upper triangle of the matrix) to negative infinity.
Excellent video! Thank you!
Hi Lennart, where did you get all the details that you are presenting here? I mean, have you maybe studied/analyzed the source code of an existing implementation of transformer-based models? I haven't found such a detailed explanation anywhere else. Bravo! And thank you.
What is the difference between attention and self-attention?
Intuition buildup was amazing, you clearly explained why we need learnable parameters in the first place and how that can help relate similar words. Thanks for the explanation.
Just binged the entire playlist, helped me understand the intuitions behind the math. I hope you make more videos :)
Excellent video, as all of yours are. Thank you, I have learned a lot. One thing I'm not clear on is why we need the free parameters W. And how might those be trained? Thank you again.
Would you be interested in working together on, say, creating a "better" model than transformer models? I believe we can integrate reasoning, similar to how humans reason, into the higher layers of the transformer.
Hi Professor, are there recorded lectures from your courses, or weblinks to what you teach? I love your clear, precise and well-paced coverage of the concepts here! Many thanks.
Thanks for your kind words. I have an online course in multi-object tracking (on UA-cam and edX) but it is model-based instead of learning-based. Hopefully, I will soon find time to post more ML material.
Beautifully explained, thank you. Transformers are so simple yet powerful.
These are such clear explanations, thanks so much.
Thank you professor for this amazing series on the transformer!
Best UA-cam video explaining Transformer ever!
Thank you !
Thank you !
Dear Lennart, that was awesome, could you please make a tutorial in python as well? :)
best explanation! Thank you Mr. Svensson.
The best transformer video!
Best explanation of self-attention I've seen so far. This is gold.
best explanation
grateful forever
At 7:14, I thought the notations would be sm(Z_11, Z_12) and sm(Z_21, Z_22) for the second column... Is that correct?
Very helpful. Thank you!
After training the model, when we give it an unknown source sentence, how does it predict or decode the words?
One of the earlier videos focuses on this process. Have you watched the entire series?
"Revolution in visual synthesis" is an excellent label of this epoch. Good video and helps people understand how inverting diffusion helps us arrive towards the concepts of "stable" diffusion.
Thanks a lot Lennart. What a crisp and clear explanation of BERT.
Lennart, you are the UA-cam Wizard of Transformers!
Well done!
A first class explanation of self attention- the best on UA-cam.
I really love this series, many thanks to you, sir. But may I ask: if I want additional sources to study the transformer from, what do you recommend?
Thanks a lot, this is the only complete course about transformers that I have found. One question: why is K = [q1 q2 ... q_(nE)] and not K = [k1 ...]? (Or is it a typo?)
That's definitely a typo! Thanks for pointing it out. I might actually adjust the videos later this fall to correct typos like this.
Very easy to follow
The audio quality of the video is not good.
This is the best explanation of the Transformer I have found on the web. Can you do another set of videos for T5?
Undoubtedly, these 8 videos best explain transformers. I tried other videos and tutorials, but you are the best.
Awesome!