Lennart Svensson
Guest lecture by Wayve, 2023
The video shows the guest lecture by Oleg Sinavski from Wayve. The lecture was a part of my course "SSY340 -- Deep Machine Learning", which is given at Chalmers University of Technology. More details are provided below.
0:00 Self-driving vehicles and Software 2.0
4:14 Wayve's approach to self-driving vehicle control
7:45 Limitations of AV2.0
10:01 GAIA-1
16:35 LINGO
28:00 Questions to Oleg Sinavski
Title: Language and Video-Generative AI in Autonomous Driving
Abstract:
Recent advances in Large Language Models (LLMs) demonstrate the possibility of achieving human-level capabilities in generating explanations and reasoning using end-to-end text models. Meanwhile, explainability and reasoning remain significant challenges in the field of autonomous driving. Wayve's unique approach to end-to-end self-driving positions the company to make use of these recent developments. In this talk, we showcase Wayve’s latest advancements in language and video generative AI, present some techniques and discuss their current and future applications.
Bio:
Oleg Sinavski is currently a Principal Applied Scientist at Wayve, London. His research interests focus on applying advances in large language models to the fields of self-driving, reinforcement learning, planning, and simulation. Previously, Oleg worked at Brain Corp in San Diego, CA, as the VP of R&D, where he led research efforts in scalable robotic navigation. Earlier in his career, he specialized in neuromorphic computation and hardware and holds a Ph.D. in computational neuroscience.
850 views

Videos

Deep Generative Models, Stable Diffusion, and the Revolution in Visual Synthesis
2.5K views, 2 years ago
We had the pleasure of having Professor Björn Ommer as a guest lecturer in my course SSY340, Deep machine learning at Chalmers University of Technology. Chapters: 0:00 Introduction 8:10 Overview of generative models 15:00 Diffusion models 19:37 Stable diffusion 26:10 Retrieval-Augmented Diffusion Models Abstract: Recently, deep generative modeling has become the most prominent paradigm for lear...
BERT: transfer learning for NLP
10K views, 3 years ago
In this video we present BERT, which is a transformer-based language model. BERT is pre-trained in a self-supervised manner on a large corpus. After that, we can use transfer learning and fine-tune the model for new tasks and obtain good performance even with a limited annotated dataset for the specific task that we would like to solve (e.g., a text classification task). The original paper: arxiv....
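To make the fine-tuning idea concrete, here is a minimal sketch using the Hugging Face transformers library; this is not the setup used in the video, and the checkpoint name, example texts, labels, and learning rate are purely illustrative.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load a pre-trained BERT checkpoint and add a fresh classification head on top.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# One fine-tuning step on a tiny labeled batch (the annotated dataset can be small).
batch = tokenizer(["great lecture", "hard to follow"], padding=True, return_tensors="pt")
labels = torch.tensor([1, 0])
loss = model(**batch, labels=labels).loss   # cross-entropy loss from the classification head
loss.backward()
optimizer.step()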
Transformers - Part 7 - Decoder (2): masked self-attention
20K views, 4 years ago
This is the second video on the decoder layer of the transformer. Here we describe the masked self-attention layer in detail. The video is part of a series of videos on the transformer architecture, arxiv.org/abs/1706.03762. You can find the complete series and a longer motivation here: ua-cam.com/play/PLDw5cZwIToCvXLVY2bSqt7F2gu8y-Rqje.html Slides are available here: chalmersuniversity.box.com...
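To complement the video, here is a minimal single-head sketch of masked (causal) self-attention in NumPy. It uses the row-vector convention that is common in code (the slides use column vectors, so the transposes are arranged differently), and the projection matrices are assumed to be given.

import numpy as np

def masked_self_attention(X, W_q, W_k, W_v):
    # X: (n, d_model) token sequence; W_q, W_k, W_v: learned projection matrices.
    n = X.shape[0]
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[1])              # scaled dot products between queries and keys
    mask = np.triu(np.ones((n, n), dtype=bool), k=1)    # True for positions to the right of the diagonal
    scores[mask] = -np.inf                               # block attention to future tokens before the softmax
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)        # row-wise softmax; masked entries become exactly 0
    return weights @ V

Setting the masked scores to negative infinity (rather than zero) is what makes the corresponding softmax weights exactly zero, a point also raised in one of the comments below.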
Transformer - Part 6 - Decoder (1): testing and training
9K views, 4 years ago
This is the first of three videos about the transformer decoder. In this video, we focus on describing how the decoder is used during testing and training, since this is helpful in order to understand how the decoder is constructed. The video is part of a series of videos on the transformer architecture, arxiv.org/abs/1706.03762. You can find the complete series and a longer motivation here: ...
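To make the test-time procedure concrete, here is a hedged sketch of greedy autoregressive decoding. The names model, bos_id and eos_id are hypothetical placeholders: model(src_tokens, prefix) is assumed to return an array of logits for the next token. During training, by contrast, the whole shifted target sentence is fed in at once (teacher forcing) and masked self-attention prevents cheating.

def greedy_decode(model, src_tokens, bos_id, eos_id, max_len=50):
    # Start from the beginning-of-sequence token and extend the output one token at a time.
    out = [bos_id]
    for _ in range(max_len):
        next_id = int(model(src_tokens, out).argmax())   # pick the most probable next token
        out.append(next_id)
        if next_id == eos_id:                            # stop once end-of-sequence is generated
            break
    return out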
Transformer - Part 8 - Decoder (3): Encoder-decoder self-attention
9K views, 4 years ago
This is the third video about the transformer decoder and the final video introducing the transformer architecture. Here we mainly learn about the encoder-decoder multi-head self-attention layer, used to incorporate information from the encoder into the decoder. It should be noted that this layer is also commonly known as the cross-attention layer. The video is part of a series of videos on the...
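A minimal single-head sketch of this layer, in the same NumPy style and row-vector convention as the masked self-attention sketch above (projection matrices assumed given): the only difference from self-attention is that the queries come from the decoder while the keys and values come from the encoder output.

import numpy as np

def cross_attention(X_dec, H_enc, W_q, W_k, W_v):
    # X_dec: (n_dec, d_model) decoder states; H_enc: (n_enc, d_model) encoder output.
    Q = X_dec @ W_q                                      # queries from the decoder
    K = H_enc @ W_k                                      # keys from the encoder
    V = H_enc @ W_v                                      # values from the encoder
    scores = Q @ K.T / np.sqrt(K.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)        # softmax over encoder positions
    return weights @ V                                   # one context vector per decoder position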
Transformers - Part 5 - Transformers vs CNNs and RNNs
4.1K views, 4 years ago
In this video, we highlight some of the differences between the transformer encoder and CNNs and RNNs. The video is part of a series of videos on the transformer architecture, arxiv.org/abs/1706.03762. You can find the complete series and a longer motivation here: ua-cam.com/play/PLDw5cZwIToCvXLVY2bSqt7F2gu8y-Rqje.html Slides are available here: chalmersuniversity.box.com/s/c2a64rz0hlp44pdouq9m...
Transformers - Part 4 - Encoder remarks
4.3K views, 4 years ago
In this video we highlight a few properties of the transformer encoder. The video is part of a series of videos on the transformer architecture, arxiv.org/abs/1706.03762. You can find the complete series and a longer motivation here: ua-cam.com/play/PLDw5cZwIToCvXLVY2bSqt7F2gu8y-Rqje.html Slides are available here: chalmersuniversity.box.com/s/c2a64rz0hlp44pdouq9mc24msbz60xf2
Transformers - Part 2 - Self attention complete equations
9K views, 4 years ago
In this video, we present the complete equations for self-attention. The video is part of a series of videos on the transformer architecture, arxiv.org/abs/1706.03762. You can find the complete series and a longer motivation here: ua-cam.com/play/PLDw5cZwIToCvXLVY2bSqt7F2gu8y-Rqje.html Slides are available here: chalmersuniversity.box.com/s/c2a64rz0hlp44pdouq9mc24msbz60xf2
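For quick reference, the equations in the row-vector notation of the original paper are (the videos use column vectors, so the transposes are arranged differently):

Q = X W_Q,  K = X W_K,  V = X W_V
Attention(Q, K, V) = softmax( Q K^T / sqrt(d_k) ) V

where each row of X is one token embedding, d_k is the key dimension, and the softmax is taken row by row.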
Transformers - Part 3 - Encoder
10K views, 4 years ago
In this video, we present the encoder layer in the transformer. Important components of this presentation are multi-head attention, positional encodings, and the architecture of the encoder blocks that appear inside the encoder. The video is part of a series of videos on the transformer architecture, arxiv.org/abs/1706.03762. You can find the complete series and a longer motivat...
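As one concrete ingredient from this part, here is a minimal NumPy sketch of the sinusoidal positional encodings from the original paper, assuming an even model dimension; these vectors are added to the token embeddings before the first encoder block.

import numpy as np

def positional_encoding(n_positions, d_model):
    # Even-indexed dimensions use sine, odd-indexed dimensions use cosine.
    pos = np.arange(n_positions)[:, None]                 # (n_positions, 1)
    i = np.arange(d_model // 2)[None, :]                   # (1, d_model / 2)
    angles = pos / 10000 ** (2 * i / d_model)
    pe = np.zeros((n_positions, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe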
Transformers - Part 1 - Self-attention: an introduction
18K views, 4 years ago
In this video, we briefly introduce transformers and provide an introduction to the intuition behind self-attention. The video is part of a series of videos on the transformer architecture, arxiv.org/abs/1706.03762. You can find the complete series and a longer motivation here: ua-cam.com/play/PLDw5cZwIToCvXLVY2bSqt7F2gu8y-Rqje.html Slides are available here: chalmersuniversity.box.com/s/c2a64r...
Flipping a Master's course
938 views, 7 years ago
In this video we describe how we flipped a Master's course (SSY320) at Chalmers University of Technology during the fall of 2014. We will also go through what the teacher and the students thought about the experience.
An introduction to flipped classroom teaching
1.4K views, 7 years ago
In this video we explain the concept of flipped classroom teaching and discuss some of the arguments regarding why it is good.
Generalized optimal sub-pattern assignment metric (GOSPA)
4K views, 7 years ago
This video presents the GOSPA paper: A. S. Rahmathullah, Á. F. García-Fernández, and L. Svensson, “Generalized optimal sub-pattern assignment metric,” in 2017 20th International Conference on Information Fusion, Xi’an, P.R. China, Jul. 2017. arxiv.org/abs/1601.05585 The paper received the best paper award at the Fusion conference in 2017, and Matlab code to compute the GOSPA metric is availabl...
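For readers who prefer code to the paper's notation, here is a small Python sketch of the metric for the common choice alpha = 2, assuming a Euclidean base distance; the authors' Matlab implementation remains the reference version.

import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def gospa(X, Y, c=10.0, p=2):
    # X: (m, d) estimated points, Y: (n, d) ground-truth points; c is the cut-off, p the order.
    m, n = len(X), len(Y)
    if m == 0 or n == 0:
        return (c ** p / 2 * (m + n)) ** (1 / p)         # only missed and false targets contribute
    D = np.minimum(cdist(X, Y), c) ** p                  # pairwise localization costs, cut off at c
    rows, cols = linear_sum_assignment(D)                # optimal assignment
    cost = D[rows, cols].sum() + c ** p / 2 * (m + n - 2 * len(rows))
    return cost ** (1 / p)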

COMMENTS

  • @subusrable
    @subusrable 18 days ago

    Very few people know these concepts well enough to give a detailed explanation with formulae. Thanks a ton. I had a lot of queries and this video helped resolve them.

  • @farrugiamarc0
    @farrugiamarc0 7 months ago

    A very clear and amazingly detailed explanation of such a complex topic. It would be nice to have more videos related to ML from you!

  • @farrelledwards
    @farrelledwards 7 months ago

    This is great

  • @jeremykenn
    @jeremykenn 7 months ago

    Does 5:45 - 8:15 refer to the old RNN training method? And hence the next video covers the real transformer decoder?

  • @nomecognome-f9w
    @nomecognome-f9w 8 months ago

    Thank you for your work, these are incredible videos. But there is one thing I didn't understand: during the training phase, the entire correctly translated sentence is given as input to the decoder, and to prevent the transformer from "cheating", masked self-attention is used. How many times does this step happen? Because if it only happened once, then the hidden words would not be usable during training. During the training phase, does backpropagation occur after each step, and does the mask then move, hiding fewer words?

  • @prateekpatel6082
    @prateekpatel6082 9 months ago

    Pretty bad example. Even if we have trainable Wq and Wk, what if there was a new sentence where we had Tom and he? The Wq will still make word 9 point to Emma and she.

  • @babbobill510
    @babbobill510 10 months ago

    Hi, great video! I just have a question about this. When we compute Z = KT * Q, where KT is the transpose of K, we are doing Z = (W_K * X)T * (W_Q * X) = XT * W_KT * W_Q * X. Now, calling M = W_KT * W_Q, we have Z = XT * M * X. So why are we decomposing M into W_KT * W_Q? In the end we use only the product of W_K and W_Q, so why do we learn both separately and not just learn M directly? Thank you

  • @DmitryPesegov
    @DmitryPesegov 1 year ago

    Need an example with a BATCH being fed into this. What would the rows in the batches be? What would Y look like? Only then is it possible to really see how the masks work.

  • @cedricmanouan1615
    @cedricmanouan1615 1 year ago

    The first sentence of the video solved my problem 😅 "what enables us to parallelize calculation during training"

  • @ryanhewitt9902
    @ryanhewitt9902 1 year ago

    These videos are wonderful, thank you for putting in the work. Everything was communicated so clearly and thoroughly. My interpretation of the attention mechanism is that the result of the similarity (weight) matrix multiplied by the value matrix gives us an offset vector, which we then add to the value and normalize to get a contextualized vector. It's interesting in the decoder, we derive this offset from a value vector in the source language, add it to the target words and it is still somehow meaningful. I presume that it is the final linear layer which ensures that this resulting normalized output vector maps coherently to a discrete word in the target language. If we can do this across languages, I wonder if this can be done across modalities.

    • @lennartsvensson7636
      @lennartsvensson7636 1 year ago

      Thanks. That sounds like an accurate description of cross-attention (what I refer to as encoder-decoder attention). It can certainly be used across modalities and there are many papers describing just that. The common combination is probably images and language but images and point clouds, video and audio, and many other combinations can be found in the literature.

  • @Chill_Magma
    @Chill_Magma 1 year ago

    Great job, Dr Lennart. Everyone should learn from you.

  • @cheese-power
    @cheese-power 1 year ago

    The linear projection equations have operands in the wrong order.

  • @dirtyharry7280
    @dirtyharry7280 1 year ago

    Excellent, thank you!

  • @hemanthsai369
    @hemanthsai369 1 year ago

    Best video on masking!

  • @amirnasser7768
    @amirnasser7768 1 year ago

    Thank you for the nice explanation. I think you forgot to mention that, in order to get zeros from the softmax when masking, you need to set the masked values (the upper triangle of the matrix) to negative infinity.

  • @jeremyyd1258
    @jeremyyd1258 1 year ago

    Excellent video! Thank you!

  • @zbynekba
    @zbynekba 1 year ago

    Hi Lennart, where did you get all the details that you are presenting here? I mean, have you perhaps studied/analyzed the source code of an existing implementation of transformer-based models? I haven’t found such a detailed explanation anywhere else. Bravo! And thank you.

  • @piyushkumar-wg8cv
    @piyushkumar-wg8cv 1 year ago

    What is the difference between attention and self-attention?

  • @piyushkumar-wg8cv
    @piyushkumar-wg8cv 1 year ago

    Intuition buildup was amazing, you clearly explained why we need learnable parameters in the first place and how that can help relate similar words. Thanks for the explanation.

  • @g111an
    @g111an 1 year ago

    Just binged the entire playlist, helped me understand the intuitions behind the math. I hope you make more videos :)

  • @atheistjourney
    @atheistjourney 1 year ago

    Excellent video, as all of yours are. Thank you, I have learned a lot. One thing I'm not clear on is why we need the free parameters W. And how might those be trained? Thank you again.

  • @randalllionelkharkrang4047

    Would you be interested in working together on, say, creating a "better" model than transformer models? I believe we can integrate reasoning, similar to how humans reason, into the higher layers of the transformer.

  • @manishagarwal5323
    @manishagarwal5323 1 year ago

    Hi Professor, are there lectures, courses, or web links to what you teach? Love your clear, precise and well-paced coverage of the concepts here! Many thanks.

    • @lennartsvensson7636
      @lennartsvensson7636 1 year ago

      Thanks for your kind words. I have an online course in multi-object tracking (on YouTube and edX) but it is model-based instead of learning-based. Hopefully, I will soon find time to post more ML material.

  • @notanape5415
    @notanape5415 1 year ago

    Beautifully explained, thank you. Transformers are so simple yet powerful.

  • @violinplayer7201
    @violinplayer7201 1 year ago

    These are such clear explanations, thanks so much.

  • @nappingyiyi
    @nappingyiyi 1 year ago

    Thank you professor for this amazing series on the transformer!

  • @SungheeYun
    @SungheeYun 1 year ago

    Best YouTube video explaining the Transformer ever!

  • @ahmedb2559
    @ahmedb2559 1 year ago

    Thank you!

  • @ahmedb2559
    @ahmedb2559 1 year ago

    Thank you!

  • @TechSuperGirl
    @TechSuperGirl 1 year ago

    Dear Lennart, that was awesome, could you please make a tutorial in python as well? :)

  • @mir7tahmid
    @mir7tahmid 1 year ago

    best explanation! Thank you Mr. Svensson.

  • @leiqin111
    @leiqin111 1 year ago

    The best transformer video!

  • @andrem82
    @andrem82 2 years ago

    Best explanation of self-attention I've seen so far. This is gold.

  • @prasadkendre149
    @prasadkendre149 2 years ago

    best explanation

  • @prasadkendre149
    @prasadkendre149 2 years ago

    grateful forever

  • @abrahamowos
    @abrahamowos 2 years ago

    At 7:14, I thought the notation would be sm(Z_11, Z_12) and sm(Z_21, Z_22) for the second column... Is that correct?

  • @asmersoy4111
    @asmersoy4111 2 years ago

    Very helpful. Thank you!

  • @akhileshbisht6469
    @akhileshbisht6469 2 years ago

    After training the model, when we give an unknown source sentence to the model, how does it predict or decode the words?

    • @lennartsvensson7636
      @lennartsvensson7636 2 years ago

      One of the earlier videos focuses on this process. Have you watched the entire series?

  • @deanmeece326
    @deanmeece326 2 years ago

    "Revolution in visual synthesis" is an excellent label for this epoch. Good video that helps people understand how inverting diffusion leads us to the concept of "stable" diffusion.

  • @goelnikhils
    @goelnikhils 2 years ago

    Thanks a lot Lennart. What a crisp and clear explanation of BERT.

  • @exxzxxe
    @exxzxxe 2 years ago

    Lennart, you are the YouTube Wizard of Transformers!

  • @exxzxxe
    @exxzxxe 2 years ago

    Well done!

  • @exxzxxe
    @exxzxxe 2 years ago

    A first-class explanation of self-attention - the best on YouTube.

  • @gehadhisham2539
    @gehadhisham2539 2 years ago

    I really love this series, many thanks to you, sir. But may I ask: if I want additional sources to study the transformer from, what do you recommend?

  • @antonisnesios1701
    @antonisnesios1701 2 years ago

    Thanks a lot, this is the only complete course about transformers that I have found. One question: why is K = [q1 q2 ... q_(nE)] and not K = [k1 ...] (or is it a typo?)

    • @lennartsvensson7636
      @lennartsvensson7636 2 years ago

      That's definitely a typo! Thanks for pointing it out. I might actually adjust the videos later this fall to correct typos like this.

  • @jaehoyoon7061
    @jaehoyoon7061 2 years ago

    Very easy to follow

  • @СоломияНовицкая
    @СоломияНовицкая 2 years ago

    The audio in the video is not good.

  • @rickyebay
    @rickyebay 2 years ago

    This is the best explanation of the Transformer I have found on the web. Can you do another set of videos on T5?

  • @shaifulchowdhury9967
    @shaifulchowdhury9967 2 years ago

    Undoubtedly, these 8 videos best explain transformers. I tried other videos and tutorials, but you are the best.

  • @victormachadogonzaga1898
    @victormachadogonzaga1898 2 years ago

    Awesome!