The Complete Guide to Transformer Neural Networks!

  • Published 28 Apr 2024
  • Let's do a deep dive into the Transformer Neural Network Architecture for language translation.
    ABOUT ME
    ⭕ Subscribe: ua-cam.com/users/CodeEmporiu...
    📚 Medium Blog: / dataemporium
    💻 Github: github.com/ajhalthor
    👔 LinkedIn: / ajay-halthor-477974bb
    RESOURCES
    [1 🔎] Transformer Architecture Image: github.com/ajhalthor/Transfor...
    [2 🔎] draw.io version of the image for clarity: github.com/ajhalthor/Transfor...
    PLAYLISTS FROM MY CHANNEL
    ⭕ Transformers from scratch playlist: • Self Attention in Tran...
    ⭕ ChatGPT Playlist of all other videos: • ChatGPT
    ⭕ Transformer Neural Networks: • Natural Language Proce...
    ⭕ Convolutional Neural Networks: • Convolution Neural Net...
    ⭕ The Math You Should Know: • The Math You Should Know
    ⭕ Probability Theory for Machine Learning: • Probability Theory for...
    ⭕ Coding Machine Learning: • Code Machine Learning
    MATH COURSES (7 day free trial)
    📕 Mathematics for Machine Learning: imp.i384100.net/MathML
    📕 Calculus: imp.i384100.net/Calculus
    📕 Statistics for Data Science: imp.i384100.net/AdvancedStati...
    📕 Bayesian Statistics: imp.i384100.net/BayesianStati...
    📕 Linear Algebra: imp.i384100.net/LinearAlgebra
    📕 Probability: imp.i384100.net/Probability
    OTHER RELATED COURSES (7 day free trial)
    📕 ⭐ Deep Learning Specialization: imp.i384100.net/Deep-Learning
    📕 Python for Everybody: imp.i384100.net/python
    📕 MLOps Course: imp.i384100.net/MLOps
    📕 Natural Language Processing (NLP): imp.i384100.net/NLP
    📕 Machine Learning in Production: imp.i384100.net/MLProduction
    📕 Data Science Specialization: imp.i384100.net/DataScience
    📕 Tensorflow: imp.i384100.net/Tensorflow
    TIMESTAMPS
    0:00 Introduction
    1:38 Transformer at a high level
    4:15 Why Batch Data? Why Fixed Length Sequence?
    6:13 Embeddings
    7:00 Positional Encodings
    7:58 Query, Key and Value vectors
    9:19 Masked Multi Head Self Attention
    14:46 Residual Connections
    15:50 Layer Normalization
    17:57 Decoder
    20:12 Masked Multi Head Cross Attention
    22:47
    24:03 Tokenization & Generating the next translated word
    26:00 Transformer Inference Example

COMMENTS • 99

  • @CodeEmporium
    @CodeEmporium  1 year ago +10

    The link to the image and its raw file are in the description. If you think I deserve it, please give this video a like and subscribe for more! If you think it's worth sharing, please do so as well. I would love to grow to 100k subscribers this year with your help :) Thank you!

    • @RanDuan-dp6oz
      @RanDuan-dp6oz 1 year ago

      Just gave it a thumbs up! Just curious: what software did you use to draw such a wonderful diagram?

    • @junningdeng7385
      @junningdeng7385 1 year ago

      Sooooo nice! Where can we find the link to the image? 😂

    • @CodeEmporium
      @CodeEmporium  1 year ago +1

      Thanks! I used draw.io to draw the image.

    • @CodeEmporium
      @CodeEmporium  1 year ago

      The image can be found in the description of the video on GitHub

    • @user-np2jc9km3u
      @user-np2jc9km3u 4 months ago

      But what is the source for the Kannada words that were fed into the output? How can we get those words in reality? Could you explain, if you are willing to? Thank you.

  • @siddheshdandagavhal9804
    @siddheshdandagavhal9804 9 months ago +7

    Most underrated YouTuber. You are explaining these complex topics with such ease. Many big channels avoid explaining these topics. Really appreciate your work, man.

    • @CodeEmporium
      @CodeEmporium  9 months ago

      Thanks a lot for the kind words. I try :)

    • @ShimoriUta77
      @ShimoriUta77 3 months ago

      Bro, for real! It never felt possible for me to learn ML, but this guy took me by the hand and is teaching all this for free!
      I can't even thank this dude enough.

  • @menghan9260
    @menghan9260 11 months ago +5

    The way you approach this topic makes it so easy to understand, and I appreciate the pace of your talking. Best content on transformers.

    • @CodeEmporium
      @CodeEmporium  11 months ago

      You are very welcome. And thanks so much for that Super Thanks. You didn't have to, but it's very appreciated.

  • @ianrugg
    @ianrugg 1 year ago +4

    Great overview! Thanks for taking the time to put all this together!

  • @Anirudh-cf3oc
    @Anirudh-cf3oc 6 months ago +2

    You are the most underrated YouTuber. This is the best video explaining Transformers completely in the most intuitive way. I started my journey with Transformers with your first Transformers video a few years ago, which was very helpful. Also, I am so happy to see an AI tutorial video using an Indian language. I really appreciate your work.

  • @asdfasdf71865
    @asdfasdf71865 10 months ago +1

    I like your visualization of the matrices. Those residual connections and positional embeddings were good details to mention here.

  • @moseslee8761
    @moseslee8761 7 months ago

    You explain really well! I think it's quite complex, but as you explained it, it has become much clearer. Together with the coding video, it is extremely useful.

  • @ArunKumar-bp5lo
    @ArunKumar-bp5lo 5 months ago

    Love the visualization; it makes it so clear.

  • @ramakantshakya5478
    @ramakantshakya5478 1 year ago +3

    Amazing explanations throughout the series, and top-notch content, as always. Waiting for a detailed explanation/visualisation of the backward pass in the encoder/decoder during training. I would appreciate it if you are thinking along the same lines.

  • @aintgonhappen
    @aintgonhappen 1 year ago

    Video quality is amazing.
    Keep it up, buddy!

  • @helloansuman
    @helloansuman 1 year ago +2

    Amazing❤ Salute to the dedication in making this video, the visual explanation, and the knowledge.

    • @CodeEmporium
      @CodeEmporium  1 year ago

      Thanks so much for watching and commenting!

  • @Mr.AIFella
    @Mr.AIFella 1 year ago

    Your explanation is the most realistic explanation of the Transformer that I've ever seen on the internet.
    Thanks, dude.

    • @CodeEmporium
      @CodeEmporium  1 year ago

      That means a lot. Thank you. Please like, subscribe, and share around if you can :)

  • @amiralioghli8622
    @amiralioghli8622 7 months ago +1

    Thank you so much for taking the time to code and explain the transformer model in such detail. I followed your series from zero to hero. You are amazing, and if possible, please do a series on how transformers can be used for time series anomaly detection and forecasting. Such a series is sorely needed on YouTube!

  • @wireghost897
    @wireghost897 10 months ago

    Very well explained. Thank you.

  • @bhashganti9483
    @bhashganti9483 3 months ago

    Awesome tutorial on the application of the "transformer" architecture for language translation.
    This is my very first lesson on the topic, and I will give it 5+ stars.
    Thanks, dude, you inspired me to subscribe to your channel -- my very first YouTube subscription.
    Can't thank you enough!!

    • @CodeEmporium
      @CodeEmporium  3 months ago

      Thanks for the kind words! And super glad this video was helpful. Hope you enjoy the full playlist "Transformers from scratch", of which this video is a part :)

  • @triloksachin4826
    @triloksachin4826 29 days ago

    Amazing video, keep up the good work. Thanks for this!!

  • @Sneha-Sivakumar
    @Sneha-Sivakumar 6 months ago

    This was a brilliant video!! Super comprehensive.

  • @enrico1976
    @enrico1976 3 months ago

    That was awesome. Thank you man!!!

  • @cyberpunkbuilds
    @cyberpunkbuilds 1 month ago

    Your Kannada written language is really beautiful!

  • @loplop88
    @loplop88 8 days ago

    so underrated!

  • @abirbenaissa3717
    @abirbenaissa3717 8 months ago

    Life saver, thank you

  • @amitsingha1637
    @amitsingha1637 7 months ago

    Bro, all of my confusion vanished like a vanishing gradient.
    Thanks. Really worth it.

  • @user-wr4yl7tx3w
    @user-wr4yl7tx3w 1 year ago

    Really well presented.

  • @codeative
    @codeative 1 year ago

    Very well explained 👍

    • @CodeEmporium
      @CodeEmporium  1 year ago

      Thanks a ton for commenting and watching :)

  • @prashantlawhatre7007
    @prashantlawhatre7007 1 year ago

    Eagerly waiting for the upcoming videos in the series.

    • @CodeEmporium
      @CodeEmporium  1 year ago

      Thanks! Probably just 1-2 more long-form videos.

  • @soumilyade1057
    @soumilyade1057 1 year ago

    Hopefully the series is completed soon ❤️ I would binge-watch it 😁

    • @CodeEmporium
      @CodeEmporium  1 year ago +1

      Yep. Maybe 1 or 2 videos left. I am running into some issues, but I'll probably either have them solved or just have a fun community-help video. Either way, it should be good.

    • @soumilyade1057
      @soumilyade1057 1 year ago

      @@CodeEmporium ♥️♥️ 😌

  • @charleskangai4618
    @charleskangai4618 2 months ago

    Excellent!

  • @user-pu4iz8wb4d
    @user-pu4iz8wb4d 1 year ago

    THIS IS AMAZING, helped me a lot, thanks :)

    • @CodeEmporium
      @CodeEmporium  1 year ago

      Thanks so much for watching and commenting!

  • @anandgupta2892
    @anandgupta2892 11 months ago

    very well 👍

  • @k-c
    @k-c 1 year ago

    Will have to brush up my basics and then come back to this.

    • @CodeEmporium
      @CodeEmporium  1 year ago +1

      Yea. This can be a lot of info. Hopefully the earlier videos in this playlist will help too

    • @k-c
      @k-c 1 year ago

      @@CodeEmporium Your channel is really good! Thanks for all the work.

  • @Diego-nw4rt
    @Diego-nw4rt 1 year ago

    Great channel and very useful video, thank you very much! I will watch other videos on your channel as well.
    I have a question. After you perform layer normalization and obtain an output tensor, how do you give a three-dimensional tensor as input to a feed-forward layer?
    Do you flatten the input?

  • @ravikumarnaduvin5399
    @ravikumarnaduvin5399 1 year ago

    My friend Ajay, your playlist "Transformers from scratch" is great. It was very appealing to me to see your block diagram representation. Waiting with great anticipation for the final video. Would you be able to make it available soon?

    • @CodeEmporium
      @CodeEmporium  1 year ago +1

      Glad you like it! I am hitting a few roadblocks, though I feel I am 99% there. I'll make a video on this to mostly ask the community, so it should be a fun exercise for everyone too :) Hoping that when that is resolved, we can make a final video :D

  • @phaZZi6461
    @phaZZi6461 1 year ago

    Hi, I really love your complete model overview!
    Also, at 8:08 you mention that the difference between K, Q, and V isn't very explicit to the model. What would be your personal intuitive interpretation of what a Key vector might extract/learn from an input word? I find the key concept a bit odd and wondered how the authors came up with the idea of training a Key vector (/matrix), where previous attention papers only had a value vector, which would be used in both places (K and V) of the equation.
    When I think about information retrieval concepts, where we have a search query and documents to be ranked, IIRC the intuition there is to compute a dot product to get a similarity/relevance score between them. In my mind, the concept of "how relevant is each document" isn't that far off from "how much attention should I pay to each document".
    Analogously, I would interpret documents to be Values, and the idea of a key seems to be absent? (Unless IR in practice computes a key for each document, basically a key_of(document)-query similarity; then I just answered the question myself.)
    Anyway, I wondered if it wouldn't be possible to simplify the attention mechanism while keeping it conceptually similar. I'm not sure where I should look to learn more about this.
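
    For readers who want to see the K/Q/V distinction concretely, here is a minimal, editorial sketch of single-head scaled dot-product attention in Python/NumPy (illustrative shapes and random weights, not the video's actual code): queries, keys, and values are three separately learned projections of the same input.

        # Minimal single-head scaled dot-product attention (illustrative only).
        import numpy as np

        rng = np.random.default_rng(0)
        seq_len, d_model, d_k = 5, 512, 64            # d_k = 64 as in the video

        x = rng.normal(size=(seq_len, d_model))       # embedded input sequence

        # Q, K, V come from three separately learned projections of the same x.
        W_q = rng.normal(size=(d_model, d_k))
        W_k = rng.normal(size=(d_model, d_k))
        W_v = rng.normal(size=(d_model, d_k))
        Q, K, V = x @ W_q, x @ W_k, x @ W_v

        scores = Q @ K.T / np.sqrt(d_k)               # (seq_len, seq_len) match scores
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights = weights / weights.sum(axis=-1, keepdims=True)   # row-wise softmax
        out = weights @ V                             # weighted sum of values

    Sharing one projection between K and V (roughly the simplification the comment describes) is possible; keeping them separate just lets the model learn "what matches" independently of "what gets passed along".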

  • @lakshman587
    @lakshman587 5 months ago +1

    Thank you so much for all these videos, I have learnt a lot from them!!!
    I thought you were from Tamil Nadu, but today I got to know that you are from Karnataka!!
    Where in Karnataka? I'm staying in Bangalore and would like to meet you in person!!!!!

  • @naveenrs7460
    @naveenrs7460 1 year ago +1

    Lovely, brother. I am your neighbour, a Tamizhan. Lovely brotherhood.

  • @josephfemia8496
    @josephfemia8496 1 year ago +1

    If I can recommend next steps for this series, going into BERT, GPT, and DETR would be lovely extensions.

    • @CodeEmporium
      @CodeEmporium  1 year ago +2

      I was kind of thinking the same! For now, I have videos on BERT and GPT on the channel if you haven't checked them out. But an architecture deep dive would be fun too :)

    • @RanDuan-dp6oz
      @RanDuan-dp6oz 1 year ago

      @@CodeEmporium Yes, that will be super fun! Also, it would be great if you could introduce how an ML practitioner could fine-tune based on these complex models.

  • @DanielTorres-gd2uf
    @DanielTorres-gd2uf 1 year ago

    Damn, could've used this a few weeks ago for my OMSCS quiz. Solid review though, nice job!

  • @sarahgh8756
    @sarahgh8756 2 months ago

    Thank you for all the videos about transformers. Although I understood the architecture, I still don't know what to set as the input of the decoder (embedded target) and the mask for the TEST phase.

  • @fayezalhussein7115
    @fayezalhussein7115 1 year ago

    amaaazing

  • @capyk5455
    @capyk5455 1 year ago

    Amazing

  • @rafaelgp9072
    @rafaelgp9072 1 year ago

    A video like this explaining the LLaMA model would be nice.

  • @whiteroadism
    @whiteroadism 1 year ago

    Great video. At 12:09, how does dividing all the numbers by 8 ensure that small values are not too small and large values are not too large? Wouldn't dividing by 8 just make a number 8 times smaller?
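
    For context, the scaling the comment asks about is the one from the original "Attention Is All You Need" formulation, which the video appears to follow with d_k = 64 (hence the division by 8):

        Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V,   with sqrt(d_k) = sqrt(64) = 8

    Yes, every score is simply divided by 8. The purpose is not to make small values larger, but to keep the dot products, whose typical magnitude grows with sqrt(d_k), in a range where the softmax does not saturate into near-one-hot weights with vanishing gradients.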

  • @davefaulkner6302
    @davefaulkner6302 1 month ago

    Fantastic lecture. The attention layers and their inter-relationships are very well explained. Thank you. However, this and other videos gloss over the use of the fully-connected layers following the attention layer. Using FC with language model embeddings makes little sense to me. Are there 512x50 inputs to the FC, i.e., is the input sentence simply flattened as input to the FC layer?

  • @joegarcia8935
    @joegarcia8935 1 year ago

    Thanks!

    • @CodeEmporium
      @CodeEmporium  1 year ago

      You are super welcome! I appreciate the donation! Thanks!

  • @markusnascimento210
    @markusnascimento210 11 months ago

    Very good. In general, articles don't show the dimensions when explaining. It helps a lot. Thanks.

  • @abulfahadsohail466
    @abulfahadsohail466 1 year ago

    Could you please apply the transformer you have built to text summarisation? It would be really helpful.

  • @wishIKnewHowToLove
    @wishIKnewHowToLove 1 year ago

    concise

  • @paragbhardwaj5753
    @paragbhardwaj5753 1 year ago

    Do a video on this new model called RWKV-LM.

  • @CyKeulz
    @CyKeulz 1 year ago

    Great! Still a bit too hard for me, but I still learned stuff.
    Question: would it be possible to use the same encoder across multiple languages? Without retraining it after the first time, I mean.

    • @CodeEmporium
      @CodeEmporium  1 year ago

      I hope the full playlist "Transformers from scratch" helps with pacing this.
      To your second question: this is a simple transformer neural network and not a typical language model like BERT/GPT. The transformer on its own doesn't typically make use of transfer learning, so some retraining will be required. That said, if you were using the language models, then you might just need to fine-tune your parameters to the target language (which is technically training). Or, if you go the GPT-3 route, you could get away without fine-tuning and use meta-learning techniques instead.

  • @colinmaharaj50
    @colinmaharaj50 8 months ago

    Can this be done in pure C++?

  • @gabrielnilo6101
    @gabrielnilo6101 10 months ago

    11:08 I am sorry if I am wrong, but the transposed K matrix, isn't it 50x30x64?
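
    For what it's worth, here is a small shape-bookkeeping sketch (an editorial illustration assuming the example sizes of batch = 30 sentences, sequence length = 50, head dimension = 64; not the video's code). In batched attention only the last two axes of K are swapped, so the transpose is 30x64x50 rather than 50x30x64:

        import numpy as np

        K = np.zeros((30, 50, 64))       # (batch, seq_len, d_k)
        K_T = K.transpose(0, 2, 1)       # swap only the last two axes -> (30, 64, 50)

        Q = np.zeros((30, 50, 64))
        scores = Q @ K_T                 # batched matmul -> (30, 50, 50)
        print(K_T.shape, scores.shape)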

  • @susmitjaiswal136
    @susmitjaiswal136 1 year ago

    What is the use of the feed-forward network in the transformer? Please answer.

  • @erikschmidt3067
    @erikschmidt3067 1 year ago

    What's in the feed-forward layers? Just an input and output layer? Are there hidden layers? What are the sizes of the layers?

    • @CodeEmporium
      @CodeEmporium  1 year ago

      Feed-forward layers are hidden layers. It's essentially 2,048 neurons in size. You can think of it as mapping a 512-dimensional vector to a 2,048-dimensional vector, and then mapping the 2,048-dimensional vector back to 512 dimensions. All of this to capture additional information about the word.
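
      A minimal sketch of that position-wise feed-forward block (512 -> 2048 -> 512; the ReLU and layer sizes follow the original Transformer paper, and the code is illustrative rather than the video's own):

          # Position-wise feed-forward block: 512 -> 2048 -> 512, applied per token.
          import numpy as np

          rng = np.random.default_rng(0)
          d_model, d_ff, seq_len = 512, 2048, 50

          W1, b1 = rng.normal(size=(d_model, d_ff)), np.zeros(d_ff)
          W2, b2 = rng.normal(size=(d_ff, d_model)), np.zeros(d_model)

          x = rng.normal(size=(seq_len, d_model))   # attention output after layer norm
          hidden = np.maximum(0, x @ W1 + b1)       # expand each token to 2,048 dims (ReLU)
          out = hidden @ W2 + b2                    # project back to 512 dims
          print(out.shape)                          # (50, 512): one vector per token

      Because the same weights are applied to every token (row) independently, the sequence never needs to be flattened before the feed-forward layer.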

  • @anwarulislam6823
    @anwarulislam6823 1 year ago

    Without a BCI, is a multi-head-attention-like process possible with the human brain?

  • @samurock100
    @samurock100 3 months ago

    1kth like

  • @venkideshk2413
    @venkideshk2413 1 year ago

    Masked multi-head attention is for the decoder, right? Is that a typo in your encoder architecture?

  • @user-np2jc9km3u
    @user-np2jc9km3u 4 months ago

    But what is the source for the Kannada words that were fed into the output? How can we get those words in reality? Could someone explain, if you are willing to? Thank you.

  • @TheTimtimtimtam
    @TheTimtimtimtam 1 year ago

    First :)

  • @wintobisakul1848
    @wintobisakul1848 1 year ago

    Amazing, so fluent in English, you speak like a native speaker.

    • @CodeEmporium
      @CodeEmporium  1 year ago +1

      I am a native English speaker, but I’ve lived a good amount of my adolescence and early adult life in India

    • @wintobisakul1848
      @wintobisakul1848 1 year ago

      @@CodeEmporium Wow, so that means you also speak the Indian dialect, which I assume makes you fluent in three languages?

    • @wintobisakul1848
      @wintobisakul1848 1 year ago

      I truly appreciate your explanation regarding content, tone, accent, and other related aspects.

  • @creativeuser9086
    @creativeuser9086 11 months ago +1

    So you're from the Silicon Valley of India. We all know it.

  • @jamesroy9027
    @jamesroy9027 10 months ago

    The background music creates a lot of disturbance, especially that pop sound; otherwise the content delivery is great.