The complete Transformer Neural Network Code in 300 lines!

  • Published May 15, 2024
  • Code for the transformer neural network. This is the architecture from the "Attention Is All You Need" paper, and it forms the core of ChatGPT and other large language models today
    ABOUT ME
    ⭕ Subscribe: ua-cam.com/users/CodeEmporiu...
    📚 Medium Blog: / dataemporium
    💻 Github: github.com/ajhalthor
    👔 LinkedIn: / ajay-halthor-477974bb
    RESOURCES
    [1] Code for the video: github.com/ajhalthor/Transfor...
    PLAYLISTS FROM MY CHANNEL
    ⭕ Transformers from scratch playlist: • Self Attention in Tran...
    ⭕ ChatGPT Playlist of all other videos: • ChatGPT
    ⭕ Transformer Neural Networks: • Natural Language Proce...
    ⭕ Convolutional Neural Networks: • Convolution Neural Net...
    ⭕ The Math You Should Know: • The Math You Should Know
    ⭕ Probability Theory for Machine Learning: • Probability Theory for...
    ⭕ Coding Machine Learning: • Code Machine Learning
    MATH COURSES (7 day free trial)
    📕 Mathematics for Machine Learning: imp.i384100.net/MathML
    📕 Calculus: imp.i384100.net/Calculus
    📕 Statistics for Data Science: imp.i384100.net/AdvancedStati...
    📕 Bayesian Statistics: imp.i384100.net/BayesianStati...
    📕 Linear Algebra: imp.i384100.net/LinearAlgebra
    📕 Probability: imp.i384100.net/Probability
    OTHER RELATED COURSES (7 day free trial)
    📕 ⭐ Deep Learning Specialization: imp.i384100.net/Deep-Learning
    📕 Python for Everybody: imp.i384100.net/python
    📕 MLOps Course: imp.i384100.net/MLOps
    📕 Natural Language Processing (NLP): imp.i384100.net/NLP
    📕 Machine Learning in Production: imp.i384100.net/MLProduction
    📕 Data Science Specialization: imp.i384100.net/DataScience
    📕 Tensorflow: imp.i384100.net/Tensorflow

COMMENTS • 38

  • @CodeEmporium
    @CodeEmporium  1 year ago +6

    If you think I deserve it, please give this video a like and subscribe for more :)

  • @rpraver1
    @rpraver1 5 months ago +3

    Best playlist on transformers I have viewed to date, bar none...

    • @CodeEmporium
      @CodeEmporium  5 months ago

      Thanks so much for the kind words! I really appreciate it

  • @vtrandal
    @vtrandal 12 days ago +1

    Thanks! Excellent videos!

  • @jematos92
    @jematos92 8 months ago +2

    Thank you so much for the detailed explanation, Ajay, I'm really enjoying these videos!!!
    QQ - I noticed you included the embedding layers with positional encoding inside the Decoder and Encoder classes. Is this a design choice because you are only training with 1 encoder and 1 decoder block? Or is the embedding required in every block if you decide to stack more than one encoder/decoder block?
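
    For context, the usual pattern (in the original paper and most implementations) is to apply the embedding and positional encoding once at the input; the stacked blocks then only transform vectors. A minimal sketch, not the repo's exact code (a learned positional embedding stands in for the sinusoidal one for brevity):

      import torch
      import torch.nn as nn

      class Encoder(nn.Module):
          def __init__(self, vocab_size=1000, d_model=512, num_layers=4, max_len=200):
              super().__init__()
              self.tok_emb = nn.Embedding(vocab_size, d_model)
              self.pos_emb = nn.Embedding(max_len, d_model)  # simplified positional encoding
              self.blocks = nn.ModuleList(
                  nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
                  for _ in range(num_layers)
              )

          def forward(self, tokens):  # tokens: (batch, seq_len)
              positions = torch.arange(tokens.size(1), device=tokens.device)
              x = self.tok_emb(tokens) + self.pos_emb(positions)  # embed ONCE
              for block in self.blocks:  # extra stacked blocks never re-embed
                  x = block(x)
              return x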

  • @dataflex4440
    @dataflex4440 1 year ago +7

    God bless this man for putting in so much hard work and dedication and creating such high-quality content.

    • @CodeEmporium
      @CodeEmporium  1 year ago

      Thanks so much for the kind words. Hope you enjoy the other videos too :)

  • @lawrencemacquarienousagi789
    @lawrencemacquarienousagi789 1 year ago +2

    Man, you are doing a great job

  • @arpitsingh9198
    @arpitsingh9198 9 months ago +1

    Hi Ajay, thanks for the code walkthrough. I wanted to ask: is it really necessary to create a forward method in the Transformer class? You could have created the encoder, the decoder, and an output-probability function as separate torch modules. That would make inference easier, since the encoder is called ONCE, whereas only the autoregressive decoder has to be called in a while loop during inference, saving a lot of computation time.
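
    To illustrate the point, a hedged sketch of encode-once, decode-in-a-loop inference, using PyTorch's built-in nn.Transformer as a stand-in for the video's model (a real loop would also pass a causal tgt_mask and stop at an end-of-sequence token):

      import torch
      import torch.nn as nn

      model = nn.Transformer(d_model=64, nhead=4, batch_first=True)
      vocab_size, bos_id = 100, 1
      embed = nn.Embedding(vocab_size, 64)
      to_logits = nn.Linear(64, vocab_size)

      src = embed(torch.randint(0, vocab_size, (1, 10)))  # dummy source sentence
      memory = model.encoder(src)                         # encoder runs exactly once

      tokens = torch.tensor([[bos_id]])
      for _ in range(20):                                 # autoregressive decoding
          out = model.decoder(embed(tokens), memory)      # only the decoder re-runs
          next_id = to_logits(out[:, -1]).argmax(-1, keepdim=True)
          tokens = torch.cat([tokens, next_id], dim=1)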

  • @LuizHenrique-qr3lt
    @LuizHenrique-qr3lt 1 year ago +2

    Great video!!

  • @3HWay
    @3HWay 11 months ago

    Please make a video (or videos) on the Tabular Transformer (TabTransformer) and how to use it following best practices. Thanks.

  • @DLwithShreyas
    @DLwithShreyas 1 year ago +1

    Great video, Ajay!! I have a similar project for English-Hinglish translation, but it is in Keras. Would love to see your implementation of this with Kannada.

    • @CodeEmporium
      @CodeEmporium  1 year ago

      English-Hindi translation? So you want an English-to-Kannada implementation in Keras?

    • @DLwithShreyas
      @DLwithShreyas 1 year ago

      @@CodeEmporium No, no, I just meant to say it would be great to see your approach to NMT using torch

    • @DLwithShreyas
      @DLwithShreyas 1 year ago

      @@CodeEmporium Hinglish meaning the Hindi language written in English text, for example: "Aaj mene Transformer padha!"

    • @CodeEmporium
      @CodeEmporium  1 year ago +2

      @@DLwithShreyas Hmm, I think the technical term for this is “transliteration”. Also, you should be able to convert Hindi text to this transliterated Hindi text (Hinglish) via a simple algorithm with no machine learning.
      If you have an English-Hindi dataset, you can use a script to convert the Hindi side to Hinglish, then use the transformer to train a translator that converts English to Hinglish directly.
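
      A toy illustration of the no-ML idea (the character map is hypothetical and incomplete; real transliteration needs full matra/conjunct handling or a dedicated library):

        # toy Devanagari-to-Latin map, illustrative only
        DEVANAGARI_TO_LATIN = {"आ": "aa", "ज": "j", "म": "m", "न": "n", "े": "e", "ै": "ai"}

        def to_hinglish(text):
            # pass unknown characters (spaces, punctuation) through unchanged
            return "".join(DEVANAGARI_TO_LATIN.get(ch, ch) for ch in text)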

  • @convolutionalnn2582
    @convolutionalnn2582 1 year ago +1

    In the positional encoding code, in the final class, position runs from 1 to the max sequence length, which includes both even and odd values. I thought we use cos for odd and sin for even. Why are all the positions, 1 to max sequence length, passed to both sin and cos, so that even positions also go through cos and odd positions also go through sin?

    • @CodeEmporium
      @CodeEmporium  1 year ago

      Because it would likely destroy the final word embedding. These position encodings are added to the word vectors. If we were to add monotonically increasing values (a vector of 0s for the 1st word, a vector of 1s for the 2nd word, and so on), then each word vector would only be able to see the words around it at best, and we wouldn't be able to see high attention values between 2 words that are far apart from each other.
      If you are wondering “why”, it is because 2 words will have a high attention value if their vectors are similar (that is, the multiplication of the vectors yields large values). But adding monotonically increasing position encodings, even to similar words, can destroy the word vector embeddings if those words are sufficiently far apart. The sine/cosine functions, on the other hand, always stay within a fixed range regardless of how many words (or characters, in my case) there are in the sentence. Hence attention stays good for very long sequences and is also scalable.
      This is primarily why I think we use sine/cosine functions for the position encoder. If you have other thoughts, do let me know

    • @convolutionalnn2582
      @convolutionalnn2582 1 year ago

      @@CodeEmporium So should we also pass odd indices to the sin function and even indices to the cos function? You put all the indices into both the sin and cos functions for the positional embeddings, while most videos explain that only odd indices should be passed to the cos function and only even indices to the sin function...

    • @CodeEmporium
      @CodeEmporium  1 year ago

      In my video on position encoding, I rewrote the equation given in the “Attention Is All You Need” paper. Their implementation and my own are the same thing; it just might look a lil different, is all
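
      To make the even/odd point concrete, a minimal sketch of the paper's formulation (not necessarily the repo's exact code; assumes an even d_model): every position index goes through BOTH functions, just on different embedding dimensions.

        import torch

        def positional_encoding(max_len, d_model):
            position = torch.arange(max_len).float().unsqueeze(1)  # (max_len, 1)
            div_term = torch.pow(10000.0, torch.arange(0, d_model, 2).float() / d_model)
            pe = torch.zeros(max_len, d_model)
            pe[:, 0::2] = torch.sin(position / div_term)  # even embedding dims get sin
            pe[:, 1::2] = torch.cos(position / div_term)  # odd embedding dims get cos
            return pe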

  • @rupamjyotidas736
    @rupamjyotidas736 1 year ago

    Hi, thanks for your videos. Can you please tell me what the main difference would be if I am building a text generator rather than a translator, such as for a blog-generation task?
    What should the encoder and decoder inputs be during training in that context?

    • @CodeEmporium
      @CodeEmporium  1 year ago

      When building a text generator, you can use BERT or GPT, which are just the encoder or just the decoder respectively. I should have individual videos discussing each in my playlist called “Language Models” if you’re interested. Thanks for watching :)

    • @dataflex4440
      @dataflex4440 1 year ago

      I guess that can be done just by applying common sense after understanding this transformer series

  • @languagemodeler
    @languagemodeler 1 year ago

    🙏🙏🙏

  • @vinaynalluri277
    @vinaynalluri277 1 month ago

    I have one question: I have to detect which Indic language some romanized-script text is in. Can you help me with this?

  • @lawrencemacquarienousagi789

    Thanks

    • @CodeEmporium
      @CodeEmporium  1 year ago

      Thanks for the super thanks and the support !

  • @dev0_018
    @dev0_018 1 year ago

    Could you make a video (or reply to me) about how long it took you, in hours, to acquire all the knowledge you have now?
    How did you find your sources of knowledge, and did you gain experience as you learned things, for example by applying them to toy projects? If so, what were those toy projects?
    In what industries can your knowledge be used? Can it be used in search engines, the trading industry, image recognition, etc.?
    In short, could you make a video explaining exactly how you got from knowing nothing to where you are now?

    • @CodeEmporium
      @CodeEmporium  1 year ago +1

      I think this is a good question for a video. But to make sure you don’t wait forever, here is a lil answer. I started getting into machine learning in 2016 (about 7 years ago now). I started very much on the application side, working on actual projects. The first big project I took up was speech-to-text for Kannada. It was very code heavy, but I also came to understand the difficulty and fun of collecting data, reading up on the language to increase my domain knowledge, and packaging code into an application, apart from building the actual model itself (I used Hidden Markov Models at the time). I have a computer science background, so starting this way was the most approachable.
      Next is to read research papers. Start with a simple Google search for “state of the art language model papers”, pull the first paper up, and read it. I found it very hard to understand anything at first, but kept pushing through. “How did they even come up with this?” I used to think. But in each paper, the authors improve on previous work, so you can take a look at that, and the cycle repeats. This is hard at first, but it gets much easier the more papers you read.
      Next is math. I learned the bulk of my fundamentals in grad school at the time (2017-2019). But you don’t necessarily have to go to school for this; these days there are a ton of free resources (though it can be difficult to piece them together). My early videos try to dive into the math if you’re curious.
      The industry I currently work in is e-commerce. Machine learning is useful for recommendation systems, predictive pricing, determining which users to market to, and time-series models to forecast how many packages will arrive at the warehouse, among many other things. And yes, AI can be used in search engines (there are ranking algorithms, for example), trading, and image recognition (the entire field of computer vision is dedicated to this).
      Hope this helps for now. I might make a video on this in the future

    • @CodeEmporium
      @CodeEmporium  1 year ago +1

      Also, you probably don’t need 7 years to learn everything. But I would by no means say “I know everything about everything” now. It’s a journey you need to take 1 step at a time :)

    • @dev0_018
      @dev0_018 1 year ago

      @@CodeEmporium I appreciate this long reply.
      It would be cool to see a video.
      Thanks man, your videos are my go-to whenever I'm stuck on something you have made a video about.

  • @krishradha5709
    @krishradha5709 10 months ago

    Hey bro, if I want to add another encoder and decoder layer to improve performance, what should I do?

    • @tommathew5148
      @tommathew5148 10 months ago

      Just change num_layers to 6, no?
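
      In plain PyTorch, stacking just means cloning one block N times; if the repo's constructor exposes a num_layers argument as suggested above, it presumably does the same thing internally:

        import torch.nn as nn

        # one encoder block, deep-copied by the stack below
        layer = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)
        encoder = nn.TransformerEncoder(layer, num_layers=6)  # 6 stacked blocks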

  • @gileneusz
    @gileneusz 1 year ago

    0:38 That's actually the worst language choice. Nearly no one is familiar with that exotic language. It's confusing and a very bad choice for an educational video.

    • @CodeEmporium
      @CodeEmporium  1 year ago +2

      The world isn’t European. It’s better to teach with languages you’re comfortable in, as it’s easier to validate example translations. I don’t know French, German, or Spanish well enough to validate examples

    • @user-ji2om8gy2m
      @user-ji2om8gy2m 11 months ago

      ​@@CodeEmporium You get the point: French-to-English is what's considered common. But I guess if you can't speak French fluently, there's no reason to use it.