Coding a ChatGPT-Like Transformer From Scratch in PyTorch

  • Published 30 Sep 2024

COMMENTS • 174

  • @statquest
    @statquest  3 months ago +13

    - You can get the code here: github.com/StatQuest/decoder_transformer_from_scratch
    - Learn more about GiveInternet.org: giveinternet.org/StatQuest NOTE: Donations up to $30 will be matched by an Angel Investor - so a $30 donation would give $60 to the organization. DOUBLE BAM!!!
    - The full Neural Networks playlist, from the basics to AI, is here: ua-cam.com/video/CqOfi41LfDw/v-deo.html
    - Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/

  • @techproductowner
    @techproductowner 3 months ago +26

    You will be remembered for the next 1000 years in the history of statistics and data science. You should be named the "Father of Applied Statistics & Machine Learning". Please give a thumbs up if you are with me.

  • @thebirdhasbeencharged
    @thebirdhasbeencharged 3 months ago +75

    Can't imagine the work that goes into this: writing the code, making diagrams, recording, editing, and voice-over. You're the GOAT, big J.

    • @statquest
      @statquest  3 months ago +2

      Thanks!

    • @thomasalderson368
      @thomasalderson368 3 months ago

      he is well compensated

    • @statquest
      @statquest  3 months ago +11

      @@thomasalderson368 am I? Maybe it's relative, but hour for hour I'm making significantly less than I did doing data analysis in a lab.

    • @FindEdge
      @FindEdge 2 months ago +9

      @@statquest Sir, we love you and your work; please don't take such comments to heart! You may never meet us, but there is a generation of statisticians and data scientists who owe a lot to you, maybe all of it!

    • @statquest
      @statquest  2 months ago +4

      @@FindEdge Thanks!

  • @jahanzebnaeem2525
    @jahanzebnaeem2525 3 months ago +12

    HUGE RESPECT for all the work you put into your videos

  • @n.h.son1902
    @n.h.son1902 3 months ago +9

    You said this was going to come out at the end of May. And I’ve been waiting for this for 2 months. Finally, it’s out 😂

    • @statquest
      @statquest  3 months ago +15

      I guess better late than never?

  • @muhammadikram375
    @muhammadikram375 3 months ago +11

    Sir, you deserve millions of views on your UA-cam ❤❤🎉

  • @hewas321
    @hewas321 2 months ago +6

    Hey Josh, you know what? I used to watch your videos explaining the key ingredients of statistics EVERY DAY in 2020-2021 when I was a freshman. Whichever of your videos I clicked on, it was always my first time learning the topic. I knew nothing. But I still remember which concepts you dealt with in the videos and how you explained them.
    Fortunately, I now work as an AI researcher - it's been a year already - even though I'm only a third-year student. You suddenly came to mind, so I've just taken a look at your channel for the first time in a long time. This time I already knew all of what you explain in the videos. It feels really weird. It's all thanks to you, and your explanations are still clear, well-visualized, and awesome. You are such a big help to newcomers to statistics and machine/deep learning. I always love your work. Please keep it going!!! 🔥

    • @statquest
      @statquest  2 months ago +1

      Thank you very much! I'm so happy that my videos were helpful for you. BAM! :)

  • @pro100gameryt8
    @pro100gameryt8 3 months ago +4

    Incredible video, Josh! Love your content. Can you please make a video on diffusion models?

    • @statquest
      @statquest  3 months ago +2

      I'll keep that in mind.

    • @pro100gameryt8
      @pro100gameryt8 2 months ago +1

      Thank you very much, Josh! Bam @statquest

  • @TalkOfWang
    @TalkOfWang 3 months ago +6

    It is party time! Thanks for uploading!

  • @akshaygs4048
    @akshaygs4048 2 months ago +3

    It had been some time since I last watched one of your videos. Very good video as always 🎉🎉

  • @hammry_pommter
    @hammry_pommter 27 days ago +1

    Sir, first of all, huge respect for your content... One more request: can you make a video on how to apply transformers to image datasets for different image-processing tasks, like object detection and segmentation?
    The only thing I'll say is, teachers like you make this world more beautiful...

    • @statquest
      @statquest  27 days ago +1

      Thanks! I'll keep those topics in mind.

  • @artofwrick
    @artofwrick 1 month ago +1

    Hey Josh, can you please make a playlist of all the videos on probability that you've posted so far??? Please ❤❤

    • @statquest
      @statquest  1 month ago

      I'll keep that in mind. In the meantime, you can go through the Statistics Fundamentals in this list: statquest.org/video-index/

  • @bayoudata
    @bayoudata 3 months ago +5

    Cool, I learn a lot from all of your videos, Josh! 🤯

  • @__no_name__
    @__no_name__ 2 months ago +1

    I want to make a sequence prediction model. How should I test the model? What can I use for inference/testing? (Not for natural language.)

    • @statquest
      @statquest  2 months ago +1

      I'm pretty sure you can do it just as shown in this video; just swap out the words for the tokens in your sequence.
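
      For example, a minimal sketch of that idea (the token names here are made up for illustration, not from the video):

      ```python
      # Map each symbol in your (non-language) sequence to an id, just like
      # the words in the video; the rest of the pipeline stays the same.
      token_to_id = {"A": 0, "B": 1, "C": 2, "<EOS>": 3}
      sequence = ["A", "B", "C", "<EOS>"]
      ids = [token_to_id[t] for t in sequence]  # [0, 1, 2, 3] -> feed these to the model
      ```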

  • @isaacsalzman
    @isaacsalzman 3 months ago +3

    Ya misspelled ChatGPT - Generative Pre-trained Transformer

  • @SaftigKnackig
    @SaftigKnackig 12 days ago +1

    I could watch your videos just to get cheered up by your intro song.

  • @codinghighlightswithsadra7343
    @codinghighlightswithsadra7343 1 month ago +1

    Thank you! Can you please explain how we can use transformers for time series?

    • @statquest
      @statquest  1 month ago +1

      I'll keep that in mind. But in the meantime, you can think of an input prompt (like "what is statquest?") as a time series dataset, because the words are ordered and occur sequentially. So, based on an ordered sequence of tokens, the transformer generates a prediction about what happens next.
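
      As a sketch, generation then looks like this (assuming a trained decoder-only `model` that maps a sequence of token ids to one set of logits per position; the names are illustrative, not the video's exact code):

      ```python
      import torch

      def generate(model, input_ids, max_new_tokens=10, eos_id=None):
          """Greedy next-token prediction: the 'time series' view in code."""
          model.eval()
          ids = input_ids.clone()
          with torch.no_grad():
              for _ in range(max_new_tokens):
                  logits = model(ids)            # (seq_len, vocab_size)
                  next_id = logits[-1].argmax()  # most likely next token
                  ids = torch.cat([ids, next_id.unsqueeze(0)])
                  if eos_id is not None and next_id.item() == eos_id:
                      break
          return ids
      ```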

  • @iqra2291
    @iqra2291 1 month ago +1

    Amazing explanation 🎉❤ you are the best 😊

  • @frommarkham424
    @frommarkham424 1 month ago +1

    Optimus prime has been real quiet since this one dropped😬😬😬😬😬

  • @mousquetaire86
    @mousquetaire86 3 months ago +2

    Wish you could be Prime Minister of the United Kingdom!

  • @mikinyaa
    @mikinyaa 3 months ago +4

    🎉🎉🎉thank you😊

  • @Sravdar
    @Sravdar 2 months ago +1

    AMAZING VIDEOS. I watched your whole neural network playlist in 3 days, and now, reaching the end, I have some questions. One: what videos are planned for the future? And two: how do you select activation functions? In fact, a video where you create custom models for different problems and explain "why to use this" would be great. No need to explain the math or programming needed for that.
    Thank you for all of these videos!

    • @statquest
      @statquest  1 month ago +1

      Thanks! I'm glad you like the videos. My guess is the next one will be about encoder-only transformers. I'm also working on a book about neural networks that includes all the content from the videos plus a few bonus things. I've finished the first draft and will start editing it soon.

  • @Pqrsaw
    @Pqrsaw 1 month ago +1

    Loved it!
    Thank you very much

  • @Brad-qw1te
    @Brad-qw1te 2 months ago +1

    I've been trying to make a neural network in C++ for like a month now. I was trying to just use 3b1b's videos, but they weren't good enough. But then I found your videos, and I'm getting really close to being able to finish the backpropagation algorithm.
    When I started, I thought it would look good on my resume, but now I'm thinking nobody will care. I'm in too deep to quit, though.

  • @gvascons
    @gvascons 3 months ago +1

    Great and very didactic as usual, Josh!! Definitely going to wrap my head around this for a while and try a few tweaks! Do you plan on eventually also discussing other non-NLP topics like GANs and Diffusion Models?

    • @statquest
      @statquest  3 months ago

      One day I hope to.

  • @mohamedthasneem7327
    @mohamedthasneem7327 1 month ago +1

    Thank you very much sir...💚

  • @hasibahmad297
    @hasibahmad297 2 months ago +1

    I saw the title and right away knew it was BAM. Can we expect some data analysis and ML projects from scratch?

  • @ShadArfMohammed
    @ShadArfMohammed 3 months ago +2

    as always, wonderful content.
    Thanks :)

  • @jawadmansoor6064
    @jawadmansoor6064 3 months ago +3

    Finally, the greatly awaited video has arrived. Thank you.

  • @ramwisc1
    @ramwisc1 2 months ago +1

    Wow - have been waiting for this one! Now that I've wrapped my head around word embeddings, time to code this one up! Thank you @statquest!

  • @neonipun
    @neonipun 3 months ago +3

    I'm gonna enjoy this one!

  • @abhinavsb9228
    @abhinavsb9228 2 months ago +1

    100/100 🔥 When I search for an explanation video on YouTube, this is what I expect 🔥

  • @Op1czak
    @Op1czak 2 months ago +1

    Josh, I want to express my sincerest gratitude. I have been following your videos for years, and they have become increasingly important for my studies and career path. You are a hero.

  • @louislim2316
    @louislim2316 23 hours ago +1

    Triple Bam :)

  • @jorgesanabria6484
    @jorgesanabria6484 3 months ago +1

    This will be awesome. I am trying to learn the math behind transformers and PyTorch so hopefully this helps give me some intuition

    • @statquest
      @statquest  3 months ago +2

      I've got a video all about the math behind transformers here: ua-cam.com/video/KphmOJnLAdI/v-deo.html

  • @Priyanshuc2425
    @Priyanshuc2425 1 month ago +1

    Please include this in your Happy Halloween playlist.

  • @thomasalderson368
    @thomasalderson368 3 months ago +1

    How about an encoder-only classifier to round off the series? Thanks.

    • @statquest
      @statquest  3 months ago

      I'll keep that in mind.

  • @Bartosz-o4p
    @Bartosz-o4p 2 months ago +1

    Bam!
    Peanut Butter and Jaaam ;)

  • @gayedemiray
    @gayedemiray 2 months ago +1

    you are the best!!! hooray!!!! 😊

  • @sillypoint2292
    @sillypoint2292 2 months ago +1

    This video's amazing, man. Not just this one, but every video of yours. Before I began actually learning machine learning, I used to watch your videos just for fun, and trust me, they taught me a lot. Thanks for your amazing teaching :) With love from India ❤

  • @datasciencepassions4522
    @datasciencepassions4522 2 months ago +1

    God Bless You for the great work you do! Thank you so much

    • @statquest
      @statquest  2 months ago

      Thank you very much! :)

  • @Simon-FriedrichBöttger
    @Simon-FriedrichBöttger 2 months ago +1

    Thank you very much!

    • @statquest
      @statquest  2 months ago

      TRIPLE BAM!!! Thank you so much for supporting StatQuest!!!

  • @TheFunofMusic
    @TheFunofMusic 3 months ago +3

    Triple Bam!!!

  • @cuckoo_is_singing
    @cuckoo_is_singing 2 months ago

    Hi Josh,
    Should embedding weights be updated during training? For example, nn.Embedding(vocab_size, d_model) produces random numbers, and each token refers to the related row in our embedding matrix; should we update these weights during training? The positional-encoding values are constant during training, and the only weights (besides the other parameters, of course, like Q, K, V) prone to change are our nn.Embedding weights!
    I wrote code for translating amino acids to sequences.
    Everything in training works well, with accuracy of 95-98%,
    but at the inference stage I get bad results. I reload my model with:
    loading_path = os.path.join(checkpoint_dir, config['model_name'])
    model.load_checkpoint(loading_path, model, optimizer)
    but after the inference loop my result looks like:
    'tcc tcc tcc tcc tcc tcc tcc tcc tcc tcc tcc tcc tcc tcc tcc tcc tcc tcc ' :(
    Even if we assume my algorithm has overfitted, we shouldn't get this result!
    Also, I think parameters like the dropout factor should not be used at the inference stage (p=0 for dropout).
    I mean, we shouldn't just reload the best parameters; we should also change some parameters (sorry, I spoke a lot :)) )

    • @statquest
      @statquest  2 months ago

      The word embedding weights are updated during training.
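
      A minimal sketch of that (the sizes are arbitrary), plus the eval-mode point you raised about dropout:

      ```python
      import torch.nn as nn

      # nn.Embedding weights are ordinary trainable parameters: they start out
      # random and are updated by backpropagation along with everything else.
      embedding = nn.Embedding(num_embeddings=25, embedding_dim=2)
      print(embedding.weight.requires_grad)  # True -> updated during training

      # For inference, put the model in eval mode so Dropout is disabled
      # (effectively p=0), and skip gradient tracking:
      #   model.eval()
      #   with torch.no_grad():
      #       ...
      ```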

  • @observor-ds3ro
    @observor-ds3ro 2 months ago

    22:50 Hey Josh, you assigned 4 as the number of tokens, but we have 5 tokens (including <EOS>); even in the diagram, where you are pointing, there are 5 boxes (representing 5 outputs).. I got confused.
    And you know what? Words fail me when I try to say how much you've affected my life.. so I won't say anything 😂

    • @statquest
      @statquest  2 months ago +1

      See 26:46. At 22:50 we just assign a default value for that parameter; however, we don't use that default value when we create the transformer object at 26:46. Instead, we set it to the number of tokens in the vocabulary.
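
      In other words (a stripped-down sketch of the idea, not the video's full class):

      ```python
      import torch.nn as nn

      class DecoderOnlyTransformer(nn.Module):
          def __init__(self, num_tokens=4, d_model=2):
              super().__init__()
              # num_tokens=4 is only a fallback default...
              self.we = nn.Embedding(num_embeddings=num_tokens, embedding_dim=d_model)

      # ...because here we override it with the real vocabulary size, 5:
      model = DecoderOnlyTransformer(num_tokens=5, d_model=2)
      print(model.we.num_embeddings)  # 5
      ```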

  • @glaudiston
    @glaudiston 3 months ago +1

    Today we learned that StatQuest is awesome. Triple BAM!

  • @sharjeel_mazhar
    @sharjeel_mazhar 3 months ago +2

    Thank you! You're the best!!!

  • @205-cssaurabhmaulekhi9
    @205-cssaurabhmaulekhi9 3 months ago +2

    Thank you
    I was in need of this 😊

    • @statquest
      @statquest  3 months ago

      Glad it was helpful!

  • @acasualviewer5861
    @acasualviewer5861 3 months ago

    I'm confused as to why the values would come from the ENCODER when computing the cross-attention between the encoder and decoder. Shouldn't the values come from the decoder itself?
    So if I trained a model to translate from English to German, then wanted to swap out the German for Spanish, I'd expect the new decoder to know what to do with the output of the encoder. But if the values are coming from the encoder, then this wouldn't work.

    • @statquest
      @statquest  3 months ago +1

      The idea is that the query in the decoder is used to determine how a potential word in the output is related to the words in the input. This is done by using a query from the decoder and keys for all of the input words in the encoder. Then, once we have established how much (what percentages) a potential word in the output is related to all of the input words, we have to determine what those percentages are of. They are percentages of the values. And thus, the values have to come from the encoder. For more details, see: ua-cam.com/video/zxQyTK8quyY/v-deo.html
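
      Here is that computation as a toy, single-head sketch (random numbers, no trained weights):

      ```python
      import torch
      import torch.nn.functional as F

      d_model = 2
      dec = torch.randn(3, d_model)  # encodings for 3 decoder (output) tokens
      enc = torch.randn(4, d_model)  # encodings for 4 encoder (input) tokens

      W_q, W_k, W_v = (torch.randn(d_model, d_model) for _ in range(3))

      q = dec @ W_q  # queries from the DECODER
      k = enc @ W_k  # keys from the ENCODER
      v = enc @ W_v  # values from the ENCODER

      scores = q @ k.T / d_model**0.5   # how related each output token is to each input token
      attn = F.softmax(scores, dim=-1)  # the "percentages"...
      out = attn @ v                    # ...are percentages OF the encoder's values
      ```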

  • @Faisal-cl9iu
    @Faisal-cl9iu 3 months ago +1

    Thanks a lot for this wonderful free content. ❤😊

  • @tismanasou
    @tismanasou 2 months ago

    Let's start from the basics. ChatGPT is not a transformer. It's an application.

    • @statquest
      @statquest  2 months ago

      Yep, that's correct.

  • @Melle-sq4df
    @Melle-sq4df 2 months ago

    In the very first slide the imports are broken at ua-cam.com/video/C9QSpl5nmrY/v-deo.html
    `import torch.nn as nn import` # there's an extra trailing "import" here

    • @statquest
      @statquest  2 months ago

      Yep, that's a typo. That's why it's best to download the code. Here's the link: github.com/StatQuest/decoder_transformer_from_scratch
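
      For reference, the corrected line is simply:

      ```python
      import torch.nn as nn  # the stray trailing "import" removed
      ```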

  • @旭哥-r5b
    @旭哥-r5b 13 days ago +1

    Thank you. You're a lifesaver; I needed this to finish my school project. However, if the inputs contain varying numbers of tokens, do I add padding after <EOS>?

    • @statquest
      @statquest  12 days ago +1

      Yes, you do that when training a batch of inputs with different lengths.

    • @旭哥-r5b
      @旭哥-r5b 7 days ago

      @@statquest Thank you for your help. However, if I use zero padding and include zero as a valid token in the vocabulary, won't the model end up predicting zero, which is meant to represent padding, thereby making the output meaningless?

    • @statquest
      @statquest  7 days ago +1

      @@旭哥-r5b You create a special token for padding.

    • @旭哥-r5b
      @旭哥-r5b 7 days ago

      @@statquest And that token will still be used as the label for training?

    • @statquest
      @statquest  7 days ago

      @@旭哥-r5b I believe that is correct.
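
      Here is a sketch of the whole padding idea (the token ids are made up; passing ignore_index to the loss is a common refinement so the model isn't also trained to predict <PAD>):

      ```python
      import torch
      import torch.nn as nn

      token_to_id = {"<PAD>": 0, "what": 1, "is": 2, "statquest": 3, "<EOS>": 4}

      # Skip positions whose label is <PAD> when computing the loss:
      loss_fn = nn.CrossEntropyLoss(ignore_index=token_to_id["<PAD>"])

      logits = torch.randn(4, len(token_to_id))  # 4 positions, 5-token vocabulary
      labels = torch.tensor([2, 3, 4, 0])        # last position is padding -> ignored
      loss = loss_fn(logits, labels)
      ```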

  • @HanqiXiao-x1u
    @HanqiXiao-x1u 2 months ago +1

    Hooray!

  • @rishabhsoni
    @rishabhsoni 2 months ago +1

    Respect

  • @gastonmorixe
    @gastonmorixe 2 months ago +1

    gold

  • @naromsky
    @naromsky 3 months ago +1

    From scratch in PyTorch, huh.

    • @statquest
      @statquest  3 months ago +4

      I decided to skip doing it in assembly. ;)

  • @miriamramstudio3982
    @miriamramstudio3982 2 months ago +1

    Great video. Thanks

    • @statquest
      @statquest  2 months ago

      Glad you liked it!

  • @suika6459
    @suika6459 3 months ago +2

    amazinggg

  • @gstiebler
    @gstiebler 3 months ago +2

    Thanks!

    • @statquest
      @statquest  3 months ago

      TRIPLE BAM!!! Thank you for supporting StatQuest!

  • @arnabmishra827
    @arnabmishra827 2 months ago

    What is that extra "import" on line 2, at 1:37?

    • @statquest
      @statquest  2 months ago

      That's called a typo.

  • @yosimadsu2189
    @yosimadsu2189 3 months ago

    🙏🏻🙏🏻🙏🏻🙏🏻🙏🏻 Please, please, please show us how the Q, V, and K weights are trained, in detail 🙏🏻🙏🏻🙏🏻🙏🏻🙏🏻
    You showed us just a simple function call, but we are curious how it does the math, what gets trained, and how the values of the weights change.

    • @statquest
      @statquest  3 months ago

      Every single weight and bias in a neural network is trained with backpropagation. To learn more about how this process works, see: ua-cam.com/video/IN2XmBhILt4/v-deo.html ua-cam.com/video/iyn2zdALii8/v-deo.html and ua-cam.com/video/GKZoOHXGcLo/v-deo.html
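
      And here is a tiny sketch showing that the Q, K, and V weights are ordinary Linear layers, so loss.backward() computes their gradients like any other weight (toy sizes, single head):

      ```python
      import torch
      import torch.nn as nn
      import torch.nn.functional as F

      d_model = 2
      W_q, W_k, W_v = (nn.Linear(d_model, d_model, bias=False) for _ in range(3))

      x = torch.randn(4, d_model)  # encodings for 4 tokens
      q, k, v = W_q(x), W_k(x), W_v(x)
      attn = F.softmax(q @ k.T / d_model**0.5, dim=-1)
      out = attn @ v

      loss = out.sum()        # stand-in for a real loss
      loss.backward()
      print(W_q.weight.grad)  # gradients exist -> these weights get trained too
      ```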

    • @yosimadsu2189
      @yosimadsu2189 3 months ago

      @@statquest Since the Q, V, and K weights are split and the calculations pass through steps that are not a standard neural network layer, IMHO the backpropagation process is quite tricky. On the other hand, the fit function does not reveal the order of calculations at each node.

  • @김정헌-i8r
    @김정헌-i8r 3 months ago +2

    GTP :)

  • @nossonweissman
    @nossonweissman 3 months ago +2

    BAM!!

  • @paslaid
    @paslaid 2 months ago +1

    🎉

  • @sidnath7336
    @sidnath7336 2 months ago

    Awesome video!
    Maybe we can have a part 2 where we incorporate multi-head attention? 👌🏽
    And then this could become a series on different decoder models and how they differ, e.g., Mistral uses RoPE and sliding-window attention, etc…

    • @statquest
      @statquest  2 months ago +1

      If you look at the code you'll see how to create multi-headed attention: github.com/StatQuest/decoder_transformer_from_scratch
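
      And for a quick feel of the concept, PyTorch also ships a built-in module (this is just a sketch of the idea, not necessarily how the repo's code does it):

      ```python
      import torch
      import torch.nn as nn

      d_model, num_heads = 8, 2
      mha = nn.MultiheadAttention(embed_dim=d_model, num_heads=num_heads, batch_first=True)

      x = torch.randn(1, 5, d_model)  # (batch, seq_len, d_model)
      out, weights = mha(x, x, x)     # self-attention: query = key = value = x
      print(out.shape)                # torch.Size([1, 5, 8])
      ```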

  • @BooleanDisorder
    @BooleanDisorder 2 months ago

    I have imported a torch. Do I light it now?

  • @Mạnhfefe
    @Mạnhfefe 3 months ago +1

    thank you sm fr bro

  • @keeperofthelight9681
    @keeperofthelight9681 3 months ago

    Sir, can you show how to make the chatbot hold a conversation?

    • @statquest
      @statquest  3 months ago

      I'll keep that in mind.

  • @zeroonetwothree1298
    @zeroonetwothree1298 3 months ago +1

    Legend.

  • @ckq
    @ckq 3 months ago +1

    GTP

  • @أحمدأكرمعامر
    @أحمدأكرمعامر 3 months ago +1

    Baaaam!❤

  • @zendr0
    @zendr0 3 months ago +1

    Bam!

  • @gustavojuantorena
    @gustavojuantorena 3 months ago +1

    🎉🎉🎉

  • @lamlamnguyen7093
    @lamlamnguyen7093 2 months ago

    Damnn bro 😮😮😮😮

  • @frommarkham424
    @frommarkham424 1 month ago +1

    ARTIFICIAL NEURAL NETWORKS ARE AWESOMEEEEEEEEEE🔥🔥🔥🔥🦾🦾🦾🗣🗣🗣🗣💯💯💯💯

  • @readthefuckingmanual
    @readthefuckingmanual 2 months ago +1

    Awesome video!!!