The complete Transformer Neural Network Code in 300 lines!

  • Published May 15, 2024
  • Code for the transformer neural network. This is the architecture from the "Attention Is All You Need" paper, and it forms the core of ChatGPT and other large language models today
    ABOUT ME
    ⭕ Subscribe: ua-cam.com/users/CodeEmporiu...
    📚 Medium Blog: / dataemporium
    💻 Github: github.com/ajhalthor
    👔 LinkedIn: / ajay-halthor-477974bb
    RESOURCES
    [1] Code for the video: github.com/ajhalthor/Transfor...
    PLAYLISTS FROM MY CHANNEL
    ⭕ Transformers from scratch playlist: • Self Attention in Tran...
    ⭕ ChatGPT Playlist of all other videos: • ChatGPT
    ⭕ Transformer Neural Networks: • Natural Language Proce...
    ⭕ Convolutional Neural Networks: • Convolution Neural Net...
    ⭕ The Math You Should Know: • The Math You Should Know
    ⭕ Probability Theory for Machine Learning: • Probability Theory for...
    ⭕ Coding Machine Learning: • Code Machine Learning
    MATH COURSES (7 day free trial)
    📕 Mathematics for Machine Learning: imp.i384100.net/MathML
    📕 Calculus: imp.i384100.net/Calculus
    📕 Statistics for Data Science: imp.i384100.net/AdvancedStati...
    📕 Bayesian Statistics: imp.i384100.net/BayesianStati...
    📕 Linear Algebra: imp.i384100.net/LinearAlgebra
    📕 Probability: imp.i384100.net/Probability
    OTHER RELATED COURSES (7 day free trial)
    📕 ⭐ Deep Learning Specialization: imp.i384100.net/Deep-Learning
    📕 Python for Everybody: imp.i384100.net/python
    📕 MLOps Course: imp.i384100.net/MLOps
    📕 Natural Language Processing (NLP): imp.i384100.net/NLP
    📕 Machine Learning in Production: imp.i384100.net/MLProduction
    📕 Data Science Specialization: imp.i384100.net/DataScience
    📕 Tensorflow: imp.i384100.net/Tensorflow

COMMENTS • 38

  • @CodeEmporium
    @CodeEmporium  1 year ago +6

    If you think I deserve it, please give this video a like and subscribe for more :)

  • @rpraver1
    @rpraver1 5 months ago +3

    Best playlist on transformers I have viewed to date, bar none...

    • @CodeEmporium
      @CodeEmporium  5 months ago

      Thanks so much for the kind words! I really appreciate it

  • @vtrandal
    @vtrandal 12 days ago +1

    Thanks! Excellent videos!

  • @jematos92
    @jematos92 8 months ago +2

    Thank you so much for the detailed explanation, Ajay, I'm really enjoying these videos!!!
    QQ - I noticed you included the embedding layers with positional encoding inside the Decoder and Encoder classes. Is this a design choice because you are only training with 1 encoder and 1 decoder block? Or is the embedding required in every block if you decide to stack more than one encoder/decoder block?
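
    For context, the usual pattern (in the original paper and most implementations) is to apply the embedding and positional encoding once at the input; the stacked blocks then only transform vectors. A minimal sketch, not the repo's exact code (a learned positional embedding stands in for the sinusoidal one for brevity):

      import torch
      import torch.nn as nn

      class Encoder(nn.Module):
          def __init__(self, vocab_size=1000, d_model=512, num_layers=4, max_len=200):
              super().__init__()
              self.tok_emb = nn.Embedding(vocab_size, d_model)
              self.pos_emb = nn.Embedding(max_len, d_model)  # simplified positional encoding
              self.blocks = nn.ModuleList(
                  nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
                  for _ in range(num_layers)
              )

          def forward(self, tokens):  # tokens: (batch, seq_len)
              positions = torch.arange(tokens.size(1), device=tokens.device)
              x = self.tok_emb(tokens) + self.pos_emb(positions)  # embed ONCE
              for block in self.blocks:  # extra stacked blocks never re-embed
                  x = block(x)
              return x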

  • @dataflex4440
    @dataflex4440 1 year ago +7

    God bless this man for putting in so much hard work and dedication and creating such high-quality content.

    • @CodeEmporium
      @CodeEmporium  1 year ago

      Thanks so much for the kind words. Hope you enjoy the other videos too :)

  • @lawrencemacquarienousagi789
    @lawrencemacquarienousagi789 1 year ago +2

    Man, you are doing a great job

  • @arpitsingh9198
    @arpitsingh9198 9 months ago +1

    Hi Ajay, thanks for the code walkthrough. I wanted to ask: is it really necessary to create a forward method in the Transformer class? You could have created the encoder, the decoder, and an output-probability function as separate torch modules. That would make inference easier, since the encoder is called ONCE, whereas only the autoregressive decoder has to be called in a while loop during inference, saving a lot of computation time.
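
    To illustrate the point, a hedged sketch of encode-once, decode-in-a-loop inference, using PyTorch's built-in nn.Transformer as a stand-in for the video's model (a real loop would also pass a causal tgt_mask and stop at an end-of-sequence token):

      import torch
      import torch.nn as nn

      model = nn.Transformer(d_model=64, nhead=4, batch_first=True)
      vocab_size, bos_id = 100, 1
      embed = nn.Embedding(vocab_size, 64)
      to_logits = nn.Linear(64, vocab_size)

      src = embed(torch.randint(0, vocab_size, (1, 10)))  # dummy source sentence
      memory = model.encoder(src)                         # encoder runs exactly once

      tokens = torch.tensor([[bos_id]])
      for _ in range(20):                                 # autoregressive decoding
          out = model.decoder(embed(tokens), memory)      # only the decoder re-runs
          next_id = to_logits(out[:, -1]).argmax(-1, keepdim=True)
          tokens = torch.cat([tokens, next_id], dim=1)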

  • @LuizHenrique-qr3lt
    @LuizHenrique-qr3lt 1 year ago +2

    Great video!!

  • @3HWay
    @3HWay 11 months ago

    Please make a video (or videos) on the Tabular Transformer (TabTransformer) and how to use it following best practices. Thanks.

  • @DLwithShreyas
    @DLwithShreyas 1 year ago +1

    Great video, Ajay!! I have a similar project for English-Hinglish translation, but it is in Keras. Would love to see your implementation of this with Kannada.

    • @CodeEmporium
      @CodeEmporium  1 year ago

      English-Hindi translation? So you want an English-to-Kannada implementation in Keras?

    • @DLwithShreyas
      @DLwithShreyas 1 year ago

      @@CodeEmporium No, no, I just meant to say it would be great to see your approach to NMT using torch

    • @DLwithShreyas
      @DLwithShreyas 1 year ago

      @@CodeEmporium Hinglish meaning the Hindi language written in English text, for example: "Aaj mene Transformer padha!"

    • @CodeEmporium
      @CodeEmporium  1 year ago +2

      @@DLwithShreyas Hmm, I think the technical term for this is “transliteration”. Also, you should be able to convert Hindi text to this transliterated Hindi text (Hinglish) via a simple algorithm with no machine learning.
      If you have an English-Hindi dataset, you can use a script to convert the Hindi side to Hinglish, then use the transformer to train a translator that converts English to Hinglish directly.
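
      A toy illustration of the no-ML idea (the character map is hypothetical and incomplete; real transliteration needs full matra/conjunct handling or a dedicated library):

        # toy Devanagari-to-Latin map, illustrative only
        DEVANAGARI_TO_LATIN = {"आ": "aa", "ज": "j", "म": "m", "न": "n", "े": "e", "ै": "ai"}

        def to_hinglish(text):
            # pass unknown characters (spaces, punctuation) through unchanged
            return "".join(DEVANAGARI_TO_LATIN.get(ch, ch) for ch in text)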

  • @convolutionalnn2582
    @convolutionalnn2582 1 year ago +1

    In the positional encoding code, in the final class, position runs from 1 to the max sequence length, which includes both even and odd values. I thought we use cos for odd and sin for even. Why are all the positions, 1 to max sequence length, passed to both sin and cos, so that even positions also go through cos and odd positions also go through sin?

    • @CodeEmporium
      @CodeEmporium  1 year ago

      Because it would likely destroy the final word embedding. These position encodings are added to the word vectors. If we were to add monotonically increasing values (a vector of 0s for the 1st word, a vector of 1s for the 2nd word, and so on), then each word vector would only be able to see the words around it at best, and we wouldn't be able to see high attention values between 2 words that are far apart from each other.
      If you are wondering “why”, it is because 2 words will have a high attention value if their vectors are similar (that is, the multiplication of the vectors yields large values). But adding monotonically increasing position encodings, even to similar words, can destroy the word vector embeddings if those words are sufficiently far apart. The sine/cosine functions, on the other hand, always stay within a fixed range regardless of how many words (or characters, in my case) there are in the sentence. Hence attention stays good for very long sequences and is also scalable.
      This is primarily why I think we use sine/cosine functions for the position encoder. If you have other thoughts, do let me know

    • @convolutionalnn2582
      @convolutionalnn2582 1 year ago

      @@CodeEmporium So should we also pass odd indices to the sin function and even indices to the cos function? You put all the indices into both the sin and cos functions for the positional embeddings, while most videos explain that only odd indices should be passed to the cos function and only even indices to the sin function...

    • @CodeEmporium
      @CodeEmporium  1 year ago

      In my video on position encoding, I rewrote the equation given in the “Attention Is All You Need” paper. Their implementation and my own are the same thing; it just might look a lil different, is all
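
      To make the even/odd point concrete, a minimal sketch of the paper's formulation (not necessarily the repo's exact code; assumes an even d_model): every position index goes through BOTH functions, just on different embedding dimensions.

        import torch

        def positional_encoding(max_len, d_model):
            position = torch.arange(max_len).float().unsqueeze(1)  # (max_len, 1)
            div_term = torch.pow(10000.0, torch.arange(0, d_model, 2).float() / d_model)
            pe = torch.zeros(max_len, d_model)
            pe[:, 0::2] = torch.sin(position / div_term)  # even embedding dims get sin
            pe[:, 1::2] = torch.cos(position / div_term)  # odd embedding dims get cos
            return pe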

  • @rupamjyotidas736
    @rupamjyotidas736 1 year ago

    Hi, thanks for your videos. Can you please tell me what the main difference would be if I am building a text generator rather than a translator, such as for a blog-generation task?
    What should the encoder and decoder inputs be during training in that context?

    • @CodeEmporium
      @CodeEmporium  1 year ago

      When building a text generator, you can use BERT or GPT, which are just the encoder or just the decoder respectively. I should have individual videos discussing each in my playlist called “Language Models” if you’re interested. Thanks for watching :)

    • @dataflex4440
      @dataflex4440 1 year ago

      I guess that can be done just by applying common sense after understanding this transformer series

  • @languagemodeler
    @languagemodeler 1 year ago

    🙏🙏🙏

  • @vinaynalluri277
    @vinaynalluri277 1 month ago

    I have one question: I have to detect which Indic language some romanized-script text is in. Can you help me with this?

  • @lawrencemacquarienousagi789

    Thanks

    • @CodeEmporium
      @CodeEmporium  1 year ago

      Thanks for the super thanks and the support !

  • @dev0_018
    @dev0_018 1 year ago

    Could you make a video (or reply to me) about how long it took you, in hours, to acquire all the knowledge you have now?
    How did you find your sources of knowledge, and did you gain experience as you learned things, for example by applying them to toy projects? If so, what were those toy projects?
    In what industries can your knowledge be used? Can it be used in search engines, the trading industry, image recognition, etc.?
    In short, could you make a video explaining exactly how you got from knowing nothing to where you are now?

    • @CodeEmporium
      @CodeEmporium  1 year ago +1

      I think this is a good question for a video. But to make sure you don’t wait forever, here is a lil answer. I started getting into machine learning in 2016 (about 7 years ago now). I started very much on the application side, working on actual projects. The first big project I took up was speech-to-text for Kannada. It was very code heavy, but I also came to understand the difficulty and fun of collecting data, reading up on the language to increase my domain knowledge, and packaging code into an application, apart from building the actual model itself (I used Hidden Markov Models at the time). I have a computer science background, so starting this way was the most approachable.
      Next is to read research papers. Start with a simple Google search for “state of the art language model papers”, pull the first paper up, and read it. I found it very hard to understand anything at first, but kept pushing through. “How did they even come up with this?” I used to think. But in each paper, the authors improve on previous work, so you can take a look at that, and the cycle repeats. This is hard at first, but it gets much easier the more papers you read.
      Next is math. I learned the bulk of my fundamentals in grad school at the time (2017-2019). But you don’t necessarily have to go to school for this; these days there are a ton of free resources (though it can be difficult to piece them together). My early videos try to dive into the math if you’re curious.
      The industry I currently work in is e-commerce. Machine learning is useful for recommendation systems, predictive pricing, determining which users to market to, and time-series models to forecast how many packages will arrive at the warehouse, among many other things. And yes, AI can be used in search engines (there are ranking algorithms, for example), trading, and image recognition (the entire field of computer vision is dedicated to this).
      Hope this helps for now. I might make a video on this in the future

    • @CodeEmporium
      @CodeEmporium  1 year ago +1

      Also, you probably don’t need 7 years to learn everything. But I would by no means say “I know everything about everything” now. It’s a journey you need to take 1 step at a time :)

    • @dev0_018
      @dev0_018 1 year ago

      @@CodeEmporium I appreciate this long reply.
      It would be cool to see a video.
      Thanks man, your videos are my go-to whenever I'm stuck on something you have made a video about.

  • @krishradha5709
    @krishradha5709 10 months ago

    Hey bro, if I want to add another encoder and decoder layer to improve performance, what should I do?

    • @tommathew5148
      @tommathew5148 10 months ago

      Just change num_layers to 6, no?
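
      In plain PyTorch, stacking just means cloning one block N times; if the repo's constructor exposes a num_layers argument as suggested above, it presumably does the same thing internally:

        import torch.nn as nn

        # one encoder block, deep-copied by the stack below
        layer = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)
        encoder = nn.TransformerEncoder(layer, num_layers=6)  # 6 stacked blocks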

  • @gileneusz
    @gileneusz 1 year ago

    0:38 That's actually the worst language choice. Nearly no one is familiar with that exotic language. It's confusing and a very bad choice for an educational video.

    • @CodeEmporium
      @CodeEmporium  1 year ago +2

      The world isn’t European. It’s better to teach with languages you’re comfortable in, as it’s easier to validate example translations. I don’t know French, German, or Spanish well enough to validate examples

    • @user-ji2om8gy2m
      @user-ji2om8gy2m 11 months ago

      ​@@CodeEmporium You get the point: French-to-English is what's considered common. But I guess if you can't speak French fluently, there's no reason to use it.