What are Transformer Models and How do they Work?
- Published May 31, 2024
- This video is part of LLM University
docs.cohere.com/docs/transfor...
Transformers are a new development in machine learning that have been making a lot of noise lately. They are incredibly good at keeping track of context, and this is why the text that they write makes sense. In this blog post, we will go over their architecture and how they work.
Bio:
Luis Serrano is the lead of developer relations at Co:here. Previously he was a research scientist and an educator in machine learning and quantum computing. Luis did his PhD in mathematics at the University of Michigan before moving to Silicon Valley to work at companies like Google and Apple. Luis is the author of the Amazon best-seller "Grokking Machine Learning", where he explains machine learning in a clear and concise way, and he is the creator of the educational YouTube channel "Serrano.Academy", with over 100K subscribers and 5M views.
===
Resources:
Blog post: txt.cohere.com/what-is-semant...
Learn more: / luisserrano
Neural Networks: • A friendly introductio...
Attention Models: • What is Attention in L...
Thanks Luis! Your explanation hits just the right notes for me: no fluff, not too complex, well structured, logical, good rhythm. Excellent overall. I'll be checking out your other material. Merci beaucoup!
I tried several times on the internet to understand the very basics of what transformer models are and how they work, but I never found an explanation as good as this. Thank you for this very meaningful information.
Explained well and simply.
Great Explanation, thanks...
Indeed a very clear explanation... that's how we should teach convoluted concepts
🎯 Key Takeaways for quick navigation:
00:00 *🤖 Introduction to Transformer models*
- Transformer models are key to recent advancements in NLP tasks like text generation and semantic search
- They can capture context better than previous models, which was a major challenge
01:02 *🧠 How previous neural network models worked*
- Input words were represented as 0/1 vectors
- Neural network tried to mimic patterns from data to predict next word
- But lacked understanding of overall context beyond a few words
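The 0/1 (one-hot) word representation mentioned above can be sketched in a few lines of Python; the vocabulary and words here are invented for illustration:

```python
# Toy sketch of one-hot word vectors, as used by pre-transformer models:
# a vector of 0s with a single 1 at the word's vocabulary index.
vocab = ["the", "cat", "sat", "on", "mat"]

def one_hot(word):
    vec = [0] * len(vocab)
    vec[vocab.index(word)] = 1
    return vec

print(one_hot("cat"))  # [0, 1, 0, 0, 0]
```

Note how the vector says nothing about how "cat" relates to other words, which is part of why these models struggled with context.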
02:13 *🌐 Transformer models capture context*
- Unlike neural nets, transformers can understand and generate text with coherent context
- They build up text output one word at a time based on the context
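The "one word at a time" generation loop can be sketched like this; a hand-made lookup table stands in for the model, whereas a real transformer predicts the next word from the whole context:

```python
# Toy sketch of autoregressive generation: repeatedly append the
# predicted next word to the running text.
next_word = {"the": "cat", "cat": "sat", "sat": "on", "on": "the"}

def generate(prompt, steps):
    words = prompt.split()
    for _ in range(steps):
        # A real model conditions on all of `words`, not just the last one.
        words.append(next_word[words[-1]])
    return " ".join(words)

print(generate("the", 3))  # "the cat sat on"
```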
03:22 *🛠️ Architecture of Transformer models*
- Has components like embeddings, positional encoding, attention, feed-forward layers
- Attention is the key mechanism that allows capturing context
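One of the components listed above, positional encoding, can be sketched with the common sinusoidal scheme; this is one standard variant, and the video does not specify which one Cohere's models use:

```python
import math

# Sketch of sinusoidal positional encoding: each position gets a vector
# of sines and cosines at different frequencies, so word order is visible
# to the model even though attention itself is order-agnostic.
def positional_encoding(pos, d_model):
    pe = []
    for i in range(d_model // 2):
        angle = pos / (10000 ** (2 * i / d_model))
        pe += [math.sin(angle), math.cos(angle)]
    return pe

print(positional_encoding(0, 4))  # [0.0, 1.0, 0.0, 1.0]
```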
06:08 *⚖️ How attention works*
- Allows focusing on relevant words for context via "gravitational" pull
- Multi-headed attention uses multiple representations for richer context
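The "gravitational pull" intuition maps to the weights in standard scaled dot-product attention, which can be sketched with NumPy (toy identity embeddings used for illustration):

```python
import numpy as np

# Minimal sketch of scaled dot-product attention: similarity scores
# between words become softmax weights, which mix the value vectors.
def attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(K.shape[-1])         # word-to-word similarity
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the context
    return weights @ V                              # weighted mix of values

x = np.eye(3)  # three toy word embeddings
out = attention(x, x, x)
print(out.shape)  # (3, 3)
```

Multi-headed attention runs several copies of this with different learned projections and concatenates the results.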
09:51 *🎲 Softmax for probabilistic output*
- Converts scores to probabilities to get varied outputs
- Allows sampling different word choices instead of same answer always
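The softmax-then-sample step above can be sketched in plain Python; the scores and word list are invented for illustration:

```python
import math, random

# Sketch of softmax: turn raw scores into probabilities that sum to 1,
# then sample from them, which yields varied outputs instead of the
# same answer every time.
def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
word = random.choices(["cat", "dog", "mat"], weights=probs)[0]
print(probs, word)
```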
10:59 *👨‍💻 Post-training for specific tasks*
- General pre-training data is not enough for targeted use cases
- Requires further training on curated question-answer and conversational data
- Allows specializing transformer for tasks like open-domain QA, chatbots etc.
Made with HARPA AI
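The curated question-answer data mentioned in the post-training takeaway might look roughly like this; the field names and examples are invented, and real fine-tuning formats vary by provider:

```python
# Illustrative shape of post-training (fine-tuning) data:
# curated prompt/response pairs for a targeted use case.
post_training_data = [
    {"prompt": "What is a transformer?",
     "response": "A neural network architecture built around attention."},
    {"prompt": "Hello!",
     "response": "Hi! How can I help you today?"},
]
print(len(post_training_data))  # 2
```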
@cohere, @jpvalois: Excellent video however, there are some inaccuracies:
6:00: instead of "series of transformer blocks", should it be "series of transformers" only? (Or: attention block and feed-forward block?) The description here says "three transformer layer", and it also says "attention blocks".
09:00: the attention and feed-forward blocks should run left to right, with the arrows also left to right; this reflects the flow of the data better.
09:20: should it be "feed-forward layer" instead of only "layer"?
09:40: isn't the first layer an attention layer? And could an attention layer and a feed-forward layer be combined into a transformer layer? (see "transformer blocks" at 06:00)
Thanks! It's a really clean and straightforward explanation.
You are the best....great explanation
Super clear thanks !!😊
How can I enrol on your courses? I need more of this ..especially autoencoders.
Thanks very much!
This explanation covers only the transformer's encoder part, which is used in BERT. But the GPT models use only the decoder. Please let me know if I am wrong?
Bing ChatGPT agrees with your statement
That is correct. BERT is a pre-trained transformer model that only uses the encoder part of the transformer architecture. BERT is designed for natural language understanding tasks, such as question answering, sentiment analysis, and named entity recognition. BERT can process both left and right context of a given word, and can handle both single-sentence and sentence-pair inputs.
GPT is another pre-trained transformer model that only uses the decoder part of the transformer architecture. GPT is designed for natural language generation tasks, such as text summarization, text completion, and text generation. GPT can only process the left context of a given word, and can only handle single-sentence inputs.
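The encoder/decoder difference the comments describe comes down to masking, which can be sketched with NumPy (toy 4-token sequence for illustration):

```python
import numpy as np

# Sketch of the attention masks behind the BERT/GPT distinction:
# an encoder (BERT) lets every token attend to both left and right
# context; a decoder (GPT) applies a causal mask so each token sees
# only the tokens to its left.
n = 4
encoder_mask = np.ones((n, n))           # every token sees every other token
decoder_mask = np.tril(np.ones((n, n)))  # lower-triangular: left context only
print(decoder_mask)
```

Row i of `decoder_mask` has 1s only up to column i, so token i cannot "look ahead" at future tokens when predicting the next word.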
Is this post training called prompt engineering or fine tuning?
What's described seemed more about local context than conversation threads.
I watched a video about transformers just before yours that made me literally dizzy 😝. After watching your video I no longer think this is UFO technology (not as much as before, let’s say)