What are Transformer Models and How do they Work?
- Published May 31, 2024
- This video is part of LLM University
docs.cohere.com/docs/transfor...
Transformers are a new development in machine learning that have been making a lot of noise lately. They are incredibly good at keeping track of context, and this is why the text that they write makes sense. In this blog post, we will go over their architecture and how they work.
Bio:
Luis Serrano is the lead of developer relations at Co:here. Previously he was a research scientist and an educator in machine learning and quantum computing. Luis did his PhD in mathematics at the University of Michigan before moving to Silicon Valley to work at companies like Google and Apple. Luis is the author of the Amazon best-seller "Grokking Machine Learning", where he explains machine learning in a clear and concise way, and he is the creator of the educational YouTube channel "Serrano.Academy", with over 100K subscribers and 5M views.
===
Resources:
Blog post: txt.cohere.com/what-is-semant...
Learn more: / luisserrano
Neural Networks: • A friendly introductio...
Attention Models: • What is Attention in L...
Thanks Luis! Your explanation hits just the right notes for me: no fluff, not too complex, well structured, logical, good rhythm. Excellent overall. I'll be checking out your other material. Merci beaucoup!
I tried several times on the internet to understand the very basics of what transformer models are and how they work, but I never found an explanation as good as this. Thank you for this very meaningful information.
Explained well and simply.
Great Explanation, thanks...
Indeed a very clear explanation... that's how we should teach convoluted concepts
🎯 Key Takeaways for quick navigation:
00:00 *🤖 Introduction to Transformer models*
- Transformer models are key to recent advancements in NLP tasks like text generation and semantic search
- They can capture context better than previous models, which was a major challenge
01:02 *🧠 How previous neural network models worked*
- Input words were represented as 0/1 vectors
- Neural network tried to mimic patterns from data to predict next word
- But lacked understanding of overall context beyond a few words
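The 0/1 (one-hot) word representation mentioned above can be sketched in a few lines of Python; the vocabulary and words here are invented for illustration:

```python
# Toy sketch of one-hot word vectors, as used by pre-transformer models:
# a vector of 0s with a single 1 at the word's vocabulary index.
vocab = ["the", "cat", "sat", "on", "mat"]

def one_hot(word):
    vec = [0] * len(vocab)
    vec[vocab.index(word)] = 1
    return vec

print(one_hot("cat"))  # [0, 1, 0, 0, 0]
```

Note how the vector says nothing about how "cat" relates to other words, which is part of why these models struggled with context.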
02:13 *🌐 Transformer models capture context*
- Unlike neural nets, transformers can understand and generate text with coherent context
- They build up text output one word at a time based on the context
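The "one word at a time" generation loop can be sketched like this; a hand-made lookup table stands in for the model, whereas a real transformer predicts the next word from the whole context:

```python
# Toy sketch of autoregressive generation: repeatedly append the
# predicted next word to the running text.
next_word = {"the": "cat", "cat": "sat", "sat": "on", "on": "the"}

def generate(prompt, steps):
    words = prompt.split()
    for _ in range(steps):
        # A real model conditions on all of `words`, not just the last one.
        words.append(next_word[words[-1]])
    return " ".join(words)

print(generate("the", 3))  # "the cat sat on"
```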
03:22 *🛠️ Architecture of Transformer models*
- Has components like embeddings, positional encoding, attention, feed-forward layers
- Attention is the key mechanism that allows capturing context
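One of the components listed above, positional encoding, can be sketched with the common sinusoidal scheme; this is one standard variant, and the video does not specify which one Cohere's models use:

```python
import math

# Sketch of sinusoidal positional encoding: each position gets a vector
# of sines and cosines at different frequencies, so word order is visible
# to the model even though attention itself is order-agnostic.
def positional_encoding(pos, d_model):
    pe = []
    for i in range(d_model // 2):
        angle = pos / (10000 ** (2 * i / d_model))
        pe += [math.sin(angle), math.cos(angle)]
    return pe

print(positional_encoding(0, 4))  # [0.0, 1.0, 0.0, 1.0]
```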
06:08 *⚖️ How attention works*
- Allows focusing on relevant words for context via "gravitational" pull
- Multi-headed attention uses multiple representations for richer context
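The "gravitational pull" intuition maps to the weights in standard scaled dot-product attention, which can be sketched with NumPy (toy identity embeddings used for illustration):

```python
import numpy as np

# Minimal sketch of scaled dot-product attention: similarity scores
# between words become softmax weights, which mix the value vectors.
def attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(K.shape[-1])         # word-to-word similarity
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the context
    return weights @ V                              # weighted mix of values

x = np.eye(3)  # three toy word embeddings
out = attention(x, x, x)
print(out.shape)  # (3, 3)
```

Multi-headed attention runs several copies of this with different learned projections and concatenates the results.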
09:51 *🎲 Softmax for probabilistic output*
- Converts scores to probabilities to get varied outputs
- Allows sampling different word choices instead of same answer always
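The softmax-then-sample step above can be sketched in plain Python; the scores and word list are invented for illustration:

```python
import math, random

# Sketch of softmax: turn raw scores into probabilities that sum to 1,
# then sample from them, which yields varied outputs instead of the
# same answer every time.
def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
word = random.choices(["cat", "dog", "mat"], weights=probs)[0]
print(probs, word)
```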
10:59 *👨‍💻 Post-training for specific tasks*
- General pre-training data is not enough for targeted use cases
- Requires further training on curated question-answer and conversational data
- Allows specializing transformer for tasks like open-domain QA, chatbots etc.
Made with HARPA AI
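The curated question-answer data mentioned in the post-training takeaway might look roughly like this; the field names and examples are invented, and real fine-tuning formats vary by provider:

```python
# Illustrative shape of post-training (fine-tuning) data:
# curated prompt/response pairs for a targeted use case.
post_training_data = [
    {"prompt": "What is a transformer?",
     "response": "A neural network architecture built around attention."},
    {"prompt": "Hello!",
     "response": "Hi! How can I help you today?"},
]
print(len(post_training_data))  # 2
```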
@cohere, @jpvalois: Excellent video however, there are some inaccuracies:
6:00: instead of "series of transformer blocks", should it be "series of transformers" only? (Or: attention block and feed-forward block?) The description here says "three transformer layer", and it also says "attention blocks".
09:00: the attention and feed-forward blocks should run left to right, with the arrows also left to right; this reflects the flow of the data better.
09:20: should it be "feed-forward layer" instead of only "layer"?
09:40: isn't the first layer an attention layer? And could an attention layer and a feed-forward layer be combined into a transformer layer? (see "transformer blocks" at 06:00)
Thanks! It's a really clean and straightforward explanation.
You are the best....great explanation
Super clear thanks !!😊
How can I enrol on your courses? I need more of this ..especially autoencoders.
Thanks very much!
This explanation covers only the transformer's encoder part, which is used in BERT. But the GPT models use only the decoder. Please let me know if I am wrong?
Bing ChatGPT agrees with your statement
That is correct. BERT is a pre-trained transformer model that only uses the encoder part of the transformer architecture. BERT is designed for natural language understanding tasks, such as question answering, sentiment analysis, and named entity recognition. BERT can process both left and right context of a given word, and can handle both single-sentence and sentence-pair inputs.
GPT is another pre-trained transformer model that only uses the decoder part of the transformer architecture. GPT is designed for natural language generation tasks, such as text summarization, text completion, and text generation. GPT can only process the left context of a given word, and can only handle single-sentence inputs.
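The encoder/decoder difference the comments describe comes down to masking, which can be sketched with NumPy (toy 4-token sequence for illustration):

```python
import numpy as np

# Sketch of the attention masks behind the BERT/GPT distinction:
# an encoder (BERT) lets every token attend to both left and right
# context; a decoder (GPT) applies a causal mask so each token sees
# only the tokens to its left.
n = 4
encoder_mask = np.ones((n, n))           # every token sees every other token
decoder_mask = np.tril(np.ones((n, n)))  # lower-triangular: left context only
print(decoder_mask)
```

Row i of `decoder_mask` has 1s only up to column i, so token i cannot "look ahead" at future tokens when predicting the next word.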
Is this post training called prompt engineering or fine tuning?
What's described seemed more about local context than conversation threads.
I watched a video about transformers just before yours that made me literally dizzy 😝. After watching your video I no longer think this is UFO technology (not as much as before, let’s say)