Foundations of Large Language Models: Under-the-hood of the Transformer • Talk @ SDSU • Nov 12, 2024
- Published Nov 24, 2024
- "Foundations of Large Language Models: Under-the-hood of the Transformer Architecture" • Invited Talk at San Diego State University (@SDSU) • November 12, 2024
• Relevant Primers:
transformer.ama...
llm.aman.ai
• Overview: The talk covered the foundational principles of Large Language Models (LLMs), focusing on the Transformer architecture and its key components: embeddings, positional encoding, self- and cross-attention, skip connections, token sampling, and the encoder and decoder stacks. It explained how these innovations enable efficient, context-aware language processing.
• Agenda:
➜ Transformer Overview:
Scaled dot-product attention and multi-head mechanisms for parallel processing and contextual understanding.
Handles long-range dependencies and enables parallel computation for efficient training.
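As a rough illustration of the scaled dot-product attention described above (all shapes and data below are made up, and multi-head splitting and masking are omitted), the core computation is only a few lines of NumPy:

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # similarity of every query to every key, scaled by sqrt(d_k) to keep the softmax stable
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    # softmax over the key axis turns scores into attention weights per query
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # each output row is a weighted mix of the value vectors

Q = K = V = np.random.randn(4, 8)            # toy example: 4 tokens, 8-dim head
out = scaled_dot_product_attention(Q, K, V)  # shape (4, 8)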
➜ Input Embeddings
Embeddings reduce the dimensionality of the input, projecting each token from a sparse, vocabulary-sized representation into a dense, lower-dimensional space where similar words lie closer together.
Enables generalization across words with similar meanings, significantly reducing the model's parameters and required training data.
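A minimal sketch of the lookup behind input embeddings (the vocabulary size, embedding dimension, and token ids below are illustrative, not from the talk):

import numpy as np

vocab_size, d_model = 10_000, 64                                # illustrative sizes
embedding_table = np.random.randn(vocab_size, d_model) * 0.02   # learned during training in practice

token_ids = np.array([12, 407, 9021])     # hypothetical token ids from a tokenizer
vectors = embedding_table[token_ids]      # (3, 64): one dense vector per input token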
➜ Positional Encoding
Absolute positional encoding uses sinusoidal functions to encode positions, enabling models to infer token order.
Rotary Positional Embeddings (RoPE) combine absolute and relative positional benefits for long-sequence handling.
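The sinusoidal (absolute) scheme from the original Transformer paper can be sketched as follows; the sequence length and model dimension here are arbitrary:

import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    pos = np.arange(seq_len)[:, None]                 # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]              # (1, d_model/2) frequency indices
    angles = pos / (10000 ** (2 * i / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                      # even dimensions: sine
    pe[:, 1::2] = np.cos(angles)                      # odd dimensions: cosine
    return pe

pe = sinusoidal_positional_encoding(seq_len=16, d_model=64)   # added to the token embeddings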
➜ Self-Attention
Maps query, key, and value vectors derived from the same sequence to calculate token relationships.
Enables dynamic weighting of token relevance, creating contextualized embeddings in parallel.
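A small sketch showing that self-attention derives queries, keys, and values from the same sequence via learned projections (random weights stand in for trained ones):

import numpy as np

def self_attention(X, W_q, W_k, W_v):
    Q, K, V = X @ W_q, X @ W_k, X @ W_v               # all three come from the same sequence X
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                                # contextualized token embeddings

d = 64
X = np.random.randn(6, d)                             # 6 tokens
W_q, W_k, W_v = (np.random.randn(d, d) * 0.1 for _ in range(3))
ctx = self_attention(X, W_q, W_k, W_v)                # shape (6, 64)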
➜ Cross Attention
Bridges encoder and decoder stacks by using encoder outputs as keys and values, with decoder queries steering generation.
Essential for tasks like translation, where the target sequence depends on the source sequence.
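Cross-attention looks almost identical in a sketch, except that queries come from the decoder while keys and values come from the encoder outputs (all weights and lengths below are illustrative):

import numpy as np

def cross_attention(decoder_states, encoder_states, W_q, W_k, W_v):
    Q = decoder_states @ W_q                          # queries: target side
    K = encoder_states @ W_k                          # keys: source side
    V = encoder_states @ W_v                          # values: source side
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                                # source-conditioned decoder states

d = 64
enc = np.random.randn(10, d)                          # e.g. encoded source sentence, 10 tokens
dec = np.random.randn(4, d)                           # target prefix generated so far, 4 tokens
W_q, W_k, W_v = (np.random.randn(d, d) * 0.1 for _ in range(3))
out = cross_attention(dec, enc, W_q, W_k, W_v)        # shape (4, 64)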
➜ Skip/Residual Connections
Prevent vanishing gradients and preserve the original input signal by adding a layer's input back into its output.
Improves gradient flow, avoids forgetting input tokens, and enhances training stability.
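A residual (skip) connection is simply the input added back onto a sublayer's output; the sublayer below is a stand-in for attention or a feed-forward block:

import numpy as np

d = 64
W = np.random.randn(d, d) * 0.1         # placeholder weights for some sublayer

def sublayer(x):
    return np.tanh(x @ W)               # stand-in for attention or feed-forward

def residual_block(x):
    return x + sublayer(x)              # skip connection: the identity path preserves the input

x = np.random.randn(4, d)
y = residual_block(x)                   # gradients can always flow through the "+ x" path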
➜ Token Sampling
Converts the dense, de-embedded outputs (vocabulary logits) into a probability distribution via softmax for next-token prediction.
Techniques like temperature scaling or top-k sampling refine the generation diversity and quality.
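A sketch of temperature scaling combined with top-k sampling over the final logits (the vocabulary size and hyperparameters are illustrative):

import numpy as np

def sample_next_token(logits, temperature=1.0, top_k=50):
    logits = logits / temperature               # <1 sharpens, >1 flattens the distribution
    top = np.argsort(logits)[-top_k:]           # keep only the k most likely tokens
    probs = np.exp(logits[top] - logits[top].max())
    probs /= probs.sum()                        # softmax over the shortlist
    return np.random.choice(top, p=probs)

logits = np.random.randn(32_000)                # hypothetical 32k-token vocabulary
next_id = sample_next_token(logits, temperature=0.8, top_k=40)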
➜ Encoder
Stacks of self-attention and feed-forward layers encode the input sequence into contextual representations, one fixed-dimensional vector per token.
Processes input tokens bidirectionally to capture complete contextual relationships.
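A stripped-down encoder layer in the same toy setup (layer norm and the learned Q/K/V projections are omitted for brevity; weights are random placeholders):

import numpy as np

def attention(Q, K, V):
    s = Q @ K.T / np.sqrt(K.shape[-1])
    w = np.exp(s - s.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V

def encoder_layer(X, W1, W2):
    X = X + attention(X, X, X)                  # bidirectional self-attention + residual
    return X + np.maximum(X @ W1, 0) @ W2       # position-wise feed-forward (ReLU) + residual

d = 64
X = np.random.randn(8, d)                       # 8 input tokens
out = encoder_layer(X, np.random.randn(d, 4 * d) * 0.05, np.random.randn(4 * d, d) * 0.05)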
➜ Decoder
Uses causal self-attention to ensure autoregressive token generation.
Integrates cross-attention to incorporate encoder outputs and generate coherent outputs token-by-token.
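Finally, a sketch of the causal (masked) self-attention that makes decoding autoregressive: positions above the diagonal are blocked so a token can only attend to itself and earlier tokens (Q/K/V projections again omitted):

import numpy as np

def causal_self_attention(X):
    n, d = X.shape
    scores = X @ X.T / np.sqrt(d)
    mask = np.triu(np.ones((n, n), dtype=bool), k=1)   # True strictly above the diagonal
    scores[mask] = -np.inf                             # block attention to future tokens
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ X

X = np.random.randn(5, 64)
out = causal_self_attention(X)                         # token i only "sees" tokens 0..i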
• Relevant Papers:
➜ Transformer Overview
Attention Is All You Need: arxiv.org/abs/...
➜ Input Embeddings
Efficient Estimation of Word Representations in Vector Space (Word2Vec): arxiv.org/abs/...
GloVe: Global Vectors for Word Representation: aclanthology.o...
fastText: Enriching Word Vectors with Subword Information: arxiv.org/abs/...
➜ Positional Encoding
Attention Is All You Need (Original Sinusoidal Positional Encoding): arxiv.org/abs/...
Self-Attention with Relative Position Representations: arxiv.org/abs/...
RoFormer: Enhanced Transformer with Rotary Position Embedding: arxiv.org/abs/...
➜ Self-Attention
Attention Is All You Need: arxiv.org/abs/...
Neural Machine Translation by Jointly Learning to Align and Translate (Additive Attention): arxiv.org/abs/...
➜ Cross Attention
Attention Is All You Need (Encoder-Decoder Attention): arxiv.org/abs/...
Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation: aclanthology.o...
➜ Skip/Residual Connections
Deep Residual Learning for Image Recognition (ResNet): arxiv.org/abs/...
Attention Is All You Need (Skip Connections in Transformers): arxiv.org/abs/...
➜ Token Sampling
Categorical Reparameterization with Gumbel-Softmax (Sampling methods): arxiv.org/abs/...
Decoding Strategies for Neural Machine Translation: aclanthology.o...
➜ Encoder
Attention Is All You Need (Encoder Architecture): arxiv.org/abs/...
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding: arxiv.org/abs/...
➜ Decoder
Attention Is All You Need (Decoder Architecture): arxiv.org/abs/...
Language Models are Few-Shot Learners (Autoregressive Decoding in GPT-3): arxiv.org/abs/...