RoPE (Rotary Position Embedding) to 100K context length
- Published Feb 7, 2025
- RoPE (Rotary Position Embedding) explained in simple terms: computing self-attention in Transformers with a relative position encoding for extended context lengths of LLMs. (A minimal code sketch follows below.)
All rights w/ authors:
ROFORMER: ENHANCED TRANSFORMER WITH ROTARY POSITION EMBEDDING (RoPE)
arxiv.org/pdf/...
#airesearch
#aiexplained
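A minimal NumPy sketch of the core idea, for reference (my own illustration of the RoFormer formula, not the video's or the paper's code; the function name rope_rotate and the toy dimensions are made up): each pair of query/key dimensions is rotated by an angle that grows linearly with the token position, so the attention dot product depends only on the relative offset between positions.

```python
import numpy as np

def rope_rotate(x, position, base=10000.0):
    """Apply RoPE to one vector x (even dimension d) at a given token position.

    Each pair (x[2i], x[2i+1]) is rotated by the angle position * theta_i,
    with theta_i = base ** (-2i/d), following the RoFormer paper.
    """
    d = x.shape[-1]
    assert d % 2 == 0, "RoPE needs an even head dimension"
    i = np.arange(d // 2)
    theta = base ** (-2.0 * i / d)           # per-pair rotation frequencies
    angles = position * theta                # angle grows linearly with position
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x_even * cos - x_odd * sin   # 2-D rotation of each pair
    out[1::2] = x_even * sin + x_odd * cos
    return out

# The dot product of rotated q and k depends only on the relative offset:
q, k = np.random.randn(64), np.random.randn(64)
s1 = rope_rotate(q, 10) @ rope_rotate(k, 7)     # positions 10 and 7
s2 = rope_rotate(q, 103) @ rope_rotate(k, 100)  # same offset of 3
print(np.allclose(s1, s2))  # True
```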
Best rope explanation ever
Thanks!
first llm related yt channel that doesn't suck, thanks !
Amazing content! The explanations are SO clear! Thank you!
Thank you SO MUCH for providing such high quality content! Very much enjoying all your many videos! If you have a chance, I'd love to see you discuss the recent work in giving AI spatial reasoning, i.e. artificial "imagination". (In its natural form, very much a core feature of human thought.) Perhaps one might think about the creation of a "right brain" to go along with the "left brain" language models we have now? (Please forgive the over-simplification of human neuroscience.) Thanks again! All the best to you sincerely!
Nice explanation
Can you give a talk on how we can use these techniques with an existing model, maybe as a notebook walkthrough?
Hey
Really good video. Is part 2 out?
Hi Sir! Really love your videos! How can we access your presentation slides?
Does training with FP16 precision on this PI setup provide the same beneficial properties?
Is there anywhere I can find an example of increasing the context length of a model that previously used positional embeddings, was then switched to rotary embeddings, and then fine-tuned to work on longer sequences?
If one rotation is good, how about going to three-dimensional rotations and using quaternions? Is there any work using that?
There is a mistake: smaller dimensions change more quickly, and larger dimensions change more slowly.
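For reference, the RoFormer frequencies theta_i = 10000 ** (-2*i/d) do shrink as the pair index i grows, so the early, low-index dimension pairs rotate fastest and the late pairs slowest. A quick sanity check (a sketch of mine, not code from the video):

```python
import numpy as np

# RoFormer frequencies: theta_i = 10000 ** (-2*i/d) for pair index i = 0..d/2-1.
d = 128
i = np.arange(d // 2)
theta = 10000.0 ** (-2.0 * i / d)

# Angle advanced per token step for the first and last dimension pairs:
print(theta[0])    # 1.0     -> about 1 radian per position (fast rotation)
print(theta[-1])   # ~1.2e-4 -> tiny angle per position (slow rotation)
```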
Maybe it's obvious to YOU that the solution is that complex exponential, but I wish you hadn't assumed that WE would all see that as self-evident as you do.
I see what you mean. You know, I spent some days finding simple explanations for the not-so-self-explanatory RoPE algorithm, especially since I will build on this in my second video, where we examine more complex, more recent ideas about RoPE. I decided on an approach that will enable my audience to understand the main ideas and methods and go from there. I recorded 90 minutes for the second part, and I'm currently cutting it down to a maximum of 60 minutes, striking a balance to provide insights for all my viewers. I'll try harder ....