RoPE (Rotary Position Embedding) to 100K context length
- Published Feb 7, 2025
- RoPE (Rotary Position Embedding) explained in simple terms: computing self-attention in Transformers with a relative position encoding for extended context lengths of LLMs. (A minimal code sketch follows below.)
All rights w/ authors:
ROFORMER: ENHANCED TRANSFORMER WITH ROTARY POSITION EMBEDDING (RoPE)
arxiv.org/pdf/...
#airesearch
#aiexplained
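A minimal NumPy sketch of the core idea, for reference (my own illustration of the RoFormer formula, not the video's or the paper's code; the function name rope_rotate and the toy dimensions are made up): each pair of query/key dimensions is rotated by an angle that grows linearly with the token position, so the attention dot product depends only on the relative offset between positions.

```python
import numpy as np

def rope_rotate(x, position, base=10000.0):
    """Apply RoPE to one vector x (even dimension d) at a given token position.

    Each pair (x[2i], x[2i+1]) is rotated by the angle position * theta_i,
    with theta_i = base ** (-2i/d), following the RoFormer paper.
    """
    d = x.shape[-1]
    assert d % 2 == 0, "RoPE needs an even head dimension"
    i = np.arange(d // 2)
    theta = base ** (-2.0 * i / d)           # per-pair rotation frequencies
    angles = position * theta                # angle grows linearly with position
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x_even * cos - x_odd * sin   # 2-D rotation of each pair
    out[1::2] = x_even * sin + x_odd * cos
    return out

# The dot product of rotated q and k depends only on the relative offset:
q, k = np.random.randn(64), np.random.randn(64)
s1 = rope_rotate(q, 10) @ rope_rotate(k, 7)     # positions 10 and 7
s2 = rope_rotate(q, 103) @ rope_rotate(k, 100)  # same offset of 3
print(np.allclose(s1, s2))  # True
```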
Best rope explanation ever
Thanks!
first llm related yt channel that doesn't suck, thanks !
Amazing content! The explanations are SO clear! Thank you!
Thank you SO MUCH for providing such high quality content! Very much enjoying all your many videos! If you have a chance, I'd love to see you discuss the recent work in giving AI spatial reasoning, i.e. artificial "imagination". (In its natural form, very much a core feature of human thought.) Perhaps one might think about the creation of a "right brain" to go along with the "left brain" language models we have now? (Please forgive the over-simplification of human neuroscience.) Thanks again! All the best to you sincerely!
Nice explanation
Can you give a talk on how we can use these techniques with an existing model, maybe as a notebook walkthrough?
Hey
Really good video. Is part 2 out?
Hi Sir! Really love your videos! How can we access your presentation slides?
Does training with FP16 precision on this PI setup provide the same beneficial properties?
Is there anywhere I can find an example of increasing the context length of a model that previously used positional embeddings, was then switched to rotary embeddings, and then fine-tuned to work on longer sequences?
If one rotation is good, how about going to three-dimensional rotations and using quaternions? Is there any work using that?
There is a mistake: smaller dimensions change more quickly, and larger dimensions change more slowly.
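For reference, the RoFormer frequencies theta_i = 10000 ** (-2*i/d) do shrink as the pair index i grows, so the early, low-index dimension pairs rotate fastest and the late pairs slowest. A quick sanity check (a sketch of mine, not code from the video):

```python
import numpy as np

# RoFormer frequencies: theta_i = 10000 ** (-2*i/d) for pair index i = 0..d/2-1.
d = 128
i = np.arange(d // 2)
theta = 10000.0 ** (-2.0 * i / d)

# Angle advanced per token step for the first and last dimension pairs:
print(theta[0])    # 1.0     -> about 1 radian per position (fast rotation)
print(theta[-1])   # ~1.2e-4 -> tiny angle per position (slow rotation)
```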
Maybe it's obvious to YOU that the solution is that complex exponential, but I wish you hadn't assumed that WE would all see that as self-evident as you do.
I see what you mean. You know, I spent some days finding simple explanations for the not-so-self-explanatory RoPE algorithm, especially since I will build on this in my second video, where we examine more complex, more recent ideas about RoPE. I decided on an approach that will enable my audience to understand the main ideas and methods and go from there. I recorded 90 minutes for the second part, and I'm currently cutting it down to a maximum of 60 minutes, striking a balance to provide insights for all my viewers. I'll try harder ....