LLM Jargons Explained: Part 3 - Sliding Window Attention
- Published 15 Sep 2024
- In this video, I take a deep dive into Sliding Window Attention (SWA), a technique used to train Large Language Models (LLMs) efficiently on longer documents. The concept was discussed extensively in the Longformer paper and has recently been adopted by Mistral 7B, reducing computational cost.
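As a rough illustration (not code from the video), the core idea of sliding window attention can be sketched in NumPy: each token attends only to itself and the `window - 1` tokens before it, instead of the full sequence. The function names and the window size here are my own choices for the sketch.

```python
import numpy as np

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """Causal sliding-window mask: token i may attend to positions
    j in (i - window, i], i.e. itself plus up to window-1 predecessors."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - window)

def sliding_window_attention(q: np.ndarray, k: np.ndarray,
                             v: np.ndarray, window: int) -> np.ndarray:
    """Scaled dot-product attention restricted to a causal sliding window."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                      # (seq, seq) logits
    mask = sliding_window_mask(q.shape[0], window)
    scores = np.where(mask, scores, -np.inf)           # block out-of-window pairs
    # numerically stable softmax over each row
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

Because each row of the score matrix has at most `window` active entries, the attention cost per token is O(window) instead of O(seq_len), which is what makes longer contexts tractable.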
_______________________________________________________
💡 Longformer: arxiv.org/abs/...
💡 Mistral 7B: arxiv.org/abs/...
💡 NLP with Transformers: amzn.to/4aNpSaW
💡 Attention Is All You Need: arxiv.org/abs/...
_______________________________________________________
Follow me on:
👉🏻 Linkedin: / sachinkalsi
👉🏻 Twitter: / sachin_kalsi
👉🏻 GitHub: github.com/Sac...