LLM Jargons Explained: Part 3 - Sliding Window Attention

  • Published 15 Sep 2024
  • In this video, I take a thorough look at Sliding Window Attention (SWA), a technique used to train Large Language Models (LLMs) effectively on longer documents. The concept was discussed extensively in the Longformer paper and has recently been used by Mistral 7B to reduce computational cost. A rough illustration of the idea is sketched after the links below.
    _______________________________________________________
    💡 Longformer: arxiv.org/abs/...
    💡 Mistral 7B: arxiv.org/abs/...
    💡 NLP with Transformers: amzn.to/4aNpSaW
    💡 Attention Is All You Need: arxiv.org/abs/...
    _______________________________________________________
    Follow me on:
    👉🏻 Linkedin: / sachinkalsi
    👉🏻 Twitter: / sachin_kalsi
    👉🏻 GitHub: github.com/Sac...
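
    A minimal sketch of the idea, assuming a single attention head and a causal sliding window where each token attends only to itself and the preceding `window - 1` tokens. The shapes, the window size, and the function name are illustrative and are not taken from the Longformer or Mistral 7B implementations.

    ```python
    import numpy as np

    def sliding_window_attention(q, k, v, window):
        """Scaled dot-product attention where position i may only attend to
        positions j with i - window < j <= i (a causal sliding window)."""
        seq_len, d = q.shape
        scores = q @ k.T / np.sqrt(d)

        # Build the banded causal mask and block everything outside the window.
        idx = np.arange(seq_len)
        allowed = (idx[None, :] <= idx[:, None]) & (idx[None, :] > idx[:, None] - window)
        scores = np.where(allowed, scores, -np.inf)

        # Softmax over the unmasked positions only.
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ v

    # Toy usage: 8 tokens, 4-dimensional head, window of 3.
    rng = np.random.default_rng(0)
    q = rng.standard_normal((8, 4))
    k = rng.standard_normal((8, 4))
    v = rng.standard_normal((8, 4))
    out = sliding_window_attention(q, k, v, window=3)
    print(out.shape)  # (8, 4)
    ```

    Because each row of the score matrix has at most `window` unmasked entries, the work per token is bounded by the window size rather than the full sequence length, which is where the computational savings over full attention come from.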

COMMENTS • 1

  • @samson6707 5 months ago

    0:28 what is the name of this parameter, the input token limit, phi_1,2?