High Performance LLMs in Jax 2024 -- Session 8

  • Published 28 May 2024
  • Throughout this series of sessions, we will build an LLM from scratch in JAX, analyze its performance using the tools of roofline analysis, and get it to achieve performance near the physical limits of the hardware.
    In Session 8, we naively implement LLM inference and see why we need the KV cache. (Additionally, we will review a good profile and model-flop-utilization, and briefly discuss Mixture-of-Experts.)
    github.com/rwitten/highperfll...
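To see why naive inference motivates the KV cache, here is a minimal sketch, not the repo's actual JAX code: plain NumPy, a single attention head, and made-up sizes. Naive decoding re-projects keys and values for every past token at every step, while a KV cache computes each token's key/value once and appends it, producing identical outputs with far less work.

```python
import numpy as np

d = 8                                  # head dimension (made-up size)
rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

def attend(q, K, V):
    # single-query attention over all available keys/values
    scores = q @ K.T / np.sqrt(d)
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    return probs @ V

tokens = rng.standard_normal((5, d))   # embeddings for 5 decoding steps

# Naive: at step t, recompute K and V for all t+1 tokens -> O(t) projections/step.
naive_out = []
for t in range(len(tokens)):
    ctx = tokens[: t + 1]
    K, V = ctx @ Wk, ctx @ Wv
    naive_out.append(attend(tokens[t] @ Wq, K, V))

# Cached: project only the newest token and append -> O(1) projections/step.
K_cache, V_cache = np.empty((0, d)), np.empty((0, d))
cached_out = []
for t in range(len(tokens)):
    K_cache = np.vstack([K_cache, (tokens[t] @ Wk)[None, :]])
    V_cache = np.vstack([V_cache, (tokens[t] @ Wv)[None, :]])
    cached_out.append(attend(tokens[t] @ Wq, K_cache, V_cache))

assert np.allclose(naive_out, cached_out)  # same results, less recomputation
```

The trade-off the session explores is that the cache converts repeated compute into memory traffic: its size grows with sequence length, which is exactly what roofline analysis of the decode phase makes visible.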
