Princeton Parallel Group
LLMCompass ISCA 2024 Lightning Talk
Lightning talk for LLMCompass presented at ISCA 2024
Title: LLMCompass: Enabling Efficient Hardware Design for Large Language Model Inference
Authors: Hengrui Zhang, August Ning, Rohan Baskar Prabhakar, David Wentzlaff
Paper: parallel.princeton.edu/papers/isca24_llmcompass.pdf
GitHub: github.com/PrincetonUniversity/LLMCompass/
Abstract: The past year has witnessed the increasing popularity of Large Language Models (LLMs). Their unprecedented scale and associated high hardware cost have impeded their broader adoption, calling for efficient hardware designs. With the large hardware needed to simply run LLM inference, evaluating different hardware designs becomes a new bottleneck.
This work introduces LLMCompass, a hardware evaluation framework for LLM inference workloads. LLMCompass is fast, accurate, versatile, and able to describe and evaluate different hardware designs. LLMCompass includes a mapper to automatically find performance-optimal mapping and scheduling. It also incorporates an area-based cost model to help architects reason about their design choices.
Compared to real-world hardware, LLMCompass' estimated latency achieves an average 10.9% error rate across various operators with various input sizes and an average 4.1% error rate for LLM inference. With LLMCompass, simulating a 4-NVIDIA A100 GPU node running GPT-3 175B inference can be done within 16 minutes on commodity hardware, including 26,400 rounds of the mapper's parameter search.
With the aid of LLMCompass, this work draws architectural implications and explores new cost-effective hardware designs. By reducing the compute capability or replacing High Bandwidth Memory (HBM) with traditional DRAM, these new designs can achieve as much as 3.41x improvement in performance/cost compared to an NVIDIA A100, making them promising choices for democratizing LLMs.
Keywords: Large language model, performance model, area model, cost model, accelerator
Acknowledgments: We would like to thank Qixuan (Maki) Yu, Zhongming Yu, Haiyue Ma, Yanghui Ou, Christopher Batten, and the entire Princeton Parallel Group, for their feedback, suggestions, and encouragement. This material is based upon work supported by the National Science Foundation Graduate Research Fellowship Program under Grant No. DGE-2039656, the National Science Foundation under Grant No. CCF-1822949, Air Force Research Laboratory (AFRL) and Defense Advanced Research Projects Agency (DARPA) under agreement No. FA8650-18-2-7862. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of Air Force Research Laboratory (AFRL) and Defense Advanced Research Projects Agency (DARPA) or the U.S. Government.
Views: 76

Videos

OpenPiton + Ariane Tutorial Part 7: FPGA Prototyping
358 views, 4 years ago
OpenPiton with RISC-V Cores - A Hands-On Tutorial with the Open Source Manycore Processor. Part 7: FPGA Prototyping. Presented at MICRO 52 on October 12, 2019 in Columbus, OH. Slides can be found at parallel.princeton.edu/openpiton. More at openpiton.org and pulp-platform.org.
OpenPiton + Ariane Tutorial Part 6: Operating System and System Software
168 views, 4 years ago
OpenPiton with RISC-V Cores - A Hands-On Tutorial with the Open Source Manycore Processor. Part 6: Operating System and System Software. Presented at MICRO 52 on October 12, 2019 in Columbus, OH. Slides can be found at parallel.princeton.edu/openpiton. More at openpiton.org and pulp-platform.org.
OpenPiton + Ariane Tutorial Part 5: Extension Using P-Mesh NoCs
204 views, 4 years ago
OpenPiton with RISC-V Cores - A Hands-On Tutorial with the Open Source Manycore Processor. Part 5: Extension Using P-Mesh NoCs. Presented at MICRO 52 on October 12, 2019 in Columbus, OH. Slides can be found at parallel.princeton.edu/openpiton. More at openpiton.org and pulp-platform.org.
OpenPiton + Ariane Tutorial Part 4: Configuration
221 views, 4 years ago
OpenPiton with RISC-V Cores - A Hands-On Tutorial with the Open Source Manycore Processor. Part 4: Configuration. Presented at MICRO 52 on October 12, 2019 in Columbus, OH. Slides can be found at parallel.princeton.edu/openpit. More at openpiton.org and pulp-platform.org.
OpenPiton + Ariane Tutorial Part 3: Simulating OpenPiton + Ariane RTL
499 views, 4 years ago
OpenPiton with RISC-V Cores - A Hands-On Tutorial with the Open Source Manycore Processor. Part 3: Simulating OpenPiton + Ariane RTL. Presented at MICRO 52 on October 12, 2019 in Columbus, OH. Slides can be found at parallel.princeton.edu/openpiton. More at openpiton.org and pulp-platform.org.
OpenPiton + Ariane Tutorial Part 2: VM Setup and Code Tour
569 views, 4 years ago
OpenPiton with RISC-V Cores - A Hands-On Tutorial with the Open Source Manycore Processor. Part 2: VM Setup and Code Tour. Presented at MICRO 52 on October 12, 2019 in Columbus, OH. Slides can be found at parallel.princeton.edu/openpiton. More at openpiton.org and pulp-platform.org.
OpenPiton + Ariane Tutorial Part 1: Opening
1.3K views, 4 years ago
OpenPiton with RISC-V Cores - A Hands-On Tutorial with the Open Source Manycore Processor. Part 1: Opening. Presented at MICRO 52 on October 12, 2019 in Columbus, OH. Slides can be found at parallel.princeton.edu/openpiton. More at openpiton.org and pulp-platform.org.
Piton HPCA 2018 Lightning Talk
515 views, 6 years ago
This is the lightning talk for our HPCA 2018 paper entitled "Power and Energy Characterization of an Open Source 25-core Manycore Processor". To learn more, visit parallel.princeton.edu/openpiton/piton_power_char.html and follow @OpenPiton on Twitter. Publication info: Michael McKeown, Alexey Lavrov, Mohammad Shahrad, Paul Jackson, Yaosheng Fu, Jonathan Balkind, Tri Nguyen, Katie Lim, Yanqi Zho...
Piton Hello World demo
11K views, 7 years ago
Running HelloWorld on the Piton processor. Find more at www.openpiton.org
OpenPiton Tetris Demo on Genesys 2
1.1K views, 7 years ago
Playing Tetris on full-stack Linux running on the FPGA implementation of OpenPiton. Find more at www.openpiton.org