CUTLASS: A CUDA C++ Template Library for Accelerating Deep Learning... Aniket Shivam & Vijay Thakkar

  • Published May 31, 2023
  • CUTLASS: A CUDA C++ Template Library for Accelerating Deep Learning Computations - Aniket Shivam & Vijay Thakkar, NVIDIA
    At the core of Machine and Deep Learning lie different flavors of linear algebra computations such as matrix multiplication and convolution. In the last decade, GPU computing solutions from NVIDIA have accelerated AI compute, with an overall gain of 50X to 200X via architectural innovations. While this has helped applications like ChatGPT and GitHub Copilot become a reality, developers still have to learn to optimally utilize and customize GPU compute for their applications. In this talk we present CUTLASS, an open-source, header-only CUDA C++ template library that, since 2017, has been helping programmers implement high-performance CUDA kernels across various generations of NVIDIA's GPU architectures. CUTLASS, which contains optimized, production-quality implementations of AI computations, has been the go-to source for Tensor Core programming details. CUTLASS provides modular abstractions and building blocks to CUDA programmers who are eager to write their own CUDA C++ kernels to perform deep learning computations such as matrix multiplication and convolution. We expect audience members to gain actionable knowledge and insights about Tensor Core programming and about developing custom CUDA C++ kernels with CUTLASS that push the limits of performance on NVIDIA GPUs.
  • Science & Technology
