Enable GPU-Acceleration Without Worrying About Managing Device Drivers - Christopher Desiniotis

  • Published Oct 21, 2024
  • Don't miss out! Join us at our next Flagship Conference: KubeCon + CloudNativeCon North America in Salt Lake City from November 12 - 15, 2024. Connect with our current graduated, incubating, and sandbox projects as the community gathers to further the education and advancement of cloud native computing. Learn more at kubecon.io
    Enable GPU-Acceleration Without Worrying About Managing Device Drivers - Christopher Desiniotis, NVIDIA
    As more AI/ML workloads are deployed on Kubernetes, hardware accelerators such as GPUs become increasingly important. Device drivers provide the foundational support for these accelerators and for this new class of applications. However, managing the lifecycle of device drivers at scale presents a unique set of challenges for cluster administrators. First, upgrading device drivers is a disruptive Day 2 operation that requires special care to minimize application downtime and to prevent long-running applications, e.g. batch training jobs, from losing their work. Second, it is often necessary to maintain multiple driver versions in the same cluster to support varying hardware configurations and application requirements. This talk will demonstrate an operator-based approach for tackling these challenges and making management of device drivers seamless in Kubernetes.
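
The two challenges named in the abstract can be made concrete with a small model. What follows is only a sketch under stated assumptions, not the operator presented in the talk: the `Node` and `DriverPolicy` types, the `gpu-model` label key, the placeholder versions `v1`/`v2`, and the `max_unavailable` parameter are all hypothetical names introduced for illustration. It shows one way per-node driver versions could be chosen by label selector, and how out-of-date nodes might be batched for upgrade while nodes running long-running jobs are deferred.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    labels: dict[str, str]
    driver_version: str   # version currently installed (placeholder strings)
    busy: bool = False    # True while a long-running job is active on the node


@dataclass
class DriverPolicy:
    # Hypothetical policy: nodes whose labels match `selector` should run `version`.
    selector: dict[str, str]
    version: str


def desired_version(node, policies):
    """Return the version of the first policy whose selector matches the node."""
    for policy in policies:
        if all(node.labels.get(k) == v for k, v in policy.selector.items()):
            return policy.version
    return None  # no policy applies; leave the node alone


def plan_upgrade(nodes, policies, max_unavailable=1):
    """Batch idle, out-of-date nodes; defer out-of-date nodes that are busy."""
    pending = [
        n for n in nodes
        if desired_version(n, policies) not in (None, n.driver_version)
    ]
    idle = [n.name for n in pending if not n.busy]
    deferred = [n.name for n in pending if n.busy]
    batches = [
        idle[i:i + max_unavailable]
        for i in range(0, len(idle), max_unavailable)
    ]
    return batches, deferred


if __name__ == "__main__":
    policies = [
        DriverPolicy({"gpu-model": "model-a"}, "v2"),
        DriverPolicy({"gpu-model": "model-b"}, "v1"),
    ]
    nodes = [
        Node("node-1", {"gpu-model": "model-a"}, "v1"),
        Node("node-2", {"gpu-model": "model-a"}, "v1", busy=True),
        Node("node-3", {"gpu-model": "model-b"}, "v1"),
    ]
    print(plan_upgrade(nodes, policies))
```

Limiting each batch to `max_unavailable` nodes bounds how much GPU capacity is offline at once, and keeping busy nodes in a separate deferred list lets a controller retry them after their jobs finish. A real operator would also need to cordon and drain nodes and verify the new driver before moving on, which this sketch leaves out.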
