Enfabrica's Approach to Solving IO Scaling Challenges in Accelerated Compute with Networking Silicon

Published 19 Sep 2024
    Enfabrica, under the leadership of Rochan Sankar, has developed a solution to the I/O scaling challenges in accelerated compute clusters that is built on networking silicon. Their approach, termed the Accelerated Compute Fabric (ACF), refactors the traditional endpoint attachment to accelerators. Instead of pairing each accelerator with a single RDMA NIC, Enfabrica's solution uses a fully connected I/O hub that integrates the functions of a PCIe switch, an array of NICs, and a network switch into a single device. This ACF card connects to a scalable compute surface on one side and a scalable network surface on the other, providing high port density and efficient data movement.
    The ACF architecture aims to eliminate an inefficiency in current systems, where GPUs must communicate through multiple layers of PCIe switches and NICs in order to scale out. By collapsing these layers into a single device, Enfabrica's solution reduces the number of memory copies and improves burst bandwidth to GPUs, which improves overall compute efficiency. The ACF device supports both scale-up and scale-out interfaces: it can perform memory reads and writes directly into memory spaces, and it can send packets over long distances. This design is particularly beneficial for AI workloads, which require rapid and efficient data movement across large compute clusters.
    Enfabrica's ACF device is designed to be compatible with existing programming models and protocols, ensuring seamless integration into current data center architectures. The device supports standard PCIe and CXL interfaces, and its programmability allows for flexible transport and congestion control. By integrating multiple NICs and a crossbar switch within a single chip, the ACF device offers enhanced resiliency and load balancing capabilities. This innovative approach not only addresses the immediate scaling challenges faced by AI and accelerated computing workloads but also positions Enfabrica as a key player in the evolving landscape of data center architecture.
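    The resiliency and load balancing point can be illustrated with a toy model. The Python sketch below is not Enfabrica's implementation or API; the function names, the CRC-based flow hash, and the port-health list are all assumptions made for illustration. It compares a layout where each accelerator is tied to one dedicated NIC port with a layout where every accelerator can reach a shared pool of ports.

```python
from zlib import crc32


def pick_port(flow_id: str, ports_up: list[bool]) -> int:
    """Hash a flow onto one of the healthy ports (hypothetical policy)."""
    healthy = [i for i, up in enumerate(ports_up) if up]
    if not healthy:
        raise RuntimeError("no healthy network ports")
    # crc32 is deterministic, so a given flow always maps to the same
    # port for a given set of healthy ports.
    return healthy[crc32(flow_id.encode()) % len(healthy)]


def reachable_accelerators(num_accels: int, ports_up: list[bool], shared: bool) -> int:
    """Count accelerators that can still reach the network.

    shared=False models one dedicated port per accelerator (accelerator i
    uses port i only). shared=True models a pool in which any accelerator
    can use any healthy port.
    """
    if shared:
        return num_accels if any(ports_up) else 0
    return sum(1 for i in range(num_accels) if ports_up[i])


# Four accelerators, four ports, port 1 has failed.
ports_up = [True, False, True, True]
dedicated = reachable_accelerators(4, ports_up, shared=False)  # 3
pooled = reachable_accelerators(4, ports_up, shared=True)      # 4
chosen = pick_port("accel1->remote", ports_up)                 # never port 1
```

    In this model a single port failure strands one accelerator in the dedicated layout, while the pooled layout keeps all four reachable and re-hashes flows over the surviving ports. A real device would also need to account for bandwidth, ordering, and congestion, which this sketch ignores.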
    Recorded live in San Francisco, California on September 13, 2024 as part of AI Field Day 5. Watch the entire presentation at techfieldday.c... or visit TechFieldDay.c... or Enfabrica.net for more information.
