Large-scale cluster management at Google with Borg

  • Published 16 Sep 2024
  • Authors:
    Abhishek Verma, Luis Pedrosa, Madhukar Korupolu, David Oppenheimer, Eric Tune, John Wilkes
    Abstract:
    Google's Borg system is a cluster manager that runs hundreds of thousands of jobs, from many thousands of different applications, across a number of clusters each with up to tens of thousands of machines.
    It achieves high utilization by combining admission control, efficient task-packing, over-commitment, and machine sharing with process-level performance isolation. It supports high-availability applications with runtime features that minimize fault-recovery time, and scheduling policies that reduce the probability of correlated failures. Borg simplifies life for its users by offering a declarative job specification language, name service integration, real-time job monitoring, and tools to analyze and simulate system behavior.
    We present a summary of the Borg system architecture and features, important design decisions, a quantitative analysis of some of its policy decisions, and a qualitative examination of lessons learned from a decade of operational experience with it.
    ACM DL: dl.acm.org/cita...
    DOI: dx.doi.org/10.1...

COMMENTS • 1

  • @skimdt1 9 years ago +2

    Does anyone know what a link shard is? I read the paper and it doesn't really explain what exactly a link shard is.