FlightAware and Ray: Scaling Distributed XGBoost and Parallel Data Ingestion

  • Published Oct 6, 2024
  • At FlightAware, we collect vast amounts of data about aircraft in motion around the globe. On our Predictive Technologies crew, we leveraged Ray and AWS to train a new runway prediction model in XGBoost. In this talk we'll discuss our use case, which begins with a data lake of Parquet files in S3 containing the features of billions of examples and concludes with our cost-effective solution built on a scalable Ray cluster that can quickly shuttle terabytes of training data from S3 into distributed memory. We'll walk through the components of a distributed XGBoost training system and how Ray helped make it as seamless as possible. In particular, we'll share how we organized our training data, configured our fault-tolerant and elastic Ray cluster, leveraged an Amazon FSx for Lustre filesystem for high-speed data loading, and tracked real-time metrics and evaluation data in MLflow. Along the way we'll also discuss some tips and tricks we learned throughout the process that can help keep your costs and training time down. (A minimal code sketch of such a pipeline is included below the slide link.)
    Find the slide deck here: drive.google.c...
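    The slides cover the full system; as a rough illustration only, here is a minimal sketch of the kind of pipeline described in the abstract, assuming Ray 2.x with Ray Data and Ray Train's XGBoostTrainer. The S3 paths, label column, worker counts, and MLflow experiment name are placeholders, not FlightAware's actual configuration.

    # Minimal sketch (not FlightAware's code): distributed XGBoost on Ray,
    # streaming Parquet features from S3 into distributed cluster memory.
    import ray
    from ray.train import ScalingConfig, RunConfig, FailureConfig
    from ray.train.xgboost import XGBoostTrainer
    from ray.air.integrations.mlflow import MLflowLoggerCallback

    ray.init()  # or ray.init(address="auto") when attaching to an existing cluster

    # Ray Data reads the Parquet files in parallel across the cluster.
    # "s3://example-bucket/..." and "label" are placeholder names.
    train_ds = ray.data.read_parquet("s3://example-bucket/features/train/")
    valid_ds = ray.data.read_parquet("s3://example-bucket/features/valid/")

    trainer = XGBoostTrainer(
        scaling_config=ScalingConfig(num_workers=8, use_gpu=False),
        run_config=RunConfig(
            # Retry failed workers so a lost node doesn't kill the whole run.
            failure_config=FailureConfig(max_failures=3),
            # Log real-time metrics and evaluation results to MLflow.
            callbacks=[MLflowLoggerCallback(experiment_name="runway-prediction")],
        ),
        label_column="label",
        params={"objective": "binary:logistic", "eval_metric": ["logloss", "error"]},
        datasets={"train": train_ds, "valid": valid_ds},
        num_boost_round=200,
    )

    result = trainer.fit()
    print(result.metrics)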
    About Anyscale
    ---
    Anyscale is the AI Application Platform for developing, running, and scaling AI.
    www.anyscale.com/
    If you're interested in a managed Ray service, check out:
    www.anyscale.c...
    About Ray
    ---
    Ray is the most popular open source framework for scaling and productionizing AI workloads. From Generative AI and LLMs to computer vision, Ray powers the world’s most ambitious AI workloads.
    docs.ray.io/en...
    #llm #machinelearning #ray #deeplearning #distributedsystems #python #genai
