Alluxio at Trino Fest - Trino optimization with distributed caching on data lake

  • Published Oct 15, 2024
  • As the adoption of Trino and other distributed compute engines grows, users often face challenges such as slow, inconsistent query performance and high data transfer costs when using cloud storage like S3. While caching is a typical solution, the existing cache in Trino has its own set of issues, like maintaining snapshot-level isolation with Iceberg tables and ensuring write-after-write consistency for metadata and data caching.
    In this talk, Beinan and Hope share caching optimizations for Trino and show how affinity scheduling and node-local caching affect query latency. They discuss the distributed caching design, with real-world examples of existing implementations. You will learn:
    The challenges of data locality and query latency in Trino on data lakes
    How to address these challenges through segmented data file caching, soft-affinity scheduler policies, and cache filtering
    How Meta, Uber, and TikTok have used caching to optimize interactive queries, maximize cache hit rates, cut cloud storage costs, and accelerate queries on Iceberg tables, with TPC-DS benchmark results
    Effective strategies for monitoring cache usage and working set size with comprehensive JMX metrics
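
    To make the soft-affinity idea above concrete, here is a minimal Python sketch. It is not Trino's or Alluxio's actual scheduler: the class name, the `max_pending` limit, the consistent-hash ring, and the example S3 path are all assumptions made for illustration. The idea is that splits of the same file are routed to the same few workers so their node-local caches stay warm, with a fallback to the least-loaded worker when the preferred ones are busy.

```python
import hashlib
from bisect import bisect_right


class SoftAffinityScheduler:
    """Hypothetical soft-affinity scheduler (illustration only)."""

    def __init__(self, workers, virtual_nodes=64, max_pending=2):
        self.max_pending = max_pending          # assumed per-worker queue limit
        self.pending = {w: 0 for w in workers}  # splits currently queued per worker
        # Consistent-hash ring: each worker appears at many virtual positions.
        self.ring = sorted(
            (self._hash(f"{w}#{i}"), w)
            for w in workers
            for i in range(virtual_nodes)
        )
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(key):
        # Stable hash so the same path maps to the same ring position every run.
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def preferred(self, path, count=2):
        """Return up to `count` distinct workers, walking clockwise on the ring."""
        start = bisect_right(self.keys, self._hash(path)) % len(self.ring)
        result = []
        for i in range(len(self.ring)):
            worker = self.ring[(start + i) % len(self.ring)][1]
            if worker not in result:
                result.append(worker)
                if len(result) == count:
                    break
        return result

    def assign(self, path):
        """Prefer cache-affine workers; fall back to the least-loaded one."""
        for worker in self.preferred(path):
            if self.pending[worker] < self.max_pending:
                self.pending[worker] += 1
                return worker
        # "Soft" affinity: give up locality rather than wait on a busy worker.
        worker = min(self.pending, key=self.pending.get)
        self.pending[worker] += 1
        return worker


if __name__ == "__main__":
    scheduler = SoftAffinityScheduler(["worker-1", "worker-2", "worker-3"])
    path = "s3://example-bucket/table/part-0.parquet"  # hypothetical path
    print(scheduler.preferred(path))
    print([scheduler.assign(path) for _ in range(5)])
```

    With three workers and the assumed defaults, the first four assignments for a given path land on its two preferred workers, and the fifth spills over to the remaining worker. That spill-over is the trade-off a soft policy makes: it accepts a likely cache miss in exchange for not queueing behind a busy node.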
