Building Minimalistic Open Lakehouse w/ Open Source Projects Apache Spark™: Project Nessie & Iceberg

  • Published 22 Jul 2024
  • A lakehouse architecture combines several components: storage, file format, table format, and catalog. What truly makes a lakehouse 'open' is that data is stored in open source table formats such as Iceberg and Delta and in open file formats such as Parquet, and that the technology itself is open source, so the community can adopt it easily and quickly. Like any new technology, implementing a lakehouse may seem daunting at first. However, once the architecture is broken down into its open components, it becomes easy to adopt and scale.
    Through this session, the idea is to help data engineers get a foothold in the world of data lakehouses and learn to implement one easily. We will go through a notebook-style presentation to show beginners how to build a minimalistic, functional lakehouse using Apache Spark, Project Nessie, and Iceberg.
    In this session, we will cover:
    - Configuring the three different components
    - Creating tables from raw data files
    - Ingesting new data from various sources into the tables, querying it and making updates
    - Time travel, compaction, and other table maintenance capabilities
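    The steps listed above can be sketched roughly as follows in PySpark. This is not the notebook from the talk, only a minimal illustration under stated assumptions: the catalog name `nessie`, the namespace and table `demo.trips`, the `status` column, the Nessie URI, and the file paths are all hypothetical, and the Iceberg Spark runtime and Nessie Spark extension jars are assumed to be on the Spark classpath already.

```python
from typing import Dict


def nessie_iceberg_conf(nessie_uri: str, warehouse: str, ref: str = "main") -> Dict[str, str]:
    """Spark settings that register an Iceberg catalog named 'nessie' backed by Nessie."""
    catalog = "spark.sql.catalog.nessie"  # 'nessie' is an arbitrary catalog name
    return {
        # SQL extensions for Iceberg (UPDATE, CALL procedures) and Nessie (branching SQL)
        "spark.sql.extensions": ",".join([
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
            "org.projectnessie.spark.extensions.NessieSparkSessionExtensions",
        ]),
        catalog: "org.apache.iceberg.spark.SparkCatalog",
        f"{catalog}.catalog-impl": "org.apache.iceberg.nessie.NessieCatalog",
        f"{catalog}.uri": nessie_uri,        # Nessie server endpoint (assumed)
        f"{catalog}.ref": ref,               # Nessie branch to read and write
        f"{catalog}.warehouse": warehouse,   # storage location for table files
    }


def build_session(conf: Dict[str, str]):
    """Create a SparkSession with the given settings (requires pyspark and the jars)."""
    from pyspark.sql import SparkSession  # imported lazily so the config helper runs without Spark

    builder = SparkSession.builder.appName("minimal-lakehouse")
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


def run_demo(spark, raw_path: str, new_path: str, table: str = "nessie.demo.trips"):
    """Create a table from raw files, ingest more data, update, time travel, compact."""
    spark.sql("CREATE NAMESPACE IF NOT EXISTS nessie.demo")

    # Create an Iceberg table from raw Parquet files
    spark.read.parquet(raw_path).writeTo(table).using("iceberg").createOrReplace()

    # Ingest new data, then update rows in place ('status' is a hypothetical column)
    spark.read.parquet(new_path).writeTo(table).append()
    spark.sql(f"UPDATE {table} SET status = 'checked' WHERE status IS NULL")

    # Time travel: read the table as of its earliest snapshot
    first_snapshot = spark.sql(
        f"SELECT snapshot_id FROM {table}.snapshots ORDER BY committed_at"
    ).first()[0]
    original = spark.sql(f"SELECT * FROM {table} VERSION AS OF {first_snapshot}")

    # Compaction: rewrite small data files into larger ones
    table_in_catalog = table.split(".", 1)[1]
    spark.sql(f"CALL nessie.system.rewrite_data_files(table => '{table_in_catalog}')")
    return original
```

    With a Nessie server running locally, usage might look like `spark = build_session(nessie_iceberg_conf("http://localhost:19120/api/v1", "/tmp/warehouse"))` followed by `run_demo(spark, "raw/", "new/")`. Keeping the settings in a plain dictionary makes the three components (Spark, the Nessie catalog, and Iceberg) easy to inspect before a session is started.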
    Talk by: Dipankar Mazumdar
    Here’s more to explore:
    Rise of the Data Lakehouse: dbricks.co/3NHT7CD
    Lakehouse Fundamentals Training: dbricks.co/44ancQs
    Connect with us:
    Website: databricks.com
    Twitter: / databricks
    LinkedIn: / databricks
    Instagram: / databricksinc
    Facebook: / databricksinc
  • Science & Technology
