Watched your first two videos, liked and subscribed. Great stuff! I have never tried CDC as I am old skool batch, but the thing that always freaked me out was having to go back and reload from bronze because something happened to the related target in silver: it seems I would always have to reload from the beginning, starting with the first full load. With batch I could identify the time period that was effed up and just reload that. Is that a correct assumption, and if so, how is that normally handled in practice to avoid huge multi-year reloads? I am assuming the source data is gone due to shorter retention.
Thanks 🙏 Yeah you're right, production-ready, robust implementations of CDC can be a headache. That's why there are reliable, ready-to-use solutions like Delta Live Tables in Databricks that can handle it efficiently.
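Just to give a rough idea of what that looks like, here is a minimal sketch using the DLT Python API. The table and column names (host_cdc_raw, hosts, host_id, updated_at, operation) are placeholders I made up for the example, not the ones from the video:

```python
import dlt
from pyspark.sql.functions import expr

# Bronze: raw CDC records as landed from the source.
# In a DLT pipeline, `spark` is provided by the runtime.
@dlt.view
def host_changes():
    return spark.readStream.table("bronze.host_cdc_raw")

# Silver: target table that DLT keeps in sync with the change feed.
dlt.create_streaming_table("hosts")

dlt.apply_changes(
    target="hosts",
    source="host_changes",
    keys=["host_id"],                                # primary key of each record
    sequence_by="updated_at",                        # ordering column to resolve out-of-order events
    apply_as_deletes=expr("operation = 'DELETE'"),   # rows flagged as deletes remove the key
    except_column_list=["operation"],                # drop the CDC metadata column from the target
    stored_as_scd_type=1,                            # keep only the latest version per key
)
```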
Hi Thomas, I have one question on this. When you are creating hostsIncrementalInputDF in Glue, every time you will read the full bronze table and then do clean/transformation over it. Won't that be a waste of resources as the table grows over time? Shouldn't this data frame pick up and process only those records from the bronze table that have changed or are new since the last run?
Hi Manish, you are absolutely correct that this would be a waste of resources and incur unnecessary transformations. That's why I activated Glue job bookmarks for the job, so that only new files are picked up compared to the last run. Also, this is more of a proof of concept. In a real scenario, we would need a more robust setup to ensure that everything works correctly, even if the job fails.
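For reference, the bookmark-enabled read in the Glue job looks roughly like this; the database and table names below are placeholders, not the exact ones from the demo:

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glueContext = GlueContext(SparkContext())
job = Job(glueContext)
job.init(args["JOB_NAME"], args)  # bookmark state is tracked per job

# With job bookmarks enabled, this read returns only files added since the
# last successful run; transformation_ctx is the key Glue uses to store
# the bookmark state for this particular read.
hostsIncrementalInputDyf = glueContext.create_dynamic_frame.from_catalog(
    database="bronze_db",        # placeholder database name
    table_name="hosts_cdc",      # placeholder table name
    transformation_ctx="hostsIncrementalInputDyf",
)

hostsIncrementalInputDF = hostsIncrementalInputDyf.toDF()
# ... clean/transform only the new records and write them to silver ...

job.commit()  # advances the bookmark so the next run skips these files
```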
Good job Thomas ... liked your demo and explanation. Please share the blog with code snippets for the Lambda and Glue job. Thank you
Thank you for the positive feedback :) You can find the blog post with all code shown here: bit.ly/4aONz1M
Can you please make a video on "Use a reusable ETL framework in your AWS lake house architecture"?
I will put it on my list. You could use dbt for that, or are you interested in an AWS-native solution? :)
@DataMyselfAI Here is the reference link: aws.amazon.com/blogs/architecture/use-a-reusable-etl-framework-in-your-aws-lake-house-architecture/
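For anyone curious what that could look like in practice: such a framework usually boils down to a config-driven job where sources, transforms, and targets are parameters instead of being hard-coded. A very rough, hypothetical sketch (none of these names or settings come from the linked AWS blog post):

```python
import json

from pyspark.sql import DataFrame, SparkSession

# Hypothetical per-dataset config: everything that varies between tables
# lives here, so the same job code can be reused for every dataset.
CONFIG = json.loads("""
{
  "source_path": "s3://my-bronze-bucket/hosts/",
  "target_path": "s3://my-silver-bucket/hosts/",
  "format": "parquet",
  "drop_columns": ["_raw_payload"],
  "dedupe_keys": ["host_id"]
}
""")

def run_etl(spark: SparkSession, cfg: dict) -> None:
    # Generic read -> clean -> write pipeline driven entirely by the config.
    df: DataFrame = spark.read.format(cfg["format"]).load(cfg["source_path"])
    df = df.drop(*cfg.get("drop_columns", []))
    if cfg.get("dedupe_keys"):
        df = df.dropDuplicates(cfg["dedupe_keys"])
    df.write.mode("append").format(cfg["format"]).save(cfg["target_path"])

if __name__ == "__main__":
    run_etl(SparkSession.builder.getOrCreate(), CONFIG)
```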