Create on premise Data Lakehouse with Apache Iceberg | Nessie | MinIO | Lakehouse

  • Published 22 Jul 2024
  • In this video we cover the data lakehouse. A data lakehouse is a concept that combines elements of both data lakes and data warehouses to bring us the best of both worlds. It aims to provide a unified platform for storing, managing, and analyzing both unstructured and structured data.
    What is a Data Lake? aws.amazon.com/big-data/datal...
    Link to GitHub repo: github.com/hnawaz007/pythonda...
    Link to Data Lake Video:
    On-premise: • How to build on-premis...
    AWS: • How to create an AWS S...
    💥Subscribe to our channel:
    / haqnawaz
    📌 Links
    -----------------------------------------
    #️⃣ Follow me on social media! #️⃣
    🔗 GitHub: github.com/hnawaz007
    📸 Instagram: / bi_insights_inc
    📝 LinkedIn: / haq-nawaz
    🔗 / hnawaz100
    -----------------------------------------
    #dataanalytics #datalakehouse #opensource
    Topics covered in this video:
    ==================================
    0:00 - Introduction to Data Lakehouse
    0:53 - Data Lakehouse prominent Features
    1:50 - Data Lake from Previous session
    2:31 - Data Lakehouse Overview
    3:34 - Tech Stack of on-premise Data Lakehouse
    3:44 - Start Docker Containers
    4:02 - MinIO (S3) Buckets, File(s) & Keys
    4:56 - Configure Dremio
    5:07 - Add MinIO (S3) Source
    5:57 - Add Nessie Catalog
    6:38 - Format File
    7:33 - Create Iceberg Table (see the SQL sketch after this list)
    7:59 - Copy Data to Table
    8:35 - SQL DML Operations
    9:47 - Table History and Time Travel
    10:29 - Coming Soon
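    A minimal Dremio SQL sketch of the 7:33–9:47 steps. The source, schema, and table names below are illustrative assumptions, not the exact ones used in the video:

    -- Create an Iceberg table in the Nessie catalog source (hypothetical names)
    CREATE TABLE nessie.sales.orders (
        order_id   INT,
        customer   VARCHAR,
        amount     DOUBLE,
        order_date DATE
    );

    -- Copy data from a file already formatted in the MinIO (S3) source
    INSERT INTO nessie.sales.orders
    SELECT order_id, customer, amount, order_date
    FROM s3.lakehouse."orders.parquet";

    -- DML runs directly against the Iceberg table
    UPDATE nessie.sales.orders
    SET amount = amount * 1.1
    WHERE order_date < DATE '2024-01-01';

    DELETE FROM nessie.sales.orders
    WHERE customer IS NULL;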
  • Science & Technology

COMMENTS • 13

  • @BiInsightsInc
    @BiInsightsInc  8 months ago

    Link to the Data Lake videos, on-premise and AWS:
    ua-cam.com/video/DLRiUs1EvhM/v-deo.html&t
    ua-cam.com/video/KvtxdF7b_l8/v-deo.html

    • @hungnguyenthanh4101
      @hungnguyenthanh4101 8 months ago

      Can you try another project with Delta Lake and Hive Metastore?

  • @andriifadieiev9757
    @andriifadieiev9757 8 months ago

    Great video, thank you!

  •  3 months ago

    Amazing!

  • @hungnguyenthanh4101
    @hungnguyenthanh4101 8 months ago

    very good!

  • @rafaelg8238
    @rafaelg8238 2 months ago +2

    Great video, congrats.
    If possible, bring an end-to-end architecture with streaming data ingested directly into the lakehouse.
    Also something related to the integration of the data lake and the data lakehouse.

    • @BiInsightsInc
      @BiInsightsInc  2 months ago +1

      That’s a great idea 💡. I will put something together that combines data streaming and the data lake. This will give an end-to-end implementation.

  •  3 months ago

    Today I use Apache NiFi to retrieve data from APIs and DBs, and MariaDB is my main DW. I've been testing Dremio/Nessie/MinIO using docker-compose and I still have doubts about the best way to ingest data into Dremio. There are databases and APIs that cannot be connected to it directly. I tested sending Parquet files directly to the storage, but the upsert/merge is very complicated, and the JDBC connection with NiFi didn't help me either. What would you recommend for these cases?

    • @BiInsightsInc
      @BiInsightsInc  3 months ago

      Hi there, Dremio is a SQL query engine like Trino and Presto, so you do not insert/ingest data into Dremio directly. The S3 layer is where you store your data, and Apache Iceberg provides the lakehouse management (upsert/merge) for the objects in the catalog. I'd advise handling upsert/merge in the catalog layer rather than in S3; that is the sole reason for Iceberg's presence in this stack. Here is an article on how to handle upserts using SQL.
      medium.com/datamindedbe/upserting-data-using-spark-and-iceberg-9e7b957494cf
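      A minimal sketch of such an upsert as a MERGE INTO against the Iceberg table; the table and column names are hypothetical, and the same statement shape works from Spark SQL as in the article:

      -- Upsert: update matching rows, insert the rest (hypothetical tables)
      MERGE INTO nessie.sales.orders AS t
      USING nessie.staging.orders_updates AS s
        ON t.order_id = s.order_id
      WHEN MATCHED THEN
        UPDATE SET customer = s.customer, amount = s.amount
      WHEN NOT MATCHED THEN
        INSERT (order_id, customer, amount)
        VALUES (s.order_id, s.customer, s.amount);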

  •  3 months ago

    This is so insane. Is it also possible to query data from a specific version state directly, instead of only the metadata? I am wondering if this would be suitable for bigger datasets. Have you ever benchmarked this stack with a big dataset? If the version control is scalable with bigger datasets and a higher change frequency, this would be a crazy good solution to implement.

    • @BiInsightsInc
      @BiInsightsInc  3 months ago +1

      Yes, it is possible to query data using a specific snapshot ID. We can time travel using an available snapshot ID to view our Iceberg data from a different point in time; see Time Travel Queries. Processing large datasets depends on your setup: if you have multiple nodes with enough RAM/compute power, you can process large data, or you can leverage a cloud cluster that you scale up or down depending on your needs.
      SELECT COUNT(*)
      FROM s3.ctas.iceberg_blog
      AT SNAPSHOT '4132119532727284872';
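      If you need to look up the available snapshot IDs first, Dremio's Iceberg metadata functions can list them; a quick sketch using the same table as above:

      -- List the table's snapshots; pick a snapshot_id for the AT SNAPSHOT clause
      SELECT *
      FROM TABLE(table_snapshot('s3.ctas.iceberg_blog'));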

  • @nicky_rads
    @nicky_rads 8 months ago

    Nice video! Data lakehouses offer a lot of functionality at an affordable price. It seems like Dremio is the platform that allows you to aggregate all of these services together? Could you go a little more in depth on some of the services?

    • @BiInsightsInc
      @BiInsightsInc  8 months ago

      Thanks. Yes, Dremio's engine brings the various services together to offer data lakehouse functionality. I will be going over Iceberg and Project Nessie in the future.