Thanks Ravi! Great explanation
Thank you 🙏
Liquid clustering concepts are nicely explained with practical examples. Using the Databricks sample dataset makes it easy to follow along, and the comparison with Hive-style partitioning and Z-order techniques shows the performance impact of liquid clustering. Thank you for sharing :)
What a great explanation. Ravi, day by day the value of your presentations gets higher and higher. It would be great if you could share the notebook as well.
github.com/raveendratal/PysparkRaveendra/blob/master/Liquid%20Clustering.ipynb
The first table was created using partitionBy on origin while filtering on dayofWeek = 1, and the second table was clustered by dayofWeek with the same filter on dayofWeek = 1, so the partitioned table will obviously take more time. I agree that partitioning creates files based on the total number of partitions, and it would skip more files during reads if the table had been created using partitionBy on dayofWeek and filtered on that same column.
Partition By is not good for small tables. The old approach was partitioning plus OPTIMIZE with ZORDER BY. Instead of Partition By we can use Cluster By and then apply OPTIMIZE. There is no need for Partition By and ZORDER BY on tables smaller than 1 TB.
Cluster By is an alternative to Partition By and Z-ordering, and the recommended minimum table size for partitioning and Z-ordering is 1 TB. So does this mean we should not apply liquid clustering to tables smaller than 1 TB?
Totally agree with @jeetash1. If you want to correctly compare and benchmark partitionBy and CLUSTER BY, you should use the same column; otherwise the comparison doesn't make sense. If you created one table using partitionBy on dayofWeek and filtered on dayofWeek = 1, and a second table clustered by origin with the same filter on dayofWeek = 1, partitionBy would take less time.
@TRRaveendra If this comparison is between Hive partitioning + Z-order and clustering, then the keys for clustering should be (origin, dayofWeek), right? (ref: official documentation: Use liquid clustering for Delta tables)
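For an apples-to-apples benchmark like the one suggested above, both tables would need to use the same column. A minimal sketch in Databricks SQL (table and dataset names here are hypothetical, not from the video):

```sql
-- Hive-style partitioned table, partitioned on the filter column
CREATE TABLE flights_partitioned
PARTITIONED BY (dayofWeek)
AS SELECT * FROM samples_flights;

-- Liquid-clustered table, clustered on the SAME column
CREATE TABLE flights_clustered
CLUSTER BY (dayofWeek)
AS SELECT * FROM samples_flights;

-- Rewrite data files according to the clustering keys
OPTIMIZE flights_clustered;

-- Now the same predicate is a fair comparison on both tables
SELECT count(*) FROM flights_partitioned WHERE dayofWeek = 1;
SELECT count(*) FROM flights_clustered  WHERE dayofWeek = 1;
```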
I want personalized training from you. Could you please let me know about it?
Hi Ravi, is Photon acceleration enabled on your cluster?
No, OPTIMIZE was executed without a Photon cluster.
Sir, please share the code and also the dataset to practice with.
github.com/raveendratal/PysparkRaveendra/blob/master/Liquid%20Clustering.ipynb
Hi Ravi,
This video was of great use. I have one question: is it possible to convert an existing partitioned table with data to liquid clustering? If so, can you please suggest the steps?
As of now you can only enable liquid clustering through SQL table DDL, i.e., while creating a table: CREATE TABLE table_name (col ...) CLUSTER BY (col1, col2).
After that you can change the cluster-by columns with ALTER TABLE ....
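The DDL described above can be sketched as follows (table and column names are illustrative). Note that, as I understand the documentation, ALTER TABLE ... CLUSTER BY works on an unpartitioned Delta table; a table that is already Hive-partitioned typically has to be rewritten (e.g., via CTAS) to adopt liquid clustering:

```sql
-- Create a table with liquid clustering keys (hypothetical schema)
CREATE TABLE sales (id INT, region STRING, sale_date DATE)
CLUSTER BY (region, sale_date);

-- Change the clustering keys later
ALTER TABLE sales CLUSTER BY (sale_date);

-- Or turn clustering off entirely
ALTER TABLE sales CLUSTER BY NONE;

-- Rewrite existing data files according to the current keys
OPTIMIZE sales;
```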
Hello Rajesh,
Did you find an answer? Did you try directly applying clustering on the existing table? I was about to try it on one of my tables.
@TRRaveendra can you share the dataset link please?
It’s 📌 pinned in the comments.
Verify the link
Thank you, Sir! One question: will liquid clustering behave the same as Z-order for a non-partitioned table?
With partitionBy, why not use coalesce during the write so you end up with fewer files?
On implementing liquid clustering, when I run DESCRIBE DETAIL table_name I see the clustering columns. But when I insert data into the liquid-clustered table using dataframe.write and then execute the same DESCRIBE DETAIL, the clustering columns are lost. I ran OPTIMIZE but it didn't help. I'm on Databricks Runtime 13.2.
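One possible explanation, assuming the DataFrame writer was used in overwrite mode: on early runtimes the DataFrame writer had no notion of clustering keys, so an overwrite can replace the table definition and drop the clustering metadata. A hedged workaround sketch, appending through SQL so the existing table definition stays intact (table names hypothetical):

```sql
-- Clustering keys live in table metadata; check the clusteringColumns field
DESCRIBE DETAIL my_clustered_table;

-- Appending via SQL preserves the table's CLUSTER BY definition
INSERT INTO my_clustered_table SELECT * FROM staging_data;

-- Re-cluster the newly added files
OPTIMIZE my_clustered_table;
```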