Hi Sir... Perfect, great explanation... Thank you for your effort...
I have a doubt:
After the join, the salting should be reversed (the keys unsalted) and only then should the group by be applied, right?
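If it helps, a minimal sketch of that reversal step (assuming the joined data frame is called joined, the salted keys look like "x_2", and orig_id is just an illustrative column name):

import org.apache.spark.sql.functions._

// strip the "_<salt>" suffix to recover the original key after the join
val unsalted = joined
  .withColumn("orig_id", regexp_replace(col("id"), "_\\d+$", ""))

// group by the original (unsalted) key so the aggregates come out correct
val grouped = unsalted
  .groupBy("orig_id")
  .count()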
Thanks, but if we have multiple columns as the key, how do we handle it?
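One way to handle a composite key, as a minimal sketch (assuming the skewed frame is bigDf, the other side is smallDf, the key columns are k1 and k2, and 10 salt buckets; all names here are illustrative):

import org.apache.spark.sql.functions._

val saltBuckets = 10

// skewed side: add a random salt column next to the existing key columns
val bigSalted = bigDf
  .withColumn("salt", floor(rand() * saltBuckets).cast("int"))

// other side: replicate each row once per possible salt value
val smallSalted = smallDf
  .withColumn("salt", explode(array((0 until saltBuckets).map(lit): _*)))

// join on all the original key columns plus the salt
val joined = bigSalted.join(smallSalted, Seq("k1", "k2", "salt"))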
I would have appreciated it if you had run the salting code and shown it on the Spark UI, for better clarity on what is happening internally within Spark.
Amazing video.... How can we use the salting technique in PySpark for data skew?
Amazing video.. however, I don't know Scala. Can you please give an example of how to implement the salting technique with Spark SQL queries? That'll be of great help..
Will update SQL query
@@jeevanmadhur3732 waiting for the query
@@ashwinc9867 did you get it?
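Not the author's query, but a minimal Spark SQL sketch of the same salting idea (assuming the two data frames are registered as temp views big_tbl and small_tbl, the join key is id, there are 3 salt buckets, and spark is the active SparkSession):

val joined = spark.sql("""
  SELECT b.*, s.*
  FROM (
    -- skewed side: append a random salt 0..2 to the key
    SELECT *, CONCAT(id, '_', CAST(FLOOR(RAND() * 3) AS STRING)) AS salted_id
    FROM big_tbl
  ) b
  JOIN (
    -- small side: replicate each row once per salt value
    SELECT *, CONCAT(id, '_', CAST(salt AS STRING)) AS salted_id
    FROM small_tbl
    LATERAL VIEW explode(array(0, 1, 2)) t AS salt
  ) s
  ON b.salted_id = s.salted_id
""")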
Well, I must say, thanks a lot.....I have been searching for this kind of explanation.
Excellent. Thank you
This is really great, with crystal clear explanations....thanks a lot for sharing and spreading knowledge!
Excellent video..thanks for the explanation and sharing the code
Good work; it would be better if you showed the output of the salted dataframes and explained the UDF in more detail.
Excellent Description
But the join output will not be correct, because in the previous scenario it would have joined with all the matching ids, whereas with the new salting method it will join only on the newly salted key. That's weird.
Hey great video, could you also link the associated resources you referred to while making this video?
I have 2 questions:
First one: I think there is a mistake in your visual presentation of table 2 after salting. Why don't you have z_2 and z_3 there? Also, why are you using capital letters sometimes? That's confusing.
Second question: I don't get the benefit of key salting in general. How is this different from broadcasting your second table? Because you explode it, you will end up sending the whole table to every executor anyway? No one can give an answer to this question.
Amazing video..!!
beautifully explained, thank you very much :)
Can you please explain how to take the count of the data frame after appending the random number?
Hi Aravind, if I understand your question correctly, you want to take the count of the first data frame, where we are appending a random number:
val df1 = leftTable
  .withColumn(leftCol, concat(
    leftTable.col(leftCol), lit("_"), lit(floor(rand(123456) * 10))))
We can simply do:
df1.select(col("id")).count()
This should give the count of the first data frame's column.
For more details, you can refer to the git link below:
github.com/gjeevanm/SparkDataSkewness/blob/master/src/main/scala/com/gjeevan/DataSkew/RemoveDataSkew.scala
Great explanation, thanks for sharing this.
I think there is an off-by-one error:
you are using (0 to 3), which gives (0, 1, 2, 3),
but the random number range will be (0, 1, 2).
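If so, a minimal sketch of making the two ranges consistent (keeping 3 salt buckets; leftTable follows the earlier snippet's naming, the rest is illustrative):

import org.apache.spark.sql.functions._

// the salted keys on the skewed side take values 0, 1, 2 because rand() is in [0, 1)
val leftSalted = leftTable
  .withColumn("id", concat(col("id"), lit("_"), floor(rand(123456) * 3)))

// use an exclusive range for replicating the other side so it also covers exactly 0, 1, 2
val saltValues = (0 until 3)   // "0 to 3" would add a fourth value, 3, that never matches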
amazing sir! thanks a lot
Hi, are you missing something in the code? I used your code but it's throwing an exception for the lines below:
// join after eliminating data skewness
df3.join(
  df4,
  df3.col("id") df4.col("id")
)
  .show(100, false)
}
Hi,
Thanks for highlighting this. There was a small issue with the checked-in join code, which I have fixed now. Please pull the latest code and try it out.
@@jeevanmadhur3732 Thank you Jeevan. Your videos help us a lot :)
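For reference, a corrected version of that join would presumably look something like this (a sketch, assuming df3 and df4 are the two salted data frames):

// join after eliminating data skewness, comparing the salted keys with ===
df3.join(
    df4,
    df3.col("id") === df4.col("id")
  )
  .show(100, false)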
best