Spark Performance Tuning | Handling DATA Skewness | Interview Question

TechWithViresh

24,352 views

Published 11 Sep 2024

COMMENTS • 34

@desparadoking8209 2 years ago ⁺¹
Very informative video 👍🙂
@sahiba2227 3 years ago ⁺³
At 7:24: repartition does a full shuffle and hence creates equal-sized partitions, i.e. it always guarantees equal-sized partitions.
@brothermalcolm 2 years ago
Great video, like the pace, like the presentation.
@TechWithViresh 1 year ago
Glad you liked it!
@AdityaKommu1 4 years ago ⁺²
Hello,
Your videos are very good.
Can you please do a video on incremental data load and full data load, with an example?
@vijeandran 3 years ago ⁺¹
Hi Viresh, thanks for the video. How can we implement the salting technique in PySpark?
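The thread does not answer this question. As a hedged illustration only, the plain-Python simulation below (not PySpark code) shows the two-stage idea usually meant by salting an aggregation: attach a random salt, aggregate per (key, salt), then drop the salt and aggregate again. In PySpark the salt column is commonly built with something like `F.floor(F.rand() * N)` followed by two `groupBy` steps; the data, `NUM_SALTS` and the variable names below are hypothetical.

```python
import random
from collections import Counter

# Hypothetical skewed input: tower_id 1 is a "hot key" holding 90% of the rows.
records = [1] * 900 + [2] * 50 + [3] * 30 + [4] * 20

NUM_SALTS = 8  # assumption: picked by hand; more salts split the hot key further
rng = random.Random(42)  # seeded so the sketch is reproducible

# Without salting, all rows of one key form a single group that one task must process.
unsalted_counts = Counter(records)

# Stage 1: attach a random salt and aggregate per (key, salt).
# The hot key is now split into up to NUM_SALTS smaller groups.
partial_counts = Counter((tower_id, rng.randrange(NUM_SALTS)) for tower_id in records)

# Stage 2: drop the salt and combine the partial results per original key.
final_counts = Counter()
for (tower_id, _salt), n in partial_counts.items():
    final_counts[tower_id] += n

largest_before = max(unsalted_counts.values())
largest_after = max(partial_counts.values())
print("largest group before salting:", largest_before)
print("largest group after salting:", largest_after)
print("same totals per key:", final_counts == unsalted_counts)
```

The largest unit of work shrinks from 900 rows to roughly 900 / NUM_SALTS, while the final per-key totals are unchanged.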
@Dsmehra379 4 years ago ⁺¹
Thank you so much for this video, but I am not able to find the second part. Can you please post the link to the Part 2 video?
@ayanbizz 4 years ago
Nice explanation. A couple of questions: 1) Repartitioning does ensure that the data distribution is not skewed (unlike coalesce). 2) You said repartitioning uses the hash value to distribute the data (are you talking about bucketing?)
@TechWithViresh 4 years ago
Spark provides two built-in partitioners: 1. the hash partitioner and 2. the range partitioner. The default is the hash partitioner.
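A hedged plain-Python sketch of the two partitioning ideas (hypothetical data and bounds, not Spark code). Spark's RDD `HashPartitioner` takes the key's `hashCode` modulo the partition count, and DataFrame hash partitioning uses a Murmur3 hash instead; here Python's `hash()` stands in, which for a small int is the int itself, so the output is deterministic. The point relevant to skew: under either scheme every row of a given key lands in the same partition.

```python
import bisect
from collections import Counter

def hash_partition(key: int, num_partitions: int) -> int:
    # Idea behind a hash partitioner: non-negative hash of the key modulo the partition count.
    # Python's % already returns a non-negative result for a positive divisor.
    return hash(key) % num_partitions

def range_partition(key: int, upper_bounds: list) -> int:
    # Idea behind a range partitioner: keys are placed by sorted range boundaries.
    # The bounds here are hypothetical; Spark's RangePartitioner derives them by sampling.
    return bisect.bisect_left(upper_bounds, key)

keys = [7] * 6 + [1, 2, 3]  # hypothetical data where key 7 is hot
bounds = [2, 5]             # 3 range partitions: <=2, 3..5, >5

hash_sizes = Counter(hash_partition(k, 3) for k in keys)
range_sizes = Counter(range_partition(k, bounds) for k in keys)
print("hash partition sizes:", dict(hash_sizes))
print("range partition sizes:", dict(range_sizes))
```

In both results one partition holds all six rows of key 7, so neither partitioner on its own fixes a hot key.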
@harshvardhansolanki1466 4 years ago
If you repartition on a column, you can still get skewed data. If you repartition by number of partitions, the distribution will be roughly equal.
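The difference comes down to which partitioning is used. In the DataFrame API, `repartition(n)` without columns uses round-robin partitioning, while `repartition(n, col)` hash-partitions by the column. A hedged plain-Python simulation with hypothetical data (in Spark the round-robin start position varies per input partition, so real sizes are only approximately equal):

```python
from collections import Counter

def round_robin_sizes(rows, num_partitions):
    # Like repartition(n) with no columns: rows are dealt out in turn, regardless of key.
    return Counter(i % num_partitions for i, _ in enumerate(rows))

def by_key_sizes(rows, num_partitions):
    # Like repartition(n, col): a row's partition depends only on the hash of its key.
    return Counter(hash(key) % num_partitions for key in rows)

rows = [1] * 900 + [2] * 50 + [3] * 30 + [4] * 20  # hypothetical hot key 1
print("round robin:", dict(round_robin_sizes(rows, 4)))
print("by key:     ", dict(by_key_sizes(rows, 4)))
```

Round robin yields 250 rows per partition; partitioning by key leaves 900 rows in a single partition.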
@vivekpuurkayastha1580 2 years ago
If the partition key is non-numeric, how do we perform salting? Your tower IDs were numeric, but what if instead of 1, 2, ... they were A, B, ...?
@ramkumarananthapalli7151 2 years ago ⁺¹
Thank you for making this video. Could you tell us on which column the mean, median and mode are calculated?
@bentchow 1 year ago
The columns are the ones being shuffled on, such as the join columns or group-by columns. There is data skew when the distribution is not normal.
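One reading of the mean/median/mode discussion: the statistics are computed over the number of rows per key in the shuffle column (the join or group-by column), and a key is suspicious when its row count is far above the median. The sketch below is plain Python with hypothetical data; the factor of 5 is an assumption, loosely modelled on Spark 3's adaptive skew-join rule, which compares each shuffle partition's size with the median partition size (`spark.sql.adaptive.skewJoin.skewedPartitionFactor`).

```python
from collections import Counter
from statistics import mean, median, mode

# Hypothetical values of a join/group-by column such as tower_id.
tower_ids = [1] * 900 + [2] * 50 + [3] * 30 + [4] * 20

rows_per_key = Counter(tower_ids)
sizes = list(rows_per_key.values())

avg_size = mean(sizes)             # average rows per key
median_size = median(sizes)        # typical rows per key
most_common_key = mode(tower_ids)  # the key that occurs most often

SKEW_FACTOR = 5  # assumption: a key is "hot" if it has more than 5x the median row count
hot_keys = [k for k, n in rows_per_key.items() if n > SKEW_FACTOR * median_size]

print(avg_size, median_size, most_common_key, hot_keys)
```

A mean (250) far above the median (40) is the usual sign that a few keys dominate.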
@ajaywade9418 1 year ago
From 11:30 in the video, we are adding a random key to the existing tower ID key.
For example, with tower ID 101 and salt key 67, 101 + 67 = 168, and the hash value of 168 would be the final value, right?
What if the partition column is a string data type?
@TechWithViresh 1 year ago
In the case of strings, we can add surrogate keys based on the string column values and then do the salting.
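Two points from this exchange can be checked in a few lines of plain Python (hypothetical data, not Spark code). Adding the salt arithmetically, as in 101 + 67 = 168, is risky: tower 100 with salt 68 also gives 168, so rows of different towers could be joined or aggregated together. Keeping the key and the salt as separate parts (two columns, or one string with a separator) avoids this and also works for string keys directly. The surrogate-key mapping suggested in the reply is shown as well; the helper names are hypothetical.

```python
import random

def additive_salt(tower_id: int, salt: int) -> int:
    # The scheme from the comment: add the salt to the numeric key.
    return tower_id + salt

def composite_salt(key, salt: int) -> tuple:
    # Keep key and salt as separate parts; works for int and str keys alike.
    return (key, salt)

def string_salt(key: str, salt: int, sep: str = "_") -> str:
    # Single string column form, e.g. "A_3"; the separator keeps "A1"+"2" apart from "A"+"12".
    return f"{key}{sep}{salt}"

# Additive salting can merge two different towers into one salted key.
print(additive_salt(101, 67) == additive_salt(100, 68))    # True: 168 == 168
print(composite_salt(101, 67) == composite_salt(100, 68))  # False: keys stay distinct

# Surrogate keys for a string column, as suggested in the reply (hypothetical data).
tower_names = ["A", "B", "C", "A", "B", "A"]
surrogate = {name: i for i, name in enumerate(sorted(set(tower_names)))}
rng = random.Random(0)
salted = [composite_salt(surrogate[name], rng.randrange(10)) for name in tower_names]
print(surrogate)
```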
@rishigc 4 years ago
@TechWithViresh I simply love your videos. I have watched your other tutorial videos too, and they are awesome. I am interested in knowing how to do an Iterative Broadcast Join with the SQL API. Any help is highly appreciated. Can you please advise?
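The thread does not answer this. As the technique is generally described, an iterative broadcast join splits a table that is too large to broadcast in one go into chunks, broadcast-joins each chunk against the large table, and unions the partial results. The plain-Python simulation below (hypothetical data, inner joins only) models a broadcast join as building a hash map from the small side and probing it with the large side. In Spark SQL, each pass could be written as a join with a `/*+ BROADCAST(...) */` hint over a filtered slice of the smaller table, with the passes combined via `UNION ALL`; treat that mapping as an assumption, not a tested recipe.

```python
from collections import defaultdict

def broadcast_join(large_rows, small_rows):
    # Build a hash map from the small side, as a broadcast hash join does on each executor.
    lookup = defaultdict(list)
    for key, value in small_rows:
        lookup[key].append(value)
    # Probe with every row of the large side; the large side never needs to be shuffled.
    return [(key, lv, sv) for key, lv in large_rows for sv in lookup.get(key, [])]

def iterative_broadcast_join(large_rows, small_rows, num_chunks):
    # Split the small side into num_chunks slices by key hash and join one slice per pass.
    results = []
    for i in range(num_chunks):
        chunk = [(k, v) for k, v in small_rows if hash(k) % num_chunks == i]
        results.extend(broadcast_join(large_rows, chunk))
    return results

large = [(1, "a"), (1, "b"), (2, "c"), (3, "d")]  # hypothetical big, skewed table
small = [(1, "x"), (2, "y"), (4, "z")]            # hypothetical "medium" table
print(sorted(iterative_broadcast_join(large, small, num_chunks=2)))
```

Every small-side row lands in exactly one slice, so for an inner join the union of the passes equals the single-pass result.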
@Gecasomx 3 years ago
Thanks for the video, but no Part 2?
@udaynayak4788 2 years ago
Hi Viresh, can you please share the link for Part 2?
@SpiritOfIndiaaa 4 years ago
Thank you so much, really good. So what is the difference between isolation salting and salting? And what is the difference between isolation map join and map join?
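The thread does not answer this. In the usual description of the technique, which may differ from the video's, isolated salting applies the salt only to keys known to be skewed, so only those keys are replicated on the other side of the join; full salting salts and replicates every key. A plain-Python sketch with hypothetical data and helper names (the hot-key set would come from a count per key):

```python
import random

NUM_SALTS = 8  # hypothetical salt range

def salt_large_side(rows, hot_keys, num_salts, rng):
    # Only rows with a hot key get a random salt; all other rows get the fixed salt 0.
    return [(k, v, rng.randrange(num_salts) if k in hot_keys else 0) for k, v in rows]

def replicate_small_side(rows, hot_keys, num_salts):
    # Hot keys are copied once per salt value so every salted row can still find its match;
    # other keys appear once, with salt 0.
    out = []
    for k, v in rows:
        salts = range(num_salts) if k in hot_keys else [0]
        out.extend((k, v, s) for s in salts)
    return out

small = [(1, "tower-1"), (2, "tower-2"), (3, "tower-3"), (4, "tower-4")]
large = [(1, i) for i in range(900)] + [(k, i) for k in (2, 3, 4) for i in range(30)]

isolated = replicate_small_side(small, {1}, NUM_SALTS)       # 8 + 3 = 11 rows
full = replicate_small_side(small, {1, 2, 3, 4}, NUM_SALTS)  # 4 * 8 = 32 rows
salted_large = salt_large_side(large, {1}, NUM_SALTS, random.Random(1))
print(len(isolated), len(full))
```

Isolated salting keeps the replicated side small (11 rows instead of 32 here) and leaves non-skewed keys untouched.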
@thanoojbharateeyudu3786 3 years ago
We could lose our join key by adding random numbers as a salt key.
If we want to join on the same key, that is a problem.
Maybe the join key could be a column other than the salted one.
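This concern is usually addressed by how the other side of the join is prepared. In a typical salted join the original key is kept alongside the salt, the skewed side gets one random salt per row, and the other side is copied once per salt value; joining on (key, salt) then matches every row it would have matched on the key alone, and the salt is dropped afterwards. A plain-Python sketch with hypothetical data and helper names:

```python
import random
from collections import defaultdict

NUM_SALTS = 4  # hypothetical salt range

def hash_join(left, right):
    # Equi-join on the first tuple element; returns (key, left_value, right_value).
    index = defaultdict(list)
    for k, rv in right:
        index[k].append(rv)
    return [(k, lv, rv) for k, lv in left for rv in index.get(k, [])]

def salted_join(left, right, num_salts, rng):
    # Skewed side: one random salt per row; the original key is kept, not overwritten.
    salted_left = [((k, rng.randrange(num_salts)), lv) for k, lv in left]
    # Other side: one copy per salt value, so every (key, salt) pair has a partner.
    exploded_right = [((k, s), rv) for k, rv in right for s in range(num_salts)]
    joined = hash_join(salted_left, exploded_right)
    # Drop the salt to get back rows keyed by the original join key.
    return [(salted_key[0], lv, rv) for salted_key, lv, rv in joined]

left = [(1, f"event-{i}") for i in range(20)] + [(2, "event-x")]  # key 1 is hot
right = [(1, "tower-1"), (2, "tower-2"), (3, "tower-3")]
same = sorted(salted_join(left, right, NUM_SALTS, random.Random(7))) == sorted(hash_join(left, right))
print(same)  # True
```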
@harshvardhansolanki1466 4 years ago
Thank you so much for the video. I seek some clarification, though.
In your example you used mapPartitions, meaning that for each partition of different keys you updated the key with a salt. But the records still remained in their respective partitions. How will those records be shuffled across partitions for equal distribution?
@TechWithViresh 4 years ago
The partition will change with the change in the key, as it is essentially the hash code of key + salt now.
@harshvardhansolanki1466 4 years ago
@TechWithViresh I tried it, so I believe a new DF will have to be created and REPARTITIONED again in order for the records to be shuffled by the updated salted keys. It won't just trigger a shuffle on a key update in the mapPartitions function! That only makes sense.
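A small simulation (plain Python, hypothetical data) makes the distinction visible: changing the key inside a mapPartitions-style step only rewrites rows where they already are, so partition sizes do not change. Rows move only at the next shuffle keyed on the salted value, whether that is an explicit repartition or the shuffle that a later join or aggregation on the salted key performs. `partition_for` is a stand-in using Python's `hash()`, not Spark's hash function.

```python
import random
from collections import Counter

NUM_PARTITIONS = 4

def partition_for(key):
    # Stand-in hash partitioner (Python's hash, not Spark's).
    return hash(key) % NUM_PARTITIONS

# Hypothetical rows already hash-partitioned by tower_id: partition 1 holds all of hot key 1.
rows = [1] * 900 + [2] * 50 + [3] * 30 + [4] * 20
partitions = [[] for _ in range(NUM_PARTITIONS)]
for tower_id in rows:
    partitions[partition_for(tower_id)].append(tower_id)

# mapPartitions-style step: each row gets a salted key, but it stays in its partition.
rng = random.Random(3)
salted = [[(t, rng.randrange(NUM_PARTITIONS)) for t in part] for part in partitions]
print("sizes after key change:", [len(p) for p in salted])  # [20, 900, 50, 30]

# Shuffle step: rows are reassigned by hashing the new (tower_id, salt) key.
after_shuffle = Counter(partition_for(key) for part in salted for key in part)
print("sizes after reshuffle: ", [after_shuffle[i] for i in range(NUM_PARTITIONS)])
```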
@jaiharsad7121 4 years ago
Hi sir, please upload the Spark interview question videos that were available earlier. I'm not able to find them in your playlist.
@TechWithViresh 4 years ago
All the videos are uploaded, please check :)
@chilukapavan6344 4 years ago
Awesome video 🙏... can you please share the Part 2 video link?
@TechWithViresh 4 years ago
Coming soon!! Thanks :)
@pareshpal3533 3 years ago
@TechWithViresh when?
@rishigc 4 years ago
Where is Part 2?
@bhuneshwarsingh630 4 years ago
Please give a solid coding example with an explanation.
@aneksingh4496 4 years ago
Have you uploaded Part 2 of this?
@TechWithViresh 4 years ago
Check out the other videos in the playlist for performance optimization and executor tuning.
@Hk-eo5yr 4 years ago
Can you share the Part 2 video?
@TechWithViresh 4 years ago
Coming soon!! Thanks :)
