Repartition vs Coalesce in Apache Spark | Rock the JVM

Rock the JVM

5,082 views

Published 21 Jan 2025

COMMENTS • 13

@subimalkhatua2886 2 years ago ⁺²
Coalesce outperforms repartition in most cases, but not always. In one of my projects I was dealing with skewed data and needed to compact it into a single partition for a downstream application, and from there load it into Redshift. When I used coalesce instead of repartition, a 1 hr job took 1.45 hrs because of the uneven distribution; according to the DAG, the job was stuck for a straight 45 mins. I went to the documentation and found that coalesce puts only as many compute nodes to work as the number of partitions you request, which drastically reduces parallelism. Repartition distributes the data evenly because it sends rows to the target partitions in round-robin fashion. Switching to repartition cut that step from 45 mins to 8 mins, which is massive.
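The skew effect described in this comment can be illustrated with a toy model in plain Python. This is only a sketch under stated assumptions, not Spark code and not how Spark is implemented: partitions are modelled as lists of rows, and the hypothetical helpers coalesce_like and repartition_like stand in for "merge whole partitions without a shuffle" and "redistribute every row round-robin". The largest resulting partition is a rough proxy for the slowest task.

```python
from typing import List

Partitions = List[List[int]]


def coalesce_like(partitions: Partitions, n: int) -> Partitions:
    """Toy model: merge whole partitions into n groups, never splitting one."""
    if n >= len(partitions):
        return [list(p) for p in partitions]
    merged: Partitions = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        # Contiguous grouping: each input partition moves as a single unit.
        merged[i * n // len(partitions)].extend(part)
    return merged


def repartition_like(partitions: Partitions, n: int) -> Partitions:
    """Toy model: send every row to the n targets in round-robin order."""
    out: Partitions = [[] for _ in range(n)]
    i = 0
    for part in partitions:
        for row in part:
            out[i % n].append(row)
            i += 1
    return out


# Hypothetical skewed input: one partition holds 90 of the 100 rows.
skewed = [list(range(90)), list(range(5)), list(range(3)), list(range(2))]

print([len(p) for p in coalesce_like(skewed, 2)])     # [95, 5]
print([len(p) for p in repartition_like(skewed, 2)])  # [50, 50]
```

In this model the merge keeps the skew (95 rows against 5), while the round-robin pass evens it out at the price of moving every row, which corresponds to the shuffle that repartition pays for.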
@heenagirdher6443 2 years ago ⁺¹
Great explanation. Could you please create more videos on Spark?
@rockthejvm 2 years ago
Will do!
@satyadevanwubhayavedantapu4860 3 years ago ⁺²
Thank you!
How do we determine the number of partitions to pass to repartition or coalesce?
numbers.repartition(n) or numbers.coalesce(n): is there any calculation we can do to come up with a number suitable for the operation?
@rockthejvm 3 years ago
There is no one perfect number - this depends on the shape of your data and what you want to do with it.
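As the reply says, no formula gives the right answer for every job. One commonly used starting point, sketched below, sizes partitions against a target number of bytes and keeps at least one partition per available core. The 128 MB target, the function name suggest_partitions and its parameters are all assumptions made for illustration; none of them come from the video.

```python
import math


def suggest_partitions(
    total_bytes: int,
    total_cores: int = 1,
    target_partition_bytes: int = 128 * 1024**2,  # assumed target of 128 MB
) -> int:
    """Rough starting point for n; tune it against the real job."""
    by_size = math.ceil(total_bytes / target_partition_bytes)
    # Keep at least one partition per core so no core sits idle.
    return max(by_size, total_cores, 1)


# Hypothetical job: 10 GB of data on a cluster with 16 cores in total.
print(suggest_partitions(10 * 1024**3, total_cores=16))  # 80
```

The result is only a first guess for n in numbers.repartition(n) or numbers.coalesce(n); skew, row width and the downstream operation can all justify a different value.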
@prasadvenkataramasatyanand5559 4 years ago
Thank you. But in which scenarios should we choose repartition, and in which coalesce? Please explain.
@seanxhuo 4 years ago ⁺³
There are many use cases where repartition is the better choice. When you have a large data set and a complex operation other than count, calling coalesce will not be able to take advantage of parallelism: only a single task is launched, so it can take far longer to finish,
whereas repartition runs in parallel across the partitions and is much faster. As a matter of fact, if coalesce is the last step of the pipeline, the whole pipeline runs in a single task. Be aware!
@rockthejvm 4 years ago
Indeed, that's not to say that coalesce is always better. We'll do a deeper dive into the tradeoffs in a future video.
@SriniVasan-ml6we 4 years ago ⁺²
Thanks a lot, sir, your videos are pulling me away from Java and Python to Scala👍. Could you please spend some time creating a video on how to add dependencies in build.sbt?
@rockthejvm 4 years ago ⁺¹
Will do - there's a lot of content coming soon!
@clasomblog8881 3 years ago
We cannot increase the number of partitions using coalesce. @Rock the JVM
@rockthejvm 3 years ago
Yes you can, and in that case it's the same as a repartition.
Fun fact: repartition is implemented in terms of coalesce.
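Both sides of this exchange can be reconciled through the shuffle flag. In Spark's RDD API, coalesce takes a shuffle parameter that defaults to false, and repartition(n) calls coalesce(n) with shuffle enabled. Without a shuffle, asking for more partitions leaves the count unchanged; with a shuffle, the count can grow. The plain-Python toy model below only mirrors that relationship. The list-of-lists representation and the merge strategy are assumptions for illustration, not Spark's implementation.

```python
from typing import List

Partitions = List[List[int]]


def coalesce(parts: Partitions, n: int, shuffle: bool = False) -> Partitions:
    """Toy model of coalesce with an optional full redistribution."""
    if shuffle:
        # Full redistribution: any target count is reachable.
        rows = [row for part in parts for row in part]
        return [rows[i::n] for i in range(n)]
    if n >= len(parts):
        # No shuffle: partitions can be merged but never split.
        return [list(p) for p in parts]
    out: Partitions = [[] for _ in range(n)]
    for i, part in enumerate(parts):
        out[i * n // len(parts)].extend(part)
    return out


def repartition(parts: Partitions, n: int) -> Partitions:
    """Toy model: repartition expressed in terms of coalesce."""
    return coalesce(parts, n, shuffle=True)


two = [[1, 2, 3], [4, 5, 6]]
print(len(coalesce(two, 4)))                # 2: no shuffle, count unchanged
print(len(coalesce(two, 4, shuffle=True)))  # 4
print(repartition(two, 4) == coalesce(two, 4, shuffle=True))  # True
```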
