Top 5 Mistakes When Writing Spark Applications

Spark Summit

101,948 views

Published Oct 28, 2024

COMMENTS • 39

@johnengelhart4453 8 years ago ⁺¹⁹
I would love to see an example of the salting side that is missing.
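Since the comment above asks for a salting example, here is a minimal pure-Python sketch of the general technique. It is not the speakers' code: the function names, the `NUM_SALTS` value, and the toy data are all hypothetical. The skewed side gets a random salt appended to each key, the other side is replicated once per salt value, and the join runs on the salted key.

```python
import random
from collections import defaultdict

NUM_SALTS = 4  # hypothetical salt range; in practice tuned to the observed skew


def salt_skewed_side(rows, num_salts, rng):
    """Append a random salt to every key on the large, skewed side."""
    return [((key, rng.randrange(num_salts)), value) for key, value in rows]


def replicate_other_side(rows, num_salts):
    """Emit each row of the smaller side once per salt value, so every
    salted key on the skewed side still finds its match."""
    return [((key, salt), value) for key, value in rows for salt in range(num_salts)]


def hash_join(left, right):
    """Plain inner join on the first tuple element of each row."""
    index = defaultdict(list)
    for key, value in right:
        index[key].append(value)
    return [(key, lv, rv) for key, lv in left for rv in index.get(key, [])]


rng = random.Random(0)
skewed = [("hot", i) for i in range(8)] + [("cold", 100)]  # "hot" dominates
lookup = [("hot", "H"), ("cold", "C")]

salted_left = salt_skewed_side(skewed, NUM_SALTS, rng)
salted_right = replicate_other_side(lookup, NUM_SALTS)
salted_join = hash_join(salted_left, salted_right)

# Drop the salt again to recover rows keyed by the original key.
unsalted = sorted((key, lv, rv) for (key, _salt), lv, rv in salted_join)
plain = sorted(hash_join(skewed, lookup))
```

The salted join returns the same rows as the plain join, but the rows for the hot key are now spread over up to `NUM_SALTS` distinct join keys, which in a distributed engine means up to that many partitions instead of one. The cost is that the replicated side grows by a factor of `NUM_SALTS`.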
@35sherminator 2 years ago
Thanks for superbly breaking down the mistakes and their solutions. Thanks for the excellent presentation.
@Machin396 4 years ago ⁺²
I am new to Spark and after viewing this presentation I see there's a lot to learn. I liked it a lot, thanks!
@支那湾倭好好贱 5 years ago ⁺¹
5 cores per executor did not work for us. For us, the best number is 3 for on-prem and 2 for EMR. Numbers larger than that gave us IO exceptions. You need to adjust case by case.
@kumarrohit8311 4 years ago ⁺¹
Did anyone notice Sameer Farooqui taking photos when the Q&A started?
Awesome guys, all of them!
@bensums 6 years ago ⁺²
At 6:21 it should say divide by 1 + 0.07, not multiply by 1 - 0.07. Also, on more recent versions of Spark it's gone up from 7% to 10%.
@35sherminator 2 years ago
Absolutely agree, the division is correct.
@sankarasrikrishna 10 days ago
Thanks for the clarification.
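The correction in this thread can be checked with a few lines of arithmetic. The sketch below assumes a 21 GB per-executor share of node memory (a value not stated in the comments, chosen because it lands near the 19 GB figure discussed further down) and the 384 MB minimum overhead that Spark on YARN applies. The function names are hypothetical.

```python
MIN_OVERHEAD_GB = 384 / 1024  # 384 MB floor on the memory overhead


def heap_by_multiplying(container_gb, overhead_fraction):
    """The shortcut being corrected: shave the fraction off the container size."""
    return container_gb * (1 - overhead_fraction)


def heap_by_dividing(container_gb, overhead_fraction):
    """Solve heap + overhead_fraction * heap = container for heap."""
    heap = container_gb / (1 + overhead_fraction)
    # When the proportional overhead is below the floor, the floor applies instead.
    if heap * overhead_fraction < MIN_OVERHEAD_GB:
        heap = container_gb - MIN_OVERHEAD_GB
    return heap


container_gb = 21  # assumed per-executor share of node memory
for fraction in (0.07, 0.10):
    print(fraction,
          round(heap_by_multiplying(container_gb, fraction), 2),
          round(heap_by_dividing(container_gb, fraction), 2))
```

The overhead is computed from the executor memory, not from the container size, which is why division is the exact form. For fractions this small the two results differ by only about a tenth of a gigabyte, and both round down to the same whole number at 7%.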
@sahebbhattu 7 years ago ⁺²
Hi Mark, awesome explanation of the executor and executor memory calculations. But that covers how to use the maximum number of cores or executors the environment provides to achieve maximum parallelism. I would like to add one more point: if we have a heavy memory load to deal with, we have to trade the number of executors or cores for executor memory. That means that with a massive memory load we may have to go with fewer executors (fewer than 17) and more memory per executor (more than 19 GB). Please correct me if I am wrong. Thanks.
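The trade-off described above can be made concrete with a small sizing sketch. The cluster shape used here (6 nodes, 16 cores and 64 GB each) is an assumption picked because it reproduces the 17 executors and 19 GB mentioned in the comment; the reservation of one core and 1 GB per node, and one executor slot for the YARN application master, are likewise assumptions of this sketch, and `size_executors` is a hypothetical helper.

```python
def size_executors(nodes, cores_per_node, ram_gb_per_node,
                   cores_per_executor=5, overhead_fraction=0.07):
    """Return (executors per node, total executors, heap GB per executor)."""
    # Assumption: leave one core and 1 GB per node for the OS and cluster daemons.
    usable_cores = cores_per_node - 1
    usable_ram_gb = ram_gb_per_node - 1

    executors_per_node = usable_cores // cores_per_executor
    if executors_per_node < 1:
        raise ValueError("cores_per_executor exceeds the usable cores on a node")

    # Assumption: one executor slot in the cluster goes to the application master.
    total_executors = nodes * executors_per_node - 1

    # Split node memory evenly, then remove the overhead share and round down.
    container_gb = usable_ram_gb / executors_per_node
    heap_gb = int(container_gb / (1 + overhead_fraction))
    return executors_per_node, total_executors, heap_gb


baseline = size_executors(6, 16, 64)                          # 5 cores per executor
memory_heavy = size_executors(6, 16, 64, cores_per_executor=7)
print(baseline, memory_heavy)
```

Under these assumptions, 5 cores per executor gives 3 executors per node, 17 in total, with 19 GB each. Raising the cores per executor to 7 drops this to 2 per node and 11 in total, but each executor gets 29 GB, which is the direction of the trade the comment describes.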
@JoHeN1990 4 years ago ⁺¹
The data quality check article mentioned at 22:52 can be found here: web.archive.org/web/20181116232422/blog.cloudera.com/blog/2015/07/how-to-do-data-quality-checks-using-apache-spark-dataframes/
@rangarajanrao1994 1 year ago
Excellent. Best wishes.
@VasileSurdu 6 years ago ⁺³
Why can't they just let them speak and finish their presentation, for god's sake? Was it that big of a problem to let them finish their last 2 mistakes? lol. The last one (caching vs persisting) was very interesting.
@sailpawar6164 1 year ago
Damn, 5 years ago... I absolutely loved the presentation.
Keeping an audience engaged is a difficult job, and you did great.
Also, is it just me, or do these 2 faces look very familiar by the time the video ends?
@PizdaRusni2023 2 years ago
Great
@dtsleite 5 years ago ⁺²
What does Cloudera know about Spark applications? They don't even update their versions.
@madhavareddy3927 8 years ago
Thank you guys! You've done a great job.
@popicf 7 years ago ⁺¹
But what do you do if you only have a 7-node cluster with 4 cores and 8 GB of RAM?
@charlesli5809 8 years ago
Awesome sharing, many thanks.
@andriimed6408 5 years ago ⁺¹
It's awesome, thanks a lot!
@gounna1795 7 years ago ⁺³
Great topic, great explanation!
@gauravkataria1 7 years ago
Thanks a lot. Very helpful!
@vinothsmart1 6 years ago
What was the tool he was talking about for Spark unit testing?
@clayblythe 6 years ago
I think he said JUnit.
@kambaalayashwanth123 5 years ago
What about loading small files?
@sakthivel021 5 years ago
What is the solution for the 2 GB Spark shuffle size limit?
@veereshhosagoudar875 4 years ago
Limit the partitions
@veereshhosagoudar875 4 years ago
Resize the partition
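One reading of "resize the partition" is to pick a partition count high enough that no single shuffle block comes near 2 GB. The sketch below shows only the arithmetic. The 128 MB target is an assumed rule of thumb that does not appear in these comments, the data is assumed to be spread evenly (skewed keys would need separate handling), and `partitions_needed` is a hypothetical helper.

```python
MAX_SHUFFLE_BLOCK_BYTES = 2 * 1024**3    # the 2 GB limit asked about above
TARGET_PARTITION_BYTES = 128 * 1024**2   # assumed rule-of-thumb partition size


def partitions_needed(shuffle_bytes, target_bytes=TARGET_PARTITION_BYTES):
    """Smallest partition count that keeps the average partition at or below target."""
    if shuffle_bytes <= 0 or target_bytes <= 0:
        raise ValueError("sizes must be positive")
    # Integer ceiling division avoids floating-point rounding on large sizes.
    return -(-shuffle_bytes // target_bytes)


one_tib = 1024**4
count = partitions_needed(one_tib)
average_bytes = one_tib / count
print(count, average_bytes < MAX_SHUFFLE_BLOCK_BYTES)
```

In a Spark job the resulting count would typically be supplied through `repartition` or the `spark.sql.shuffle.partitions` setting, depending on which API the job uses.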
@AlexanderWhillas 5 years ago ⁺¹
These are also the top reasons Spark is still relatively unpopular :-/
@Machin396 4 years ago ⁺²
Really? I thought it was already popular in 2020. If not, what else is gaining attention instead?
@TheSmartTrendTrader 5 years ago
What is that special collection for doing ETL?
@letscodewithvivek5191 3 years ago
I have the same question. Until now I have been doing ETL using DataFrames only and have never used any custom collections.
@rajjad 7 years ago
Where are the slides?
@nguyen4so9 7 years ago
Very cool :) ..!
@CRTagadiya 8 years ago
awesome
@nakget 3 years ago
How does each node get 3 executors at ua-cam.com/video/WyfHUNnMutg/v-deo.html?
@StuggleIsSurreal 3 years ago
Spark, by itself, is not intended to handle CPU-intensive operations on your data. If you have a process against the data that requires a lot of CPU or memory resources and/or is consuming CPU time, move that process into a microservice or competing consumer pattern. This problem will bog down your data handling and prevent you from using Spark effectively.
@MisterKhash 5 years ago
I can't understand what he is saying!!
