We need more like this. This was a great conversation. Thank you!
1. Exclude tables: the --exclude-tables flag.
2. Splittable columns: split the partitions on int-based columns.
3. repartition can increase or decrease the number of partitions and does a full shuffle. coalesce avoids a full shuffle and is better for reducing the number of partitions.
4. If spark.default.parallelism is set, we'll use the value of SparkContext defaultParallelism as the default number of partitions; otherwise we'll use the max number of upstream partitions.
5. I guess they are talking about the Derby server.
Great interview
Nice, sir. Please include more real-time questions from the daily activities a candidate actually does — better for interview preparation.
Excellent.
Many thanks!
Very helpful, excellent.
Glad to hear that
Very helpful
Thank you !
How to build jar files in spark and why?
Very nice.. good and fast paced
It would be great if you could post the answers to the questions.
Is this preparation good enough for someone with 5 years of experience?
If we try to increase partitions using coalesce, it simply won't — coalesce can only keep or reduce the current partition count.
Most people feel shuffle is bad and that we should be using coalesce more often than repartition. No!
Spark works fine when the partition sizes are even; coalesce might give unevenly sized partitions, and that can impact the performance of the job (not always!).
It's a trade-off, and we have to choose depending on the need.
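The unevenness described above comes from coalesce merging existing partitions instead of shuffling individual rows, so input skew carries over. A toy pure-Python model of that behavior (not Spark's actual code — Spark's grouping heuristic differs, but the skew effect is the same):

```python
# Toy model: coalesce merges whole existing partitions (no shuffle),
# so skew in the input carries over; repartition redistributes rows
# evenly after a full shuffle.

def coalesce(partitions, n):
    """Merge existing partitions into n groups without touching rows."""
    groups = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        groups[i % n].extend(part)
    return groups

def repartition(partitions, n):
    """Full shuffle: spread every individual row across n new partitions."""
    rows = [row for part in partitions for row in part]
    groups = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        groups[i % n].append(row)
    return groups

# Four skewed input partitions: 97 rows in the first, 1 row in each other.
parts = [list(range(97)), [97], [98], [99]]
print([len(p) for p in coalesce(parts, 2)])     # [98, 2]  -> uneven
print([len(p) for p in repartition(parts, 2)])  # [50, 50] -> even
```

The uneven [98, 2] split is exactly the situation where one task runs far longer than the others, which is why coalesce is not automatically the better choice.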
He was talking about a Spark actual-memory-usage tool (around the 9th to 11th minute). Can someone tell me the tool's name?
Sysdig
Hello sir, can you please take my mock interview on the same? If yes, please share your mail id where I can send my latest CV. I have a total of 11 years of IT experience: 4 years in big data and 7 years in Oracle PL/SQL.
Shareit2904@gmail.com
Hello sir, can you please take my mock interview ...
Pls send your resume to shareit2904@gmail.com
How many years of experience does the student have?