Narrow and Wide Transformations and Actions in Spark
- Published Feb 9, 2025
- If you need any guidance you can book time here, topmate.io/bha...
Follow me on Linkedin
/ bhawna-bedi-540398102
Instagram
www.instagram....
You can support my channel at: bhawnabedi15@okicici
A platform for using Spark efficiently!
Databricks is a Unified Data Analytics Platform from the original creators of Apache Spark.
Databricks includes an interactive notebook environment, monitoring tools, and security controls that make it easy to leverage Spark.
Azure Databricks offers three environments for developing data-intensive applications: Databricks SQL, Databricks Data Science & Engineering, and Databricks Machine Learning.
Databricks hands-on tutorials
• Databricks hands on tu...
Azure Event Hubs
• Azure Event Hubs
Azure Data Factory Interview Question
• Azure Data Factory Int...
SQL LeetCode questions
• SQL Interview Question...
Azure Synapse tutorials
• Azure Synapse Analytic...
Azure Event Grid
• Event Grid
Azure Data factory CI-CD
• CI-CD in Azure Data Fa...
Azure Basics
• Azure Basics
Databricks interview questions
• DataBricks Interview Q...
Thank you
In a union, duplicate rows are removed, right? To remove those duplicates, matching rows have to be brought into the same partition, which means a data shuffle. So how is UNION a narrow transformation?
We could say UNION ALL is a narrow transformation, because it does not remove duplicate rows.
Please explain, I'm confused about this.
Please reply; an interviewer asked me this in a data engineering interview.
@harshitgupta355
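union() and unionAll() in Spark do not behave like UNION and UNION ALL in SQL.
In Spark:
1. unionAll() was deprecated in Spark 2.0 in favor of union(). Since Spark 3.0 it is no longer deprecated and is simply an alias for union(). Both behave like SQL UNION ALL, not SQL UNION.
2. union() merges two DataFrames with the same schema and keeps duplicates.
3. Because duplicates are kept, no rows have to be compared across partitions, so there is no shuffle and union() is a narrow transformation. To get SQL UNION behavior, call distinct() after union(); that deduplication step is the wide transformation.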
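The points above can be sketched with a toy model in plain Python. This is not the Spark API: a "DataFrame" here is just a list of partitions, and the `union` and `distinct` functions are hypothetical stand-ins that only mimic how partitions are handled. It is a rough illustration of why keeping duplicates needs no data movement, while removing them does.

```python
# Toy model only (NOT the Spark API): a "DataFrame" is a list of partitions,
# and each partition is a list of rows.

def union(left, right):
    """Narrow: the output partitions are the input partitions, untouched."""
    return left + right

def distinct(parts, num_partitions=2):
    """Wide: every row is routed by hash so equal rows meet in one partition."""
    buckets = [set() for _ in range(num_partitions)]
    for part in parts:
        for row in part:
            # This routing step plays the role of the shuffle.
            buckets[hash(row) % num_partitions].add(row)
    return [sorted(bucket) for bucket in buckets]

df1 = [[1, 2], [3]]   # two partitions
df2 = [[3, 4], [1]]   # two partitions

unioned = union(df1, df2)      # four partitions, duplicates kept
deduped = distinct(unioned)    # rows redistributed, duplicates removed

all_rows = sorted(row for part in unioned for row in part)
unique_rows = sorted(row for part in deduped for row in part)
print(all_rows)
print(unique_rows)
```

In this sketch `union` never looks inside a partition, while `distinct` has to touch every row and move it, which is the difference between narrow and wide.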
Also check out unionByName(), which matches columns by name instead of by position.
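To see why unionByName() matters, here is a small plain-Python sketch, again not the Spark API. A table is modeled as a pair of column names and rows, and `union_by_position` and `union_by_name` are hypothetical helpers that imitate the two ways of lining up columns. The sample names and cities are made up for illustration.

```python
# Toy model only (NOT the Spark API): a table is (column_names, rows).

def union_by_position(a, b):
    """Append rows as-is; the second table's column names are ignored."""
    cols_a, rows_a = a
    _, rows_b = b
    return cols_a, rows_a + rows_b

def union_by_name(a, b):
    """Reorder the second table's values to match the first table's columns."""
    cols_a, rows_a = a
    cols_b, rows_b = b
    if sorted(cols_a) != sorted(cols_b):
        raise ValueError("column names differ")
    order = [cols_b.index(col) for col in cols_a]
    reordered = [tuple(row[i] for i in order) for row in rows_b]
    return cols_a, rows_a + reordered

people = (["name", "city"], [("Asha", "Pune")])
more = (["city", "name"], [("Delhi", "Ravi")])   # same columns, other order

by_position = union_by_position(people, more)
by_name = union_by_name(people, more)
print(by_position[1])
print(by_name[1])
```

With the by-position version, "Delhi" ends up in the name column, which is the kind of silent mix-up that matching by name avoids.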