Narrow and Wide Transformations and Actions in Spark

  • Published 27 Aug 2024
  • If you need any guidance you can book time here, topmate.io/bha...
    Follow me on LinkedIn
    / bhawna-bedi-540398102
    Instagram
    www.instagram....
    You can support my channel at: bhawnabedi15@okicici
    A platform for using Spark efficiently!
    Databricks is a unified data analytics platform from the original creators of Apache Spark.
    Databricks includes an interactive notebook environment, monitoring tools, and security controls that make it easy to leverage Spark.
    Azure Databricks offers three environments for developing data-intensive applications: Databricks SQL, Databricks Data Science & Engineering, and Databricks Machine Learning.
    Databricks hands-on tutorials
    • Databricks hands on tu...
    Azure Event Hubs
    • Azure Event Hubs
    Azure Data Factory interview questions
    • Azure Data Factory Int...
    SQL LeetCode questions
    • SQL Interview Question...
    Azure Synapse tutorials
    • Azure Synapse Analytic...
    Azure Event Grid
    • Event Grid
    Azure Data Factory CI/CD
    • CI-CD in Azure Data Fa...
    Azure Basics
    • Azure Basics
    Databricks interview questions
    • DataBricks Interview Q...

COMMENTS • 4

  • @nagamanickam6604, 4 months ago

    Thank you

  • @harshitgupta355, 7 months ago

    In a union, duplicate rows are removed, right? To remove those duplicates, matching rows have to be brought together in the same partition, which means a data shuffle. So how can UNION be a narrow transformation?
    We could say UNION ALL is a narrow transformation, because it does not remove duplicate rows.
    Please explain this to me, I'm confused.

    • @harshitgupta355, 7 months ago

      Please reply. This question was asked by an interviewer in a DE (data engineering) interview.

    • @Shivv2008, 3 months ago

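      @harshitgupta355
      union() and unionAll() behave differently in Spark than UNION and UNION ALL do in SQL.
      In Spark:
      1. unionAll() was deprecated in Spark 2.0. It works the same as union(), so there is no separate deduplicating variant as there is in SQL.
      2. union() merges two DataFrames with the same schema (columns are matched by position) and retains duplicates, like SQL's UNION ALL.
      3. Because duplicates are retained, no rows need to move between partitions, so it is a narrow transformation. Removing duplicates takes an explicit distinct() afterwards, and that step is the wide one, because it shuffles.
      Also check out unionByName(), which matches columns by name instead of position.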
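
To make the answer above concrete, here is a minimal pure-Python sketch of the idea. It is a toy model, not Spark's API or its implementation: a "DataFrame" is assumed to be just a list of partitions, and the helper names `union` and `distinct` are hypothetical stand-ins chosen to mirror the PySpark methods `DataFrame.union()` and `DataFrame.distinct()`.

```python
from collections import defaultdict

# Toy model (an assumption for illustration): a "DataFrame" is a list of
# partitions, and each partition is a list of rows.

def union(left, right):
    """Narrow: the output partitions are the input partitions, untouched.
    No row moves to another partition, so duplicates simply remain."""
    return left + right

def distinct(parts, num_partitions=2):
    """Wide: every row is re-bucketed by hash so that equal rows land in
    the same partition, where duplicates can then be dropped."""
    buckets = defaultdict(list)
    for part in parts:
        for row in part:
            buckets[hash(row) % num_partitions].append(row)  # the "shuffle"
    # dict.fromkeys keeps the first occurrence of each row, in order
    return [list(dict.fromkeys(buckets[i])) for i in range(num_partitions)]

df1 = [[("a", 1)], [("b", 2)]]   # two partitions
df2 = [[("a", 1)], [("c", 3)]]   # two partitions, one row duplicates df1

unioned = union(df1, df2)
deduped = distinct(unioned)

rows_after_union = [row for part in unioned for row in part]
rows_after_distinct = [row for part in deduped for row in part]

print(len(rows_after_union))     # 4 rows: the duplicate ("a", 1) is kept
print(len(rows_after_distinct))  # 3 rows: the duplicate is gone after the shuffle
```

In this model `union` never looks inside a partition, which is what makes it narrow, while `distinct` has to look at every row and may send it to a different partition. In real PySpark the equivalent of SQL's UNION would be `df1.union(df2).distinct()`, with the shuffle coming from the `distinct()` call.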