Salting in Apache Spark - Part I

  • Published 11 Jan 2025

COMMENTS • 6

  • @TheBigDataShow  6 months ago

    A practical demonstration will be released tomorrow. Kindly watch this video to understand the theory in depth.

  • @mufaddalrampurawala247  6 months ago  +4

    This also increases the size of the second dataset, since we explode it. Is it still an optimization if the data scan grows a lot and a lot of shuffle is involved?

    • @nishabansal2978  6 months ago  +2

      While salting can increase data size and shuffle overhead in Spark, its benefits in mitigating data skew and distributing the workload more evenly often outweigh these drawbacks. It is also important to choose the salting factor carefully for your workload, since it directly affects how evenly the data ends up distributed.
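
The trade-off discussed in this thread can be illustrated with a small sketch. It is plain Python, not Spark code, and it only models the idea: the skewed side gets a random salt appended to its key, and the other side is exploded once per salt value. The names `SALT_FACTOR`, `salt_large_side`, `explode_small_side`, and the toy data are all hypothetical, chosen for illustration.

```python
import random
from collections import Counter

SALT_FACTOR = 4  # hypothetical salting factor; tune it per workload


def salt_large_side(rows, salt_factor, rng):
    """Attach a random salt in [0, salt_factor) to each key of the skewed dataset."""
    return [((key, rng.randrange(salt_factor)), value) for key, value in rows]


def explode_small_side(rows, salt_factor):
    """Replicate every row of the other dataset once per possible salt value."""
    return [((key, salt), value) for key, value in rows for salt in range(salt_factor)]


def join(left, right):
    """Plain hash join on the (key, salt) pair; the salt is dropped from the output."""
    index = {}
    for k, v in right:
        index.setdefault(k, []).append(v)
    return [(k[0], lv, rv) for k, lv in left for rv in index.get(k, [])]


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Toy data: the key "hot" is heavily skewed on the large side.
large = [("hot", i) for i in range(1000)] + [("cold", i) for i in range(10)]
small = [("hot", "H"), ("cold", "C")]

salted_large = salt_large_side(large, SALT_FACTOR, rng)
exploded_small = explode_small_side(small, SALT_FACTOR)
joined = join(salted_large, exploded_small)

# Rows per (key, salt) group: this is what a partitioner would see after salting.
rows_per_salted_key = Counter(k for k, _ in salted_large)

print(len(small), "->", len(exploded_small))  # the exploded side grows by SALT_FACTOR
print(len(joined))                            # same row count as the unsalted join
print(max(rows_per_salted_key.values()))      # largest group after salting
```

The exploded side is `SALT_FACTOR` times larger, which is the cost raised in the question, while the 1000 "hot" rows are spread over `SALT_FACTOR` groups instead of one. A larger factor spreads the hot key further but replicates the other side more, which is why the choice of factor matters.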

  • @adib4361  20 days ago

    If we just set spark.sql.adaptive.enabled = true on Spark 3 and above, is salting still required?
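
For context on this question, the sketch below lists the two Spark SQL settings involved: Adaptive Query Execution itself, and its separate skew-join switch. The keys are real Spark configuration names; the dictionary name `aqe_conf` and the usage shown in the comments are assumptions for illustration. AQE's skew handling targets joins, so manual salting may still be worth considering for cases it does not cover, such as a skewed aggregation.

```python
# Spark SQL configuration keys for Adaptive Query Execution (AQE).
# The values are illustrative; check the defaults for your Spark version.
aqe_conf = {
    "spark.sql.adaptive.enabled": "true",           # turns AQE on
    "spark.sql.adaptive.skewJoin.enabled": "true",  # lets AQE split skewed join partitions
}

# Hypothetical usage, assuming an existing SparkSession named `spark`:
# for key, value in aqe_conf.items():
#     spark.conf.set(key, value)
```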

  • @payalbhatia6927  6 months ago

    Which pen tablet/device is used for the video? Can you please share?