How Spark Creates Partitions || Spark Parallel Processing || Spark Interview Questions and Answers

  • Published 23 Dec 2024

COMMENTS • 17

  • @tejeskhandagale5463 3 years ago +5

    Informative and well explained. Keep posting 👍

  • @ravi19900 1 month ago

    Excellent 👍

  • @onkarlondhe8131 2 years ago

    Sir,
    I have watched many videos related to this topic, but very few people were able to explain these concepts the way you did. This video tempted me to watch the full playlist, and I definitely will.
    Thanks for sharing your knowledge and understanding with us.
    🙏🙏🙌

  • @vijeandran 3 years ago +1

    Really informative, neat explanation. Thank you.

  • @aneksingh4496 2 years ago

    Nice, the key points are explained well.

  • @ksktest187 3 years ago

    Another good effort for aspiring data engineering job candidates. Sound grounding for interview preparation.

  • @ayseak_ 3 years ago +2

    Could you please tell me whether I am getting this right? As I understand it, a partition is a logical division of the data into chunks (the unit of operation that Spark applies).
    So, for example, when we create an RDD with 4 partitions, does it mean that the driver node will read the data, create the partitions, serialize them, and ship them to the worker nodes (where they are deserialized) so that the computations can run in parallel?
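
One way to picture "an RDD with 4 partitions" for an in-memory collection is the sketch below. It is plain Python, not Spark code: the helper `slice_into_partitions` is a hypothetical name, and its index arithmetic is an assumption modelled on how a local collection handed to `parallelize` is cut into contiguous, near-equal slices. It only covers data that already lives on the driver; for data read from files, each task normally reads its own split on the executor instead of receiving it from the driver.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def slice_into_partitions(data: Sequence[T], num_partitions: int) -> List[List[T]]:
    """Cut a local collection into contiguous, near-equal slices (hypothetical helper)."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    n = len(data)
    partitions = []
    for i in range(num_partitions):
        # Integer arithmetic keeps slices contiguous and covers every index once.
        start = (i * n) // num_partitions
        end = ((i + 1) * n) // num_partitions
        partitions.append(list(data[start:end]))
    return partitions


# Ten records split into 4 partitions; each slice could be processed by a separate task.
partitions = slice_into_partitions(list(range(10)), 4)
for index, part in enumerate(partitions):
    print(f"partition {index}: {part}")
```

Each slice is a unit of work that can be handed to a different task, which is what makes the computation parallel.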

  • @kannadigainusa3751 3 years ago +1

    All your videos on Spark are good. Can you number them in the order to watch, from first to last?

    • @cleverstudies 3 years ago

      We will try to do that. Thanks for watching the videos.

  • @dataaholic 3 years ago

    Can you please provide the download link for the CDH you are using?

  • @guptaashok121 2 years ago

    As I understand it, the driver sends the logic or program to each executor so that it reads only its given partition of the data. My doubt is how the driver node creates those instructions, since it does not know exactly what data is present in the file. If it is a big text file, there are no columns, keys, or indexes. How does it make sure that all the data is read by the different executors and that there are no overlaps?
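
A common answer to this question is that splits are planned by byte offset, using only the file size, and each reader then applies a line-boundary rule. The sketch below illustrates that idea in plain Python. It is an assumption modelled on Hadoop-style line-oriented readers, not Spark's actual code; `plan_splits` and `read_split` are hypothetical names, and real split sizes depend on block size and configuration.

```python
from typing import List, Tuple


def plan_splits(file_size: int, split_size: int) -> List[Tuple[int, int]]:
    """Plan byte ranges [start, end) from the file size alone, without reading content."""
    return [(start, min(start + split_size, file_size))
            for start in range(0, file_size, split_size)]


def read_split(data: bytes, start: int, end: int) -> List[bytes]:
    """Return the lines owned by one byte range (hypothetical helper)."""
    pos = start
    if start != 0:
        # Skip the first (possibly partial) line: the previous split owns it.
        newline = data.find(b"\n", start)
        pos = len(data) if newline == -1 else newline + 1
    lines = []
    # A line belongs to this split if it begins at or before `end`,
    # so the reader may run past `end` to finish its last line.
    while pos <= end and pos < len(data):
        newline = data.find(b"\n", pos)
        stop = len(data) if newline == -1 else newline
        lines.append(data[pos:stop])
        pos = stop + 1
    return lines


data = b"alpha\nbravo\ncharlie\ndelta\necho\n"
splits = plan_splits(len(data), split_size=10)
per_split = [read_split(data, start, end) for start, end in splits]
for (start, end), lines in zip(splits, per_split):
    print(f"bytes [{start}, {end}): {lines}")
```

Because every reader follows the same two rules (skip the first line unless the split starts at byte 0, and finish the line that crosses the end), each line is read exactly once even though the split boundaries fall in the middle of lines.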

  • @selvansenthil1 3 years ago

    Thank you

  • @nivedita5639 4 years ago

    Can you explain this question: how do we move all partitions to a single node?

    • @narasimharao7007 4 years ago

      Are you asking about reducing or increasing the number of partitions? Then you can try repartition() or coalesce(). Remember that repartition() works for both increasing and decreasing the number of partitions, but coalesce() will only reduce it.

    • @nva1719 3 years ago

      We can use df.coalesce(1) instead of repartition(1), as coalesce involves little or no shuffle while repartition involves a full shuffle of the data. Minimal shuffling of data is preferred.
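
The difference described in the two replies above can be pictured with a toy model. The sketch below is plain Python over lists, not Spark code: the functions `coalesce` and `repartition` here are hypothetical stand-ins that only mimic the behaviour the replies describe (merging whole partitions versus redistributing every record). Spark's own placement logic is an implementation detail that this sketch does not reproduce.

```python
from typing import List


def coalesce(partitions: List[list], target: int) -> List[list]:
    """Toy model: merge whole partitions into fewer groups, never splitting one."""
    if target >= len(partitions):
        # Without a shuffle the partition count cannot grow.
        return [list(p) for p in partitions]
    merged: List[list] = [[] for _ in range(target)]
    for index, part in enumerate(partitions):
        merged[index * target // len(partitions)].extend(part)
    return merged


def repartition(partitions: List[list], target: int) -> List[list]:
    """Toy model: redistribute every record round-robin, as a full shuffle would."""
    result: List[list] = [[] for _ in range(target)]
    position = 0
    for part in partitions:
        for record in part:
            result[position % target].append(record)
            position += 1
    return result


parts = [[1, 2], [3, 4], [5, 6], [7, 8]]
print(coalesce(parts, 2))     # neighbouring partitions merged, records stay together
print(coalesce(parts, 8))     # still 4 partitions: coalesce only reduces
print(repartition(parts, 8))  # 8 partitions: every record is moved individually
print(coalesce(parts, 1))     # everything collapsed into a single partition
```

In the toy model, as in the replies, reducing to one partition works either way; the coalesce path just avoids touching records that do not need to move.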