Most-Asked Interview Question in Apache Spark: ‘Joins’

  • Published 27 Aug 2024

COMMENTS • 23

  • @venkatasai4293
    @venkatasai4293 1 year ago

    Great series. Please include a session on how to handle skew in a join, and on bucket join vs. shuffle hash join.
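    For context, a minimal PySpark sketch of two common skew mitigations touched on here: enabling AQE's skew-join handling and broadcasting the small side of the join. This assumes Spark 3.x, and the dataset names below are made up purely for illustration.

    # Sketch only, assuming Spark 3.x; dataset names are hypothetical.
    from pyspark.sql import SparkSession, functions as F

    spark = (
        SparkSession.builder
        .appName("skew-join-sketch")
        # AQE can detect and split skewed shuffle partitions at runtime.
        .config("spark.sql.adaptive.enabled", "true")
        .config("spark.sql.adaptive.skewJoin.enabled", "true")
        .getOrCreate()
    )

    # Hypothetical inputs: a large, skewed fact table and a small dimension table.
    orders = spark.range(0, 1_000_000).withColumn("customer_id", F.col("id") % 10)
    customers = spark.range(0, 10).withColumnRenamed("id", "customer_id")

    # When one side is small, broadcasting it avoids the shuffle (and the skew) entirely.
    joined = orders.join(F.broadcast(customers), on="customer_id", how="inner")
    joined.explain()  # expect a BroadcastHashJoin rather than a SortMergeJoin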

  • @tusharhatwar
    @tusharhatwar 1 year ago

    Thank you bro for the video.

  • @passions9730
    @passions9730 1 year ago

    Thank you Nilendra for the video; hope to see more videos on this topic on your channel. One small suggestion: please don't add background music in upcoming videos. 😊

  • @sriadityab4794
    @sriadityab4794 1 year ago

    Thanks 👍

  • @Sonuyadav-um9fj
    @Sonuyadav-um9fj 1 year ago

    Thanks sir 🙏

  • @TechnoSparkBigData
    @TechnoSparkBigData 1 year ago +2

    Hi Sir, I would request you to please remove the background music 😊. However, your content is awesome.

    • @dataengineeringforeveryone
      @dataengineeringforeveryone 1 year ago +1

      Point noted. But it seems to me that with a voice-only video, viewers might get bored. Haha

    • @TechnoSparkBigData
      @TechnoSparkBigData 1 year ago +3

      @@dataengineeringforeveryone If the content is good, then nobody will get bored. Content is key.

    • @TechnoSparkBigData
      @TechnoSparkBigData 1 year ago

      This is my video on installing PySpark; I am also a data engineer:
      ua-cam.com/video/nOSXvFd4hoY/v-deo.html

    • @aaroncode2634
      @aaroncode2634 1 year ago

      @@dataengineeringforeveryone Let the background music be, but reduce the volume so it doesn't dominate your voice. Great content btw 😊

  • @nandlalsharma521
    @nandlalsharma521 1 year ago

    Can you provide a link to this document?

  • @karthikr4185
    @karthikr4185 1 year ago

    Hi Nilendra, thank you for sharing.
    I have one famous question: how many cores and executors are required for 100 GB of data? Could you please help us understand this? Thanks in advance.

    • @dataengineeringforeveryone
      @dataengineeringforeveryone 1 year ago

      The number of cores and executors needed to process 100 GB of data will depend on several factors, such as the complexity of the processing logic, the type of data, and the resources available on the Spark cluster.
      Here is a rough estimate based on some general assumptions:
      A single executor can process about 10 GB of data in a reasonable amount of time (again, depending on core capacity).
      A single executor typically runs with about 2-3 cores, depending on the processing logic and the amount of memory required.
      Based on these assumptions, processing 100 GB of data would require:
      10 executors * 2-3 cores per executor = 20-30 cores
      However, this is just a rough estimate and your actual requirements may vary. It's always best to conduct performance testing with representative data to determine the actual resources required for your specific use case.
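      To make that rough estimate concrete, here is a minimal PySpark configuration sketch following those numbers. The memory sizes and shuffle partition count are illustrative assumptions added for the example, not part of the estimate, and in practice these settings are usually passed via spark-submit rather than set in code.

      # Illustrative sizing only; memory figures are assumptions added for the example.
      from pyspark.sql import SparkSession

      spark = (
          SparkSession.builder
          .appName("sizing-sketch-100gb")
          .config("spark.executor.instances", "10")      # ~10 GB of input per executor
          .config("spark.executor.cores", "3")           # 2-3 cores per executor -> 20-30 cores total
          .config("spark.executor.memory", "12g")        # assumed headroom above each executor's share
          .config("spark.executor.memoryOverhead", "2g") # assumed off-heap overhead
          # 100 GB / 128 MB default split size ~= 800 partitions, so tasks stay reasonably sized.
          .config("spark.sql.shuffle.partitions", "800")
          .getOrCreate()
      )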

    • @karthikr4185
      @karthikr4185 1 year ago

      @@dataengineeringforeveryone Great, thanks for taking the time to reply... Your videos helped us a lot... Thanks again.

  • @Someonner
    @Someonner 1 year ago

    Bro, please don't use background music.
