Partitioning vs Bucketing | Interview Question | PySpark

  • Published 20 Aug 2024
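  • Partitioning and bucketing are techniques used to optimize data storage and improve query performance in PySpark. Partitioning writes one directory per distinct value of a column, so it suits filters on low-cardinality columns; bucketing hashes a column into a fixed number of buckets, so it suits joins and aggregations on high-cardinality columns. The choice between them depends on the queries that will be run on the data.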
    Sample Data:
    date        product    amount  region
    01-01-2024  Product_0  0       Region_0
    02-01-2024  Product_1  10      Region_1
    03-01-2024  Product_2  20      Region_2
    04-01-2024  Product_1  30      Region_0
    05-01-2024  Product_4  40      Region_1
    06-01-2024  Product_0  50      Region_2
    07-01-2024  Product_1  60      Region_0
    08-01-2024  Product_2  70      Region_1
    09-01-2024  Product_2  80      Region_2
    10-01-2024  Product_4  90      Region_0
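
The difference between the two layouts can be sketched in plain Python using the sample rows above. This is an illustration of the idea, not PySpark itself: `partition_by_region` and `bucket_by_product` are hypothetical helper names, the choice of `region` and `product` as the columns is an assumption, and CRC32 stands in for Spark's own hash function, so the bucket numbers Spark assigns would differ. In PySpark the corresponding writer calls are `df.write.partitionBy("region")` and `df.write.bucketBy(n, "product")` followed by `saveAsTable`.

```python
import zlib
from collections import defaultdict

# Sample rows from the description: (date, product, amount, region)
rows = [
    ("01-01-2024", "Product_0", 0, "Region_0"),
    ("02-01-2024", "Product_1", 10, "Region_1"),
    ("03-01-2024", "Product_2", 20, "Region_2"),
    ("04-01-2024", "Product_1", 30, "Region_0"),
    ("05-01-2024", "Product_4", 40, "Region_1"),
    ("06-01-2024", "Product_0", 50, "Region_2"),
    ("07-01-2024", "Product_1", 60, "Region_0"),
    ("08-01-2024", "Product_2", 70, "Region_1"),
    ("09-01-2024", "Product_2", 80, "Region_2"),
    ("10-01-2024", "Product_4", 90, "Region_0"),
]


def partition_by_region(rows):
    """Partitioning: one group per distinct value, like one directory per value."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[f"region={row[3]}"].append(row)
    return dict(partitions)


def bucket_by_product(rows, num_buckets):
    """Bucketing: hash the column into a fixed number of buckets.

    CRC32 is an assumption used only for illustration; Spark uses its own hash.
    """
    buckets = defaultdict(list)
    for row in rows:
        bucket_id = zlib.crc32(row[1].encode("utf-8")) % num_buckets
        buckets[bucket_id].append(row)
    return dict(buckets)


partitions = partition_by_region(rows)
buckets = bucket_by_product(rows, num_buckets=2)

# The number of partitions grows with the number of distinct values.
for name in sorted(partitions):
    print(name, len(partitions[name]))

# The number of buckets is fixed up front, whatever the number of distinct values.
for bucket_id in sorted(buckets):
    print("bucket", bucket_id, sorted({r[1] for r in buckets[bucket_id]}))
```

A filter on `region` only needs to read the matching partition, while rows with the same `product` always land in the same bucket, which is what lets a join on that column avoid a full shuffle.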
    Check out this video and let me know if you have any questions. We can connect on
    LinkedIn: / priyam-jain-0946ab199
    PWC interview Question:
    • Question 11: PWC Inter...
    • Question 10: PWC Inter...
    Deloitte interview Question:
    • Question 9: Deloitte I...
    Subscribe to @pysparkpulse for more such questions.
    #pyspark #spark #bigdata #bigdataengineer #dataengineering #dataengineer #deloitte #pwc #mnc

COMMENTS • 4

  • @abhishekmalvadkar206 2 months ago +1

    very well explained 👏

  • @rockroll28 2 months ago +2

    Good information.
    Constructive criticism:
    You explained too fast.
    The chart explanation could be one video and the practical part another. That way, two videos of 10 to 12 minutes each would have been more helpful.
    Best of luck 👍🏻

    • @pysparkpulse 2 months ago

      Thank you for your feedback, I will keep this in mind ☺️