Big Data Engineer Mock Interview | AWS | Kafka Streaming | SQL | PySpark Optimization

  • Published Aug 20, 2024

COMMENTS • 6

  • @arunsundar3739 · 4 months ago · +3

    Very insightful on SQL, AWS, and data modeling concepts & their applications — helps to recall & better understand the concepts learnt in the big data master course & the SQL LeetCode playlist :)

  • @sonuparmar5836 · 4 months ago · +1

    @sumitmittal07 The SQL aggregate question where we need to calculate cumulative profit won't use ROWS BETWEEN, as that is for a rolling profit over a bounded range; instead it should simply be: CUMULATIVE_PROFIT = SUM(profit) OVER (ORDER BY transaction_id, transaction_date). Let me know if I understood the question correctly or not.
    Also, in the partitioning and bucketing question the interviewee explained it the other way around.

    • @aniruths9900 · 2 months ago

      You are right - Buckets are stored as files. Partitions are stored as directories.
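      The cumulative-profit point above can be checked directly. A minimal sketch in Python with SQLite (window functions need SQLite 3.25+); the `transactions` table, its columns, and the sample values are illustrative, and ordering by date before id is an assumption about the intended chronology:

      ```python
      import sqlite3

      # In-memory database; table name, columns, and rows are illustrative.
      conn = sqlite3.connect(":memory:")
      conn.execute(
          "CREATE TABLE transactions (transaction_id INT, transaction_date TEXT, profit INT)"
      )
      conn.executemany(
          "INSERT INTO transactions VALUES (?, ?, ?)",
          [(1, "2024-01-01", 100), (2, "2024-01-02", -30), (3, "2024-01-03", 50)],
      )

      # With ORDER BY and no explicit frame, the default window frame
      # (RANGE UNBOUNDED PRECEDING TO CURRENT ROW) already yields a running
      # total, so ROWS BETWEEN is only needed for a bounded rolling window.
      rows = conn.execute("""
          SELECT transaction_id,
                 SUM(profit) OVER (ORDER BY transaction_date, transaction_id)
                     AS cumulative_profit
          FROM transactions
          ORDER BY transaction_date, transaction_id
      """).fetchall()
      print(rows)  # [(1, 100), (2, 70), (3, 120)]
      ```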
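      The partitioning-vs-bucketing distinction in the reply above can be sketched as path layout. This is a plain-Python illustration, not Hive/Spark internals: the `country`/`user_id` columns, the warehouse path, the bucket count, and the use of md5 as the bucketing hash are all assumptions for the sake of a deterministic example (Hive and Spark use their own hash functions):

      ```python
      import hashlib

      NUM_BUCKETS = 4  # illustrative; fixed when the table is defined

      def target_path(country: str, user_id: int) -> str:
          """Hive-style layout sketch: each distinct partition value becomes
          a directory; bucketing hashes the key into one of a fixed number
          of files inside that directory."""
          # Partition column -> one DIRECTORY per distinct value.
          partition_dir = f"country={country}"
          # Bucket column -> stable hash modulo the bucket count picks a FILE.
          digest = hashlib.md5(str(user_id).encode("utf-8")).hexdigest()
          bucket = int(digest, 16) % NUM_BUCKETS
          return f"/warehouse/events/{partition_dir}/bucket_{bucket:05d}"

      print(target_path("IN", 42))
      ```

      The practical consequence: filtering on the partition column prunes whole directories, while bucketing helps joins/sampling because matching keys land in the same file index.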

  • @ankandatta4352 · 4 months ago · +3

    In the case of creating a primary key when one is unavailable, we can select any attribute and check whether it has a 1-to-1 relationship with the other composite values (in Excel, using a pivot table to check distinct counts), and then use sha2 or md5 in ADF to form the surrogate key. Correct me if I'm wrong.
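    A minimal sketch of that hash-based surrogate key, using Python's hashlib as a stand-in for SHA2()/MD5() in ADF or Spark SQL; the column values and the `||` delimiter are illustrative assumptions:

    ```python
    import hashlib

    def surrogate_key(*natural_key_parts: str) -> str:
        """Hash the concatenated natural-key columns into a surrogate key.
        A delimiter between parts avoids collisions such as
        ('ab', 'c') vs ('a', 'bc')."""
        joined = "||".join(natural_key_parts)
        return hashlib.sha256(joined.encode("utf-8")).hexdigest()

    # Deterministic: the same composite values always produce the same key,
    # so it is stable across loads. Uniqueness of the chosen attributes
    # should still be validated first (e.g. the pivot-table distinct check
    # mentioned above) before trusting the hash as a key.
    print(surrogate_key("ACME Corp", "2024-08-20", "INV-001"))
    ```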