Very insightful on SQL, AWS, and data modeling concepts and how they are applied. It helps to recall and better understand the concepts learnt in the big data master course and the SQL LeetCode playlist :)
@sumitmittal07 In the SQL aggregate question where we need to calculate cumulative profit, a bounded ROWS BETWEEN frame isn't needed, since that is for a rolling profit over a range of rows. It can simply be: CUMULATIVE_PROFIT = SUM(profit) OVER (ORDER BY transaction_id, transaction_date). Let me know whether I understood the question correctly.
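To make the cumulative vs. rolling distinction concrete, here is a minimal sketch that runs the window functions through Python's built-in sqlite3 module (window functions need SQLite 3.25 or newer). The table name `transactions`, the sample rows, and the 2-row rolling window are assumptions for illustration, not taken from the interview. With ORDER BY and no frame clause, the default frame runs from the start of the partition to the current row, so it already gives a running total; an explicit ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW gives the same result when the ordering keys are unique.

```python
import sqlite3

def profit_windows(rows):
    """rows: iterable of (transaction_id, transaction_date, profit) tuples."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table layout; only the column names come from the comment.
    conn.execute(
        "CREATE TABLE transactions ("
        "transaction_id INTEGER, transaction_date TEXT, profit INTEGER)"
    )
    conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", rows)
    query = """
        SELECT transaction_id,
               -- Default frame: running total up to the current row.
               SUM(profit) OVER (ORDER BY transaction_id, transaction_date)
                   AS cumulative_profit,
               -- Explicit frame: same running total, spelled out.
               SUM(profit) OVER (ORDER BY transaction_id, transaction_date
                   ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
                   AS cumulative_profit_rows,
               -- Bounded frame: rolling sum over the current and previous row.
               SUM(profit) OVER (ORDER BY transaction_id, transaction_date
                   ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
                   AS rolling_profit_2
        FROM transactions
        ORDER BY transaction_id, transaction_date
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

sample = [(1, "2024-01-01", 10), (2, "2024-01-02", -4), (3, "2024-01-03", 6)]
for row in profit_windows(sample):
    print(row)
```

The first two computed columns match on every row; only the bounded frame behaves like a rolling profit.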
Also, in the partitioning and bucketing question, the interviewee explained the two the other way around.
You are right - Buckets are stored as files. Partitions are stored as directories.
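A rough sketch of that layout, assuming a Hive-style convention where each partition value becomes a `column=value` directory and each bucket becomes a file inside it. The function name, the `bucket_NNNNN.csv` file naming, and the CRC32-based hash are all hypothetical choices for illustration; real engines use their own hash functions and file names.

```python
import os
import tempfile
import zlib

def write_partitioned_bucketed(rows, root, partition_col, bucket_col, num_buckets):
    """Write rows so partitions are directories and buckets are files."""
    written = set()
    for row in rows:
        # Partition: one directory per distinct value of partition_col.
        part_dir = os.path.join(root, f"{partition_col}={row[partition_col]}")
        os.makedirs(part_dir, exist_ok=True)
        # Bucket: hash of bucket_col modulo num_buckets picks the file.
        # CRC32 is an assumption here, not what any specific engine uses.
        bucket = zlib.crc32(str(row[bucket_col]).encode("utf-8")) % num_buckets
        path = os.path.join(part_dir, f"bucket_{bucket:05d}.csv")
        with open(path, "a", encoding="utf-8") as fh:
            fh.write(",".join(str(v) for v in row.values()) + "\n")
        written.add(os.path.relpath(path, root))
    return sorted(written)

rows = [
    {"country": "IN", "user_id": 1, "amount": 50},
    {"country": "IN", "user_id": 2, "amount": 70},
    {"country": "US", "user_id": 3, "amount": 20},
]
with tempfile.TemporaryDirectory() as root:
    layout = write_partitioned_bucketed(rows, root, "country", "user_id", 4)
    for relative_path in layout:
        print(relative_path)
```

The number of directories grows with the distinct partition values, while the number of files per directory is capped by `num_buckets`.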
If no primary key is available, we can pick a candidate attribute and check whether it has a one-to-one relationship with the combination of the other values (for example, in Excel, use a pivot table to check distinct values), and then use sha2 or md5 in ADF to build the surrogate key. Correct me if I'm wrong.
Yes, I was also thinking about md5
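A minimal sketch of the hashing idea, written with Python's hashlib rather than ADF's own expression functions. The helper names, the `|` separator, and the sample columns are assumptions for illustration. It concatenates the chosen columns, hashes them, and then checks that the resulting keys are distinct across the rows, which plays the role of the pivot-table distinct-count check.

```python
import hashlib

def surrogate_key(row, key_columns, algorithm="sha256", sep="|"):
    """Hash the chosen columns of a row into a hex surrogate key."""
    parts = []
    for col in key_columns:
        value = row.get(col)
        # Note: None and "" map to the same text here; handle NULLs
        # explicitly if that distinction matters for your data.
        parts.append("" if value is None else str(value).strip())
    payload = sep.join(parts).encode("utf-8")
    return hashlib.new(algorithm, payload).hexdigest()

def keys_are_unique(rows, key_columns, algorithm="sha256"):
    """True if the chosen columns identify every row uniquely."""
    keys = [surrogate_key(r, key_columns, algorithm) for r in rows]
    return len(set(keys)) == len(keys)

orders = [
    {"customer": "A", "order_date": "2024-01-01", "sku": "X1"},
    {"customer": "A", "order_date": "2024-01-01", "sku": "X2"},
    {"customer": "B", "order_date": "2024-01-02", "sku": "X1"},
]
print(keys_are_unique(orders, ["customer"]))
print(keys_are_unique(orders, ["customer", "order_date", "sku"]))
print(surrogate_key(orders[0], ["customer", "order_date", "sku"], algorithm="md5"))
```

The separator matters: without it, ("ab", "c") and ("a", "bc") would hash to the same key. md5 is fine for deduplication-style keys but is not collision resistant, so a SHA-2 variant is the safer default.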
👌👌