Much needed videos
Solution provided by the interviewer is wrong
3rd txn of every user
Cumulative sum of sales amount of each product id
I think the example output for the cumulative sum question is wrong. As I understand it, row 4 of the example output should be 90 instead of 210, because the product_id in the 6th input row differs from the 5th row.
Even the solution shown will give a different output: it partitions by product_id, so the cumulative sums for each product_id come together, like below.
id | product_id | sales_date | sales_amt | cumsum
1  | 1          | 2024-01-01 | 100       | 100
2  | 1          | 2024-01-02 | 150       | 250
4  | 1          | 2024-01-03 | 120       | 370
6  | 1          | 2024-01-04 | 90        | 460
3  | 2          | 2024-01-01 | 200       | 200
5  | 2          | 2024-01-02 | 180       | 380
But the expected output was something else, right? Correct me if I'm wrong.
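A quick way to check this is to run the usual windowed SUM with SQLite from Python. The table name `sales` and the input rows are my reconstruction of the example discussed above, so treat them as assumptions:

```python
import sqlite3

# Hypothetical table and rows reconstructed from the example in the comments.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INT, product_id INT, sales_date TEXT, sales_amt INT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        (1, 1, "2024-01-01", 100),
        (2, 1, "2024-01-02", 150),
        (3, 2, "2024-01-01", 200),
        (4, 1, "2024-01-03", 120),
        (5, 2, "2024-01-02", 180),
        (6, 1, "2024-01-04", 90),
    ],
)

# Cumulative sum per product_id, ordered by sales_date.
rows = conn.execute(
    """
    SELECT id, product_id, sales_date, sales_amt,
           SUM(sales_amt) OVER (
               PARTITION BY product_id
               ORDER BY sales_date
           ) AS cumsum
    FROM sales
    ORDER BY product_id, sales_date
    """
).fetchall()

for r in rows:
    print(r)
```

Running this gives exactly the grouped-by-product output shown in the table above (100, 250, 370, 460 for product 1, then 200, 380 for product 2), which supports the point that the partitioned query cannot produce 210 in row 4.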
Yes
select * from (
    select *,
           ROW_NUMBER() over (partition by user_id order by transaction_date asc) as cnt
    from int1
) aa
where cnt = 3
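For anyone who wants to sanity-check this query, here is a minimal sketch running it with SQLite's window functions from Python. Only the table and column names (`int1`, `user_id`, `transaction_date`) come from the query above; the sample rows and the `amount` column are made up for illustration:

```python
import sqlite3

# Hypothetical transactions table; schema follows the query in the comment above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE int1 (user_id INT, transaction_date TEXT, amount INT)")
conn.executemany(
    "INSERT INTO int1 VALUES (?, ?, ?)",
    [
        (1, "2024-01-01", 10),
        (1, "2024-01-05", 20),
        (1, "2024-01-09", 30),  # 3rd txn of user 1
        (2, "2024-01-02", 40),
        (2, "2024-01-03", 50),
        (2, "2024-01-04", 60),  # 3rd txn of user 2
        (2, "2024-01-06", 70),
    ],
)

# Number each user's transactions by date, keep only the 3rd one.
third = conn.execute(
    """
    SELECT user_id, transaction_date, amount FROM (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY user_id
                                  ORDER BY transaction_date ASC) AS cnt
        FROM int1
    ) aa
    WHERE cnt = 3
    ORDER BY user_id
    """
).fetchall()

print(third)
```

This returns one row per user: their third transaction by date, confirming the ROW_NUMBER approach works.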
@15:00
Q: AQE is already enabled in Spark 3.0; if you are still facing an out-of-memory error, what is the solution?
A: If we increase the shuffle partitions (from the default 200 to more than 200), will the out-of-memory error be resolved?
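One way to think about it: raising `spark.sql.shuffle.partitions` spreads the same shuffle data over more tasks, so each task holds less in memory. Whether that actually fixes the OOM depends on skew (one huge key stays huge no matter how many partitions you add), and AQE may coalesce small partitions back together. A back-of-envelope sketch of the non-skewed case, where all numbers are illustrative assumptions, not Spark defaults:

```python
# Back-of-envelope: why more shuffle partitions can relieve memory pressure.
# All figures below are illustrative assumptions, not measurements.

shuffle_bytes = 400 * 1024**3      # assume 400 GiB of total shuffle data
task_memory = 1 * 1024**3          # assume ~1 GiB usable memory per task

def per_partition(partitions: int) -> float:
    """Average bytes one shuffle partition (hence one task) must handle."""
    return shuffle_bytes / partitions

for n in (200, 400, 800):
    size = per_partition(n)
    fits = size <= task_memory
    print(f"{n:4d} partitions -> {size / 1024**2:7.1f} MiB/partition, fits in task memory: {fits}")
```

Under these assumed numbers, 200 partitions means ~2 GiB per task (likely to spill or OOM), while 400+ partitions brings each task within the assumed memory budget. This only helps when data is evenly distributed across keys.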
Sir, please upload the Python video; it's been more than 2 weeks.