How can we do deduplication and keep the last record instead of the first (based on a timestamp field in the dataframe)? The current implementation with dropDuplicates keeps the first occurrence and ignores all subsequent occurrences for that key. How can we tell Spark to update the state and keep the most recent value based on the timestamp field?
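For reference, a minimal sketch of the kind of streaming query being described, assuming a Scala Structured Streaming job with a synthetic `id` key and an `eventTime` column (the column names and the rate source are illustrative, since the actual schema isn't shown here); it demonstrates the keep-first behavior of dropDuplicates that the question is asking to change:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object DedupKeepFirstSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("dedup-keep-first-sketch")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical streaming source: the built-in "rate" source, with a
    // synthetic key column ("id") and the event-time column renamed to
    // "eventTime". In the real job this would be the actual input stream.
    val events = spark.readStream
      .format("rate")
      .option("rowsPerSecond", "5")
      .load()
      .withColumn("id", col("value") % 10)
      .withColumnRenamed("timestamp", "eventTime")

    // Current behavior being asked about: dropDuplicates keeps the FIRST
    // row seen for each id and drops every later row with the same id,
    // regardless of its eventTime. (Without a watermark, the dedup state
    // also grows without bound.)
    val deduped = events.dropDuplicates("id")

    deduped.writeStream
      .format("console")
      .outputMode("append")
      .start()
      .awaitTermination()
  }
}
```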