A Deep Dive into Stateful Stream Processing in Structured Streaming 2018 Part 2 (Tathagata Das)

  • Published Dec 3, 2024

COMMENTS • 1

  • @AashishOla 4 years ago +2

    How can we do deduplication and keep the last record instead of the first (based on a timestamp field in the dataframe)? The current implementation of dropDuplicates keeps the first occurrence and ignores all subsequent occurrences for that key. How can we tell Spark to update the state and keep the most recent value based on the timestamp field?
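
One way to get "keep the latest" semantics, not shown in the talk and only a sketch rather than a built-in option of dropDuplicates, is to replace dropDuplicates with arbitrary stateful processing: group by the deduplication key and use mapGroupsWithState to keep, per key, the record with the largest event time seen so far. The Event case class, its column names, and the rate source used here are placeholder assumptions, not anything taken from the video:

    import org.apache.spark.sql.{Dataset, SparkSession}
    import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout, OutputMode}

    // Hypothetical record shape: a dedup key, a payload, and an event-time timestamp.
    case class Event(key: String, value: String, eventTime: java.sql.Timestamp)

    object KeepLatestDedup {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("keep-latest-dedup").getOrCreate()
        import spark.implicits._

        // Placeholder streaming source; in practice this would be Kafka, files, etc.,
        // parsed into Dataset[Event].
        val events: Dataset[Event] = spark.readStream
          .format("rate")
          .load()
          .selectExpr(
            "CAST(value AS STRING) AS key",
            "CAST(value AS STRING) AS value",
            "timestamp AS eventTime")
          .as[Event]

        // For each key, compare incoming records with the record stored in state
        // and keep whichever has the largest eventTime.
        val latestPerKey = events
          .groupByKey(_.key)
          .mapGroupsWithState[Event, Event](GroupStateTimeout.NoTimeout) {
            (key: String, newEvents: Iterator[Event], state: GroupState[Event]) =>
              val candidates = newEvents.toSeq ++ state.getOption
              val latest = candidates.maxBy(_.eventTime.getTime)
              state.update(latest) // overwrite state with the most recent record
              latest               // emit the current latest value for this key
          }

        // mapGroupsWithState emits updated rows, so use Update output mode.
        val query = latestPerKey.writeStream
          .outputMode(OutputMode.Update)
          .format("console")
          .start()

        query.awaitTermination()
      }
    }

Note that GroupStateTimeout.NoTimeout means the per-key state never expires; in a real job you would typically configure an event-time or processing-time timeout (the kind of state cleanup this talk discusses) so stale keys are eventually dropped.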