Handle Late or Duplicated Data and Archive Events for On-Demand Replay | 5/5

  • Published 11 Jun 2024
  • Find out how you can use Apache Flink to tackle late or duplicated data and improve data quality with exactly-once processing. We’ll also dive into archiving raw events for on-demand replay or reprocessing with Amazon Data Firehose.
    In this series, Anand Shah (Data Analytics and Streaming Specialist at AWS) helps you build a modern data streaming architecture for a real-time gaming leaderboard. The architecture covers data ingestion, real-time enrichment with database change data capture (CDC), data processing, and computing, storing, and visualizing the results. You will also learn advanced streaming analytics techniques, such as the control channel method for A/B testing, updating features and parameters with zero downtime, and handling late-arriving data. Anand also walks you through data de-duplication and shows how to store historical data for replay on demand. 🎉
    🌟 Get started with Amazon Managed Service for Apache Flink today, to build and run your fully managed Apache Flink applications on AWS!
    🔗 Github repository: github.com/build-on-aws/real-...
    Resources used in this video:
    🔗 Intro to Amazon Data Firehose: docs.aws.amazon.com/firehose/...
    🔗 Data de-duplication with Apache Flink: nightlies.apache.org/flink/fl...
    🔗 Apache Flink late data handling (Watermarking and reordering): nightlies.apache.org/flink/fl...
    🔗 Apache Flink Filesystem source: nightlies.apache.org/flink/fl...
    Continue your learning:
    🔗 Automate deployment and version updates for Amazon Kinesis Data Analytics applications with AWS CodePipeline: aws.amazon.com/blogs/big-data...
    🔗 SQL-based streaming analytics with Apache Flink: github.com/aws-samples/sql-ba...
    🔗 Amazon Managed Service for Apache Flink Workshop: catalog.workshops.aws/managed...
    🔗 Application scaling in Managed Service for Apache Flink: docs.aws.amazon.com/managed-f...
    🔗 Logging and monitoring in Amazon Managed Service for Apache Flink: docs.aws.amazon.com/managed-f...
    🔗 Audit AWS service events with Amazon EventBridge and Amazon Kinesis Data Streams: aws.amazon.com/blogs/big-data...
    Follow AWS Developers:
    👾 Twitch: / aws
    🐦 Twitter: / awsdevelopers
    💻 LinkedIn: / aws
    Follow Anand Shah: 
    🐦 Twitter: / anandshah110
    💻 LinkedIn: / anandshah110
    00:00 Intro
    00:21 Impact of late data arrival
    01:23 How to handle late data arrival
    01:52 Impact of duplicate messages
    02:52 How to de-duplicate data
    03:30 Demo: CDK source code walkthrough and deploy
    05:00 Demo: Handling late arrival of data
    05:26 Demo: Challenge 5.1 - De-duplicate data
    06:04 Demo: Setup Amazon Data Firehose for data archival
    10:32 Demo: On-demand replay of archived data
    11:29 Demo: Challenge 5.2 - Replay data
    11:53 Conclusion
     #LateDataArrival, #ExactlyOnce, #ArchivalAndReplay, #ManagedServiceForApacheFlink
  • Science & Technology
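
The description above names two data-quality problems, duplicated messages and late-arriving data. As a rough illustration of the underlying ideas only, here is a minimal sketch in plain Python. It is not the Apache Flink API and not the code from the video's repository. The `Event` fields, the `process` function, and the `max_out_of_orderness` parameter are hypothetical names chosen for this example, and the watermark rule (highest event time seen so far, minus an allowed delay) is a simplified version of bounded out-of-orderness watermarking.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Event:
    event_id: str    # hypothetical unique ID, used as the de-duplication key
    player: str
    score: int
    event_time: int  # event timestamp in seconds


def process(
    events: Iterable[Event], max_out_of_orderness: int
) -> Tuple[List[Event], List[Event]]:
    """Drop duplicates, then split events into (accepted, late)."""
    seen_ids: Set[str] = set()      # grows without bound in this sketch
    accepted: List[Event] = []
    late: List[Event] = []
    max_event_time: Optional[int] = None

    for event in events:
        # De-duplication: skip any event whose ID was already processed.
        if event.event_id in seen_ids:
            continue
        seen_ids.add(event.event_id)

        # Simplified watermark: highest event time so far minus allowed delay.
        if max_event_time is not None:
            watermark = max_event_time - max_out_of_orderness
            if event.event_time < watermark:
                late.append(event)  # too old: route to a side channel
                continue

        accepted.append(event)
        if max_event_time is None or event.event_time > max_event_time:
            max_event_time = event.event_time

    return accepted, late


sample = [
    Event("e1", "alice", 10, 100),
    Event("e2", "bob", 20, 105),
    Event("e1", "alice", 10, 100),  # duplicate delivery of e1
    Event("e3", "carol", 5, 90),    # arrives well behind the watermark
    Event("e4", "dave", 7, 101),    # out of order, but within the allowed delay
]
accepted, late = process(sample, max_out_of_orderness=5)
print([e.event_id for e in accepted])
print([e.event_id for e in late])
```

In a real streaming job the set of seen IDs would have to be bounded, for example by expiring old entries, and late events would typically be handled by the stream processor's own watermarking support rather than by hand-written logic like this.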
