ETL from AWS DynamoDB to Amazon Redshift Using Amazon Kinesis Firehose Delivery Stream & AWS Lambda

  • Published 15 Oct 2024
  • ===================================================================
    1. SUBSCRIBE FOR MORE LEARNING :
    / @cloudquicklabs
    ===================================================================
    2. CLOUD QUICK LABS - CHANNEL MEMBERSHIP FOR MORE BENEFITS :
    / @cloudquicklabs
    ===================================================================
    3. BUY ME A COFFEE AS A TOKEN OF APPRECIATION :
    www.buymeacoff...
    ===================================================================
    This video shows how to perform ETL operations from an AWS DynamoDB stream to Amazon Redshift using an Amazon Kinesis Firehose delivery stream and AWS Lambda.
    The video has a clean walkthrough with a pictorial overview, plus explanations of each service-to-service connection used.
    It shows how to create, configure, and make the required connections to achieve this ETL operation.
    It also includes a Lambda code walkthrough and a final demo of the scenario by executing the ETL pipeline.
    This video helps AWS SMEs, data engineers, architects, etc.
    Files used in the demo can be found at the repo link: github.com/Rek...
    #dynamodb #redshift #kinesisfirehose #etl #aws #awslambda

COMMENTS • 27

  • @SandeepSingh-hn6it
    @SandeepSingh-hn6it 6 months ago +1

    Really, it is very clear and easy to understand. I have some doubts: how does incremental syncing work, how do we avoid duplicate records when syncing to Redshift, and how much delay is there to replicate a unique record to Redshift?

    • @cloudquicklabs
      @cloudquicklabs  6 months ago

      Thank you for watching my videos.
      Glad that it helped you.
      We can do an incremental sync without duplication using an AWS Glue job.
      I shall create a new video on this topic soon.
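
The reply above points to AWS Glue; another common pattern is to have Firehose COPY into a staging table and then merge into the target so re-delivered records do not duplicate. A minimal sketch of the statement builder (the names `customers`, `customers_stage`, and `customer_id` are hypothetical, not from the video):

```python
# Sketch of a delete-then-insert merge from a staging table, the usual way to
# upsert in Redshift (which has no native ON CONFLICT). All names are placeholders.
def build_dedup_sql(target: str, stage: str, key: str) -> list[str]:
    """Build the SQL statements for a staged merge that avoids duplicate rows."""
    return [
        # Drop the old versions of any row that arrived again in the staging table.
        f"DELETE FROM {target} USING {stage} WHERE {target}.{key} = {stage}.{key};",
        # Copy the fresh rows into the target.
        f"INSERT INTO {target} SELECT * FROM {stage};",
        # Empty the staging table for the next incremental batch.
        f"TRUNCATE {stage};",
    ]
```

Running these three statements in one transaction keeps the target consistent even if the same record is delivered twice.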

  • @theskygivesusreasons
    @theskygivesusreasons 1 year ago +1

    Hello! Do you know if you would be able to use Redshift Serverless with Kinesis Firehose instead of Redshift Provisioned Clusters? Thank you for the wonderful video!

    • @cloudquicklabs
      @cloudquicklabs  1 year ago +1

      Thank you for watching my videos.
      As per my reading, Redshift Serverless currently does not support public endpoints, and Kinesis Firehose needs a public endpoint, so this may not be supported yet. I shall create a video on it once it starts supporting this. Thank you.

  • @anujsaraswat864
    @anujsaraswat864 5 months ago +1

    If I am putting JSON-format sample data into Firehose, do I need to specify JSON in the COPY command section, or something else?

    • @cloudquicklabs
      @cloudquicklabs  5 months ago

      Thank you for watching my videos.
      You would need to specify the JSON format there.
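
If the Firehose buffer files contain one JSON object per line, the Firehose "COPY options" field typically reduces to `JSON 'auto'`, which tells Redshift to match top-level JSON keys to column names. A sketch of the full COPY statement Redshift would run (the table, bucket, and role values are placeholders, not from the video):

```python
def build_copy_sql(table: str, s3_path: str, iam_role: str) -> str:
    """Assemble a Redshift COPY statement for newline-delimited JSON data."""
    # JSON 'auto' maps each top-level JSON key to the column of the same name.
    return (
        f"COPY {table} "
        f"FROM '{s3_path}' "
        f"IAM_ROLE '{iam_role}' "
        "JSON 'auto';"
    )
```

Firehose fills in the table and S3 path itself; only the trailing options (`JSON 'auto'`) go in the console's COPY options box.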

  • @ansh1ta
    @ansh1ta 2 years ago +1

    How do you handle updates to the records in DynamoDB tables so they get reflected back to Redshift?

    • @cloudquicklabs
      @cloudquicklabs  2 years ago

      Thank you for watching my videos.
      This would require some customization; you may need another pipeline with a Lambda that updates the record in Redshift whenever it is updated in DynamoDB.

    • @ansh1ta
      @ansh1ta 2 years ago +1

      But can Lambda write to a Redshift table? My impression is that it can only query the tables.

    • @cloudquicklabs
      @cloudquicklabs  2 years ago

      Thank you for watching my videos.
      Yes, Lambda can, as under the hood it executes SQL queries against the Redshift database table.
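
One way a Lambda can write to Redshift without managing a JDBC connection is the Redshift Data API (the `redshift-data` client in boto3). A sketch, where the cluster identifier, database, user, and SQL text are all placeholder values, not from the video:

```python
def build_statement(cluster_id: str, database: str, db_user: str, sql: str) -> dict:
    """Keyword arguments for redshift-data execute_statement (asynchronous SQL)."""
    return {
        "ClusterIdentifier": cluster_id,
        "Database": database,
        "DbUser": db_user,
        "Sql": sql,
    }

def lambda_handler(event, context):
    # boto3 is imported lazily so the helper above can be exercised without AWS.
    import boto3
    client = boto3.client("redshift-data")
    params = build_statement(
        "my-redshift-cluster", "dev", "awsuser",
        "UPDATE customers SET name = 'new-name' WHERE customer_id = '1';",
    )
    # Submits the statement and returns immediately with a statement Id;
    # describe_statement can be polled for completion.
    return client.execute_statement(**params)
```

Because the Data API is asynchronous and HTTP-based, the Lambda does not need to live in the cluster's VPC or hold open database connections.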

  • @ansh1ta
    @ansh1ta 2 years ago +1

    Can you please share what roles and permissions are needed? I am getting an error when Firehose tries to connect to my Redshift cluster. I have opened the security groups to allow all traffic, but am still facing the issue.

    • @cloudquicklabs
      @cloudquicklabs  2 years ago

      Thank you for watching my videos.
      I have given blanket (admin) permission to the role I am using in the video. Maybe if you can share the error from Firehose, I can help you there.

  • @anuragbond913
    @anuragbond913 1 year ago +1

    Has AWS stopped giving a free trial of Redshift? I could not find it for my Redshift cluster. Does anyone have any idea about this?

    • @cloudquicklabs
      @cloudquicklabs  1 year ago

      Thank you for watching my videos.
      I have not heard about that, but you could use the low-cost Dev/Test options here. For more details about the free trial, see aws.amazon.com/redshift/free-trial/

    • @anuragbond913
      @anuragbond913 1 year ago

      @@cloudquicklabs In this video you used the free-tier Redshift. I think AWS stopped the free tier and we have to use a low-cost Redshift cluster instead.
      Just one more thing: can I use Redshift Serverless instead of a Redshift cluster? AWS provides $300 worth of free serverless Redshift.
      Your videos are very helpful. Thanks for the good work 👍😊

  • @liumx31
    @liumx31 1 year ago +1

    Can this workflow be done in step function? Or could the Lambda directly write to Redshift?

    • @cloudquicklabs
      @cloudquicklabs  1 year ago

      Indeed, this scenario could be achieved in many ways with serverless functions. You are right, we could do that.

    • @prashanthm2446
      @prashanthm2446 4 months ago

      @liumx31, I had the same question in my mind, glad you have already asked. Thanks @cloudquicklabs for answering.

  • @khandoor7228
    @khandoor7228 2 years ago +1

    great content on this channel!!

    • @cloudquicklabs
      @cloudquicklabs  2 years ago

      Thank you for watching my videos.
      Appreciate your encouragement here.
      Keep watching and keep learning.

  • @keane26mar30
    @keane26mar30 1 year ago +1

    File "/var/task/lambda_function.py", line 22, in lambda_handler
      firehoseRecord = convertToFirehoseRecord(ddbRecord)
    File "/var/task/lambda_function.py", line 8, in convertToFirehoseRecord
      newImage = ddbRecord['NewImage']

    Hi sir, do you know why I'm getting such an error?

    • @cloudquicklabs
      @cloudquicklabs  1 year ago

      Thank you for watching my videos.
      Did you check whether your DynamoDB column names are the same as those mentioned in the Python code?

    • @keane26mar30
      @keane26mar30 1 year ago +1

      @@cloudquicklabs Okay, but may I know what policies you used for your IAM roles, especially the Redshift ones?

    • @cloudquicklabs
      @cloudquicklabs  1 year ago

      Thank you for coming back on this.
      I have given 'Administrator' access to this role as it is a demo, but in production you should fine-grain it.

    • @keane26mar30
      @keane26mar30 1 year ago

      @@cloudquicklabs AccessDenied
      User: arn:aws:sts::880387018372:assumed-role/voclabs/user2209860=KEANE_LOO_JUN_XIAN is not authorized to perform: redshift:DescribeClusterSubnetGroups on resource: arn:aws:redshift:us-east-1:880387018372:subnetgroup:* because no identity-based policy allows the redshift:DescribeClusterSubnetGroups action
      AccessDenied
      User: arn:aws:sts::880387018372:assumed-role/voclabs/user2209860=KEANE_LOO_JUN_XIAN is not authorized to perform: redshift:DescribeEvents on resource: arn:aws:redshift:us-east-1:880387018372:event:* because no identity-based policy allows the redshift:DescribeEvents action
      AccessDenied
      User: arn:aws:sts::880387018372:assumed-role/voclabs/user2209860=KEANE_LOO_JUN_XIAN is not authorized to perform: redshift:DescribeClusters on resource: arn:aws:redshift:us-east-1:880387018372:cluster:* because no identity-based policy allows the redshift:DescribeClusters action

    • @tusharmalhan2206
      @tusharmalhan2206 1 year ago +1

      @@cloudquicklabs Hi, it's because the Lambda code expects the key "NewImage", which is the cause of the error; the input JSON also requires that key, from which it further extracts the ID, name, phone number, etc.
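
Building on that diagnosis, a defensive version of the converter could guard the `NewImage` lookup instead of raising a KeyError. This is a hypothetical sketch, not the repo's code; attribute names are illustrative:

```python
import json

def convert_to_firehose_record(ddb_record):
    # "NewImage" is present only for INSERT/MODIFY stream events, and only when
    # the table's stream view type includes new images - guard instead of KeyError.
    new_image = ddb_record.get("NewImage")
    if new_image is None:
        return None  # e.g. a REMOVE event; the caller should skip it
    # Strip the DynamoDB type annotations: {"S": "Alice"} -> "Alice".
    flat = {name: next(iter(attr.values())) for name, attr in new_image.items()}
    # Firehose expects a Data payload; newline-delimit for Redshift JSON COPY.
    return {"Data": (json.dumps(flat) + "\n").encode("utf-8")}
```

Skipping records that return None keeps the Lambda from failing the whole batch when the stream contains deletes or key-only events.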