AWS Tutorials - Incremental Data Load from JDBC using AWS Glue Jobs

  • Published 6 Sep 2024

COMMENTS • 22

  • @susilpadhy9553 · 1 year ago +2

    Please make a video on how to handle incremental load using a timestamp column, that would be really helpful. Thanks in advance. I have watched so many of your videos and they really help.

  • @jnana1985 · 1 year ago +1

    Is it only for inserting new records, or does it also work with updated and deleted records?

  • @rajatpathak4499 · 1 year ago +1

    Great tutorial, keep bringing us more videos on real-time scenarios. It would be great if you could cover Glue workflows in a video: a source, then a Lambda invocation which triggers a Glue job for cataloging, then another trigger for transformation, and after that an insert into the DB which then triggers a Lambda for archiving.

    • @AWSTutorialsOnline · 1 year ago

      Please check my video on the event-based pipeline. I have explained there what you are talking about.

  • @manishchaturvedi7908 · 8 months ago

    Please add a video which leverages a timestamp in the source table to incrementally load data.

  • @brockador_93 · 6 months ago

    Hello, how are you? One question: I created a bookmarked job based on the primary key of the source table. When I update an already processed record, it is not changed in the destination file. How can I make the job understand that there was a change in this record? For example, the table key is the ID field, and the changed field was the "name" field.

  • @fredygerman_ · 11 months ago

    Great video, but can you show an example where you connect to an external database using a JDBC connection, i.e. a database from Supabase?

  • @user-on5zy2gc2u · 1 year ago +1

    Great content. I'm facing an issue while loading data from MS SQL into Redshift using Glue. The scenario is: I have multiple tables regarding customers, with customer ID as the primary key. When we update any phone number or address related to a customer ID, I have to write it into Redshift as a new row, and if any new entry comes it should also get inserted as a new row. Is there any solution for this?

    • @AWSTutorialsOnline · 1 year ago

      You can create a job which filters data from RDS based on the last run datetime and picks records (based on created/modified date greater than the last run datetime). Then insert the picked records into the target database.
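
      A minimal sketch of that timestamp-watermark approach as Glue PySpark code (database, table, and column names are hypothetical; it assumes the last-run watermark is kept in a control table or job parameter):

          from awsglue.context import GlueContext
          from pyspark.context import SparkContext
          from pyspark.sql import functions as F

          glueContext = GlueContext(SparkContext.getOrCreate())

          # Hypothetical watermark; in practice read it from a control table (DynamoDB, S3, ...)
          last_run_ts = "2024-01-01 00:00:00"

          # Read the source table registered in the Glue Data Catalog
          src = glueContext.create_dynamic_frame.from_catalog(
              database="sales_db",
              table_name="customers",
              transformation_ctx="src",
          ).toDF()

          # Keep only rows created or modified after the last successful run
          delta = src.filter(F.col("modified_date") > F.lit(last_run_ts))

          # ... write `delta` to the target and persist the new watermark for the next run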

  • @basavapn6487 · 5 months ago

    Can you please make a video on delta files to achieve SCD Type 1? In this scenario it was a full file, but I want to process incremental files.

  • @shrishark · 9 months ago

    What is the best approach to read a huge volume of data from any on-prem SQL DBs, identify sensitive data, replace it with fake data, and push it to an AWS S3 bucket for specific criteria?

    • @victoriwuoha3081 · 6 months ago

      Redact the data using KMS during processing, before storage.

  • @tcsanimesh · 1 year ago

    Beautiful video!! Can you please add a use case for update and delete as well?

    • @AWSTutorialsOnline · 1 year ago

      In a data lake, you generally do not perform updates and deletes; you only insert. But if you want CRUD operations, then you should think about using Iceberg, Hudi, or Delta Lake on S3.
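
      For illustration, a rough sketch of an upsert on an S3-backed Iceberg table from a Glue Spark job (assumes the job is configured for the Iceberg data lake format with a glue_catalog Spark catalog; database, table, and key names are hypothetical):

          from awsglue.context import GlueContext
          from pyspark.context import SparkContext

          glueContext = GlueContext(SparkContext.getOrCreate())
          spark = glueContext.spark_session

          # Changed rows pulled from the source (names are hypothetical)
          incoming_df = glueContext.create_dynamic_frame.from_catalog(
              database="sales_db",
              table_name="customer_changes",
              transformation_ctx="changes_src",
          ).toDF()
          incoming_df.createOrReplaceTempView("incoming")

          # MERGE gives update/insert semantics on the Iceberg table stored in S3
          spark.sql("""
              MERGE INTO glue_catalog.sales_db.customers AS t
              USING incoming AS s
              ON t.customer_id = s.customer_id
              WHEN MATCHED THEN UPDATE SET *
              WHEN NOT MATCHED THEN INSERT *
          """)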

  • @gulo101 · 1 year ago

    Great video, thank you! A couple of questions: I will be using the data I copy from the JDBC DB to S3 for staging, before it's moved to Snowflake. After I move it to Snowflake, is it safe to delete it from the S3 bucket without any negative impact on the bookmark progress? Also, is there any way to see what the current value of the bookmark is, or manually change it in case of load issues? Thank you.

  • @federicocremer7677 · 1 year ago

    Excellent tutorial and great explanation. Thank you, you got my sub! Just to be sure: if I have an "updated_at" field in my schema, and in my data source (let's say a JDBC Postgres instance) rows are updated daily rather than new rows being inserted, will those updated rows be caught by the new job with bookmarks enabled? If that is correct, do I have to add not only my "id" field but also my "updated_at" field to jobBookmarkKeys?

    • @AWSTutorialsOnline · 1 year ago +1

      You can use key(s) for the job bookmark as long as they meet certain requirements. Here are the rules:
      - For each table, AWS Glue uses one or more columns as bookmark keys to determine new and processed data. The bookmark keys combine to form a single compound key.
      - You can specify the columns to use as bookmark keys. If you don't specify bookmark keys, AWS Glue by default uses the primary key as the bookmark key, provided that it is sequentially increasing or decreasing (with no gaps).
      - If user-defined bookmark keys are used, they must be strictly monotonically increasing or decreasing. Gaps are permitted.
      - AWS Glue doesn't support using case-sensitive columns as job bookmark keys.
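
      For reference, a minimal sketch of specifying user-defined bookmark keys on a catalog read (database, table, and column names are hypothetical):

          from awsglue.context import GlueContext
          from pyspark.context import SparkContext

          glueContext = GlueContext(SparkContext.getOrCreate())

          # transformation_ctx is the name the bookmark state is tracked against
          dyf = glueContext.create_dynamic_frame.from_catalog(
              database="sales_db",
              table_name="customers",
              additional_options={
                  "jobBookmarkKeys": ["id", "updated_at"],
                  "jobBookmarkKeysSortOrder": "asc",
              },
              transformation_ctx="customers_src",
          )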

  • @helovesdata8483 · 1 year ago

    I can't get my JDBC data source to connect with Glue. The only error I get is "test connection failed".

    • @AWSTutorialsOnline · 1 year ago

      Test connection can fail for many reasons:
      1) Not using the right VPC, subnet, and security group associated with the JDBC source
      2) The security group is not configured with the right rules
      3) Not having VPC endpoints (S3 gateway and Glue interface) in the VPC of the JDBC source
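
      On point 2, the connection's security group typically needs a self-referencing inbound rule allowing all TCP traffic from itself. A hedged boto3 sketch (the security group ID is hypothetical):

          import boto3

          ec2 = boto3.client("ec2")
          sg_id = "sg-0123456789abcdef0"  # hypothetical SG attached to the Glue connection

          # Allow all TCP traffic from the security group to itself (self-referencing rule)
          ec2.authorize_security_group_ingress(
              GroupId=sg_id,
              IpPermissions=[{
                  "IpProtocol": "tcp",
                  "FromPort": 0,
                  "ToPort": 65535,
                  "UserIdGroupPairs": [{"GroupId": sg_id}],
              }],
          )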

  • @canye1662 · 1 year ago +1

    awesome vid...100%