Watched it one more time and found it the best real-time incremental load video on YouTube. Thanks again 👍👍👍
Hi. Thanks for the tutorial.
However, there is one issue.
I am currently using a client source (source 'A') and MySource (source 'B'), and I am trying to load incremental data from the client database tables into my database tables. As per this solution, we use the watermark table's SourceQuery column to pull data from the client source. So in the copy activity I have the client source as the source and MySource as the sink. But the SourceQuery used in the copy activity's source contains a sub-query.
That sub-query references the watermark table, which is not present in the client source; it is present only in MySource.
So now I am getting an "invalid watermark_config table" error while running the copy activity, for the obvious reason that the watermark table exists only in the MySource database.
Can you assist with how this should work in this scenario?
Is this what we generally follow in organisations, or is there another method as well?
Please also cover the upsert method and how we achieve the same in Databricks.
Hi, is there any way to copy incremental data from multiple SQL tables to Azure Blob Storage?
Great video, but this only works if we are able to create the watermark table in the source database. A lot of the time we do not have write access to the source database, just read. It looks like this pipeline would need to be changed so that the watermark date is read from the target database (or some other database where we have control), and that date is then passed as a parameter to a different source query, which just takes the date as a parameter.
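For example, something roughly like this (just a sketch; watermark_config, LastProcessedValue, dbo.Orders and ModifiedDate are illustrative names, not the exact ones from the video):

-- Lookup activity, run against the target database (where we have write access):
SELECT LastProcessedValue
FROM dbo.watermark_config
WHERE TableName = 'dbo.Orders' AND IsActive = 1;

-- Copy activity source query, run against the read-only client source.
-- The date from the lookup would be injected through dynamic content,
-- e.g. @{activity('Lookup Watermark').output.firstRow.LastProcessedValue}:
SELECT *
FROM dbo.Orders
WHERE ModifiedDate > '<last processed value from the lookup>';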
Great video and really helpful
Glad you liked it
Dear, using this approach we cannot handle the upsert scenario. Could you please help me out with how to handle upsert as well?
Yes, the upsert scenario is different; this is purely a delta (insert-only) load.
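As a very rough sketch only (table, key and column names here are illustrative), an upsert would typically be handled with a MERGE statement on the sink side, for example via a staging table and a stored procedure; in Databricks the equivalent is MERGE INTO on a Delta table:

-- Illustrative T-SQL upsert; dbo.Orders_stg is a staging table loaded by the copy activity
MERGE dbo.Orders AS tgt
USING dbo.Orders_stg AS src
    ON tgt.OrderID = src.OrderID              -- requires a primary key / business key
WHEN MATCHED THEN
    UPDATE SET tgt.Amount = src.Amount,
               tgt.ModifiedDate = src.ModifiedDate
WHEN NOT MATCHED THEN
    INSERT (OrderID, Amount, ModifiedDate)
    VALUES (src.OrderID, src.Amount, src.ModifiedDate);

-- Illustrative Databricks SQL equivalent on a Delta table
MERGE INTO orders AS tgt
USING orders_updates AS src
    ON tgt.OrderID = src.OrderID
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;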
Sir, how do we perform the upsert? Let us know, or give us the link to that video. @@cloudanddatauniverse
My pipeline is failing even on the first run. Kindly guide me.
Can you please explain why we should use the IsActive column? Is it only for filtering purposes? Does IsActive = 1 mean we need to do an incremental load and 0 mean a full load?
Thanks in advance
The IsActive column helps you control which tables you want to load: 1 means load and 0 means do not load. This way we don't need to make any changes to the pipeline, and it still stays dynamic.
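As a minimal sketch (watermark_config and the column names here are just illustrative, matching how the example is set up), the lookup activity only feeds the active rows into the ForEach loop:

-- Illustrative watermark/config table
CREATE TABLE dbo.watermark_config (
    TableName          VARCHAR(200),
    SourceQuery        VARCHAR(MAX),
    LastProcessedValue DATETIME,
    IsActive           BIT
);

-- Lookup activity query: only tables flagged as active are loaded
SELECT TableName, SourceQuery, LastProcessedValue
FROM dbo.watermark_config
WHERE IsActive = 1;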
Good explanation👍🏻
Nice explanation!!!! I have one question: suppose some records were updated in the source with a timestamp earlier than the watermark date value. How can we capture that data as well? Please revert.
Excellent. Thank you!
Welcome 😊
The expression 'split(body('Copy data1ComposeRuntimeVariables')?.AzureSqlTable231218ca8779d4849972127c23583de10.tablename,'.')[1]' cannot be evaluated because array index '1' is outside bounds (0, 0) of array.
I am getting this error. Please, can someone help me with this?
Can you recheck the steps in case you missed something in the video?
I am getting this error: ErrorCode=InvalidTemplate, ErrorMessage=The template validation failed: 'The 'runAfter' property of template action 'ForEach Each ItemScope' contains non-existent action: 'Read the WaterMark Table Scope'
It might be that I have not created the watermark table in the destination database as well. Is that correct?
@item().SourceQuery might be creating the issue. I have created the watermark table in the source database and am using this expression in the source of the copy activity. Need your assistance.
Can you cross-check the steps in the video? Considering this is a complex example, there is a possibility of errors, so kindly check the steps once again.
@@cloudanddatauniverse Thanks so much for the reply. Yes, the issue got resolved :). I am now testing whether my pipeline works as per my expectations. 🙂
Awesome
Hi,
My name is Varun.
I follow your YouTube channel.
Firstly, thank you for the videos.
I saw your multiple-table incremental load video, and I have a doubt.
1. In all the tables you have taken the delta column as a datetime, but in many tables a datetime column won't be there. In this case, what data type should we specify in the stored procedure, because the data type of the LPV (last processed value) will be different for different tables?
Hello Varun,
Thank you for watching our videos. Yes, that is a practical concern: you won't have a datetime column in all tables. In such a case you can choose any one of the ways below:
1. Truncate and load, if the data is small.
2. Upsert, provided there is a primary key or identifier.
3. Use IN/EXISTS to compare which records already exist and copy only the non-existing ones; you need to build a custom query for it, and it can lead to performance issues (see the sketch after this list).
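A rough illustration of option 3, assuming OrderID is the identifier and dbo.Orders_stg is where the copied data lands (all names are placeholders, not the exact ones from the video):

-- Insert only the rows that do not already exist in the target table
INSERT INTO dbo.Orders (OrderID, Amount, ModifiedDate)
SELECT s.OrderID, s.Amount, s.ModifiedDate
FROM dbo.Orders_stg AS s
WHERE NOT EXISTS (
    SELECT 1
    FROM dbo.Orders AS t
    WHERE t.OrderID = s.OrderID
);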
Instead of using the Lookup activity again, we can use a stored procedure.
Nice video, bro, but in a real-time scenario we have a lot of tables, right? We cannot create a source query for each one like you did; if we do, it is a very lengthy process. Please suggest how to overcome this situation.
Thank you for watching. Totally agree. You can't adopt this practice when the scale is huge. But this was an example to get started with, so people know the process and the limitations of this approach.
@@cloudanddatauniverse
Bro, how can we overcome this situation?
By watching this video you need to understand one thing and conclude: a data lake holds raw files which cannot be queried directly, hence you have to load the data into a database and then query it. But later developments like Synapse serverless and Delta Lake make it easy to overcome this limitation.
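Just as an illustration (the storage path and container names are placeholders), with Synapse serverless you can query a raw file in the lake directly with OPENROWSET instead of loading it into a database first:

-- Query raw parquet files in the data lake directly from Synapse serverless SQL
SELECT TOP 100 *
FROM OPENROWSET(
    BULK 'https://<storageaccount>.dfs.core.windows.net/raw/sales/*.parquet',
    FORMAT = 'PARQUET'
) AS rows;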
@@cloudanddatauniverse
So, we overcome this by creating Synapse, correct bro?
Bro, could you please make a session on that? That would be very helpful for us.