You explain like a school teacher. I really feel as if my class teacher is teaching me the concepts. Very thankful for your efforts, Mam!!
Your videos are always so exceptional and relevant to real life tasks required at work. Thanks. Keep up the good work
Thanks so much for taking the time to comment and appreciate.. much-needed motivation!!
Exceptional & simple. Many thanks....Happy teacher's day...
Thanks so much Abhishek 🙏
Is this solution applicable for a source DB with millions of records? The reason I ask is, how will this hash comparison work in the case of millions of records? Will it have performance issues?
Very good explanation and nice scenario👍
Can you please share the query you wrote to create the hash column?
When I tried it, I got the same value for all the rows in the hash column.
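(The exact query isn't shown here, but a minimal sketch of the same idea inside a mapping data flow derived column, using illustrative column names like CustomerId and Email, would be:
    Hash = sha2(256, CustomerId, FirstName, LastName, Email)
or, to hash every incoming column without naming them, Hash = sha2(256, columns()). Getting the same value on every row usually means constants were hashed instead of column values, e.g. quoted column names, which the expression language treats as string literals:
    Hash = sha2(256, 'CustomerId', 'FirstName')   -- same literal inputs on every row, so the same hash)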
Hi Madam, the video is good. I have a few doubts.
1) Why didn't you use a watermark table? This is not a full-history (SCD-2) load. With your approach, performance may be affected since we compare all records from the target.
2) Which activities did you use? It's difficult to follow in the video because the activities were renamed. Could you list them for me?
Thanks, Madam.
The Lookup activity has a limit of 5k rows, right? How do we deal with an input that has 1 million rows?
Really very helpful. Thanks for creating this video
Thanks a lot for your feedback
Very good explanation madam
Thanks, glad it helps.
Hi Mam, we don't have a date column on the source side. Can we still implement the same process?
Please let me know why the lookup is needed; we already have the conditional split, right?
Thanks for sharing such skills
Thanks.
Hi Madam, how can we convert different date formats into one date format?
For example, 'yy/mm/dd' or 'dd/mm/yyyy' into the 'yyyy-MM-dd' format.
Can we implement this in an Azure data flow?
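(A minimal sketch with mapping data flow expressions, assuming the incoming column is a string named OrderDate: parse it with the pattern that matches the source, then format it back out. Note that months are 'MM'; lowercase 'mm' means minutes.
    toDate(OrderDate, 'dd/MM/yyyy')                              -- parse 'dd/MM/yyyy' strings into a date
    toString(toDate(OrderDate, 'dd/MM/yyyy'), 'yyyy-MM-dd')      -- output the date as 'yyyy-MM-dd' text)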
Hi Mam,
please respond to this urgent query.
I have a time value in a CSV file; how do I convert it into a time type in Data Factory? I don't have a date, so I need to convert the CSV time field into a time format only.
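(Mapping data flows have date and timestamp types but no separate time-only type, so one possible approach, assuming the CSV column is named EventTime and holds values like '13:45:30', is to attach a placeholder date or keep the value as a normalized string:
    toTimestamp(concat('1900-01-01 ', EventTime), 'yyyy-MM-dd HH:mm:ss')                         -- timestamp with a placeholder date
    toString(toTimestamp(concat('1900-01-01 ', EventTime), 'yyyy-MM-dd HH:mm:ss'), 'HH:mm:ss')   -- time kept as a formatted string)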
Ma'am, what is the difference between the Switch activity and the If Condition activity in ADF? Please reply.
Isn't this the same as Alter Row (upsert)? We can achieve the same result that way, right?
Hello Mam, I need some suggestions. I need to build an incremental data extraction pipeline in ADF. ServiceNow is my source; I am extracting data in JSON format and storing it in Blob storage. I need to extract only the latest updated or inserted data from ServiceNow.
Thanks for knowledge sharing
You are welcome Orion !!
Let me know if my understanding is incorrect, but isn't this similar to the upsert operation, and can't this be achieved using the Alter Row --> upsert option as before? Also, this looks structurally the same as the SCD component's output in SSIS!
Hi, I need to copy data from 5 tables in Azure Data Lake to 1 table in Cosmos DB. We need a particular field based on the relationships. Thanks in advance.
Thanks a lot for your help.
Glad it helped
Please make this incremental load dynamic; it will help us a lot.
good info without bla bla
Hello Ma'am,
I have a problem with the incremental load. I want to create an incremental pipeline from an on-premises Oracle server to Azure Data Lake (Blob storage); I don't have Azure SQL. I just want to push the data into Blob storage as a CSV file. In my case, I am confused about where I should create the watermark table. Someone told me that in my case I have to use Parquet data. Please help me with this; I have been stuck for many days.
Hmm. Since your source is on-premises, we can't use a data flow; otherwise, we could implement the logic as shown in ua-cam.com/video/evqQRwsF_Ps/v-deo.html
Thanks for sharing your knowledge. Could you do a video on how to delete target SQL table rows that do not exist in the source file? I tried it with "doesn't exist" but it gives weird results: 5 records are missing from the source yet "doesn't exist" shows 30 records, and I'm not sure why. Thanks in advance.
Sure
Very nice...
Thanks.. Found very helpful 😊
Thanks much 👍
Mam, I have a doubt about the fault tolerance part in ADF. I have configured an ADLS Gen2 storage account for writing the log, and I'm getting this error:
"Azure Blob connect cant support this feature for hierarchical namespace enabled storage accounts, please use azure data lake gen2 linked service for this storage account instead".
The thing is, I'm already using Azure Data Lake Store Gen2, but I'm still receiving the error. Can you help me fix this?
That's odd. Can you please share the settings where you write the log, along with the error, to funlearn0007@gmail.com?
That was clearly explained... however, it would have been even more useful if you had actually dragged the components and set up the whole thing manually.
How can we identify if a record is deleted in the source? How do we capture that in the target?
Thanks. Really helpful
Thanks.
Thanks for this video ma'am
Welcome 🙏
I need an urgent solution, can you please respond soon...
Hello Mam, how do we load data into a database for which no sink connector is available, for example MySQL or PostgreSQL? Azure has the option as a source but does not support it as a sink. In that case, how do we load data into that DB?
Is there no connector at all, or do you just not have an option to load directly?
Export the data to a CSV and then consume it in that DB. Does this make sense?
As you are simply overwriting, i.e. not doing SCD Type 2/3, there is no need for the hash key. You could simply have used the PK of the target table and used a lookup to check whether that PK (unique value) is already present or not - IMO.
You are right, I just wanted to explain the hashing mechanism as one of my subscribers asked for the steps. And, thanks for your comment👍
@@AllAboutBI No problem - it was my pleasure. I rather wanted to validate whether I was missing anything or not :)
@@rajeevsharma2664 For updated columns, if we do not have a hash key and there are 20+ columns, we have to compare all of them individually, right? So won't hashing help in those situations?
The notEquals operator accepts two expressions, but you mention (hashColumn, Hash); what does that mean?
What's more, you didn't declare or create those columns.
hashColumn comes from my table.
Hash comes from the derived column transformation applied to all the incoming rows.
The notEquals operator compares the two.
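(To make that concrete, a minimal sketch with illustrative names rather than the exact ones from the video: the derived column on the source stream computes a hash of the incoming row, the lookup brings hashColumn in from the target table, and the condition flags changed rows for update.
    Hash = sha2(256, columns())        -- derived column over all incoming columns
    notEquals(hashColumn, Hash)        -- true when the stored hash differs from the new hash)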