20. Get Latest File from Folder and Process it in Azure Data Factory

  • Published 12 May 2021
  • In this video, I discuss how to get the latest file from a folder and process it in Azure Data Factory.
    Link for Azure Databricks playlist:
    • 1. Introduction to Az...
    Link for Azure Functions playlist:
    • 1. Introduction to Azu...
    Link for Azure Basics playlist:
    • 1. What is Azure and C...
    Link for Azure Data Factory playlist:
    • 1. Introduction to Azu...
    Link for Azure Data Factory Real-time Scenarios playlist:
    • 1. Handle Error Rows i...
    Link for Azure Logic Apps playlist:
    • 1. Introduction to Azu...
    #Azure #ADF #AzureDataFactory
  • Science & Technology
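The description above covers the pipeline pattern: list a folder's files, check each file's last-modified time, and keep the newest. The following is a rough local analogue of that selection in Python. It assumes plain files on a local disk rather than ADF datasets and Get Metadata activities, and the function name is illustrative.

```python
from pathlib import Path
from typing import Optional


def latest_file(folder: str) -> Optional[Path]:
    """Return the most recently modified file directly inside `folder`.

    Only the top level is scanned (similar to listing a folder's child
    items); subfolders are skipped. Returns None when there are no files.
    """
    files = [p for p in Path(folder).iterdir() if p.is_file()]
    # st_mtime is the last-modified time in seconds since the epoch.
    return max(files, key=lambda p: p.stat().st_mtime, default=None)
```

Using `max` with a key keeps a single running maximum. The ForEach / If Condition / Set Variable combination emulates the same thing inside ADF.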

COMMENTS • 79

  • @vijaybodkhe8379
    @vijaybodkhe8379 8 months ago +2

    I think Set Variable 2 (i.e. PreviousModifiedDate) should have been inside the If Condition. The current file's modified time should always be compared with the highest modified time among the previous files.
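The issue raised here can be reproduced outside ADF. This sketch models the ForEach loop as a plain Python loop over (name, lastModified) pairs. The function names and the 1900-01-01 starting value are illustrative assumptions, not taken from the video.

- `latest_buggy` overwrites PreviousModifiedDate on every iteration.
- `latest_fixed` updates both variables only in the If -> True branch, as several commenters suggest.

```python
from datetime import datetime
from typing import List, Optional, Tuple

Item = Tuple[str, datetime]          # (file name, last-modified time)
START = datetime(1900, 1, 1)         # assumed initial PreviousModifiedDate


def latest_buggy(items: List[Item]) -> Optional[str]:
    """PreviousModifiedDate is set on every iteration (outside the If)."""
    previous, latest = START, None
    for name, modified in items:
        if modified > previous:
            latest = name            # Set Variable 1, inside If -> True
        previous = modified          # Set Variable 2, outside the If: the bug
    return latest


def latest_fixed(items: List[Item]) -> Optional[str]:
    """Both Set Variable steps run only when the If condition is true."""
    previous, latest = START, None
    for name, modified in items:
        if modified > previous:
            latest = name
            previous = modified      # previous now holds the running maximum
    return latest
```

The three timestamps listed further down by @Akshaya.medagam are File1 10:53, File2 10:51 and File3 10:52. With those inputs, `latest_buggy` returns File3 and `latest_fixed` returns File1.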

  • @MaheshReddyPeddaggari
    @MaheshReddyPeddaggari 3 years ago +1

    I have been waiting for these kinds of scenarios.
    Thanks Maheer

  • @roshankumargupta46
    @roshankumargupta46 3 years ago +1

    Thanks for this video! Can you also create a video explaining how to verify source and target tables? For example, how can we verify that all the rows and column values were copied correctly using Data Factory?
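One hedged way to sanity-check a copy is to compare row counts and the multiset of rows, regardless of order. This assumes both tables are small enough to read into memory as rows of hashable values. It is a generic Python sketch, not an ADF feature. For large tables you would more likely compute counts or checksums in the databases themselves.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Row = Tuple[object, ...]


def compare_tables(source: Iterable[Row], target: Iterable[Row]) -> Dict[str, int]:
    """Order-insensitive comparison of two row sets; duplicate rows are counted."""
    src = Counter(tuple(r) for r in source)
    tgt = Counter(tuple(r) for r in target)
    return {
        "source_rows": sum(src.values()),
        "target_rows": sum(tgt.values()),
        # Counter subtraction keeps only positive counts.
        "missing_in_target": sum((src - tgt).values()),
        "unexpected_in_target": sum((tgt - src).values()),
    }
```

A clean copy shows equal row counts and zero in both difference fields. Any changed value (e.g. "b" vs "B") appears as one missing row plus one unexpected row.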

  • @AI-Health-posts
    @AI-Health-posts 3 years ago +1

    Thanks Maheer, I was looking for this video. Cheers!

    • @WafaStudies
      @WafaStudies 3 years ago

      Welcome 🤗

    • @vru5696
      @vru5696 4 months ago

      @WafaStudies Thank you for your videos. Can you please create a video on extracting a file from a SharePoint location and loading it into a SQL table? Thanks

  • @user-iv5mg9bi5d
    @user-iv5mg9bi5d 11 months ago

    Hi Maheer, I'm learning ADF by watching your videos; it's an amazing series. I just want to cross-check: I think we need to use both Set Variable activities in the True branch only, setting the previous modified date first and then the latest file name variable. Only then does it work correctly in my case. Thanks

  • @ronaldorn84
    @ronaldorn84 2 years ago +1

    Wonderful tips! Thanks, you help a lot.

  • @sayonbhattacharjee7470
    @sayonbhattacharjee7470 2 years ago +4

    Amazing video...
    Just wanted to cross-check: I think the Set Variable activity (last modified date) should also go under the If activity. Only then does it work correctly in my case.

  • @Kirbys911Heaven
    @Kirbys911Heaven 2 years ago +1

    Super helpful. Thank you very much.

  • @HnkBnnndk
    @HnkBnnndk 2 years ago

    Nice and clear explanation. But when I try this on files in different subfolders, the mechanism doesn't work with wildcards for subdirectories. Do you have a solution for that?
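For files spread across subfolders, the core change is to search recursively instead of listing only the top-level children. Below is a local Python analogue using `Path.rglob`. It assumes files on a local file system, and the function name and `pattern` parameter are illustrative.

```python
from pathlib import Path
from typing import Optional


def latest_file_recursive(root: str, pattern: str = "*") -> Optional[Path]:
    """Newest file matching `pattern` anywhere under `root`, including subfolders."""
    # rglob(pattern) walks every subdirectory; directories are filtered out.
    files = [p for p in Path(root).rglob(pattern) if p.is_file()]
    return max(files, key=lambda p: p.stat().st_mtime, default=None)
```

For example, `latest_file_recursive("landing", "*.csv")` would consider only CSV files at any depth under a hypothetical `landing` folder.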

  • @abbasfatehm7149
    @abbasfatehm7149 1 year ago +1

    Thank you so much, sir, for always helping.

  • @prathaps419
    @prathaps419 1 year ago

    Hi Maheer, thanks for your efforts in making these videos; they are really very helpful. I am looking for a similar scenario, but instead of a file I need to get the latest table records from SQL Server. Can you please explain how to get them?
    Thanks
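For tables rather than files, the usual equivalent is a column that records when each row changed. This sketch uses Python's built-in `sqlite3` with a hypothetical `orders` table and `modified_at` column stored as ISO-8601 text, which sorts chronologically. On SQL Server the latest-row query would use `SELECT TOP (1) ... ORDER BY modified_at DESC` instead of `LIMIT 1`.

```python
import sqlite3
from typing import List, Optional, Tuple


def rows_changed_since(conn: sqlite3.Connection, watermark: str) -> List[Tuple[int, str]]:
    """(id, modified_at) rows changed after `watermark`, oldest first."""
    cur = conn.execute(
        "SELECT id, modified_at FROM orders WHERE modified_at > ? ORDER BY modified_at",
        (watermark,),
    )
    return cur.fetchall()


def latest_row(conn: sqlite3.Connection) -> Optional[Tuple[int, str]]:
    """The single most recently modified row, or None for an empty table."""
    cur = conn.execute(
        "SELECT id, modified_at FROM orders ORDER BY modified_at DESC LIMIT 1"
    )
    return cur.fetchone()
```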

  • @annukumari9629
    @annukumari9629 3 years ago +3

    Very informative 😊 Thanks for sharing

    • @WafaStudies
      @WafaStudies 3 years ago

      Thank you Madam 😊

    • @tusharchirame820
      @tusharchirame820 1 year ago

      @WafaStudies could you please share your email ID? I have a question.

  • @sravankumar1767
    @sravankumar1767 2 years ago

    superb.........

  • @rajabhakshi3333
    @rajabhakshi3333 1 year ago

    Thanks for this video

  • @mysahil25
    @mysahil25 2 years ago +7

    Hi sir, I really like your video series and am learning from it. I believe there is one issue in the above implementation: setVariable2 should also go under the If Condition's True branch, together with the first Set Variable. It works in your case because your latest file is the last file processed in the ForEach loop, but if it is not the last one, the pipeline will not copy that file. Please check.

    • @sourabhgupta1428
      @sourabhgupta1428 2 years ago

      Thanks for highlighting this issue; I also noticed while implementing it that it won't work if the first file is the latest file. @Bandhu Gupta: can you please suggest where we need to make a correction to implement the logic correctly?

    • @shalakapowar0707
      @shalakapowar0707 2 years ago +2

      @sourabhgupta1428 you can put both variables, PreviousModifiedDate and LatestFileName, inside the If -> True activities. Outside the If activity, you can add a new variable and assign it @variables('LatestFileName'); this will give you the latest file name.

    • @venkatasatishnamana4681
      @venkatasatishnamana4681 1 year ago +1

      Just move the second Set Variable activity into the If -> True activities, before or after the first Set Variable activity, and connect them. There is no need to modify anything else.

  • @anmolganju1864
    @anmolganju1864 2 years ago

    What if I have a date-level hierarchy in Data Lake Gen2, where each table has a folder structure like /table1/2022/01/03 ... /table1/2022/01/10 with the files stored there? How should I pick the latest file in this case?
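When the date is encoded in the folder path (for example /table1/2022/01/10), one option is to parse the trailing year/month/day segments, pick the maximum, and then list the files inside that folder. The Python sketch below assumes paths end in YYYY/MM/DD segments; any other path is skipped. The function name is illustrative.

```python
from datetime import date
from typing import List, Optional


def latest_partition(paths: List[str]) -> Optional[str]:
    """Return the path whose trailing /YYYY/MM/DD segments form the latest date."""
    best, best_date = None, None
    for path in paths:
        parts = path.rstrip("/").split("/")
        if len(parts) < 3:
            continue
        try:
            # int() rejects non-numeric segments; date() rejects e.g. month 13.
            d = date(int(parts[-3]), int(parts[-2]), int(parts[-1]))
        except ValueError:
            continue
        if best_date is None or d > best_date:
            best, best_date = path, d
    return best
```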

  • @abhishekrana5626
    @abhishekrana5626 3 years ago +2

    Hi sir, thanks for such simple and very informative videos. Can you make a video on how to resume a failed Copy activity from where it failed, rather than from the start? How can we achieve that?

  • @dineshdeshpande6197
    @dineshdeshpande6197 2 years ago

    Sir, can we sort the list of files we get in the JSON by last modified date in descending order and take the latest modified file?
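Sorting works too, assuming each file's name and last-modified time have already been collected into a list. In the video, the per-file time comes from a second Get Metadata activity. The dictionary keys and function name below are illustrative.

```python
from datetime import datetime
from typing import Dict, List


def latest_by_sort(items: List[Dict[str, str]]) -> str:
    """Sort by lastModified, newest first, and return the first name.

    Raises IndexError if `items` is empty.
    """
    ordered = sorted(
        items,
        key=lambda item: datetime.fromisoformat(item["lastModified"]),
        reverse=True,
    )
    return ordered[0]["name"]
```

Sorting costs O(n log n), whereas a single pass with `max` is O(n). For picking one latest file, the single pass is enough; sorting pays off only when you need the whole ordering.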

  • @datatuber
    @datatuber 3 years ago +1

    Thanks for sharing 👍

  • @aishwaryam8520
    @aishwaryam8520 2 years ago +1

    Hello sir,
    Can you please tell me what happens if we get two files with the same last modified date and time?
    What can be done about this?
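With a strict `greater` comparison, the loop keeps only the first of several files that share the newest timestamp. That is a likely explanation for the "only file A gets copied" question below as well. If all of those files should be processed, one option is to collect every file whose time equals the maximum. The following is a minimal Python sketch with an illustrative function name.

```python
from datetime import datetime
from typing import List, Tuple


def latest_files(items: List[Tuple[str, datetime]]) -> List[str]:
    """Return every file name that shares the newest last-modified time."""
    if not items:
        return []
    newest = max(modified for _, modified in items)
    return [name for name, modified in items if modified == newest]
```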

  • @govardhanbola1195
    @govardhanbola1195 1 year ago

    We are processing files from an SFTP location, but each time we upload a new file to the SFTP location and run the pipeline, it processes the already-processed files along with the new one. As the number of files keeps growing, this is becoming a problem. Instead, once a file is processed, we want to move it to an archive folder in the SFTP location so that only the latest file is processed in the next run. How can we do this?
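Below is a sketch of the move-after-processing pattern. It assumes a local file system and hypothetical inbox/archive folders, and `process` is a placeholder for whatever loads the file. In ADF, a comparable flow would be a Copy activity to the archive path followed by a Delete activity on the source file.

```python
import shutil
from pathlib import Path
from typing import Callable, List


def process_and_archive(inbox: str, archive: str,
                        process: Callable[[Path], None]) -> List[str]:
    """Process every file in `inbox`, then move each one to `archive`.

    A file is moved only after `process` returns without raising, so if
    processing fails, that file (and any not yet reached) stays in the inbox
    and is picked up again on the next run.
    """
    archive_dir = Path(archive)
    archive_dir.mkdir(parents=True, exist_ok=True)
    done = []
    for path in sorted(p for p in Path(inbox).iterdir() if p.is_file()):
        process(path)
        shutil.move(str(path), str(archive_dir / path.name))
        done.append(path.name)
    return done
```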

  • @abhishekmehta5193
    @abhishekmehta5193 2 years ago

    Hi everyone,
    Quick question: when I upload two files (A and B) at the same time, it only copies file A and not file B. Can you please help me with the logic for handling two files uploaded at the same time?

  • @mabunnicherukuri6751
    @mabunnicherukuri6751 1 year ago

    Thanks for this video. Can you please explain this case: our input is daily Excel files, and we want the latest file name with its last modified date. How do we implement this as an ADF pipeline?

  • @joyyoung3288
    @joyyoung3288 3 years ago

    What file is used for the dataset at the very beginning? Is it a static CSV that contains all the file names? I can't follow.

  • @user-sw1cg7wv7d
    @user-sw1cg7wv7d 6 months ago

    How do we fetch files from more than one folder and trigger the respective pipeline or Databricks notebook?

  • @multipleaccounts9207
    @multipleaccounts9207 6 months ago

    @WafaStudies thanks for your explanation. But this solution is not scalable, right? As the number of files grows, the ForEach loop has to check every file each day to find the latest one. Is there a more scalable solution you would suggest?

  • @gopalammanikantarao593
    @gopalammanikantarao593 1 year ago

    Hi sir, could you please help me with this requirement: how do I get the oldest file from a folder and process it in Azure Data Factory?
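Getting the oldest file only flips the comparison. In ADF terms, that would mean using `less` instead of `greater` and initializing the tracking variable with a far-future date. Below is a minimal plain-Python version of that loop; the function name is illustrative.

```python
from datetime import datetime
from typing import List, Optional, Tuple


def oldest_file(items: List[Tuple[str, datetime]]) -> Optional[str]:
    """Same loop as the latest-file logic, with the comparison reversed."""
    oldest_name, oldest_time = None, None
    for name, modified in items:
        # Both values are updated together, only when a smaller time is found.
        if oldest_time is None or modified < oldest_time:
            oldest_name, oldest_time = name, modified
    return oldest_name
```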

  • @varung2911
    @varung2911 3 years ago +2

    Hi Maheer, can't we have both Set Variable activities inside the If Condition's True activities?

  • @gautampoddar3392
    @gautampoddar3392 3 months ago

    It asks me to provide a FileName in the first Get Metadata activity. What should I give there? Can someone please help?

  • @bhawnabedi9627
    @bhawnabedi9627 3 years ago +1

    👍🏻👍🏻

  • @sriramch3128
    @sriramch3128 2 years ago

    Is this like incremental loading?
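The two are related but not the same. The video picks a single newest file. Incremental loading typically processes every file newer than a stored watermark and then advances the watermark. Below is a Python sketch of that idea; where the watermark is persisted between runs (for example, a control table) is left as an assumption.

```python
from datetime import datetime
from typing import List, Tuple

Item = Tuple[str, datetime]   # (file name, last-modified time)


def new_files_since(items: List[Item], watermark: datetime) -> Tuple[List[str], datetime]:
    """Files modified after `watermark` (oldest first) plus the advanced watermark."""
    fresh = sorted((item for item in items if item[1] > watermark), key=lambda item: item[1])
    new_watermark = fresh[-1][1] if fresh else watermark
    return [name for name, _ in fresh], new_watermark
```

Running it again with the returned watermark yields no files until something newer arrives. That is what stops already-processed files from being picked up twice.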

  • @MoHz-rx5my
    @MoHz-rx5my 8 months ago

    I think the solution would be to use a PySpark or Python notebook? Am I right?

  • @abdullahmukminahmad4479
    @abdullahmukminahmad4479 1 year ago

    Hi Wafa, how do we extract a file from a folder partitioned by date?

  • @birendrasinghrawat9614
    @birendrasinghrawat9614 2 years ago +2

    Thank you Maheer, the video is really very good. But this will not get the latest file from the folder in every scenario. If there are 4 files in the folder, where the first file is the latest and the last file is the second latest, it will pick the second-latest file, because the Previous_Modified_Date variable is updated with Last_modified_date in every iteration of the ForEach loop.

    • @sourabhgupta1428
      @sourabhgupta1428 2 years ago

      Thanks for highlighting this issue; I also noticed while implementing it that it won't work if the first file is the latest file. @birendra singh rawat: can you please suggest where we need to make a correction to implement the logic correctly?

    • @anmolganju1864
      @anmolganju1864 2 years ago +1

      @sourabhgupta1428 Try running the ForEach loop without the Sequential option; it should give the correct output.

    • @sfn1231
      @sfn1231 1 year ago

      @sourabhgupta1428 Store the previous date only when the condition is true, just like the file name. That way the previous-date variable always holds the latest date, so this should work.

  • @papachoudhary5482
    @papachoudhary5482 3 years ago +2

    Thanks

  • @kausarnafisa5804
    @kausarnafisa5804 2 years ago

    This logic is good, thanks for it, but it isn't working: we need to set the previous modified date inside the If Condition activity; only then will it work.

  • @chaitanyapanchal1312
    @chaitanyapanchal1312 1 year ago

    Hi Maheer,
    I think this is not a feasible solution, because as the number of files increases, the number of comparisons also increases. Suppose we have 50,000 files; then there will be 50,000 comparisons, which will decrease performance.

    • @MoHz-rx5my
      @MoHz-rx5my 8 months ago

      Yes, I think it's better to use a PySpark or Python notebook.

  • @shivag7777
    @shivag7777 2 years ago +1

    Hi Maheer, this does not work when you have two or more modified files: it takes the date of the last file in your input folder rather than the latest last-modified date, so some of the logic in this video looks wrong. Please try it with 7 to 8 files that have different, random modified dates. Kindly have another look, and let me know if you want more details.

    • @tarakakrishnavemula3949
      @tarakakrishnavemula3949 1 year ago

      This is exactly my doubt too. Hi bro, please give a solution.

    • @gowthamprasad7182
      @gowthamprasad7182 1 year ago

      Instead of PreviousModifiedDate, we have to store PreviousMaxModifiedDate using a Set Variable activity in the If Condition, along with the LatestFileName Set Variable.

    • @bhagyashree4744
      @bhagyashree4744 1 year ago

      Hi, did you find out how to store multiple files using the last modified date?

  • @CitizenIndia143
    @CitizenIndia143 3 years ago +1

    Hi @WafaStudies, can't we get the modified date in Get Metadata1 itself?

    • @WafaStudies
      @WafaStudies 3 years ago

      No, because your first Get Metadata dataset points to the folder, so if you try to get lastModified there, it will give you the folder's last-modified info.

    • @thendralponnusamy7973
      @thendralponnusamy7973 3 years ago

      @WafaStudies what if we have files in multiple subfolders under the root folder?

  • @sriramch3128
    @sriramch3128 2 years ago

    What if it is SQL?

  • @Akshaya.medagam
    @Akshaya.medagam 1 year ago

    It wouldn't work as expected if you had files like the following:
    File1 8/10/2022 10:53:00:00
    File2 8/10/2022 10:51:00:00
    File3 8/10/2022 10:52:00:00

  • @neerajnaik5161
    @neerajnaik5161 2 years ago

    This is not the correct way. Check incremental data loading with Data Factory in the Microsoft documentation.

  • @rohitsethi5696
    @rohitsethi5696 1 year ago

    This is wrong. I have tested it with 4 files, with the dates below. In the first iteration it sets (previous date = last modified date):
    previous: 01/01/199
    last modified dates:
    10/2/2023
    2/2/2023
    1/2/2023
    5/2/2023
    18/2/2023
    In the first iteration 10/02/2023 is greater, and after that the condition is not satisfied, but the actual latest file is 18/02/2023.

  • @rohitsethi5696
    @rohitsethi5696 1 year ago

    In this video she assigns the last modified value to the previous variable inside the If Condition: ua-cam.com/video/sYM6kVpng28/v-deo.html
    That is the best approach.

  • @Discodave676
    @Discodave676 2 years ago +1

    Nice video, but you need to speak slower.

    • @WafaStudies
      @WafaStudies 2 years ago +1

      Thank you. Sure, thanks for the feedback. I will work on it.

  • @suryaa30
    @suryaa30 2 years ago

    Hi, how do we pick the latest file if we have a list of files with a date suffix, like File_YYYYMMDD.csv?
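When the date is in the file name, the last-modified time is not needed at all: parse the suffix and take the maximum. This Python sketch assumes names shaped like File_YYYYMMDD.csv and skips other names; the regex and function name are illustrative.

```python
import re
from datetime import date
from typing import List, Optional

SUFFIX = re.compile(r"_(\d{4})(\d{2})(\d{2})\.csv$")


def latest_by_suffix(names: List[str]) -> Optional[str]:
    """Return the name whose _YYYYMMDD.csv suffix is the latest valid date."""
    best, best_date = None, None
    for name in names:
        m = SUFFIX.search(name)
        if not m:
            continue
        try:
            d = date(int(m.group(1)), int(m.group(2)), int(m.group(3)))
        except ValueError:   # e.g. month 13 or 30 February
            continue
        if best_date is None or d > best_date:
            best, best_date = name, d
    return best
```

Because YYYYMMDD is zero-padded and most-significant first, comparing the raw strings would give the same order. Parsing adds a check that rejects impossible dates.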

  • @prasangisrinivasarao4174
    @prasangisrinivasarao4174 1 year ago

    @greater(formatDateTime(activity('Get Metadata2').output.lastModified,'yyyyMMddHHmmss'),formatDateTime(variables('prevLastModifiedDate'),'yyyyMMddHHmmss'))
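The expression above formats both timestamps with 'yyyyMMddHHmmss' before comparing them. That works because zero-padded, most-significant-first digits sort lexicographically in chronological order. The small Python check below demonstrates that property, with Python's `strftime` codes standing in for the ADF format string. It also shows why unpadded day/month text, like the dates in an earlier comment, would not compare correctly as strings.

```python
from datetime import datetime


def adf_style_key(ts: datetime) -> str:
    """Same layout as the ADF pattern 'yyyyMMddHHmmss' (zero-padded, 24-hour)."""
    return ts.strftime("%Y%m%d%H%M%S")


earlier = datetime(2023, 2, 10, 9, 5, 0)
later = datetime(2023, 2, 18, 8, 0, 0)

print(adf_style_key(earlier))                         # 20230210090500
print(adf_style_key(later) > adf_style_key(earlier))  # True, same as later > earlier

# Unpadded day/month/year text does not sort chronologically:
print("2/2/2023" > "10/2/2023")                       # True, although 2 Feb is earlier
```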