20. Get Latest File from Folder and Process it in Azure Data Factory

  • Published 7 Jan 2025

COMMENTS • 80

  • @vijaybodkhe8379
    @vijaybodkhe8379 1 year ago +4

    I think Set Variable 2 (i.e., PreviousModifiedDate) should have been inside the If Condition. The current file's modified time should always be compared with the highest modified time among the previous files.

  • @sayonbhattacharjee7470
    @sayonbhattacharjee7470 3 years ago +5

    Amazing video...
    Just wanted to cross-check: I think the Set Variable for the last modified date should also go inside the If activity. Only then does it work correctly in my case.

  • @MaheshReddyPeddaggari
    @MaheshReddyPeddaggari 3 years ago +1

    I have been waiting for these kinds of scenarios.
    Thanks, Maheer

  • @mysahil25
    @mysahil25 3 years ago +7

    Hi Sir, I am really enjoying your video series and learning from it. I believe there is one issue in the above implementation: Set Variable 2 should also go inside the If true branch, along with Set Variable 1. It works in your case because your latest file is the last file processed in the ForEach loop, but if the latest file is not the last one, it will not copy that file. Please check.

    • @sourabhgupta1428
      @sourabhgupta1428 3 years ago

      Thanks for highlighting this issue. I also noticed while implementing it that it won't work if your first file is the latest file. @Bandhu Gupta: can you please suggest what we need to correct to implement the logic correctly?

    • @shalakapowar0707
      @shalakapowar0707 2 years ago +2

      @sourabhgupta1428 You can put both Set Variable activities, for PreviousModifiedDate and LatestFileName, inside the If -> True branch. Outside the If activity you can add a new variable and assign it @variables('LatestFileName'); this will give you the latest file name.

    • @venkatasatishnamana4681
      @venkatasatishnamana4681 1 year ago +1

      Just move the second Set Variable activity into the If -> True branch, before or after the first Set Variable activity, and connect them. No need to modify anything else.
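
The fix discussed in this thread can be checked outside ADF with a small simulation. The sketch below is not the pipeline itself: the function names, the 1900 starting date, and the sample files are all hypothetical. It only mirrors the loop logic described above, once with the previous date overwritten on every iteration and once with it updated only inside the true branch.

```python
from datetime import datetime

# Hypothetical starting value for the "previous modified date" variable.
START = datetime(1900, 1, 1)


def latest_file_buggy(files):
    """Previous date is set outside the If, so it is overwritten every iteration."""
    previous = START
    latest_name = ""
    for name, modified in files:
        if modified > previous:
            latest_name = name
        previous = modified  # runs regardless of the comparison result
    return latest_name


def latest_file_fixed(files):
    """Previous date is set only in the true branch, so it holds the running maximum."""
    previous_max = START
    latest_name = ""
    for name, modified in files:
        if modified > previous_max:
            latest_name = name
            previous_max = modified
    return latest_name


# Hypothetical folder listing where the newest file comes first.
files = [
    ("a.csv", datetime(2023, 2, 18)),
    ("b.csv", datetime(2023, 2, 1)),
    ("c.csv", datetime(2023, 2, 5)),
]

print(latest_file_buggy(files))  # c.csv
print(latest_file_fixed(files))  # a.csv
```

With the newest file first, the first version ends up comparing c.csv only against b.csv, which is why it reports the wrong file.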

  • @ranganathmaimal
    @ranganathmaimal 3 months ago

    Super Informative 👌

  • @annukumari9629
    @annukumari9629 3 years ago +3

    Very informative 😊 Thanks for sharing

    • @WafaStudies
      @WafaStudies 3 years ago

      Thank you Madam 😊

    • @tusharchirame820
      @tusharchirame820 1 year ago

      @WafaStudies Could you please share your email ID? I have a question.

  • @abbasfatehm7149
    @abbasfatehm7149 1 year ago +1

    Thank you so much, Sir, for always helping.

  • @SathishKotte-p3z
    @SathishKotte-p3z 1 year ago

    Hi Maheer, I'm learning ADF by watching your videos; it's an amazing series. I just want to cross-check: I think both Set Variable activities need to be in the true condition only, with the previous modified date set first and then the latest file name variable. Only then does it work correctly in my case. Thanks

  • @ronaldorn84
    @ronaldorn84 2 years ago +1

    Wonderful tips! Thanks, you help a lot.

  • @Imrannaseem818
    @Imrannaseem818 3 years ago +1

    Thanks, Maheer. I was looking for this video. Cheers!

    • @WafaStudies
      @WafaStudies 3 years ago

      Welcome 🤗

    • @vru5696
      @vru5696 9 months ago

      @WafaStudies Thank you for your videos. Can you please create a video on extracting a file from a SharePoint location and loading it into a SQL table? Thanks

  • @varung2911
    @varung2911 3 years ago +3

    Hi Maheer, can't we have both Set Variable activities inside the If Condition's true branch?

  • @abhishekrana5626
    @abhishekrana5626 3 years ago +2

    Hi Sir, thanks for such simple and very informative videos. Can you make a video on how to resume a failed Copy activity from the point where it failed, rather than from the start?

  • @roshankumargupta46
    @roshankumargupta46 3 years ago +1

    Thanks for this video! Can you also create a video explaining how we can verify source and target tables? For example, how can we verify that all row and column values were copied correctly using Data Factory?

  • @aishwaryam8520
    @aishwaryam8520 2 years ago +1

    Hello Sir,
    Can you please tell me what happens if we get two files with the same last modified date and time?
    What can be done about this?

  • @Kirbys911Heaven
    @Kirbys911Heaven 3 years ago +1

    Super helpful. Thank you very much.

  • @anmolganju1864
    @anmolganju1864 3 years ago

    What if I have a date-level hierarchy in Data Lake Gen2, with a folder structure for each table such as /table1/2022/01/03 .. /table1/2022/01/10, and the files are present there? How should I pick the latest file in this case?

  • @multipleaccounts9207
    @multipleaccounts9207 11 months ago

    @wafastudies Thanks for your explanation. But this solution is not scalable, right? As the number of files increases, the ForEach loop has to check all the files every day to get the latest file. Is there a scalable solution you would suggest?

  • @dineshdeshpande6197
    @dineshdeshpande6197 3 years ago

    Sir, can we sort the JSON list of files by last modified date in DESC order and get the latest modified file?
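
As a rough illustration of the sorting idea, here is a minimal Python sketch. It assumes the file names and last modified times have already been collected into a list of records; the `name` and `lastModified` keys and the sample values are hypothetical, and this says nothing about whether ADF itself can sort such a list.

```python
from datetime import datetime


def latest_by_sort(items):
    """Sort records by last modified time, newest first, and return the first name."""
    ordered = sorted(
        items,
        key=lambda item: datetime.fromisoformat(item["lastModified"]),
        reverse=True,  # DESC order
    )
    return ordered[0]["name"] if ordered else None


# Hypothetical records; timestamps are ISO 8601 strings without a zone suffix.
items = [
    {"name": "sales_1.csv", "lastModified": "2023-02-10T08:00:00"},
    {"name": "sales_2.csv", "lastModified": "2023-02-18T09:30:00"},
    {"name": "sales_3.csv", "lastModified": "2023-02-05T17:45:00"},
]

print(latest_by_sort(items))  # sales_2.csv
```

Sorting costs more than a single pass that tracks the maximum, but it also yields the full newest-to-oldest order if more than one file is needed.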

  • @mabunnicherukuri6751
    @mabunnicherukuri6751 1 year ago

    Thanks for this video. Can you please share how to implement an ADF pipeline for this case: our input is Excel files arriving daily, and we want the latest file name along with its last modified date.

  • @govardhanbola1195
    @govardhanbola1195 2 years ago

    We are processing files from an SFTP location. The issue is that each time we upload a new file to the SFTP location and run the pipeline, it processes the already processed files along with the new file. As the number of files keeps growing, this is becoming a problem. What we want instead is that, once a file is processed, it is moved to an archive folder in the SFTP location so that only the latest file is processed in the next run. How can we do this?

  • @HnkBnnndk
    @HnkBnnndk 3 years ago

    Nice and clear explanation. But when I try this on files in different subfolders, the mechanism doesn't work with wildcards for subdirectories. Do you have a solution for that?

  • @prathaps419
    @prathaps419 1 year ago

    Hi Maheer, thanks for your efforts in making these videos. They are really very helpful. I am looking for a similar scenario, but instead of a file I need to get the latest table records from SQL Server. Can you please explain how to get them?
    Thanks

  • @datatuber
    @datatuber 3 years ago +1

    Thanks for sharing 👍

  • @joyyoung3288
    @joyyoung3288 3 years ago

    What file is the dataset pointing to at the very beginning? Is it a static CSV that contains all the file names? I cannot follow.

  • @birendrasinghrawat9614
    @birendrasinghrawat9614 3 years ago +2

    Thank you, Maheer, the video is really very good. But this will not work in every scenario for getting the latest file from the folder. If there are 4 files in the folder, the first file is the latest, and the last file is the second latest, then it will pick the second latest file, because the Previous_Modified_Date variable is updated with Last_modified_date on every iteration of the ForEach loop.

    • @sourabhgupta1428
      @sourabhgupta1428 3 years ago

      Thanks for highlighting this issue. I also noticed while implementing it that it won't work if your first file is the latest file. @birendra singh rawat: can you please suggest what we need to correct to implement the logic correctly?

    • @anmolganju1864
      @anmolganju1864 3 years ago +1

      @sourabhgupta1428 Try running the ForEach loop without the Sequential setting; it should give the correct output.

    • @sfn1231
      @sfn1231 2 years ago

      @sourabhgupta1428 Store the previous date only if the condition is true, just like the file name. That way the previous date variable always holds the latest date. This should work.

  • @gopalammanikantarao593
    @gopalammanikantarao593 2 years ago

    Hi Sir, could you please help me with this requirement: how to get the oldest file from a folder and process it in Azure Data Factory?

  • @shubhamsingh-j5q
    @shubhamsingh-j5q 1 year ago

    How do we fetch files from more than one folder and trigger the respective pipeline or Databricks notebook?

  • @abhishekmehta5193
    @abhishekmehta5193 2 years ago

    Hi everyone,
    Quick question: when I upload 2 files (A and B) at the same time, it copies only file A and not file B. Can you please help me with the logic for handling 2 files uploaded at the same time?

  • @gautampoddar3392
    @gautampoddar3392 9 months ago

    It asks for a FileName in the first Get Metadata activity. What should I give there? Can someone help, please?

  • @sriramch3128
    @sriramch3128 2 years ago

    Is this like incremental loading?

  • @shivag7777
    @shivag7777 3 years ago +1

    Hi Maheer, this does not work when 2 or more files have been modified: it takes the date of the last file in the input folder rather than going by last modified date. It looks like some logic is wrong in this video. Please try with 7 to 8 files that have different date values, in random order. Kindly have a look again, and let me know if you want me to share more details.

    • @tarakakrishnavemula3949
      @tarakakrishnavemula3949 1 year ago

      This is my doubt too. Hi bro, please give a solution.

    • @gowthamprasad7182
      @gowthamprasad7182 1 year ago

      Instead of PreviousModifiedDate, we have to store PreviousMaxModifiedDate, using a Set Variable inside the If Condition along with the LatestFileName Set Variable.

    • @bhagyashree4744
      @bhagyashree4744 1 year ago

      Hi, did you find out how to store multiple files using the last modified date?

  • @abdullahmukminahmad4479
    @abdullahmukminahmad4479 2 years ago

    Hi Wafa, how do we extract a file from a folder partitioned by date?

  • @rajabhakshi3333
    @rajabhakshi3333 2 years ago

    Thanks for this video

  • @MoHz-rx5my
    @MoHz-rx5my 1 year ago

    I think the solution would be to use a PySpark or Python notebook. Am I right?

  • @Mylittleprincessvenna
    @Mylittleprincessvenna 2 years ago

    It won't work as expected if you have files like the ones below:
    File1 8/10/2022 10:53:00:00
    File2 8/10/2022 10:51:00:00
    File3 8/10/2022 10:52:00:00

  • @sravankumar1767
    @sravankumar1767 3 years ago

    superb.........

  • @chaitanyapanchal1312
    @chaitanyapanchal1312 1 year ago

    Hi Maheer,
    I don't think this is a feasible solution, because as the number of files increases, the number of comparisons also increases. Suppose we have 50,000 files: there will be 50,000 comparisons, which will hurt performance.

    • @MoHz-rx5my
      @MoHz-rx5my 1 year ago

      Yes, I think it is better to use a PySpark or Python notebook.

  • @kausarnafisa5804
    @kausarnafisa5804 2 years ago

    The logic is good, thanks for it, but it isn't working as shown. We need to set the previous modified date inside the If Condition activity; only then will it work.

  • @papachoudhary5482
    @papachoudhary5482 3 years ago +2

    Thanks

  • @CitizenIndia143
    @CitizenIndia143 3 years ago +1

    Hi @WafaStudies, can't we get the modified date in Get Metadata1 itself?

    • @WafaStudies
      @WafaStudies 3 years ago

      No, because your first Get Metadata dataset points to the folder, and if you try to get lastModified there, it will give you the folder's last modified info.

    • @thendralponnusamy7973
      @thendralponnusamy7973 3 years ago

      @WafaStudies What if we have files in multiple subfolders under the root folder?

  • @bhawnabedi9627
    @bhawnabedi9627 3 years ago +1

    👍🏻👍🏻

  • @sriramch3128
    @sriramch3128 2 years ago

    If it is sql

  • @neerajnaik5161
    @neerajnaik5161 3 years ago

    This is not the correct way. Check incremental data load using Data Factory in the Microsoft documentation.

  • @Discodave676
    @Discodave676 2 years ago +1

    Nice video, but you need to speak slower.

    • @WafaStudies
      @WafaStudies 2 years ago +1

      Thank you. Sure. Thanks for feedback. I will work on it.

  • @rohitsethi5696
    @rohitsethi5696 1 year ago

    This is wrong. I tested it with 4 files; the dates are below. In the first iteration it sets
    (previous date = last modified date)
    previous 01/01/199
    10/2/2023 last modified date
    2/2/2023
    1/2/2023
    5/2/2023
    18/2/2023
    In the first iteration 10/02/2023 is greater; it does not go on to the second iteration
    because the condition is not satisfied, but the latest file is actually 18/02/2023.

  • @rohitsethi5696
    @rohitsethi5696 1 year ago

    She assigns the last modified date to the previous-date variable inside the If condition: ua-cam.com/video/sYM6kVpng28/v-deo.html
    which is the best approach.

  • @suryaa30
    @suryaa30 2 years ago

    Hi, how do we pick the latest file if we have a list of files with a date suffix like FIle_YYYYMMDD.csv?
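
One possible approach to the date-suffix case, sketched in Python rather than ADF: parse the date out of each name and take the maximum. The regular expression, the function name, and the sample file names are assumptions made for illustration; names that do not match the pattern are simply skipped.

```python
import re
from datetime import datetime

# Assumed naming convention: anything ending in _YYYYMMDD.csv
SUFFIX = re.compile(r"_(\d{8})\.csv$")


def latest_by_suffix(names):
    """Return the name whose YYYYMMDD suffix is the most recent date, or None."""
    dated = []
    for name in names:
        match = SUFFIX.search(name)
        if match:
            file_date = datetime.strptime(match.group(1), "%Y%m%d")
            dated.append((file_date, name))
    return max(dated)[1] if dated else None


# Hypothetical folder contents.
names = [
    "FIle_20230105.csv",
    "FIle_20221231.csv",
    "FIle_20230218.csv",
    "readme.txt",
]

print(latest_by_suffix(names))  # FIle_20230218.csv
```

This relies on the date in the file name rather than the last modified timestamp, so it only makes sense when the suffix is the intended ordering.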

  • @prasangisrinivasarao4174
    @prasangisrinivasarao4174 2 years ago

    @greater(FormatDateTime(activity('Get Metadata2').output.Lastmodified,'yyyyMMddHHmmss'),FormatDateTime(variables('prevLastModifiedDate'),'yyyyMMddHHmmss'))
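
A note on why the 'yyyyMMddHHmmss' format in the expression above is a sensible choice, assuming the two formatted values end up being compared as strings: the format is fixed-width with the most significant part first, so text order matches chronological order. The Python sketch below demonstrates that property and contrasts it with a day-first format; the helper names and dates are hypothetical.

```python
from datetime import datetime

SORTABLE = "%Y%m%d%H%M%S"  # strftime equivalent of 'yyyyMMddHHmmss'


def is_newer(current: datetime, previous: datetime) -> bool:
    """Compare two timestamps as fixed-width, year-first strings."""
    return current.strftime(SORTABLE) > previous.strftime(SORTABLE)


def day_first(d: datetime) -> str:
    """Unpadded d/M/yyyy text, which does not sort chronologically."""
    return f"{d.day}/{d.month}/{d.year}"


older = datetime(2023, 2, 5)
newer = datetime(2023, 2, 18)

print(is_newer(newer, older))               # True
print(day_first(newer) > day_first(older))  # False, because "18/..." sorts before "5/..."
```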