12. How to copy latest file or last modified file of ADLS folder using ADF pipeline

  • Published 6 Jan 2025

COMMENTS

  • @ktdickinson 1 year ago +7

    Stack Overflow, ChatGPT, and Microsoft documentation couldn't easily explain this process to me. This YouTube video is gold. Thank you for the detailed and clear steps! This worked perfectly and helped me understand the nuances of the ADF pipeline.

  • @anoop1971 1 year ago +1

    Thanks for a great step-by-step explanation of how to copy files by the last modified date.

  • @mohank.4221 7 months ago +1

    This is really helpful, Annu. I meant to start something like this myself but haven't had the time. Anyway, you've done a great job. I'm Anil, a solution architect in data engineering.

  • @johnpaulprathipati153 2 years ago +1

    Very good explanation 🙂 Many thanks, ANU.
    I am following each and every one of your videos. Keep going!

  • @aadilmohammad2936 1 year ago +1

    I like the teaching style and the content.

  • @sravankumar1767 2 years ago +1

    Nice explanation Annu 👌 👍 👏

  • @ippilivenkataramana5974 2 years ago +2

    Good explanation.
    A small suggestion to make this video more useful to viewers: write out the steps while implementing, or before starting the implementation, so learners can remember the steps and the process more easily. Please don't take this as a negative comment; I'm just sharing my experience after watching your video. I have seen a lot of videos on other YouTube channels, and no one explains the implementation with written steps. Beginners like me get confused by the steps. If possible, please add them. This is only a suggestion.

    • @azurecontentannu6399 2 years ago

      Great feedback. I will surely implement it going forward. Thanks a lot.

    • @tusharchirame820 1 year ago +1

      @azurecontentannu6399 Could you please share your email ID? I have some questions.

    • @azurecontentannu6399 1 year ago

      @tusharchirame820 annukumari.ak@outlook.com

  • @jovianaditya7209 1 year ago +1

    Hi Annu, I have two Parquet files in my storage. After I run my pipeline, why am I getting two Set variable1 and two Set variable2 entries in the pipeline result? Isn't it supposed to show only one Set variable1 and one Set variable2 for the latest modified file?

  • @dataisfun4964 1 year ago +1

    This is golden. Thanks a lot, you just made my day.

  • @jamieashton660 1 month ago +1

    This is the example you're looking for.

  • @radhikakamalesh1327 2 years ago +1

    Clean and very simple explanation. Can you please let me know how I can send the latest file as an attachment in an email instead of copying it to another folder?

  • @bashabash3697 4 months ago +1

    Nice Explanation. Thank you

  • @ZacClark1 1 year ago +1

    Awesome video and very helpful, thank you!

  • @thammineniraviteja2821 2 years ago +1

    Annu garu ..good explanation

  • @Dbly915 1 year ago +1

    Excellent explanation, thank you!

  • @aaravkumarsingh4018 1 year ago +1

    But this only works for a single file. When I upload more than one file, e.g. 2 files, it only fetches the latest uploaded file, not both. How can I handle this?

  • @DeepDeep-zd5jq 2 years ago +1

    Hello, I need your help. In my scenario we have a CSV file in Azure Data Lake with no header, and some column data contains the comma delimiter, like a,b,c. During the Copy Data task, ADF treats this data as separate columns, but it should be in one column. Thanks.

    • @azurecontentannu6399 2 years ago +1

      Hi Deep,
      By default, the column delimiter in a CSV dataset is a comma. You can change it to a symbol that is not present in the data, e.g. $, using Add dynamic content.
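
A minimal sketch of the delimiter idea from the reply above, written in Python rather than as an ADF dataset setting. It assumes the file is actually written with `$` between columns; the sample data is hypothetical and not from the video.

```python
import csv
import io

# Hypothetical sample: no header row, and the first field itself contains commas.
raw = "a,b,c$42$north\nd,e$7$south\n"

# Reading with "$" as the column delimiter keeps "a,b,c" together in one column.
rows = list(csv.reader(io.StringIO(raw), delimiter="$"))

for row in rows:
    print(row)
```

If the source file is still comma-delimited, changing only the reader's delimiter does not help; in that case the comma-containing values would need to be quoted at the source.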

  • @AdenilsonFTJunior 2 months ago +1

    Sensational video! One question: why, in ForEach1, did you have to create a dataset that points to the parameterised files, rather than use the result of Get Metadata1, which already returns a list with each of the names and dates?

    • @azurecontentannu6399 2 months ago +1

      @AdenilsonFTJunior The one outside the ForEach points to the folder level, so we get the file names within the folder using child items. The one inside the ForEach is parameterized to process those files one by one through iteration and get the last modified date of each file.

    • @AdenilsonFTJunior 2 months ago

      @azurecontentannu6399 Thanks!
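
The two-step pattern described in this thread (list the folder's child items, then look up each file's last modified date and keep the newest) can be sketched in Python as below. This is an illustration of the logic only, not the pipeline itself: the file names, timestamps, and the `last_modified_of` lookup (standing in for the parameterized Get Metadata call inside the ForEach) are hypothetical.

```python
from datetime import datetime, timezone


def parse_utc(ts: str) -> datetime:
    """Parse a timestamp shaped like 2022-09-03T12:14:01Z as UTC."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def latest_file(child_items, last_modified_of):
    """Walk the files one by one, keeping the newest name seen so far."""
    ref_datetime = datetime(1900, 1, 1, tzinfo=timezone.utc)  # old starting reference
    file_name = None
    for item in child_items:  # sequential iteration, like a sequential ForEach
        modified = parse_utc(last_modified_of[item["name"]])
        if modified > ref_datetime:  # the If Condition comparison
            ref_datetime = modified  # update the reference date
            file_name = item["name"]  # remember the newest file so far
    return file_name, ref_datetime


# Hypothetical folder listing and per-file metadata.
child_items = [
    {"name": "a.csv", "type": "File"},
    {"name": "b.csv", "type": "File"},
    {"name": "c.csv", "type": "File"},
]
last_modified_of = {
    "a.csv": "2022-09-01T10:00:00Z",
    "b.csv": "2022-09-03T12:14:01Z",
    "c.csv": "2022-09-02T08:30:00Z",
}

print(latest_file(child_items, last_modified_of)[0])  # b.csv
```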

  • @VinodRS01 1 year ago +1

    Crystal clear 🔥

  • @MahalingSawle-m6b 4 months ago +1

    Can you please confirm whether the ForEach activity runs in parallel or sequential mode? If it's parallel, the Set Variable gives a random result.

  • @jardila7701 29 days ago +1

    Hi, I am working with an Excel file instead of CSV, and when debugging it asks me to select a sheet name.

    • @azurecontentannu6399 28 days ago

      @jardila7701 Yes, you need to provide a sheet name or sheet index for Excel.

  • @akshataprabhu8752 9 months ago

    Hi Annu, I am looking to perform the same operation using the azcopy command in a shell script.

  • @gowrishankart2683 1 year ago +1

    Hi Annu, if I want to compare the current date with the lastModified date, what would the If Condition expression be? If yes, then success, else failure.

  • @bolisettisaisatwik2198 7 months ago

    Would the Set Variable capture two files with the same last modified date and two different file names?

  • @SachinGupta-nh5vy 1 year ago +1

    Hello,
    I have read that we cannot use the Set Variable activity inside a ForEach when the ForEach is set to parallel,
    but here we are using it. Can you please explain in which cases this limitation has an impact?

    • @azurecontentannu6399 1 year ago +1

      It's not that we can't use it. The issue is that if we use Set Variable inside a ForEach and do not set the loop to sequential, the output may not be what we expect: the variable is set to A in the first iteration, and that value needs to flow to the subsequent activities that consume it, but before they do, another iteration has already set the variable to B.
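
The effect described in this reply can be illustrated with a small deterministic model in Python. It is a simplification under stated assumptions, not how ADF schedules iterations: the "parallel" function assumes every iteration compares against the initial reference value and that whichever iteration finishes last overwrites the shared variable. File names and timestamps are hypothetical; the timestamps are same-format UTC strings, so plain string comparison orders them chronologically.

```python
INITIAL_REF = "1900-01-01T00:00:00Z"

files = [
    ("a.csv", "2022-09-01T10:00:00Z"),
    ("b.csv", "2022-09-03T12:14:01Z"),
    ("c.csv", "2022-09-02T08:30:00Z"),
]


def run_sequential(files, initial_ref):
    """Each iteration sees the reference date left by the previous one."""
    ref, name = initial_ref, None
    for file_name, modified in files:
        if modified > ref:
            ref, name = modified, file_name
    return name


def run_parallel_model(files, initial_ref, completion_order):
    """Simplified model: all iterations compare against the initial reference,
    and the last one to finish overwrites the shared variable."""
    name = None
    for index in completion_order:
        file_name, modified = files[index]
        if modified > initial_ref:  # stale comparison, always true here
            name = file_name  # later finisher overwrites earlier result
    return name


print(run_sequential(files, INITIAL_REF))                 # b.csv
print(run_parallel_model(files, INITIAL_REF, [1, 2, 0]))  # a.csv
print(run_parallel_model(files, INITIAL_REF, [0, 2, 1]))  # b.csv
```

In this model the sequential result is always the newest file, while the parallel result depends on completion order, which matches the "random result" some commenters report.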

  • @astabratakundu3923 2 years ago +1

    Thanks a lot, Annu.

  • @Prashant-s7f 1 year ago +1

    When I re-run the same pipeline, why am I getting different folder names as the latest folder? Can you help? I am trying this logic on folders.

  • @ashishgochhi5466 5 months ago

    Is it possible to apply the same approach in SharePoint to extract the new/modified files?

  • @dineshpandey5008 4 months ago

    Thanks for this wonderful explanation, but I have one question: can we do it using the filter by last modified option of the Get Metadata activity? It is just a question; overall I am following your channel and learning new things daily. Amazing!

  • @alokkumar-db9vw 7 months ago

    Hi, thanks for the detailed steps. It's very good content and it works too. Could you please share how to load the latest file from Blob Storage into a Snowflake table? Please note the file is an Excel file.

  • @MrKalanidhi 1 year ago +1

    Very good explanation, thanks for that. I followed your video and tried it, but I got multiple Set Variable activities in my second run, each one showing a file name and last modified date. I need only one output for my second run, for the latest file. Please help me.

    • @azurecontentannu6399 1 year ago

      Don't worry about getting multiple Set Variable activities. Make the ForEach run sequentially, and the last Set Variable will give the latest file name.

  • @rajatmehta4597 9 months ago +1

    Hi, kudos! This is a really informative video. However, I have a small question: what if all 5 file names are the same and only the last modified date is different? How will it pick the latest file then? You only set the FileName variable in the Set Variable output, and in this case all the file names are the same. I hope I have explained my question clearly.
    Please help with a solution. Thanks.

    • @azurecontentannu6399 9 months ago

      Hi. There is no way you can store 2 files with the same name in an ADLS folder, so the question doesn't arise in the first place 😊

  • @insane2093 1 year ago

    Good explanation

  • @husnabanu4370 1 year ago +1

    Hello, your videos are awesome. In a real-time scenario, what if we move the files to an archive folder after copying, so that the source folder only ever has the latest files? Should we still consider this approach? If we have a large number of files, it will take a long time and performance may suffer. Please advise.

    • @azurecontentannu6399 1 year ago

      Thank you so much. Glad to know my videos were helpful. In that case you should watch part 14, where a SQL table is used to log all the file names and their last modified date-times, and you write a custom query to get the top N last modified files: ua-cam.com/video/9PYZ3uEa6nk/v-deo.html
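
A rough sketch of the logging-table idea mentioned in this reply, using Python's built-in `sqlite3` as a stand-in for the SQL table. The table name `file_log`, its columns, and the sample rows are assumptions for illustration, not taken from part 14; SQLite uses `LIMIT` where SQL Server would use `TOP`.

```python
import sqlite3

# In-memory database standing in for the SQL table that logs file metadata.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE file_log (file_name TEXT PRIMARY KEY, last_modified TEXT)"
)
conn.executemany(
    "INSERT INTO file_log (file_name, last_modified) VALUES (?, ?)",
    [
        ("a.csv", "2022-09-01T10:00:00Z"),
        ("b.csv", "2022-09-03T12:14:01Z"),
        ("c.csv", "2022-09-02T08:30:00Z"),
    ],
)


def top_n_latest(conn, n):
    """Return the names of the n most recently modified files, newest first."""
    cursor = conn.execute(
        "SELECT file_name FROM file_log ORDER BY last_modified DESC LIMIT ?",
        (n,),
    )
    return [row[0] for row in cursor.fetchall()]


print(top_n_latest(conn, 2))  # ['b.csv', 'c.csv']
```

Because the timestamps are stored as same-format UTC strings, sorting them as text gives chronological order; a real table would more likely use a datetime column.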

  • @acocietocioto 2 years ago +1

    great tutorial, thanks!

  • @jitendrasharma911 1 month ago

    Hi, I'm doing this activity the same way, but the Set Variable activities return the last modified date for all files.
    Does this flow only work for CSV?

  • @varshiniparamasivam2951 7 months ago

    Great video. I have a question: in this video we get the latest file in a dataset. Can we then use that latest file in a data flow?

  • @pyspark9496 6 months ago +1

    Very helpful, but please walk through it a bit more slowly.

  • @user-ug64e3st87op 9 months ago

    How do I copy the contents of the last found file into a table in Snowflake?

  • @rohitsethi5696 1 year ago +1

    Anu, why have you assigned the last modified value to referdate inside the If Condition? It should be after the If Condition. Do you think this is the right approach?

    • @azurecontentannu6399 1 year ago

      Sorry, what do you mean by after the If Condition? We are comparing the lastmodified value with the refdatetime value in the If Condition.

    • @rohitsethi5696 1 year ago

      @azurecontentannu6399 It is not comparing but assigning the lastmodified value to referdate after the If Condition. The comparison is done before the If Condition is invoked.

  • @hosseinj2622 1 year ago

    Thanks very much, Annu. It's been a very helpful and handy video, and it helped me get the latest file in a blob storage. I have a question, however: if I have a mix of different file types (let's say CSV and JSON), is it possible to extract the name of only the CSV file? I created a blob > delimited CSV dataset, but it seems to select any file type in the directory. I would appreciate it if you could advise how to sort this out. Thanks.

  • @lokeshkumar-cb2qp 1 year ago +1

    Thank you !!!

  • @user-ug64e3st87op 9 months ago +1

    ua-cam.com/video/sYM6kVpng28/v-deo.html
    I did not understand. You say the output value is the same as the file's value in the storage account, but the output time is 2022-09-03T12:14:01Z, while the file's last modified time in the container is 9/3/2022, 5:44:02 PM.
    How can it be the same time?

    • @azurecontentannu6399 9 months ago

      Hey, sorry for the confusion. The pipeline output is in the UTC time zone, while the storage account shows the time in IST (India). If you add 5 hours 30 minutes to the pipeline output, it is the same as the last modified time in the storage account.
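
The conversion in this reply can be checked with a few lines of Python, using the timestamp quoted in the question. The fixed +05:30 offset for IST is written out by hand here rather than looked up from a time zone database.

```python
from datetime import datetime, timedelta, timezone

# IST is UTC+05:30, defined here as a fixed offset.
IST = timezone(timedelta(hours=5, minutes=30), name="IST")

# The pipeline output quoted in the question, parsed as UTC.
utc_value = datetime.strptime(
    "2022-09-03T12:14:01Z", "%Y-%m-%dT%H:%M:%SZ"
).replace(tzinfo=timezone.utc)

# Same instant, expressed in IST.
ist_value = utc_value.astimezone(IST)

print(ist_value.isoformat())  # 2022-09-03T17:44:01+05:30
```

That is 5:44:01 PM IST, which matches the 5:44:02 PM shown in the storage account to within a second.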

  • @deepakrawat418 2 years ago +1

    Hi Annu, I am a data analyst. Can I start this playlist?

    • @azurecontentannu6399 2 years ago

      Data analysis and data engineering go hand in hand, so it might be useful if you want to know how data is ingested and transformed before it's ready for reporting.

  • @sacnan 5 months ago +1

    Imagine the level of debugging required in case of an issue.
    Can't you just put the list of files into a SQL table, return the file with the max modified date, and copy just that file?

    • @azurecontentannu6399 5 months ago

      @sacnan Yes, that approach is covered in this video: ua-cam.com/video/9PYZ3uEa6nk/v-deo.htmlsi=WgggbuoZKEZ5EK16

    • @sacnan 5 months ago

      @azurecontentannu6399 Thanks, ma'am.

  • @hritiksharma7154 2 years ago

    Great

  • @azharaktherk2432 19 days ago +1

    Please go slower; it would be much more helpful for learners.

  • @sucharitadey6417 8 months ago +1

    Nicely explained