#101

  • Published 5 Oct 2024
  • This video shows the steps required to split a file into smaller ones in just 3 steps.
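
For readers who want to reproduce the idea outside ADF, here is a minimal Python sketch of the same concept: splitting one CSV into a fixed number of parts by assigning rows round-robin. The file names and part count are illustrative, not taken from the video.

```python
import csv
from pathlib import Path

def split_csv(src: str, out_dir: str, num_parts: int = 5) -> None:
    """Split src into num_parts CSV files, distributing rows round-robin
    (similar in spirit to round-robin partitioning on a data flow sink)."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    with open(src, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        files, writers = [], []
        for i in range(num_parts):
            fh = open(Path(out_dir) / f"part_{i:03d}.csv", "w", newline="")
            w = csv.writer(fh)
            w.writerow(header)          # every part keeps the header
            files.append(fh)
            writers.append(w)
        for idx, row in enumerate(reader):
            writers[idx % num_parts].writerow(row)   # round-robin assignment
        for fh in files:
            fh.close()

# Example (hypothetical paths):
# split_csv("sales.csv", "output", num_parts=5)
```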

COMMENTS • 56

  • @dorgeswati
    @dorgeswati 3 years ago +2

    Keep it up, very good series. Really enjoying it. I am learning ADF.

    • @AllAboutBI
      @AllAboutBI  3 years ago

      Thanks dorgeswati!

  • @MaheshReddyPeddaggari
    @MaheshReddyPeddaggari 3 years ago

    Very good explanation
    Thanks for sharing knowledge

  • @uttapa22
    @uttapa22 4 months ago

    Hi,
    Thanks for posting this video.
    Can you please clarify how you ensured the files were split by country?
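
As an aside, splitting by a column value rather than by size is a key-based partition. Whatever the video's exact settings, here is a minimal pandas sketch of the idea, assuming a hypothetical customers.csv with a country column:

```python
import pandas as pd
from pathlib import Path

# Hypothetical input: a CSV with a "country" column.
df = pd.read_csv("customers.csv")
out = Path("by_country")
out.mkdir(exist_ok=True)

# Write one file per distinct country value (key-based split,
# analogous to "key" partitioning rather than size-based splitting).
for country, group in df.groupby("country"):
    group.to_csv(out / f"{country}.csv", index=False)
```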

  • @vuppalanaveen82246
    @vuppalanaveen82246 1 year ago

    A good interview question might be: how do you do incremental data processing in Azure Data Factory or Databricks if the file size is large?

  • @groupdancebypz
    @groupdancebypz 2 years ago

    Thanks for the detailed explanation.
    When trying it, I am getting only two partitions, of which one file is zero bytes and the other is the full file (where the split was calculated for 4). Could you please help me sort out where I went wrong?

  • @mikaelburban
    @mikaelburban 2 years ago

    Hi,
    Thank you for this video, very helpful. Quick question: how can I set up the data flow so that only the first file has the header and the other ones have only the data? I need to split a file into chunks before sending it through an API, and thus I need only the first file to have the header. Thanks!
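
One way to get this behaviour in a small pre- or post-processing step (outside the data flow) is to chunk the file yourself and emit the header only for the first chunk. A minimal pandas sketch; the file name and chunk size are placeholders:

```python
import pandas as pd

# Hypothetical input file and chunk size.
chunks = pd.read_csv("big_file.csv", chunksize=100_000)

for i, chunk in enumerate(chunks):
    chunk.to_csv(
        f"part_{i:03d}.csv",
        index=False,
        header=(i == 0),   # header only in the very first output file
    )
```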

  • @vasudevankrishnamurthy7046
    @vasudevankrishnamurthy7046 3 years ago

    Nice feature. Thanks for the video.

  • @sinikishan1408
    @sinikishan1408 3 years ago

    Informative session, ma'am

  • @Mehboob472
    @Mehboob472 3 years ago

    Very informative! 👍🏻

  • @nireeshagayathri2149
    @nireeshagayathri2149 3 years ago

    Very well explained 🙏🙏 madam

  • @sreekarsastry9395
    @sreekarsastry9395 1 year ago

    Hi..
    Thanks a lot for the video

  • @B_S0305
    @B_S0305 2 years ago

    Hi, thank you so much for the explanation. Can you please tell me: now that my datasets are partitioned, how can I use these partitioned datasets in the transformations in my Databricks notebook? How do I load these split datasets in Scala?
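
One common pattern in a Databricks notebook is to point the reader at the whole folder of split files so they load as a single DataFrame. The sketch below uses PySpark rather than the Scala asked for, but the Scala spark.read API has the same shape; the storage path and column name are placeholders:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read every split file in the output folder at once; Spark treats the
# folder (or a wildcard) as a single logical dataset.
df = spark.read.csv(
    "abfss://container@account.dfs.core.windows.net/output/*.csv",
    header=True,
    inferSchema=True,
)

df.groupBy("country").count().show()   # hypothetical transformation
```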

  • @ShriramVasudevan
    @ShriramVasudevan 3 years ago

    Very useful

  • @tesgheb2963
    @tesgheb2963 1 year ago

    A good video! How do we partition the file by date instead of size?

    • @AllAboutBI
      @AllAboutBI  1 year ago

      Maybe you have to check this: ua-cam.com/video/hVfGr8AD35I/v-deo.html
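
Separately from the linked video, partitioning by a date is the same key-based idea as the country example earlier: group on the date value and write one file per group. A minimal pandas sketch with hypothetical file and column names:

```python
import pandas as pd
from pathlib import Path

# Hypothetical input file with a date column.
df = pd.read_csv("events.csv", parse_dates=["event_date"])
out = Path("by_date")
out.mkdir(exist_ok=True)

# One output file per calendar date instead of per size bucket.
for day, group in df.groupby(df["event_date"].dt.date):
    group.to_csv(out / f"{day}.csv", index=False)
```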

  • @upskillup
    @upskillup 1 year ago

    @All About BI Hi ma'am, what if my JSON file is 4 GB in ADLS and I want to load the data into a SQL DB? Do you recommend the same process, where it creates around 4000 files and loads them using a data flow? Please advise the best solution to achieve it. I tried large memory-optimized clusters and partitions but had no luck; the data flow is failing due to OOM. Please suggest.
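
Not a definitive fix for the OOM, but one alternative to pushing a 4 GB JSON through a single data flow is to stream it in chunks into the SQL DB from a small script or notebook. A sketch that assumes the file is newline-delimited JSON and uses a hypothetical connection string and table name:

```python
import pandas as pd
from sqlalchemy import create_engine

# Hypothetical connection string and target table.
engine = create_engine(
    "mssql+pyodbc://user:pass@server/db?driver=ODBC+Driver+18+for+SQL+Server"
)

# chunksize streams the file; this requires newline-delimited JSON (lines=True).
reader = pd.read_json("big_file.json", lines=True, chunksize=50_000)
for chunk in reader:
    chunk.to_sql("target_table", engine, if_exists="append", index=False)
```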

  • @deepeshsalvi7760
    @deepeshsalvi7760 3 years ago

    Hi ma'am, can you please help me understand how the data is distributed across the split files? How do we identify what data is available in which file?

  • @AnandKumar-dc2bf
    @AnandKumar-dc2bf 3 years ago

    Can you show a scenario to copy only a subset of fields from tables (say, 10 columns out of 20) in SQL into ADLS as CSV files?
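
The usual approach is to give the copy activity or data flow source a query that selects only the columns you need (or to drop columns with a Select transformation). The same idea as a small Python sketch, with a hypothetical connection string, table, and column list:

```python
import pandas as pd
from sqlalchemy import create_engine

# Hypothetical connection string, table, and columns.
engine = create_engine(
    "mssql+pyodbc://user:pass@server/db?driver=ODBC+Driver+18+for+SQL+Server"
)

query = "SELECT col1, col2, col3 FROM dbo.SourceTable"   # only the columns you need
pd.read_sql(query, engine).to_csv("subset.csv", index=False)
```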

  • @shankarnarayanan24
    @shankarnarayanan24 1 year ago

    So I have a .gz file which is 20 GB on SFTP. I want it in ADLS as it is, as a .gz file. With this approach I can partition it, but then how do I compress it back?
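
On the compress-back part: once the split files are written, a follow-up step can gzip each part again. A minimal Python sketch assuming a hypothetical output folder and file pattern:

```python
import gzip
import shutil
from pathlib import Path

# Hypothetical folder of split CSV parts produced by the data flow.
for part in Path("output").glob("part_*.csv"):
    with open(part, "rb") as src, gzip.open(str(part) + ".gz", "wb") as dst:
        shutil.copyfileobj(src, dst)   # stream-compress each part back to .gz
```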

  • @hello2_35
    @hello2_35 1 year ago

    Can it be done without using data flow?

  • @vardhanvavilala4850
    @vardhanvavilala4850 3 years ago

    Hi ma'am, if we have multiple datasets in a single file, how do we split the file into individual datasets?

  • @skselva403
    @skselva403 11 months ago

    Is this applicable for database to database?

  • @sravankumar1767
    @sravankumar1767 2 years ago

    How can we do this without data flows? Can you please explain?

  • @vivekzalki
    @vivekzalki 2 years ago

    Super thank you : )

  • @sonalijaiswal9110
    @sonalijaiswal9110 2 years ago

    Can we also split large XML files into smaller XML files?
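
Whether or not the data-flow approach fits XML, one way to split a large XML file outside ADF is to stream it and re-wrap batches of repeated elements into new files. A Python sketch that assumes a hypothetical structure of repeated <record> elements under a single root:

```python
import xml.etree.ElementTree as ET

SRC = "big.xml"      # hypothetical input with repeated <record> elements
CHUNK = 10_000       # records per output file

def write_part(records, part_no):
    """Wrap a batch of <record> elements in a new root and write one file."""
    new_root = ET.Element("records")
    new_root.extend(records)
    ET.ElementTree(new_root).write(f"part_{part_no:03d}.xml",
                                   encoding="utf-8", xml_declaration=True)

context = ET.iterparse(SRC, events=("start", "end"))
_, root = next(context)          # first "start" event is the document root
batch, part_no = [], 0
for event, elem in context:
    if event == "end" and elem.tag == "record":
        batch.append(elem)
        if len(batch) == CHUNK:
            write_part(batch, part_no)
            batch, part_no = [], part_no + 1
            root.clear()         # drop already-written records to bound memory
if batch:                        # remainder
    write_part(batch, part_no)
```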

  • @bhavindedhia3976
    @bhavindedhia3976 11 months ago

    - - - folder
    .json - files
    .json
    -
    .json
    .json
    how do we upload files in this format?

  • @sravankumar1767
    @sravankumar1767 2 years ago

    Nice explanation

  • @oriono9077
    @oriono9077 3 years ago

    Useful Tip 👍👍

  • @sambathgurusamy566
    @sambathgurusamy566 2 years ago

    Hi,
    Has anybody faced a duplicate issue?
    The source file is being split as expected, but one or a few of the split files have duplicate records. I have cross-checked, and there is no issue in the source file.

  • @navnathjarare4829
    @navnathjarare4829 1 year ago

    NICE

  • @aditishrivastava4850
    @aditishrivastava4850 10 months ago

    Can we split a large Parquet file into small Parquet files using the same method?
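
The same partitioning idea applies to Parquet, and outside ADF a Parquet file can also be re-chunked directly. A minimal pyarrow sketch; the file names and rows-per-file value are placeholders:

```python
import pyarrow as pa
import pyarrow.parquet as pq

SRC = "big.parquet"          # hypothetical input
ROWS_PER_FILE = 1_000_000    # rows per output part

pf = pq.ParquetFile(SRC)
# Stream the file in row batches and write each batch as its own Parquet file.
for i, batch in enumerate(pf.iter_batches(batch_size=ROWS_PER_FILE)):
    pq.write_table(pa.Table.from_batches([batch]), f"part_{i:03d}.parquet")
```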

  • @விரேவதி
    @விரேவதி 3 years ago

    Very nice

  • @rajeevsharma2664
    @rajeevsharma2664 3 years ago

    Just one question: let's say I split a file which contains the fact table data into 5 files. When I load the data from the data lake to the SQL DW, how would the splitting help?

    • @AllAboutBI
      @AllAboutBI  3 years ago

      The data flow can point to the folder which has the split files. It can load all the files in parallel.

    • @rajeevsharma2664
      @rajeevsharma2664 3 years ago

      @@AllAboutBI My apologies, I'm not clear. Let's say you break a fact CSV into 6 CSVs. Then while loading the DW fact table, you'll be using a ForEach loop, and eventually it'll be loading sequentially.

    • @AllAboutBI
      @AllAboutBI  3 years ago

      @@rajeevsharma2664 No, no need to use ForEach. Make your data flow source point to the folder where the files are present, like output/*.CSV.
      By giving a wildcard file name, the data flow will load all matching files in parallel.
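
For illustration, the same wildcard idea outside ADF: point a reader at the folder of split files and all matching parts are processed as one dataset. A PySpark sketch with hypothetical paths and table name:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# All parts matching the wildcard are read as a single dataset
# and processed in parallel across the cluster.
fact_df = spark.read.csv("output/*.csv", header=True, inferSchema=True)
fact_df.write.mode("append").saveAsTable("dw.fact_sales")   # hypothetical target
```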

  • @ShriyaKYadav
    @ShriyaKYadav 3 years ago

    Hi ma'am, I am trying to apply the same scenario, but during validation I am getting the error "linked service with self hosted integration runtime is not supported in data flow".

    • @AllAboutBI
      @AllAboutBI  3 years ago

      Hey, as the error says, you can't connect to an on-prem data store inside a data flow.

    • @ShriyaKYadav
      @ShriyaKYadav 3 years ago

      Yes, thank you, that issue is resolved. But now my files are not splitting into the same size. I have a 34 MB file, and when I split it the file sizes are different. How do I deal with it?

    • @AllAboutBI
      @AllAboutBI  3 years ago

      @@ShriyaKYadav Why do you want to have them all the same size? Any reason?

    • @ShriyaKYadav
      @ShriyaKYadav 3 years ago

      @@AllAboutBI Because I can't load a file of more than 16 MB into a Snowflake table in a single column. So I tried your way, but one of the split files is generated with a 17 MB size.
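
As the thread shows, the split parts do not come out exactly equal, so when each output must stay under a hard byte limit (the 16 MB Snowflake case here) one option is a small extra step that rolls over to a new file once a byte threshold is reached. A Python sketch with hypothetical names; each part may overshoot by at most one row, hence the headroom below 16 MB:

```python
import csv

MAX_BYTES = 15 * 1024 * 1024        # stay safely under the 16 MB limit
SRC = "input.csv"                   # hypothetical input

def new_part(part_no, header):
    """Open the next output part and write the header row."""
    fh = open(f"part_{part_no:03d}.csv", "w", newline="", encoding="utf-8")
    writer = csv.writer(fh)
    writer.writerow(header)
    return fh, writer

with open(SRC, newline="", encoding="utf-8") as src:
    reader = csv.reader(src)
    header = next(reader)
    part_no = 0
    fh, writer = new_part(part_no, header)
    for row in reader:
        writer.writerow(row)
        if fh.tell() >= MAX_BYTES:   # roll over once the size threshold is hit
            fh.close()
            part_no += 1
            fh, writer = new_part(part_no, header)
    fh.close()
```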

  • @AnandKumar-dc2bf
    @AnandKumar-dc2bf 3 years ago

    Thanks...

  • @ambersingh3175
    @ambersingh3175 1 year ago

    I like ur accent lol