Microsoft Fabric: Incremental ETL for Warehouse using Dataflow Gen 2, SQL Procedure, Data Pipeline

  • Published 3 Nov 2024

COMMENTS • 45

  • @zhiyingwang1234
    @zhiyingwang1234 5 days ago +1

    Hi Amit, you may not know me, but I have encountered you so many times in the Power BI Community. I started learning Fabric recently, and suddenly you appear everywhere when I search for Fabric tutorials. What a happy reunion!

    • @AmitChandak
      @AmitChandak  5 days ago

      Hi there! Thanks for the kind words! 😊 I'm glad to hear we've crossed paths in the Power BI Community and now with Fabric too! Welcome to the Fabric journey - happy to be part of your learning experience!

  • @MonkXD
    @MonkXD 1 year ago +1

    My friend, I would like to thank you for all the effort and time you've been dedicating to this series. Thanks!

    • @AmitChandak
      @AmitChandak  1 year ago +2

      Thanks 🙏
      Hope you will like the full series
      Mastering Microsoft Fabric 35+ Videos: ua-cam.com/video/p-v0I5S-ybs/v-deo.html

  • @arthurvanrijn68
    @arthurvanrijn68 1 year ago +2

    Thank you Amit, very nicely done. Fabric is the future 😎

    • @AmitChandak
      @AmitChandak  1 year ago +1

      Thank you for your kind words! I completely agree with you. Microsoft Fabric is an incredible technology that holds immense potential for the future. Its innovative features and capabilities make it a game-changer in the industry. I'm excited to see how it will continue to evolve and shape the future of computing. Kudos to the Microsoft team for their outstanding work!
      Hope you will like the full series
      ua-cam.com/video/p-v0I5S-ybs/v-deo.html

  • @shafa7668
    @shafa7668 1 year ago +1

    Fantastic. Exactly the video I needed. I highly appreciate you, Amit, for doing this.

    • @AmitChandak
      @AmitChandak  1 year ago +1

      I'm glad to hear that the video was exactly what you needed! Microsoft Fabric is truly an impressive technology, and I'm happy that the video provided you with the information you were looking for. Thank you for sharing your positive feedback!
      One more is coming soon: incremental load for Warehouse using only a data pipeline.

  • @kranthikumar-el5lz
    @kranthikumar-el5lz 6 months ago +1

    Thanks so much. Please make more videos on Dataflow Gen2.

    • @AmitChandak
      @AmitChandak  6 months ago +1

      Sure. Hope you will like more videos in the series:
      Mastering Microsoft Fabric 40+ Videos: ua-cam.com/video/p-v0I5S-ybs/v-deo.html

  • @ashi145
    @ashi145 3 months ago +1

    Hi Amit - in this demo, you're basically pulling all the data from the source and then doing a lookup against the fact table to identify what needs modification. Couldn't you use a proc on the source that takes a date parameter, so it only returns records after the last data sync?

    • @AmitChandak
      @AmitChandak  3 months ago +1

      There are many ways you can do this. In some cases, I have even used a SQL statement to get the max date over a SQL Server connection (this works for both Lakehouse and Warehouse).
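
      For reference, a minimal sketch of that max-date lookup over a SQL connection; the connection string, table, and column names here are placeholders, not the exact ones from the video:

      import pyodbc

      # Placeholder: copy the real connection string from the item's SQL endpoint settings.
      conn = pyodbc.connect("<sql-endpoint-connection-string>")

      # Read the current watermark from the target table...
      max_date = conn.execute("SELECT MAX(update_date) FROM dbo.fact_sales").fetchval()

      # ...then pull only rows newer than the watermark from the source.
      rows = conn.execute(
          "SELECT * FROM dbo.stg_sales WHERE update_date > ?", max_date
      ).fetchall()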

  • @sanishthomas2858
    @sanishthomas2858 6 months ago +1

    Good. But PostgreSQL is on-premises, right? So how do we use a self-hosted IR or a gateway to access it?

    • @AmitChandak
      @AmitChandak  6 months ago +1

      You can use the on-premises gateway:
      Microsoft Fabric: Load local SQL server data in Lakehouse using on-premise Gateway| Dataflow Gen2 - ua-cam.com/video/oEF-jHVmvdo/v-deo.html
      Now Integrate On-Premise Data with Microsoft Fabric Using Data Pipelines: Gateway Update- ua-cam.com/video/sVePvZOjtoo/v-deo.html

  • @trivediasit
    @trivediasit 1 year ago +1

    Thanks for the detailed explanation here. What would happen if there is no incremental data? For example, no new sales orders over the weekend. It would be great if you could share the process to handle that.

    • @AmitChandak
      @AmitChandak  1 year ago

      If there is a chance that no new data comes in, add an additional condition: update the watermark only when the new date is not null, or use COALESCE.
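
      A minimal sketch of that guard, with hypothetical watermark and staging table names; COALESCE keeps the old watermark when the new batch is empty:

      import pyodbc

      conn = pyodbc.connect("<warehouse-connection-string>")  # placeholder
      conn.execute("""
          UPDATE dbo.etl_watermark
          SET last_load_date = COALESCE(
                  (SELECT MAX(order_date) FROM dbo.stg_sales),  -- NULL when nothing new arrived
                  last_load_date)                               -- so keep the previous value
          WHERE table_name = 'fact_sales';
      """)
      conn.commit()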

  • @adilmajeed8439
    @adilmajeed8439 1 year ago +1

    Thanks again for such a nice blog. Is it possible for you to tag numbers on the videos, so that it is easy to follow the order instead of looking at the publishing date? Thanks for understanding.

    • @AmitChandak
      @AmitChandak  1 year ago +1

      Please follow the series link. I will see if I can rename the videos and add numbers:
      ua-cam.com/video/p-v0I5S-ybs/v-deo.html

  • @sukumarm5926
    @sukumarm5926 6 months ago +1

    Hi, is it possible to use a WHERE clause in the Copy Data activity when the source is the Microsoft Fabric SQL analytics endpoint, as in ADF?

    • @AmitChandak
      @AmitChandak  6 months ago +1

      Please check the incremental load videos; I have given an example of using the WHERE clause (see also the sketch after these links):
      Microsoft Fabric: Incremental ETL for Warehouse using Data Pipeline, SQL Procedure: ua-cam.com/video/qsOIfTzjCSQ/v-deo.html
      Microsoft Fabric: Incremental ETL for Warehouse using Dataflow Gen 2, SQL Procedure, Data Pipeline: ua-cam.com/video/mpFRnZBXsvQ/v-deo.html
      #microsoftfabric- How to use Append data option of Data Flow Gen 2 and Data Pipeline| Incremental ua-cam.com/video/E5axX6auBN4/v-deo.html
      Microsoft Fabric: How to append only Incremental data using Data Pipeline in Lakehouse: ua-cam.com/video/xzhmaQnB2fM/v-deo.html
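
      As a sketch, the Copy activity's source query can carry the filter itself; here "LookupMaxDate" is a hypothetical Lookup activity that returns the watermark, interpolated with the pipeline expression syntax:

      SELECT *
      FROM dbo.sales
      WHERE update_date > '@{activity('LookupMaxDate').output.firstRow.max_date}'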

  • @Shreekanthsharma-t6x
    @Shreekanthsharma-t6x 5 months ago +1

    Hi Amit, good morning! I have to convert an existing SQL Server stored procedure to the Fabric environment. My stored procedures contain CURSOR commands, but Fabric doesn't support CURSOR commands. How do I proceed in this case? Is there an alternative?

    • @AmitChandak
      @AmitChandak  4 months ago +2

      I have tried something very similar with T-SQL loops:
      learn.microsoft.com/en-us/azure/synapse-analytics/sql/develop-loops
      Search for the Azure Synapse solution without using a cursor.
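
      For anyone converting such a procedure, a minimal sketch of that counter-loop pattern, sent as one T-SQL batch; the table names are hypothetical, and it assumes session temp tables are available in your environment:

      import pyodbc

      conn = pyodbc.connect("<warehouse-connection-string>")  # placeholder
      conn.execute("""
          DECLARE @i INT = 1, @n INT, @tbl VARCHAR(128);

          -- Number the rows once, instead of opening a cursor over them.
          SELECT ROW_NUMBER() OVER (ORDER BY TABLE_NAME) AS rn, TABLE_NAME AS tbl
          INTO #work
          FROM INFORMATION_SCHEMA.TABLES
          WHERE TABLE_SCHEMA = 'dbo';

          SELECT @n = MAX(rn) FROM #work;

          WHILE @i <= @n
          BEGIN
              SELECT @tbl = tbl FROM #work WHERE rn = @i;
              -- the work the cursor body used to do goes here, using @tbl
              SET @i += 1;
          END
      """)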

  • @Kiran-eb3rm
    @Kiran-eb3rm 8 months ago +1

    Hi Amit,
    One question: if we update some records, will they also come through, or will only newly added records come?
    Thanks!

    • @AmitChandak
      @AmitChandak  8 months ago +1

      It depends on the type of incremental load you run: if you use the update date as the basis, it will bring in both new and updated records, which you then need to handle in the ETL process.
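
      In a Warehouse SQL procedure, one common way to handle that mix is update-then-insert from the staged increment; a sketch with hypothetical table and column names:

      import pyodbc

      conn = pyodbc.connect("<warehouse-connection-string>")  # placeholder
      conn.execute("""
          -- Rows already present: refresh them from the staged increment.
          UPDATE f
          SET    f.amount = s.amount, f.update_date = s.update_date
          FROM   dbo.fact_sales AS f
          JOIN   dbo.stg_sales  AS s ON s.id = f.id;

          -- Rows not present yet: append them.
          INSERT INTO dbo.fact_sales (id, amount, update_date)
          SELECT s.id, s.amount, s.update_date
          FROM   dbo.stg_sales AS s
          WHERE  NOT EXISTS (SELECT 1 FROM dbo.fact_sales f WHERE f.id = s.id);
      """)
      conn.commit()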

  • @JoanSCruz
    @JoanSCruz 8 months ago +1

    Hi Amit, sorry if it looks like I'm not making the proper effort, but I don't quite get how you can use DML on the SQL endpoint. Whenever I try to use INSERT, UPDATE, DELETE, etc., I get an error saying those commands are not supported, and I don't see what I'm doing wrong. On the other hand, I really appreciate all your material.

    • @AmitChandak
      @AmitChandak  8 months ago +2

      Hi Joan,
      Hope you are using that on the Warehouse SQL endpoint, not the Lakehouse one. On the Lakehouse you can run DML using Spark SQL:
      Managing Microsoft Fabric Lakehouse using Spark SQL: Data Manipulation Insert, Update, Delete, Alter- ua-cam.com/video/PunKbz4iCEg/v-deo.html

    • @JoanSCruz
      @JoanSCruz 8 months ago

      @@AmitChandak You're so kind. Yes, I kept watching your videos and found out the issue was exactly that: I was using a Lakehouse. Thanks, your videos are excellent.

  • @ADhuidv
    @ADhuidv 4 months ago +1

    Sir,
    How can we build a JDBC/pyodbc connection between a Fabric data warehouse and a Fabric notebook?
    I have been trying to find this for a long time, but without success.

    • @AmitChandak
      @AmitChandak  4 months ago +1

      I have not tried it in a notebook, but I have tried it from local Python to the Warehouse. Check if that code can help:
      Microsoft Fabric: Load local SQL server data in Warehouse using Python, Pandas, and sqlalchemy - ua-cam.com/video/P0o-a-8rFH0/v-deo.html
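
      For anyone reading along, a minimal sqlalchemy sketch in the spirit of that video; the endpoint, database, and table names are placeholders, and it assumes ODBC Driver 18 with Azure AD interactive sign-in:

      import sqlalchemy as sa

      # Placeholders: copy the SQL connection string from the Warehouse settings.
      engine = sa.create_engine(
          "mssql+pyodbc://@<warehouse-sql-endpoint>/<warehouse-name>"
          "?driver=ODBC+Driver+18+for+SQL+Server"
          "&authentication=ActiveDirectoryInteractive"
      )

      with engine.connect() as conn:
          # Simple smoke test against a hypothetical table.
          print(conn.execute(sa.text("SELECT COUNT(*) FROM dbo.fact_sales")).scalar())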

    • @ADhuidv
      @ADhuidv 4 months ago

      ❤❤
      It will definitely solve my scenario

    • @ADhuidv
      @ADhuidv 4 months ago

      Thank you so much sir 😊😊

  • @zhiyingwang1234
    @zhiyingwang1234 5 days ago

    The tutorial uses SQL commands in a data warehouse to create/alter tables, but I can only load data into a lakehouse via Copy Data; Dataflow Gen2 won't work due to our firewall setup. How can I implement incremental loading in a lakehouse? I understand one can't change data through the SQL endpoint of a lakehouse.

    • @AmitChandak
      @AmitChandak  5 days ago

      @zhiyingwang1234, you can use Spark SQL or PySpark. In Spark SQL, try MERGE (Lakehouse tables are Delta tables, which support it):
      spark.sql("""
          MERGE INTO target_table AS target
          USING source_table AS source
          ON target.id = source.id
          WHEN MATCHED THEN            -- row already exists: refresh it
              UPDATE SET target.value = source.value
          WHEN NOT MATCHED THEN        -- new row: append it
              INSERT (id, value) VALUES (source.id, source.value)
      """)

  • @GudhawalaLadka
    @GudhawalaLadka 1 year ago +1

    Can this process handle millions of rows? I have a table with 13 million rows, and it keeps increasing every month. I am not able to use incremental refresh on dataflows because of the data volume. I am hopeful this can work out.

    • @AmitChandak
      @AmitChandak  1 year ago +1

      I successfully managed to load up to 4 million records per run without incremental load. Let me share a method to go beyond that. Create a control table with two dates, "Start Date" and "End Date", and size the range to a year or a month, depending on your requirements. After each load cycle finishes, advance the "Start Date" and "End Date" to the next year or month; this can be done with an update inside a procedure. Execute the process multiple times, or create a loop within a data pipeline, until the full history is loaded.
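
      A sketch of that control table and the advance step, with hypothetical names and a monthly window; a pipeline loop would re-run the load until the window passes today:

      import pyodbc

      conn = pyodbc.connect("<warehouse-connection-string>")  # placeholder

      # One-time setup: the window the next load cycle should cover.
      conn.execute("CREATE TABLE dbo.load_window (start_date DATE, end_date DATE);")
      conn.execute("INSERT INTO dbo.load_window VALUES ('2015-01-01', '2015-02-01');")

      # After each successful load, slide the window one month forward.
      # T-SQL evaluates both assignments against the old row values.
      conn.execute("""
          UPDATE dbo.load_window
          SET start_date = end_date,
              end_date   = DATEADD(MONTH, 1, end_date);
      """)
      conn.commit()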

  • @ilseespinobarros2418
    @ilseespinobarros2418 8 months ago

    Hi Amit, do you know if it is possible to connect directly from a pipeline and use the upsert option for a SQL Server? I have tried it, but I cannot connect to the server. I have connected successfully using dataflows, but the pipeline gives the following error: "The server was not found or was not accessible." This would make incremental ETL for a SQL Server so much easier, but I think the option is still not available in Fabric.

    • @AmitChandak
      @AmitChandak  8 months ago +1

      Hi Ilse,
      It seems you're encountering issues connecting directly to an on-premises SQL Server from a pipeline and hitting the error "The server was not found or was not accessible." An on-premises SQL Server requires a gateway to establish a connection, and currently Dataflow Gen 2 is the only option supporting on-premises sources. Unfortunately, connecting directly via a pipeline is not available in Fabric at the moment.
      Thanks,
      Amit

  • @mohammedmanna3488
    @mohammedmanna3488 1 year ago +1

    Thank you, Mr. Amit, great job, sir.
    May I ask you a question? I have MS Fabric, Power BI, and MS SQL Server on my laptop.
    The database I work with includes more than 4000 tables; it is for JD Edwards EnterpriseOne, it is called JDEData, and my server name is, for example, localhost. For the last two weeks I have tried to connect to my localhost, but I can't connect.
    Note, sir, that in Power Query I connect by just giving the server name, the database name, and a simple SELECT statement for the table and the fields I need.
    By the way, I have never worked with the Power BI service before.
    Can you please guide me on connecting to my localhost through whatever works in Fabric: Dataflow Gen 2, a pipeline, or anything else, into a lakehouse?
    I am very sorry my comment is so long, but I have tried hundreds of times without success, and I have no networking experience.
    Again, I am sorry, and thank you.

    • @AmitChandak
      @AmitChandak  1 year ago +1

      To connect to a local server, you first need to install the on-premises gateway, and in the dataflow you have to use that gateway while creating the connection. The last time I tried this flow on Fabric it gave some issues, which I reported. I will check and get back on this.

    • @AmitChandak
      @AmitChandak  1 year ago

      I was able to complete the flow using a local SQL Server and the latest on-premises gateway: ua-cam.com/video/oEF-jHVmvdo/v-deo.html

  • @huongbui4376
    @huongbui4376 1 year ago

    I'm wondering: your PostgreSQL database is local, so how can you connect Fabric to it without using a gateway?

    • @AmitChandak
      @AmitChandak  1 year ago +1

      No, it is in the cloud; I have used an IP address.

    • @huongbui4376
      @huongbui4376 1 year ago

      @@AmitChandak I tried again using PostgreSQL in the cloud and successfully ran each step, the dataflow and the stored procedure. However, when I created a pipeline with these two steps as in your video, the dataflow failed when run from the pipeline. Could you please explain the reason?
      The details of the error:
      Error code: 20302
      Failure type: User configuration issue
      Details: Microsoft.DataTransfer.Execution.Core.ExecutionException: Refresh Dataflow failed with status: NotFound message: Not Found

  • @lakshmipriyakrishnan2727
    @lakshmipriyakrishnan2727 1 year ago +1

    I am trying to call a dataflow and execute it from a pipeline, but I get a 'User Configuration issue' error. Any idea how to fix this?

    • @AmitChandak
      @AmitChandak  1 year ago +1

      I am able to call Dataflow Gen2 from a pipeline; I tested it a few minutes ago. Please provide more details so I can reproduce the issue.