Q: wish y’all would start declaring in the description what licensing/capacity is required to replicate whatever it is you’re doing in your videos. Often, I’ll see one of your tutorials, get excited about implementing a version into our own workspace, and then realize it requires Premium capacity, PPU, or some other licensing we don’t have (at least not at the moment).
Merge needs a nice search feature so you can quickly filter the columns to find the field you need.
I see that Dataflows Gen2 has the ability to make columns a "key" and was hopeful that we would be able to do an upsert-type operation when writing to a target. For example, this could be a huge help when building a dimension, adding an "index" column as a surrogate key, and then doing an upsert into the target. Instead, it currently just appends all records to the existing records or wipes and replaces them.
Exactly. I was curious about updates (upserts)
Could it be (I'm guessing) that Fabric can process so much data that there may not be a need to upsert? Makes no sense why they wouldn't give the option.
Great video. 04:20 Pipelines also has an option to append or replace data at the destination.
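Until an upsert option ships for the destination, one workaround (an assumption on my part, not something shown in the video) is to let the dataflow or copy activity land the rows in a staging table and do the merge in a notebook. A minimal sketch against a Lakehouse Delta table, with placeholder table and key names:

```python
# Minimal upsert sketch for a Fabric notebook against a Lakehouse Delta table.
# "staging_dim_customer", "dim_customer", and "CustomerKey" are placeholders.
from pyspark.sql import SparkSession
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()  # pre-created as `spark` in Fabric notebooks

staged = spark.read.table("staging_dim_customer")   # rows landed by the dataflow / copy activity
target = DeltaTable.forName(spark, "dim_customer")  # existing dimension table

(target.alias("t")
    .merge(staged.alias("s"), "t.CustomerKey = s.CustomerKey")
    .whenMatchedUpdateAll()      # update rows whose key already exists
    .whenNotMatchedInsertAll()   # insert brand-new keys
    .execute())
```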
Before running a downstream ETL, I want to make sure the upstream ETL tables have complete information for the previous day/hour. How do I create dependencies in a pipeline so that I'm not running the ETL on the same old data? Thanks!
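One approach (an assumption, not from the video): put a small freshness check at the start of the pipeline and fail it when the upstream table has no rows for the expected period, so downstream activities wired with on-success dependencies never run on stale data. A minimal sketch with placeholder table/column names:

```python
# Freshness-gate sketch, run as the first notebook activity in the pipeline.
# "upstream_sales" and "LoadDate" are placeholders; adjust the window to your day/hour grain.
from datetime import datetime, timedelta, timezone
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

expected_date = (datetime.now(timezone.utc) - timedelta(days=1)).date()
row_count = (spark.read.table("upstream_sales")
                  .where(F.col("LoadDate") == F.lit(str(expected_date)))
                  .count())

if row_count == 0:
    # A failed notebook activity stops the on-success chain, so the downstream ETL is skipped.
    raise RuntimeError(f"No rows for {expected_date} in upstream_sales; skipping downstream ETL.")
```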
I have mentioned that Dataflow does not seem to be able to link to another query with load enabled (I did drop that info via the contact-us form on your website). Visually, looking at Gen2 Dataflow, it seems that Gen2 can do this.
Edit: a Power BI dataset would be an awesome destination, though I see it doesn't do incremental refresh, so it seems Gen1 still has a use case.
Hey, would this process enable importing data into D365 at all?
I needed a way to compute some metrics (e.g. count of records with foo, bar, and foobar present) on the 1st and middle of every month. I found that I could use a legacy Power BI dataflow to schedule execution of a SQL statement that produced a single record with each of those count-based metrics in a column, which it stuffed into an Azure Data Lake Storage Gen2 container. Then I was able to use the Power BI data source connector for Azure Data Lake Storage Gen2 to pull in the set of all those individual one-record files and create a trend-line chart for all those metrics as the months ticked by. Now that the Data Factory Dataflows Gen2 solution no longer relies on an Azure Data Lake Storage Gen2 container, this maneuver to get at all the scheduled metric computations as a dataset isn't going to work. Any insights on how I should accomplish this metrics computation and display the set over time in the current Power BI experience?
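One possible replacement for the ADLS Gen2 hop (an assumption, not an official pattern): compute the metrics in a scheduled notebook and append one dated row per run to a Lakehouse history table, then build the trend chart on that table. A sketch with placeholder source and column names:

```python
# Compute point-in-time metrics and append one row per scheduled run to a Lakehouse
# history table the report can trend over time. "records", the foo/bar/foobar columns,
# and "metrics_history" are placeholders.
from datetime import date
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

src = spark.read.table("records")
metrics = (src.agg(
        F.count(F.when(F.col("foo").isNotNull(), 1)).alias("foo_count"),
        F.count(F.when(F.col("bar").isNotNull(), 1)).alias("bar_count"),
        F.count(F.when(F.col("foobar").isNotNull(), 1)).alias("foobar_count"))
    .withColumn("snapshot_date", F.lit(str(date.today()))))

metrics.write.mode("append").saveAsTable("metrics_history")
```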
Q: Great video, and now I see there’s a general data warehouse option, but then there’s a KQL database and a Lakehouse with a SQL endpoint. What makes sense architecturally beyond the bronze layer, where a Lakehouse intuitively makes sense? What are the advantages/disadvantages of each if a team handles a medallion structure end to end?
Hi Patrick @guyinacube, can you please make a video about streaming dataflows and streaming datasets in Microsoft Fabric?
What about upsert, and merging the information? Is it a good idea to delete and reload all the information every time?
Thanks for the beautiful content.
I saw that whenever you load data to the data warehouse, you delete the data already in the DW.
How is incremental loading done in Fabric?
That's also something I'm searching for. An incremental refresh schedule is not available for Dataflow Gen2, and I'm looking for a way to incrementally refresh the tables in the warehouse.
What is better, folks: Dataflows Gen2 or Informatica when it comes to transformations?
What happens to all the pipelines and dataflows I already have in Azure Data Factory? Can they be migrated?
Hi Patrick! Is Analysis Services supported with these? I think it was deprecated recently in the last Dataflows…
Can you trigger a pipeline with an endpoint URL call, like you can with Power Automate? There, a trigger can be an HTTP request, and it gives you a URL to call to trigger the flow.
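There's no built-in HTTP trigger on the pipeline itself the way Power Automate does it, but the Fabric REST API has an on-demand job-run endpoint you can call with an Entra ID token. A rough sketch; the IDs, token acquisition, and the exact endpoint/jobType value should be verified against the current Fabric REST API docs:

```python
# Rough sketch: start a Fabric pipeline via the REST "run on demand item job" endpoint.
# Workspace ID, pipeline item ID, and the bearer token are placeholders; verify the
# endpoint and jobType against the current Fabric REST API documentation.
import requests

workspace_id = "<workspace-guid>"
pipeline_id = "<pipeline-item-guid>"
token = "<entra-id-access-token>"  # e.g. acquired with MSAL for the Fabric API scope

url = (f"https://api.fabric.microsoft.com/v1/workspaces/{workspace_id}"
       f"/items/{pipeline_id}/jobs/instances?jobType=Pipeline")

resp = requests.post(url, headers={"Authorization": f"Bearer {token}"})
resp.raise_for_status()  # expect 202 Accepted; the pipeline runs asynchronously
```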
Q: Does Dataflow Gen2 use a live connection, or does it import data into the Power Query engine, perform transformations, and load it back into the Lakehouse?
Hi, how can I pass a parameter / variable to a dataflow from the pipeline??
Can we set one data destination for all the tables we load in Dataflow Gen2? Is that possible?
Please help.
This is the same as a datamart... what is the difference between Gen2 and a datamart?
You can use these for transformations but NEVER to move data - they are painfully slow. Use a copy activity in a pipeline or a notebook to load the data. You can do transformations after it has loaded.
In fairness - if you use a notebook to load the data, you might as well use it to transform the data too.
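For the notebook route, a minimal load-then-transform sketch; the source path, CSV options, column names, and table names are all placeholders:

```python
# Load-then-transform in one Fabric notebook (all names below are placeholders).
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Load: raw files into a bronze Delta table
raw = spark.read.option("header", True).csv("Files/landing/sales/*.csv")
raw.write.mode("overwrite").saveAsTable("bronze_sales")

# Transform: same notebook, same engine
clean = (spark.read.table("bronze_sales")
              .withColumn("Amount", F.col("Amount").cast("decimal(18,2)"))
              .dropDuplicates(["OrderID"]))
clean.write.mode("overwrite").saveAsTable("silver_sales")
```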
Can Dataflows Gen2 handle high-volume data like Spark can? Or should I continue to use Spark for large data?
I wouldn't recommend it - they are painfully slow
How much does it cost in the Fabric trial? I am scared that I will run up the costs.
With this new capability of Fabric, can we make a JOIN between two Power BI cubes?
A JOIN with a cardinality type, not like a merge - a JOIN of a PK with an FK.
Great video Patrick, thank you very much! (Although it took you 13 days to publish it.)
Q: Will it have the same risk when referenced queries run multiple times?
Is anyone else asking themselves how we can do SCD2 in Dataflows if we can only delete-all-and-insert or append? OK, for inserts, append works, but what do you do for updates? Are you running scripts in the data warehouse itself, called by some activity in the pipeline? Maybe I am not yet seeing the full picture.
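One pattern that fits the guess in this comment (an assumption, not something shown in the video): land the rows with the dataflow or a copy activity, then have the pipeline call a notebook (or a warehouse script) that applies the SCD2 logic. A rough Delta-based sketch with placeholder names and columns:

```python
# Rough SCD2 sketch on a Lakehouse Delta table: expire the current row when tracked
# attributes change, then insert the new version. All names and columns are placeholders.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

staged = spark.read.table("staging_dim_customer")
target = DeltaTable.forName(spark, "dim_customer")

# Step 1: close out current rows whose tracked attributes changed
(target.alias("t")
    .merge(staged.alias("s"), "t.CustomerID = s.CustomerID AND t.IsCurrent = true")
    .whenMatchedUpdate(
        condition="t.City <> s.City OR t.Segment <> s.Segment",
        set={"IsCurrent": "false", "ValidTo": "current_timestamp()"})
    .execute())

# Step 2: insert new versions (changed keys plus brand-new keys) as current rows
current_keys = target.toDF().where("IsCurrent = true").select("CustomerID")
new_rows = (staged.join(current_keys, "CustomerID", "left_anti")
    .withColumn("IsCurrent", F.lit(True))
    .withColumn("ValidFrom", F.current_timestamp())
    .withColumn("ValidTo", F.lit(None).cast("timestamp")))
new_rows.write.mode("append").saveAsTable("dim_customer")
```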
After 1:33 I had to click View > Diagram view to get the same view as Patrick.
Q: Is any of this scriptable? Clicking tabs, gears, checkboxes, etc. has to be the slowest way to develop and maintain data pipelines.
Hi guys, sorry, I have a question from the company I work for. We are trying to create a dataflow in one workspace by connecting to a table from a dataflow in another workspace. We can connect to the table, but the new dataflow can only replicate that table; it cannot be saved after applying even a simple filter or transformation to the original table (Power BI shows a message referencing "linked tables"). Thank you.
Looking forward to the day when I can try this with on-premises SQL Server (i.e. "my own") data. For now, all I get is: "There was a problem refreshing the dataflow. Please try again later."
You should be able to do this if your gateway version is >= July 2023. I can load a dataflow, but it will not write to a Lakehouse or Warehouse. Checking gateway firewall settings to see if that's where the problem is.
@jason.campbell474 I am seeing the same behaviour; let me know what you find.
Gen2 Dataflows tend to be slower. Why?