Great video! I think this is one of the videos that advanced developers should watch. For instance, I tended to use the Table.Buffer for referencing several queries, but now I got it! Thanks for this great video!
Just found this video and starting to explore powerquery. I think this will help reduce some of my looong load times. So thank you will need to run some tests now :)
Excellent video, I do think this somehow should be made simpler. At least easier to see what happens. (In this case it is easy because of flow, but some sources are far less clear).
I've been grappling with this for a while from an Excel perspective, and everything I've read and seen suggests that there is actually no explicit way to ensure your data source is only queried once. The workaround I've been using is to load the 'intermediate' step to a worksheet, then use the worksheet as the source for subsequent queries, but that obviously leaves you with a superfluous worksheet (and is no good if your data set contains more than 1m rows). It's a bit unsatisfactory, really. I wish there was some way to reverse the evaluation so that it went top-down rather than bottom-up. It already knows the dependencies, after all. Makes me just want to use Python for everything on the ETL side.
So, if i understood right, Table.Buffer(): - is not useful between multiple query executions, because it can't share its output that way; - is only useful when you want to reuse the data multiple times in the same query; - gets executed everytime a query that implements it is called, instead of sharing its result Is this correct?
Would be nice if you can have those settings at the query level and not the file level. You'll have cases where you need both the parallel setting and the non-parallel setting.
Thanks for this very interesting video. I asked about this recently via FB Messenger and received an adequate answer fairly quickly. By disabling parallel loading, it improved the query speed as I had a lot of tables in my model that were referenced from a single data source table. But still I don’t understand why there is no mechanism that uses some kind of internal querying. So a mechanism where you have one table that is retrieved from the data source and where the reference tables query that one table. Would it be a feature that can be added in the future or are there concrete reasons why one wouldn’t want that at all?
Maarten van Roij very good point! Microsoft need to sit down with real business users and get this thing right, and tight, across Excel and Power BI. I’ve built some excellent tools in Excel using Power Query and this goes a long way for business users, so... Business users, Business users, Business users, Business users, Business users, Business users, Business users, Business users, (Like developers developers developers!...). Move onto business users as well now, who aren’t developers! Also I want to see power query be able to upload data, to a store, maybe there are round the bush ways, but this would be great. I do know of Power Update and I’m yet to try it out. Why hasn’t MS done this themselves already though ? Ok enough wishes and complaining from me. FTR I use power query every day and I’m so glad it exists!
Sir , I have a table from which data is coming from power query, and in it I an adding many columns with formula, when I am self refrencing it, my formulas are going away. Please provide me a solution on it.
❤ Amazing. Thank you very much. My project ran from 3 hours to 2 minutes. You guys are great. Now the big question, where should I use Table.Buffer? I indeed remove it all. Anyway, Awesome video.
Great video! If you replicate those queries in Dataflows instead (and you have premium) is it true that the number of calls can be reduced to 1? I think that's how DFs work, and I'm trying to justify needing premium licences. thank you!
Hello, question about the M code used in the original source table (GetData). if my source table has multiple columns (unlike the one in the example), do I need to change the code so it references all those columns and their data types? Thank you!
I have the source flat file within Excel, what would be the Power Pivot function(s) to query into cache only once? So that subsequent Reference queries 2, 3, 4, 5 (for example) doesn't re-load the source flat file again? The link to the PBIX sample doesn't work. Chris, excellent explanations. You're highly knowledgeable.
It's not a Power Pivot thing. This is done from Power Query. Power Pivot is the data model. Unfortunately, the settings shown in this video aren't available from an Excel side ☹ The behavior illustrated applies to Excel though.
@@GuyInACube Sorry, my bad. Typo...not Power Pivot, rather it's Power Query. Same scenario as Chris' demo.....I have 5 reference queries all pointing to the original get data query. It's slow to refresh. I'm using Excel (rather than PBI) so do not have the option to turn off Parallel in Power BI. Wondering if there's a function within M code to query into cache just once? Then, all subsequent queries references the cache. Topic for an upcoming video, perhaps? Thanks so much.
Are these settings automatically loaded into the PowerBI premium workspace online service too? or is there additional setting to do that or does that happen accordingly?
Great video! Would any of these "tricks" allow you to bypass the error "Formula.Firewall: Query references other queries, so it may not directly access a data source"? I've received this error when trying to use the response from one of my web service queries as the basis for another query. Also, any good sources on how to utilize Microsoft Flow with PowerBI.
Hey guys! Thanks for the really great video! Helped me a lot in understanding how Power Query an M handle query excecution.I thought ;) Because I have a question, after my Report has not updated as expected: 1. I have two queries: 2 folders of Excel tables, that are separately queried and transormed. 2. Then I have a third query in which I purely append the two queries with "Table.Combine(...)". I would expect, that with all the settings that you just showed, it would query the two folders first and then very quickly append the two results. What happens is this: Both folders are loaded and then in the third query all the files are loaded again :( Very slow and very inefficient.
Can’t follow in Excel 2016. Parallel option is not even present. I load my main query to a table. I then load that table as a new query and reference the cache. Essentially I’m using the worksheet table as a cache. So I have a macro which runs the initial query and then refreshes what would be my ‘reference queries’, but are now just queries which reference the cached table on worksheet, and it speeds things up a whole lot. I’ll have to watch this video again and try to follow in Excel. If my initial query got very large then this may put strain in the system. It’s not the most elegant way to do it, but once it’s linked up with the macro, it sure beats the alternative of querying 100k rows, 7 times or whatever. MS should have a feature to say update daily and freeze, or have some logical management layer above all of power query (as part of the View Dependencies view) to manage how we want tables buffered. All this focus on Azure and Synapse and stuff and MS still can’t get Power Query easily usable for the business user. I will keep trying to learn this method. Shall rewatch properly, and let y’all know how I manage.
I use the same method in excel2016 to store intermediate table and avoid their permanent recalculation each time a query ireferencing these results is refreshed..but working this way, the query dependency loose the real data flow… I was wondering if using table.buffer has the same efficiency ..
This video, single handledly, smashed my excel grind, from 2 hours for 6 data sets - down to 20 seconds with 1000 data sets!
Excellent video, made it very easy to understand something I've googled more than a few times. Thank you
Awesome! Yeah this concept is a bit hard to understand. 👊
Yes it is more than what i have thought, it gave me a food for the weekend to digest. Thanks a lot guys for this eye opening video. Thanks Chris Webb.
Excellent! Definitely a heavy topic. Thanks for watching 👊
Great video! I think this is one of the videos that advanced developers should watch. For instance, I tended to use the Table.Buffer for referencing several queries, but now I got it! Thanks for this great video!
Thanks for the video! More episodes with Chris please! :)
Most welcome! Tons of great info in this video. 👊
Just found this video and starting to explore powerquery. I think this will help reduce some of my looong load times. So thank you will need to run some tests now :)
Patrick surprised Chris with question about Current File settings :)
Excellent video, I do think this somehow should be made simpler. At least easier to see what happens. (In this case it is easy because of flow, but some sources are far less clear).
I've been grappling with this for a while from an Excel perspective, and everything I've read and seen suggests that there is actually no explicit way to ensure your data source is only queried once. The workaround I've been using is to load the 'intermediate' step to a worksheet, then use the worksheet as the source for subsequent queries, but that obviously leaves you with a superfluous worksheet (and is no good if your data set contains more than 1m rows). It's a bit unsatisfactory, really. I wish there was some way to reverse the evaluation so that it went top-down rather than bottom-up. It already knows the dependencies, after all.
Makes me just want to use Python for everything on the ETL side.
So, if i understood right, Table.Buffer():
- is not useful between multiple query executions, because it can't share its output that way;
- is only useful when you want to reuse the data multiple times in the same query;
- gets executed everytime a query that implements it is called, instead of sharing its result
Is this correct?
You got it! 👊
@@GuyInACube Awesome, thanks a lot for this video.
Would be nice if you can have those settings at the query level and not the file level. You'll have cases where you need both the parallel setting and the non-parallel setting.
Agreed. Make sure to get your feedback into ideas.powerbi.com! 👊
So is there a solution? I don't want to download the same excel file for every query. Can I download the file just once?
Thanks for this really important information to cut down dataset gathering time!
Thanks for this very interesting video. I asked about this recently via FB Messenger and received an adequate answer fairly quickly. By disabling parallel loading, it improved the query speed as I had a lot of tables in my model that were referenced from a single data source table. But still I don’t understand why there is no mechanism that uses some kind of internal querying. So a mechanism where you have one table that is retrieved from the data source and where the reference tables query that one table. Would it be a feature that can be added in the future or are there concrete reasons why one wouldn’t want that at all?
Maarten van Roij very good point! Microsoft need to sit down with real business users and get this thing right, and tight, across Excel and Power BI. I’ve built some excellent tools in Excel using Power Query and this goes a long way for business users, so... Business users, Business users, Business users, Business users, Business users, Business users, Business users, Business users, (Like developers developers developers!...). Move onto business users as well now, who aren’t developers! Also I want to see power query be able to upload data, to a store, maybe there are round the bush ways, but this would be great. I do know of Power Update and I’m yet to try it out. Why hasn’t MS done this themselves already though ? Ok enough wishes and complaining from me. FTR I use power query every day and I’m so glad it exists!
I’d love to see some analysis on this with SQL tables or other files, like CSV files on disk, in SharePoint, and DataLake.
Yeah it would be great to explore further to see the behavior. Some great weekend fun 😀
Is this not an option in excels power query editor? to disable the parallel data loading?
Sir , I have a table from which data is coming from power query, and in it I an adding many columns with formula, when I am self refrencing it, my formulas are going away. Please provide me a solution on it.
❤ Amazing. Thank you very much. My project ran from 3 hours to 2 minutes. You guys are great. Now the big question, where should I use Table.Buffer? I indeed remove it all. Anyway, Awesome video.
Great video! I'm going to have to watch this a few more times
typical with cwebb...he makes you put your thinking cap on :)
Appreciate the feedback Kyle! and we agree with Greg, Chris is good at getting you thinking. 👊
Thanks Chris and Patrick. Very interesting topic!!!
Most welcome Bill! Thanks for watching. 👊
If Bill Szysz says its a interesting Power Query video, then you'd do well to invite Chris Webb back.
How we can implement it in dataflow?
Thanks for the great video!
I think I missed expression of Reference Query 1, 2 and 3. Is it just =#"Call Web Service" ?
Great video! If you replicate those queries in Dataflows instead (and you have premium) is it true that the number of calls can be reduced to 1? I think that's how DFs work, and I'm trying to justify needing premium licences. thank you!
Hello, question about the M code used in the original source table (GetData). if my source table has multiple columns (unlike the one in the example), do I need to change the code so it references all those columns and their data types?
Thank you!
I second this question. Would encourage a follow-up blog post on what's going on here.
I have the source flat file within Excel, what would be the Power Pivot function(s) to query into cache only once? So that subsequent Reference queries 2, 3, 4, 5 (for example) doesn't re-load the source flat file again? The link to the PBIX sample doesn't work. Chris, excellent explanations. You're highly knowledgeable.
It's not a Power Pivot thing. This is done from Power Query. Power Pivot is the data model. Unfortunately, the settings shown in this video aren't available from an Excel side ☹ The behavior illustrated applies to Excel though.
@@GuyInACube Sorry, my bad. Typo...not Power Pivot, rather it's Power Query. Same scenario as Chris' demo.....I have 5 reference queries all pointing to the original get data query. It's slow to refresh. I'm using Excel (rather than PBI) so do not have the option to turn off Parallel in Power BI. Wondering if there's a function within M code to query into cache just once? Then, all subsequent queries references the cache. Topic for an upcoming video, perhaps? Thanks so much.
Are these settings automatically loaded into the PowerBI premium workspace online service too? or is there additional setting to do that or does that happen accordingly?
Great video! Would any of these "tricks" allow you to bypass the error "Formula.Firewall: Query references other queries, so it may not directly access a data source"? I've received this error when trying to use the response from one of my web service queries as the basis for another query. Also, any good sources on how to utilize Microsoft Flow with PowerBI.
excellent video, please invite Chris back ;)
We will try our best. 👍 Thanks for watching!
Mind blowing detail ! 👍👍
That's very helpful, but also just shows how unintuitive and (arguably) even broken Power BI is :)
Hey guys! Thanks for the really great video! Helped me a lot in understanding how Power Query an M handle query excecution.I thought ;)
Because I have a question, after my Report has not updated as expected:
1. I have two queries: 2 folders of Excel tables, that are separately queried and transormed.
2. Then I have a third query in which I purely append the two queries with "Table.Combine(...)".
I would expect, that with all the settings that you just showed, it would query the two folders first and then very quickly append the two results.
What happens is this:
Both folders are loaded and then in the third query all the files are loaded again :( Very slow and very inefficient.
Can’t follow in Excel 2016. Parallel option is not even present. I load my main query to a table. I then load that table as a new query and reference the cache. Essentially I’m using the worksheet table as a cache. So I have a macro which runs the initial query and then refreshes what would be my ‘reference queries’, but are now just queries which reference the cached table on worksheet, and it speeds things up a whole lot. I’ll have to watch this video again and try to follow in Excel. If my initial query got very large then this may put strain in the system. It’s not the most elegant way to do it, but once it’s linked up with the macro, it sure beats the alternative of querying 100k rows, 7 times or whatever. MS should have a feature to say update daily and freeze, or have some logical management layer above all of power query (as part of the View Dependencies view) to manage how we want tables buffered. All this focus on Azure and Synapse and stuff and MS still can’t get Power Query easily usable for the business user. I will keep trying to learn this method. Shall rewatch properly, and let y’all know how I manage.
I use the same method in excel2016 to store intermediate table and avoid their permanent recalculation each time a query ireferencing these results is refreshed..but working this way, the query dependency loose the real data flow…
I was wondering if using table.buffer has the same efficiency ..
I have been thinking of doing this as well - my users are frustrated with long query refresh times
So:
Reference isnt a reference
Buffer isnt a buffer
Exactly. That’s microsoft for you 😬
Thank you!!!
million thanks
this channel should be called as we call here in Argentina: The Bald's channel
hahaha nice. Although Chris isn't bald?
@@GuyInACube isn't him? I see both bald men hehe. Great channel btw. Thanks for this much.
Gripping!
hahahahaha indeed 😀
12:00 - I mean this is just a poor implementation of PowerQuery though...
Mind blowing detail ! 👍👍