I have been using date parameters for a proof of concept for my client and it works very well. It allowed me to avoid technical limitations of the relational database in use at the organization. Once in the service, all the data was pulled. I didn't know there was a simple way to limit the number of rows directly via Power Query. I'll test that now. Thank you for the useful information.
Your videos have really helped me grow my Power BI practice in my organisation... thank you so much 😀
I was looking for this trick for a long time. Hours and hours wasted waiting for refreshes. Thank you!
Another easy way to reduce dataset size without reducing the number of rows is to convert decimal values to whole numbers (assuming you don't need the additional accuracy).
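A minimal Power Query M sketch of that idea, assuming a hypothetical server, database, "Sales" table and decimal "Amount" column - rounding first so the type change doesn't silently truncate:

let
    Source  = Sql.Database("myserver", "mydb"),
    Sales   = Source{[Schema = "dbo", Item = "Sales"]}[Data],
    // round to whole numbers, then store as Int64 to shrink the column
    Rounded = Table.TransformColumns(Sales, {{"Amount", each Number.Round(_, 0), Int64.Type}})
in
    Rounded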
Yes, I discovered this recently - who needs pennies with a bottom line in the millions?
@@Acheiropoietos Finance
Definitely! I was able to cut my dataset size down by 40% by doing this.
Thanks so much... spent so much time working in the editor making small changes and waiting for the changes to apply on 6MM rows... cheers
Hi Patrick, great video as always. I totally forgot about query folding, as I'm usually writing custom SQL queries that Power BI can't fold anyway... so I'm usually injecting my parameter into the SQL query directly (as a WHERE-clause parameter). Unfortunately, this approach makes the SQL query much less human-readable (it adds lots of code like #(lf) instead of a new line, etc.). Thanks for bringing up again the possibility of folding our queries in an intuitive way.
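For anyone curious, here is a minimal sketch of that parameter-in-native-SQL approach (the server, database, table and the NumberOfRows parameter are all assumed names); the #(lf) escapes are exactly the readability issue mentioned above:

let
    // NumberOfRows is assumed to be a whole-number parameter defined in the PBIX
    SqlText = "SELECT TOP " & Number.ToText(NumberOfRows) & " *#(lf)"
              & "FROM dbo.FactSales#(lf)"
              & "WHERE OrderDate >= '2020-01-01'",
    Source  = Sql.Database("myserver", "mydb", [Query = SqlText])
in
    Source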
I was using exactly this approach (parameter), I'm glad I'm not the only one.
The big problem with #2 and #3 at our org is that we have millions of rows, and those queries run in full before they get to the steps that limit the data. So it's still a huge pain on the desktop to sit there and wait. We also don't have access to views, so that's out the window. I have made some parameters that will switch between "ROWNUM < 100" and "TOP 100" depending on our server, and that works OK, but it's still kind of a pain.
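A rough sketch of that dialect switch, assuming hypothetical parameters ServerType (text) and RowLimit (whole number):

let
    QueryText =
        if ServerType = "Oracle" then
            "SELECT * FROM SALES WHERE ROWNUM <= " & Number.ToText(RowLimit)
        else
            "SELECT TOP " & Number.ToText(RowLimit) & " * FROM dbo.Sales",
    Source =
        if ServerType = "Oracle" then
            Oracle.Database("myhost", [Query = QueryText])
        else
            Sql.Database("myserver", "mydb", [Query = QueryText])
in
    Source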
I think query folding should work here, which should push the filter back directly to the source.
@@akhilannan most of our code is written as custom SQL from the start, so I don't think so in that case.
What did you do? I have the same problem - I limit it by date (start date and end date), and it first loads the entire query, which takes hours and hours.
Bingo, the video is a little misleading because it says "limit the # of rows being brought back from the data source". But that's not what these are doing - they're applying the filter AFTER all of that data is queried. Every time you modify a table or anything and "Apply changes", you have to sit through all that data being brought in again. There needs to be a simple way to limit the # brought over on import.
100% correct, @@cowpoke98
That is some great advice - thank you.
I was also wondering about the third way, using start and end dates: are you able to dynamically set the end date to the current date and the start date to, say, three months earlier?
This is so that once you are done uploading the report and have set the schedule, the start and end date parameters in the Power BI service always use the current date range (which will be different every day).
I built upon your question, because you are correct that you need the most recently updated data. (A minimal sketch in M follows after these steps.)
1. Create a parameter named "Date" which will act as your start date (don't worry about the end date)
2. In your transformations, find a date column, choose "Is After", and use your Date parameter - let's say a date 3 months ago
3. Publish your PBIX, then change your parameter date to go back as far as you need (e.g. 5 years ago)
4. Let the heavy lifting happen in the Power BI service - you should get the latest data from 5 years ago until today
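A minimal sketch of steps 1-2 in M, assuming a parameter called StartDate (standing in for the "Date" parameter above) and a hypothetical Sales table with an OrderDate column - this is roughly what the "Is After" filter generates, and it should fold back to a relational source:

let
    Source   = Sql.Database("myserver", "mydb"),
    Sales    = Source{[Schema = "dbo", Item = "Sales"]}[Data],
    // "Is After" filter driven by the StartDate parameter
    Filtered = Table.SelectRows(Sales, each [OrderDate] > StartDate)
in
    Filtered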
OOOOOHHH! If only I was allowed to use the service!!! So many things would be possible. Great content and I will keep it in mind if I ever get them to change their minds.
Really useful tips that will help people working from home on slow broadband through a VPN develop and deploy reports with less time spent waiting for rows to load. Thanks Patrick 😊
This channel is making Power BI no. 1 in the world.
Another amazing video! Great job Patrick! I always wondered how models with large data sets were created and published without a powerful machine.
I love the fact that you can select which table you want to load - the filtered one or the unfiltered one. I am using this feature to test the load with fewer rows.
Thank you so much Patrick😀 This is really helpful! Would love to see videos on customer attrition analysis. Cheers👍
Done something similar with the parameter values. Instead, we set up two dataflow entities to work off of: a full set of the data and a sample set of the data, named Source & Source_Sample.
Then we can just parameterize the source. Start with the parameter pointing to Source_Sample; when it's all done and published, update the parameter to Source. I like this way, as it standardizes the process for all report builders in our company and we can ensure they're getting a good distribution of the data within the sample. Report writers just toss a couple of standard lines into the Advanced Editor (copy/paste) and they're all set up. We don't need to rely on them finding janky ways to get their own sample set.
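Roughly what those copy/paste lines can look like, assuming a text parameter EntityName that is set to "Source_Sample" during development and flipped to "Source" after publishing (the workspace/dataflow IDs and navigation field names below are placeholders based on a typical auto-generated dataflow query, so adjust to what your own Advanced Editor shows):

let
    Dataflows = PowerBI.Dataflows(null),
    Workspace = Dataflows{[workspaceId = "00000000-0000-0000-0000-000000000000"]}[Data],
    Dataflow  = Workspace{[dataflowId = "00000000-0000-0000-0000-000000000000"]}[Data],
    // EntityName parameter picks either "Source" or "Source_Sample"
    Entity    = Dataflow{[entity = EntityName, version = ""]}[Data]
in
    Entity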
The 3rd method is the most interesting, BUT what if I want the complete period (5 years, for example) with a reduced dataset? The workaround I used was:
1. sort the dataset using the date column
2. create an index column
3. add a helper column that applies Number.Mod to the index values (let's say mod 10, which means I keep only one row out of every 10)
4. filter the helper column to keep 0 values only
BINGO: I reduced my data to a 10% sample of the total volume and kept the full date range.
Advantage: sample data across the entire time range.
Disadvantage: step 1 above means the full dataset is loaded in the first step :(
I hope someone can achieve the same goal without loading the full dataset.
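A minimal sketch of those four steps in M, assuming a hypothetical Sales table with an OrderDate column (and yes, the sort still pulls the full table first, as noted):

let
    Source  = Sql.Database("myserver", "mydb"),
    Sales   = Source{[Schema = "dbo", Item = "Sales"]}[Data],
    Sorted  = Table.Sort(Sales, {{"OrderDate", Order.Ascending}}),
    Indexed = Table.AddIndexColumn(Sorted, "Index", 0, 1),
    // keep one row out of every 10
    Helper  = Table.AddColumn(Indexed, "Keep", each Number.Mod([Index], 10)),
    Sampled = Table.SelectRows(Helper, each [Keep] = 0)
in
    Sampled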
Hi Patrick, thank you so much. I had been looking for this information for a while.
Thank you for this - I'm going to try to set up some refresh parameters now on the report I'm building. It links to Salesforce, but I only need to see one of the territories/divisions, so hopefully, hopefully, this will help reduce the thousands of lines it pulls through!
It worked! Thank goodness, that's definitely reduced the strain on this(!)
Thanks for a great idea (the 2nd way, with a parameter)!
I can't use it as-is because Report Server (on-premises) doesn't have the parameter option for scheduled refresh. But (as always) an xlsx file with the number of rows in a network folder works well :)
Following Matthew's "upstream" maxim, I included the number of rows from the xlsx file in the SELECT section of the query. Now I know that "first 100 rows" translates differently for an MS SQL source and for an Oracle source. Interesting experience!
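A rough sketch of that pattern for an MS SQL source, assuming a hypothetical RowLimits.xlsx on a network share with the row count in the first cell of a sheet named "Limits" (for Oracle the clause would be ROWNUM or FETCH FIRST instead of TOP):

let
    Workbook = Excel.Workbook(File.Contents("\\fileserver\share\RowLimits.xlsx")),
    Limits   = Workbook{[Item = "Limits", Kind = "Sheet"]}[Data],
    RowLimit = Limits{0}[Column1],
    Source   = Sql.Database("myserver", "mydb",
                   [Query = "SELECT TOP " & Number.ToText(RowLimit) & " * FROM dbo.Sales"])
in
    Source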
Hi Patrick, I need a solution for monitoring scheduled refreshes across multiple workspaces... can you please make a video on this?
Question - what if my model includes multiple tables and datasets? Would I need to create separate parameters for each table, and will I be able to adjust them all in the service?
Q: I had a question regarding using the first tip with incremental refresh. The table I want to implement incremental refresh on takes too long to load in my desktop file, so I am using a stored procedure to load only a small subset of the data in the desktop file. After implementing incremental refresh on it, I publish it, then change my stored procedure to bring in all the data and refresh the dataset in the service.
This method worked before without incremental refresh, but now the initial refresh fails after some 8-10 hours and I am not sure what the exact cause may be. Is it because I am using a stored procedure with incremental refresh? Or do I have to load all the data into the file before publishing it for the initial refresh to work fine?
Hey Patrick, does this work if you are appending multiple CSVs and then applying various transformations? I.e., would the filter on the final table be enough to limit the refresh, or do I need to do it in each source import?
Great Video Patrick!
Hello Patrick. Interesting stuff. Do you have any advice on optimizing performance between Power BI Desktop and AWS Athena over an ODBC connection? Would the reduce-rows option you showed be a good option in the case of 150+ million rows? Appreciate your feedback.
What about joining tables with millions of rows? Performance? Normally you'd need SSAS...
SSAS Tabular is the same VertiPaq engine as Power BI. It works :)
Informative session
Patrick - please start a step-by-step tutorial series for going from beginner to pro on Power BI. I have looked through all of your uploaded videos and I'm unable to find a tutorial series on Power BI. If you have already published all the modules someone would need to become a professional, then please arrange them in order so one can follow them and become a pro.
Hi Matt, are you running the refresh on a laptop or a server? The refresh of the 500k rows is pretty fast. If it's on a laptop, can you please share the specs, as even 100k rows takes a lot of time on my work laptop.
If I make a parameter for selecting only 100 rows, does it affect my final result?
Hello Patrick, I set this up and it works fine with a value in the parameter, but when I set the parameter to 0, all my previous steps in the Query Designer are lost.
What can be my problem?
Wow superb video ❤️
I have set the parameter to 100 rows, but the cloud service does not let me refresh because of the credentials/password. Any thoughts?
This helps. Thanks for this trick!
Hi Patrick, I am having a weird error. When I put NumberOfRows in the Advanced Editor and click OK, it shows an error: "The import NumberOfRows matches no exports. Did you miss a module reference?" Please help.
Can the parameter(s) be a list of 500 or so employee IDs? The only things I have to limit the 6M+ record pull are the employee ID or employee email. I don't need all 125,000 employees - just our little 500+ person group.
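One hedged way to sketch that in M, assuming EmployeeIds is a list (e.g. a small query loaded from an Excel file of ~500 IDs) and the big table has an EmployeeId column; note that List.Contains may or may not fold back to the source, so it's worth checking against your own database:

let
    Source   = Sql.Database("myserver", "mydb"),
    Records  = Source{[Schema = "dbo", Item = "EmployeeRecords"]}[Data],
    // keep only the rows whose EmployeeId appears in the EmployeeIds list
    OurGroup = Table.SelectRows(Records, each List.Contains(EmployeeIds, [EmployeeId]))
in
    OurGroup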
Hi, I'm looking to change an existing report from ODBC to the native Power BI connector. The source is PostgreSQL. After I installed the connector on my local machine, I could successfully connect to the DB by changing properties in the query's Advanced Editor. But when I published the report to the service, it is not working. Could you help with it?
Nice, but the end user wants to download a normal export without the reduced data - how to do that?
Would this first method help to publish a PBIX file that is above 1 GB?
How would you configure the editor when the source is not a single file but a folder? I can't see how to do it :(
that's awesome, man.
This didn't work for me, because I was calculating the last 30 days filtered from 1 Jan 23, and it doesn't show correctly on the service.
Even though the credential password is now working, modifying the parameter still does not refresh my report.
Dat Power BI yellow Apple Watch strap is glorious! 😆
Also I’m anxious to upvote this video as it’s 666 likes now ._.
Hi Patrick, huge fan of yours.
I am stuck with a time slicer. I have a static time column in the time table with values from 12:00:00 AM to 12:59:59 PM. I want to create a time slicer that looks like a range slicer. If I do that using the Between option in the slicer, I am not getting time values to select in the boxes - I see date values there.
My goal is to filter the page by a time range.
Please help!!!!!!!!!
👌🏼👍🏼
Will Smith knows powerbi?!?
Can we apply a custom last-one-month filter in the Advanced Editor?
Yes
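A minimal sketch of a rolling last-one-month filter in the Advanced Editor, assuming a hypothetical Sales table with an OrderDate column of type date:

let
    Source    = Sql.Database("myserver", "mydb"),
    Sales     = Source{[Schema = "dbo", Item = "Sales"]}[Data],
    // cutoff is one month before today, recalculated at every refresh
    Cutoff    = Date.AddMonths(Date.From(DateTime.LocalNow()), -1),
    LastMonth = Table.SelectRows(Sales, each [OrderDate] >= Cutoff)
in
    LastMonth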