3 ways to REDUCE DATA In Power BI Desktop (No Premium needed!)

  • Published Nov 26, 2024

COMMENTS • 65

  • @dinoben9527
    @dinoben9527 3 years ago +1

    I have been using date parameters for a proof of concept for my client and it works very well. It allowed me to avoid technical limitations of the relational database in use at the organization. Once in the service, all the data was pulled. I didn't know there was a simple way to limit the number of rows directly via Power Query. I'll test that now. Thank you for the useful information.
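
    The row-limit idea mentioned above can be sketched in Power Query M. This is a minimal sketch, assuming a whole-number parameter named `NumberOfRows` created via Manage Parameters; the server, database, and table names are illustrative:

    ```m
    // NumberOfRows: a whole-number parameter; 0 is treated here as "load everything".
    let
        Source = Sql.Database("myserver", "mydb"),              // hypothetical source
        Sales = Source{[Schema = "dbo", Item = "Sales"]}[Data],
        Limited = if NumberOfRows = 0
                  then Sales
                  else Table.FirstN(Sales, NumberOfRows)
    in
        Limited
    ```

    Against a foldable source, `Table.FirstN` is pushed down as a `TOP` clause, so only that many rows cross the wire.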

  • @hatimali4945
    @hatimali4945 3 years ago +6

    Your videos have really helped me grow my Power BI practice in my organisation. Thank you so much 😀

  • @ayouberrarhbi8912
    @ayouberrarhbi8912 3 years ago

    I was looking for this trick for a long time. Hours and hours wasted waiting for refresh. Thank you!

  • @samiphoenix5268
    @samiphoenix5268 3 years ago +11

    Another easy way to reduce dataset size without reducing number of rows is to convert decimal values to whole numbers (assuming you don't need the additional accuracy)

    • @Acheiropoietos
      @Acheiropoietos 3 years ago

      Yes, I discovered this recently - who needs pennies with a bottom line in the millions?

    • @frankcook5884
      @frankcook5884 3 years ago +4

      @@Acheiropoietos Finance

    • @jaypollock78
      @jaypollock78 3 years ago +1

      Definitely! I was able to cut my dataset size down by 40% by doing this.
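
    The decimal-to-whole-number tip from this thread might look like this in M (the column name and sample values are illustrative):

    ```m
    // Round a decimal column and store it as a whole number; the smaller
    // dictionary of distinct values usually compresses far better in VertiPaq.
    let
        Source = #table(type table [Amount = number], {{19.99}, {250000.4}}),
        Whole = Table.TransformColumns(Source, {{"Amount", Number.Round, Int64.Type}})
    in
        Whole
    ```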

  • @SlothyUK
    @SlothyUK 3 years ago

    Thanks so much... spent so much time working in editor making small changes and waiting for the changes to apply on 6MM rows... cheers

  • @otakarveleba7202
    @otakarveleba7202 3 years ago +1

    Hi Patrick, great video as always. I totally forgot about query folding, as I'm usually writing custom SQL queries that PBI can't fold anyway... so I usually inject my parameter into the SQL query (WHERE clause) directly. Unfortunately, this approach makes the SQL query much less human-readable (it adds lots of code like #(lf) instead of a new line, etc.). Thanks for bringing up again the possibility of folding our queries in an intuitive way.

  • @TitusRex
    @TitusRex 3 years ago

    I was using exactly this approach (parameter), I'm glad I'm not the only one.

  • @mcnater
    @mcnater 3 years ago +5

    Big problem with 2 and 3 at our org is that we have millions of rows and those queries run first before it can get to those steps limiting the data. So it's still a huge pain on desktop to sit there and wait. We also don't have access to views so that's out the window. I have made some parameters that will switch between "ROWNUM < 100" or "TOP 100" depending on our server and that works OK, but it's still kind of a pain.

    • @akhilannan
      @akhilannan 3 years ago +1

      I think query folding should work here, which should push back the filter directly to the source.

    • @mcnater
      @mcnater 3 years ago

      @@akhilannan most of our code is done with custom SQL from the start so I don't think so in that case.

    • @HenriqueAzvdo
      @HenriqueAzvdo 1 year ago +1

      What did you do? I have the same problem, I limit it by date (start date and end date) and first it loads the entire query, which takes hours and hours

    • @cowpoke98
      @cowpoke98 1 year ago +2

      Bingo. The video is a little misleading because it says "limit the # of rows being brought back from the data source". But that's not what these are doing; they apply the filter AFTER all of that data is queried. Every time you modify a table or anything and "Apply changes", you have to sit through all that data being brought in again. There needs to be a simple way to limit the # brought over on import.

    • @mcnater
      @mcnater 1 year ago

      @@cowpoke98 100% correct
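
    The server-dependent row-limit parameter described at the top of this thread could be sketched like this, assuming a text parameter `ServerType` and a number parameter `RowLimit` (server, database, and table names are illustrative):

    ```m
    // Build the engine-appropriate row-limit syntax into a native query.
    let
        QueryText = if ServerType = "Oracle"
                    then "SELECT * FROM sales WHERE ROWNUM <= " & Number.ToText(RowLimit)
                    else "SELECT TOP " & Number.ToText(RowLimit) & " * FROM sales",
        Source = Sql.Database("myserver", "mydb", [Query = QueryText])  // use Oracle.Database for an Oracle source
    in
        Source
    ```

    Note that handing the connector a native query generally stops later Power Query steps from folding.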

  • @getusama
    @getusama 2 years ago +2

    That is some great advice - thank you.
    I was also wondering, in the third way using start and end dates: are you able to dynamically set the end date to the current date and the start date to, say, 3 months prior?
    This is so that once you are done uploading the report and have set the schedule, the start and end date parameters in the Power BI service could always use the current and earliest dates (which will be different every day).

    • @ruelformales5184
      @ruelformales5184 1 year ago

      I built on your question, because you are correct - you need the latest data.
      1. Create a parameter "Date" which will act as your start date (don't worry about the end date)
      2. In your transformations, pick a date column, choose "Is After", and use your Date parameter - let's say a date 3 months ago
      3. Publish your PBIX, then change the parameter date back as far as you need (e.g. 5 years ago)
      4. Let the heavy lifting happen in the Power BI service - you should get the latest data from 5 years ago until today
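
    The steps above correspond roughly to this M filter step; the parameter and column names are illustrative, and a literal date stands in for the managed parameter to keep the sketch self-contained:

    ```m
    // On a foldable source this "is after" filter is pushed to the database,
    // so widening the parameter in the service re-queries only the new range.
    let
        StartDate = #date(2024, 8, 1),   // in practice, the "Date" parameter from step 1
        Source = #table(type table [OrderDate = date],
                        {{#date(2020, 1, 15)}, {#date(2024, 11, 1)}}),
        Filtered = Table.SelectRows(Source, each [OrderDate] > StartDate)
    in
        Filtered
    ```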

  • @pabeader1941
    @pabeader1941 3 years ago

    OOOOOHHH! If only I was allowed to use the service!!! So many things would be possible. Great content and I will keep it in mind if I ever get them to change their minds.

  • @sands7779
    @sands7779 3 years ago

    Really useful tips that will help people working from home and working with slow broadband through vpn develop and deploy reports and take less time waiting for rows to load, thanks Patrick 😊

  • @premjitchowdhury262
    @premjitchowdhury262 2 years ago

    This channel is making Power BI no. 1 in the world.

  • @vpwl
    @vpwl 3 years ago

    Another amazing video! Great job Patrick! I always wondered how models with large data sets were created and published without a powerful machine.

  • @intelabhinav
    @intelabhinav 3 years ago

    I love the fact that you can select the table that you want to load, i.e. the one which is filtered or not filtered... I am using this feature for testing the load with fewer rows.

  • @apekshaawaji185
    @apekshaawaji185 3 years ago

    Thank you so much Patrick😀 This is really helpful! Would love to see videos on customer attrition analysis. Cheers👍

  • @tymccurdy8932
    @tymccurdy8932 2 years ago +1

    We've done something similar with the parameter values. Instead, we set up two dataflow entities to work off of: a full set of the data and a sample set of the data, named Source & Source_Sample.
    Then we can just parameterize the source. So start with the parameter pointing to Source_Sample; when it's all done and published, update the parameter to Source. I like this way, as it standardizes the process for all report builders in our company and we can ensure they're getting a good distribution of the data within the sample. Report writers just toss a couple of standard lines into the Advanced Editor (copy/paste) and they're all set up. We don't need to rely on them finding janky ways to get their own sample set.
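
    The parameterized-source pattern described above is illustrated here with a SQL source for simplicity; with dataflows, the entity name is parameterized the same way. `SourceEntity` is an assumed text parameter, and the server and database names are hypothetical:

    ```m
    // SourceEntity: "Source_Sample" while developing, "Source" once published.
    let
        Db = Sql.Database("myserver", "mydb"),                   // hypothetical server
        Data = Db{[Schema = "dbo", Item = SourceEntity]}[Data]   // object name comes from the parameter
    in
        Data
    ```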

  • @AbouAli01006300091
    @AbouAli01006300091 1 year ago

    The 3rd method is the most interesting, "BUT" what if I want the complete period (5 years for example) but with a reduced dataset? The workaround I used was:
    1. sort the dataset using the date column
    2. create an index column
    3. add a helper column in which I divide the index values using Number.Mod (let's say by 10, which means I keep only one row out of every 10)
    4. filter the helper column to keep 0 values only
    BINGO: I reduced my data to a 10% sample of the total volume and kept the whole date range
    Advantage: having sample data across the entire time range
    Disadvantage: step no. 1 above means the full dataset is loaded in the first step :(
    I hope someone can achieve the same goal without loading the full dataset.
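
    The sampling steps above, sketched in M with a self-contained toy table:

    ```m
    // Keep every 10th row: sort by date, add an index, filter on Number.Mod.
    let
        Source = #table(type table [OrderDate = date, Amount = number],
                        List.Transform({0..99}, each {Date.AddDays(#date(2024, 1, 1), _), 1.0})),
        Sorted = Table.Sort(Source, {{"OrderDate", Order.Ascending}}),
        Indexed = Table.AddIndexColumn(Sorted, "Index", 0, 1),
        Sampled = Table.SelectRows(Indexed, each Number.Mod([Index], 10) = 0),
        Clean = Table.RemoveColumns(Sampled, {"Index"})
    in
        Clean
    ```

    Since sorting and indexing force a full scan, one way around the disadvantage noted above, where the source is SQL Server, is to sample upstream in a native query (e.g. with TABLESAMPLE) so the reduction happens before the data leaves the server.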

  • @miguelguillenpaz
    @miguelguillenpaz 11 months ago

    Hi Patrick, thank you so much. I had been looking for this information for a while.

  • @D_ytAcct
    @D_ytAcct 2 years ago

    thank you for this - I'm going to try and set up some refresh parameters now on the report I'm building because it links to Salesforce but I only need to see one of the territories/divisions so hopefully, hopefully, this will help reduce the thousands of lines it pulls through!

    • @D_ytAcct
      @D_ytAcct 2 years ago

      it worked! Thank goodness, that's definitely reduced the strain on this(!)

  • @KateMikhailova
    @KateMikhailova 2 years ago

    Thanks for the great idea (the 2nd way, with a parameter)!
    I can't use it "as is" because Report Server (on-premises) doesn't have a parameter option for scheduled refresh. But (as always) an xlsx file with the number of rows in a network folder works well :)
    Following Matthew's "upstream" maxim, I included the number of rows from the xlsx file in the SELECT section of the query. Now I know that "first 100 rows" translates differently for an MS SQL source and for an Oracle source. Interesting experience!

  • @naveenkumarm.r4692
    @naveenkumarm.r4692 3 years ago

    Hi Patrick, I need a solution for monitoring scheduled refreshes across multiple workspaces... can you please make a video on this?

  • @vpwl
    @vpwl 3 years ago

    Question: if my model includes multiple tables and datasets, would I need to create separate parameters for each table? Will I be able to adjust them all in the service?

  • @runexn5940
    @runexn5940 8 months ago

    Q: I had a question about using the first tip with incremental refresh. The table I want to implement incremental refresh on takes too long to load in my desktop file, so I use a stored procedure to load only a small subset of the data in the desktop file; after implementing incremental refresh on it, I publish it, then change my stored procedure to bring in all the data and refresh in the service.
    This method worked before without incremental refresh, but now the initial refresh fails after some 8-10 hours and I am not sure what the exact cause may be. Is it because I am using a stored procedure with incremental refresh? Or do I have to load all the data in the file before publishing for the initial refresh to work?

  • @watkinson8230
    @watkinson8230 3 years ago

    Hey Patrick, does this work if you are appending multiple CSVs and then doing various transformations? I.e., would the filter on the final table be enough to limit the refresh, or do I need to do it in each source import?

  • @AlbertoTrujillo
    @AlbertoTrujillo 3 years ago

    Great Video Patrick!

  • @aliramadan7425
    @aliramadan7425 3 years ago

    Hello Patrick. Interesting stuff. Do you have any advice on optimizing performance between Power BI Desktop and AWS Athena using ODBC connection? Would the reduce rows option you showed be a good option in case of 150+ million rows? Appreciate your feedback.

  • @LandscapeInMotion
    @LandscapeInMotion 3 years ago +2

    What about joining tables with millions of rows? Performance? Normally you'd need SSAS...

    • @GuyInACube
      @GuyInACube  3 years ago

      SSAS Tabular is the same VertiPaq engine as Power BI. It works :)

  • @sumitchoubey3859
    @sumitchoubey3859 3 years ago +1

    Informative session

  • @akhileshgupta1971
    @akhileshgupta1971 3 years ago

    Patrick - please start a step-by-step tutorial series for reaching pro level on Power BI. I have looked through all your uploaded videos and was unable to find a tutorial series on Power BI. If you have published all the modules through which one can become a professional, please arrange them in order.

  • @marounsader318
    @marounsader318 3 years ago

    Hi Matt, are you running the refresh on a laptop or a server? The refresh of the 500k rows is pretty fast. If it's on a laptop, can you please share the specs, as even 100k rows takes a lot of time on my work laptop.

  • @Carito_k
    @Carito_k 10 months ago

    If I make a parameter to select only 100 rows, does it affect my final result?

  • @ronvdlaan1
    @ronvdlaan1 3 years ago

    Hello Patrick, I set this up and it works fine with a value in the parameter, but when I set the parameter to 0, all my previous steps in the Query Designer are lost.
    What can be my problem?

  • @singhraunak09
    @singhraunak09 3 years ago +1

    Wow superb video ❤️

  • @haydehkaveh3047
    @haydehkaveh3047 3 years ago

    I have adjusted the parameter to 100 rows, but the cloud service does not let me refresh because of a credentials problem. Any thoughts?

  • @saikirankusumanchi4522
    @saikirankusumanchi4522 3 years ago

    This helps, thanks for this trick

  • @sayelichakraborty7222
    @sayelichakraborty7222 1 year ago

    Hi Patrick, I am having a weird error. When I put NumberOfRows in the Advanced Query Editor and click OK, it shows an error: "The import NumberOfRows matches no exports. Did you miss a module reference?" Please help!

  • @marymurphy9014
    @marymurphy9014 2 years ago

    Can the parameter(s) be a list of 500 or so employee IDs? The only thing I have to limit the 6M+ record pull is the employee ID or employee email. I don't need all 125,000 employees - just our little 500+ person group.

  • @meeradominic9094
    @meeradominic9094 3 years ago

    Hi, I'm looking at changing an existing report from ODBC to Power BI. The source is PostgreSQL. After I installed the connector on my local machine, I could successfully connect to the DB by changing properties in the Advanced Editor of the query. But when I published the report to the service, it is not working. Could you help with it?

  • @muralichiyan
    @muralichiyan 2 years ago

    Nice, but the end user wants to download a normal export without the reduced data - how to do that?

  • @arnaudcampestre
    @arnaudcampestre 1 year ago

    Would this first method help to publish a pbix file that is above 1 GB?

  • @lestatelvampir69
    @lestatelvampir69 3 years ago

    How would you configure the editor when the source is not a single file but a folder? I can't see how to do it :(

  • @felipefrancisco01
    @felipefrancisco01 3 years ago

    that's awesome, man.

  • @ahmadbadalov8489
    @ahmadbadalov8489 1 year ago

    This didn't work for me, because I was calculating the last 30 days filtered from 1 Jan 23; it only shows correctly on the service for 1 Jan 23.

  • @haydehkaveh3047
    @haydehkaveh3047 3 years ago

    Even now that the credential password is working, modifying the parameter still does not refresh my report.

  • @Kaiyening
    @Kaiyening 3 years ago

    Dat Power BI yellow Apple Watch strap is glorious! 😆
    Also I’m anxious to upvote this video as it’s 666 likes now ._.

  • @pankajsonawane3958
    @pankajsonawane3958 3 years ago

    Hi Patrick, huge fan of yours.
    I am stuck with a time slicer. I have a static time column with values from 12:00:00 AM to 12:59:59 PM in the time table. I want to create a time slicer that looks like a range slicer. If I do that using the "between" option in the slicer, I don't get time values to select in the boxes; I see date values there.
    My goal is to filter the page by a time range.
    Please help!

  • @GurpreetSingh-jg9ll
    @GurpreetSingh-jg9ll 3 years ago +2

    👌🏼👍🏼

  • @CuddleStories
    @CuddleStories 3 years ago +1

    Will Smith knows powerbi?!?

  • @rahulmaurya9444
    @rahulmaurya9444 3 years ago

    Can we apply a custom "last one month" filter in the Advanced Editor?