Power BI Incremental Refresh - Understanding Detect Data Changes

  • Published 31 Jul 2024
  • Patrick breaks down how detect data changes works with Power BI Incremental Refresh. He demystifies some common misunderstandings when using this setting in Power BI Premium.
    Download Patrick's free cheat-sheet: guyinacu.be/detectdatachanges...
    Incremental refresh documentation - docs.microsoft.com/power-bi/s...
    Getting started with Incremental Refresh in Power BI (video) - • Incremental Refresh wi...
    ********
    LET'S CONNECT!
    ********
    -- / guyinacube
    -- / awsaxton
    -- / patrickdba
    -- guyinacube.com
    **Gear**
    Check out our Tools page - guyinacube.com/tools/
    #PowerBI #IncrementalRefresh #PowerBIPremium
  • Science & Technology

COMMENTS • 86

  • @Baldur1005
    @Baldur1005 5 years ago +8

    "I'm kind of a big deal": when I spotted it, I laughed. Patrick, you've got the most swag of any business intelligence YouTuber.

  • @dangelo90
    @dangelo90 3 years ago +1

    Epic, as always. Needed info, showing how to, straight to the point. Thank you for your videos!

  • @andresgeb
    @andresgeb 4 years ago +1

    Hi Patrick! I have a question: what if I have set incremental refresh for the last 1 month, detect data changes on my "UpdatedDate" column (yes, it is a DATE, not a datetime column, unfortunately), AND I have scheduled refresh every hour?
    Then, one day, records are loaded into the DB; the first time the dataset is refreshed, it loads those records into the dataset.
    Then the UpdatedDate column is updated for the partition (today's date).
    Later on, a user loads more records into the DB with the same value in the UpdatedDate column. The question is: as the UpdatedDate will be the same value, the refresh would not load the new data. Is that correct?

  • @AJ-gv2hj
    @AJ-gv2hj 5 years ago

    Always great- thanks for the content!

  • @jessecox4581
    @jessecox4581 4 years ago

    Great idea as usual Patrick, I'm late to the party on this one but determined to get caught up!

  • @trentonsmith8739
    @trentonsmith8739 2 years ago +6

    For anyone reading in 2022: custom polling queries allow you to use an M expression to detect data changes instead of the single datetime column selection that Power BI provides. You can use this for many things, including deletions if need be. There's no need for the soft-deletion limitation, so if you are thinking that is a stopping point, I would take a look at that feature (see the sketch after this thread).

    • @prabhatbhargav1925
      @prabhatbhargav1925 2 years ago

      Can you please share the links? Where can I read about it?

    • @wexwexexort
      @wexwexexort 1 year ago

      Can you elaborate? Please share an example.
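
      A minimal sketch of such a custom polling query in M, assuming a SQL Server source with an UpdatedDate audit column (server, database, table, and column names are all illustrative); the expression is set as the table's refresh-policy polling expression over the XMLA endpoint, for example with Tabular Editor:

          let
              Source = Sql.Database("myserver", "mydb"),
              Fact = Source{[Schema = "dbo", Item = "FactSales"]}[Data],
              // Only consider rows inside the partition window
              InRange = Table.SelectRows(Fact, each [UpdatedDate] >= RangeStart and [UpdatedDate] < RangeEnd),
              // The partition is reprocessed whenever this returned scalar changes
              MaxUpdate = List.Max(InRange[UpdatedDate])
          in
              MaxUpdate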

  • @luzernchan5691
    @luzernchan5691 5 years ago

    Thanks! You are a star!

  • @avivzach4926
    @avivzach4926 3 years ago +1

    Amazing feature! Can it be used for updating data? Let's say I have 5 years of data, with records being updated randomly. Can I use it to update just the records that were changed in the last day, for example, and refresh once a day?

  • @tsepont
    @tsepont 4 years ago

    This is gonna work great for CommCare/Power BI - the refreshes take a while! Thanks, chief!

  • @rosecraigie
    @rosecraigie 4 years ago +1

    Hi Patrick, I've just set up incremental refresh with detect data changes, and we are seeing a lot of queries come through with SELECT TOP 1000, some with the partition parameters and some without. Is this due to something I have done wrong in the setup, or is it just how incremental refresh works?

  • @pranavkumar2336
    @pranavkumar2336 5 years ago

    Hi Patrick, is it possible to display a value as x out of y in a tooltip? For example, 10 out of 100 (10.00%), using DAX or some other means?

  • @sumanchakraborty681
    @sumanchakraborty681 4 years ago

    My DB has 1 month of rolling data, and I'm trying to keep 3 months of data in Power BI, so I implemented incremental refresh. But once data gets deleted from the DB, it also gets removed from Power BI. Can you help me with this?

  • @ayanmukherjee1873
    @ayanmukherjee1873 4 years ago

    Hi, I'm getting an error: "OLE DB or ODBC error: [DataSource.Error] An error happened while reading data from the provider: 'An item with the same key has already been added.'" after I created a filter on a DateTime column for an incremental refresh with RangeStart and RangeEnd. Can you help me with it?
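
    For reference, the documented filter pattern keeps the interval half-open so that a boundary row never lands in two adjacent partitions; a minimal sketch, with server, database, table, and column names illustrative:

        let
            Source = Sql.Database("myserver", "mydb"),
            Orders = Source{[Schema = "dbo", Item = "FactOrders"]}[Data],
            // >= on RangeStart but strictly < on RangeEnd; an inclusive filter on
            // both ends would load boundary rows into two adjacent partitions
            Filtered = Table.SelectRows(Orders, each [OrderDate] >= RangeStart and [OrderDate] < RangeEnd)
        in
            Filtered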

  • @DavidWilsonNZ
    @DavidWilsonNZ 5 years ago

    So totally awesome! Thank you... I might have implemented an "efficiency" change. :-)

  • @philipmead3575
    @philipmead3575 3 years ago

    I am about to get Premium just so I can do this. However, does this work when refreshing over 2,500 SharePoint files? At the moment it's only bringing in around 1,200 files into Power BI. So if I got Premium and set up incremental refresh, would that fix the issue?

  • @krishnansrinivasan9770
    @krishnansrinivasan9770 3 years ago

    I want to know if we can use Microsoft Flow / Power Automate to detect changes to the underlying tables and, if so, refresh the Power BI dataset automatically.

  • @roelturnbull
    @roelturnbull 3 years ago

    Question: would this technique also work with future partitions? With incremental refresh in Power BI Desktop you can only refresh data in the last year/quarter/month/day, but I have data for future months. Can I add records for future months in the config_FIS_DOC table so that the partitions are created as soon as the record in config_FIS_DOC is triggered for refresh?

  • @KevinAnchi
    @KevinAnchi 3 years ago

    Is it possible to configure incremental refresh for data coming from an API?

  • @JosefineNord
    @JosefineNord 3 years ago

    Hi, great video!
    Do you know if I can use incremental refresh in a dataflow when getting data from Amazon Redshift?

  • @b4uarvind
    @b4uarvind 2 years ago

    Hi Patrick, I have a problem: how can I map two tables on multiple columns based on some conditions? For example, I have table1 with columns A, B, C, D and table2 with columns A, B, C, E, and I want to map table1 and table2 based on columns A, B, C in both tables.

  • @mabl4367
    @mabl4367 1 year ago

    What if you have no control over the data source and it does hard deletes?
    Do you always have to get all the records in order to check whether any record was deleted?

  • @PeterNovakIng
    @PeterNovakIng 1 year ago

    Thank you, Patrick

  • @marcford6866
    @marcford6866 4 years ago

    Sounds like incremental refresh has to go and get (for example) 5 years of data the first time and can then settle into smaller refreshes afterward. My issue is, when I make a small change to the report in Desktop and publish it again, it seems like we start all over, needing 5 years again. Am I missing something?

  • @dmitryvakhrushev81
    @dmitryvakhrushev81 3 years ago

    Why does the incremental refresh issue TWO queries for each day (e.g. refresh rows in the last 10 days)?

  • @manojkahar5718
    @manojkahar5718 5 years ago

    Hi Patrick, how are you?
    I have a query related to dynamic file lookup and folder pickup.
    For example, I have folders 2017, 2018, 2019. In those folders there are subfolders with month names, and in each month folder there are daily Excel files.
    My question is: on the dashboard, how can a user select a previous dated file and display the numbers, etc.?
    Manoj

  • @abhisoulstormer
    @abhisoulstormer 5 years ago +1

    If I am using a function to pull data from Excel files in SharePoint, would it be possible to set up incremental refresh on the "Date modified" metadata in SharePoint for those files? Meaning I don't need to run my function for each and every Excel file as the set grows month over month, but limit it to running the function only on the files that were modified in the last month?

    • @abhisoulstormer
      @abhisoulstormer 5 years ago

      Never mind! I went and saw Adam and Christian's video on it, and that answers my question. Amazing stuff as always! Thanks a million :D
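
      The rough shape of that technique: filter the SharePoint file list on its "Date modified" metadata inside the refresh window before invoking the workbook function. A sketch, with the site URL and the ParseWorkbook function as hypothetical stand-ins:

          let
              Source = SharePoint.Files("https://contoso.sharepoint.com/sites/sales", [ApiVersion = 15]),
              // Only keep workbooks whose metadata changed inside the refresh window
              Recent = Table.SelectRows(Source, each [Date modified] >= RangeStart and [Date modified] < RangeEnd),
              // Run the (hypothetical) existing parsing function only on those files
              Parsed = Table.AddColumn(Recent, "Data", each ParseWorkbook([Content]))
          in
              Parsed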

  • @nataliyar2982
    @nataliyar2982 5 years ago

    I was trying to ask this question on your website but could not find a way, so here it is: I am trying to use conditional formatting based on an if statement. If the statement is true, take the info from one data source; if false, go to another data source. I want to color values based on which data source the data came from.

  • @vinaypugalia6627
    @vinaypugalia6627 3 years ago

    Hello Patrick, the process you explained at 5:10 is not TRUE for me. It's not only pulling records with LastUpdateTime greater than the MAX() but also ones that are less than that. So effectively it's pulling all records for which the LastUpdateTime changed. Can you please explain this?

  • @Bharath_PBI
    @Bharath_PBI 3 years ago

    Thank you. One question: incremental refresh cannot be set up on a dual storage mode table and requires import mode. But with a fact aggregation table in import mode and the base fact in DirectQuery, will injection happen from import into the DQ query that is sent to the DB? If so, how is the performance on large models with hundreds of millions of rows in both dim and fact tables?

    • @anjanashetty482
      @anjanashetty482 1 year ago

      Did you get an answer for this? I am also trying to implement the same. Your response would greatly help.

  • @rodolfoe90b
    @rodolfoe90b 5 years ago

    Hi! Great videos. I've got a question: if I create my report in the Power BI service and it gets the data from an Excel file in OneDrive, it updates the dataset every hour by itself. Is there any way I can do the same with Power BI Desktop? What I need is to have my dashboards updated every hour automatically, but I don't want to create them in the Power BI service. Thanks!

    • @roymou5856
      @roymou5856 8 months ago

      Power BI Desktop is just an editor. You can refresh there to see the new data, but it will not be automatic.

  • @sarva0483
    @sarva0483 5 years ago

    How do we handle it if we use 'Refresh rows where column OrderDate = 1 month' only, and what happens if the record already exists in the stored data on older dates? This will create duplicate records with multiple timestamps. What if I use some column as a primary key in the model? How do I enforce a soft delete for the oldest duplicate record?

    • @lokeshramaraj2800
      @lokeshramaraj2800 5 years ago

      That's why Patrick kept the time periods for storing history and for refreshing the same, 24 months. However, since the incremental refresh is based on the audit field for data changes, the polling queries will be issued for all 24 months, but only the partitions where the subset of data changed will be reprocessed.

  • @gabrielmorais7312
    @gabrielmorais7312 3 years ago

    Hey! The cheat sheet link appears to be broken

  • @djwalborn1
    @djwalborn1 4 years ago +2

    Configuring incremental refresh can greatly reduce the total query time by setting up these partitions. Awesome feature.
    However, a Power BI dataset refresh is limited to 5 hours in Premium capacity before a timeout error. My question is how you approach the initial load of the data. I can use incremental refresh to continually update partitions successfully (one month at a time), but I'm not sure how to load in the historical data (years) without running into the refresh timeout during the initial load. Thanks!

  • @dmitrypirshtuk3450
    @dmitrypirshtuk3450 3 years ago

    The cheat-sheet link is not available.

  • @visheshjjain
    @visheshjjain 3 years ago

    Hey Patrick!
    Where can I get the tee?!?!?!

  • @BoxLaneProductions
    @BoxLaneProductions 4 years ago

    I just want it to change every time a new score is saved to the data sheet it's looking at, which happens once or twice a minute. Is this the correct thing for that?

    • @GuyInACube
      @GuyInACube  4 years ago

      For that fast of a data change, you would need to either look at DirectQuery, or perhaps a streaming dataset that Power Automate can push into.

  • @astriddini1665
    @astriddini1665 3 years ago +1

    Hello, good explanation, but I have a problem when using this. I often get duplicate IDs in certain tables. I use a dataset, I have a workspace, and everything is on a scheduled refresh. Is it because there are many refreshes at the same time, causing one of them to fail to delete the existing data? For the real data I use a DataView in my database. Have you also experienced something like this?
    Thank you.

    • @josephetorne7226
      @josephetorne7226 3 years ago

      I'm also having problems with duplicates and I'm trying to find a solution. Were you able to resolve the duplicates issue?

  • @DaveGrosser
    @DaveGrosser 5 years ago

    Great video about somewhat frustrating ETL / not-really-ETL functionality in Power BI. So nice to see what's going on behind the scenes, thank you. I noticed in another video:
    - You note that incremental refresh doesn't work with native SQL queries
    - A commenter noted that you can parameterize your native query and it DOES work. Neato. Here's my question -
    If you want to parameterize your native query to use incremental refresh AND detect data changes - how do you do *that*? I'm guessing there needs to be something else in the SQL statement to detect data changes, right?

    • @GuyInACube
      @GuyInACube  5 years ago

      Parameters should work. Using a native query technically will work, but query folding is broken and the start/end range filters won't be passed. So if you have, say, 4 partitions that fire, you will have 4 of those native queries hit your data source without being filtered down. It isn't recommended.

    • @DaveGrosser
      @DaveGrosser 5 years ago +1

      @GuyInACube Even if the RangeStart/RangeEnd params are written into the native query? I'm going to have to try it with my Azure admin next week. I did notice that date params seem to be broken in Oracle, at least with the driver my org is using. More details to come.

    • @simonmasters3295
      @simonmasters3295 2 years ago

      2 years ago? What's the latest?

    • @trentonsmith8739
      @trentonsmith8739 2 years ago +1

      @simonmasters3295 Not sure about 2019, but you can now get query folding with native queries, meaning you can use shared expressions (parameters in the Power BI world) in your native query and it will work with query folding, so it will work for incremental refresh.
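
      A minimal sketch of that pattern, assuming a SQL Server source (server, database, and table names illustrative): Value.NativeQuery with the EnableFolding option lets the RangeStart/RangeEnd filter fold on top of the native query.

          let
              Source = Sql.Database("myserver", "mydb"),
              // EnableFolding lets later steps fold on top of the native query
              Rows = Value.NativeQuery(Source, "SELECT * FROM dbo.FactSales", null, [EnableFolding = true]),
              // This filter is then pushed down into the generated SQL
              Filtered = Table.SelectRows(Rows, each [OrderDate] >= RangeStart and [OrderDate] < RangeEnd)
          in
              Filtered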

  • @dgosbell
    @dgosbell 5 years ago +1

    Hey Patrick, can you do a follow-up video going into more depth on the soft delete idea? I can't see how you can add the filter in your partition query but not have it folded down into the updated-date polling query too...

    • @GuyInACube
      @GuyInACube  5 years ago +2

      Sure thing. Will put it on the list. 👊 The idea of a soft delete is that you have a bit field in your table (true/false). When you delete an item, you set the field to true, which also causes the updated-date field to get the current date. You can then filter those items out from a Power Query perspective so they don't show up in your report.
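
      A minimal sketch of that soft-delete filter in M, assuming illustrative IsDeleted and UpdatedDate columns on a SQL source (and note the caveat in the replies below: with the default polling query, this filter can fold into the change-detection query as well):

          let
              Source = Sql.Database("myserver", "mydb"),
              Fact = Source{[Schema = "dbo", Item = "FactSales"]}[Data],
              Ranged = Table.SelectRows(Fact, each [UpdatedDate] >= RangeStart and [UpdatedDate] < RangeEnd),
              // Soft-deleted rows still bumped UpdatedDate (so the change is detected),
              // but they are filtered out of what lands in the model
              Active = Table.SelectRows(Ranged, each [IsDeleted] = false)
          in
              Active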

    • @dgosbell
      @dgosbell 5 years ago

      @GuyInACube Thanks Patrick, I understand the concept of the soft delete. But it was not immediately obvious how to include the deleted records for polling purposes yet exclude them from the data refresh. Thinking about this a bit more, I can see that if you used one of the techniques to include an identical "max updated date" value in every row in the partition, you could filter the values in that column differently. I'd be interested to see if you used this technique or something different, and I think it's probably one of those small details that will trip people up. Thanks for all the great content - keep up the good work!

    • @user-ct5uq3rs1s
      @user-ct5uq3rs1s 5 months ago

      @dgosbell I agree with your initial statement. I have tested it, and it does fold the filter into the polling query. I'm not sure how to get around it. Patrick has an updated video on how to handle deletes, but I currently don't have Premium capacity.

  • @soratchjustbeyou7472
    @soratchjustbeyou7472 4 years ago

    Have you tried incremental refresh with a dataflow from a Couchbase data source using a CData ODBC driver?

    • @GuyInACube
      @GuyInACube  4 years ago

      We have not.

    • @soratchjustbeyou7472
      @soratchjustbeyou7472 4 years ago

      @GuyInACube Thank you. I was wondering about incremental refresh over dataflows from a source that provides JSON docs, as Couchbase does. Incremental refresh needs a datetime field, but the date fields in a JSON doc read as strings. Can I still leverage incremental refresh if I transform the date field from JSON text to datetime after the initial pull? Hopefully that makes sense. BTW, I love this channel and send videos out to my co-workers :)
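
      A sketch of that transform, with the DSN, query, and column names as hypothetical placeholders. The usual caveat: a type conversion like this tends to break query folding, so the source may still be read in full before the window filter is applied.

          let
              Source = Odbc.Query("dsn=CouchbaseDSN", "SELECT * FROM sales"),
              // Turn the JSON text date into a real datetime so RangeStart/RangeEnd can filter on it
              Typed = Table.TransformColumnTypes(Source, {{"updatedAt", type datetime}}),
              Filtered = Table.SelectRows(Typed, each [updatedAt] >= RangeStart and [updatedAt] < RangeEnd)
          in
              Filtered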

  • @gdmitrich
    @gdmitrich 3 years ago

    Hey guys,
    is incremental refresh only available for Premium users?

    • @varun1023
      @varun1023 3 years ago +1

      No, you can use it with any license.

  • @bshulke
    @bshulke 5 years ago

    When will this come out of preview? I work for a Fortune 100 and they won't allow us to deploy Preview features in production, and I desperately need it with my billion+ row tables.

    • @ZachRenwickData
      @ZachRenwickData 3 years ago

      This has been out of preview for a while now, we are using it extensively.

  • @mathew9665
    @mathew9665 5 years ago

    Will the Power BI service/Desktop (i.e. non-Premium users) get incremental refresh? If not, is there a method for non-Premium capacity?

    • @GuyInACube
      @GuyInACube  5 years ago

      I'm not aware of any plans to make incremental refresh available for non-Premium datasets. Non-Premium datasets are capped at 1 GB in size, and incremental refresh usually wouldn't be as helpful as it is for larger datasets. What we have found, working with customers, is that sometimes you can do some model optimizations (removing unused columns, cleaning up fields to make them more efficient, etc.) that help reduce model size and make refresh go faster.
      The only other option I could think of would be to host the model in something like Azure Analysis Services, where you can define partitioning and aren't blocked by the size of the model.
      Again, I'd seriously look at possible model optimizations if you feel incremental refresh is needed on a dataset that is 1 GB or less in size.

    • @namkce
      @namkce 4 years ago +1

      Non-premium (Pro license) now has incremental refresh as of February 2020

  • @patsickboy743
    @patsickboy743 3 months ago

    I didn't get the email to download the cheat-sheet file.

  • @DineshKumar-hc3tj
    @DineshKumar-hc3tj 3 months ago

    Selecting detect data changes on the last-update time of the records returns an error saying duplicate records were found. There are no actual duplicates in the table, but it seems like the refresh is trying to insert the record instead of updating it. How can I get rid of this?

    • @greathelmm
      @greathelmm 26 days ago

      Having the same issue.
      Power BI is treating the updated row as a new row, causing a duplicate error.

  • @dmitryvakhrushev81
    @dmitryvakhrushev81 3 years ago

    Why is it 48 queries for 24 partitions?

  • @bradj229
    @bradj229 3 years ago

    Cool feature, but if a full refresh is required AFTER you publish the report and there's a potential for hundreds of millions of rows, if not billions, it seems like you could run into hours of query runtime to initially import that data in the service, where there's less monitoring of which partition failed. A better option seems to be to do the initial refresh locally to monitor the full import, then turn on the incremental refresh feature, then publish and have it run incrementally from then on within the service. The downside is that you could be publishing a large pbix file (5 GB+, etc.), but at least if it failed it was all or nothing and you could try the upload again. Have you run into this issue?

  • @SamuelRoshan
    @SamuelRoshan 4 years ago

    Patrick, I was under the impression that if I set my data to refresh for the last 7 days, it would delete those rows from the dataset and reload them. Is that not true?

    • @GuyInACube
      @GuyInACube  4 years ago +1

      It will reprocess those given partitions. If you choose 7 days, you will have 7 partitions, one for each day, and those will be wiped and reloaded. The problem comes in if you are only refreshing the last 7 days: what happens when a row in day 9 was deleted? That's where you have a problem, as that partition won't be refreshed.

    • @SamuelRoshan
      @SamuelRoshan 4 years ago +1

      @GuyInACube Thank you so much for responding! How do I address this issue? I have order-number data that doesn't have a fixed timeline for updates. Let's say I had an order that was processed on 25-Jan-2018 and is in "OPEN" status. This row now resides in the 2018 partition (or the Jan-2018 partition, depending on my settings). Now, after one year, on Jan 15th 2019, the status of the order changes to "Closed". How will incremental refresh know to look for this order in the 2018 partition and update it? Could you please visually demonstrate how incremental refresh handles such scenarios? Thank you once again! :)

    • @avivzach4926
      @avivzach4926 3 years ago

      @GuyInACube So how can you overcome this problem?

  • @78special
    @78special 4 years ago +1

    I get why you need the soft-delete functionality, but that limitation alone means the whole incremental process is a big NO for us. We can't change a suite of existing products to use a different delete model.

  • @DataSpinners
    @DataSpinners 4 years ago +1

    So basically, unless you go down the soft-delete route, you won't be able to detect that the partition has changed and trigger the refresh through incremental refresh. It makes sense, but it will limit the number of use cases for the incremental refresh feature.

  • @garymelhaff7779
    @garymelhaff7779 2 years ago

    So I get why you don't want to refresh partitions that aren't needed, but if the timestamp is for an old time period and you don't update those partitions that have real changes, then you've got outdated data, and that would be, um, like a problem, right? Customers don't like outdated data. So I can see doing this on a daily or intra-day basis, but then say once a week doing a full reload to get caught up.

  • @jomzrpg6312
    @jomzrpg6312 3 years ago

    Hold on, so you set 24 months to store data and then 24 months to refresh? Not sure, but doesn't that defeat the purpose of incremental refresh?

  • @shamalibi6380
    @shamalibi6380 4 years ago +1

    Great video. I have API data which I want refreshed every hour while maintaining the history, as in incremental refresh with historical data. Can you please do a video on this? Please. I'm tearing my hair out over this.

    • @GuyInACube
      @GuyInACube  4 years ago +1

      I don't know if that would work against API data, as I don't think it can be folded, which would unfortunately make incremental refresh pretty worthless.

    • @shamalibi6380
      @shamalibi6380 4 years ago

      Thank you very much for the quick response. Is there any alternative you would suggest? For example, using some kind of script to call the API, get the data, and store it in SharePoint or another destination (which can be scheduled to append the new data), so that we can retrieve it from Power BI. Thank you in advance.

  • @TheEVEInspiration
    @TheEVEInspiration 3 years ago +3

    The soft-update pattern is never a good option when a very large number of records is involved.
    It adds a lot of complexity and causes data explosion and speed degradation on the supplying system.
    ETL tables aren't typically maintained in place with inserts, updates, and deletes like normal tables are.
    Instead they are created or (partition-)truncated, then filled with fresh data via insert queries where needed (typically not simple queries, and with ever-changing rules).
    Microsoft should have looked at count(*) as well as max(dateupdate).
    Then if either the count or max(dateupdate) changes, it is certain a change happened, including any deletes.
    It is so easy to do this right; I just cannot fathom why they botched this feature in its entirety by using only max(dateupdate) for detection!
    It is fixable, however, with just one more reference value (the prior count) and a slightly extended query (see the sketch at the end of this thread).

    • @Tony-dp1rl
      @Tony-dp1rl 1 year ago

      count(*) has a LOT of issues too, though, including that it has to scan the entire table within the range - and if you are scanning the entire table, there are probably better options anyway. Not to mention, partition sizes may change, so counts become very complex in that situation. I am pretty sure they weighed the pros and cons correctly and didn't "botch" it.

    • @TheEVEInspiration
      @TheEVEInspiration 1 year ago

      @Tony-dp1rl I disagree; reading an entire table is ultra cheap, and even hashing a whole table is pretty fast. And it can be done very cleverly, so only updated partitions have to be processed.
      What is very expensive is writes, inserts into a filled table in particular. Sending data over a connection is also not cheap, especially when things aren't local, which is the typical case.
      The current feature basically forces the user to always load the entire table, or to keep a separate copy of all the data so it knows what got deleted and can re-inject those records with a proper timestamp, completely undoing the whole optimization of truncate + insert into an empty partition. And then some...
      That is extra storage, extra writing, extra custom logic on the ETL side, and extra reading. It is far more complex and expensive than just basic reading, which is, honestly, blazingly fast these days (GBs per second).
      I have similar issues with pass-through queries, such as no support for functions as a table source. It is almost as if they got stuck in the mid-90s with their relational database features and application support.
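
      For what it's worth, the custom polling queries mentioned earlier in the thread can approximate this count-plus-timestamp check: the polling expression may return any scalar, so a combined fingerprint re-triggers the partition when either value moves. A minimal sketch in M, with all source and column names illustrative:

          let
              Source = Sql.Database("myserver", "mydb"),
              Fact = Source{[Schema = "dbo", Item = "FactSales"]}[Data],
              InRange = Table.SelectRows(Fact, each [UpdatedDate] >= RangeStart and [UpdatedDate] < RangeEnd),
              // Row count and max timestamp folded into one scalar; deletes lower the
              // count, so they change the fingerprint even if max(UpdatedDate) does not
              Fingerprint = Text.From(Table.RowCount(InRange)) & "|"
                  & Text.From(List.Max(InRange[UpdatedDate]) ?? #datetime(1900, 1, 1, 0, 0, 0))
          in
              Fingerprint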