Common mistakes in big data models

SQLBI

Додати в
- Мій плейлист
- Переглянути пізніше
Поділитися

Поділитися

Вставка

Розмір відео:

Показувати елементи керування програвачем

Автоматичне відтворення

Автоповтор

Опубліковано 15 січ 2025

КОМЕНТАРІ • 64

@Nelmistro 2 роки тому ⁺⁴⁴
To see all 4 of these people in one video is kinda like seeing the Avengers of Big Data. I've definitely learned a lot from these guys!
@jkschola1 Рік тому ⁺³
I've been tuning into both @SQLBI and Guy in a Cube's videos to enhance my Power BI Best Practices knowledge - what an epic crossover! 👏 Kudos to both of you for your valuable contributions! 👏
@Don_Modern_Ancestor 3 роки тому ⁺²¹
I've taken courses and consumed SQLbi and Guy in a Cube videos which are really good, but as a group you guys were great! We're starting strategy meetings for data governance, administration and modeling for a large enterprise system and appreciate your insights.
@stumit63_tech 2 роки тому
Hear, hear!
@clausm73 3 роки тому ⁺⁶
This is one of the best Power BI videos ever! Why? Because a lot of concepts are being touched and should lead the spectator to investigate further.
@Pooja-pd1ee 3 роки тому ⁺¹¹
Loved the entire session. There's so much you are doing for the bi community. Keep up the good work, guys.
@Furryfattal 2 роки тому ⁺¹
Great talk gentlemen. What kinda put the bow on the whole thing for me was the impromptu demo of how "follow up" meetings are formed at the very end of the segment. That really hit home, so thank you again!
@kostaskolotouros4345 2 роки тому ⁺¹
The Big Four of Power BI in the same panel! Thank you guys!
@bbbradl 2 роки тому ⁺¹
Thanks! I hope you have more content on very large dataset issues & solutions!
My take-away is "don't waste" and "Import whenever possible".
In our case, the modeling team is using Direct Query to Views that sometimes have 50+ joins, that are non-Scalar (use ranges) that themselves are joined to others in the model.
I'm just the front-end guy, but I've offered them the following suggestions:
1. Make all keys Integer.
2. Use in-memory Materialled Views for the most complex queries.
3. On SSAS, bring the most common tables into memory.
4. Place values most used as filters and parameters out of the Snowflakes (like Employee Status Code, Unit Code) and into their parent dimension tables, because they're always referenced and relationships are costly.
However, the restrictions on Live Connect to an SSAS model (using the Analysis Services connector) are many. I struggle daily with how I can get by without being able to create columns or tables, how to join other datasets or models when needed, or in case I need to bypass their model and go straight to the OLTP, then how to preserve Row Level Security.
A series would be A W E S O M E!!!
@shujashakir9952 2 роки тому ⁺²
All masters together. Great!
@sushilgulati Рік тому ⁺¹
Thanks for this great conversation guys !
@joannageorge6628 3 роки тому ⁺²
Wait a minute, so you have ALL my favorite people in 1 video? I cant handle this :)
@micahdail 3 роки тому ⁺⁴
Yes, 4 billion. ;-) 14:18 .
I do want to add that to Patrick's point it is a balance of what's in your model and what you need. 4 billion worked very well regardless, but after internal discussions we decided to leave the less-frequently-used grains in direct query and free up some P2 space for other models. So we are back down to slim 1.5B on that one model which is enabling us to deploy more models to.
@anhphan9141 3 роки тому ⁺²
OMG my favorite Power BI People in 1 video. Love it!
@rrestituti 3 роки тому ⁺¹
You are freaking intelligent!!!!!
I love your channels.
@happyheart9431 Рік тому ⁺¹
Billion Thanks for sharing such a valuable knowledge
@wexwexexort 3 роки тому ⁺¹
Fantastic session, thanks!
@ThanhNhandl 3 роки тому ⁺²
i love the way they discussed most
@jjjj8058 2 роки тому ⁺¹
great discussion. My issue with import over DQ is that you lose the centrally managed RLS applied at the source.
@reynaldomalave 3 роки тому ⁺¹
Thank you guys!
@mikar636 3 роки тому ⁺¹
Amazing!!! Thanks for sharing!!
@AnandDwivedi 3 роки тому ⁺⁴
Marco Alberto expecting a video on XMLA endpoints inc refresh
@FedericoLov 2 роки тому
Patrick! even if the load happens once a day stakeholders don't want to introduce any additional lag.
The problem is yes it runs once a day but we don't know at what time exactly so setting up a refresh schedule becomes difficult.
Also, partitions management in powerbi is still behind what we used to have in SSAS and incremental refresh is only available on premium.
@hmhkh 3 роки тому ⁺²
Hi Marco, Alberto, Adam and Patrick I am fan and follower for four of you, thanks for the informative discussion, we are telecom operator and we have tremendous amount of data from all our nodes, we may need a consultation from your side to optimize our data models.
Our backend is SSAS tabular and our front end is Tableau.
@SQLBI 3 роки тому ⁺¹
You can contact us on www.sqlbi.com/consulting/
@tejatarigopula3266 3 роки тому ⁺¹
Thank you for sharing real-time solutions for real world problems. Adam was referring to dictionary on a column. Can you please elaborate on this. Thanks
@SQLBI 3 роки тому ⁺¹
Read this: www.sqlbi.com/articles/measuring-the-dictionary-size-of-a-column-correctly/
@artigasmoises 3 роки тому ⁺²
Excelente video, saludos.
@Ajuneja99 2 роки тому ⁺¹
Thanks for the great content. You guys talked about direct query being last resort in scenarios like latest data required, or client not willing to store data to Power BI. What I have seen with a client is having database side RLS as one of the reason of not using the import mode. Any thoughts around that or any other content I can go though related to this? I'm thinking now if there is any workaround to that.
@zecosta9579 3 роки тому
What was that term brought up a couple times, "filter dags" (e.g. around the 49:00 mark)? Have not heard of it before and I don't think I even understood the word correctly.
Many thanks for the great video folks, gave me a lot to think about when developing my next reports :)
@SQLBI 3 роки тому ⁺¹
It's "filtered aggs" meaning "filtered aggregations" - it's a specific optimization technique for SQL Server (materialized view, filtered indexes, ...) that could improve performance in a DirectQuery scenario.
@tarekdemiati9991 2 роки тому
It will be very interesting to have another discussion after the "new"Datamart feature will have been implemented in production with a few enterprise customers.
It seems like the DataMart could be an amazing tool for handling multiple large data sources.
@sandeepbarge4699 5 місяців тому
With the introduction of Direct Lake connection mode with data in Lakehouse, would you recommend Direct Lake or Import? Which one is more efficient and faster? Assume that I am using Fabric Lakehouse as a source.
@SQLBI 5 місяців тому
It depends. Read: www.sqlbi.com/blog/marco/2024/04/06/direct-lake-vs-import-mode-in-power-bi/
@AnandDwivedi 3 роки тому ⁺¹
amazing Discussion
@bhaveshpatelaus 3 роки тому
We are talking about using Direct Query for large tables. However we need to remember that No more than 1 million rows are supported in the query. The visual is going to throw an error if it has to query 1 million rows. Need more settings to control how DQ behaves with source.
@SQLBI 3 роки тому
The limitation is related to the temporary tables moved to the formula engine. You can have 10 billion rows in DirectQuery, but you have to be aware of the limitation depending on the queries. Very generally speaking, dimension size could be an issue in a limited relationship, but there are many other elements in play.
But yes, this limit is definitely something we'd like to customize. It is customizable in Analysis Services, not in Power BI (yet).
@bojanjovanovic756 3 роки тому ⁺¹
Thank you very much ! Your contribution is priceless !
@NeumsFor9 3 роки тому
Chris Adamson's Mastering DWH Aggregates is a good reference for solving these issues, but also Spark is not a bad alternative in some cases.
@rerangelt 3 роки тому ⁺³
This was a deluxe video.
@kananparmar9656 2 роки тому ⁺¹
Hi Adam Saxton, Patrick LeBlanc, Alberto Ferrari, and Marco Russo I have been new to Power Bi where I'm trying to fetch the on- premise data into Azure cloud and then to visualized through Power Bi.
I'm struggling here what azure architure to develop for Azure to fetch data from POS and what Azure components is to be used to get the project going and how this Data Modeling and relationship knowledge I have got can be implemented.
@anilkumar-rg9jk 2 роки тому ⁺¹
Do more on dax guys like this
@franciscoclaudio4818 3 роки тому
Here in Brazil, analysts use Pentaho a lot in the processing of big data. Do you indicate a better tool?
@terryliu3635 3 роки тому ⁺²
I like the part “no cached” mode instead of “direct query”. Lol.
@hojatalaii8360 Рік тому
How to calculate max consecutive negative number in a column by DAX?
@anselmolopez8584 3 роки тому ⁺¹
congratulation on the knowlage you shere, i need it and i will folow you...from Spain
@paradeen2730 3 роки тому
I heard you talken about text search in large dimensions. At least the filter visuals I have tested couldn’t fold a SQL query supporting SQL full text index search. Is this your experience to?
@SQLBI 3 роки тому
Yes, the best we've seen is using LIKE. However, *depending on the hardware*, the same search could be faster on a relational database.
@netflixaccount3210 3 роки тому ⁺⁵
I laughed so hard at this comment of Kevin Hunt:
Marco did you take all of Alberto's books?
@Karenshow 10 місяців тому
what about telematics data, that gets big pretty fast.
@bhaveshpatelaus 3 роки тому
As per the Analysis Services Documentation, The cardinality of the columns should not be greater than 2 billion rows. I think this is also applied for Power Bi. Imagine having ticket number in a fact table that you need to display in your report and have more than 2 billion unique values.
@SQLBI 3 роки тому ⁺¹
The cardinality can be bigger, the issue is mainly related to performance when the column is used in a relationship. Another scenario is DISTINCTCOUNT, where the cardinality has a big role in performance. Approximate algorithms can be very useful there, take a look at this article: dax.tips/2019/12/20/dax-approx-distinct-count/
Absolute numbers also depend on the hardware, over time the "limit" can change because of hardware improvements. In 2012 1 million was a high cardinality, today 4 million rows could have reasonable performance on dedicated hardware (I would stay below 2m rows on cloud services, though).
@natnaelstesfagiorgis3077 Рік тому
Advice on connecting with SQL
@shafa7668 2 роки тому ⁺¹
I think the issue is business requirements change over time and it is expected that the existing model should be able to answer the future questions. That's why is it safe to keep as many columns as available. So if you have narrow fact tables there is a risk that in future your model will not serve the purpose.
@silviab5579 3 роки тому
❤❤❤❤
@NeumsFor9 3 роки тому ⁺¹
In companies with high turnover, it can be easier to start over as opposed to trying to decompose what other teams implemented. If you can't decompose it in a week, chances are you could be better off starting from scratch. The big companies are actually the worst offenders.
@richard-social8125 3 роки тому
Summary?
@VladMZ 3 роки тому
Oracle's OBIEE enforce Star Schemas religiously. You simply think thrice before adding another attribute to your fact table. Let alone you CANNOT create standalone fact tables without any dimensions.
@rcb3921 3 роки тому ⁺⁴
"We have a small youtube channel"
@timianalytics7150 Рік тому ⁺¹
What's that on Patrick's chest😂
@timewithsopy 6 місяців тому
I'm dealing with a view in our Data Warehouse that has 57 columns and 225 million rows of records. I also have another table with 3 billion rows of records. My refreshes drains the life energy out of me because they take so long. I know it's not a good idea to have everyhting in a single view/table. I'm here trying to find something to show the company that what we have is bad approach. Anybody out there can give me a summary and clean insight that I can pass on to our data warehouse team?
@SQLBI 6 місяців тому
Implement an incremental refresh: you build your house every day when you go camping, but you don't bring the furniture. Your historical data is like the content of the furniture, you would not relocate every day, right?
@gondebas 3 роки тому
Wooow! Greetings! I got a question for you, big guys, on how do you welcome calculated columns in a fact table. Whenever the fact table gets fat enough, say 1 GB, the star schema does not shine anymore. What works for me in terms of performance and speeding up reports is to add calculated columns to the fact table. It is efficient not only if there is any complexity in the measure. I can notice the speed jump by staying away from such a simple function as SUMX to calculate Sales Value. Adding the additional calculated column to FactTable increases the total size of the report negligibly, so it is a relatively low cost. Even replacing measure SUMX with a pair of the calculated column of sales value and then using a measure with a simple SUM function increases the speed approximately five times. If the measure contains any grain of complexity, say the IF function, then the speed increase manifold. Am I missing something here?
@SQLBI 3 роки тому
Details are important. If you have an IF, the calculated column is faster, but it would be better to compute the column in the data source or in Power Query, because a calculated column in a large table is not a good idea for other reasons (lower compression, higher memory consumption at refresh time, longer processing time).
If you have a simple expression that can be pushed to the storage engine (like Quantity * Price using two columns of the same table), usually you don't see any performance advantage at query time from the calculated column.

Наступне

Автоматичне відтворення

Development Tools for Power BI and Analysis Services models