Good day Sir Bryan. I have a question regarding data changes. I compare incoming/ingested data to staging DB records via the 'dateUpdated' column (a record is processed if the values differ and the incoming dateUpdated is greater than staging's). My question is: what if one of the columns was updated via a script without the 'dateUpdated' column being touched? Should I handle this scenario and still process the record (which would require comparing the records column by column)? What is the best practice for detecting data changes, whether they are made normally or intentionally via a script (data manipulation)? Looking forward to your advice. Thank you.
Hi, Well, the best practice is not to update data outside the application, i.e., you should not update DW tables via scripts other than your ETL. If this is unavoidable, you could add a table trigger to automatically update the dateUpdated column whenever a change to a table row occurs. Will that work in your scenario?
@@BryanCafferky Thanks for your response. I agree that changes should be made normally, through the applications that maintain the original data. I brought it up because some testers managed to modify records via scripts and then ran the ETL process, so it was tagged as a failed process.
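The trigger suggestion in this thread can be sketched like this (a minimal illustration with hypothetical table and column names, using SQLite for portability): the trigger stamps dateUpdated on any UPDATE that did not set it, so a dateUpdated-based comparison still catches changes made by ad-hoc scripts.

```python
import sqlite3

# Hypothetical staging table; SQLite stands in for the real DB engine.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Staging (
    id INTEGER PRIMARY KEY,
    name TEXT,
    dateUpdated TEXT DEFAULT '1900-01-01'
);
CREATE TRIGGER trg_staging_touch
AFTER UPDATE ON Staging
FOR EACH ROW
WHEN NEW.dateUpdated = OLD.dateUpdated   -- the script forgot the column
BEGIN
    UPDATE Staging SET dateUpdated = datetime('now') WHERE id = NEW.id;
END;
""")
con.execute("INSERT INTO Staging (id, name) VALUES (1, 'old value')")

# An ad-hoc update that does NOT touch dateUpdated:
con.execute("UPDATE Staging SET name = 'new value' WHERE id = 1")

row = con.execute("SELECT name, dateUpdated FROM Staging WHERE id = 1").fetchone()
print(row)  # dateUpdated now holds the current timestamp, not the default
```

The WHEN clause keeps the trigger from stamping rows that a well-behaved ETL already stamped itself.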
I love this video, the facts, and color commentary you present with it. But what is the relevance of star schema (vs wide flat) for consumption by analysis tools such as Tableau which implicitly and automatically creates a high performing dimensional model from a flat view without a human needing to do any dimensional data development? I would love to hear your thoughts on this.
Thanks. Sure. First, a Star Schema is arranged for efficient and easy data analysis for Tableau, Power BI, or any other tool. There is more to the world than Tableau. The use of surrogate keys supports inclusion of dimension history, i.e. slowly changing dimensions. Thinking things out like conformed dimensions builds in extensibility. The dimensional model creates the foundation of your data, which will sustain your organization long term through many changes in the reporting tools that consume the data. The second reason to use a Star Schema is performance when used by reporting tools. You need to load the data, and that process can be slow if a lot of joins are needed to get the data together. The Star Schema connects the fact to dimension tables in one join. No need to join dimension to dimension tables. See medium.com/data-ops/why-do-i-need-a-star-schema-338c1b029430 Many organizations don't want to spend the effort to build a star schema but then run into problems and build hacks to solve them. It's pay now or pay later.
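To make the join-depth point concrete, here is a small hedged sketch (hypothetical names, SQLite for illustration) contrasting a snowflaked product dimension with a denormalized star one: the star answers the category question with one join from the fact, while the snowflake needs an extra dimension-to-dimension join.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Snowflaked shape: Category normalized into its own table.
CREATE TABLE DimCategory (CategoryKey INTEGER PRIMARY KEY, Category TEXT);
CREATE TABLE DimProductSnow (ProductKey INTEGER PRIMARY KEY,
                             Product TEXT, CategoryKey INTEGER);
-- Star shape: Category repeated on every product row (redundancy is OK).
CREATE TABLE DimProductStar (ProductKey INTEGER PRIMARY KEY,
                             Product TEXT, Category TEXT);
CREATE TABLE FactSales (ProductKey INTEGER, Amount REAL);
""")
con.execute("INSERT INTO DimCategory VALUES (1, 'Books')")
con.execute("INSERT INTO DimProductSnow VALUES (10, 'SQL Guide', 1)")
con.execute("INSERT INTO DimProductStar VALUES (10, 'SQL Guide', 'Books')")
con.execute("INSERT INTO FactSales VALUES (10, 42.0)")

# Star: one join from the fact to the dimension answers the question.
star = con.execute("""
    SELECT p.Category, SUM(f.Amount)
    FROM FactSales f
    JOIN DimProductStar p ON p.ProductKey = f.ProductKey
    GROUP BY p.Category
""").fetchone()

# Snowflake: the same answer needs a dimension-to-dimension join as well.
snow = con.execute("""
    SELECT c.Category, SUM(f.Amount)
    FROM FactSales f
    JOIN DimProductSnow p ON p.ProductKey = f.ProductKey
    JOIN DimCategory    c ON c.CategoryKey = p.CategoryKey
    GROUP BY c.Category
""").fetchone()

print(star, snow)  # ('Books', 42.0) ('Books', 42.0)
```

Same answer either way; the star version is just one join shallower, which is the performance point Bryan makes above.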
Hi Bryan, Thank you so much for helping me understand dimensional modeling. I had a question regarding fact tables. Is it an acceptable practice to create a separate fact table that reports on a different grain? Say, for example, we have an orders fact table that consists of billions of rows. There are requests to create reports at the lowest grain possible, in this case the order_id, but there are other reports where the business wants to do its analysis at a higher grain, for example the total number of orders by day and country over the past 3 years. Due to the number of records, the query to perform this takes a lot of time and eats into costs. If so, would it make sense to script the ETL to create this other fact table using the original fact table as the base table? I hope my question made sense. Thanks!
Yes. Aggregated fact tables are a way to do what you are saying. See www.kimballgroup.com/data-warehouse-business-intelligence-resources/kimball-techniques/dimensional-modeling-techniques/aggregate-fact-table-cube/#:~:text=Aggregate%20fact%20tables%20are%20simple,aggregate%20level%20at%20query%20time. If there is a need for the more detailed grain too, you can have that as a fact table as well.
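As a hedged sketch of that ETL step (hypothetical names, SQLite for illustration): the aggregate fact is simply derived from the atomic-grain base fact with a GROUP BY, exactly as the question proposes.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE FactOrders (order_id INTEGER, order_date TEXT,
                         country TEXT, amount REAL);
CREATE TABLE FactOrdersDaily (order_date TEXT, country TEXT,
                              order_count INTEGER, total_amount REAL);
""")
con.executemany("INSERT INTO FactOrders VALUES (?, ?, ?, ?)",
                [(1, '2024-01-01', 'US', 10.0),
                 (2, '2024-01-01', 'US', 15.0),
                 (3, '2024-01-01', 'DE', 7.0)])

# ETL step: rebuild the day/country aggregate from the atomic-grain fact.
con.execute("""
    INSERT INTO FactOrdersDaily
    SELECT order_date, country, COUNT(*), SUM(amount)
    FROM FactOrders
    GROUP BY order_date, country
""")
daily = con.execute(
    "SELECT * FROM FactOrdersDaily ORDER BY country").fetchall()
print(daily)
# [('2024-01-01', 'DE', 1, 7.0), ('2024-01-01', 'US', 2, 25.0)]
```

The summary reports then hit FactOrdersDaily, while drill-down reports keep using the billions-of-rows base fact.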
Thanks Bryan for the awesome presentation. Can you please also cover topics like how to handle late-arriving dimensions, and whether they are still relevant in a Modern Data Warehouse?
Hi Bryan, great tutorial! I've been dimensional modeling for quite some time but it's always good to review the basics again. I was wondering if you might have some thoughts on a fact at the atomic grain for an order header/detail fact where your order lines have different dimension attributes. For example, imagine a fact for vehicle maintenance where 2 or 3 line numbers might be for labor with 3 different employees, and 5 line numbers for different parts. This is all easy to roll up, but getting it to the atomic grain leaves a lot of zero keys for the dimensions that aren't applicable. Would be curious to see your thoughts on this. Thanks!
My thought is to consolidate the data (employee, parts, etc.) into a single dimension table and add a column, ChargeType, with values like 'Emp', 'Part'. Assign a surrogate key and point the fact table to this. The line detail then points to the related charges, which are now in one table.
@@BryanCafferky Hi Bryan, thanks for the reply. Interesting. So if you had 1 line item with 1 employee and 3 parts, then another line item with the same employee and 2 parts, I'm guessing that you'd have to calculate ratios of hours/costs for the labor to each row that's got a part...otherwise, you'd have repeating values (i.e., double-counting) of the labor. In my original thought, it avoids that by having a row for each record type, but I get why that's a problem. Another approach is to just create a fact for each type of line item detail, but then a work order (similar to a sales order) would be broken up among multiple line item detail facts and make drill down impossible.
@@craigdubin6325 If feasible, creating the DW at the lowest grain you may ever need gives you a lot of future flexibility. Your second reply indicates each line item has both employee(s) and part(s), so you could just aggregate for reporting. Maybe aggregate to the employee/part level for drill down, or whatever the business needs.
@@BryanCafferky Yup, I always go down to the atomic grain if possible. And I think I probably made it more confusing. A work order is comprised of basically 2 different types of records, 1) labor, 2) parts. So imagine a quote or invoice from a mechanic. You might have the summary costs at the top with total labor and total parts. But there will be n number of lines with only labor charges, and n number of lines with only parts. They're never on the same line. So quantity on a labor line might be hours for the employee on that labor line, unit cost would be dollars per hour, and total cost would be hours x unit cost. On the parts lines, the quantity would be the number of the specific part number, the unit cost would be the cost of the particular part, and the total cost would be quantity x unit cost. My concern with this method was that the dim_employee_key will always be 0 (or possibly could make it -1 for N/A) on the parts line...while the dim_part_key will always be 0 (or -1 as above) for the part number on the labor lines. Hopefully that's making more sense!
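One hedged sketch of this thread's scenario (hypothetical names; an illustration of the single line-grain fact option, not a definitive design): labor lines carry a real EmployeeKey and an explicit 'N/A' PartKey of -1, part lines the reverse, and the invoice summary is a simple rollup by ChargeType.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE DimEmployee (EmployeeKey INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE DimPart     (PartKey INTEGER PRIMARY KEY, PartNumber TEXT);
CREATE TABLE FactWorkOrderLine (
    WorkOrderID INTEGER, LineNo INTEGER,
    ChargeType  TEXT,                 -- 'Labor' or 'Part'
    EmployeeKey INTEGER, PartKey INTEGER,
    Quantity REAL, UnitCost REAL, TotalCost REAL
);
""")
# Key -1 is an explicit 'Not Applicable' member in each dimension.
con.executemany("INSERT INTO DimEmployee VALUES (?, ?)",
                [(-1, 'N/A'), (1, 'Pat')])
con.executemany("INSERT INTO DimPart VALUES (?, ?)",
                [(-1, 'N/A'), (1, 'FLT-001')])

# A labor line (real employee, N/A part) and a part line (the reverse).
con.executemany("INSERT INTO FactWorkOrderLine VALUES (?,?,?,?,?,?,?,?)",
                [(100, 1, 'Labor',  1, -1, 2.0, 50.0, 100.0),
                 (100, 2, 'Part',  -1,  1, 1.0, 30.0,  30.0)])

# Rollup to the invoice summary: total labor vs. total parts.
summary = con.execute("""
    SELECT ChargeType, SUM(TotalCost)
    FROM FactWorkOrderLine
    WHERE WorkOrderID = 100
    GROUP BY ChargeType
    ORDER BY ChargeType
""").fetchall()
print(summary)  # [('Labor', 100.0), ('Part', 30.0)]
```

Because labor and parts never share a line, nothing double-counts; the -1 keys just make the "not applicable" dimension explicit rather than 0 or NULL.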
I've seen hours of videos on data warehousing. These are the most valuable 56 minutes.
Well said
53 minutes*
100% agreed
Could not agree more, this is the best video on data warehousing I have watched so far!!
Agreed! Conversational tone, appropriate visuals, and well-paced. I'm in a $4k EDW graduate school class that teaches from bulleted PowerPoint slides. Doesn't compare 🙆😟
I watched this video to prep for an interview. Not only did I get the job but here I am a year later rewatching this with more confidence and still learning. Thank you, sir
That's great to hear. Glad it helped and thanks for letting me know.
The most understandable, detailed & overall introduction to Dimensional Modeling, with clear key words explanation and logically sequential arrangement on the slide content. Thanks Bryan!
This is by far the best guide to dimensional modeling I've found on the internet!
This is the most informative session I have ever seen! 54 minutes of pure knowledge. If only everything I had an interest in learning was taught by Bryan!
Thanks. What are your tech learning interests?
Using for an interview prep. Great refresher. Perfect length and depth of content...
As a senior data engineer, this is my current favorite channel on YouTube. I would like a video from you explaining more about what cubes are, OLAP, ROLAP, etc. This kind of nomenclature is getting rarer, and people joining data engineering in this modern data stack era should understand it well, so that they can understand how we got here and can talk with people familiar with this data engineering vocabulary.
ROLAP and MOLAP are only used in the legacy SSAS Multidimensional Models. They were cool but the Tabular model does not use them b/c everything is in memory. Thanks for the suggestion. Are you using SSAS Multidimensional Models?
I come from an OLTP software data architect side of the house and only had to hand off streaming/replication to an ODS. I just took a role that will require dimensional modeling. So glad I found your video. This clicks.
Glad it is helpful.
By far the MOST lucid and practical no-nonsense explanation of key terms. Loved it!
Glad it is helpful! Thank you for the kind words.
Hi Bryan, many thanks for providing this tutorial. The most solid foundation tutorial on data warehousing & dimensional modelling I have ever seen. For those who want to build their career path as a data analyst, data architect, etc., this is the best starting point. I really regret that I did not see this video 2 years ago when I started my career.
You're very welcome!
Perfect combination of experience and theoretical handle on the subject. Thank you for the time it took to record it.
You're welcome. Glad it was helpful.
These are the best 53 minutes I've spent recently. Crisp and Clear. I feel much more confident and I request more videos on Dimensional Modeling. Thank you for your effort.
Thanks. I did do a video on Slowly Changing Dimensions which is a big area for interview questions. What topics are you most interested in regarding dimensional modeling?
I'm grateful for the time and effort you put into explaining one of the key concepts anyone dealing with enterprise data needs. Clear and comprehensive. Thanks a lot
This is premium content on the topic. Simple yet effective explanation shows your understanding of the subject. Thank you Bryan!
Glad it was helpful! Thanks
A very comprehensive and fully understandable video on my very first viewing. I felt like I had read a high quality book on the topic within an hour. Bryan never wastes time on unnecessary talk and very effectively concentrates on the core through and through. Thank you!!
Thanks
@@BryanCafferky My statement was true. Not just a passing compliment. I got a full perspective on what is dimension modelling and how it is different from Data warehousing.
Leave alone warehousing, it actually helped me in designing various reports using data from Database tables. I was able to see the common threads that run through them all.
I have been working in data warehousing for the last 5 years and this video gave me the answer I had been looking for for so long: why is there so much normalisation in the data warehouses we see today? Nobody ever gave me a satisfactory answer, but you did, sir. Big thanks!! This video deserves to be on a billboard.
Great! Glad it helped!
Good sir, I typically never like or comment on YouTube videos. This was a must. You are masterful, and could quite possibly mint some people into data engineers off of this video. God bless you. I hope to become a teacher like you one day
Thank You
I don't usually spend time watching a 53-minute learning video unless the course and the orator are really worth it. Trust me sir, you kept me glued every minute. God bless you Sir!
Thank you!
Lovely way of teaching.
Looking up for more of your material about data warehouse on the web.
Great presentation!👍 Thank you Bryan ! 👏
YW. Glad it is helpful.
I'm about 4 or 5 years late to this party. You've still done a great job in this video compared to many other sources. Thanks, and well done!
YW and thanks for watching.
Amazing video. Well explained. Uploaded 3 years ago, still valid for educational and professional training for 2020. Thanks Bryan!
Thank you for taking the time to put this together. I found it very educational and helpful!
You're welcome and thanks for the kind words.
I wish I had watched it earlier, when I was struggling to understand Data Warehousing and Dimensional Modeling. Very informative video. Every second was worth spending.
Thanks. Yeah. I struggled understanding this too for some time. Glad it helped.
A very useful introduction to data warehousing and the common terminology, presented in an interesting and easy to understand manner. It was a very quick 53 minutes - I only meant to get an idea of the content and then watch it later - but got completely absorbed.
Thanks. Glad you liked it.
Many thanks!! This is the best video I have ever seen on this subject. Simple explanation of all the complicated areas.
This is an excellent primer. The best one I've seen. I'll recommend it if anyone asks for advice on a good starting point. One point however: USER STORY. In 99% of Agile Software Development, the example from that book is NOT a User Story. A user story actually describes something that needs to be done and why. E.g., as a business manager I need to be able to drill down into the types of sales at my book stores to see the types of books being purchased, how they are being purchased, how they are being paid for, down to the level of the book title, so that I can understand what is being sold and better stock my stores. What the book you mention is actually describing is closer to a Use Case. There is some semantic impedance going on. But it's just data. :) Anyway, I really do like this video and think the approach of using use cases is very good. I think, though, that there will be some need to translate between different groups, kind of like what a location means to different systems. :)
Thanks!
This is the greatest Dimensional Modeling tutorial ever!! Thank you so much, sir.
The best video I have ever watched till date. Very well explained and neatly presented. This video helped a lot!!!
Glad it was helpful!
The concept has been explained thoroughly with real time examples. Thanks Bryan.
YW. Thanks for watching.
Great explanation of dimensional modelling. Highly insightful.
An excellent video on data warehousing, easily the best I've seen.
Quality. Seeing your videos, I realise that this is the subsection of my role that I enjoy the most and need to learn more about. Great video.
Thank you for this excellent presentation on Dimensional Modeling. I'm a student and it was so easy to follow because you made it interesting and you provided some excellent examples to support your slides.
Thanks. Glad it helped. Hope you find my other videos helpful too.
Excellent video. I do have a question at 15:57. I was unable to understand how the design of the SalesFact table differs from what the OLTP table for Sales would have been. My OLTP Sales table would have been almost identical to the SalesFact shown in this presentation, with the exception of a SalesLineItem.
Yeah. I had a hard time seeing the difference in the beginning too. Good question. In the OLTP table at about 23:09, notice the sample table has descriptive columns like CustomerFirstName, CustomerLastName, Product, SalesDate, Order Number, and OrderLineNumber. These are Dimension attributes. It also has Quantity and Price which are Facts or Measures. In a Star Schema, these cannot be in the same table. The Dimension attributes are stored in a separate table that has its own Primary Key. The Facts are stored in a Fact table with a foreign key to the Dimension table. The Dimension table Primary Key is called a Surrogate key and is usually an auto-generated identity column. There is no effort to reduce data redundancy for Dimensions, i.e. you could have Product Category values in the same table with Product Model values. It is not efficient to maintain the data that way which is why OLTP design would not do this. But it is fine for a Star Schema, i.e. Dimensional Modeled design. Make sense?
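A small hedged sketch of the split Bryan describes (hypothetical names, SQLite for illustration): dimension attributes move to dimension tables with auto-generated surrogate keys, only measures and foreign keys stay on the fact, and analysis is one join per dimension.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE DimCustomer (
    CustomerKey INTEGER PRIMARY KEY,  -- surrogate key (auto-generated)
    CustomerFirstName TEXT,
    CustomerLastName  TEXT
);
CREATE TABLE DimProduct (
    ProductKey INTEGER PRIMARY KEY,   -- surrogate key
    Product         TEXT,
    ProductCategory TEXT              -- redundancy is fine in a dimension
);
CREATE TABLE FactSales (
    CustomerKey INTEGER REFERENCES DimCustomer(CustomerKey),
    ProductKey  INTEGER REFERENCES DimProduct(ProductKey),
    Quantity    INTEGER,              -- measures only on the fact
    Price       REAL
);
""")
con.execute("INSERT INTO DimCustomer (CustomerFirstName, CustomerLastName) "
            "VALUES ('Ada', 'Lovelace')")
con.execute("INSERT INTO DimProduct (Product, ProductCategory) "
            "VALUES ('SQL Guide', 'Books')")
con.execute("INSERT INTO FactSales VALUES (1, 1, 2, 29.99)")

# Analysis query: one join per dimension, straight from the fact table.
row = con.execute("""
    SELECT c.CustomerLastName, p.ProductCategory, SUM(f.Quantity * f.Price)
    FROM FactSales f
    JOIN DimCustomer c ON c.CustomerKey = f.CustomerKey
    JOIN DimProduct  p ON p.ProductKey  = f.ProductKey
    GROUP BY 1, 2
""").fetchone()
print(row)  # ('Lovelace', 'Books', 59.98)
```

In the OLTP shape, CustomerLastName, Product, Quantity, and Price would all sit in one Sales table; the star version is what Bryan's reply separates out.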
This is what I subscribe to the internet for. A beautiful piece, 53 minutes straight.
Thanks. Hope you check out my other videos like the ones on Databricks and Python too.
The ideal video for a DBA trying to reach the DW world (using MS SQL Server and also Azure). Thanks a lot for the video!
Glad it helped!
Amazing simple and focused explanations.
Thanks Bryan
You're Welcome! Glad it was helpful.
Hi Bryan, this video is one of the best videos I have watched. It has all the information required, to the point and well described... Hats off... Thank you.
Highly recommended: a must-watch for everyone working in the DWH domain.
Thanks! Appreciate the kind words.
This was an excellent primer. I was alert and it really grounded me on a number of key points. Thank you so much for this contribution.
This is a must watch video for anyone having a hard time understanding Dimensional Modeling. Wish you could do a full series on Database systems and Warehousing.
Thanks. What specific topics are you thinking of?
This is the first time I loved a blue screen on my computer. :) Very good advice.
This is so helpful. I'm modeling a data warehouse for my org and only have experience with OLTP. This has saved me so much headache.
Great! Yeah. Dimensional Modeling is a very different mindset. Good luck!
AMAZING lecture, Bryan - thanks so much! Exactly what I was looking for and an extremely well articulated 56 minutes.
Very well presented! Clear and concise with real-world use cases!
Great tutorial sir, thank you so much for such relevant information.
You are most welcome
Great Job Bryan, great content .Thanks for sharing
Thanks!
Thank you Bryan for this video. You did an excellent job of explaining the concepts data warehouse design.
Thanks!
For someone just getting started, this was amazing thank you so much!
YW!
I am listening to you for the first time, and I found that you are a great teacher.
Thanks!
@@BryanCafferky After a year, I am listening again to refresh the concepts.
Great content Bryan. Great level of detail and insights (from actual experience). Please keep it up !
Among the several videos I watched on dimensional modeling, this is the one with the most insight and experience sharing!
Thank you so much!! Struggling through my data modeling/structuring course and your video was incredibly helpful in understanding dimensional modeling.
Wow! Really glad to hear that. Thanks for letting me know.
You are amazing!! Thank you so much for this. The best summary you can get, and it can make you talk like a pro.
Thanks.
It's a great content and presentation. Thank you very much for this wonderful work!
YW!
Amazing video, very detailed. Thanks so much.
Very well done and explained. Thank you for sharing your knowledge.
Thanks for watching.
Thanks Bryan, great video.
Excellent video, loved it. Being an OLTP modeler, this gave me a very nice idea of dimensional modeling. The only thing I wanted to bring to your attention: when you talked about the 7 W's of dimensional modeling, it only had 6 W's, and I was searching for the 7th one :)
Hi Sathya, Great observation! Actually, the 'How Many?' is one of the 7, but yeah, it is an H, not a W. The book 'Agile Data Warehouse Design' presents it that way so I did too. It should be the 6 W's and 1 H of Data Warehouse Design, but that is harder to say. :-) Actually, it helps me remember the How because it is different. Thanks!
@@BryanCafferky
hoW?
What?
When?
Where?
Who?
hoW many?
Why?
makes 7.
Thanks for the video. Very informative!!!👍
A nice and concise presentation of dimensional modeling for data warehouse.
One of the best videos I’ve seen on DM!!
Excellent explanation Bryan!!
This is a phenomenal presentation on dimensional modeling, but I don't understand the implementation of surrogate keys. I feel like I'm missing an obvious and low-compute way of maintaining all the surrogate keys on your facts. No videos I've seen discuss this. But it seems every time a new fact record is generated you have to join every related dim on the foreign natural key and update the fact with the dim's related surrogate key, so that you can later perform joins using the surrogate key. Am I thinking through this correctly?
Yeah. It does add complexity but you have the gist of it. Surrogate keys are particularly important when you want SCD history, since natural keys would result in duplicate keys on the dim tables. Also, they isolate the DW from changes in the backend systems. But they do add some extra work.
Thank you for confirming my understanding and great presentation!
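For readers following this thread, here is a hedged sketch of that lookup step (hypothetical names, SQLite for illustration): incoming fact rows carry the natural key, and one set-based join against the current dimension rows resolves it to the surrogate key during the load, rather than updating fact rows afterwards.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE DimCustomer (
    CustomerKey INTEGER PRIMARY KEY,  -- surrogate key
    CustomerID  TEXT,                 -- natural (business) key
    IsCurrent   INTEGER               -- 1 = current SCD Type 2 row
);
CREATE TABLE StageSales (CustomerID TEXT, Amount REAL);
CREATE TABLE FactSales  (CustomerKey INTEGER, Amount REAL);
""")
con.executemany("INSERT INTO DimCustomer VALUES (?, ?, ?)",
                [(1, 'C100', 0),   # superseded history row
                 (2, 'C100', 1),   # current row for the same customer
                 (3, 'C200', 1)])
con.executemany("INSERT INTO StageSales VALUES (?, ?)",
                [('C100', 10.0), ('C200', 20.0)])

# The load resolves natural key -> current surrogate key in one pass.
con.execute("""
    INSERT INTO FactSales (CustomerKey, Amount)
    SELECT d.CustomerKey, s.Amount
    FROM StageSales s
    JOIN DimCustomer d
      ON d.CustomerID = s.CustomerID AND d.IsCurrent = 1
""")
facts = con.execute("SELECT * FROM FactSales ORDER BY CustomerKey").fetchall()
print(facts)  # [(2, 10.0), (3, 20.0)]
```

Note the new fact row points at CustomerKey 2, the current SCD row for C100, which is why the duplicate natural keys on the dimension are not a problem.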
Great video Bryan.
Great lecture, definitely worth the watch!
Excellent, specially focus on process model. Thank you
YW!
Amazing lecture. Thanks Sir. 🙏🙏🙏
I listed out some topics (after my failed interview) to gain a clear-cut understanding, and this video answered all my questions in detail. Sir, big thumbs up to you. If possible, please do a video on interview questions and how to answer them (DW, DBMS concepts).
Hi Sai, That's a great idea for a video. Do you have any specific questions in mind? BTW: Interviews always have questions to stump you, but I'd like to make a video that helps.
Thanks! Bryan.
Thanks, This is very helpful. When you discussed the scenarios regarding a person getting married, it triggered a bunch of other questions I can ask for my project. I enjoyed the descriptions of what is fact vs what is dimension.
I have a question about the Time_Dimension table at 19:02. This captures Day, Month, Quarter, Year, Fiscal_year, Day_of_week. I understand how this would be beneficial. Consider the following scenario: the requirement now is to capture the time (hour + minute) of the sales transaction. How would this dimension table change?
Hi Saurabh,
Did you see my answer to your last question? Is that answered now?
Thanks,
Bryan
Thank you, loved the content and how well it was structured and presented. Looking forward to your other tutorials!
Absolutely amazing video, thanks Bryan!
YW. Glad it was helpful.
I learned so much from this one video. Thank you! Also, 23:51 "snowflake is something you may get questioned in an interview, so wake up" I feel personally attacked lol. I wasn't sleeping (your video was long but not boring at all) but I DID get asked about this in a recent interview and I totally flopped. At least now I can answer that question :)
Yes. I find I always remember answers to interview questions I missed.
Thank you so much for sharing your knowledge and for teaching it in such a clean, comprehensible way.
Thanks. So glad it is helpful!
Great presentation. Very clear and to the point!
Excellent video! Thanks for sharing
YW
Great video, great explanation!
Amazing explanation
Thanks!
Absolutely amazing explanations.
@Bryan Cafferky - thank you for creating this great video. It's really a marvel: a simple, realistic approach to understanding.
This video was perfect to answer my questions! Thank you!
Glad to hear that!
Best video on DW design ever!
outstanding overview, well done
Excellent job! Thank you for this wonderful video!
Wow, well detailed and explained. Thanks Bryan
YW!
@BryanCafferky Thank you for taking the time to put this together. This is a great foundational video for anyone getting started and presents the subject in a very relatable way.
Thanks.
Hey Bryan! This is amazing, thank you for the great video. Could we get a download link for the slide deck?
Good day Sir Bryan.
I have a question regarding data changes.
I compare incoming/ingested data to staging DB records via a 'dateUpdated' column (the record is processed if the values are not equal
and the incoming dateUpdated is greater than staging's). If that condition is satisfied, the incoming records
are processed. My question is: what if one of the columns has been updated via a script without
the 'dateUpdated' column being changed? Should I handle this scenario and still process the record (comparing the records column by column)?
What is the best practice for detecting data changes, whether they are made normally or via direct data manipulation through a script?
Looking forward to your advice. Thank you.
Hi, Well, the best practice is not to update data outside the application, i.e. should not update DW tables via scripts other than your ETL. If this is unavoidable, you could add a table trigger to automatically update the dateUpdated column whenever a change to a table row occurs. Will that work in your scenario?
@@BryanCafferky thanks for your response. I agree that changes should be made through the normal path, especially when there are applications that maintain the original data. I brought it up because there are testers who modify the records via scripts and then run the ETL process, so it was flagged as a failed process.
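Besides the trigger approach suggested above, a common fallback when dateUpdated cannot be trusted is to compare a hash of the business columns. This is a minimal sketch with illustrative column names; real pipelines often persist the hash alongside each staged row so only one value needs comparing.

```python
import hashlib

def row_hash(row, columns):
    """Hash the listed columns so any change is detected,
    even if dateUpdated was never touched by the script."""
    payload = "|".join(str(row.get(c, "")) for c in columns)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

# Columns that define "the record changed" (illustrative list).
COMPARE_COLS = ["name", "city", "status"]

staged   = {"id": 7, "name": "Alice", "city": "Boston", "status": "A"}
incoming = {"id": 7, "name": "Alice", "city": "Austin", "status": "A"}

# True here: the city changed even though no dateUpdated was bumped.
changed = row_hash(staged, COMPARE_COLS) != row_hash(incoming, COMPARE_COLS)
```

Comparing hashes is effectively the column-by-column comparison the question asks about, done in one cheap equality check per row.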
Thank you so much for this! Very organized lecture, and I love how you included the time stamps.
Thank you for this wonderful session. Very clear and informative. Really enjoyed it.
Thanks!
I love this video, the facts, and color commentary you present with it. But what is the relevance of star schema (vs wide flat) for consumption by analysis tools such as Tableau which implicitly and automatically creates a high performing dimensional model from a flat view without a human needing to do any dimensional data development? I would love to hear your thoughts on this.
Thanks. Sure.
First, a Star Schema is arranged for efficient and easy data analysis for Tableau, Power BI, or any other tool. There is more to the world than Tableau. The use of surrogate keys supports inclusion of dimension history, i.e. slowly changing dimensions. Thinking things out like conformed dimensions builds in extensibility. The dimensional model creates the foundation of your data which will sustain your organization long term through many changes in the reporting tools that consume the data.
The second reason to use a Star Schema is performance when the data is used by reporting tools. You need to load the data, and that process can be slow if a lot of joins are needed to pull the data together. The Star Schema connects the fact to each dimension table in one join; there is no need to join dimension tables to other dimension tables. See medium.com/data-ops/why-do-i-need-a-star-schema-338c1b029430
Many organizations don't want to spend the effort to build a star schema but then run into problems and build hacks to solve them. It's pay now or pay later.
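The single-hop join structure described above can be shown with a toy example. The tables and values here are invented for illustration; each dimension is keyed by its surrogate key, so resolving a fact row never requires chaining through a second dimension.

```python
# Toy star schema: each fact row reaches every dimension in exactly one hop.
date_dim    = {101: {"year": 2024, "quarter": "Q1"}}
product_dim = {7:   {"category": "Bikes"}}

fact_sales = [
    {"date_sk": 101, "product_sk": 7, "amount": 250.0},
    {"date_sk": 101, "product_sk": 7, "amount": 100.0},
]

# "Join" the fact to its dimensions with direct surrogate-key lookups --
# no dimension-to-dimension chain, unlike a snowflaked design.
report = [
    {
        "year": date_dim[f["date_sk"]]["year"],
        "category": product_dim[f["product_sk"]]["category"],
        "amount": f["amount"],
    }
    for f in fact_sales
]
```

In a snowflake, product_dim would itself need a join to a separate category table before the report row could be assembled, which is the extra cost the star avoids.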
Best video on data warehousing on YouTube.
Thank you sir, you saved my life.
Hi Bryan,
Thank you so much for helping me understand dimensional modeling. I had a question regarding fact tables. Is it an acceptable practice to create a separate fact table that reports on a different grain?
So say, for example, we have an orders fact table that consists of billions of rows. There are requests to create reports at the lowest grain possible, so in this case it would be the order_id, but there are other reports where the business wants to do its analysis at a higher grain, say, for example, the total number of orders by day and country over the past 3 years. Due to the number of records, the query to perform this takes a lot of time and eats into costs.
If so, would it make sense to script the ETL to create this other fact table by utilizing the original fact table as the base table?
I hope my question made sense.
Thanks!
Yes. Aggregated fact tables are a way to do what you are saying. See www.kimballgroup.com/data-warehouse-business-intelligence-resources/kimball-techniques/dimensional-modeling-techniques/aggregate-fact-table-cube/#:~:text=Aggregate%20fact%20tables%20are%20simple,aggregate%20level%20at%20query%20time.
If there is a need for the more detailed grain too, you can have that as a separate fact table.
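The aggregate fact table idea in this thread can be sketched like this: the ETL rolls the atomic orders fact up to the (day, country) grain once, so reporting queries never have to scan the detail rows. Table and column names are illustrative.

```python
from collections import defaultdict

# Atomic-grain fact rows (one per order; values are made up for illustration).
fact_orders = [
    {"order_date": "2024-01-01", "country": "US", "amount": 10.0},
    {"order_date": "2024-01-01", "country": "US", "amount": 15.0},
    {"order_date": "2024-01-01", "country": "CA", "amount": 20.0},
]

# Build the aggregate fact at the (day, country) grain. In a real DW this
# step runs in the ETL, writing to its own table alongside the detail fact.
agg = defaultdict(lambda: {"order_count": 0, "total_amount": 0.0})
for row in fact_orders:
    key = (row["order_date"], row["country"])
    agg[key]["order_count"] += 1
    agg[key]["total_amount"] += row["amount"]

fact_orders_daily = [
    {"order_date": d, "country": c, **measures}
    for (d, c), measures in agg.items()
]
```

Both facts share the same conformed dimensions, so reports can drill from the daily aggregate down to the order detail when needed.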
Thanks Bryan for the awesome presentation. Can you please also cover topics like how to handle late-arriving dimensions, and whether that is still relevant in a modern data warehouse?
You're a savior, thank you Bryan.
YW. Glad it helps.
Hi Bryan, great tutorial! I've been dimensional modeling for quite some time, but it's always good to review the basics again. I was wondering if you might have some thoughts on a fact at the atomic grain for an order header/detail fact where your order lines have different dimension attributes. For example, imagine a fact for vehicle maintenance where 2 or 3 line numbers might be for labor with 3 different employees, and 5 line numbers for different parts. This is all easy to roll up, but getting it to the atomic grain leaves a lot of zero keys for the dimensions that aren't applicable. Would be curious to see your thoughts on this. Thanks!
My thought is to consolidate the data (employee, parts, etc.) into a single dimension table and add a column, charge type, with values like 'Emp' and 'Part'. Assign a surrogate key and point the fact table to this. The line detail then points to the related charge, which is now in one table.
@@BryanCafferky Hi Bryan, thanks for the reply. Interesting. So if you had 1 line item with 1 employee and 3 parts, then another line item with the same employee and 2 parts, I'm guessing that you'd have to calculate ratios of hours/costs for the labor to each row that's got a part...otherwise, you'd have repeating values (i.e., double-counting) of the labor. In my original thought, it avoids that by having a row for each record type, but I get why that's a problem. Another approach is to just create a fact for each type of line item detail, but then a work order (similar to a sales order) would be broken up among multiple line item detail facts and make drill down impossible.
@@craigdubin6325 If feasible, creating the DW at the lowest grain you may ever need gives you a lot of future flexibility. Your second reply indicates each line item has both employee(s) and part(s), so you could just aggregate for reporting. Maybe aggregate to the employee/part level for drill down, or whatever the business needs.
@@BryanCafferky Yup, I always go down to the atomic grain if possible. And I think I probably made it more confusing. A work order is comprised of basically 2 different types of records, 1) labor, 2) parts. So imagine a quote or invoice from a mechanic. You might have the summary costs at the top with total labor and total parts. But there will be n number of lines with only labor charges, and n number of lines with only parts. They're never on the same line. So quantity on a labor line might be hours for the employee on that labor line, unit cost would be dollars per hour, and total cost would be hours x unit cost. On the parts lines, the quantity would be the number of the specific part number, the unit cost would be the cost of the particular part, and the total cost would be quantity x unit cost. My concern with this method was that the dim_employee_key will always be 0 (or possibly could make it -1 for N/A) on the parts line...while the dim_part_key will always be 0 (or -1 as above) for the part number on the labor lines. Hopefully that's making more sense!
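The consolidated dimension Bryan suggests in this thread can be sketched as follows. All names and numbers are invented for illustration: labor lines and parts lines each carry exactly one foreign key into a shared "charge" dimension, so no line ever needs a 0 / -1 placeholder key.

```python
# One consolidated dimension for anything a work-order line can charge.
# Employees (labor) and parts share one surrogate-key space; the
# charge_type column distinguishes them.
charge_dim = {
    1: {"charge_type": "Emp",  "name": "J. Smith"},
    2: {"charge_type": "Part", "name": "Brake pad"},
}

# Atomic-grain work-order lines: every line points at exactly one charge.
# Labor line: qty = hours, unit_cost = rate; part line: qty = count.
fact_work_order_lines = [
    {"line": 1, "charge_sk": 1, "qty": 2.0, "unit_cost": 50.0},  # 2 labor hours
    {"line": 2, "charge_sk": 2, "qty": 4.0, "unit_cost": 12.5},  # 4 brake pads
]

# Rolling up by charge_type reproduces the invoice summary
# (total labor vs. total parts) without double-counting either.
labor_total = sum(
    f["qty"] * f["unit_cost"]
    for f in fact_work_order_lines
    if charge_dim[f["charge_sk"]]["charge_type"] == "Emp"
)
```

Because labor and parts stay on separate rows at the atomic grain, drill-down from the work-order totals to individual lines works in a single fact table.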
I'm getting into data engineering and I really enjoyed this content.
Amazing. The only thing I don't understand is how to "declare the grain" 😪 Any tip on what to research or how to think about it?
Great video, and very helpful. I've been struggling with the dryness of the Kimball book.
Thanks. Yeah. The book can be a bit dry.