Just started out in my career. Watched the video 4 months back and it gave me a high-level overview. Came back after a session with my seniors and now I see the granular details. Thanks!
Hey Mike, this video was straight 🔥🔥🔥 - This is such an important topic for all businesses taking the journey right now and it really blows my mind how much understanding of modelling has been lost. Your overview of the pros and cons of each was a well balanced and to-the-point summary - I especially love that you correctly call out the source data and scale risks of OBT. I think too many peeps in the MDS are working for SaaS companies where they don't realise that 1) source systems are way less static in enterprise and 2) most enterprises outside of SaaS aren't just okay with massive pipeline code spaghettis. Great video!
Really appreciate the comment John!
Addresses the challenges and considerations when moving to the cloud, but doesn't go into enough detail on modelling the data itself.
Great summary, thank you! Personally, I'm more comfortable with the Kimball approach, but the "modern data stack" part in the title caught my attention, and I think it's important to know which trends align better with the modern tech. Also, I couldn't agree more that design is a key part of the process, and if done correctly it is going to prevent a lot of headaches. It would be nice to get deeper into the hybrid model you presented.
Just because storage is cheap doesn't mean it's unimportant. I/O is often the slowest part of OLAP and ETL queries and directly dictates compute cost. The two current OLAP pricing models (pay by minute or pay by bytes "scanned") both depend on how much data you load per query. The former because you pay for an idle cluster that is waiting for data, the latter because it literally bills you for how much data you load. Data modeling helps you get the compression and partitioning you need to control and minimize that cost.
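To put rough numbers on the bytes-scanned point, here is a minimal Python sketch. The table size, partition count, column count and price per TiB are all made-up assumptions, not any vendor's actual pricing. It also assumes a columnar engine that reads only the partitions and columns a query touches, with columns of roughly equal size.

```python
from dataclasses import dataclass

TIB = 1024 ** 4  # bytes in one tebibyte


@dataclass
class TableLayout:
    total_bytes: int     # compressed size of the whole table (hypothetical)
    num_partitions: int  # e.g. one partition per day
    num_columns: int     # assumed to be roughly equal in size


def bytes_scanned(layout: TableLayout, partitions_read: int, columns_read: int) -> float:
    """Estimate bytes read when the engine prunes partitions and columns."""
    partition_fraction = partitions_read / layout.num_partitions
    column_fraction = columns_read / layout.num_columns
    return layout.total_bytes * partition_fraction * column_fraction


def scan_cost(num_bytes: float, price_per_tib: float) -> float:
    """Cost of a query under a pay-per-bytes-scanned model."""
    return num_bytes / TIB * price_per_tib


# Hypothetical table: 10 TiB, daily partitions for one year, 50 columns.
layout = TableLayout(total_bytes=10 * TIB, num_partitions=365, num_columns=50)
PRICE_PER_TIB = 5.0  # assumption for illustration only

# Full scan: every partition, every column.
full_cost = scan_cost(bytes_scanned(layout, 365, 50), PRICE_PER_TIB)

# Modeled for pruning: one week of partitions, 5 of the 50 columns.
pruned_cost = scan_cost(bytes_scanned(layout, 7, 5), PRICE_PER_TIB)

print(f"full scan:   {full_cost:.2f}")
print(f"pruned scan: {pruned_cost:.4f}")
```

The same reasoning carries over to pay-by-minute pricing: less data read means less time the cluster spends waiting on I/O.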
Solid overview! I’ve mainly used dimensional modeling with fact/dim tables.
A good data model goes a long way within the analytics pipeline, important stuff. Thanks
Definitely. Appreciate you sharing your experience
I second that. As someone who started out as a DBA, proper DB modeling (either in OLTP or OLAP) will provide peace of mind in the future.
Get the data model right and the rest falls into place IMO 👍
Wow, this is spot on. I've been using the modern data stack for the last ~8 years or so. Starting out, I was definitely focusing on a star schema/denormalized approach with the MPP databases, but as I learned that they do best with fewer joins and can handle wide tables, I started striving for the OBT approach. In practice, the hybrid is what typically happens. There are so many dimensional tables which vary in need from team to team that, especially in a larger enterprise, the hybrid is almost guaranteed.
Great video! Love the work you did here.
Appreciate the feedback! What you describe is really similar to my journey as well.
Good one. I think taking actual data and walking it through a model would be great.
Oh man, it's so nice to get something substantial on YouTube for data modeling! Awesome awesome stuff.
What do you think of Unistore? I have yet to dive in, and I'm also very fresh into the industry, but the prospect of combining OLTP and OLAP capabilities is certainly compelling!
Awesome video, really enjoyed it. Very clear and straight to the point.
Appreciate it! Thanks for watching
Great overview! Thanks for educating us on the newer approaches. We currently are using Denormalized Modeling. This is due to the fact that all our current needs are around our ERP. The Marketing Dept wants to start collecting more website and estore analytics, which I believe will lead us to a Hybrid model.
Glad it was helpful! I still think denormalized is a great strategy.
Good one, Michael. Well summarized and to the point. I also think the hybrid approach is the best for the MDS.
This was great
great stuff - when are you going on Joe Reis and Matt Housley's podcast?
Great Job. Great Presentation.
Thank you!
thank you, incredibly helpful.
Glad it helped!
Excellent Video !
In some cases, we observe another pattern where an industry-standard data model is used after the raw layer.
I literally hit this video so fast sincerely thinking I was going to learn "How to date a model." My brain went faster than my eyes and lost the game.
Thanks for addressing this often overlooked but important topic!
I'm looking for some good sources on dimensional data modeling. Of course I have the Kimball books, but something more practical (books, courses...) and hands-on, perhaps with exercises on various business scenarios / sources would be great. Any ideas?
Any luck?
@deltagamma1442 not really :/. Took some inspiration from the GitLab data handbook as I'm mostly looking for SaaS use cases
great video. thank you. can you upload a vlog on Speech to Text transcripts using AI
Hey, the video was top tier. Can you suggest any good course to get an in-depth understanding of data modeling?
*slow clap* thank you, fantastic.
Hey, your explanation is really calm and clear! Do you have any Udemy course?
Next: How to date a model!
What are the downsides of doing all three in one? Pull all source systems' raw data (Inmon), then model fact and dim tables (Kimball), then build data marts? If storage is getting cheaper, wouldn't this be the best way?
Can someone please tell me what website/document was shown at the 13-second mark?
Could you please explain the differences between the different data models (Inmon, Kimball, 3NF, Dimensional Modelling, Data Vault)?
How do Data Products in their various guises fit with these data modelling concepts?
Hi Michael, the hybrid model is an interesting approach. What could be used to transition from the star schema to the OBT data marts? For example, views or materialized views? Or are they separate schemas? Also, in what scenario would this make sense?
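One possible way to go from a star schema to an OBT mart, sketched with Python's built-in sqlite3 module. All table and column names here (fact_sales, dim_customer, dim_product, obt_sales_v, obt_sales) are hypothetical, and this is not necessarily the approach from the video. SQLite has no materialized views, so a CREATE TABLE AS SELECT stands in for one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star schema layer (hypothetical tables)
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT, region TEXT);
CREATE TABLE dim_product  (product_id  INTEGER PRIMARY KEY, product_name  TEXT, category TEXT);
CREATE TABLE fact_sales (
    sale_id INTEGER PRIMARY KEY,
    customer_id INTEGER,
    product_id INTEGER,
    sale_date TEXT,
    amount REAL
);

INSERT INTO dim_customer VALUES (1, 'Acme', 'EMEA'), (2, 'Globex', 'AMER');
INSERT INTO dim_product  VALUES (10, 'Widget', 'Hardware'), (20, 'Gadget', 'Hardware');
INSERT INTO fact_sales VALUES
    (100, 1, 10, '2024-01-05', 250.0),
    (101, 2, 20, '2024-01-06', 120.0),
    (102, 1, 20, '2024-01-07', 80.0);

-- Option 1: a plain view that flattens the star into one wide row per sale
CREATE VIEW obt_sales_v AS
SELECT f.sale_id, f.sale_date, f.amount,
       c.customer_name, c.region,
       p.product_name, p.category
FROM fact_sales f
JOIN dim_customer c ON c.customer_id = f.customer_id
JOIN dim_product  p ON p.product_id  = f.product_id;

-- Option 2: persist the same result as a physical table
-- (stand-in for a materialized view; it must be rebuilt when sources change)
CREATE TABLE obt_sales AS SELECT * FROM obt_sales_v;
""")

# Consumers query the wide table without writing any joins.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM obt_sales GROUP BY region ORDER BY region"
).fetchall()
print(rows)
```

The view always reflects the current star schema but pays for the joins on every query. The persisted table avoids joins at read time at the cost of extra storage and a refresh step.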
Hello, really nice video about data modeling. I was looking for this.
It's very difficult to define the right approach for data modeling. Each case is different. In my experience I did a lot of star schemas during my career, but nowadays I see a trend toward one big table in modern data warehouses like BigQuery, Redshift or Synapse.
Do you have the same impression?
Yep I'm seeing the same thing. Truly case by case. I think the concept of star schema/data models are still very applicable today mainly b/c of the organization and structure it brings rather than for any performance gains.
Thank you very much :))
You're welcome!
How to create an LDM in MagicDraw?
Just theoretical blah blah in my opinion... great for COOs to talk about stuff they have no idea about
Until you are being grilled about these in an interview with companies like Airbnb, Netflix and Facebook and you look like a complete clueless idiot and you get shown the door