Data Modeling in the Modern Data Stack

  • Published 22 Jun 2024
  • Data Modeling is arguably the most impactful decision for a data team.
    It determines your architecture and the path that the whole team will follow.
    While this is not a new topic, the new tools and tech of the last decade have caused many to reconsider what's best in a modern landscape.
    So in today's video I want to break down this topic.
    We'll discuss:
    1. Why is Data Modeling (still) important?
    2. What are the common approaches?
    3. What things should you consider?
    ►► The Starter Guide for Modern Data → bit.ly/starter-mds
    Simplify “modern” architectures + better understand common tools & components
    Timestamps:
    0:00 - Intro
    0:44 - Why is Data Modeling Important?
    2:33 - Common Approaches
    7:39 - Things to Consider
    Title & Tags:
    Data Modeling in the Modern Data Stack
    #kahandatasolutions #dataengineering #datamodeling

COMMENTS • 49

  • @nlishivha1005
    @nlishivha1005 1 month ago

    Just started out in my career. I watched the videos 4 months back and they gave me a high-level overview. Came back after a session with my seniors and now I see the granular details. Thanks!

  • @johncosgrove7727
    @johncosgrove7727 1 year ago +8

    Hey Mike, this video was straight 🔥🔥🔥 - This is such an important topic for all businesses taking the journey right now and it really blows my mind how much understanding of modelling has been lost. Your overview of the pros and cons of each was a well balanced and to-the-point summary - I especially love that you correctly call out the source data and scale risks of OBT. I think too many peeps in the MDS are working for SaaS companies where they don't realise that 1) source systems are way less static in enterprise and 2) most enterprises outside of SaaS aren't just okay with massive pipeline code spaghettis. Great video!

  • @nlopedebarrios
    @nlopedebarrios 1 year ago +2

    Great summary, thank you! Personally, I'm more comfortable with the Kimball approach, but the "modern data stack" part of the title caught my attention, and I think it's important to know which trends align better with modern tech. Also, I couldn't agree more that design is a key part of the process and, if done correctly, is going to prevent a lot of headaches. It would be nice to get deeper into the hybrid model you presented.

  • @Joeymbryan
    @Joeymbryan 9 months ago

    Wow this is spot on. I've been using the modern data stack for the last ~8 years or so. Starting out, I was definitely focusing on a star schema/denormalized approach with the MPP databases, but as I learned that they do best with fewer joins and can handle wide tables, I started striving for the OBT approach. In practice, the hybrid is what typically happens: there are so many dimensional tables, which vary in need from team to team, so especially in a larger enterprise the hybrid is almost guaranteed.
    Great video! Love the work you did here.

    • @KahanDataSolutions
      @KahanDataSolutions  9 months ago

      Appreciate the feedback! What you describe is really similar to my journey as well.

  • @theconfusedchannel6365
    @theconfusedchannel6365 1 year ago +1

    Good one. I think taking actual data and flowing it through a model would be great.

  • @firefoxmetzger9063
    @firefoxmetzger9063 10 months ago +3

    Just because storage is cheap doesn't mean it's unimportant. I/O is often the slowest part of OLAP and ETL queries and directly dictates compute cost. The two current OLAP pricing models (pay by minute or pay by bytes "scanned") both depend on how much data you load per query. The former because you pay for an idle cluster that is waiting for data, the latter because it literally bills you for how much data you load. Data modeling helps you get the compression and partitioning you need to control and minimize that cost.
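
The bytes-scanned point above can be made concrete with a little arithmetic. This is only a rough sketch: the table size, the daily partitioning scheme, and the rate of 5 USD per TiB are all assumed values for illustration, not quoted prices from any vendor.

```python
def scan_cost_usd(bytes_scanned: int, usd_per_tib: float) -> float:
    """Cost of one query under a pay-per-bytes-scanned pricing model."""
    return bytes_scanned / 2**40 * usd_per_tib


# Hypothetical table: 365 daily partitions of 10 GiB each.
PARTITION_BYTES = 10 * 2**30
DAYS = 365
USD_PER_TIB = 5.0  # assumed rate, for illustration only

# Without partition pruning the query reads the whole table.
full_scan = scan_cost_usd(PARTITION_BYTES * DAYS, USD_PER_TIB)

# With a date filter on the partition column, only 7 partitions are read.
pruned_scan = scan_cost_usd(PARTITION_BYTES * 7, USD_PER_TIB)

print(f"full scan:   ${full_scan:.2f}")
print(f"pruned scan: ${pruned_scan:.2f}")
print(f"ratio:       {full_scan / pruned_scan:.1f}x")
```

Under these assumptions the same one-week question costs roughly 52 times less when the model lets the engine skip partitions, which is the kind of saving the comment attributes to modeling choices.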

  • @ArmstrongNigere
    @ArmstrongNigere 1 year ago +2

    Awesome video, really enjoyed it. Very clear and straight to the point.

  • @zulkhaireesulaiman8575
    @zulkhaireesulaiman8575 1 year ago +2

    thank you, incredibly helpful.

  • @shreshti82
    @shreshti82 1 year ago +1

    Addresses the challenges and considerations when moving to the cloud, but there is not enough detail on modelling itself from the data.

  • @wingnut29
    @wingnut29 1 year ago +1

    Great overview! Thanks for educating us on the newer approaches. We currently are using Denormalized Modeling. This is due to the fact that all our current needs are around our ERP. The Marketing Dept wants to start collecting more website and estore analytics, which I believe will lead us to a Hybrid model.

    • @KahanDataSolutions
      @KahanDataSolutions  1 year ago

      Glad it was helpful! I still think denormalized is a great strategy.

  • @datasleek7950
    @datasleek7950 1 year ago +2

    Great Job. Great Presentation.

  • @mrcool4uall
    @mrcool4uall 8 months ago

    Good one, Michael. Well summarized and to the point. I also think the Hybrid approach is the best for the MDS.

  • @nicky_rads
    @nicky_rads 1 year ago +11

    Solid overview! I’ve mainly used dimensional modeling with fact/dim tables.
    A good data model goes a long way within the analytics pipeline, important stuff. Thanks

    • @KahanDataSolutions
      @KahanDataSolutions  1 year ago +1

      Definitely. Appreciate you sharing your experience

    • @datasleek7950
      @datasleek7950 1 year ago +2

      I second that. As an original DBA, I can say proper DB modeling (either in OLTP or OLAP) will provide peace of mind in the future.

    • @summer_xo
      @summer_xo 1 year ago +1

      Get the data model right and the rest falls into place IMO 👍

  • @davemeech
    @davemeech 7 months ago +1

    Oh man, it's so nice to get something substantial on YouTube for data modeling! Awesome awesome stuff.
    What do you think of Unistore? I have yet to dive in, and I'm also very fresh into the industry, but the prospect of combining OLTP and OLAP capabilities is certainly compelling!

  • @johnytheripper
    @johnytheripper 1 year ago +2

    Thanks for addressing this often overlooked but important topic!
    I'm looking for some good sources on dimensional data modeling. Of course I have the Kimball books, but something more practical (books, courses...) and hands-on, perhaps with exercises on various business scenarios / sources would be great. Any ideas?

    • @deltagamma1442
      @deltagamma1442 1 year ago

      Any luck?

    • @johnytheripper
      @johnytheripper 1 year ago

      @deltagamma1442 Not really :/. Took some inspiration from the GitLab data handbook, as I'm mostly looking for SaaS use cases.

  • @Rex_793
    @Rex_793 1 year ago +2

    great stuff - when are you going on Joe Reis and Matt Housley's podcast?

  • @medhatatef7737
    @medhatatef7737 1 year ago

    Thank you very much :))

  • @bradleymiller437
    @bradleymiller437 1 year ago +1

    I literally hit this video so fast sincerely thinking I was going to learn "How to date a model." My brain went faster than my eyes and lost the game.

  • @nlopedebarrios
    @nlopedebarrios 10 months ago

    Hi Michael, the hybrid model is an interesting approach. What could be used to transition from the star schema to the OBT data marts? For example, views or materialized views? Or are they separate schemas? Also, in what scenario would this make sense?
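
One possible reading of the view option raised in this question, sketched with Python's built-in `sqlite3` module. This is a minimal illustration under assumptions, not the approach from the video: the table names (`dim_customer`, `dim_product`, `fact_sales`), the view name `obt_sales`, and the sample rows are all hypothetical. SQLite has no materialized views, so a plain view stands in here; in a warehouse the same SELECT could back a materialized view or a table in a separate mart schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Star schema: one fact table plus two dimensions (hypothetical names).
    CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE dim_product  (product_id  INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE fact_sales   (sale_id     INTEGER PRIMARY KEY,
                               customer_id INTEGER, product_id INTEGER, amount REAL);

    INSERT INTO dim_customer VALUES (1, 'Acme', 'EMEA'), (2, 'Globex', 'AMER');
    INSERT INTO dim_product  VALUES (10, 'Widget', 'Hardware'), (11, 'Plan', 'Software');
    INSERT INTO fact_sales   VALUES (100, 1, 10, 25.0), (101, 2, 11, 40.0), (102, 1, 11, 15.0);

    -- OBT-style data mart: the joins are done once, inside the view definition.
    CREATE VIEW obt_sales AS
    SELECT f.sale_id, f.amount,
           c.name AS customer_name, c.region,
           p.name AS product_name,  p.category
    FROM fact_sales f
    JOIN dim_customer c ON c.customer_id = f.customer_id
    JOIN dim_product  p ON p.product_id  = f.product_id;
""")

# Consumers query the wide view without writing any joins themselves.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM obt_sales GROUP BY region ORDER BY region"
).fetchall()
print(rows)
```

The star schema stays the single source of structure, while each team can get its own wide view on top of it.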

  • @severn_creek2374
    @severn_creek2374 1 year ago +76

    Click bait. He did not say one word about how to date a model.

  • @amitsaha7756
    @amitsaha7756 1 year ago +1

    In some cases, we observe another pattern where an industry-standard data model is used after the raw layer.

  • @user-dx2dg4bd7q
    @user-dx2dg4bd7q 1 year ago

    Could you please explain the differences between the different data models (Inmon, Kimball, 3NF, Dimensional Modeling, Data Vault)?

  • @sunil-de
    @sunil-de 7 months ago

    Hey, the video was top tier. Can you suggest a good course to get an in-depth understanding of DM?

  • @Lima3578user
    @Lima3578user 8 months ago

    Great video, thank you. Can you upload a vlog on speech-to-text transcripts using AI?

  • @rpelegrini
    @rpelegrini 1 year ago +2

    Hello, really nice video about data modeling. I was looking for this.
    It's very difficult to define one right approach for data modeling; every case is different. In my experience I built a lot of star schemas during my career, but nowadays I see a trend toward one big table in modern data warehouses like BigQuery, Redshift or Synapse.
    Do you have the same impression?

    • @KahanDataSolutions
      @KahanDataSolutions  1 year ago +1

      Yep I'm seeing the same thing. Truly case by case. I think the concept of star schema/data models is still very applicable today, mainly b/c of the organization and structure it brings rather than for any performance gains.

  • @jimgillespie3540
    @jimgillespie3540 1 year ago +1

    *slow clap* thank you, fantastic.

  • @renvils
    @renvils 3 months ago

    Hey, your explanation is really calm and clear! Do you have a Udemy course?

  • @theukulelegod
    @theukulelegod 3 months ago

    What are the downsides of doing all three in one? Pull all the source systems' raw data (Inmon), then model fact and dim tables (Kimball), then make data marts? If storage is getting cheaper, wouldn't this be the best way?

  • @S_B_S1
    @S_B_S1 2 months ago

    How do Data Products in their various guises fit with these data modelling concepts?

  • @okj4521
    @okj4521 9 months ago +1

    Next: How to date a model!

  • @user-wf7ni3ml6u
    @user-wf7ni3ml6u 11 months ago

    How do you create an LDM in MagicDraw?

  • @largpack
    @largpack 4 months ago +1

    Just theoretical blah blah in my opinion... great for COOs to talk about stuff they have no idea about.

    • @moonfire5069
      @moonfire5069 2 months ago

      Until you are being grilled about these in an interview with companies like Airbnb, Netflix and Facebook, and you look like a completely clueless idiot and get shown the door.