How to Create a Data Modeling Pipeline (3 Layer Approach)

Поділитися
Вставка
  • Опубліковано 1 сер 2024
  • ►► Establish a Well-Structured Data Warehouse for Your Small Team In 90 Days (Free Guide) → www.kahandatasolutions.com/guide
    A data warehouse acts as the main hub for most data teams, yet it often becomes a mess.
    While there are many different strategies to handle this, in this video I want to share the approach I follow.
    It's based around a simple 3-layered design to take raw source data into meaningful data marts ready for analytics.
    Enjoy!
    Timestamps:
    0:00 - Intro
    0:28 - High Level Overview
    1:04 - Staging Layer
    1:37 - Warehouse Layer
    1:54 - Marts Layer
    3:13 - Example: Staging
    6:05 - Example: Warehouse
    7:35 - Example: Marts
    8:23 - Importance of Modeling
    Title & Tags:
    [How to Create a Data Modeling Pipeline (3 Layer Approach)
    #kahandatasolutions #dataengineering #datamodeling

КОМЕНТАРІ • 26

  • @KahanDataSolutions
    @KahanDataSolutions  3 місяці тому +1

    ►► Establish a Well-Structured Data Warehouse for Your Small Team In 90 Days (Free Guide) → www.kahandatasolutions.com/guide

  • @runningwithstelvio
    @runningwithstelvio 3 місяці тому +4

    This is a three layer approach that I like. The weakness I believe is the integration and data loading performance of a star schema, which is more complex then a more normalized model like the Data Vault for example. With the flat wide DM tables in any case you ensure performance in extracting the data from the DWH, so why don't use a normalized model instead of a star schema, to ensure loading performance and agile integration of the model, and then use a star schema or flat tables in the the DMs to ensure performance in extracting the data? Thanks for your video, I like the format and the way you present these data model topics. And luckily we have still someone who strongly believe in data modelling!

    • @KahanDataSolutions
      @KahanDataSolutions  3 місяці тому +2

      Thanks for the comment! Star Schema/Dimensional Modeling is just one of many approaches you could take in that middle "warehouse" layer. It's personally what I'm most comfortable with but you could certainly use other techniques.

  • @AdamSmith-lg2vn
    @AdamSmith-lg2vn 3 місяці тому +2

    Really really clearly explained. I like the idea of using separating out a staging view for trivial rename/cleaning vs. business logic in the warehouse layer loads. I push for a very similar architecture but I'm going to integrate that detail going forward.

  • @minhtungo453
    @minhtungo453 3 місяці тому +1

    Straight to the point, with clear examples. It does help me alot. Thanks Kahan

  • @andreseduardoquinones4790
    @andreseduardoquinones4790 17 годин тому

    Thanks for your video! Nicely explained. A question, would this approach be the same as a medallion architecture? if not, what are the differences?

  • @ahmedsomir
    @ahmedsomir 2 місяці тому

    Simple as great thanks for this, but from my point of view, the staging layer is the raw data tables (incremental one-one from sources) and the enforced reading mechanize with VIEW is perfect with a retention for staging after 2-3 days.
    what do u think?

  • @vishal_uk
    @vishal_uk 3 місяці тому +1

    Thanks a lot! I watch all your vids and I've subscribed. Could you share the layers for DEV and CI, please?

    • @KahanDataSolutions
      @KahanDataSolutions  3 місяці тому +1

      Check out this video - ua-cam.com/video/a7H9Rkz2kqo/v-deo.htmlsi=e-qIKGRHuo34dvwu

    • @vishal_uk
      @vishal_uk 3 місяці тому

      @@KahanDataSolutions Thanks a bunch!

  • @sanjidnet
    @sanjidnet 3 місяці тому +3

    What’s the benefit of setting up staging as view instead of a regular table?

    • @KahanDataSolutions
      @KahanDataSolutions  3 місяці тому +3

      To avoid duplicate storage since typically there are no other joins involved. Think of it like a glorified select statement. However, if performance becomes an issue you may look to deploy as a table.

    • @andynelson2340
      @andynelson2340 2 місяці тому

      Views will always grab the latest data upstream, they'll always be fresh relative to the source data.

  • @reiko551
    @reiko551 2 місяці тому

    This is a idempotent pipeline. Can you do incremental one via dbt?

  • @StuartWeir
    @StuartWeir 3 місяці тому +1

    Where do you feel Entity Resolution fits into this? For example, I have multiple data sources pertaining to the same data type; does ER fit between staging and the DW in this 3 Layered Approach?

    • @KahanDataSolutions
      @KahanDataSolutions  3 місяці тому

      Sounds like that would be handled in the Warehouse layer. Each table in the different sources would have its own Staging view, then combined in the Warehouse to create the single entity.

  • @dmeeuwsen4105
    @dmeeuwsen4105 3 місяці тому

    So where in this example would you implement the dbt snapshotting?
    At the client im working for atm we have the raw source data coming in, in the landing schema. Then in the staging schema the history is build (dbt snapshot, with a select * from landing.table).
    The snapshotted tables are then used in the models, where (among other things ofc) we rename columns and clean some data.

    • @KahanDataSolutions
      @KahanDataSolutions  3 місяці тому

      I'd probably look to do something like this:
      source > snapshot > staging > warehouse > mart

  • @alxsbn
    @alxsbn 3 місяці тому

    Hi @KahanDataSolution. How did you merge this organization with dbt folder structure recommandations especially for the intermediate and marts layers/folders. Do you advise to create a new folder named warehouse? Did you advise not letting end users having access to facts and dimensions? you're speaking about OBT, is this the way to go?

    • @KahanDataSolutions
      @KahanDataSolutions  2 місяці тому +1

      I tend to break out warehouse as a separate directory. In dbt it'll look something like this:
      models/
      marts
      staging
      warehouse
      For intermediate, these are more like "helper" models. So you could organize them in a separate directory right under models/ or nest it under the particular layer it's supporting. For example:
      models/
      marts
      staging
      warehouse
      intermediate
      As mentioned in the video, I prefer to keep end-user access limited to Marts but some organizations allow direct fact/dimension access. It really depends on the user base, their understanding & familiarity with those concepts.
      Overall, there is no single "right" approach but will be a mixture based on your particular company. But this 3 layer design has worked well for me and others as a great starting point.
      For more on the different "warehouse" models, check out this video - ua-cam.com/video/IdCmMkQLvGA/v-deo.htmlsi=ZyfIVFSII7qznVBI
      Hope this helps!

  • @chiragpramod308
    @chiragpramod308 3 місяці тому +1

    Thanks alot for Practical pov !!
    Do you still suggest reading Kimball or are there better and more modern & Practical books on modelling
    Love your content man

    • @KahanDataSolutions
      @KahanDataSolutions  3 місяці тому +1

      I still like to follow a star schema for the core "warehouse" model for clarity & organizational purposes. Then create wider user-facing "marts" on top of those to leverage the capabilities of more modern cloud DBs.

    • @chiragpramod308
      @chiragpramod308 3 місяці тому

      ​@@KahanDataSolutionsThanks for your insight.
      I heard about Lakehouse modelling being a Hype but It only happens on Larger companies. So is it important to get thoroughly in touch with the basics before heading towards such Complex Topics?

    • @KahanDataSolutions
      @KahanDataSolutions  3 місяці тому

      @@chiragpramod308 Learning the basics before moving to complex topics is always a good strategy.