The Missing Piece in Many Data Pipelines

Поділитися
Вставка
  • Опубліковано 21 гру 2024

КОМЕНТАРІ • 20

  • @KahanDataSolutions
    @KahanDataSolutions  5 місяців тому

    Looking for help with your team's data strategy? → www.kahandatasolutions.com
    Looking to improve your data engineering skillset?→ bit.ly/more-kds

  • @sunilbabu588
    @sunilbabu588 4 місяці тому +3

    Thanks man. Hope this channel blows up in the days to come.

  • @Billbillbillhahagdvdve
    @Billbillbillhahagdvdve 4 місяці тому

    Your data modelling playlist is fantastic !

  • @SQLGuyLLC
    @SQLGuyLLC 3 місяці тому +1

    What I like - is your English and punctuation, but not those people from India who thinks that their English is native

  • @bertjanvdberg
    @bertjanvdberg 5 місяців тому +2

    Nice! Question: Do you also use views in your warehouse and mart layers? I've been at companies where the marts were basically views based on views based on views times 10 which was terrible for the performance of getting the data.

    • @ramtadam1469
      @ramtadam1469 5 місяців тому +2

      We always use tables as marts and then sometimes on top build views that do things with the materialized marts data.

  • @iARAVIND666
    @iARAVIND666 3 місяці тому

    Excellent series! Thank you :)

  • @andresarmua
    @andresarmua 5 місяців тому

    Nice! I use a staging layer as a view and then 4 more layers for the pipeline until I get to the mart. I usually alternate between views and materialized tables, but I am not quite sure how to know the optimal way to decide between tables and views at each time. How do you compare performance, storage and other practical factors?

  • @thedavidabides
    @thedavidabides 5 місяців тому +2

    Nice work! Where should the staging layer come when using a bronze, silver, gold medallion structure ?

    • @muhammadbadar6089
      @muhammadbadar6089 5 місяців тому +2

      from my understanding you would use your bronze layer as a staging layer pulling from all source systems

    • @personalbranddata
      @personalbranddata 5 місяців тому +1

      It's the silver layer. Bronze = raw data in this video. Silver = "staging"/cleaned data in this video. Gold = Warehouse in this video. I don't like that he's using the term "staging" to refer to cleaned data because in traditional data warehousing a staging table typically refers to uncleaned data straight after you've loaded it from a source system and the cleaning happens later.

    • @ArmandsPutnis
      @ArmandsPutnis 5 місяців тому +3

      it does not really matter how you call them if you have agreed on the purpose. Bronze layer can be raw_source or it can be staging.
      personally i like to keep the source out of the way and use bronze for staging - cleaning/transforming.
      silver for joining multiple bronze tables, what i know can be reused for multiple use cases in a gold layer.
      gold layer for the final solution/consumption joining some silver and bronze tables.

    • @gatorpika
      @gatorpika 5 місяців тому

      @@ArmandsPutnis yeah, this. Bronze, silver and gold is an abstraction to help you think about your structure, not something with set rules you have to follow dogmatically. Figure out what layers you need to solve your problems and then just structure your layers appropriately. Staging serves a purpose to help you shift the transforms left so changes are easier down the road given they will propagate through all your downstream transforms. Then transform on top of that assuming the stage takes care of most of the cleaning/formatting for you. If your management makes you pick a metal, I suggest the titanium layer.

  • @sarfarazanjum007
    @sarfarazanjum007 Місяць тому +1

    Thanks a lot. Can you please take the real world project and covert into data model.

  • @johnpower1458
    @johnpower1458 5 місяців тому

    Do you truncate the data each batch pipeline run on staging and capture the cleaned data in snapshots? If not, how do you avoid duplicates down stream if you’re using say SCD Type 2?

  • @williamchurch711
    @williamchurch711 5 місяців тому

    The staging layer would be equivalent to a landing zone?

    • @senarl
      @senarl 5 місяців тому

      Migh be wrong but I take that the staging layer would be a bronze layer in the Medallion architecture, so we would have landing with raw data, bronze with cleaned raw data, silver with any new columns or any enhancement to the data and Gold with the joins and business logic. But thats just how I use at work and it can be changed to fit your needs

    • @KahanDataSolutions
      @KahanDataSolutions  Місяць тому +1

      Here's a new video I made about the Landing Zone - ua-cam.com/video/TaSIdUX4YXk/v-deo.htmlsi=DXOWjummSZWHQ-un

    • @williamchurch711
      @williamchurch711 Місяць тому

      @@KahanDataSolutions thank you

  • @Milhouse77BS
    @Milhouse77BS 5 місяців тому

    Stage All the Things