Nice! Question: Do you also use views in your warehouse and mart layers? I've been at companies where the marts were basically views based on views based on views times 10 which was terrible for the performance of getting the data.
Nice! I use a staging layer as a view and then 4 more layers for the pipeline until I get to the mart. I usually alternate between views and materialized tables, but I am not quite sure how to know the optimal way to decide between tables and views at each time. How do you compare performance, storage and other practical factors?
It's the silver layer. Bronze = raw data in this video. Silver = "staging"/cleaned data in this video. Gold = Warehouse in this video. I don't like that he's using the term "staging" to refer to cleaned data because in traditional data warehousing a staging table typically refers to uncleaned data straight after you've loaded it from a source system and the cleaning happens later.
it does not really matter how you call them if you have agreed on the purpose. Bronze layer can be raw_source or it can be staging. personally i like to keep the source out of the way and use bronze for staging - cleaning/transforming. silver for joining multiple bronze tables, what i know can be reused for multiple use cases in a gold layer. gold layer for the final solution/consumption joining some silver and bronze tables.
@@ArmandsPutnis yeah, this. Bronze, silver and gold is an abstraction to help you think about your structure, not something with set rules you have to follow dogmatically. Figure out what layers you need to solve your problems and then just structure your layers appropriately. Staging serves a purpose to help you shift the transforms left so changes are easier down the road given they will propagate through all your downstream transforms. Then transform on top of that assuming the stage takes care of most of the cleaning/formatting for you. If your management makes you pick a metal, I suggest the titanium layer.
Do you truncate the data each batch pipeline run on staging and capture the cleaned data in snapshots? If not, how do you avoid duplicates down stream if you’re using say SCD Type 2?
Migh be wrong but I take that the staging layer would be a bronze layer in the Medallion architecture, so we would have landing with raw data, bronze with cleaned raw data, silver with any new columns or any enhancement to the data and Gold with the joins and business logic. But thats just how I use at work and it can be changed to fit your needs
Looking for help with your team's data strategy? → www.kahandatasolutions.com
Looking to improve your data engineering skillset?→ bit.ly/more-kds
Thanks man. Hope this channel blows up in the days to come.
Your data modelling playlist is fantastic !
What I like - is your English and punctuation, but not those people from India who thinks that their English is native
Nice! Question: Do you also use views in your warehouse and mart layers? I've been at companies where the marts were basically views based on views based on views times 10 which was terrible for the performance of getting the data.
We always use tables as marts and then sometimes on top build views that do things with the materialized marts data.
Excellent series! Thank you :)
Nice! I use a staging layer as a view and then 4 more layers for the pipeline until I get to the mart. I usually alternate between views and materialized tables, but I am not quite sure how to know the optimal way to decide between tables and views at each time. How do you compare performance, storage and other practical factors?
Nice work! Where should the staging layer come when using a bronze, silver, gold medallion structure ?
from my understanding you would use your bronze layer as a staging layer pulling from all source systems
It's the silver layer. Bronze = raw data in this video. Silver = "staging"/cleaned data in this video. Gold = Warehouse in this video. I don't like that he's using the term "staging" to refer to cleaned data because in traditional data warehousing a staging table typically refers to uncleaned data straight after you've loaded it from a source system and the cleaning happens later.
it does not really matter how you call them if you have agreed on the purpose. Bronze layer can be raw_source or it can be staging.
personally i like to keep the source out of the way and use bronze for staging - cleaning/transforming.
silver for joining multiple bronze tables, what i know can be reused for multiple use cases in a gold layer.
gold layer for the final solution/consumption joining some silver and bronze tables.
@@ArmandsPutnis yeah, this. Bronze, silver and gold is an abstraction to help you think about your structure, not something with set rules you have to follow dogmatically. Figure out what layers you need to solve your problems and then just structure your layers appropriately. Staging serves a purpose to help you shift the transforms left so changes are easier down the road given they will propagate through all your downstream transforms. Then transform on top of that assuming the stage takes care of most of the cleaning/formatting for you. If your management makes you pick a metal, I suggest the titanium layer.
Thanks a lot. Can you please take the real world project and covert into data model.
Do you truncate the data each batch pipeline run on staging and capture the cleaned data in snapshots? If not, how do you avoid duplicates down stream if you’re using say SCD Type 2?
The staging layer would be equivalent to a landing zone?
Migh be wrong but I take that the staging layer would be a bronze layer in the Medallion architecture, so we would have landing with raw data, bronze with cleaned raw data, silver with any new columns or any enhancement to the data and Gold with the joins and business logic. But thats just how I use at work and it can be changed to fit your needs
Here's a new video I made about the Landing Zone - ua-cam.com/video/TaSIdUX4YXk/v-deo.htmlsi=DXOWjummSZWHQ-un
@@KahanDataSolutions thank you
Stage All the Things