Clear & Simple 😊❤
What about the shift in competences? It seems that data analysts and scientists lose a big part of the pipeline (the Bronze-to-Silver passage) in favor of data engineers, who now work on the entire "ETL" process, which is more complicated than before (EDA, CDC, Flink SQL, Iceberg table creation, etc.) and incorporates all the hops up to the Silver node.
This seems to be an expected evolution of the move from ETL to ELT.
It looks like this guy is mixing two things. With Medallion, the Bronze layer is mainly the raw data, so it doesn't make sense to have the "ETL" part before it. So, you just ingest your data "AS-IS" into Bronze, and then do the subsequent transformations to Silver and Gold; the "ETL" is pushed into the Medallion. Beautiful graphs, but an unclean explanation. The advantage of having the raw data without any changes is that you can reuse that data for many other purposes: today 3 use cases, tomorrow 10 more. Today storage is cheap (S3, ADLS, GCS), and reprocessing can also be cheap with the right tools.
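To make the point concrete, here's a minimal sketch (plain Python, with a hypothetical schema) of the idea: Bronze keeps the raw records untouched, duplicates and all, and the cleaning "ETL" lives inside the Medallion as the Bronze-to-Silver and Silver-to-Gold steps:

```python
# Bronze: raw events ingested "AS-IS", warts and all (hypothetical schema).
bronze = [
    {"user": " alice ", "amount": "10.5"},
    {"user": "BOB", "amount": "3"},
    {"user": " alice ", "amount": "10.5"},  # duplicate, kept in Bronze
    {"user": "carol", "amount": None},      # bad record, kept in Bronze
]

def to_silver(rows):
    """Clean + dedupe: the Bronze->Silver 'ETL' inside the Medallion."""
    seen, silver = set(), []
    for r in rows:
        if r["amount"] is None:  # drop unusable records
            continue
        key = (r["user"].strip().lower(), float(r["amount"]))
        if key in seen:          # dedupe
            continue
        seen.add(key)
        silver.append({"user": key[0], "amount": key[1]})
    return silver

def to_gold(rows):
    """Aggregate Silver into an analytics-ready Gold view."""
    totals = {}
    for r in rows:
        totals[r["user"]] = totals.get(r["user"], 0.0) + r["amount"]
    return totals

silver = to_silver(bronze)
gold = to_gold(silver)
# Bronze is never mutated, so tomorrow's use cases can re-derive from it.
```

Because Bronze stays immutable, adding a new Silver/Gold derivation later is just another pass over the same raw data.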
Adam here. FWIW, I've been a data engineer for more than a decade, and have worked extensively on Medallion architectures (and pre-medallion).
The point of shift-left is pretty simple. Take the work you're already doing in your data lake to clean up your Bronze-layer data, and shift it left so that systems outside of the data lake can make use of it too: e.g., SaaS systems, multiple data lakes/warehouses, operational systems (with realtime needs), GenAI RAG, etc.
Now, if you need neither speed nor data access anywhere else in your company, then feel free to keep it all in your data lake. But the vast majority of customers we see need their data in multiple places, and trip all over themselves rebuilding the same cleaning logic that you're proposing ("you can reuse that data for many other purposes").
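A rough sketch of what "shifting left" amounts to (plain Python, all sink names hypothetical): the cleaning logic runs once, upstream in the stream, and every downstream system — the lake, an operational cache, a RAG index — consumes already-clean records instead of each rebuilding the same logic:

```python
def clean(event):
    """The same normalization that used to live in Bronze->Silver,
    now applied once, upstream of every consumer."""
    return {"user": event["user"].strip().lower(),
            "amount": float(event["amount"])}

# Hypothetical downstream sinks that would otherwise each re-clean the data.
data_lake = []         # analytics
operational_cache = {} # realtime lookup by user
rag_index = []         # text snippets for GenAI RAG

def fan_out(raw_events):
    for event in raw_events:
        record = clean(event)  # shift-left: clean once, here
        data_lake.append(record)
        operational_cache[record["user"]] = record
        rag_index.append(f'{record["user"]} spent {record["amount"]}')

fan_out([{"user": " Alice ", "amount": "10.5"}])
```

The design point is simply that `clean` is defined in one place; whether that's Flink SQL, a stream processor, or a shared library, the consumers stop duplicating it.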