Fast-forward 3 years: Airflow now has robust documentation for authoring, scheduling, and monitoring data pipelines
Great presentation, really nice analogy and very clear.
To be honest, I didn't find this to be very helpful. I'm a project manager tasked with redesigning the whole data environment in a small enterprise, technically minded but never formally studied.
It seemed like the presenter didn't make the case for the presentation's title, "Build frameworks, not pipelines." I didn't observe a part where he discounted pipelines. The first 10 minutes about the many units once used across Britain, as an analogy for the different technologies and systems in data, didn't reveal any insights and can be safely skipped IMO. After that, the diagramming of a framework from the data source all the way to a data warehouse seems more like an explanation for beginners, but without the clarity that such an explanation should possess. Overall, it seemed like an inadequately organized way to present a basic idea.
Though, some individual points from this presentation that I took away:
- Keep HTML files from web scraping, not just fields, for access to the data at any time without going back to the original source
- Maintain a layer for failed data extractions: this has been my idea for a long time but good to see it articulated by an actual data engineer
- Maintain a layer as a staging data warehouse, prior to the production data warehouse
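The three takeaways above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration using in-memory dicts and lists as stand-in "layers" (the layer names and the `<h1>` extraction rule are my assumptions, not anything from the talk):

```python
# Sketch of: (1) keep raw HTML, (2) keep failed extractions, (3) load
# staging before production. In a real system these layers would be
# object storage / database tables, not Python containers.
import re

raw_layer = {}     # raw HTML kept verbatim, so data can be re-parsed later
failed_layer = []  # extractions that failed, kept for inspection and replay
staging = []       # extracted rows land here first
production = []    # promoted only from staging

def ingest(url, html):
    """Store the raw HTML first, then try to extract fields from it."""
    raw_layer[url] = html                                # takeaway 1
    m = re.search(r"<h1>(.*?)</h1>", html)
    if m is None:
        failed_layer.append({"url": url, "html": html})  # takeaway 2
        return
    staging.append({"url": url, "title": m.group(1)})    # takeaway 3

def promote():
    """Move staging rows to production after a (trivial) validation."""
    global staging
    production.extend(row for row in staging if row["title"])
    staging = []

ingest("https://example.com/a", "<h1>Page A</h1>")
ingest("https://example.com/b", "<p>no heading here</p>")
promote()
```

The point of the raw layer is that if the extraction rule later changes, you can re-run `ingest` against stored HTML instead of re-scraping the source.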
Instead, I found this recommended video better, even though it was more complex: ua-cam.com/video/C6Abv87D5dU/v-deo.html
It goes more in-depth about one company's challenges in designing a new data pipeline and offers insights that are generalizable to anyone setting up or upgrading such a pipeline.
Thanks for your time and effort to write a detailed review
00:00 Welcome
00:34 Merchant John Story
08:17 Need for standardization
10:25 Traditional Pipeline vs Ideal Framework with Validations
18:02 Principles
22:26 Q&A
This was very helpful. That analogy is simply the best.
Very interesting presentation. Thanks 🙏
That's a great talk!
00:00 Welcome
00:34 Merchant John Story
08:17 Need for standardization
22:26 Q&A
Will update it later.
Somehow this makes me think of XKCD's Standards comic.
I like the xkcd about date formats. There is only one good date format according to ISO 8601, which is YYYY-MM-DD, e.g. 2021-12-15
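One practical property of the ISO 8601 date format is that plain string sorting matches chronological order. A quick Python check (the sample dates are made up for illustration):

```python
# ISO 8601 (YYYY-MM-DD) dates sort correctly as strings, because the
# fields run from most significant (year) to least significant (day).
from datetime import date

iso = date(2021, 12, 15).isoformat()  # "2021-12-15"

dates = ["2021-12-15", "2021-02-03", "2020-12-31"]
# Lexicographic sort of the strings equals chronological sort of the dates.
assert sorted(dates) == ["2020-12-31", "2021-02-03", "2021-12-15"]
```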
Great talk!
great...
Nice
For the first 10 minutes he talks about the different measuring units in Britain as an analogy for the importance of standards in modern data engineering: it has zero relevance to data engineering platforms. Really poor analogy. Just skip to 10:20.
It's too simple; anyone can learn the process of sorting, transforming, and transmitting data without any deep knowledge of CS