Data Engineering Principles - Build frameworks not pipelines - Gatis Seja

  • Published 3 Oct 2024

COMMENTS • 16

  • @efeorikpete8774
    @efeorikpete8774 2 years ago +18

    Fast-forward to 3 years later: AIRFLOW now has robust documentation for authoring, scheduling and monitoring your data pipeline

  • @MrKane101111
    @MrKane101111 2 years ago +8

    Great presentation, really nice analogy and very clear.

  • @horaceweatherby2910
    @horaceweatherby2910 1 year ago +9

    To be honest, I didn't find this very helpful. I'm a project manager tasked with redesigning the whole data environment in a small enterprise, technically minded but without formal training.
    The presenter didn't seem to make the case for the talk's title, "Build frameworks, not pipelines." I didn't see a part where he argued against pipelines. The first 10 minutes, which use the many units of measurement once used across Britain as an analogy for the different technologies and systems in data, didn't reveal any insights and can safely be skipped IMO. After that, the diagram of a framework running from the data source all the way to a data warehouse reads like an explanation for beginners, but without the clarity such an explanation needs. Overall, it seemed like a poorly organized way to present a basic idea.
    That said, here are some individual points I took away from this presentation:
    - Keep the HTML files from web scraping, not just the extracted fields, so the data stays accessible at any time without going back to the original source
    - Maintain a layer for failed data extractions: I've had this idea for a long time, but it's good to see it articulated by an actual data engineer
    - Maintain a layer as a staging data warehouse, prior to the production data warehouse
    Instead, I found this recommended video better, even though it was more complex: ua-cam.com/video/C6Abv87D5dU/v-deo.html
    It goes more in-depth about one company's challenges in designing a new data pipeline and offers insights that are generalizable to anyone setting up or upgrading such a pipeline.

    • @ooker777
      @ooker777 1 year ago

      Thanks for taking the time and effort to write a detailed review
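The three takeaways listed in @horaceweatherby2910's review (keep the raw HTML, keep a layer for failed extractions, stage data before the production warehouse) can be sketched as a tiny layered loader. This is a minimal illustration under stated assumptions, not the presenter's framework: `Layers`, `extract_price`, `ingest`, `promote`, the `price="..."` marker, and the "price must be non-negative" validation rule are all hypothetical, and in-memory containers stand in for real storage.

```python
from dataclasses import dataclass, field

# Hypothetical layers; in-memory containers stand in for real storage
@dataclass
class Layers:
    raw_html: dict = field(default_factory=dict)    # url -> page HTML, kept verbatim
    failed: dict = field(default_factory=dict)      # url -> reason the extraction failed
    staging: list = field(default_factory=list)     # extracted rows awaiting validation
    production: list = field(default_factory=list)  # rows promoted to the "warehouse"


def extract_price(html: str) -> float:
    # Hypothetical extractor: expects a marker such as price="12.50" in the page
    marker = 'price="'
    start = html.find(marker)
    if start == -1:
        raise ValueError("price not found")
    start += len(marker)
    end = html.index('"', start)   # raises ValueError if the closing quote is missing
    return float(html[start:end])  # raises ValueError if the value is not numeric


def ingest(layers: Layers, url: str, html: str) -> None:
    # Store the whole page first, so fields can be re-extracted later
    # without going back to the original source
    layers.raw_html[url] = html
    try:
        price = extract_price(html)
    except ValueError as exc:
        # Failed extractions get their own layer instead of being silently dropped
        layers.failed[url] = str(exc)
        return
    layers.staging.append({"url": url, "price": price})


def promote(layers: Layers) -> None:
    # Validated rows move from staging to production; rejects stay in staging
    valid = [row for row in layers.staging if row["price"] >= 0]
    layers.production.extend(valid)
    layers.staging = [row for row in layers.staging if row["price"] < 0]


layers = Layers()
ingest(layers, "https://example.com/a", '<span price="12.50"></span>')
ingest(layers, "https://example.com/b", "<span>sold out</span>")
promote(layers)
print(layers.production)  # [{'url': 'https://example.com/a', 'price': 12.5}]
print(layers.failed)      # {'https://example.com/b': 'price not found'}
```

Because every page is kept in `layers.raw_html`, a corrected extractor could later be re-run over the URLs recorded in `layers.failed` without scraping them again.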

  • @TheSolbiatii
    @TheSolbiatii 2 years ago +6

    00:00 Welcome
    00:34 Merchant John Story
    08:17 Need for standardization
    10:25 Traditional Pipeline vs Ideal Framework with Validations
    18:02 Principles
    22:26 Q&A

  • @severtone263
    @severtone263 2 years ago

    This was very helpful. That analogy is simply the best.

  • @boudehoucherahma8083
    @boudehoucherahma8083 2 years ago +1

    Very interesting presentation. Thanks🙏

  • @dmytrooliinyk3083
    @dmytrooliinyk3083 4 months ago

    That's a great talk!

  • @AshokTak
    @AshokTak 2 years ago +2

    00:00 Welcome
    00:34 Merchant John Story
    08:17 Need for standardization
    22:26 Q&A
    Will update it later.

  • @firefoxmetzger9063
    @firefoxmetzger9063 2 years ago +10

    Somehow this makes me think of XKCD's Standards comic.

    • @julianatlas5172
      @julianatlas5172 2 years ago +5

      I like the XKCD about date formats. There is only one good date format, the ISO 8601 one: YYYY-MM-DD, e.g. 2021-12-15
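The ISO 8601 point in @julianatlas5172's reply is easy to check with Python's standard `datetime` module: `date.isoformat()` emits `YYYY-MM-DD`, and because those strings are zero-padded and run from the most to the least significant field, sorting them as plain text also sorts them chronologically. A small sketch, assuming (hypothetically) that the incoming values use a day-first `DD/MM/YYYY` format:

```python
from datetime import date, datetime

# The example date from the comment, rendered in ISO 8601 calendar-date form
print(date(2021, 12, 15).isoformat())  # 2021-12-15

# Hypothetical incoming values in a day-first DD/MM/YYYY format
raw = ["15/12/2021", "03/01/2022", "28/02/2021"]

# Parse with an explicit format string, then normalise to YYYY-MM-DD
iso = [datetime.strptime(s, "%d/%m/%Y").date().isoformat() for s in raw]
print(iso)  # ['2021-12-15', '2022-01-03', '2021-02-28']

# Zero-padded, most-significant-first strings sort chronologically as plain text
print(sorted(iso))  # ['2021-02-28', '2021-12-15', '2022-01-03']
```

Parsing with an explicit format string is the important design choice here: a value like `03/01/2022` is ambiguous between 3 January and 1 March until the source's convention is pinned down.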

  • @jamesattwood3454
    @jamesattwood3454 2 years ago

    Great talk!

  • @augugninfin1034
    @augugninfin1034 1 year ago

    great...

  • @mayurarun
    @mayurarun 2 years ago

    Nice

  • @RedShipsofSpainAgain
    @RedShipsofSpainAgain 1 year ago +1

    For the first 10 minutes he talks about the different units of measurement in Britain as a bad analogy for the importance of standards in modern data engineering: it has zero relevance to data engineering platforms. Really poor analogy. Just skip to 10:20.

  • @vansf3433
    @vansf3433 1 year ago

    It's too simple. Anyone can learn the process of sorting out, transforming and transmitting data without needing a good knowledge of CS.