Why Data Engineers LOVE/HATE Airflow (FT.

  • Published 14 Oct 2024
  • Airflow is a favorite tool of many data engineers. But some data engineers dislike it.
    It can be tricky to scale and hard to manage if set up incorrectly.
    Let's talk about it.
    Also, I am an advisor at Mage, which is working to make data workflow orchestration easier - www.mage.ai/
    If you enjoyed this video, check out some of my other top videos.
    Top Courses To Become A Data Engineer In 2022
    • Top Courses To Become ...
    What Is The Modern Data Stack - Intro To Data Infrastructure Part 1
    • What Is The Modern Dat...
    If you would like to learn more about data engineering, then check out Google's GCP certificate
    bit.ly/3NQVn7V
    If you'd like to read up on my updates about the data field, then you can sign up for our newsletter here.
    seattledataguy...
    Or check out my blog
    www.theseattle...
    And if you want to support the channel, then you can become a paid member of my newsletter
    seattledataguy...
    Tags: Data engineering projects, Data engineer project ideas, data project sources, data analytics project sources, data project portfolio
    _____________________________________________________________
    Subscribe: / @seattledataguy
    _____________________________________________________________
    About me:
    I have spent my career focused on all forms of data. I have focused on developing algorithms to detect fraud, reduce patient readmission and redesign insurance provider policy to help reduce the overall cost of healthcare. I have also helped develop analytics for marketing and IT operations in order to optimize limited resources such as employees and budget. I privately consult on data science and engineering problems both solo as well as with a company called Acheron Analytics. I have experience both working hands-on with technical problems as well as helping leadership teams develop strategies to maximize their data.
    *I do participate in affiliate programs, if a link has an "*" by it, then I may receive a small portion of the proceeds at no extra cost to you.

COMMENTS • 62

  • @SeattleDataGuy
    @SeattleDataGuy  2 years ago +3

    If you guys want to learn more about data engineering, then sign up for my newsletter here seattledataguy.substack.com/ or join the discord here discord.gg/2yRJq7Eg3k

  • @MarcLamberti
    @MarcLamberti 2 years ago +15

    Thank you for making this video. I don't want to over-promote Airflow because I'm obviously a little bit biased, but I do think a lot of people still know Airflow from version 1.10.X and haven't tried 2.X yet. Many things have been fixed (performance, DAG authoring, UI, etc.). The gap is just huge. Also, I would say the flexibility/freedom that Airflow brings is a double-edged sword: you can do a lot, you can configure many things and touch any detail to fit your needs perfectly, but the deeper you go, the steeper the learning curve. It's easy to get lost in all the features and configurable options that Airflow brings. However, it's relatively easy IMHO if you just want to run data pipelines and execute a few tasks.❤

    • @SeattleDataGuy
      @SeattleDataGuy  2 years ago

      Thank you, yeah I think, as you said, most people use Airflow at a very base level. Even if they are using 2.X. Also, I think a while back you may have had a comment on collabing...I feel like I never got back to you on that

    • @splashoui3760
      @splashoui3760 1 year ago

      What is the best way to learn and practise airflow?

    • @sreemantakesh9637
      @sreemantakesh9637 1 year ago

      Hi @SeattleDataGuy. I am seeing a lot of people using Airflow to orchestrate ADF in Azure. Is it really worth using it given we already have ADF triggers?

  • @anna__geller
    @anna__geller 2 years ago +3

    Awesome video, a very balanced perspective without focusing only on strengths or weaknesses of any single tool 👍

    • @SeattleDataGuy
      @SeattleDataGuy  2 years ago +1

      Glad you found it helpful! I really was trying to be balanced so I am glad you felt that way.

  • @mehdio
    @mehdio 2 years ago +4

    Cool journalistic approach, glad to have others' opinions included! 👍

    • @SeattleDataGuy
      @SeattleDataGuy  2 years ago

      Thank you so much for all your perspective on the topic!

  • @miguelvera9465
    @miguelvera9465 2 years ago +1

    This was very interesting, glad to hear the different insights. Hope to see more collaborations in the community

    • @SeattleDataGuy
      @SeattleDataGuy  2 years ago +1

      That is my goal! I really want to get more perspectives than my own.

  • @rdean150
    @rdean150 1 year ago +2

    We've adopted Argo Workflows, which is a Cloud Native Computing Foundation project built on top of Kubernetes.

  • @chetansurwade
    @chetansurwade 2 years ago +1

    I, for one, didn't face any issues while working with XComs, especially with large datasets, using a custom backend on Azure Blob Storage. And Airflow by design is an orchestrator, so offloading computation is more sensible.

  • @sevegarza
    @sevegarza 2 years ago +12

    Do a video about Prefect!

  • @anildangol
    @anildangol 2 years ago +8

    I don't think there is a best ETL pipeline, and I would not bother finding the best one. Each company and team operates differently depending on their skillset, line of business and priorities. I never had a problem while working in SSIS and rarely have a problem while working in Data Factory either. Yes, each tool has lots of limitations, but you will find a way to overcome those limitations.
    One thing I liked about Azure Data Factory is its ease of use with no code, and it is extremely cheap to maintain. Yes, I like to code in Python and work in Airflow, which gives extreme flexibility that I couldn't have in ADF, but if ADF gives me a headache then I will go with this tool anyway. I've onboarded a junior dev who had never worked with any ETL tool in a week. It's that easy.
    Many times we, data engineers, spend our time tweaking and finding the best tools possible in the market, but companies hired us to deliver results.

    • @SeattleDataGuy
      @SeattleDataGuy  2 years ago +3

      That's fair. Tool-wise, I think it's always up in the air in terms of which is best. I think finding a process that works with the tools you have at hand is probably far more important...because once you switch companies it will be a completely new tool. As you said, results first, fancy tools later.

    • @alexischicoine2072
      @alexischicoine2072 2 years ago

      I’ve also found data factory to work well for orchestrating and the low code keeps it simple. The actual steps being orchestrated are already complex enough.

    • @tanmaybagul2957
      @tanmaybagul2957 2 years ago

      😅😅

  • @nashaeshire6534
    @nashaeshire6534 6 months ago +1

    Thanks a lot, much appreciated. I plan to use Apache Kafka for a log system. In order to add maintainability to my ETL (transforming on Kafka, before Elasticsearch), I wish to add Airflow. But Apache Kafka Connect looks pretty good too. Between these two solutions, which would you choose for an ELK + Kafka pipeline?

  • @mohamedyasser5285
    @mohamedyasser5285 2 years ago +3

    Great video! I would love to hear your opinion about Apache Kedro.

    • @SeattleDataGuy
      @SeattleDataGuy  2 years ago +3

      It feels like one day it could be great, but I feel like it's still early and needs a stronger community before I would adopt it.

  • @brettstoddard7947
    @brettstoddard7947 2 months ago

    I've only used airflow in a narrow capacity for handling scheduling & dependencies. What's the k8s drama about? I've never heard airflow and k8s used in the same sentence before

  • @gava5327
    @gava5327 2 years ago +1

    Can you review the Meta Database Engineer Professional Certificate on Coursera when it comes out?

  • @peterbizik224
    @peterbizik224 1 year ago +1

    Interesting points of view, thanks for the video. As I see it, technology evolves, but the tech stacks are getting crazy complicated. In the end, it mostly gets stuck on the budget: get someone cheap (an overpromising data engineer) and you get a headache, can't move out of the dev environment, and most of the data pipelines are SQL in the end. But I could be wrong.

    • @SeattleDataGuy
      @SeattleDataGuy  1 year ago

      They are getting crazy, keep things as simple as possible for as long as possible

  • @JimRohn-u8c
    @JimRohn-u8c 2 years ago +3

    How do you feel about Prefect?

    • @SeattleDataGuy
      @SeattleDataGuy  2 years ago +1

      I still haven't got it into production. I believe Madison has a better opinion here. madisonmae.substack.com/p/sorry-i-hate-airflow

  • @minthura24
    @minthura24 2 years ago +2

    Thanks for the video.

  • @kevinsu2219
    @kevinsu2219 1 year ago +2

    Do a video about flyte

  • @DataPains
    @DataPains 3 months ago +1

    Used it for years, I also tried the later 2.x version, I still don't like it, and I think there are better ways of architecting pipelines. But yeah I was amazed when I saw Airflow the first time, and it did solve a lot of problems, but I still think, it is a tool of the past. I hope I am wrong!

    • @SeattleDataGuy
      @SeattleDataGuy  3 months ago +1

      It's been a decade, so I wouldn't be surprised to see it replaced in the next 5 years. But never know, some things are hard to get rid of.

  • @janswee1
    @janswee1 1 year ago

    You should summarize pros and cons in the beginning

  • @mauludinrohman6177
    @mauludinrohman6177 1 year ago

    What is the difference between Airflow and Astronomer? Can you help me, sir?

  • @Emanuel-yb3qk
    @Emanuel-yb3qk 2 years ago

    Hi
    I'm a new subscriber and I just saw your video "roadmap to become a data engineer", and I wonder if you could recommend a course to learn Python.
    Your channel is awesome

  • @robot01001
    @robot01001 1 year ago

    I'm halfway through this video and I still don't know wtf Airflow is. I know it has a k8s operator but I have no idea what it is or why I would use it. Maybe this video is for advanced people.

  • @datawitharslan
    @datawitharslan 1 year ago +1

    As a starter in the Modern Data Stack, should I learn Prefect or Airflow? What do you recommend?

    • @valerianmp
      @valerianmp 1 year ago +1

      Just pick any one of that, you can always learn the other one later when you need it

    • @SeattleDataGuy
      @SeattleDataGuy  1 year ago +2

      Valerian is kind of right. Airflow is far more popular for now. But in tech you're constantly learning. What was important to know this year is old news 3 years from now.

  • @rguez2332
    @rguez2332 2 years ago +1

    Is Pentaho PDI used for different purposes than Airflow??

    • @SeattleDataGuy
      @SeattleDataGuy  2 years ago +1

      I don't think Pentaho is that popular...but I could be wrong. Where do you use it? Have you used it a lot?

    • @rguez2332
      @rguez2332 2 years ago +1

      @SeattleDataGuy It's used for ETL. But I can mention popular tools.
      I'm still learning and I was wondering if you can ETL/ELT the data with tools like Fivetran/Airbyte/Stitch without using Airflow. Is Airflow used just to automate, or can you get the whole ETL process with it?

    • @SeattleDataGuy
      @SeattleDataGuy  2 years ago +2

      @rguez2332 In the past Airflow was used for everything. But technically it's just an orchestrator. Nowadays people are trying to use other tools like Airbyte, dbt + Airflow to make pipelines. But that's more for open-source style pipelines. There are so many other tools that people like out there.

    • @rguez2332
      @rguez2332 2 years ago +1

      @@SeattleDataGuy thx so much!

    • @SeattleDataGuy
      @SeattleDataGuy  2 years ago +1

      @@rguez2332 you're welcome!

  • @sana-sz5ue
    @sana-sz5ue 2 years ago +1

    What are people's thoughts on what data engineer career progression is like, given that you don't gain a qualification, only work experience?

    • @SeattleDataGuy
      @SeattleDataGuy  2 years ago

      Do you mean like certificates?

    • @sana-sz5ue
      @sana-sz5ue 2 years ago

      @@SeattleDataGuy yes like how can you keep working your way up without qualifications or in the tech industry do certificates work the same way?

  • @lucasbayout195
    @lucasbayout195 2 years ago

    Airflow is amazing.

  • @TheSilpelit
    @TheSilpelit 2 years ago +1

    Why can't you use the well known DevOps tools like Jenkins?

    • @SeattleDataGuy
      @SeattleDataGuy  2 years ago +2

      To manage custom data pipelines? I have seen it done. It was pretty hairy though.

  • @AmantisAnalytics
    @AmantisAnalytics 11 months ago +3

    Mage AI

  • @ASO-xh5vu
    @ASO-xh5vu 2 years ago

    This is a perfect channel. My only criticism is "verbosity". Too many words...

  • @deepanshurathore9661
    @deepanshurathore9661 1 month ago

    Airflow is shit...

  • @samsal073
    @samsal073 1 year ago

    Airflow sucks... it should be thrown in the trash... doesn't support multi-system, all code-based, which violates low-code/no-code rules... impossible to install a cluster in standalone mode on-premises without involving technologies like Docker and Kubernetes, which increases complexity... etc.

    • @romank7944
      @romank7944 3 months ago

      In this case, please tell me what tools you could recommend for orchestrating ETL processes in Python (on Windows)? What do you prefer? Thanks

    • @samsal073
      @samsal073 3 months ago

      @romank7944 Look at Apache NiFi. It's a great tool that is easy to install, set up and scale. Even though it's recommended to run on Unix-based systems, it runs fine on Windows. It's a Java-based app, but from version 2.0 onward it supports writing Python extensions.
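
Several comments above describe Airflow as "just an orchestrator" that handles scheduling and dependencies. For readers unsure what that means, here is a minimal sketch of the core idea: run tasks in an order that respects their dependencies. It uses only Python's standard-library `graphlib` and is not Airflow's API. The task names (`extract`, `transform`, `load`, `report`) and the `run_pipeline` helper are hypothetical, chosen for illustration only.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}


def run_pipeline(deps, tasks):
    """Call each task only after all of its dependencies; return the run order."""
    order = []
    for name in TopologicalSorter(deps).static_order():
        tasks[name]()  # the "work" itself lives outside the orchestrator
        order.append(name)
    return order


# Stand-in task bodies that only record that they ran.
log = []
tasks = {name: (lambda n=name: log.append(f"ran {n}")) for name in dependencies}

run_order = run_pipeline(dependencies, tasks)
print(run_order)
```

A real orchestrator adds scheduling, retries, logging and distributed execution on top of this ordering step, and that is where most of the complexity discussed in the thread comes from.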