Airflow Vs. Dagster: The Full Breakdown!

  • Published Nov 29, 2024

COMMENTS • 29

  • @thanhbinh24 · 1 year ago · +5

    This video is right on point! I recently started my first job as a DE and was tasked with migrating all our cron jobs to an orchestration tool. I was looking for the best option, and now I'm pretty sure we'll be better off with Airflow.
    Thank you, and keep up the good work, my man!

    • @thedataguygeorge · 1 year ago · +1

      Thank you so much, happy this helped you make a decision!!

  • @jarredthedataengineer · 8 months ago · +13

    This is an interesting video, but it's fairly inaccurate about Dagster. I'm sure that's not out of malice, but probably because OP is more familiar with Airflow.
    For example, Dagster is open source, and it's super extensible and modular.
    I'd also point out a pretty important difference between Dagster and Airflow: Dagster enables a local-to-production test, build, and deploy cycle, which isn't really possible with Airflow. Dagster also comes with a ton of automation capabilities that just aren't possible with an imperative orchestrator like Airflow.
    This is a pretty deep subject that requires a fair amount of knowledge from the author to give a really fair comparison, and that's somewhat lacking in this video.

  • @ricardomalla6533 · 9 months ago · +1

    Would Airflow be a good fit to orchestrate a couple of Python scripts that send marketing emails to our customers based on certain criteria?
    Is there something better for this application?

    • @thedataguygeorge · 9 months ago

      That's a great use case for Airflow! MailChimp might also be a good option for that particular use case.

  • @simondelorean · 1 year ago · +1

    Thank you, that was very helpful.

  • @baja · 1 year ago · +6

    Coming here as someone who uses Dagster daily and wants to know if Airflow is worth it, so I appreciate this comparison.
    A few things on the Dagster side: for the first example, you can do exactly what you have in Airflow in Dagster. You can create branching logic by giving an op multiple outputs (not all required) and only emitting the one for the current day of the week. You can wrap this branching op and the day-specific ops in a graph and build that graph into one of the assets shown. If guitar lessons, family dinner, etc. produce assets, you can just make them their own assets, with a similar not-required setup so they only fire on their specific day of the week. In the UI, you can expand the assets into their ops and graphs to see the branching logic.
    For example, I use this to train an ML model every Monday and then run predictions with it afterward. Every other day of the week, we just use the previous model for predictions without retraining.
    I don't really understand the point about testing in Dagster. You can add assertions or raise errors in Dagster assets, and there are also hooks, which are separate functions that run after an asset completes (they can send messages to Slack, run quality checks, etc.; it's just a Python function). It's just nicer to keep those things separate. Most of the logs you're seeing in Dagster will be user-specified, since the logger gets passed into the asset function; I log debug info, errors, warnings, etc.
    I don't really understand the last point about the Dagster API either. You can run anything in Dagster: for example, if you want to trigger something in Fivetran or dbt Cloud, the Dagster code just hits the endpoint and polls while the computation is done elsewhere. You can set up your own APIs to do something similar. I don't really like how tightly Dagster couples compute and orchestration, but it seems like Airflow does a similar thing, and you don't have to use Dagster that way.
    There are IO managers to manage the data passed between assets. This doesn't have to be JSON data from an API; it can be any Python variable. I run Dagster on Kubernetes, where each asset runs in its own pod, so I use S3, GCS, etc. to pickle the Python objects and pass them between pods. My understanding is that this is an advantage Dagster has, because it type-checks the data passed between pods. There are other tasks where my assets just run CLI commands, one example being running scripts in R.

    • @thedataguygeorge · 1 year ago · +2

      Wow, thank you so much for that breakdown; I really, really appreciate it! I'm planning a revised version of this video that gives Dagster more credit, now that I've learned all these things. I made the video when I was still relatively new to Dagster.

    • @baja · 1 year ago

      @thedataguygeorge All good, and looking forward to the new video! It did take me a lot of time using Dagster to learn a lot of these things.

    • @nixbruh · 1 year ago

      @baja One thing I have to say that sucks: let's say you want two ops in a job to run in parallel. Dagster won't let you do that if your IO managers are in memory; it forces one to wait for the other. For me, that honestly defeats the whole purpose. Maybe I'm clueless?

    • @baja · 1 year ago · +1

      @nixbruh This shouldn't depend on the IO manager but on the executor you're using. Are you using a multiprocess executor or the in-process one?
      I don't have an issue using multiprocess locally, and I typically use a k8s_executor when deploying. Locally I usually use the fs_io_manager instead of in-memory, but again, that shouldn't matter.

  • @datalearningsihan · 1 year ago · +2

    Thank you. I feel privileged that you made the video at my request. I know, I know, I'll take all the credit :D

  • @nixbruh · 1 year ago · +1

    Awesome stuff, bro. Question: is there any reason not to just use these things as schedulers and have them spin up containers that hold the code? I feel like you get tied to a specific framework, and it turns into a nightmare...

    • @nixbruh · 1 year ago · +1

      I guess the only downside is around stopping and starting parts of the code that might fail, or running things manually? But I don't know if that trade-off is worth it... hoping people who know what they're doing can share opinions.

    • @thedataguygeorge · 1 year ago · +2

      That is a totally valid approach, and honestly one I think Airflow excels at. A lot of the teams I see using Airflow in production just use it to call out to other containers/services that run the jobs, with Airflow as a centralized error-handling/monitoring layer on top, in addition to its scheduling capabilities.

  • @ofnotandi · 1 year ago · +4

    Dagster is open source, according to the homepage.

    • @thedataguygeorge · 1 year ago · +2

      Sorry, you're right! I think it's more of an open-core project, since there's not much development outside of the Dagster company, but that's definitely up for debate!

  • @luiztauffer8513 · 1 year ago

    Hey, thanks a lot for the insightful overview! And your channel is awesome for Airflow content.
    I'd love to see similar comparisons with Flyte and Kestra.

    • @thedataguygeorge · 1 year ago · +1

      Thanks, Luiz! Really appreciate the love! I'll put them on the schedule. Thanks for the idea!

  • @joshuasmith2814 · 1 year ago

    Great content... (horrid audio, was your landlady vacuuming?)

    • @thedataguygeorge · 1 year ago · +2

      Thanks, Josh! And apologies: I had facade construction going on outside my window from 8 to 6 for the past couple of months, and it was really screwing me up. It's all done now, though, so hopefully it's better in recent videos!

  • @ricardomalla6533 · 9 months ago · +1

    Genius.

  • @christophergutknecht8683 · 3 months ago · +1

    Love the content! Audio could be better, though; the squeaky chair and booming background noise are a little distracting.

    • @thedataguygeorge · 3 months ago

      Thanks for the tips, hope my more recent videos are more up to snuff!

  • @StefanoMessina-ux2mj · 5 months ago · +1

    I'm pretty sure Dagster is open source

    • @thedataguygeorge · 5 months ago

      It technically is, but 90% of the dev work comes from the on-staff Dagster team.