This video is right on point! I recently started my first job as a DE and was tasked with migrating all our cron jobs to an orchestration tool. I was looking for the best option, and now I'm pretty sure we'll be better off with Airflow.
Thank you, and keep up the good work, my man!
Thank you so much, happy this helped you make a decision!!
This is an interesting video, but it is fairly inaccurate about Dagster. I'm sure that's not out of malice; it's probably because OP is more familiar with Airflow.
For example, Dagster is open source, and it is super extensible and modular.
I'd also point out a pretty important difference between Dagster and Airflow: Dagster enables a local-to-production test, build, and deploy cycle, which isn't really possible with Airflow. Dagster also comes with a ton of automation capabilities that just aren't possible with an imperative orchestrator like Airflow.
This is a pretty deep subject that requires a fair amount of knowledge from the author to give a fair comparison, and this video is somewhat lacking there.
Would Airflow be a good fit for orchestrating a couple of Python scripts that send marketing emails to our customers based on certain criteria?
Is there something better for this application?
That's a great use case for Airflow! MailChimp might also be a good option for that particular use case!
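To make the "couple of Python scripts" concrete: one common way to keep such scripts easy to schedule in Airflow is to write each step as a plain function that takes the run date as a parameter, then call it from a task. The sketch below is only the plain-Python part; the customer fields, the "opted in and inactive for 30 days" criterion, and the sender address are hypothetical, and actually sending the messages (e.g. with `smtplib` or an email provider's API) is left out.

```python
from datetime import date, timedelta
from email.message import EmailMessage


def select_recipients(customers, run_date, inactive_days=30):
    """Hypothetical criterion: opted in and no purchase in the last N days.

    run_date is passed in (e.g. the scheduler's logical date) rather than
    read from date.today(), so re-running a past day selects the same people.
    """
    cutoff = run_date - timedelta(days=inactive_days)
    return [c for c in customers if c["opted_in"] and c["last_purchase"] < cutoff]


def build_messages(recipients, sender="news@example.com"):
    """Build (but do not send) one EmailMessage per recipient."""
    messages = []
    for c in recipients:
        msg = EmailMessage()
        msg["From"] = sender
        msg["To"] = c["email"]
        msg["Subject"] = "We miss you!"
        msg.set_content("Here is a discount code for your next order.")
        messages.append(msg)
    return messages


# Hypothetical customer records.
customers = [
    {"email": "a@example.com", "opted_in": True, "last_purchase": date(2024, 1, 1)},
    {"email": "b@example.com", "opted_in": False, "last_purchase": date(2024, 1, 1)},
    {"email": "c@example.com", "opted_in": True, "last_purchase": date(2024, 3, 20)},
]
msgs = build_messages(select_recipients(customers, run_date=date(2024, 3, 31)))
print([str(m["To"]) for m in msgs])  # ['a@example.com']
```

Keeping selection and message building separate means the selection step can be tested locally, and the orchestrator only needs to call these functions with the run date.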
Thank you, that was very helpful.
You're very welcome, glad it was helpful!
I'm coming here as someone who uses Dagster daily and wants to know if Airflow is worth it, so I appreciate this comparison.
A few things on the Dagster side: for the first example, you can do exactly what you did in Airflow in Dagster. You can create branching logic by giving an op multiple outputs (not all required) and only emitting the one for the current day of the week. You can wrap this branching op and the day-specific ops in a graph and build that graph into one of the assets shown. If guitar lessons, family dinner, etc. produce assets, you can just make them their own assets and use the same non-required-output approach so that each one only fires on its specific day of the week. In the UI, you can expand the assets into their ops and graphs to see the branching logic.
For example, I use this to train an ML model every Monday and then run predictions with it afterward. Every other day of the week, we just use the previous model for predictions without retraining.
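The branching pattern described here (an op with several optional outputs that emits only the one matching the day, with retraining on Mondays) can be modeled in plain Python. This is not Dagster code. In Dagster the op would declare its outputs as non-required (`Out(is_required=False)`) and yield `Output(value, output_name)` for the chosen branch; check the docs for your version. The output and step names below are hypothetical.

```python
from datetime import date

MONDAY = 0  # date.weekday() numbers Monday as 0 and Sunday as 6


def branch_on_weekday(run_date):
    """Imitate an op with two optional outputs: yield (name, value) for one only."""
    if run_date.weekday() == MONDAY:
        yield ("retrain", run_date)
    else:
        yield ("reuse_previous_model", run_date)


def run_pipeline(run_date):
    """Return the downstream steps that fire for this run date.

    Steps wired to an output that was never emitted are skipped, which is
    how the day-specific branches stay idle on the other days.
    """
    emitted = dict(branch_on_weekday(run_date))
    if "retrain" in emitted:
        return ["train_model", "predict_with_new_model"]
    return ["predict_with_previous_model"]


print(run_pipeline(date(2024, 1, 1)))  # Monday: ['train_model', 'predict_with_new_model']
print(run_pipeline(date(2024, 1, 2)))  # Tuesday: ['predict_with_previous_model']
```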
I don't really understand the point about testing in Dagster. You can add assertions or raise errors in Dagster assets, and there are also hooks, which are separate functions that run after an asset completes (they can send messages to Slack, run quality checks, etc.; it's just a Python function). It's just nicer to keep those things separate. Most of the logs you're seeing in Dagster will be user-specified, since the logger gets passed into the asset function; I log debug info, errors, warnings, etc.
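The benefit of keeping checks and notifications outside the step body can be shown with a toy decorator. This is not Dagster's hook API, only a plain-Python illustration of the idea: callbacks that run after a step succeeds or fails. The `with_hooks` decorator, the callbacks, and the `alerts` list (standing in for Slack messages) are all hypothetical.

```python
import functools
import logging

log = logging.getLogger("pipeline")


def with_hooks(on_success=(), on_failure=()):
    """Call each on_success hook with the result, or each on_failure hook
    with the exception (which is then re-raised)."""
    def decorate(step):
        @functools.wraps(step)
        def wrapper(*args, **kwargs):
            try:
                result = step(*args, **kwargs)
            except Exception as exc:
                for hook in on_failure:
                    hook(step.__name__, exc)
                raise
            for hook in on_success:
                hook(step.__name__, result)
            return result
        return wrapper
    return decorate


alerts = []  # stands in for messages sent to Slack


def notify_failure(name, exc):
    alerts.append(f"{name} failed: {exc}")


def check_not_empty(name, rows):
    # A data-quality check that lives outside the step body.
    if not rows:
        alerts.append(f"{name} produced no rows")


@with_hooks(on_success=[check_not_empty], on_failure=[notify_failure])
def load_orders(source):
    log.info("loading orders from %s", source)  # user-specified log line
    if source == "broken":
        raise ValueError("source unavailable")
    return [] if source == "empty" else [{"id": 1}]


load_orders("db")     # passes, no alert
load_orders("empty")  # quality check fires
try:
    load_orders("broken")
except ValueError:
    pass
print(alerts)
```

The step function stays focused on loading data, and the same check or alert can be reused across many steps.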
I also don't really understand the last point about the Dagster API. You can run anything in Dagster: for example, if you want to trigger something in Fivetran or dbt Cloud, the Dagster code just hits the endpoint and polls while the computation happens elsewhere. You can set up your own APIs to do a similar thing. I don't really like how much Dagster couples compute and orchestration, but it seems like Airflow does a similar thing, and you don't have to use Dagster that way.
There are IO managers to manage the data passed between assets. This doesn't have to be JSON data from an API; it can be any Python object. I run Dagster on Kubernetes, where each asset runs in its own pod, so I use S3, GCS, etc. to pickle the Python objects and pass them between pods. My understanding is that this is an advantage Dagster has, because it type-checks the data going between pods. For other tasks, my assets just run CLI commands; one example is running R scripts.
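The IO manager idea (one object that stores a step's output and loads it for the next step, pickling to shared storage so another pod can read it) can be sketched with the standard library. This mimics the concept only and does not subclass Dagster's `IOManager`. A local temporary directory stands in for the S3/GCS bucket, and the `expected_type` check is a simplified, hypothetical version of the type checking the comment mentions.

```python
import pickle
import tempfile
from pathlib import Path


class PickleIOManager:
    """Store each asset's output as <base_dir>/<asset_name>.pkl."""

    def __init__(self, base_dir):
        self.base_dir = Path(base_dir)
        self.base_dir.mkdir(parents=True, exist_ok=True)

    def _path(self, asset_name):
        return self.base_dir / f"{asset_name}.pkl"

    def handle_output(self, asset_name, obj):
        """Persist an upstream step's output so another process or pod can read it."""
        with self._path(asset_name).open("wb") as f:
            pickle.dump(obj, f)

    def load_input(self, asset_name, expected_type=object):
        """Load a stored output and check its type before passing it downstream.

        Only unpickle data you wrote yourself: pickle can execute code on load.
        """
        with self._path(asset_name).open("rb") as f:
            obj = pickle.load(f)
        if not isinstance(obj, expected_type):
            raise TypeError(
                f"{asset_name}: expected {expected_type.__name__}, "
                f"got {type(obj).__name__}"
            )
        return obj


with tempfile.TemporaryDirectory() as tmp:
    manager = PickleIOManager(tmp)
    manager.handle_output("model_params", {"weights": [0.1, 0.2]})  # upstream step
    params = manager.load_input("model_params", expected_type=dict)  # downstream step
    print(params)  # {'weights': [0.1, 0.2]}
```

Because the object goes through storage rather than process memory, the producing and consuming steps can run in different processes or pods.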
Wow, thank you so much for that breakdown, I really, really appreciate it! I'm planning a revised version of this video that gives Dagster more credit now that I've learned all these things; I made the video when I was still relatively new to Dagster.
@@thedataguygeorge All good, and I'm looking forward to the new video! It took me a lot of time using Dagster to learn many of these things.
@@baja One thing that sucks: say you want two ops in a job to run in parallel. Dagster won't let you do that if your IO managers are in memory; it forces one to wait for the other. Honestly, for me that defeats the whole purpose. Maybe I'm clueless?
@@nixbruh This shouldn't depend on the IO manager but on the executor you're using. Are you using the multiprocess executor or the in-process one?
I don't have an issue using the multiprocess executor locally, and I typically use the k8s_job_executor when deploying. Locally I typically use the fs_io_manager instead of the in-memory one, but again, that shouldn't matter.
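Whether two independent ops overlap depends on how the executor schedules them, which a small stdlib toy can make observable. This is not Dagster code, and `make_op`, `run_serial` and `run_pooled` are hypothetical names. Each op waits (up to 1 s) on a `threading.Barrier` for its sibling to start. That wait can only succeed if both ops really run at the same time, so the serial run fails and the pooled run succeeds. Dagster's multiprocess executor uses separate processes rather than threads; as far as we understand, that is also why outputs kept only in one process's memory cannot be handed to ops in other processes, and why a storage-backed IO manager such as fs_io_manager pairs with it.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def make_op(n_parallel):
    """Create an op that only completes if n_parallel copies run at once."""
    barrier = threading.Barrier(n_parallel)

    def op(name):
        # Raises threading.BrokenBarrierError if the sibling op never starts.
        barrier.wait(timeout=1.0)
        return f"{name} done"

    return op


def run_serial(names):
    """One op after another in the calling thread, like an in-process executor."""
    op = make_op(len(names))
    return [op(n) for n in names]


def run_pooled(names):
    """All ops submitted to a pool at once, so independent ops overlap."""
    op = make_op(len(names))
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        return list(pool.map(op, names))


print(run_pooled(["op_a", "op_b"]))  # ['op_a done', 'op_b done']
try:
    run_serial(["op_a", "op_b"])
except threading.BrokenBarrierError:
    print("serial executor: op_a waited for op_b and timed out")
```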
Thank you. I feel privileged that you made the video at my request. I know, I know, I'll take all the credit :D
hahahaha no worries man, doing it all for you!
Awesome stuff, bro. Question: is there any reason not to just use these tools as schedulers and have them spin up containers that hold the code? I feel like you get tied to a specific framework and it turns into a nightmare...
I guess the only downside is that you lose the ability to stop and restart the parts of the code that fail, or to run things manually? But I don't know if that trade-off is worth it... hoping people who know what they're doing can share their opinions.
That is a totally valid approach, and honestly one that I think Airflow excels at. Many of the teams I see using Airflow in production just use it to call out to other containers or services that run the jobs, with Airflow acting as a centralized error-handling and monitoring layer in addition to its scheduling capabilities.
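The "scheduler plus containers" setup reduces the orchestrator's job to launching a container and reacting to its exit code. The sketch below shows that contract with the standard `docker run` CLI called through `subprocess`. The image name, environment variable, and arguments are hypothetical. In Airflow you would more typically use a container-launching operator such as `DockerOperator` or `KubernetesPodOperator` instead of shelling out yourself.

```python
import subprocess


def docker_run_argv(image, args, env=None):
    """Build the argv for a one-off container run (--rm removes it after exit)."""
    argv = ["docker", "run", "--rm"]
    for key, value in sorted((env or {}).items()):
        argv += ["-e", f"{key}={value}"]
    return argv + [image, *args]


def run_job(image, args, env=None):
    """Launch the container and raise CalledProcessError on a non-zero exit code.

    Raising is what lets the scheduler mark the task failed, retry it, or
    alert on it, while the actual job code stays inside the image.
    """
    subprocess.run(docker_run_argv(image, args, env), check=True)


print(docker_run_argv("acme/etl:1.4", ["--date", "2024-01-01"], {"TARGET": "prod"}))
# ['docker', 'run', '--rm', '-e', 'TARGET=prod', 'acme/etl:1.4', '--date', '2024-01-01']
```

Restarting a failed part then means rerunning the container for that task, so the granularity of retries is set by how finely the work is split across containers.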
Dagster is open source according to the homepage
Sorry, you're right! I think it's more open-core, since there's not much development outside of the Dagster company, but that's definitely up for debate!
Hey, thanks a lot for the insightful overview! And your channel is awesome for Airflow content.
I'd love to see similar comparisons with Flyte and Kestra.
Thanks Luiz! Really appreciate the love! And will put them in the schedule, thanks for the idea!
Great content... (horrid audio, was your landlady vacuuming?)
Thanks Josh! And apologies, I had facade construction going on outside my window from 8 to 6 for the past couple of months, which was really messing me up. It's all done now, though, so hopefully it's better in my more recent videos!
genius.
Thanks man!
Love the content! The audio could be better; the squeaky chair and booming background noise are a little distracting.
Thanks for the tips, hope my more recent videos are more up to snuff!
I'm pretty sure Dagster is open source
It technically is, but 90% of the dev work comes from the on-staff Dagster team.