Apache Airflow
Managing version upgrades without feelings of terror
Presented by Daniel Standish at Airflow Summit 2024.
Airflow version upgrades can be challenging. Maybe you upgrade and your dags fail to parse (that’s an easy fix). Or maybe you upgrade and everything looks fine, but when your dag runs, you can no longer connect to mysql because the TLS version changed. In this talk I will provide concrete strategies that users can put into practice to make version upgrades safer and less painful.
Topics may include:
- What semver means and what it implies for the upgrade process
- Using integration test dags, unit tests, and a test cluster to smoke out problems
- Strategies around constraints files / pinning, and managing providers vs core versions
- Using db clean prior to upgrade to reduce table size
- Rollback strategies
- What to do about warnings (e.g. deprecation warnings)?
I’ll also focus on keeping it simple. Sometimes things like “integration tests” and “CI” can be scary for people. Even without having set up anything automated, there are still things you can do to make management of upgrades a little less painful and risky.
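As a rough illustration of the semver and constraints-file topics listed above (not taken from the talk itself), the sketch below classifies an upgrade as major, minor, or patch and builds the URL of the constraints file that the Airflow project publishes for each release. The helper names `parse_version`, `upgrade_kind`, and `constraints_url` are hypothetical, and the parser assumes plain `MAJOR.MINOR.PATCH` strings without pre-release suffixes.

```python
from __future__ import annotations


def parse_version(version: str) -> tuple[int, int, int]:
    """Parse a plain 'MAJOR.MINOR.PATCH' string (assumption: no rc/dev suffix)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def upgrade_kind(current: str, target: str) -> str:
    """Return 'major', 'minor', 'patch', or 'none' for an upgrade current -> target."""
    cur, tgt = parse_version(current), parse_version(target)
    if tgt < cur:
        raise ValueError(f"{target} is older than {current}; that is a downgrade")
    if tgt[0] != cur[0]:
        return "major"  # breaking changes are allowed under semver
    if tgt[1] != cur[1]:
        return "minor"  # new features; the public interface should stay compatible
    if tgt[2] != cur[2]:
        return "patch"  # bug fixes only
    return "none"


def constraints_url(airflow_version: str, python_version: str) -> str:
    """Build the constraints-file URL for an Airflow release and a Python minor version."""
    return (
        "https://raw.githubusercontent.com/apache/airflow/"
        f"constraints-{airflow_version}/constraints-{python_version}.txt"
    )


if __name__ == "__main__":
    print(upgrade_kind("2.2.2", "2.7.2"))
    print(constraints_url("2.7.2", "3.9"))
```

The resulting URL is meant to be passed to `pip install "apache-airflow==<version>" --constraint <url>`, which pins transitive dependencies to the set the release was tested with. Even a "minor" result is only a hint: behaviour outside the public interface, such as a driver's default TLS settings, can still change, so a test run before production remains worthwhile.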
Views: 68

Videos

dbt core & Airflow 101: Building data pipelines demystified
Views: 129, 2 hours ago
Presented by Luan Moreno Medeiros at Airflow Summit 2024. dbt became the de facto standard for data teams building reliable and trustworthy SQL code leveraging a modern data stack architecture. The dbt logic needs to be orchestrated, and jobs scheduled to meet business expectations. That’s where Airflow comes into play. In this quick introduction session, you’ll learn: - How to leverage dbt-Core &...
Data Orchestration for Emerging Technology Analysis
Views: 49, 2 hours ago
Presented by Jennifer Melot at Airflow Summit 2024. The Center for Security and Emerging Technology is a think tank at Georgetown University that studies security implications of emerging technologies, including data-driven analyses across bibliometric, patenting, and investment datasets. This talk will describe CSET’s data infrastructure which uses Airflow to orchestrate data ingestion, model ...
Streamlining a Mortgage ETL Pipeline with Apache Airflow
Views: 61, 2 hours ago
Presented by Zhang Zhang & Jenny Gao at Airflow Summit 2024. At Bloomberg, it is our team’s responsibility to ensure the timely delivery to our clients worldwide of a vast dataset comprising approximately 5 billion data points on roughly 50 million loans and over 1.4 million securities, disclosed twice a month by three major government-sponsored mortgage entities. Ingesting this data so we can ...
Airflow UI Roadmap
Views: 134, 2 hours ago
Presented by Brent Bovenzi at Airflow Summit 2024. Soon we will finally switch to a 100% React UI with a full separation between the API and UI as well. While we are doing such a big change, let’s also take the opportunity to imagine whole new interfaces vs just simply modernizing the existing views. How can we use design to help you better understand what is going on with your DAG? Come listen...
Running Airflow Tasks Anywhere, in any Language
Views: 43, 2 hours ago
Presented by Ash Berlin-Taylor & Vikram Koka at Airflow Summit 2024. Imagine a world where writing Airflow tasks in languages like Go, R, Julia, or maybe even Rust is not just a dream but a native capability. Say goodbye to BashOperators; welcome to the future of Airflow task execution. Here’s what you can expect to learn from this session: - Multilingual Tasks: Explore how we empower DAG autho...
Unleash the Power of AI: Streamlining Airflow DAG Development with AI-Driven Automation
Views: 31, 2 hours ago
Presented by Sriharsh Adari, Jeetendra Vaidya & Joseph Morotti at Airflow Summit 2024. Nowadays, conversational AI is no longer exclusive to large enterprises. It has become more accessible and affordable, opening up new possibilities and business opportunities. In this session, discover how you can leverage Generative AI as your AI pair programmer to suggest DAG code and recommend entire funct...
Building on Cosmos: Making dbt on Airflow Easy
Views: 54, 2 hours ago
Presented by Lewis Macdonald & Ethan Stone at Airflow Summit 2024. Balyasny Asset Management (BAM) is a diversified global investment firm founded in 2001 with over $20 billion in assets under management. As dbt took hold at BAM, we had multiple teams building dbt projects against Snowflake, Redshift, and SQL Server. The common question was: How can we quickly and easily productionise our proje...
Evolution of Airflow at Uber
Views: 41, 2 hours ago
Presented by Shobhit Shah & Sumit Maheshwari at Airflow Summit 2024. Up until a few years ago, teams at Uber used multiple data workflow systems, with some based on open source projects such as Apache Oozie, Apache Airflow, and Jenkins while others were custom built solutions written in Python and Clojure. Every user who needed to move data around had to learn about and choose from these system...
Airflow, Spark, and LLMs: Turbocharging MLOps at ASAPP
Views: 52, 2 hours ago
Presented by Udit Saxena at Airflow Summit 2024. This talk explores ASAPP’s use of Apache Airflow to streamline and optimize our machine learning operations (MLOps). Key highlights include: * Integrating with our custom Spark solution for achieving speedup, efficiency, and cost gains for generative AI transcription, summarization and intent categorization pipelines * Different design patterns o...
Optimizing Airflow Performance: Strategies, Techniques, and Best Practices
Views: 46, 2 hours ago
Presented by Pankaj Singh & Pankaj Koti at Airflow Summit 2024. Airflow is widely adopted for its flexibility and scalability. However, as workflows grow in complexity and scale, optimizing Airflow performance becomes crucial for efficient execution and resource utilization. This session delves into the importance of optimizing Airflow performance and provides strategies, techniques, and best p...
Automated Testing and Deployment of DAGs
Views: 55, 2 hours ago
Presented by Austin Bennett at Airflow Summit 2024. DAG integrity is critical. So are coding conventions and consistent standards across the group. In this talk, we share the various lessons learned for testing/verifying our DAGs as part of our GitHub workflows [for testing as part of the pull request process, and for automated deployment, eventually to production, once merged]. We dig into h...
Orchestrating & Optimizing a Batch Ingestion Data Platform for America's #1 Sportsbook
Views: 161, 7 hours ago
Presented by Gunnar Lykins at Airflow Summit 2024. FanDuel Group, an industry leader in sports-tech entertainment, is proud to be recognized as the #1 sports betting company in the US as of 2023 with 53.4% market share. With a workforce exceeding 4,000 employees, including over 100 data engineers, FanDuel Group is at the forefront of innovation in batch processing orchestration platforms. Curre...
Building Reliable Data Products
Views: 131, 7 hours ago
Presented by Astronomer at Airflow Summit 2024. Data engineers have shifted from delivering data for internal analytics applications to customer-facing data products. And with that shift comes a whole new level of operational rigor necessary to instill trust and confidence in the data. How do you hold data pipelines to the same standards as traditional software applications? Can you apply princ...
Airflow Path to Industry Orchestration Standard
Views: 56, 7 hours ago
This talk was presented by Google at Airflow Summit 2024. In data engineering, machine learning pipelines, and cloud and web services, there is huge demand for orchestration technologies. Apache Airflow is among the most popular orchestration technologies, and may even be the most popular one. In this presentation we are going to focus on the aspects of Airflow that make it so pop...
Product Management perspective on Data Observability with Databand
Views: 50, 7 hours ago
Airflow and Control-M: Where Data Pipelines Meet Business Applications in Production
Views: 59, 7 hours ago
Airflow at Burns & McDonnell: Orchestration from 0 to 100 - Airflow Summit 2024
Views: 136, 7 hours ago
Comparing Airflow Executors and Custom Environments
Views: 48, 7 hours ago
Boost Airflow Monitoring and Alerting with Automation Analytics & Intelligence
Views: 61, 7 hours ago
Growing with Apache Airflow: A provider's journey
Views: 50, 7 hours ago
Optimizing Airflow Performance: Strategies, Techniques, and Best Practices
Views: 176, 9 hours ago
Airflow Datasets and Pub/Sub for Dynamic DAG Triggering - Airflow Summit 2024
Views: 203, 9 hours ago
From Tech Specs to Business Impact: How to Design A Truly End-to-End Airflow Summit 2024
Views: 77, 9 hours ago
Activating operational metadata with Airflow, Atlan and OpenLineage - Airflow Summit 2024
Views: 126, 9 hours ago
Optimize Your DAGs: Embrace Dag Params for Efficiency and Simplicity - Airflow Summit 2024
Views: 117, 9 hours ago
Event-driven Data Pipelines with Apache Airflow - Airflow Summit 2024
Views: 712, 9 hours ago
LLMs for Software Development & Apache Airflow
Views: 414, 9 hours ago
Winning Strategies: Powering a World Series Victory with Airflow Orchestration - Airflow Summit 2024
Views: 90, 9 hours ago
Airflow 3 Roadmap Discussion
Views: 320, 12 hours ago

COMMENTS

  • @MarcLamberti, 2 days ago

    Another amazing presentation!

  • @whemmakatatt5311, 3 days ago

    A bit unfortunate this content isnt liked by the youtube algo that much. Pay up or something xD Or make shock faces for thumbnails editions 😂 Great content! I feel lucky to have found this🎉

  • @DavidManouchehri, 10 days ago

    One of the best production summaries I’ve seen, thanks!

  • @thumbox, 18 days ago

    "it's easy to point out an issue, but better to provide a solution". This is absolutely true. It's painful when we need to interpret what the reviewers want to say.

  • @andyvandenberghe6364, 20 days ago

    Is the usecase for reverse ETL still the same in 2024 ?

  • @Eriddoch, 1 month ago

    Airflow is *terrible* for data scientists. And Metaflow is *amazing* for DS. But running 2 tools becomes a lot to manage (or pay for). The fact that you can get the DS DevEx of Metaflow, but run on top of Airflow is FREAKING AWESOME!

  • @rembautimes8808, 1 month ago

    Very innovative feature, customers will definitely appreciate it

  • @jacobogonzalezvargas9924, 2 months ago

  • @hasancemreok2597, 2 months ago

    You really don’t want us to use custom transformations on AirByte, you put DBT to video’s title, you put it into the slide as a seperate page but you just use one little sentence about it in whole video. There’s nothing about what did you transform? How did you transform? Interesting. Btw, the video might be 2 years old but I have feelings quite new.

  • @avinashchavhanwebadict, 2 months ago

    How skip the task group

  • @RickyZhang-p7k, 3 months ago

    There is no technical detail in the presentation.

  • @nikolaibarinov8660, 4 months ago

    Why does a Kubernetes Pod operator require pairs of nodes? Pods != nodes

  • @sanjana2584, 4 months ago

    hi i followed the first step and second step but in the upgrade command it was taking so much time,

  • @edithpuclla6188, 4 months ago

    bravoo!!!

  • @talshor5198, 5 months ago

    Excellent presentation, thank you!

  • @fraternitas5117, 5 months ago

    make a sandwich.

  • @samiStarh, 5 months ago

    Impressive! Bravo!

  • @archanareddy651, 5 months ago

    How do we upgrade from 2.2.2 and latest is 2.7.2

  • @yemmey4ever, 6 months ago

    Thank you for all you do for the community! All of you!!👏👏👏

  • @SaraH-fd4si, 6 months ago

    anyone knows how he did the interactive filtering?

  • @alfahatasi, 7 months ago

    I want to mask some data. How do you do this via superset?

  • @basamahmad2464, 7 months ago

    Can you install Superset on Windows Server 2019?

  • @lucasbraga2649, 7 months ago

    I'm trying to install pip install etsy-dagtest with a new virtual environment in Python 3.9.12 and it's not working. Any ideas on how to solve it?

    • @DodaGarcia, 5 months ago

      From what I understand this is currently an internal tool, they're just showcasing the workflow.

  • @MorbidRotten, 7 months ago

    Nice content! Could you please provide the source code?

  • @tas9676, 7 months ago

    What a gem! Thanks for sharing!!!!

  • @nataliamora8344, 7 months ago

    Has the project been published anywhere?

  • @MinhPhamCong-t5c, 7 months ago

    Thank you!!!!

  • @danielbartley516, 8 months ago

    On one level, this very cool. On another, since you can’t install the proprietary libraries this video gets your hopes up and then disappoints you.

  • @oklander1, 8 months ago

    Zohar and Alina, great job!!!

  • @danielbartley516, 8 months ago

    So many foot guns

  • @samplebricks234, 9 months ago

    Airflowctl doesn‘t work on win10, does it?

  • @digitallworld, 10 months ago

    very helpful . thanks.

  • @ozzyoz6824, 10 months ago

    Thank you!

  • @caseygarrison6733, 10 months ago

    'Promo sm'

  • @edithpuclla6188, 11 months ago

    Bravooo!!

  • @thomasgremm6127, 1 year ago

    Please consider OpenShift/RedHat certified Airflow Operator.

  • @martin77813, 1 year ago

    +1, I once created some operators (i.e., ExtractOperator) with the idea of leveraging connection/hook. So we could extract data from various type of DB with the same operator but different connection name and much lesser operators to maintain in house.

  • @thienthai1053, 1 year ago

    Can you share source code of your airflow_client ?

  • @mikekenneth2339, 1 year ago

    Amazing tips cost-saving & performance boost and well presented.

  • @logulokesh2651, 1 year ago

    Why do we need Airflow in your demo ? when scheduling can be done airbyte itself ?

  • @cfhel1, 1 year ago

    Sound is very poor :(

  • @JeanBzh, 1 year ago

    Very good presentation, in fact that's the next topic I'll cover with our data engineering team, I'll send them a link to this video beforehand. On the topic of not wasting money, in most use cases, EMR Serverless might really not be cost effective. It's like Glue jobs, depending on your usage, it might cost you several *times* more than EMR on EKS for instance. But I digress, great video, thank you :)

  • @pedresnyman9445, 1 year ago

    SUPER COOL!

  • @HarishPillay, 1 year ago

    Good to hear you Rich. Looking forward to seeing you in person soon.

  • @andriifadieiev9757, 1 year ago

    Amazing presentation!

  • @MarcLamberti, 1 year ago

    THIS PRESENTATION ❤❤❤

  • @MarcLamberti, 1 year ago

    This talk ❤

  • @SruthiThorati, 1 year ago

    how to programmatically add a db connection instead of using UI?