talk-data.com



Activities & events

Shubham Raj – Software Engineer, Jens Scheffler – Cluster Lead Test Execution

Are you looking to build slick, dynamic trigger forms for your DAGs? It all starts with mastering params. Params are the gold standard for adding execution options to your DAGs, allowing you to create dynamic, user-friendly trigger forms with descriptions, validation, and now, with Airflow 3, bidirectional support for conf data! In this talk, we’ll break down how to use params effectively, share best practices, and explore what’s new since the 2023 Airflow Summit talk (https://airflowsummit.org/sessions/2023/flexible-dag-trigger-forms-aip-50/). If you want to make DAG execution more flexible, intuitive, and powerful, this session is a must-attend!

Airflow
Airflow Summit 2025
Airflow Summit 2023 (2023-07-01)

Airflow’s KubernetesExecutor has supported multi_namespace_mode for a long time. This feature is great at allowing Airflow jobs to run in different namespaces on the same Kubernetes cluster for better isolation and easier management. However, it requires a cluster role for the Airflow scheduler, which can create security problems or be a blocker for some users. PR https://github.com/apache/airflow/pull/28047 , which will become available in Airflow 2.6.0, resolves this issue by allowing Airflow users to specify multi_namespace_mode_namespace_list when using multi_namespace_mode, so that no cluster role is needed and users only need to ensure the scheduler has permissions on certain namespaces rather than on all namespaces in the Kubernetes cluster. This talk aims to help you better understand KubernetesExecutor and how to set it up in a more secure manner.

Airflow GitHub Kubernetes Cyber Security
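A sketch of the namespace-scoped setup the talk describes, as it would appear in airflow.cfg (section name as used in current Airflow releases; the namespace names are illustrative):

```ini
[kubernetes_executor]
# Allow worker pods in more than one namespace...
multi_namespace_mode = True
# ...but only in this explicit list, so the scheduler needs ordinary
# Roles in these namespaces instead of a ClusterRole.
multi_namespace_mode_namespace_list = team-a,team-b
```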
Vikram Koka – Chief Strategy Officer @ Astronomer

Introduced in Airflow 2.4, Datasets are a foundational feature for authoring modular data pipelines. As DAGs grow to span a larger number of data sources and multiple data transformation steps, they typically become less predictable in the timeliness of execution and less efficient. This talk focuses on leveraging Datasets to enable predictable and more efficient DAGs, borrowing patterns from microservice architectures. Just as large monolithic applications were decomposed into microservices to deliver more efficient scalability and faster development cycles, micropipelines have the same potential to radically transform data pipeline efficiency and development velocity. Using a simple financial analysis pipeline example, with data aggregation done in Snowflake and prediction analysis in Spark, this talk outlines how to retain the timeliness of data pipelines while expanding data sets.

Airflow Snowflake Spark
Roy Noyman – Infrastructure Data Engineer at Wix.com (Data Infra organization)

Are you tired of spending hours on Airflow migrations and wondering how to make them more accessible? Would you like to be able to test your code on different Airflow versions? Or are you struggling to set up a reliable local development environment? These are some of the top pain points for data engineers working with Airflow. But fear not – Wix Data Engineering has some best practices to share that will make your life easier. What the audience will learn: how Wix Data Engineering makes Airflow migrations easier and less painful; how to ensure DEs’ code is forward-compatible with the latest Airflow version; how to test code on different Airflow versions; how to maintain a stable local environment for DEs while speeding up their dev velocity; and some more of the framework team’s must-know best practices.

Airflow Data Engineering
Utkarsh Sharma – Sr Software Engineer at Astronomer

Making a contribution to or becoming a committer on Airflow can be a daunting task, even for experienced Python developers and Airflow users. The sheer size and complexity of the code base may discourage potential contributors from taking the first steps. To help alleviate this issue, this session is designed to provide a better understanding of how Airflow works and build confidence in getting started. During the session, we will introduce the main components of Airflow, including the Web Server, Scheduler, and Workers. We will also cover key concepts such as DAGs, DAG-run objects, Tasks, and Task Instances. Additionally, we will explain how tasks communicate with each other using XComs, and discuss the frequency of DAG runs based on the schedule. To showcase changes in the state of various objects, we will dive into the code level and continuously share the state of the database at every important checkpoint.

Airflow Python
Zohar Donenhirsh – Director of Data Engineering at C2i Genomics, Alina Aven – Senior Data Engineer at C2i Genomics

High-scale orchestration of genomic algorithms using Airflow workflows, AWS Elastic Container Service (ECS), and Docker. Genomic algorithms are highly demanding of CPU, RAM, and storage. Our data science team requires a platform to facilitate the development and validation of proprietary algorithms. The data engineering team develops a research data platform that enables data scientists to publish Docker images to AWS ECR and run them using Airflow DAGs that provision ECS compute on EC2 and Fargate. We will describe a research platform that allows our data science team to check their algorithms on ~1000 cases in parallel, using the Airflow UI and dynamic DAG generation to utilize EC2 machines, auto-scaling groups, and ECS clusters across multiple AWS regions.

Airflow AWS Amazon EC2 Data Engineering Data Science Docker ELK
Amit Kumar – GoDaddy, Principal Software Engineer and Architect

Discover the transformation of Airflow at GoDaddy: from its initial deployment on-prem to its migration to the cloud, and finally to a Single Pane Orchestration model. This evolution has streamlined our data platform and improved governance. Our experience will be beneficial for anyone seeking to optimize their Airflow implementation and simplify their orchestration processes. Topics covered: history and use cases; design, organization decisions, and governance: examining the decision-making process and governance structure; migration to the cloud: the process of transitioning Airflow from on-premises to the cloud; data processing engines used with Airflow; challenges: obstacles faced during and after migration and how they were overcome; demonstrating how Airflow can be integrated with a central Glue Catalog and a Data Lake Mesh model; Single Pane Orchestration (PaaS) and custom reusable GitHub Actions: examining the benefits of a Single Pane Orchestration model; and monitoring.

Airflow AWS Glue Cloud Computing Data Lake GitHub
Diana Vazquez Romo – Data Engineer, Bell Canada

How to submit an issue for the community to fix. To ensure a quality product, the Airflow community relies on bug reports from Airflow users. Often, bug reports are incomplete or fail to include the steps needed to reproduce the observed bug. This workshop will present an example of the bug-to-issue process: namely, how to rule out non-Airflow issues and, once an Airflow issue is suspected, how to submit an issue for the community to see. It could also cover how the community picks up an issue and eventually fixes it in a future release. Issue reporting is key to improving Airflow, and the community will benefit from an easily digestible guide on how best to do so.

Airflow
Brent Bovenzi – Staff Frontend Engineer at Astronomer

We are continuing to modernize the Airflow UI to make it easier to manage all aspects of your DAGs. See a demo of the latest updates and improve your workflows with new tips and tricks. Then get a preview of what else is coming soon, followed by a Q&A where people can raise their own use cases and explore new ideas on how to improve the user experience.

Airflow
Bas Harenslak – Solutions Architect at Astronomer, author of Data Pipelines with Apache Airflow, and Airflow committer

Have you ever added a DAG file and had no clue what happened to it? You’re not alone! With default settings, Airflow can wait up to 5 minutes before processing new DAG files. In this talk, I’ll discuss the implementation of an event-based DAG parser that immediately processes changes in the DAGs folder. As a result, changes are reflected immediately in the Airflow UI. In this talk I will cover: a demonstration of the event-based DAG parser and the fast Airflow UI experience; how the current DAG parser implementation and configuration works; and how an event-based DAG parser is implemented.

Airflow
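The "up to 5 minutes" default the talk refers to comes from the scheduler's directory-scan interval; a sketch of the relevant airflow.cfg settings (default values shown):

```ini
[scheduler]
# How often (in seconds) to scan the DAGs directory for new files.
# The 300-second default is the source of the 5-minute wait.
dag_dir_list_interval = 300
# Minimum interval (in seconds) between re-parses of an already-known file.
min_file_process_interval = 30
```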
Branden West – Reddit, Software Engineer III, Data Platform, Dave Milmont – Senior Software Engineer, Reddit

We would love to speak about our experience upgrading our old Airflow 1 infrastructure to Airflow 2 on Kubernetes, and how we orchestrated the migration of approximately 1,500 DAGs owned by multiple teams in our organization. We had some interesting challenges along the way and can speak about our solutions. Points we can talk about: our old Airflow 1 infrastructure and why we decided to move to Kubernetes for Airflow 2; possible migration paths we considered and why we chose the route we did; things we did to make the migration easier to achieve: implementing DAG factories (we used some neat programmatic approaches to build a great factory interface for our users), a custom cross-Airflow-instance DAG dependency solution, and DAG audits (how we programmatically determined which DAGs were actually still being used, to reduce migration load); and problems that we faced: DAG ownership, backfilling in Airflow 2, and Kubernetes DAG dependencies.

Airflow Kubernetes
Syed Hussain – Software Developer, AWS

A deep dive into how AWS is developing deferrable operators for the Amazon provider package, to help users realize the potential cost savings they provide and to promote their usage.

AWS
Maciej Obuchowski – Senior Software Engineer @ Datadog

With native support for OpenLineage in Airflow, users can now observe and manage their data pipelines with ease. This talk will cover the benefits of using OpenLineage, how it is implemented in Airflow, practical examples of how to take advantage of it, and what’s in our roadmap. Whether you’re an Airflow user or provider maintainer, this session will give you the knowledge to make the most of this tool.

Airflow
Madison Swain-Bowden – Staff Data Engineer @ Automattic

As a data engineer, I’ve used Airflow extensively over the last 5 years: across 3 jobs, several different roles; for side projects, for critical infrastructure; for manually triggered jobs, for automated workflows; for IT (Ookla/Speedtest.net), for science (Allen Institute for Cell Science), for the commons (Openverse), for liberation (Orca Collective). Authoring a DAG has changed dramatically since 2018, thanks to improvements in both Airflow and the Python language. In this session, we’ll take a trip back in time to see how DAGs looked several years ago, and what the same DAGs might look like now. We’ll appreciate the many improvements that have been made towards simplifying workflow construction. I’ll also discuss the significant advancements that have been made around deploying Airflow. Lastly, I’ll give a brief overview of different use cases and ways I’ve seen Airflow leveraged.

Airflow C#/.NET Python
Ramya Pappu – Senior Data Engineer at StyleSeat , Kunal Haria – Engineering Manager, Data at StyleSeat

We will share the case study of Airflow at StyleSeat, where within a year our data grew from 2 million data points per day to 200 million. Our original solution for orchestrating this data was not enough, so we migrated to an Airflow-based solution. Previous implementation: our tasks were orchestrated with hourly triggers on AWS CloudWatch rules in their own log groups. Each task was a Lambda, individually defined as a task, executing Python code from a Docker image. As complexity increased, there were frequent downtimes and manual executions for failed tasks and their downstream dependencies. With every downtime, our business stakeholders lost more trust in data, and recovery times grew longer. We needed a modern orchestration platform that would enable our team to define and instrument complex pipelines as code, provide visibility into executions, and define retry criteria on failures. Airflow was identified as a critical piece in modernizing our orchestration, which would also help us onboard dbt. We wanted a managed solution and a partner who could guide us to a successful migration.

Airflow AWS CloudWatch AWS Lambda dbt Docker Python
John Jackson – Principal Product Manager at Amazon MWAA

Amazon Managed Workflows for Apache Airflow (MWAA) was released in November 2020. Throughout MWAA’s design we held the tenets that this service would be open-source first, not forking or deviating from the project, and that the MWAA team would focus on improving Airflow for everyone—whether they run Airflow on MWAA, on AWS, or anywhere else. This talk will cover some of the design choices made to facilitate those tenets, how the organization was set up to contribute back to the community, what those contributions look like today, how we’re getting those contributions in the hands of users, and our vision for future engagement with the community.

Airflow AWS
Savin Goyal – Co-founder & CTO at Outerbounds , Ryan Delgado – Staff Software Engineer, Ramp

Airflow is a household brand in data engineering: It is readily familiar to most data engineers, quick to set up, and, as proven by millions of data pipelines powered by it since 2014, it can keep DAGs running. But with the increasing demands of ML, there is a pressing need for tools that meet data scientists where they are and address two pressing issues - improving the developer experience & minimizing operational overhead. In this talk, we discuss the problem space and the approach to solving it with Metaflow, the open-source framework we developed at Netflix, which now powers thousands of business-critical ML projects at Netflix & other companies. We wanted to provide data scientists with the best possible UX, allowing them to focus on parts they like (e.g., modeling) while providing robust solutions for the foundational infrastructure: data, compute, orchestration (using Airflow), & versioning. In this talk, we will demo our latest work that builds on top of Airflow.

AI/ML Airflow Data Engineering
Jonathan Rainer – Monzo, Data Platform Engineer, Ed Sparkes – Data Engineer @ Monzo – Making money work for everyone

As a bank, Monzo has seen exponential growth in active users, from 1.6 million in 2019 to 5.8 million in 2022. At the same time, the number of data users and analysts has expanded from an initial team of 4 to 132. Alongside this growth, our infrastructure and tooling have had to evolve to deliver the same value at a new scale. From an Airflow installation deployed on a single monolithic instance, we now deploy atop Kubernetes and have integrated our Airflow setup into the bank’s backend systems. This talk charts the story of that expansion and the growing pains we’ve faced, as well as looking to the future of our use of Airflow. We’ll first discuss how data at Monzo works, from event capture to arrival in our Data Warehouse, before assessing the challenges of our Airflow setup. We’ll then dive into the re-platforming that was required to meet our growing data needs, and some of the unique challenges that come with serving an ever-growing user base and need for analysis and insight.

Airflow DWH Kubernetes
Daniel Imberman – Airflow PMC, Engineer at Astronomer, Lover of all things Airflow

As much as we love Airflow, local development has been a bit of a white whale through much of its history. Until recently, Airflow’s local development experience was hindered by the need to spin up a scheduler and webserver. In this talk, we will explore the latest innovation in Airflow local development, namely the dag.test() functionality introduced in Airflow 2.5. We will delve into practical applications of dag.test(), which empowers users to locally run and debug Airflow DAGs in a single Python process. This new functionality significantly improves the development experience, enabling faster iteration and deployment. In this presentation, we will discuss: how to leverage IDE support for code completion, linting, and debugging; techniques for inspecting and debugging DAG output; and best practices for unit testing DAGs and their underlying functions. Accessible to Airflow users of all levels – join us as we explore the future of Airflow local development!

Airflow Python