As much as we love Airflow, local development has been a bit of a white whale for much of its history: until recently, the experience was hindered by the need to spin up a scheduler and webserver just to run a DAG. In this talk, we will explore the latest innovation in Airflow local development, the “dag.test()” functionality introduced in Airflow 2.5. We will delve into practical applications of “dag.test()”, which lets users run and debug Airflow DAGs locally in a single Python process. This new functionality significantly improves the development experience, enabling faster iteration and deployment. In this presentation, we will discuss: how to leverage IDE support for code completion, linting, and debugging; techniques for inspecting and debugging DAG output; and best practices for unit testing DAGs and their underlying functions. This session is accessible to Airflow users of all levels. Join us as we explore the future of Airflow local development!
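For the curious, here is a minimal sketch of the pattern the talk covers, assuming Airflow 2.5 or later; the DAG and task names are illustrative placeholders, not from the talk:

import pendulum
from airflow.decorators import dag, task


@dag(schedule=None, start_date=pendulum.datetime(2023, 1, 1), catchup=False)
def example_pipeline():
    @task
    def extract():
        return [1, 2, 3]

    @task
    def total(values):
        print(sum(values))

    total(extract())


pipeline = example_pipeline()

if __name__ == "__main__":
    # Runs every task of the DAG in this one Python process, so you can
    # set breakpoints and step through task code in your IDE.
    pipeline.test()

Because nothing runs in a separate scheduler or worker, an ordinary debugger attached to this script stops inside task code like any other Python function.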
Speaker: Daniel Imberman (5 talks)
Talks & appearances
Imagine if you could chain together SQL models using nothing but Python, writing functions that treat Snowflake tables like dataframes and dataframes like SQL tables. Imagine if you could write a SQL Airflow DAG using only Python, or without using any Python at all. With the Astro SDK, we at Astronomer have gone back to the drawing board on fundamental questions of what DAG writing could look like. Our goal is to empower Data Engineers, Data Scientists, and even Business Analysts to write Airflow DAGs with code that reflects the data movement instead of the system configuration. Astro will allow each group to focus on producing value in their respective fields with minimal knowledge of Airflow and a high degree of flexibility between SQL- and Python-based systems. This is far more than a new way of writing DAGs: it is a universal, agnostic data transfer system. Users can run the exact same code against different databases (Snowflake, BigQuery, etc.) and datastores (GCS, S3, etc.) with no changes except to the connection IDs, and will be able to promote a SQL flow from their dev Postgres to their prod Snowflake with a single variable change. We are ecstatic to reveal over eight months of work on a new open-source project that will significantly improve your DAG authoring experience!
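As a rough illustration of the decorator pattern described above (a sketch assuming the astro-sdk-python package and its early import paths; the file path, table, and connection IDs are hypothetical):

import pandas as pd
from astro import sql as aql
from astro.files import File
from astro.sql.table import Table

# In practice these calls would sit inside a DAG definition.

# Load a CSV from object storage into a warehouse table.
raw_orders = aql.load_file(
    input_file=File(path="s3://my-bucket/orders.csv"),  # hypothetical path
    output_table=Table(conn_id="snowflake_default"),
)

# A SQL step: the function returns a SQL statement, and the upstream
# table is templated in like an ordinary function argument.
@aql.transform
def large_orders(orders: Table):
    return "SELECT * FROM {{ orders }} WHERE amount > 100"

# A dataframe step: the same table now arrives as a pandas DataFrame.
@aql.dataframe
def summarize(df: pd.DataFrame):
    return df.describe()

summarize(large_orders(raw_orders))

Pointing conn_id at a dev Postgres connection instead of Snowflake would promote the identical flow between environments, which is the single-variable change the abstract describes.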
As the Apache Airflow project grows, we seek both ways to incorporate rising technologies and novel ways to expose them to our users. Ray is one of the fastest-growing distributed computation systems on the market today. In this talk, we will introduce the Ray decorator and Ray backend. These features, built with the help of the Ray maintainers at Anyscale, will allow Data Scientists to natively integrate their distributed pandas, XGBoost, and TensorFlow jobs into their Airflow pipelines with a single decorator. By merging Airflow's orchestration with Ray's distributed computation, this pairing opens Airflow users to a whole host of new possibilities when designing their pipelines.
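The provider's decorator API itself is not reproduced here, but as a hand-wavy sketch of the coordination being described, an Airflow task can hand work to a Ray cluster using Ray's core API (the cluster address and workload are illustrative assumptions):

import ray
from airflow.decorators import task


@task
def train_on_ray():
    # Connect to an already-running Ray cluster (address is hypothetical).
    ray.init(address="auto")

    @ray.remote
    def partial_sum(chunk):
        return sum(chunk)

    # Fan the work out across the cluster, then gather the results.
    futures = [partial_sum.remote(range(i * 1_000, (i + 1) * 1_000)) for i in range(8)]
    total = sum(ray.get(futures))
    ray.shutdown()
    return total

The decorator introduced in the talk aims to collapse this boilerplate into a single annotation, so the function body reads as plain pandas/XGBoost/TensorFlow code while Ray handles the distribution.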
A team of core committers explains what is coming in Airflow 2.0.
As the field of data science grows in popularity, companies find themselves in need of a single common language that can connect their data science teams and data infrastructure teams. Data scientists want rapid iteration, infrastructure engineers want monitoring and security controls, and product owners want their solutions deployed in time for quarterly reports. This talk will discuss how to build an Airflow-based data platform that can take advantage of popular ML tools (Jupyter, TensorFlow, Spark) while creating an ecosystem that is easy for data infrastructure and support teams to manage and monitor. We will take an idea from a single-machine Jupyter notebook, to a cross-service Spark + TensorFlow pipeline, to a canary-tested, production-ready model served on Google Cloud Functions, and show how Apache Airflow can connect all layers of a data team to deliver rapid results.
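As a rough sketch of the pipeline shape described above, with plain Airflow tasks standing in for the Jupyter, Spark + TensorFlow, canary, and deployment stages (all names and bodies are illustrative placeholders, not from the talk):

import pendulum
from airflow.decorators import dag, task


@dag(schedule=None, start_date=pendulum.datetime(2023, 1, 1), catchup=False)
def ml_platform_pipeline():
    @task
    def run_notebook():
        # Stand-in for executing a parameterized Jupyter notebook.
        ...

    @task
    def train_model():
        # Stand-in for the cross-service Spark + TensorFlow training job.
        ...

    @task
    def canary_test():
        # Stand-in for validating the candidate model before promotion.
        ...

    @task
    def deploy():
        # Stand-in for publishing the approved model to Google Cloud Functions.
        ...

    run_notebook() >> train_model() >> canary_test() >> deploy()


ml_platform_pipeline()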