talk-data.com

Event

Airflow Summit 2023

2023-07-01 Airflow Summit

Activities tracked

2

Airflow Summit 2023 program

Filtering by: Vikram Koka

Sessions & talks

A Hypervisor for Airflow, presented by Astronomer

2023-07-01
session
Vikram Koka (Astronomer), Viraj Parekh

Over the last few years, we’ve spent countless hours talking to data engineers at organizations ranging from Fortune 500 companies to seed-stage startups. In doing so, we’ve learned what it takes to deliver a world-class Airflow service that works for everyone. We’ve packaged all of that into the Astro Hypervisor, a new part of our platform that gives users a new level of control in Airflow. We’ll talk through how we built the hypervisor and how our customers will be able to use it for autoscaling, tracking the health of Airflow environments, and more.

Micropipelines: A microservice approach for DAG authoring using datasets

2023-07-01
session
Vikram Koka (Astronomer)

Introduced in Airflow 2.4, Datasets are a foundational feature for authoring modular data pipelines. As DAGs grow to cover more data sources and more data transformation steps, they typically become less efficient and less predictable in the timeliness of their execution. This talk focuses on using Datasets to make DAGs more predictable and efficient by applying patterns from microservice architectures. Just as large monolithic applications were decomposed into microservices to deliver more efficient scalability and faster development cycles, micropipelines have the same potential to radically transform data pipeline efficiency and development velocity. Using a simple financial analysis pipeline example, with data aggregation done in Snowflake and prediction analysis in Spark, this talk outlines how to retain the timeliness of data pipelines while expanding data sets.
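
To make the micropipeline idea concrete, here is a minimal, dependency-free sketch of dataset-driven triggering. It is a toy model, not Airflow's API and not the example from the talk: the `Pipeline` class, the `run_until_idle` function, and all dataset URIs are hypothetical names chosen for illustration. In Airflow itself, the corresponding pieces are `airflow.datasets.Dataset`, a task's `outlets` argument, and a DAG's `schedule=[...]` list of datasets. The sketch assumes the dependency graph is acyclic and that a pipeline runs once every one of its input datasets has been updated since its last run.

```python
from dataclasses import dataclass


@dataclass
class Pipeline:
    """A small pipeline that consumes some datasets and produces others."""
    name: str
    inputs: frozenset = frozenset()   # dataset URIs this pipeline waits for
    outputs: frozenset = frozenset()  # dataset URIs this pipeline updates


def run_until_idle(pipelines, initial_updates):
    """Process dataset updates and return pipeline names in run order.

    A pipeline runs only after all of its inputs have been updated
    since its previous run. Assumes the dependency graph has no cycles.
    """
    seen = {p.name: set() for p in pipelines}  # inputs updated since last run
    queue = list(initial_updates)
    order = []
    while queue:
        uri = queue.pop(0)
        for p in pipelines:
            if uri not in p.inputs:
                continue
            seen[p.name].add(uri)
            if seen[p.name] == set(p.inputs):
                seen[p.name].clear()
                order.append(p.name)             # "run" the pipeline
                queue.extend(sorted(p.outputs))  # its outputs count as updates
    return order


# Hypothetical financial-analysis layout: aggregation, then prediction.
RAW_TRADES = "snowflake://raw/trades"
RAW_PRICES = "snowflake://raw/prices"
DAILY_POSITIONS = "snowflake://agg/daily_positions"
PREDICTIONS = "s3://predictions/daily"

PIPELINES = [
    Pipeline("aggregate", frozenset({RAW_TRADES, RAW_PRICES}),
             frozenset({DAILY_POSITIONS})),
    Pipeline("predict", frozenset({DAILY_POSITIONS}),
             frozenset({PREDICTIONS})),
]

print(run_until_idle(PIPELINES, [RAW_TRADES, RAW_PRICES]))
# prints ['aggregate', 'predict']
```

Each pipeline here knows only about the datasets it reads and writes, not about the other pipelines, which is the decoupling the microservice analogy points at. If only one of the two raw datasets is updated, nothing runs in this model.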