talk-data.com

Vikram Koka

Speaker

Chief Strategy Officer, Astronomer

Vikram Koka is the Chief Strategy Officer at Astronomer, based in the San Francisco Bay Area. He is an experienced engineering leader with a background in distributed systems and data infrastructure. At Astronomer for six years, he led the Engineering and Open Source teams and contributed to Apache Airflow as a member of the Airflow PMC, focusing on architectural initiatives such as Scheduler High Availability, Data-Driven Scheduling, Dynamic Tasks, and the client/server architecture in Airflow 3.

Bio from: Airflow Summit 2021


Talks & appearances


Over the last few years, we’ve spent countless hours talking to data engineers everywhere from Fortune 500 companies to seed-stage startups. In doing so, we’ve learned what it takes to deliver a world-class Airflow service. We’ve packaged that knowledge into the Astro Hypervisor, a new part of our platform that gives users a new level of control over Airflow. We’ll walk through how we built the hypervisor and how our customers can use it for autoscaling, tracking the health of Airflow environments, and much more.

Introduced in Airflow 2.4, Datasets are a foundational feature for authoring modular data pipelines. As DAGs grow to encompass more data sources and multiple data transformation steps, they typically become less efficient and less predictable in the timeliness of their execution. This talk focuses on leveraging Datasets to build more predictable and efficient DAGs by borrowing patterns from microservice architectures. Just as large monolithic applications were decomposed into microservices to deliver more efficient scalability and faster development cycles, micropipelines have the same potential to radically transform data pipeline efficiency and development velocity. Using a simple financial analysis pipeline as an example, with data aggregation done in Snowflake and prediction analysis in Spark, this talk outlines how to retain the timeliness of data pipelines while the underlying data sets continue to expand.