talk-data.com

Vikram Koka

Speaker · 9 talks

Chief Strategy Officer, Astronomer

Vikram Koka is the Chief Strategy Officer at Astronomer, based in the San Francisco Bay Area. He is an experienced engineering leader with a background in distributed systems and data infrastructure. At Astronomer for six years, he led the Engineering and Open Source teams and contributed to Apache Airflow as a member of the Airflow PMC, focusing on architectural initiatives such as Scheduler High Availability, Data-Driven Scheduling, Dynamic Tasks, and the client/server architecture in Airflow 3.

Bio from: Airflow Summit 2021


Talks & appearances

9 activities · Newest first


In this keynote, Peeyush Rai and Vikram Koka walk through how Airflow is being used as part of an agentic AI platform serving insurance companies, which runs on all the major public clouds and leverages models from OpenAI, Google (Gemini), and AWS Bedrock (including Anthropic's Claude). The talk walks through the details of an actual end-user business workflow, including gathering the relevant financial data to make a decision, as well as the tricky challenge of handling AI hallucinations with new Airflow capabilities such as "Human in the loop". This talk offers something for both business and technical audiences. Business users will get a clear view of what it takes to bring an AI application into production and how to align their operations and business teams around an AI-enabled workflow. Meanwhile, technical users will walk away with practical insights on how to orchestrate complex business processes, enabling seamless collaboration between Airflow, AI agents, and humans in the loop.
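A minimal sketch of the branching pattern this describes, assuming nothing about the speakers' actual pipeline: financial data is gathered, an LLM proposes a decision, and low-confidence answers are routed to a human review step. All helper names, values, and thresholds here are hypothetical placeholders.

```python
# Hypothetical sketch: route low-confidence LLM decisions to a human reviewer.
import pendulum
from airflow.decorators import dag, task


@dag(schedule=None, start_date=pendulum.datetime(2025, 1, 1), catchup=False)
def underwriting_with_human_in_the_loop():
    @task
    def gather_financial_data() -> dict:
        # Placeholder for pulling the applicant's financial records.
        return {"applicant_id": 42, "revenue": 1_250_000}

    @task
    def score_with_llm(financials: dict) -> dict:
        # Placeholder for a call to the model provider (OpenAI, Gemini,
        # or Claude via Bedrock); returns a decision plus a confidence score.
        return {"decision": "approve", "confidence": 0.62}

    @task.branch
    def needs_human_review(result: dict) -> str:
        # Low-confidence (possibly hallucinated) answers go to a human.
        return "human_review" if result["confidence"] < 0.8 else "auto_approve"

    @task
    def human_review():
        # Stand-in for the "Human in the loop" capability mentioned above:
        # hold the workflow until a reviewer confirms the recommendation.
        ...

    @task
    def auto_approve():
        ...

    result = score_with_llm(gather_financial_data())
    needs_human_review(result) >> [human_review(), auto_approve()]


underwriting_with_human_in_the_loop()
```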

Enterprises want the flexibility to operate across multiple clouds, whether to optimize costs, improve resiliency, avoid vendor lock-in, or satisfy data sovereignty requirements. But for developers, that flexibility usually comes at the cost of extra complexity and redundant code. The goal here is simple: write once, run anywhere, with minimal boilerplate. In Apache Airflow, we've already begun tackling this problem with abstractions like Common-SQL, which lets you write database queries once and run them on 20+ databases, from Snowflake to Postgres to SQLite to SAP HANA. Similarly, Common-IO standardizes cloud blob storage interactions across all public clouds. With Airflow 3.0, we are pushing this further by introducing a Common Message Bus provider: an abstraction initially supporting Amazon SQS, with Google Pub/Sub and Apache Kafka following soon after. We expect additional implementations, such as Amazon Kinesis and Managed Kafka, over time. This talk will dive into why these abstractions matter, how they reduce friction for developers while giving enterprises true multi-cloud optionality, and what's next for Airflow's evolving provider ecosystem.
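To make the "write once, run anywhere" idea concrete, here is a small sketch using the Common-SQL provider's SQLExecuteQueryOperator: the same query runs against Snowflake, Postgres, or SQLite, and only the connection ID changes. The DAG id, connection IDs, and table are hypothetical.

```python
# Sketch: one operator, many databases; portability comes from the connection.
import pendulum
from airflow import DAG
from airflow.providers.common.sql.operators.sql import SQLExecuteQueryOperator

with DAG(
    dag_id="portable_sql",
    schedule=None,
    start_date=pendulum.datetime(2025, 1, 1),
    catchup=False,
):
    SQLExecuteQueryOperator(
        task_id="daily_revenue",
        conn_id="snowflake_default",  # swap for "postgres_default", "sqlite_default", ...
        sql="""
            SELECT order_date, SUM(amount) AS revenue
            FROM orders
            GROUP BY order_date
        """,
    )
```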

Apache Airflow® 3 is here, bringing major improvements to data orchestration. In this keynote, core Airflow contributors will walk through key enhancements that boost flexibility, efficiency, and user experience. Vikram Koka will kick things off with an overview of Airflow 3, followed by deep dives into DAG versioning (Jed Cunningham), enhanced backfilling (Daniel Standish), and a modernized UI (Brent Bovenzi & Pierre Jeambrun). Next, Ash Berlin-Taylor, Kaxil Naik, and Amogh Desai will introduce the Task Execution Interface and Task SDK, enabling tasks in any environment and language. Jens Scheffler will showcase the Edge Executor, while Constance Martineau, Tzu-ping Chung and Vincent Beck will demo event-driven scheduling and data assets. Finally, Buğra Öztürk will unveil CLI enhancements for automation and debugging. This keynote sets the stage for Airflow 3—don’t miss the chance to learn from the experts shaping the future of workflow orchestration!

Sponsored by: Astronomer | Unlocking the Future of Data Orchestration: Introducing Apache Airflow® 3

Airflow 3 is here, bringing a new era of flexibility, scalability, and security to data orchestration. This release makes building, running, and managing data pipelines easier than ever. In this session, we will cover the key benefits of Airflow 3, including:

(1) Ease of Use: Airflow 3 rethinks the user experience, from an intuitive, upgraded UI to DAG Versioning and scheduler-integrated backfills that let teams manage pipelines more effectively than ever before.

(2) Stronger Security: By decoupling task execution from direct database connections, Airflow 3 enforces task isolation and minimal-privilege access. This meets stringent compliance standards while reducing the risk of unauthorized data exposure.

(3) Ultimate Flexibility: Run tasks anywhere, anytime, with remote execution and event-driven scheduling (see the sketch below). Airflow 3 is designed for modern, globally distributed, heterogeneous data environments, with an architecture that accommodates everything from edge and hybrid-cloud deployments to GPU-based workloads.
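A minimal sketch of the event-driven scheduling model, assuming the Airflow 3 Task SDK imports (airflow.sdk) and a hypothetical asset URI: the consumer DAG runs whenever the producer updates the asset, rather than on a fixed clock.

```python
# Sketch: asset-driven scheduling in Airflow 3 (URI and task bodies are hypothetical).
import pendulum
from airflow.sdk import Asset, dag, task

orders = Asset("s3://example-bucket/orders.parquet")


@dag(schedule=None, start_date=pendulum.datetime(2025, 1, 1), catchup=False)
def producer():
    @task(outlets=[orders])
    def publish_orders():
        ...  # write the data; task success marks the asset as updated

    publish_orders()


@dag(schedule=[orders], start_date=pendulum.datetime(2025, 1, 1), catchup=False)
def consumer():
    @task
    def transform():
        ...  # triggered by the asset update, not by a cron schedule

    transform()


producer()
consumer()
```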

Imagine a world where writing Airflow tasks in languages like Go, R, Julia, or maybe even Rust is not just a dream but a native capability. Say goodbye to BashOperator workarounds; welcome to the future of Airflow task execution. Here's what you can expect to learn from this session:

Multilingual Tasks: Explore how we empower DAG authors to write tasks in any language while retaining seamless access to Airflow Variables and Connections.

Simplified Development and Testing: Discover how a standardized interface for task execution promises to streamline development efforts and elevate code maintainability.

Enhanced Scalability and Remote Workers: Learn how enabling tasks to run on remote workers opens up possibilities for seamless deployment on diverse platforms, including Windows and remote Spark or Ray clusters. Experience the convenience of effortless deployments as we unlock new avenues for Airflow usage. (A sketch of the decoupled task contract follows below.)

Join us as we embark on an exploratory journey to shape the future of Airflow task execution. Your insights and contributions are invaluable as we refine this vision together. Let's chart a course towards a more versatile, efficient, and accessible Airflow ecosystem.
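In Python terms, the decoupling looks roughly like the sketch below: the task imports only from the Task SDK, so it carries no scheduler or metadata-database dependency and can run on a remote worker that talks to Airflow over the task execution interface. Language SDKs for Go, R, or Julia would implement the same contract; the DAG and task names here are hypothetical.

```python
# Sketch: a task written against the Task SDK alone, suitable for remote execution.
import pendulum
from airflow.sdk import dag, task


@dag(schedule=None, start_date=pendulum.datetime(2025, 1, 1), catchup=False)
def remote_friendly():
    @task
    def crunch_numbers() -> int:
        # Variables and Connections are resolved through the execution
        # interface rather than by reading the Airflow database directly.
        return sum(range(100))

    crunch_numbers()


remote_friendly()
```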

Apache Airflow has emerged as the de facto standard for data orchestration. Over the last couple of years, Airflow has also seen increasing adoption for ML and AI use cases. It has been almost four years since the release of Airflow 2, and as a community we have agreed that it's time for a major foundational release in the form of Airflow 3. This talk will introduce the vision behind Airflow 3, including the emerging technology trends in the industry and how Airflow will evolve in response. Specifically, this will include an overview of the architectural changes in Airflow to support emerging use cases and distributed data infrastructure models. This talk will also introduce the major features and the desired outcomes of the release. Because Airflow 3 is a foundational release, the talk will likewise introduce the new concepts arriving with it, some of which may only be fully realized in follow-on 3.x releases. The goal of this talk is to raise awareness about Airflow 3 and to get feedback from the Airflow community while the release is still in the development phase.

Over the last few years, we've spent countless hours talking to data engineers everywhere from Fortune 500s to seed-stage startups. In doing so, we've learned what it takes to deliver a world-class Airflow service. We've packaged all of that up into the Astro Hypervisor, a new part of our platform that gives users a whole new level of control over Airflow. We'll talk through how we built the hypervisor and how our customers can use it for autoscaling, tracking the health of Airflow environments, and much more.

Introduced in Airflow 2.4, Datasets are a foundational feature for authoring modular data pipelines. As DAGs grow to span a larger number of data sources and multiple data transformation steps, they typically become less predictable in their timeliness of execution and less efficient. This talk focuses on leveraging Datasets to enable predictable and more efficient DAGs by borrowing patterns from microservice architectures. Just as large monolithic applications were decomposed into microservices to deliver more efficient scalability and faster development cycles, micropipelines have the same potential to radically transform data pipeline efficiency and development velocity. Using a simple financial analysis pipeline example, with data aggregation done in Snowflake and prediction analysis in Spark, this talk outlines how to retain the timeliness of data pipelines even as the underlying data sets expand.
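A sketch of the micropipeline pattern with Airflow 2.4 Datasets, mirroring the example above: the Snowflake aggregation and the Spark prediction live in separate DAGs stitched together by a Dataset rather than in one monolith. The URI, DAG ids, and task bodies are hypothetical.

```python
# Sketch: two micropipelines connected by a Dataset instead of one monolithic DAG.
import pendulum
from airflow.datasets import Dataset
from airflow.decorators import dag, task

aggregates = Dataset("snowflake://analytics/daily_aggregates")


@dag(schedule="@daily", start_date=pendulum.datetime(2024, 1, 1), catchup=False)
def aggregate_in_snowflake():
    @task(outlets=[aggregates])
    def run_aggregation():
        ...  # run the Snowflake aggregation; success marks the Dataset updated

    run_aggregation()


@dag(schedule=[aggregates], start_date=pendulum.datetime(2024, 1, 1), catchup=False)
def predict_in_spark():
    @task
    def run_prediction():
        ...  # submit the Spark job as soon as fresh aggregates land

    run_prediction()


aggregate_in_snowflake()
predict_in_spark()
```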

In Apache Airflow, XCom is the default mechanism for passing data between tasks in a DAG. In practice, this has been restricted to small data elements, since XCom data is persisted in the Airflow metadata database and is constrained by database and performance limitations. With the TaskFlow API introduced in Airflow 2.0, passing data between tasks is seamless and the use of XCom is invisible. However, the ability to pass data is restricted to a relatively small set of data types that can be natively serialized to JSON. This tutorial describes how to go beyond these limitations by developing and deploying a custom XCom backend within Airflow to enable the sharing of large and varied data elements, such as Pandas data frames, between tasks in a data pipeline, using cloud storage such as Google Cloud Storage or Amazon S3.
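A sketch of such a backend, assuming the Amazon provider's S3Hook and a hypothetical bucket; the exact serialize_value signature has varied slightly across Airflow 2.x releases. Large DataFrames go to S3 and only a short reference string lands in the metadata database; it would be enabled with xcom_backend = <module>.S3XComBackend in the [core] section of airflow.cfg.

```python
# Sketch: custom XCom backend that offloads pandas DataFrames to S3.
import uuid

import pandas as pd
from airflow.models.xcom import BaseXCom
from airflow.providers.amazon.aws.hooks.s3 import S3Hook

BUCKET = "example-xcom-bucket"  # hypothetical bucket name
PREFIX = "s3-xcom://"


class S3XComBackend(BaseXCom):
    @staticmethod
    def serialize_value(value, **kwargs):
        # DataFrames are written to S3; everything else takes the normal path.
        if isinstance(value, pd.DataFrame):
            key = f"xcom/{uuid.uuid4()}.json"
            S3Hook().load_string(value.to_json(), key=key, bucket_name=BUCKET)
            value = PREFIX + key
        return BaseXCom.serialize_value(value, **kwargs)

    @staticmethod
    def deserialize_value(result):
        value = BaseXCom.deserialize_value(result)
        if isinstance(value, str) and value.startswith(PREFIX):
            body = S3Hook().read_key(key=value[len(PREFIX):], bucket_name=BUCKET)
            return pd.read_json(body)
        return value
```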