talk-data.com
People (33 results)
Ash Berlin-Taylor
Airflow PMC member & Director Airflow Engineering at Astronomer
Kaxil Naik
Airflow PMC member & Committer | Senior Director of Engineering at Astronomer
Amogh Rajesh Desai
Airflow PMC Member & Committer | Senior Software Engineer at Astronomer
Activities & events
Mastering Apache Airflow®: Automation, Scalability & Real-World Success Stories

Join fellow Airflow enthusiasts and industry leaders at the Marriott Downtown for an evening of insightful talks, great food and drinks, and exclusive swag! This event brings together seasoned pros to explore cutting-edge strategies for managing complex DAG codebases, orchestrating large-scale data workflows, and unlocking the powerful new capabilities of Apache Airflow 3. Don't miss this opportunity to connect, learn, and level up your Airflow expertise!

Presentations:
- Contending with Complex DAG Codebases - a Codemodding Approach
- Harmonizing Music Rights at Scale: Airflow for Ownership Resolution and Fair Royalties
- Unlocking the Future of Data Orchestration: Introducing Apache Airflow® 3
Airflow Summit 2024
2024-09-10 · 12:00
Get ready for Airflow Summit 2024! Join Airflow enthusiasts to learn how companies like Uber, Ford, NerdWallet, Walmart, and The Texas Rangers Baseball Club are using Airflow to transform their data orchestration. Save the date to meet with our amazing community at The Westin St. Francis, San Francisco, USA, September 10-12, 2024. Please note: you must register in order to attend.
Maxime Beauchemin – Founder & CEO @ Preset

In the past 18 months, artificial intelligence has not just entered our workspaces – it has taken over. As we stand at the crossroads of innovation and automation, it’s time for a candid reflection on how AI has reshaped our professional lives: where it’s been a game changer, where it’s falling short, and what’s about to shift dramatically in the short term. Since the release of ChatGPT in late 2022, I’ve developed a “first reflex” to augment and accelerate nearly every task with AI. As a founder and CEO, this spans a wide array of responsibilities: fundraising, internal communications, legal, operations, product marketing, finance, and beyond. In this keynote, I’ll cover diverse use cases across all areas of business, offering a comprehensive view of AI’s impact. Join me as I sort through this new reality and try to forecast the future of AI in our work. It’s time for a radical checkpoint. Everything’s changing fast. In some areas, AI has been a slam dunk; in others, it’s been frustrating as hell. And once a few key challenges are tackled, we’re on the cusp of a tsunami of transformation. Three major milestones are right around the corner: top-human-level reasoning, solid memory accumulation and recall, and proper executive skills. How is this going to affect all of us?
Ian Moritz – Growth Product Manager

Airflow is often used for running data pipelines, which themselves connect with other services through the provider system. However, it is also increasingly used as an under-the-hood engine for other projects building on top of the DAG primitive. For example, Cosmos is a framework for automatically transforming dbt DAGs into Airflow DAGs, so that users can supplement the developer experience of dbt with the power of Airflow. This session dives into how a select group of these frameworks (Cosmos, Meltano, Chronon) use Airflow as an engine for orchestrating the complex workflows their systems depend on. In particular, we will discuss ways that we’ve increased Airflow performance to meet application-specific demands (high-task-count Cosmos DAGs, streaming jobs in Chronon), new Airflow features that will evolve how these frameworks use Airflow under the hood (DAG versioning, dataset integrations), and paths we see these projects taking over the next few years as Airflow grows. Airflow is not just a DAG platform – it’s an application platform!
Deepan Ignaatious – Sr. Product Manager at DoubleCloud

With recent work in the direction of executor decoupling and growing interest in hybrid execution, we find it is still quite common for Airflow users to rely on old rules of thumb like “Don’t use Airflow with LocalExecutor in production” or “If your scheduler lags, split your DAGs over two separate Airflow clusters”. In this talk, we will present a deep-dive comparison of the various execution models Airflow supports and, hopefully, update the common understanding of their efficiency and limitations.
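For reference, the execution model the rules of thumb above revolve around is selected through Airflow's configuration. A minimal sketch (illustrative value, not a recommendation):

```ini
; airflow.cfg -- selecting the execution model (illustrative, not a recommendation)
[core]
; LocalExecutor runs tasks as subprocesses on the scheduler's host;
; CeleryExecutor and KubernetesExecutor hand tasks to remote workers.
executor = LocalExecutor
```

The same setting can be overridden with the `AIRFLOW__CORE__EXECUTOR` environment variable, which is the usual route in containerized deployments.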
Unlocking FMOps/LLMOps with Airflow: A guide to operationalizing and managing Large Language Models
2024-07-01
Parnab Basak – Solutions Architect @ Amazon Web Services

In the last few years, Large Language Models (LLMs) have risen to prominence as outstanding tools capable of transforming businesses. However, bringing such solutions and models into business-as-usual operations is not an easy task. In this session, we delve into the operationalization of generative AI applications using MLOps principles, leading to the introduction of foundation model operations (FMOps) or LLM operations using Apache Airflow. We further zoom into aspects of expected people and process mindsets, new techniques for model selection and evaluation, data privacy, and model deployment. Additionally, learn how you can use the prescriptive features of Apache Airflow to aid your operational journey. Whether you are building with out-of-the-box models (open-source or proprietary), creating new foundation models from scratch, or fine-tuning an existing model, the structured approaches described here will help you effectively integrate LLMs into your operations, enhancing efficiency and productivity without causing disruptions in the cloud or on premises.
Jeetendra Vaidya – Solutions Architect at Amazon Web Services, Joseph Morotti – Sr. Solutions Architect at AWS, Sriharsh Adari – Sr. Solutions Architect at AWS

Nowadays, conversational AI is no longer exclusive to large enterprises. It has become more accessible and affordable, opening up new possibilities and business opportunities. In this session, discover how you can leverage Generative AI as your AI pair programmer to suggest DAG code and recommend entire functions in real-time, directly from your editor. Visualize how to harness the power of ML, trained on billions of lines of code, to transform natural language prompts into coding suggestions. Seamlessly cycle through lines of code, complete function suggestions, and choose to accept, reject, or edit them. Witness firsthand how Generative AI provides recommendations based on the project’s context and style conventions. The objective is to equip you with techniques that allow you to spend less time on boilerplate and repetitive code patterns, and more time on what truly matters: building exceptional orchestration software.
Gunnar Lykins – Data Engineering, FanDuel Group

FanDuel Group, an industry leader in sports-tech entertainment, is proud to be recognized as the #1 sports betting company in the US as of 2023, with 53.4% market share. With a workforce exceeding 4,000 employees, including over 100 data engineers, FanDuel Group is at the forefront of innovation in batch-processing orchestration platforms. Currently, our platform handles over 250,000 DAG runs and executes roughly 3 million tasks monthly across 17 deployments. It provides a standardized framework for pipeline development, structured observability, monitoring, and alerting. It also offers automated data processing managed by an in-house team, enabling stakeholders to concentrate on core business objectives. Our batch ingestion platform is the backbone of countless use cases, facilitating the landing of data into storage at scheduled intervals, real-time ingestion of micro-batches triggered by events, standardization processes, and ensuring data availability for downstream applications. This session also delves into our forward-looking tech strategy, as well as the expansion of orchestration diversity by integrating scheduled jobs from various domains into our robust data platform.
Bhavesh Jaisinghani – Data Engineering Manager at Autodesk

In today’s data-driven era, ensuring data reliability and enhancing our testing and development capabilities are paramount. Local unit testing has its merits but falls short when dealing with the volume of big data. One major challenge is running Spark jobs pre-deployment to ensure they produce expected results and handle production-level data volumes. In this talk, we will discuss how Autodesk leveraged Astronomer to improve pipeline development. We’ll explore how it addresses challenges with sensitive and large data sets that cannot be transferred to local machines or non-production environments. Additionally, we’ll cover how this approach supports over 10 engineers working simultaneously on different feature branches within the same repo. We will highlight the benefits, such as conflict-free development and testing, and eliminating concerns about data corruption when running DAGs on production Airflow servers. Join me to discover how solutions like Astronomer empower developers to work with increased efficiency and reliability. This talk is perfect for those interested in big data, cloud solutions, and innovative development practices.
LinkedIn's Continuous Deployment
2024-07-01
Keshav Tyagi – Staff Software Engineer @ LinkedIn, Rahul Gade – Staff Software Engineer at LinkedIn

LinkedIn Continuous Deployment (LCD) started with the goal of improving the deployment experience and expanding its reach to all LinkedIn systems. LCD delivers a modern deployment UX and easy-to-customize pipelines, enabling all LinkedIn applications to declare their deployment pipelines. LCD’s vision is to automate cluster provisioning and deployments, and to enable touchless (continuous) deployments while reducing the manual toil involved. LCD is powered by Airflow to orchestrate its deployment pipelines and automate the validation steps. For our customers, Airflow is an implementation detail that we have abstracted away behind our no-code/low-code pipelines. Users describe their pipeline intent (via CLI/UI) and LCD translates that intent into Airflow DAGs. LCD pipelines are built of steps. In order to democratize adoption of LCD, we have leveraged the K8sPodOperator to run steps inside the pipeline: LCD partner teams expose validation actions as containers, which the LCD pipeline runs as steps. At full scale, LCD will have 10K+ DAGs running in parallel.
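The intent-to-DAG translation described above can be sketched in plain Python. This is an illustrative model, not LinkedIn's code; the intent schema, field names, and image tags are all hypothetical, and a real implementation would emit KubernetesPodOperator tasks rather than plain dicts.

```python
# Illustrative sketch (hypothetical schema, not LinkedIn's actual code):
# expanding a declarative pipeline intent into ordered container-step specs,
# mirroring the one-container-per-step pattern described in the abstract.

def intent_to_steps(intent: dict) -> list[dict]:
    """Expand a pipeline intent into ordered container-step specs."""
    steps = []
    previous = None
    for step in intent["steps"]:
        spec = {
            "task_id": step["name"],
            "image": step["image"],   # partner team's validation container
            "cmds": step.get("cmds", []),
            "upstream": previous,     # linear ordering, for simplicity
        }
        steps.append(spec)
        previous = step["name"]
    return steps

# Hypothetical pipeline intent, as a user might declare it via CLI/UI.
intent = {
    "pipeline": "deploy-my-service",
    "steps": [
        {"name": "canary", "image": "lcd/canary-check:1.0"},
        {"name": "validate", "image": "lcd/validate:2.3"},
        {"name": "promote", "image": "lcd/promote:1.1"},
    ],
}
specs = intent_to_steps(intent)
print([s["task_id"] for s in specs])  # ['canary', 'validate', 'promote']
print(specs[1]["upstream"])           # 'canary'
```

In the real system, each spec here would presumably become a KubernetesPodOperator task whose image is the partner team's validation container.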
Empowering Airflow Users: A framework for performance testing and transparent resource optimization
2024-07-01
Bartosz Jankiewicz – Engineering Manager (Cloud Composer, Google)

Apache Airflow is the backbone of countless data pipelines, but optimizing performance and resource utilization can be a challenge. This talk introduces a novel performance testing framework designed to measure, monitor, and improve the efficiency of Airflow deployments. I’ll delve into the framework’s modular architecture, showcasing how it can be tailored to various Airflow setups (Docker, Kubernetes, cloud providers). By measuring key metrics across schedulers, workers, triggers, and databases, this framework provides actionable insights to identify bottlenecks and compare performance across different versions or configurations. Attendees will learn:
- The motivation behind developing a standardized performance testing approach.
- Key design considerations and challenges in measuring performance across diverse Airflow environments.
- How to leverage the framework to construct test suites for different use cases (e.g., version comparison).
- Practical tips for interpreting performance test results and making informed decisions about resource allocation.
- How this framework contributes to greater transparency in Airflow release notes, empowering users with performance data.
Howie Wang – Member of Technical Staff at OpenAI

As organizations grow, the task of creating and managing Airflow DAGs efficiently becomes a challenge. In this talk, we will delve into innovative approaches to streamlining Airflow DAG creation using YAML. By leveraging YAML configuration, we allow users to dynamically generate Airflow DAGs without requiring Python expertise or deep knowledge of Airflow primitives. We will showcase the significant benefits of this approach, including eliminating duplicate configurations, simplifying DAG management for a large group of workflows, and ultimately enhancing productivity within large organizations. Join us to learn practical strategies to optimize workflow orchestration, reduce development overhead, and facilitate seamless collaboration across teams.
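The YAML-driven approach can be sketched as follows. The config schema and helper names here are hypothetical, not the speaker's framework; a real implementation would load the config with `yaml.safe_load` and emit Airflow operators rather than the plain dicts used here to keep the sketch self-contained.

```python
# Hypothetical YAML-driven DAG generation (illustrative schema, not the
# speaker's actual framework). In practice the config would come from
# yaml.safe_load(open("dag.yaml")); a dict literal stands in for it here.
config = {
    "dag_id": "daily_revenue",
    "schedule": "@daily",
    "tasks": {
        "extract": {"operator": "python", "upstream": []},
        "transform": {"operator": "python", "upstream": ["extract"]},
        "load": {"operator": "python", "upstream": ["transform"]},
    },
}

def build_dag(cfg: dict) -> dict:
    """Turn a declarative config into a DAG spec with resolved dependencies."""
    tasks = {}
    for task_id, spec in cfg["tasks"].items():
        tasks[task_id] = {
            "operator": spec["operator"],
            "upstream": list(spec.get("upstream", [])),
        }
    # Validate that every declared dependency refers to a real task.
    for task_id, spec in tasks.items():
        for dep in spec["upstream"]:
            if dep not in tasks:
                raise ValueError(f"{task_id} depends on unknown task {dep}")
    return {"dag_id": cfg["dag_id"], "schedule": cfg["schedule"], "tasks": tasks}

dag = build_dag(config)
print(dag["tasks"]["load"]["upstream"])  # ['transform']
```

The payoff is that adding a workflow means editing a config file, not writing Python, which is what makes the pattern accessible to non-engineers.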
Scaling AI Workloads with Apache Airflow
2024-07-01
Rajesh Bishundeo – Software Development Manager for the Open-Source Airflow Team at AWS, Shubham Mehta – Senior Product Manager @ AWS Analytics

AI workloads are becoming increasingly complex, with unique requirements around data management, compute scalability, and model lifecycle management. In this session, we will explore the real-world challenges users face when operating AI at scale. Through concrete examples, we will uncover common pitfalls in areas like data versioning, reproducibility, model deployment, and monitoring. Our practical guide will highlight strategies for building robust and scalable AI platforms leveraging Airflow as the orchestration layer and AWS for its extensive AI/ML capabilities. We will showcase how users have tackled these challenges, streamlined their AI workflows, and unlocked new levels of productivity and innovation.
Ramesh Babu – Sr Engineering Manager at Procore

Our idea is to platformize ingestion pipelines, driven by Airflow in the background, and streamline the entire ingestion process for self-service. With customer experience on top, and the goal of making data ingestion foolproof as part of the Analytics data team, Airflow is a natural complement to our vision.
Pankaj Singh – Senior Software Engineer at Astronomer | Apache Airflow Committer, Pankaj Koti – Software Engineer at Astronomer

Airflow, an open-source platform for orchestrating complex data workflows, is widely adopted for its flexibility and scalability. However, as workflows grow in complexity and scale, optimizing Airflow performance becomes crucial for efficient execution and resource utilization. This session delves into the importance of optimizing Airflow performance and provides strategies, techniques, and best practices to enhance workflow execution speed, reduce resource consumption, and improve system efficiency. Attendees will gain insights into identifying performance bottlenecks, fine-tuning workflow configurations, leveraging advanced features, and implementing optimization strategies to maximize pipeline throughput. Whether you’re a seasoned Airflow user or just getting started, this session equips you with the knowledge and tools needed to optimize your Airflow deployments for performance and scalability. We’ll also explore DAG writing best practices, monitoring and updating Airflow configurations, and database performance optimization, covering unused indexes, missing indexes, and minimizing table and index bloat.
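For illustration, a few of the scheduler and concurrency settings this kind of tuning typically revisits (the values are placeholders; appropriate numbers depend entirely on the deployment):

```ini
; airflow.cfg -- common tuning knobs (placeholder values)
[scheduler]
; Number of processes that parse DAG files in parallel.
parsing_processes = 2
; Minimum interval (seconds) before the same DAG file is re-parsed.
min_file_process_interval = 30

[core]
; Upper bound on concurrently running task instances within one DAG.
max_active_tasks_per_dag = 16
```

Raising parallelism helps scheduling latency at the cost of database and CPU load, which is exactly the trade-off that makes monitoring these settings worthwhile.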
Vikram Koka – Chief Strategy Officer @ Astronomer, Ash Berlin-Taylor – Airflow PMC member & Director Airflow Engineering at Astronomer

Imagine a world where writing Airflow tasks in languages like Go, R, Julia, or maybe even Rust is not just a dream but a native capability. Say goodbye to BashOperators; welcome to the future of Airflow task execution. Here’s what you can expect to learn from this session:
- Multilingual tasks: Explore how we empower DAG authors to write tasks in any language while retaining seamless access to Airflow Variables and Connections.
- Simplified development and testing: Discover how a standardized interface for task execution promises to streamline development efforts and elevate code maintainability.
- Enhanced scalability and remote workers: Learn how enabling tasks to run on remote workers opens up possibilities for seamless deployment on diverse platforms, including Windows and remote Spark or Ray clusters.
Join us as we embark on an exploratory journey to shape the future of Airflow task execution. Your insights and contributions are invaluable as we refine this vision together. Let’s chart a course towards a more versatile, efficient, and accessible Airflow ecosystem.
Gil Reich – Data Engineer at Wix

Feeling trapped in a maze of duplicate Airflow DAG code? We were too! That’s why we embarked on a journey to build a centralized library, eliminating redundancy and unlocking delightful efficiency. Join us as we share:
- The struggles of managing repetitive code across DAGs
- Our approach to a centralized library, revealing design and implementation strategies
- The amazing results: reduced development time, clean code, effortless maintenance, and a framework that creates efficient and self-documenting DAGs
Let’s break free from complexity and duplication, and build a brighter Airflow future together!
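The centralized-library idea is commonly realized as a DAG factory. The sketch below is illustrative, not Wix's library; every name and parameter is hypothetical, and a real version would return an `airflow.DAG` object rather than a spec dict.

```python
# Illustrative DAG-factory sketch (hypothetical names, not Wix's library).
# Instead of copy-pasting boilerplate into every DAG file, teams call one
# factory with only the parameters that actually differ per pipeline.

DEFAULTS = {"retries": 2, "owner": "data-eng", "sla_minutes": 60}

def make_etl_dag(dag_id: str, source: str, target: str, **overrides) -> dict:
    """Build a standard extract->transform->load DAG spec from shared defaults."""
    params = {**DEFAULTS, **overrides}  # per-DAG overrides win over defaults
    return {
        "dag_id": dag_id,
        "params": params,
        "tasks": [
            {"task_id": "extract", "arg": source},
            {"task_id": "transform", "arg": f"{source}->{target}"},
            {"task_id": "load", "arg": target},
        ],
    }

dag = make_etl_dag("orders_daily", source="mysql.orders",
                   target="dwh.orders", retries=5)
print(dag["params"]["retries"])  # 5 (the override wins over the shared default)
print([t["task_id"] for t in dag["tasks"]])
```

Because defaults live in one place, a fix to retry or SLA policy propagates to every DAG built by the factory, which is where the "effortless maintenance" payoff comes from.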
Evolution of Airflow at Uber
2024-07-01
Shobhit Shah – Staff Software Engineer at Uber, Sumit Maheshwari – Tech Lead at Uber, Apache Airflow PMC

Up until a few years ago, teams at Uber used multiple data workflow systems: some based on open-source projects such as Apache Oozie, Apache Airflow, and Jenkins, others custom-built solutions written in Python and Clojure. Every user who needed to move data around had to learn about and choose from these systems, depending on the specific task they needed to accomplish. Each system carried additional maintenance and operational burdens: keeping it running, troubleshooting issues, fixing bugs, and educating users. After evaluating these systems, and with the goal of converging on a single workflow system capable of supporting Uber’s scale, we settled on an Airflow-based system. The Airflow-based DSL provided the best trade-off of flexibility, expressiveness, and ease of use while remaining accessible to our broad range of users: data scientists, developers, machine learning experts, and operations employees. This talk will focus on scaling Airflow to Uber’s scale and providing a no-code, seamless user experience.
Jet Mariscal – Tech Lead @ Cloudflare

While Airflow is widely known for orchestrating and managing workflows, particularly in the context of data engineering, data science, ML (machine learning), and ETL (extract, transform, load) processes, its flexibility and extensibility make it a highly versatile tool suitable for a variety of use cases beyond these domains. In fact, Cloudflare has publicly shared an example of how Airflow was leveraged to build a system that automates datacenter expansions. In this talk, I will share a few more of our use cases beyond traditional data engineering, demonstrating Airflow’s sophisticated capabilities for orchestrating a wide variety of complex workflows, and discussing how Airflow played a crucial role in building some of the highly successful autonomous systems at Cloudflare: from handling automated bare-metal server diagnostics and recovery at scale, to Zero Touch Provisioning that is helping us accelerate the rollout of inference-optimized GPUs in 150+ cities across multiple countries.