talk-data.com

Topic

Astronomer

airflow data_orchestration cloud

Activities

4 activities · Newest first

Sponsored by: Astronomer | Scaling Data Teams for the Future

The role of data teams and data engineers is evolving. No longer just pipeline builders or dashboard creators, today’s data teams must drive business strategy, enable automation, and scale with growing demands. Best practices from the software engineering world and the DevOps movement (Agile development, CI/CD, and infrastructure as code) are gradually making their way into data engineering. We believe these changes have led to the rise of DataOps and a new wave of best practices that will transform the discipline of data engineering. But how do you transform a reactive team into a proactive force for innovation? We’ll explore the key principles for building a resilient, high-impact data team, from structuring for collaboration, testing, and automation to leveraging modern orchestration tools. Whether you’re leading a team or looking to future-proof your career, you’ll walk away with actionable insights on how to stay ahead in the rapidly changing data landscape.

Sponsored by: Astronomer | Unlocking the Future of Data Orchestration: Introducing Apache Airflow® 3

Airflow 3 is here, bringing a new era of flexibility, scalability, and security to data orchestration. This release makes building, running, and managing data pipelines easier than ever. In this session, we will cover the key benefits of Airflow 3, including: (1) Ease of Use: Airflow 3 rethinks the user experience, from an intuitive, upgraded UI to DAG Versioning and scheduler-integrated backfills that let teams manage pipelines more effectively than ever before. (2) Stronger Security: By decoupling task execution from direct database connections, Airflow 3 enforces task isolation and minimal-privilege access. This meets stringent compliance standards while reducing the risk of unauthorized data exposure. (3) Ultimate Flexibility: Run tasks anywhere, anytime with remote execution and event-driven scheduling. Airflow 3 is designed for global, heterogeneous modern data environments, with an architecture that supports deployments ranging from edge and hybrid cloud to GPU-based environments.

Scaling AI workloads with Ray & Airflow

Ray is an open-source framework for scaling Python applications, particularly machine learning and AI workloads. It provides the layer for parallel processing and distributed computing. Many large language models (LLMs), including OpenAI's GPT models, are trained using Ray.

Apache Airflow, on the other hand, is a well-established data orchestration framework that is downloaded more than 20 million times per month.

This talk presents the Airflow Ray provider package, which lets users interact with Ray from an Airflow workflow. I'll show how to use the package to create Ray clusters and how Airflow can trigger Ray pipelines on those clusters.

Ten years of building open source standards: From Parquet to Arrow to OpenLineage | Astronomer

ABOUT THE TALK: Over the last decade I have been lucky enough to contribute a few successful open source projects to the data ecosystem. In this talk, Julien Le Dem shares the story of his contributions to successful open source projects in the data ecosystem and what made their success possible. He starts with the ideation process and early growth of the Apache Parquet columnar format and how this led to the creation of its in-memory alter ego, Apache Arrow. Julien ends by showing how this experience enabled the success of OpenLineage, an LF AI & Data project that brings observability to the data ecosystem.

ABOUT THE SPEAKER: Julien Le Dem is the Chief Architect of Astronomer and Co-Founder of Datakin. He co-created Apache Parquet and is involved in several open source projects, including OpenLineage, Marquez (LF AI & Data), Apache Arrow, Apache Iceberg, and a few others. Previously, he was a senior principal at WeWork, principal architect at Dremio, tech lead for Twitter’s data processing tools, and principal engineer working on content platforms at Yahoo, where he received his Hadoop initiation.

ABOUT DATA COUNCIL: Data Council (https://www.datacouncil.ai/) is a community and conference series that provides data professionals with the learning and networking opportunities they need to grow their careers.
