talk-data.com

Topic

AWS

Amazon Web Services (AWS)

cloud, cloud provider, infrastructure, services

Activity Trend: peak of 190 per quarter, 2020-Q1 to 2026-Q1

Top Events

AWS re:Invent 2024: 341
Data Engineering Podcast: 56
O'Reilly Data Engineering Books: 54
SaaS Scaled - Interviews about SaaS Startups, Analytics, & Operations: 35
Databricks DATA + AI Summit 2023: 32
Data + AI Summit 2025: 31
O'Reilly Data Science Books: 15
Airflow Summit 2025: 11
Airflow Summit 2023: 7
DataFramed: 7
Airflow Summit 2024: 6
DATA MINER Big Data Europe Conference 2020: 6

Top Speakers

Tobias Macey: 56
Marina Novikova (AWS): 23
Josh Kodroff (Pulumi): 17
AWS-Certified Expert (AWS): 14
Engin Diri (Pulumi): 11
Diana Esteves (Pulumi): 11
Richie (DataCamp): 5
IBM: 4
Rajesh Bishundeo: 4
Ringo De Smet (Pulumi): 4
Monique Femme (PUCRS): 4
Noritaka Sekiyama (Amazon Web Services (AWS)): 3

Activities

Filtering by: Airflow Summit 2024

Activating operational metadata with Airflow, Atlan and OpenLineage

2024-07-01 · Airflow Summit 2024

session

by Kacper Muda

Airflow, Azure, Data Collection, dbt, GCP, Python, Spark, SQL

OpenLineage is an open standard for lineage data collection. It is integrated into the Airflow codebase and facilitates lineage collection across providers such as Google, Amazon, and more. Atlan Data Catalog is a third-generation active metadata platform that serves as a single source of trust, unifying cataloging, data discovery, lineage, and governance. We will demonstrate what OpenLineage is and how, with minimal and intuitive setup across Airflow and Atlan, it provides a unified view of workflows and efficient cross-platform lineage collection, including column-level lineage, across various technologies (Python, Spark, dbt, SQL, etc.) and clouds (AWS, Azure, GCP, etc.), all orchestrated by Airflow. This integration unlocks further use cases in automated metadata management by making operational pipelines dataset-aware for self-service exploration. We will also demonstrate real-world challenges and resolutions for lineage consumers in improving audit and compliance accuracy through column-level lineage traceability across the data estate. The talk will also briefly cover the most recent OpenLineage developments and planned future enhancements.

Elevating Machine Learning Deployment: Unleashing the Power of Airflow in Wix's ML Platform

2024-07-01 · Airflow Summit 2024

session

by Elad Yaniv

AI/ML, Airflow, API, Data Science, Amazon SageMaker

In his presentation, Elad will offer a novel take on Airflow, highlighting its versatility beyond the conventional use for scheduled pipelines. He’ll discuss its application as an on-demand tool for starting and stopping jobs, mainly in Data Science fields, such as dataset enrichment and batch prediction via API calls, complete with real-time status tracking and alerts. The talk aims to encourage a fresh approach to using Airflow, and it will also delve into the technical aspects of implementing DAG triggering and cancellation logic. What the audience will learn: a real-life use case of leveraging Airflow capabilities beyond traditional pipeline scheduling, with Airflow integrated as the infrastructure for an ML platform; how to trigger on-demand DAGs through the API; how to cancel running DAGs; a demonstration of an end-to-end ML pipeline that uses Amazon SageMaker for batch predictions; and some more Airflow best practices. Join us to learn from Wix’s experience and best practices!
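As a rough illustration of on-demand triggering and cancellation, the sketch below builds requests against the Airflow 2.x stable REST API. It is not Wix's implementation: the base URL, DAG id, and run id are hypothetical, authentication is omitted, and marking a run as failed is only one possible way to stop it.

```python
import json
import urllib.request

# Hypothetical base URL of an Airflow 2.x webserver exposing the stable REST API.
BASE_URL = "http://localhost:8080/api/v1"


def build_trigger_request(dag_id: str, run_id: str, conf: dict) -> urllib.request.Request:
    """Build a POST request that creates (triggers) a new DAG run."""
    body = json.dumps({"dag_run_id": run_id, "conf": conf}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/dags/{dag_id}/dagRuns",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def build_cancel_request(dag_id: str, run_id: str) -> urllib.request.Request:
    """Build a PATCH request that sets the state of a DAG run to failed."""
    body = json.dumps({"state": "failed"}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/dags/{dag_id}/dagRuns/{run_id}",
        data=body,
        method="PATCH",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical DAG id, run id, and run configuration, for illustration only.
trigger = build_trigger_request("batch_predict", "manual-001", {"dataset": "example"})
cancel = build_cancel_request("batch_predict", "manual-001")
```

Either request could then be sent with `urllib.request.urlopen` once credentials for the target Airflow deployment are attached.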

Growing with Apache Airflow: A Providers Journey

2024-07-01 · Airflow Summit 2024

session

by Rajesh Bishundeo

Airflow

It has been nearly four years since the launch of Managed Workflows for Apache Airflow (MWAA) by AWS. Like any new idea, it has gone through its trials and tribulations: working with customers to better understand its shortcomings, building dedicated teams focused on scaling and growth, and, at its core, preserving the integrity and functionality of Apache Airflow. Initially launched with Airflow 1.10, MWAA is now available globally in multiple AWS regions, supporting the latest version of Airflow along with a multitude of features. In this talk, we will cover a bit of that history and debunk a few myths surrounding the critical needs of users today. Covering compliance requirements, larger environments, observability, and pricing, we will discuss how MWAA has evolved and continues to grow through its focus on customer value and, more importantly, its dedication to the Apache Airflow community.

Hello Quality: Building CIs to run Providers Packages System Tests

2024-07-01 · Airflow Summit 2024

session

by Freddy Demiane, Rahul Vats, Dennis Ferruzzi

Airflow, Astronomer, CI/CD

Airflow operators are a core feature of Apache Airflow, and it is extremely important that we maintain a high quality of operators and prevent regressions. Automated test results also help developers double-check that their changes do not cause regressions or backward-incompatible changes, and they tell Airflow release managers whether a given version of a provider is ready to be released. Recently, a new approach to assuring production quality was implemented for AWS, Google and Astronomer-provided operators: standalone Continuous Integration processes were configured for them, and test dashboards show the results of the latest test runs. What has worked well for these providers might be a pattern for others to follow. In this presentation, AWS, Google and Astronomer engineers will share the internals of the test dashboards implemented for these operators. This approach might be a ‘blueprint’ for other providers.

Scaling AI Workloads with Apache Airflow

2024-07-01 · Airflow Summit 2024

session

by Rajesh Bishundeo, Shubham Mehta (AWS Analytics)

AI/ML, Airflow, Data Management

AI workloads are becoming increasingly complex, with unique requirements around data management, compute scalability, and model lifecycle management. In this session, we will explore the real-world challenges users face when operating AI at scale. Through real-world examples, we will uncover common pitfalls in areas like data versioning, reproducibility, model deployment, and monitoring. Our practical guide will highlight strategies for building robust and scalable AI platforms leveraging Airflow as the orchestration layer and AWS for its extensive AI/ML capabilities. We will showcase how users have tackled these challenges, streamlined their AI workflows, and unlocked new levels of productivity and innovation.

Simplified user management in Airflow

2024-07-01 · Airflow Summit 2024

session

by Vincent Beck

Airflow

Before Airflow 2.9, user management was part of core Airflow, so modifying or customizing it to fit user needs was not an easy process. Authentication and authorization managers (auth managers) are a new concept introduced in Airflow 2.9 as extensible user management (AIP-56), giving Airflow users a flexible way to integrate with their organization’s identity services. Organizations want a single place to manage permissions, and FAB (Flask App Builder) made that difficult to achieve. In this talk, after explaining the concept of auth managers and why we built them, we will show how you can leverage the new auth manager interface to build an authorization service for Airflow based on your existing identity provider. We will see that auth managers can considerably change how users and their permissions are managed in an Airflow environment. Finally, we will dive deep into the AWS auth manager as an alternative auth manager and look at some example usages.
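The idea of a pluggable auth manager can be sketched in a few lines of Python. This is a toy model only: `ToyAuthManager`, `GroupPolicyAuthManager`, `User`, and the policy format are hypothetical names invented for illustration and are far simpler than Airflow's actual auth manager interface. The sketch assumes that authorization decisions are delegated to group membership reported by an external identity provider.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    """A user as reported by a hypothetical identity provider."""
    name: str
    groups: frozenset


class ToyAuthManager(ABC):
    """Hypothetical, heavily simplified stand-in for an auth manager interface."""

    @abstractmethod
    def is_authorized(self, user: User, method: str, resource: str) -> bool:
        """Return True if the user may perform `method` on `resource`."""


class GroupPolicyAuthManager(ToyAuthManager):
    """Authorizes requests from identity-provider groups instead of built-in roles."""

    def __init__(self, policy: dict):
        # Maps a group name to the set of (method, resource) pairs it allows.
        self._policy = policy

    def is_authorized(self, user: User, method: str, resource: str) -> bool:
        return any(
            (method, resource) in self._policy.get(group, set())
            for group in user.groups
        )


# Hypothetical policy and user, for illustration only.
policy = {
    "data-engineers": {("GET", "dag"), ("PUT", "dag")},
    "analysts": {("GET", "dag")},
}
manager = GroupPolicyAuthManager(policy)
analyst = User(name="ana", groups=frozenset({"analysts"}))
```

In this toy setup, swapping `GroupPolicyAuthManager` for another `ToyAuthManager` subclass changes how permissions are resolved without touching the code that calls `is_authorized`, which is the kind of flexibility the abstract describes.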