In Apache Airflow, XCom is the default mechanism for passing data between tasks in a DAG. In practice, it has been restricted to small data elements, since XCom data is persisted in the Airflow metadata database and is therefore constrained by database size and performance limits. With the TaskFlow API introduced in Airflow 2.0, passing data between tasks is seamless and the use of XCom is invisible to the DAG author. However, the data that can be passed is limited to a relatively small set of types that can be natively converted to JSON. This tutorial describes how to go beyond these limitations by developing and deploying a custom XCom backend in Airflow, enabling the sharing of large and varied data elements, such as Pandas DataFrames, between tasks in a data pipeline using cloud storage such as Google Cloud Storage or Amazon S3.
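To make the approach concrete, here is a minimal sketch of such a custom XCom backend targeting Amazon S3. It is not the tutorial's exact implementation: the class name, the bucket name (`my-xcom-bucket`), the key prefix, and the choice of CSV serialization are illustrative assumptions, and it requires the Amazon provider package to be installed.

```python
import uuid

import pandas as pd
from airflow.models.xcom import BaseXCom
from airflow.providers.amazon.aws.hooks.s3 import S3Hook


class S3XComBackend(BaseXCom):
    # Illustrative bucket and reference prefix; adjust to your environment.
    PREFIX = "xcom_s3://"
    BUCKET_NAME = "my-xcom-bucket"

    @staticmethod
    def serialize_value(value, **kwargs):
        # Intercept DataFrames: upload the data to S3 and keep only a small
        # reference string in the Airflow metadata database.
        if isinstance(value, pd.DataFrame):
            hook = S3Hook()
            key = f"xcom/data_{uuid.uuid4()}.csv"
            csv_bytes = value.to_csv(index=False).encode("utf-8")
            hook.load_bytes(
                csv_bytes,
                key=key,
                bucket_name=S3XComBackend.BUCKET_NAME,
                replace=True,
            )
            value = S3XComBackend.PREFIX + key
        # Fall back to the default JSON-based serialization for everything else.
        return BaseXCom.serialize_value(value, **kwargs)

    @staticmethod
    def deserialize_value(result):
        # Detect reference strings written by serialize_value and rebuild the
        # DataFrame from the object stored in S3.
        value = BaseXCom.deserialize_value(result)
        if isinstance(value, str) and value.startswith(S3XComBackend.PREFIX):
            hook = S3Hook()
            key = value[len(S3XComBackend.PREFIX):]
            local_path = hook.download_file(
                key=key, bucket_name=S3XComBackend.BUCKET_NAME
            )
            value = pd.read_csv(local_path)
        return value
```

Airflow is then pointed at the backend through the `xcom_backend` option in the `[core]` section of airflow.cfg, or the equivalent `AIRFLOW__CORE__XCOM_BACKEND` environment variable, e.g. `AIRFLOW__CORE__XCOM_BACKEND=include.s3_xcom_backend.S3XComBackend` (the module path shown is hypothetical and depends on where the class is placed in your deployment).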