BigQuery is GCP’s serverless, highly scalable, and cost-effective cloud data warehouse that can analyze petabytes of data at high speed. Amazon S3 is one of the oldest and most popular cloud storage offerings. Folks with data in S3 often want to use BigQuery to gain insights into it, and with Apache Airflow they can build pipelines that seamlessly orchestrate that connection. In this talk, Leah walks through how she and a colleague created an easily configurable pipeline to extract data from S3 and load it into BigQuery.

When a team at work mentioned wanting a repeatable process for migrating data stored in S3 to BigQuery, Leah knew Cloud Composer (GCP-hosted Airflow) was the right tool for the job, but she didn’t have much experience with the proprietary file type the data used. Luckily, one of her colleagues did, though they hadn’t worked with Airflow. Leah and her colleague teamed up to build a reusable, easily configurable solution for the team. She will walk you through the problem, the solution, and the process they took to arrive at it, highlighting resources that were especially useful to a first-time Airflow user.
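For context on the pattern the talk describes, here is a minimal sketch of what such a DAG might look like, assuming the standard Amazon and Google provider transfer operators and hypothetical bucket, dataset, and table names. The proprietary file type handled in the talk would need a custom conversion step that isn't shown here; this sketch assumes plain CSV.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.transfers.s3_to_gcs import S3ToGCSOperator
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

with DAG(
    dag_id="s3_to_bigquery_example",
    start_date=datetime(2020, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Stage the S3 objects into a GCS bucket that BigQuery can load from.
    stage_to_gcs = S3ToGCSOperator(
        task_id="stage_to_gcs",
        bucket="example-s3-bucket",          # hypothetical S3 source bucket
        prefix="exports/",                   # hypothetical key prefix
        dest_gcs="gs://example-gcs-bucket/staging/",
        replace=True,
    )

    # Load the staged files into a BigQuery table.
    load_to_bq = GCSToBigQueryOperator(
        task_id="load_to_bq",
        bucket="example-gcs-bucket",         # hypothetical GCS staging bucket
        source_objects=["staging/*"],
        destination_project_dataset_table="example-project.example_dataset.example_table",
        source_format="CSV",                 # the talk's proprietary format would need conversion first
        write_disposition="WRITE_APPEND",
        autodetect=True,
    )

    stage_to_gcs >> load_to_bq
```

The two-step shape (stage to GCS, then load into BigQuery) is the usual approach, since BigQuery loads natively from GCS but not from S3.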
Topic: S3 (Amazon S3, object_storage, cloud_storage, aws)