talk-data.com

Topic

GCP

Google Cloud Platform (GCP)

cloud cloud_provider infrastructure services

Activities

tagged

Activity Trend

31 peak/qtr

2020-Q1 2026-Q1

Top Events

Google Cloud Next '24 911 Google Cloud Next '25 574 Data Engineering Podcast 33 O'Reilly Data Engineering Books 16 Airflow Summit 2025 10 Data + AI Summit 2025 10 Big Data LDN 2025 7 Airflow Summit 2020 6 Straight Data Talk 6 Airflow Summit 2023 5 O'Reilly Data Science Books 4 gartner-data-analytics-uk-2025 4

Top Speakers

Tobias Macey 33 Ivan Nardini (Google Cloud) 9 Enrique Chan (Google) 8 Liz Raymond (Google Cloud) 8 Frank Torres (Google Cloud) 8 Patrick Bentley (Google Cloud) 7 Crispin Velez (Google Cloud) 7 Warren Barkley (Google Cloud) 7 Patrick Cole (Google Cloud) 7 Yasmeen Ahmad (Google Cloud) 7 Thomas Kurian (Google Cloud) 7 Gwendolyn Stripling (Google Cloud) 7

Activities

Showing filtered results

All Video Podcast Book

Filtering by: Airflow Summit 2020 ×

Airflow CI/CD: Github to Cloud Composer (safely)

2020-07-01 · Airflow Summit 2020

session

by Jacob Ferriero (Google)

Airflow BigQuery CI/CD Cloud Computing Dataflow GitHub Cloud Composer

Deploying bad DAGs to your Airflow environment can wreak havoc. This talk provides an opinionated take on a mono repo structure for GCP data pipelines leveraging BigQuery, Dataflow and a series of CI tests for validating your Airflow DAGs before deploying them to Cloud Composer. Composer makes deploying airflow infrastructure easy and deploying DAGs “just dropping files in a GCS bucket”. However, this opens the opportunity for many organizations to shoot themselves in the foot by not following a strong CI/CD process. Pushing bad dags to Composer can manifest in a really sad airflow webserver and many wasted DAG parsing cycles in the scheduler, disrupting other teams using the same environment. This talk will outline a series of recommended continuous integration tests to validate PRs for updating or deploying new Airflow DAGs before pushing them to your GCP Environment with a small “DAGs deployer” application that will manage deploying DAGs following some best practices. This talk will walk through explaining automating these tests with Cloud Build, but could easily be ported to your favorite CI/CD tool.

Airflow the perfect match in our analytics pipeline

2020-07-01 · Airflow Summit 2020

session

by Sergio Camilo Fandiño Hernández

Airflow Analytics BI BigQuery Cloud Computing Redshift

For three years we at LOVOO, a market-leading dating app, have been using the Google Cloud managed version of Airflow, a product we’ve been familiar with since its Alpha release. We took a calculated risk and integrated the Alpha into our product, and, luckily, it was a match. Since then, we have been leveraging this software to build out not only our data pipeline, but also boost the way we do analytics and BI. The speaker will present an overview of the software’s usability for Pipeline Error Alerting through BashOperators that communicate with Slack and will touch upon how they built their Analytics Pipeline (deployment and growth) and currently batch big amounts of data from different sources effectively using Airflow. We will also showcase our PythonOperators-driven RedShift to BigQuery data migration process, as well as offer a guide for creating fully dynamic tasks inside DAG.

From S3 to BigQuery - How a first-time Airflow user successfully implemented a data pipeline

2020-07-01 · Airflow Summit 2020

session

by Leah Cole

Airflow BigQuery Cloud Computing Cloud Storage DWH Cloud Composer S3

BigQuery is GCP’s serverless, highly scalable and cost-effective cloud data warehouse that can analyze petabytes of data at super fast speeds. Amazon S3 is one of the oldest and most popular cloud storage offerings. Folks with data in S3 often want to use BigQuery to gain insights into their data. Using Apache Airflow, they can build pipelines to seamlessly orchestrate that connection. In this talk, Leah walks through how they created an easily configurable pipeline to extract data. When a team at work mentioned wanting to set up a repeatable process for migrating data stored in S3 to BigQuery, Leah knew using Cloud Composer (GCP-hosted Airflow) was the right tool for the job, but she didn’t have much experience with the proprietary file types the data used. Luckily, one of her colleagues did have experience with that proprietary file type, though they hadn’t worked with Airflow. Leah and her colleague teamed up to build a reusable, easily configurable solution for the team. She will walk you through their problem, the solution, and the process they took for coming to that solution, highlighting resources that were especially useful to a first-time Airflow user.

Machine Learning with Apache Airflow

2020-07-01 · Airflow Summit 2020

session

by Daniel Imberman

AI/ML Airflow Cloud Computing Data Science Cloud Functions Cyber Security Spark TensorFlow

This talk discusses how to build an Airflow based data platform that can take advantage of popular ML tools (Jupyter, Tensorflow, Spark) while creating an easy-to-manage/monitor As the field of data science grows in popularity, companies find themselves in need of a single common language that can connect their data science teams and data infrastructure teams. Data scientists want rapid iteration, infrastructure engineers want monitoring and security controls, and product owners want their solutions deployed in time for quarterly reports. This talk will discuss how to build an Airflow based data platform that can take advantage of popular ML tools (Jupyter, Tensorflow, Spark) while creating an easy-to-manage/monitor ecosystem for data infrastructure and support team. In this talk, we will take an idea from a single-machine Jupyter Notebook to a cross-service Spark + Tensorflow pipeline, to a canary tested, production-ready model served on Google Cloud Functions. We will show how Apache Airflow can connect all layers of a data team to deliver rapid results.

Migrating Airflow-based Spark jobs to Kubernetes - the native way

2020-07-01 · Airflow Summit 2020

session

by Roi Teveth (Nielsen Identity Engine) , Itai Yaffe (Nielsen Identity Engine)

Airflow AWS Amazon EMR Kubernetes Spark

At Nielsen Identity Engine, we use Spark to process 10’s of TBs of data. Our ETLs, orchestrated by Airflow, spin-up AWS EMR clusters with thousands of nodes per day. In this talk, we’ll guide you through migrating Spark workloads to Kubernetes with minimal changes to Airflow DAGs, using the open-sourced GCP Spark-on-K8s operator and the native integration we recently contributed to the Airflow project.

Run Airflow DAGs in a secure way

2020-07-01 · Airflow Summit 2020

session

by Rafal Biegacz

Airflow Cloud Computing Cloud Composer Cyber Security

In the contemporary world security is important more than ever - Airflow installations are no exception. Google Cloud Platform and Cloud Composer offer useful security options for running your DAGs and tasks in a way so you effectively can manage a risk of data exfiltration and access to the system is limited. This is a sponsored talk, presented by Google Cloud .