We run a similar pattern of DAGs for different data quality dimensions such as accuracy, timeliness, and completeness. Building each of these by hand would mean duplicating code and potentially introducing human error through copy-paste, or forcing people to write the same code again. To solve this, we are doing a few things: we run DAGs via DagFactory, which dynamically generates DAGs from a small amount of YAML describing the steps in our DQ checks, and we hide this behind a UI hooked into a GitHub PR-open step, so the user just provides some inputs or selects from dropdowns and a YAML DAG is generated for them. This highlights DagFactory's potential to hide Airflow Python code from users, making it accessible to data analysts and business intelligence teams as well as software engineers, while reducing human error. YAML is the perfect format for generating code and opening a PR, and DagFactory is a perfect fit for that. All of this runs in GCP Cloud Composer.
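As a rough illustration of the pattern described above, here is a minimal sketch of loading a YAML-defined DQ DAG with the open-source dag-factory package. The file paths, YAML schema, and check command are hypothetical, and the exact dag-factory API varies by version:

```python
# dq_dags.py -- placed in the Composer DAGs folder.
# A minimal sketch, assuming the dag-factory package; the config path
# and YAML layout below are illustrative, not a production schema.
import dagfactory

# dq_accuracy.yaml would be generated from the UI's dropdown selections,
# e.g. (hypothetical):
#   dq_accuracy_dag:
#     default_args:
#       owner: data-quality
#       start_date: 2024-01-01
#     schedule_interval: "@daily"
#     tasks:
#       run_accuracy_check:
#         operator: airflow.operators.bash.BashOperator
#         bash_command: "python run_dq_check.py --dimension accuracy"
dag_factory = dagfactory.DagFactory("/home/airflow/gcs/dags/dq_accuracy.yaml")
dag_factory.clean_dags(globals())
dag_factory.generate_dags(globals())
```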
talk-data.com
Topic: Google Cloud Platform (GCP), 1670 tagged
Discover how Apache Airflow powers scalable ELT pipelines, enabling seamless data ingestion, transformation, and machine learning-driven insights. This session will walk through:
- Automating Data Ingestion: using Airflow to orchestrate raw data ingestion from third-party sources into your data lake (S3, GCS), ensuring a steady pipeline of high-quality training and prediction data.
- Optimizing Transformations with Serverless Computing: offloading intensive transformations to serverless functions (GCP Cloud Run, AWS Lambda) and machine learning models (BigQuery ML, SageMaker), integrating their outputs seamlessly into Airflow workflows.
- Real-World Impact: a case study on how INTRVL leveraged Airflow, BigQuery ML, and Cloud Run to analyze early voting data in near real time, generating actionable insights on voter behavior across swing states.
This talk not only provides a deep dive into the political tech space but also serves as a reference architecture for building robust, repeatable ELT pipelines. Attendees will gain insights into modern serverless technologies from AWS and GCP that enhance Airflow's capabilities, helping data engineers design scalable, cloud-agnostic workflows.
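As a hedged sketch of the ingest-then-offload pattern this abstract describes, a minimal DAG might chain an ingest task to a Cloud Run-hosted transformation. The service URL, bucket path, and task bodies are placeholders, not the speaker's implementation:

```python
# A minimal sketch: land raw data in the data lake, then call a Cloud
# Run service to transform it. URL and paths are hypothetical.
from datetime import datetime

import requests
import google.auth.transport.requests
import google.oauth2.id_token
from airflow.decorators import dag, task

CLOUD_RUN_URL = "https://transform-service-xyz-uc.a.run.app"  # placeholder

@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def elt_pipeline():
    @task
    def ingest() -> str:
        # In a real pipeline this would pull from the third-party source
        # and write raw files to the lake; here we just return a path.
        return "gs://example-raw-bucket/2024-01-01/data.json"

    @task
    def transform(raw_path: str) -> None:
        # Cloud Run services typically require an authenticated ID token.
        auth_req = google.auth.transport.requests.Request()
        token = google.oauth2.id_token.fetch_id_token(auth_req, CLOUD_RUN_URL)
        resp = requests.post(
            CLOUD_RUN_URL,
            json={"input": raw_path},
            headers={"Authorization": f"Bearer {token}"},
            timeout=300,
        )
        resp.raise_for_status()

    transform(ingest())

elt_pipeline()
```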
This session will dive deep into leveraging the robust logging and audit capabilities of Google Cloud Platform, Cloud Composer, and Apache Airflow to establish a fully transparent and verifiable data orchestration layer. We'll demonstrate how to track and attribute every change, from environment configuration to individual task execution, which is essential for meeting stringent enterprise governance, compliance, and auditing requirements.
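To give a flavor of the attribution trail the session describes, here is a minimal, hedged sketch of pulling Composer-related Admin Activity audit log entries with the google-cloud-logging client. The project ID and filter values are placeholders:

```python
# A minimal sketch of reading audit log entries for a Cloud Composer
# environment; project ID and filter values are placeholders.
from google.cloud import logging

client = logging.Client(project="example-project")

# Admin Activity audit logs record who changed what, and when.
log_filter = (
    'logName="projects/example-project/logs/'
    'cloudaudit.googleapis.com%2Factivity" '
    'AND resource.type="cloud_composer_environment"'
)

for entry in client.list_entries(filter_=log_filter, max_results=20):
    payload = entry.payload or {}
    print(
        entry.timestamp,
        payload.get("authenticationInfo", {}).get("principalEmail"),
        payload.get("methodName"),
    )
```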
Traditional time-based scheduling in Airflow can lead to inefficiencies and delays. With Airflow 3.0, we can now leverage native event-driven DAG execution, enabling workflows to trigger instantly when data arrives—eliminating polling-based sensors and rigid schedules. This talk explores real-time orchestration using Airflow 3.0 and Google Cloud Pub/Sub. We'll showcase how to build an event-driven pipeline where DAGs automatically trigger as new data lands, ensuring faster and more efficient processing. Through a live demo, we'll demonstrate how Airflow listens to Pub/Sub messages and dynamically triggers dbt transformations only when fresh data is available. This approach improves scalability, reduces costs, and enhances orchestration efficiency. Key takeaways:
- How event-driven DAGs work vs. traditional scheduling
- Best practices for integrating Airflow with Pub/Sub
- Eliminating polling-based sensors for efficiency
- Live demo: an event-driven pipeline with Airflow 3.0, Pub/Sub, and dbt
This session will showcase how Airflow 3.0 enables truly real-time orchestration.
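For orientation, here is a heavily hedged sketch of Airflow 3's event-driven scheduling shape: a DAG scheduled on an asset whose watcher wraps a Pub/Sub trigger. The project, subscription, and trigger kwargs are assumptions, and whether a given trigger class works with AssetWatcher depends on your provider versions:

```python
# A hedged sketch of Airflow 3 event-driven scheduling: the DAG runs
# when a Pub/Sub message lands, instead of polling on a clock.
# Trigger class, kwargs, and names below are assumptions, not the
# speaker's demo code.
from airflow.sdk import Asset, AssetWatcher, dag, task
from airflow.providers.google.cloud.triggers.pubsub import PubsubPullTrigger

trigger = PubsubPullTrigger(
    project_id="example-project",   # placeholder
    subscription="fresh-data-sub",  # placeholder
    max_messages=1,
    ack_messages=True,
    poke_interval=10.0,
)

fresh_data = Asset(
    "fresh_pubsub_data",
    watchers=[AssetWatcher(name="pubsub_watcher", trigger=trigger)],
)

@dag(schedule=[fresh_data])  # fires on the watcher event, not a schedule
def pubsub_triggered_dbt():
    @task
    def run_dbt_models():
        # In the demo this is where dbt transformations run, e.g. via
        # Cosmos operators or a BashOperator invoking `dbt run`.
        print("new data landed; running dbt")

    run_dbt_models()

pubsub_triggered_dbt()
```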
At Trendyol, Turkey's leading e-commerce company, Apache Airflow powers our task orchestration, handling DAGs with 500+ tasks, complex interdependencies, and diverse environments. Managing on-prem Airflow instances posed challenges in scalability, maintenance, and deployment. To address these, we built TaskHarbor, a fully managed orchestration platform with a hybrid architecture—combining Airflow on GKE with on-prem resources for optimal performance and efficiency. This talk covers how we:
- Enabled seamless DAG synchronization across environments using GCS FUSE.
- Optimized workload distribution via GCP's HTTPS and TCP load balancers.
- Automated infrastructure provisioning (GKE, CloudSQL, Kubernetes) using Terraform.
- Simplified Airflow deployments by replacing Helm YAML files with a custom templating tool, reducing configurations to 10-15 lines.
- Built a fully automated deployment pipeline, ensuring zero developer intervention.
We enhanced efficiency, reliability, and automation in hybrid orchestration by embracing a scalable, maintainable, and cloud-native strategy. Attendees will obtain practical insights into architecting Airflow at scale and optimizing deployments.
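To make the "small config in, full Helm values out" idea concrete: TaskHarbor's actual tool and schema are not public, so the template, field names, and values below are hypothetical, but a minimal sketch of the technique might look like this:

```python
# A hedged sketch of expanding a 10-15 line team config into full Helm
# values with a template; all names and fields here are hypothetical.
import yaml
from jinja2 import Template

# The short config a developer actually writes.
team_config = yaml.safe_load("""
team: recommendations
environment: prod
workers: 8
worker_cpu: "2"
worker_memory: 4Gi
""")

# A fragment of the full Helm values this expands into.
VALUES_TEMPLATE = Template("""
airflow:
  executor: KubernetesExecutor
workers:
  replicas: {{ workers }}
  resources:
    requests:
      cpu: "{{ worker_cpu }}"
      memory: {{ worker_memory }}
labels:
  team: {{ team }}
  env: {{ environment }}
""")

print(VALUES_TEMPLATE.render(**team_config))
```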
Apache Airflow 3 is the new state-of-the-art version of Airflow. For many users who plan to adopt Airflow 3, it is important to understand how it behaves from a performance perspective compared to Airflow 2. This presentation will present performance results for various Airflow 3 configurations, giving potential Airflow 3 adopters a good understanding of its performance. The reference Airflow 3 configuration uses a Kubernetes cluster as the compute layer and PostgreSQL as the Airflow database, with tests performed on Google Cloud Platform. Performance tests will be run using the community version of the performance-test framework, and there may be references to Cloud Composer (the managed service for Apache Airflow). The tests will be done in production-grade configurations that can serve as good references for Airflow community users. Users will be provided with a comparison of Airflow 3 and Airflow 2 from a performance standpoint. Users will also learn how to optimize Airflow scheduler performance by understanding DAG file processing and task scheduling, and by configuring the scheduler to run tens of thousands of DAGs/tasks in Airflow 3.
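To make the tuning surface concrete, here is a hedged sketch of scheduler-related settings commonly adjusted in tests like these, expressed as the environment-variable overrides Airflow reads. The values are illustrative placeholders, not recommendations, and section names differ somewhat between Airflow 2 and 3 (DAG parsing moved to the standalone dag processor in Airflow 3):

```python
# Commonly tuned scheduler/DAG-processing knobs, via Airflow's
# AIRFLOW__<SECTION>__<KEY> environment-variable convention.
# Values below are placeholders, not recommendations.
import os

SCHEDULER_TUNING = {
    # How many processes parse DAG files in parallel.
    "AIRFLOW__SCHEDULER__PARSING_PROCESSES": "4",
    # Minimum seconds between re-parses of the same DAG file.
    "AIRFLOW__SCHEDULER__MIN_FILE_PROCESS_INTERVAL": "30",
    # Max task instances examined per scheduling query.
    "AIRFLOW__SCHEDULER__MAX_TIS_PER_QUERY": "512",
    # Global cap on concurrently running task instances.
    "AIRFLOW__CORE__PARALLELISM": "1024",
}

os.environ.update(SCHEDULER_TUNING)  # e.g. before launching a local test
```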
The journey from ML model development to production deployment and monitoring is often complex and fragmented. How can teams overcome the chaos of disparate tools and processes? This session dives into how Apache Airflow serves as a unifying force in MLOps. We'll begin with a look at the broader MLOps trends observed by Google within the Airflow community, highlighting how Airflow is evolving to meet these challenges and showcasing diverse MLOps use cases, both current and future. Then, Priceline will present a deep-dive case study on their MLOps transformation. Learn how they leveraged Cloud Composer, Google Cloud's managed Apache Airflow service, to orchestrate their entire ML pipeline end-to-end: ETL, data preprocessing, model building and training, Dockerization, Google Artifact Registry integration, deployment, model serving, and evaluation. Discover how using Cloud Composer on GCP enabled them to build a scalable, reliable, adaptable, and maintainable MLOps practice, moving decisively from chaos to coordination. Cloud Composer (Airflow) has served as a major backbone in transforming the whole ML experience at Priceline. Join us to learn how to harness Airflow, particularly within a managed environment like Cloud Composer, for robust MLOps workflows, drawing lessons from both industry trends and a concrete, successful implementation.
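As an illustrative, hedged sketch of the end-to-end pipeline shape this case study describes (task names, paths, and images are hypothetical; Priceline's actual operators and config are not shown in the abstract):

```python
# A hedged sketch of an end-to-end ML pipeline DAG: preprocess, train,
# containerize, deploy, evaluate. All names/paths are placeholders.
from datetime import datetime

from airflow.decorators import dag, task

@dag(schedule="@weekly", start_date=datetime(2024, 1, 1), catchup=False)
def ml_pipeline():
    @task
    def extract_and_preprocess() -> str:
        return "gs://example-bucket/features/latest"  # placeholder path

    @task
    def train(features_path: str) -> str:
        # Train on the prepared features; return a model artifact URI.
        return "gs://example-bucket/models/candidate"

    @task
    def build_and_push_image(model_uri: str) -> str:
        # Dockerize the model and push to Artifact Registry (in practice,
        # e.g. via Cloud Build or a KubernetesPodOperator).
        return "us-docker.pkg.dev/example-project/models/serving:latest"

    @task
    def deploy_and_evaluate(image: str) -> None:
        print(f"deploying {image} and running evaluation")

    deploy_and_evaluate(build_and_push_image(train(extract_and_preprocess())))

ml_pipeline()
```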
Pawel Hajdan (former Tech Lead, Google Cloud Platform) will shed light on the counter-intuitive incentives that lead to unnecessary complexity, fragile systems, and communication breakdowns - and how we can improve.
Many SRE teams still rely on manual intervention for incident handling; automation can improve response times and reduce toil. We will cover:
- Setting up comprehensive observability: Cloud Logging, Cloud Monitoring, and OpenTelemetry
- Incident automation strategies: runbooks, auto-healing, and ChatOps
- Lessons from AWS CloudWatch and Azure Monitor applied to GCP
- Case study: reducing MTTR (mean time to resolution) through automated detection and remediation
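As a hedged sketch of the automated-remediation idea, here is a Cloud Function-style handler that receives a Cloud Monitoring alert via Pub/Sub and resets the affected VM. The label extraction and the choice of remediation are illustrative assumptions, not a prescribed runbook:

```python
# A hedged sketch: Pub/Sub-triggered handler that parses a Cloud
# Monitoring alert and resets the unhealthy instance. Project, zone
# fallback, and the remediation itself are illustrative assumptions.
import base64
import json

from google.cloud import compute_v1

def remediate(event: dict, context=None) -> None:
    """Pub/Sub-triggered entry point (Cloud Functions 1st gen style)."""
    alert = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    labels = alert.get("incident", {}).get("resource", {}).get("labels", {})
    instance = labels.get("instance_id")
    zone = labels.get("zone", "us-central1-a")
    if not instance:
        return  # nothing actionable in this alert

    # Automated remediation: reset the unhealthy instance.
    compute_v1.InstancesClient().reset(
        project="example-project", zone=zone, instance=instance
    )
```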
Dive into building applications that combine the power of Large Language Models (LLMs) with Neo4j knowledge graphs, Haystack, and Spring AI to deliver intelligent, data-driven recommendations and search outcomes. This book provides actionable insights and techniques to create scalable, robust solutions by leveraging best-in-class frameworks and a real-world, project-oriented approach.
What this book will help me do:
- Understand how to use Neo4j to build knowledge graphs integrated with LLMs for enhanced data insights.
- Develop skills in creating intelligent search functionalities by combining Haystack and vector-based graph techniques.
- Learn to design and implement recommendation systems using LangChain4j and Spring AI frameworks.
- Acquire the ability to optimize graph data architectures for LLM-driven applications.
- Gain proficiency in deploying and managing applications on platforms like Google Cloud for scalability.
Author(s): Ravindranatha Anthapu, a Principal Consultant at Neo4j, and Siddhant Agarwal, a Google Developer Expert in Generative AI, bring together their vast experience to offer practical implementations and cutting-edge techniques in this book. Their combined expertise in Neo4j, graph technology, and real-world AI applications makes them authoritative voices in the field.
Who is it for? Designed for database developers and data scientists, this book caters to professionals aiming to leverage the transformational capabilities of knowledge graphs alongside LLMs. Readers should have a working knowledge of Python and Java as well as familiarity with Neo4j and the Cypher query language. If you're looking to enhance search or recommendation functionalities through state-of-the-art AI integrations, this book is for you.
In an increasingly complex landscape, strategic advantage hinges on intelligent decision making. This briefing examines how Google Cloud's advanced AI capabilities offer Australian State and Federal Government leaders unprecedented opportunities to extract actionable insights from vast datasets, predict emerging trends, and proactively address societal challenges. Learn how AI can enhance policy development, streamline operations, and bolster national resilience, ensuring Australia remains at the forefront of innovation and effective governance.
In just two years, our conversations about AI will be radically different. We'll stop talking about the labels—GenAI, Agents, AI—and start focusing on what truly matters: measurable business transformation. Today's exciting GenAI technology will fade into the background, becoming a very important, but seamless, integrated part of the modern tech stack that powers our companies.
In this session we will explore how the convergence of AI technologies will fuel a new era of hyper-personalization and intelligent automation, delivering tangible value across every industry and department.
Try Keboola 👉 https://www.keboola.com/mcp?utm_campaign=FY25_Q2_RoW_Marketing_Events_Webinar_Keboola_MCP_Server_Launch_June&utm_source=Youtube&utm_medium=Avery
Today, we'll create an entire data pipeline from scratch without writing a single line of code! Using the Keboola MCP server and Claude AI, we'll extract data from my FindADataJob.com RSS feed, transform it, load it into Google BigQuery, and visualize it with Streamlit. This is the future of data engineering!
Keboola MCP integration: https://mcp.connection.us-east4.gcp.keboola.com/sse
I Analyzed Data Analyst Jobs to Find Out What Skills You ACTUALLY Need: https://www.youtube.com/watch?v=lo3VU1srV1E&t=212s
⌚ TIMESTAMPS
00:00 - Introduction
00:54 - Definition of Basic Data Engineering Terms
02:26 - Keboola MCP and Its Capabilities
07:48 - Extracting Data from RSS Feed
12:43 - Transforming and Cleaning the Data
19:19 - Aggregating and Analyzing Data
23:19 - Scheduling and Automating the Pipeline
25:04 - Visualizing Data with Streamlit
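The video builds this pipeline without code via the MCP server; for readers curious what the equivalent looks like in plain Python, here is a hedged sketch (the feed URL, project, and table names are placeholders, and the video itself drives everything through Keboola instead):

```python
# A hedged sketch of what the no-code pipeline does under the hood:
# pull the RSS feed, normalize it, and load it into BigQuery.
# Feed URL and dataset/table names are placeholder assumptions.
import feedparser
import pandas as pd
from google.cloud import bigquery

feed = feedparser.parse("https://findadatajob.com/feed")  # assumed URL
rows = pd.DataFrame(
    [
        {
            "title": e.get("title"),
            "link": e.get("link"),
            "published": e.get("published"),
        }
        for e in feed.entries
    ]
)

client = bigquery.Client(project="example-project")  # placeholder project
job = client.load_table_from_dataframe(
    rows, "example-project.jobs.rss_postings"  # placeholder table
)
job.result()  # wait for the load job to finish
```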
Enterprise customers need a powerful and adaptable data foundation to navigate the demands of AI and multi-cloud environments. This session dives into how Google Cloud Storage serves as a unified platform for modern analytics data lakes, together with Databricks. Discover how Google Cloud Storage provides key innovations like performance optimizations for Apache Iceberg, Anywhere Cache as the easiest way to colocate storage and compute, Rapid Storage for ultra-low-latency object reads and appends, and Storage Intelligence for vital data insights and recommendations. Learn how you can optimize your infrastructure to unlock the full value of your data for AI-driven success.
In this session you will learn how to leverage a wide set of GenAI models in Databricks, including external connections to cloud vendors and other model providers. We will cover establishing connections to externally served models via Mosaic AI Gateway, showcasing connections to Azure, AWS, and Google Cloud models, as well as model vendors like Anthropic, Cohere, AI21 Labs, and more. You will also discover best practices on model comparison, governance, and cost control for those model deployments.
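As a hedged sketch of the external-model pattern described above, here is roughly how an externally hosted model can be registered behind a Databricks Model Serving endpoint via the MLflow deployments client. The endpoint name, secret scope, and model choice are illustrative, and config fields may differ across platform versions:

```python
# A hedged sketch of serving an external model through Databricks;
# endpoint name, secret scope, and model are placeholder assumptions.
from mlflow.deployments import get_deploy_client

client = get_deploy_client("databricks")

client.create_endpoint(
    name="claude-external",  # placeholder endpoint name
    config={
        "served_entities": [
            {
                "external_model": {
                    "name": "claude-3-7-sonnet-20250219",
                    "provider": "anthropic",
                    "task": "llm/v1/chat",
                    "anthropic_config": {
                        # Databricks secret reference syntax.
                        "anthropic_api_key": "{{secrets/example-scope/anthropic-key}}",
                    },
                }
            }
        ]
    },
)

# Query it like any other serving endpoint.
response = client.predict(
    endpoint="claude-external",
    inputs={"messages": [{"role": "user", "content": "Hello!"}]},
)
```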
Maximize the performance of your Databricks Platform with innovations on Google Cloud. Discover how Google's Arm-based Axion C4A virtual machines (VMs) deliver breakthrough price-performance and efficiency for Databricks, supercharging the Databricks Photon engine. Gain actionable strategies to optimize your Databricks deployments on Google Cloud.
This session unveils Google Cloud's Agent2Agent (A2A) protocol, ushering in a new era of AI interoperability where diverse agents collaborate seamlessly to solve complex enterprise challenges. Join our panel of experts to discover how A2A empowers you to deeply integrate these collaborative AI systems with your existing enterprise data, custom APIs, and critical workflows. Ultimately, learn to build more powerful, versatile, and securely managed agentic ecosystems by combining specialized Google-built agents with your own custom solutions (Vertex AI or no-code). Extend this ecosystem further by serving these agents with Databricks Model Serving and governing them with Unity Catalog for consistent security and management across your enterprise.
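For context on what agent-to-agent collaboration looks like on the wire, here is a heavily hedged sketch of an A2A-style JSON-RPC exchange. The endpoint URL is a placeholder, and the method and payload shape follow the published A2A protocol drafts, which may evolve; treat the field names as assumptions:

```python
# A hedged sketch of calling an A2A-compliant agent over JSON-RPC.
# URL is a placeholder; method/payload shape follows public A2A drafts.
import uuid

import requests

AGENT_URL = "https://agent.example.com/a2a"  # placeholder endpoint

payload = {
    "jsonrpc": "2.0",
    "id": str(uuid.uuid4()),
    "method": "message/send",
    "params": {
        "message": {
            "role": "user",
            "parts": [{"kind": "text", "text": "Summarize Q2 sales."}],
            "messageId": str(uuid.uuid4()),
        }
    },
}

resp = requests.post(AGENT_URL, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json())  # task or message object returned by the agent
```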
Building an open data lakehouse? Start with the right blueprint. This session walks through common reference architectures for interoperable lakehouse deployments across AWS, Google Cloud, Azure and tools like Snowflake, BigQuery and Microsoft Fabric. Learn how to design for cross-platform data access, unify governance with Unity Catalog and ensure your stack is future-ready — no matter where your data lives.
In this session, we’ll walk through the latest advancements in platform security and compliance on Databricks — from networking updates to encryption, serverless security and new compliance certifications across AWS, Azure and Google Cloud. We’ll also share our roadmap and best practices for how to securely configure workloads on Databricks SQL Serverless, Unity Catalog, Mosaic AI and more — at scale. If you're building on Databricks and want to stay ahead of evolving risk and regulatory demands, this session is your guide.
Elevate your AI initiatives on Databricks by harnessing the latest advancements in Google Cloud's Gemini models. Learn how to integrate Gemini's built-in reasoning and powerful development tools to build more dynamic and intelligent applications within your existing Databricks platform. We'll explore concrete ideas for agentic AI solutions, showcasing how Gemini can help you unlock new value from your data in Databricks.
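As a minimal, hedged sketch of the starting point for such integrations, here is a Gemini call with the google-genai SDK as it might run from a Databricks notebook. The model name and key handling are illustrative, and the agentic patterns the session covers would layer on top of this:

```python
# A hedged sketch of calling Gemini from a Databricks notebook with the
# google-genai SDK; model name and key handling are placeholders.
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")  # e.g. from Databricks secrets

response = client.models.generate_content(
    model="gemini-2.0-flash",  # placeholder; choose the Gemini model you use
    contents="Suggest three data-quality checks for a sales table.",
)
print(response.text)
```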