When Postgres is enough: solving document storage, pub/sub and distributed queues without more tools

2025-09-03 · PyData Berlin 2025 Watch

talk

by Eugen Geist

postgresql

When a new requirement appears, whether it's document storage, pub/sub messaging, distributed queues, or even full-text search, Postgres can often handle it without introducing more infrastructure.

This talk explores how to leverage Postgres' native features like JSONB, LISTEN/NOTIFY, queueing patterns and vector extensions to build robust, scalable systems without increasing infrastructure complexity.

You'll learn practical patterns that extend Postgres just far enough, keeping systems simpler, more maintainable, and easier to operate, especially in small to medium projects or freelancing setups, where Postgres often already forms a critical part of the stack.

Postgres might not replace everything forever - but it can often get you much further than you think.

Common provider abstractions: Key for multi-cloud data handling

2025-07-01 · Airflow Summit 2025

session

by Vikram Koka (Astronomer)

Airflow Kinesis Cloud Computing Kafka SAP Snowflake SQL postgresql

Enterprises want the flexibility to operate across multiple clouds, whether to optimize costs, improve resiliency, to avoid vendor lock-in, or for data sovereignty. But for developers, that flexibility usually comes at the cost of extra complexity and redundant code. The goal here is simple: write once, run anywhere, with minimum boilerplate. In Apache Airflow, we’ve already begun tackling this problem with abstractions like Common-SQL, which lets you write database queries once and run them on 20+ databases, from Snowflake to Postgres to SQLite to SAP HANA. Similarly, Common-IO standardizes cloud blob storage interactions across all public clouds. With Airflow 3.0, we are pushing this further by introducing a Common Message Bus provider, which is an abstraction, initially supporting Amazon SQS and expanding to Google PubSub and Apache Kafka soon after. We expect additional implementations such as Amazon Kinesis and Managed Kafka over time. This talk will dive into why these abstractions matter, how they reduce friction for developers while giving enterprises true multi-cloud optionality, and what’s next for Airflow’s evolving provider ecosystem.

Event-Driven Airflow 3.0: Real-Time Orchestration with Pub/Sub

2025-07-01 · Airflow Summit 2025

session

by Andrea Bombino , Nawfel Bacha

Airflow Cloud Computing dbt GCP

Traditional time-based scheduling in Airflow can lead to inefficiencies and delays. With Airflow 3.0, we can now leverage native event-driven DAG execution, enabling workflows to trigger instantly when data arrives—eliminating polling-based sensors and rigid schedules. This talk explores real-time orchestration using Airflow 3.0 and Google Cloud Pub/Sub. We’ll showcase how to build an event-driven pipeline where DAGs automatically trigger as new data lands, ensuring faster and more efficient processing. Through a live demo, we’ll demonstrate how Airflow listens to Pub/Sub messages and dynamically triggers dbt transformations only when fresh data is available. This approach improves scalability, reduces costs, and enhances orchestration efficiency. Key Takeaways: How event-driven DAGs work vs. traditional scheduling, Best practices for integrating Airflow with Pub/Sub,Eliminating polling-based sensors for efficiency,Live demo: Event-driven pipeline with Airflow 3.0, Pub/Sub & dbt. This session will showcase how Airflow 3.0 enables truly real-time orchestration.

Unlocking Event-Driven Scheduling in Airflow 3: A New Era of Reactive Data Pipelines

2025-07-01 · Airflow Summit 2025

session

by Vincent Beck

Airflow API

Airflow 3 introduces a major evolution in orchestration: native support for external event-driven scheduling. In this talk, I’ll share the journey behind AIP-82—why we needed it, how we built it, and what it unlocks. I’ll dive into how the new AssetWatcher enables pipelines to respond immediately to events like file arrivals, API calls, or pub/sub messages. You’ll see how this drastically reduces latency and infrastructure overhead while improving reactivity and resource efficiency. We’ll explore how it works under the hood, real-world use cases, best practices, and migration tips for teams ready to shift from time-based to event-driven workflows. If you’re looking to make your Airflow DAGs more dynamic, this is the talk that shows you how. Whether you’re an operator or contributor, you’ll walk away with a deep understanding of one of Airflow 3’s most impactful features.

Lightning Talk – Beyond Pipelines: Introducing Pub/Sub for Tables

2025-04-24 · AI Council 2025 Watch

lightning_talk

by Arvind Prabhakar

Use real-time data for AI/ML workloads in BigQuery

2025-04-11 · Google Cloud Next '25

session

by Prateek Duble (Google Cloud) , Nick Orlove (Google Cloud) , Sandhya Kapoor (Flipkart)

AI/ML Analytics BigQuery Cloud Computing Data Analytics GCP Data Streaming

Simplify real-time data analytics and build event-driven, AI-powered applications using BigQuery and Pub/Sub. Learn to ingest and process massive streaming data from users, devices, and microservices for immediate insights and rapid action. Explore BigQuery's continuous queries for real-time analytics and ML model training. Discover how Flipkart, India’s leading e-commerce platform, leverages Google Cloud to build scalable, efficient real-time data pipelines and AI/ML solutions, and gain insights on driving business value through real–time data.

Build event-driven apps with Cloud Run and Eventarc Advanced

2025-04-11 · Google Cloud Next '25

session

by Hamid Asaadi (Google Cloud) , Sara Ford (Google Cloud) , James Ma (Google Cloud)

Cloud Computing Cloud Run

Join us to discuss serverless computing and event-driven architectures with Cloud Run functions. Learn a quick and secure way to connect services and build event-driven architectures with multiple trigger types (HTTP, Pub/Sub, and Eventarc). And get introduced to Eventarc Advanced, centralized access control to your events with support for cross-project delivery.

Hadoop pioneer to cloud innovator: Yahoo’s data lake modernization journey

2025-04-11 · Google Cloud Next '25

session

by Akshay Sarma (Yahoo) , Ayyappan Arasu (Yahoo) , Dana Soltani (Google Cloud)

AI/ML BigQuery Cloud Computing Data Lake Dataproc GCP Hadoop

Get the inside story of Yahoo’s data lake transformation. As a Hadoop pioneer, Yahoo’s move to Google Cloud is a significant shift in data strategy. Explore the business drivers behind this transformation, technical hurdles encountered, and strategic partnership with Google Cloud that enabled a seamless migration. We’ll uncover key lessons, best practices for data lake modernization, and how Yahoo is using BigQuery, Dataproc, Pub/Sub, and other services to drive business value, enhance operational efficiency, and fuel their AI initiatives.

Kir Tititevsky: Modern Streaming Architecture Transforming the Service Bus

2025-03-31 · Straight Data Talk Listen

podcast_episode

by Yuliia Tkachova (Masthead Data) , Kir Titievsky (Google)

Cloud Computing Cloud Storage Dataflow GCP Kafka Data Streaming

Kir Titievsky, Product Manager at Google Cloud with extensive experience in streaming and storage infrastructure, joined Yuliia and Dumky to talk about streaming. Drawing from his work with Apache Kafka, Cloud PubSub, Dataflow and Cloud Storage since 2015, Kir explains the fundamental differences between streaming and micro-batch processing. He challenges common misconceptions about streaming costs, explaining how streaming can be significantly less expensive than batch processing for many use cases. Kir shares insights on the "service bus architecture" revival, discussing how modern distributed messaging systems have solved historic bottlenecks while creating new opportunities for business and performance needs.Kir's medium - https://medium.com/@kir-gcpKir's Linkedin page - https://www.linkedin.com/in/kir-titievsky-%F0%9F%87%BA%F0%9F%87%A6-7775052/

Azure Managed Redis: Fully Managed Redis for the Hyperscale Cloud | BRK189

2024-11-25 · Microsoft Ignite 2023 Watch

video

by Benjamin Renaud (Redis) , Scott Hunter (Microsoft)

Azure Cloud Computing Microsoft Redis

Introducing Azure Managed Redis, a fully managed, hyperscale-ready Redis solution that integrates the latest Redis innovations. It delivers up to 99.999% availability while ensuring a cost-effective total cost of ownership. With advanced features available across all four tiers, it empowers users to optimize key use cases like vector similarity search, session management, and pub/sub messaging. Join our session with product leaders to discover best practices and learn how to get started today.

𝗦𝗽𝗲𝗮𝗸𝗲𝗿𝘀: * Scott Hunter * Benjamin Renaud

𝗦𝗲𝘀𝘀𝗶𝗼𝗻 𝗜𝗻𝗳𝗼𝗿𝗺𝗮𝘁𝗶𝗼𝗻: This is one of many sessions from the Microsoft Ignite 2024 event. View even more sessions on-demand and learn about Microsoft Ignite at https://ignite.microsoft.com

BRK189 | English (US) | Data

MSIgnite

Airflow Datasets and Pub/Sub for Dynamic DAG Triggering

2024-07-01 · Airflow Summit 2024

session

by Andrea Bombino , Nawfel Bacha

Airflow Cloud Computing Data Engineering dbt GCP

Looking for a way to streamline your data workflows and master the art of orchestration? As we navigate the complexities of modern data engineering, Airflow’s dynamic workflow and complex data pipeline dependencies are starting to become more and more common nowadays. In order to empower data engineers to exploit Airflow as the main orchestrator, Airflow Datasets can be easily integrated in your data journey. This session will showcase the Dynamic Workflow orchestration in Airflow and how to manage multi-DAGs dependencies with Multi-Dataset listening. We’ll take you through a real-time data pipeline with Pub/Sub messaging integration and dbt in Google Cloud environment, to ensure data transformations are triggered only upon new data ingestion, moving away from rigid time-based scheduling or the use of sensors and other legacy ways to trigger a DAG.

Build continuous data and AI pipelines with BigQuery continuous queries

2024-04-11 · Google Cloud Next '24

session

by Nick Orlove (Google Cloud) , Pavan Edara (Google Cloud) , Pinaki Mitra (UPS)

AI/ML Big Data BigQuery Cloud Computing Data Engineering GCP SQL

Learn about real-time AI-powered insights with BigQuery continuous queries, and how this new feature is poised to revolutionize data engineering by empowering event-driven and AI-driven data pipelines with Vertex AI, Pub/Sub, and Bigtable – all through the familiar language of Cloud SQL. Learn about how UPS was able to use big data on millions of shipped packages to reduce package theft, their work on more efficient claims processing, and why they are looking to BigQuery to accelerate time to insights and smarter business outcomes.

Click the blue “Learn more” button above to tap into special offers designed to help you implement what you are learning at Google Cloud Next 25.

Cloud-powered, API-first testing with Testcontainers and Kotlin

2024-04-10 · Google Cloud Next '24

session

by Oleg Šelajev (Docker)

API BigQuery Cloud Computing GCP Cloud Run Java JavaScript Python Rust

Adequately testing systems that use Google Cloud services can be a serious challenge. In this session we’ll show you how to shift testing to an API-first approach using Testcontainers. This approach helps us improve the feedback cycle and reliability for both our inner-dev loop and our competitive intelligence cycle. We’ll go through an end-to-end example that uses BigQuery and PubSub, Cloud Build, and Cloud Run. Examples will use Kotlin but it could be accomplished with other languages including Rust, Go, JavaScript, Python, Java, and more.

Click the blue “Learn more” button above to tap into special offers designed to help you implement what you are learning at Google Cloud Next 25.

Transform your business with streaming: Learn how Mercado Libre did it

2024-04-10 · Google Cloud Next '24

session

by Diego Delgado (Mercado Libre) , Shan Kulandaivel (Google Cloud) , Pablo Milan (Mercado Libre)

Cloud Computing Dataflow GCP Data Streaming

Businesses everywhere have the opportunity to drive transformational impact by leveraging streaming data to make decisions and build experiences that delight users. In this session you will learn how MercadoLibre processes tens of billions of messages across thousands of applications to drive business impact. You will also learn about exciting new product announcements ranging from native ingest capabilities in Cloud Pub/Sub to new efficiency features in Dataflow to support for OSS technologies.

Click the blue “Learn more” button above to tap into special offers designed to help you implement what you are learning at Google Cloud Next 25.

Leveraging Google Kubernetes Engine and Pub/Sub to power research, trading, and risk

2024-04-10 · Google Cloud Next '24

session

by Aasif Versi (Citadel) , Gin Siu Cheng (Google Cloud) , Sacha Best (Citadel) , Cindy Zhang (Citadel)

Cloud Computing GCP Kubernetes

Learn how Citadel’s fixed income fund powers their daily financial activities. First, we’ll explore the challenges of calculating profit and loss across thousands of positions, back-testing models and running trading strategies. Then we’ll discuss developing a versatile platform that bursts to thousands of workers while also handling real-time calculations. Finally, we’ll present challenges encountered and give insight on practical solutions teams can apply to their own cloud compute infrastructures.

Click the blue “Learn more” button above to tap into special offers designed to help you implement what you are learning at Google Cloud Next 25.

Ford uses Google Cloud for connected vehicle telemetry data solutions

2024-04-09 · Google Cloud Next '24

session

by Scott Weinstein (Google Cloud) , Anton Gething (Google Cloud) , Gavarraju Nanduri (Ford Motor Company) , Sangeetha Suryanarayanan (Ford Motor Company)

AI/ML BigQuery Cloud Computing Dataflow GCP

Connected vehicle telemetry has data that can be used to gain insights into vehicle performance, driver behavior, and fleet operations using AI technology. We will discuss how Ford uses Bigtable to collect, store, and analyze connected vehicle telemetry data in conjunction with BigQuery, Pub/Sub and Dataflow, a recipe applicable to many time series use cases. Get some of the insights we have gained from this data, how we have used these insights to improve our fleet operations and also some new Bigtable features we‘re most excited about.

Click the blue “Learn more” button above to tap into special offers designed to help you implement what you are learning at Google Cloud Next 25.

Google Cloud Platform for Data Science: A Crash Course on Big Data, Machine Learning, and Data Analytics Services

2023-11-17 · O'Reilly Data Science Books O'Reilly Amazon

book

by Sandika S. Sukhdeve , Dr. Shitalkumar R. Sukhdeve

AI/ML Analytics BI Big Data BigQuery Cloud Computing Cloud Storage Data Analytics Data Science DataViz Dataflow Dataproc +9 more

This book is your practical and comprehensive guide to learning Google Cloud Platform (GCP) for data science, using only the free tier services offered by the platform. Data science and machine learning are increasingly becoming critical to businesses of all sizes, and the cloud provides a powerful platform for these applications. GCP offers a range of data science services that can be used to store, process, and analyze large datasets, and train and deploy machine learning models. The book is organized into seven chapters covering various topics such as GCP account setup, Google Colaboratory, Big Data and Machine Learning, Data Visualization and Business Intelligence, Data Processing and Transformation, Data Analytics and Storage, and Advanced Topics. Each chapter provides step-by-step instructions and examples illustrating how to use GCP services for data science and big data projects. Readers will learn how to set up a Google Colaboratory account and run Jupyternotebooks, access GCP services and data from Colaboratory, use BigQuery for data analytics, and deploy machine learning models using Vertex AI. The book also covers how to visualize data using Looker Data Studio, run data processing pipelines using Google Cloud Dataflow and Dataprep, and store data using Google Cloud Storage and SQL. What You Will Learn Set up a GCP account and project Explore BigQuery and its use cases, including machine learning Understand Google Cloud AI Platform and its capabilities Use Vertex AI for training and deploying machine learning models Explore Google Cloud Dataproc and its use cases for big data processing Create and share data visualizations and reports with Looker Data Studio Explore Google Cloud Dataflow and its use cases for batch and stream data processing Run data processing pipelines on Cloud Dataflow Explore Google Cloud Storageand its use cases for data storage Get an introduction to Google Cloud SQL and its use cases for relational databases Get an introduction to Google Cloud Pub/Sub and its use cases for real-time data streaming Who This Book Is For Data scientists, machine learning engineers, and analysts who want to learn how to use Google Cloud Platform (GCP) for their data science and big data projects

Building Real-Time Analytics Systems

2023-09-14 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Mark Needham

Analytics AWS Kinesis Dashboard Kafka data data-engineering real-time-analytics streaming-messaging

Gain deep insight into real-time analytics, including the features of these systems and the problems they solve. With this practical book, data engineers at organizations that use event-processing systems such as Kafka, Google Pub/Sub, and AWS Kinesis will learn how to analyze data streams in real time. The faster you derive insights, the quicker you can spot changes in your business and act accordingly. Author Mark Needham from StarTree provides an overview of the real-time analytics space and an understanding of what goes into building real-time applications. The book's second part offers a series of hands-on tutorials that show you how to combine multiple software products to build real-time analytics applications for an imaginary pizza delivery service. You will: Learn common architectures for real-time analytics Discover how event processing differs from real-time analytics Ingest event data from Apache Kafka into Apache Pinot Combine event streams with OLTP data using Debezium and Kafka Streams Write real-time queries against event data stored in Apache Pinot Build a real-time dashboard and order tracking app Learn how Uber, Stripe, and Just Eat use real-time analytics

Data Engineering with Google Cloud Platform

2022-03-31 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Adi Wijaya

AI/ML Airflow BigQuery Cloud Computing Cloud Storage Data Engineering Data Science Dataflow GCP Cloud Composer Linux Python +5 more

In 'Data Engineering with Google Cloud Platform', you'll explore how to construct efficient, scalable data pipelines using GCP services. This hands-on guide covers everything from building data warehouses to deploying machine learning pipelines, helping you master GCP's ecosystem. What this Book will help me do Build comprehensive data ingestion and transformation pipelines using BigQuery, Cloud Storage, and Dataflow. Design end-to-end orchestration flows with Airflow and Cloud Composer for automated data processing. Leverage Pub/Sub for building real-time event-driven systems and streaming architectures. Gain skills to design and manage secure data systems with IAM and governance strategies. Prepare for and pass the Professional Data Engineer certification exam to elevate your career. Author(s) Adi Wijaya is a seasoned data engineer with significant experience in Google Cloud Platform products and services. His expertise in building data systems has equipped him with insights into the real-world challenges data engineers face. Adi aims to demystify technical topics and deliver practical knowledge through his writing, helping tech professionals excel. Who is it for? This book is tailored for data engineers and data analysts who want to leverage GCP for building efficient and scalable data systems. Readers should have a beginner-level understanding of topics like data science, Python, and Linux to fully benefit from the material. It is also suitable for individuals preparing for the Google Professional Data Engineer exam. The book is a practical companion for enhancing cloud and data engineering skills.

Data Science on the Google Cloud Platform, 2nd Edition

2022-03-29 · O'Reilly Data Science Books O'Reilly Amazon

book

by Valliappa Lakshmanan

AI/ML Analytics BigQuery Cloud Computing Dashboard Data Science Dataflow Dataproc GCP Cloud Run Spark cloud-computing +3 more

Learn how easy it is to apply sophisticated statistical and machine learning methods to real-world problems when you build using Google Cloud Platform (GCP). This hands-on guide shows data engineers and data scientists how to implement an end-to-end data pipeline with cloud native tools on GCP. Throughout this updated second edition, you'll work through a sample business decision by employing a variety of data science approaches. Follow along by building a data pipeline in your own project on GCP, and discover how to solve data science problems in a transformative and more collaborative way. You'll learn how to: Employ best practices in building highly scalable data and ML pipelines on Google Cloud Automate and schedule data ingest using Cloud Run Create and populate a dashboard in Data Studio Build a real-time analytics pipeline using Pub/Sub, Dataflow, and BigQuery Conduct interactive data exploration with BigQuery Create a Bayesian model with Spark on Cloud Dataproc Forecast time series and do anomaly detection with BigQuery ML Aggregate within time windows with Dataflow Train explainable machine learning models with Vertex AI Operationalize ML with Vertex AI Pipelines

talk-data.com

Pub/Sub

Activity Trend

Top Events

Top Speakers

When Postgres is enough: solving document storage, pub/sub and distributed queues without more tools

Common provider abstractions: Key for multi-cloud data handling

Event-Driven Airflow 3.0: Real-Time Orchestration with Pub/Sub

Unlocking Event-Driven Scheduling in Airflow 3: A New Era of Reactive Data Pipelines

Lightning Talk – Beyond Pipelines: Introducing Pub/Sub for Tables

Use real-time data for AI/ML workloads in BigQuery

Build event-driven apps with Cloud Run and Eventarc Advanced

Hadoop pioneer to cloud innovator: Yahoo’s data lake modernization journey

Kir Tititevsky: Modern Streaming Architecture Transforming the Service Bus

Azure Managed Redis: Fully Managed Redis for the Hyperscale Cloud | BRK189

MSIgnite

Airflow Datasets and Pub/Sub for Dynamic DAG Triggering

Build continuous data and AI pipelines with BigQuery continuous queries

Cloud-powered, API-first testing with Testcontainers and Kotlin

Transform your business with streaming: Learn how Mercado Libre did it

Leveraging Google Kubernetes Engine and Pub/Sub to power research, trading, and risk

Ford uses Google Cloud for connected vehicle telemetry data solutions

Google Cloud Platform for Data Science: A Crash Course on Big Data, Machine Learning, and Data Analytics Services

Building Real-Time Analytics Systems

Data Engineering with Google Cloud Platform

Data Science on the Google Cloud Platform, 2nd Edition