In this course, you’ll learn how to apply patterns to securely store and delete personal information for data governance and compliance on the Data Intelligence Platform. We’ll cover topics like storing sensitive data appropriately to simplify granting access and processing deletes, processing deletes to ensure compliance with the right to be forgotten, performing data masking, and configuring fine-grained access control to grant appropriate privileges to sensitive data. Pre-requisites: Ability to perform basic code development tasks using the Databricks workspace (create clusters, run code in notebooks, use basic notebook operations, import repos from git, etc.), intermediate programming experience with SQL and PySpark (extract data from a variety of file formats and data sources, apply a number of common transformations to clean data, reshape and manipulate complex data using advanced built-in functions), intermediate programming experience with Delta Lake (create tables, perform complete and incremental updates, compact files, restore previous versions, etc.), and beginner experience with Lakeflow Declarative Pipelines and streaming workloads. Labs: Yes. Certification Path: Databricks Certified Data Engineer Professional
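To make the masking and fine-grained access control patterns concrete, here is a minimal sketch using Unity Catalog column masks, row filters, and grants, run from a Databricks notebook via spark.sql. The catalog, schema, table, column, and group names are hypothetical.

```python
# Minimal sketch (hypothetical catalog/table/group names) of column masking,
# row filtering, and grants with Unity Catalog, run from a Databricks notebook.

# Mask the email column for anyone outside a privileged group.
spark.sql("""
CREATE OR REPLACE FUNCTION main.gov.mask_email(email STRING)
RETURN CASE WHEN is_account_group_member('pii_readers') THEN email
            ELSE '***REDACTED***' END
""")
spark.sql("ALTER TABLE main.gov.customers ALTER COLUMN email SET MASK main.gov.mask_email")

# Restrict which rows non-privileged users can see.
spark.sql("""
CREATE OR REPLACE FUNCTION main.gov.region_filter(region STRING)
RETURN IF(is_account_group_member('admins'), TRUE, region = 'EMEA')
""")
spark.sql("ALTER TABLE main.gov.customers SET ROW FILTER main.gov.region_filter ON (region)")

# Grant only the privileges consumers actually need on the governed table.
spark.sql("GRANT SELECT ON TABLE main.gov.customers TO `analysts`")
```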
In this course, you’ll learn how to optimize workloads and physical layout with Spark and Delta Lake and analyze the Spark UI to assess performance and debug applications. We’ll cover topics like streaming, liquid clustering, data skipping, caching, Photon, and more. Pre-requisites: Ability to perform basic code development tasks using the Databricks workspace (create clusters, run code in notebooks, use basic notebook operations, import repos from git, etc.), intermediate programming experience with SQL and PySpark (extract data from a variety of file formats and data sources, apply a number of common transformations to clean data, reshape and manipulate complex data using advanced built-in functions), intermediate programming experience with Delta Lake (create tables, perform complete and incremental updates, compact files, restore previous versions, etc.). Labs: Yes. Certification Path: Databricks Certified Data Engineer Professional
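As a rough illustration of the layout techniques the course names, here is a sketch of liquid clustering, OPTIMIZE, and caching from a Databricks notebook. Table and column names are assumptions, not part of the course material.

```python
# Minimal sketch (hypothetical table/columns) of liquid clustering, file
# compaction, and caching, run from a Databricks notebook via spark.sql.

# Create a Delta table with liquid clustering on the columns most often filtered.
spark.sql("""
CREATE TABLE IF NOT EXISTS main.perf.events (
  event_id BIGINT, event_date DATE, user_id BIGINT, payload STRING
) CLUSTER BY (event_date, user_id)
""")

# Incrementally (re)cluster and compact the files written so far.
spark.sql("OPTIMIZE main.perf.events")

# Inspect table layout details (file counts, size) that affect skipping and clustering.
spark.sql("DESCRIBE DETAIL main.perf.events").show(truncate=False)

# Cache a hot DataFrame to avoid recomputing it across downstream queries.
hot = spark.table("main.perf.events").where("event_date >= current_date() - 7")
hot.cache()
hot.count()  # materializes the cache
```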
In this course, you’ll learn how to incrementally process data to power analytic insights with Structured Streaming and Auto Loader, and how to apply design patterns for designing workloads to perform ETL on the Data Intelligence Platform with Lakeflow Declarative Pipelines. First, we’ll cover topics including ingesting raw streaming data, enforcing data quality, implementing CDC, and exploring and tuning state information. Then, we’ll cover options to perform a streaming read on a source, requirements for end-to-end fault tolerance, options to perform a streaming write to a sink, and creating an aggregation and watermark on a streaming dataset. Pre-requisites: Ability to perform basic code development tasks using the Databricks workspace (create clusters, run code in notebooks, use basic notebook operations, import repos from git, etc.), intermediate programming experience with SQL and PySpark (extract data from a variety of file formats and data sources, apply a number of common transformations to clean data, reshape and manipulate complex data using advanced built-in functions), intermediate programming experience with Delta Lake (create tables, perform complete and incremental updates, compact files, restore previous versions, etc.), beginner experience with streaming workloads, and familiarity with Lakeflow Declarative Pipelines. Labs: No. Certification Path: Databricks Certified Data Engineer Professional
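For orientation, here is a minimal sketch of that streaming read / watermarked aggregation / checkpointed write pattern with Auto Loader. The paths, table name, and the assumption that the data carries an event_time timestamp column are illustrative, not from the course.

```python
# Minimal sketch (assumed paths, schema, and table name) of Structured
# Streaming with Auto Loader: incremental read, watermarked aggregation,
# and a fault-tolerant write to a Delta sink.
from pyspark.sql import functions as F

raw = (
    spark.readStream.format("cloudFiles")            # Auto Loader source
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/tmp/demo/_schema")
    .load("/tmp/demo/landing")                       # hypothetical landing path
)

# Late-data handling: aggregate per 10-minute window, dropping events more
# than 30 minutes late (assumes an event_time timestamp column).
counts = (
    raw.withWatermark("event_time", "30 minutes")
    .groupBy(F.window("event_time", "10 minutes"), "event_type")
    .count()
)

# End-to-end fault tolerance comes from the checkpoint plus an idempotent sink.
query = (
    counts.writeStream.format("delta")
    .outputMode("append")
    .option("checkpointLocation", "/tmp/demo/_checkpoint")
    .trigger(availableNow=True)                      # incremental, batch-style run
    .toTable("main.demo.event_counts")
)
```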
In this course, you’ll learn how to perform efficient data ingestion with Lakeflow Connect and manage that data. Topics include ingestion with built-in connectors for SaaS applications, databases, and file sources, as well as ingestion from cloud object storage, and batch and streaming ingestion. We'll cover the new connector components, setting up the pipeline, validating the source, and mapping to the destination for each type of connector. We'll also cover how to ingest data into Delta tables with both batch and streaming ingestion, using the UI with Auto Loader, automating ETL with Lakeflow Declarative Pipelines, or using the API. This will prepare you to deliver the high-quality, timely data required for AI-driven applications by enabling scalable, reliable, and real-time data ingestion pipelines. Whether you're supporting ML model training or powering real-time AI insights, these ingestion workflows form a critical foundation for successful AI implementation. Pre-requisites: Beginner familiarity with the Databricks Data Intelligence Platform (selecting clusters, navigating the Workspace, executing notebooks), cloud computing concepts (virtual machines, object storage, etc.), production experience working with data warehouses and data lakes, intermediate experience with basic SQL concepts (select, filter, groupby, join, etc.), beginner programming experience with Python (syntax, conditions, loops, functions), and beginner programming experience with the Spark DataFrame API (configure DataFrameReader and DataFrameWriter to read and write data, express query transformations using DataFrame methods and Column expressions, etc.). Labs: No. Certification Path: Databricks Certified Data Engineer Associate
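As a sketch of the declarative, automated-ETL style of ingestion mentioned above, here is a small pipeline using the Python `dlt` module (the interface used by Delta Live Tables, the basis of Lakeflow Declarative Pipelines) with Auto Loader as the source. Paths, table names, and the quality rule are assumptions.

```python
# Minimal sketch (assumed paths/table names) of a declarative ingestion
# pipeline: Auto Loader source, an expectation for data quality, and a
# cleaned downstream table.
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Raw orders ingested incrementally with Auto Loader")
def orders_bronze():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "csv")
        .option("cloudFiles.schemaLocation", "/tmp/orders/_schema")
        .load("/tmp/orders/landing")                 # hypothetical cloud-storage path
    )

@dlt.table(comment="Cleaned orders")
@dlt.expect_or_drop("valid_amount", "amount > 0")    # enforce a quality rule
def orders_silver():
    return (
        dlt.read_stream("orders_bronze")
        .withColumn("ingested_at", F.current_timestamp())
    )
```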
Explore how AI-powered Generative Agents can evolve in real time using live data streams. Inspired by Stanford's 'Generative Agents' paper, this session dives into building dynamic, AI-driven worlds with Apache Kafka, Flink, and Iceberg - plus LLMs, RAG, and Python. Demos and practical examples included!
A deep dive into how Monzo reduced the effort it takes to generate point-in-time correct features for model development and productionise them with real-time streaming using our event-driven architecture.
Apache Kafka, start to finish. Apache Kafka in Action: From basics to production guides you through the concepts and skills you’ll need to deploy and administer Kafka for data pipelines, event-driven applications, and other systems that process data streams from multiple sources. Authors Anatoly Zelenin and Alexander Kropp have spent years using Kafka in real-world production environments. In this guide, they reveal their hard-won expert insights to help you avoid common Kafka pitfalls and challenges.

Inside Apache Kafka in Action you’ll discover:
- Apache Kafka from the ground up
- Achieving reliability and performance
- Troubleshooting Kafka systems
- Operations, governance, and monitoring
- Kafka use cases, patterns, and anti-patterns

Clear, concise, and practical, Apache Kafka in Action is written for IT operators, software engineers, and IT architects working with Kafka every day. Chapter by chapter, it guides you through the skills you need to deliver and maintain reliable and fault-tolerant data-driven applications.

About the Technology: Apache Kafka is the gold standard streaming data platform for real-time analytics, event sourcing, and stream processing. Acting as a central hub for distributed data, it enables seamless flow between producers and consumers via a publish-subscribe model. Kafka easily handles millions of events per second, and its rock-solid design ensures high fault tolerance and smooth scalability.

About the Book: Apache Kafka in Action is a practical guide for IT professionals who are integrating Kafka into data-intensive applications and infrastructures. The book covers everything from Kafka fundamentals to advanced operations, with interesting visuals and real-world examples. Readers will learn to set up Kafka clusters, produce and consume messages, handle real-time streaming, and integrate Kafka into enterprise systems. This easy-to-follow book emphasizes building reliable Kafka applications and taking advantage of its distributed architecture for scalability and resilience.

What's Inside:
- Master Kafka’s distributed streaming capabilities
- Implement real-time data solutions
- Integrate Kafka into enterprise environments
- Build and manage Kafka applications
- Achieve fault tolerance and scalability

About the Reader: For IT operators, software architects and developers. No experience with Kafka required.

About the Authors: Anatoly Zelenin is a Kafka expert known for workshops across Europe, especially in banking and manufacturing. Alexander Kropp specializes in Kafka and Kubernetes, contributing to cloud platform design and monitoring.

Quotes:
- "A great introduction. Even experienced users will go back to it again and again." - Jakub Scholz, Red Hat
- "Approachable, practical, well-illustrated, and easy to follow. A must-read." - Olena Kutsenko, Confluent
- "A zero to hero journey to understanding and using Kafka!" - Anthony Nandaa, Microsoft
- "Thoughtfully explores a wide range of topics. A wealth of valuable information seamlessly presented and easily accessible." - Olena Babenko, Aiven Oy
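To make the produce/consume publish-subscribe model concrete, here is a minimal sketch using the confluent-kafka Python client. The broker address, topic name, and consumer group are assumptions for illustration and are not taken from the book.

```python
# Minimal produce/consume sketch of Kafka's publish-subscribe model using the
# confluent-kafka Python client; broker, topic, and group id are placeholders.
from confluent_kafka import Producer, Consumer

BROKER = "localhost:9092"   # hypothetical local broker
TOPIC = "orders"            # hypothetical topic

# Producer: publish a few events to the topic.
producer = Producer({"bootstrap.servers": BROKER})
for i in range(3):
    producer.produce(TOPIC, key=str(i), value=f'{{"order_id": {i}}}')
producer.flush()            # block until deliveries are acknowledged

# Consumer: subscribe as part of a consumer group and read the events back.
consumer = Consumer({
    "bootstrap.servers": BROKER,
    "group.id": "demo-readers",
    "auto.offset.reset": "earliest",
})
consumer.subscribe([TOPIC])
try:
    for _ in range(3):
        msg = consumer.poll(5.0)
        if msg is None or msg.error():
            continue
        print(msg.key(), msg.value())
finally:
    consumer.close()
```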
Simplify real-time data analytics and build event-driven, AI-powered applications using BigQuery and Pub/Sub. Learn to ingest and process massive streaming data from users, devices, and microservices for immediate insights and rapid action. Explore BigQuery's continuous queries for real-time analytics and ML model training. Discover how Flipkart, India’s leading e-commerce platform, leverages Google Cloud to build scalable, efficient real-time data pipelines and AI/ML solutions, and gain insights on driving business value through real-time data.
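For context, here is a sketch of the ingest side of such a setup: publishing events to Pub/Sub, from which a BigQuery subscription or a Dataflow job could land rows for continuous queries. The project and topic IDs are placeholders, and credentials are assumed to be configured in the environment.

```python
# Minimal sketch of publishing a streaming event to Pub/Sub with the
# google-cloud-pubsub client; project and topic IDs are placeholders.
import json
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-project", "clickstream-events")  # hypothetical IDs

event = {"user_id": "u123", "action": "add_to_cart", "ts": "2025-01-01T12:00:00Z"}
future = publisher.publish(topic_path, data=json.dumps(event).encode("utf-8"))
print("Published message id:", future.result())  # blocks until the publish is acknowledged
```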
Madhive built their ad analytics and bidding infrastructure using databases and batch pipelines. When the pipeline lag got too long to bid effectively, they rebuilt from scratch with Google Cloud’s Managed Service for Apache Kafka. Join this session to learn about Madhive’s journey and dive deep into how the service works, how it can help you build streaming systems quickly and securely, and what migration looks like. This session is relevant for Kafka administrators and architects building event-sourcing platforms or event-driven systems.
Audiences around the world have almost limitless access to content that’s only a click, swipe, or voice command away. Companies are embracing cloud capabilities to evolve from traditional media companies into media-tech and media-AI companies. Join us to discover how the cloud is maximizing personalization and monetization to enable the next generation of AI-powered streaming experiences for audiences everywhere.
Overwhelmed by the complexities of building a robust and scalable data pipeline for algo trading with AlloyDB? This session provides the Google Cloud services, tools, recommendations, and best practices you need to succeed. We'll explore battle-tested strategies for implementing a low-latency, high-volume trading platform using AlloyDB and Spark Streaming on Dataproc.
Leverage Cloud Composer orchestration to create a scalable, efficient data pipeline that meets the demands of algo trading and handles increasing data volumes and trading activity by using the scalability of Google Cloud services.
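Since Cloud Composer runs Apache Airflow, orchestration like this is typically declared as a DAG. Below is a minimal sketch assuming Airflow 2.4+; the task names, schedule, and Python callables are illustrative assumptions rather than a reference pipeline.

```python
# Minimal sketch of an orchestration DAG for a trading data pipeline on
# Cloud Composer (Apache Airflow). Task names, schedule, and callables are
# illustrative placeholders.
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def ingest_market_data(**context):
    print("pull latest market ticks into the landing zone")

def compute_signals(**context):
    print("run the Spark/SQL job that computes trading signals")

with DAG(
    dag_id="algo_trading_pipeline",
    start_date=datetime(2025, 1, 1),
    schedule="*/15 * * * *",   # every 15 minutes (Airflow 2.4+ syntax)
    catchup=False,
) as dag:
    ingest = PythonOperator(task_id="ingest_market_data", python_callable=ingest_market_data)
    signals = PythonOperator(task_id="compute_signals", python_callable=compute_signals)
    ingest >> signals          # ingest runs before signal computation
```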
The telecom industry has always been critical to advancing how we communicate, work, and play, whether through creation of our mobile world or streaming through high bandwidth connectivity. In this session we will explore how communication service providers from around the globe are leveraging AI agents across their workforce, customer experience, field operations, network operations, and more.
This blog defines the governance requirements that streaming data pipelines must meet to make artificial intelligence/machine learning (AI/ML) initiatives successful. Published at: https://www.eckerson.com/articles/streaming-data-governance-three-must-have-requirements-to-support-ai-ml-innovation
Enhance your data ingestion architecture's resilience with Google Cloud's serverless solutions. Gain end-to-end visibility into your data's lineage—track each data point's transformation journey, including timestamps, user actions, and process outcomes. Implement real-time streaming and daily batch processes for Vertex AI Retail Search to deliver near real-time search capabilities while maintaining a daily backup for contingencies. Adopt best practices for data management, lineage tracking, and forensic capabilities to streamline issue diagnosis. This talk presents a scalable and fault-tolerant design that optimizes data quality and search performance while ensuring forensic-level traceability for every data movement.
Join this Cloud Talk to explore how Large Language Models (LLMs) can revolutionize your data workflows. Learn to automate SQL query generation and stream results into Confluent using Vertex AI for real-time analytics and decision-making. Dive into integrating advanced AI into data pipelines, simplifying SQL creation, enhancing workflows, and leveraging Vertex AI for scalable machine learning. Discover how to optimize your data infrastructure and drive insights with Confluent’s Data Streaming Platform and cutting-edge AI technology.
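A rough sketch of that flow might look like the following: the Vertex AI SDK drafts a SQL statement from a natural-language request, the query runs against a warehouse, and each result row is produced to a Kafka/Confluent topic. The model name, Confluent connection settings, and the run_query helper are stand-ins, not a documented integration.

```python
# Minimal sketch: LLM-generated SQL whose results are streamed to a Kafka
# topic. Model name, broker config, and run_query() are illustrative stand-ins.
import json
import vertexai
from vertexai.generative_models import GenerativeModel
from confluent_kafka import Producer

vertexai.init(project="my-project", location="us-central1")   # hypothetical project
model = GenerativeModel("gemini-1.5-pro")

prompt = "Write a SQL query that returns daily order counts from sales.orders."
sql_text = model.generate_content(prompt).text

def run_query(sql):
    # Placeholder: execute against your warehouse (e.g. BigQuery) and yield dict rows.
    yield {"order_date": "2025-01-01", "order_count": 42}

producer = Producer({"bootstrap.servers": "pkc-xxxx.confluent.cloud:9092"})  # placeholder
for row in run_query(sql_text):
    producer.produce("llm-query-results", value=json.dumps(row))
producer.flush()
```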
Google's Data Cloud is a unified platform for the entire data lifecycle, from streaming with Managed Kafka, to ML feature creation in BigQuery, to global deployment via Bigtable. In this talk, we’ll give you a behind-the-scenes look at how Spotify's recommendation engine team uses Google's Data Cloud for their feature pipelines. Plus, we will demonstrate BigQuery AI Query Engine and how it streamlines feature development and testing. Finally, we'll explore new Bigtable capabilities that simplify application deployment and monitoring.
Unlock the potential of AI with high-performance, scalable lakehouses using BigQuery and Apache Iceberg. This session details how BigQuery leverages Google's infrastructure to supercharge Iceberg, delivering peak performance and resilience. Discover BigQuery's unified read/write path for rapid queries, superior storage management beyond simple compaction, and robust, high-throughput streaming pipelines. Learn how Spotify utilizes BigQuery's lakehouse architecture for a unified data source, driving analytics and AI innovation.
Redpanda, a leading Kafka API-compatible streaming platform, now supports storing topics in Apache Iceberg, seamlessly fusing low-latency streaming with data lakehouses using BigQuery and BigLake in GCP. Iceberg Topics eliminate complex and inefficient ETL between streams and tables, making real-time data instantly accessible for analysis in BigQuery. This push-button integration eliminates the need for costly connectors or custom pipelines, enabling both simple and sophisticated SQL queries across streams and other datasets. By combining Redpanda and Iceberg, GCP customers gain a secure, scalable, and cost-effective solution that transforms their agility while reducing infrastructure and human capital costs.
Kir Titievsky, Product Manager at Google Cloud with extensive experience in streaming and storage infrastructure, joined Yuliia and Dumky to talk about streaming. Drawing from his work with Apache Kafka, Cloud Pub/Sub, Dataflow, and Cloud Storage since 2015, Kir explains the fundamental differences between streaming and micro-batch processing. He challenges common misconceptions about streaming costs, explaining how streaming can be significantly less expensive than batch processing for many use cases. Kir shares insights on the "service bus architecture" revival, discussing how modern distributed messaging systems have solved historic bottlenecks while creating new opportunities for business and performance needs.
Kir's Medium: https://medium.com/@kir-gcp
Kir's LinkedIn page: https://www.linkedin.com/in/kir-titievsky-%F0%9F%87%BA%F0%9F%87%A6-7775052/