talk-data.com

Topic

Spark

Apache Spark

big_data distributed_computing analytics

Activity Trend: peak of 71 per quarter (2020-Q1 to 2026-Q2)

Top Events

O'Reilly Data Engineering Books: 143
Databricks DATA + AI Summit 2023: 120
Data Engineering Podcast: 84
Data + AI Summit 2025: 66
O'Reilly Data Science Books: 20
DATA MINER Big Data Europe Conference 2020: 8
Microsoft Ignite 2025: 7
Airflow Summit 2020: 7
Google Cloud Next '24: 6
Airflow Summit 2024: 6
Big Data LDN 2025: 5
Google Cloud Next '25: 5

Top Speakers

Tobias Macey: 84
Matei Zaharia (Databricks): 10
Reynold Xin (Databricks): 8
Jean-Georges Perrin (Actian): 5
Holden Karau (Fight Health Insurance): 5
Al Martin (IBM): 5
Mark Brown (Microsoft): 5
Martin Grund (Databricks): 4
Richie (DataCamp): 4
Denny Lee (Databricks): 4
Ali Ghodsi (Databricks): 4
Michael Armbrust (Databricks): 4

Activities

Showing filtered results

Filtering by: Google Cloud Next '25

Unleashing the power of Apache Spark integrated with BigQuery

2025-04-10 · Google Cloud Next '25

session

by Kunal Vashi (Google Cloud), Bhooshan Mogal (Google Cloud), James Callaghan (trivago)

AI/ML BigQuery Cloud Computing GCP

This session dives into the world of on-demand Apache Spark on Google Cloud. We explore its native integration with BigQuery, its new capabilities, and the benefits of using Spark for AI and machine learning (ML) workloads. We’ll discuss why Spark is a good choice for large-scale data processing, distributed training, and distributed inference. We’ll also learn from trivago how it used Spark and BigQuery together to simplify its AI and ML workflows.

Construct a scalable, high-volume trading platform with low latency using AlloyDB and Spark Streaming on Dataproc

2025-04-10 · Google Cloud Next '25

session

by Sachin Pawar (Google), Surjit Singh (Google)

Cloud Computing Dataproc GCP Data Streaming

Overwhelmed by the complexities of building a robust and scalable data pipeline for algo trading with AlloyDB? This session covers the Google Cloud services, tools, recommendations, and best practices you need to succeed. We'll explore battle-tested strategies for implementing a low-latency, high-volume trading platform using AlloyDB and Spark Streaming on Dataproc.

Use Composer orchestration to build a scalable, efficient data pipeline that meets the demands of algo trading and handles growing data volumes and trading activity by drawing on the scalability of Google Cloud services.

Build an open, unified AI lakehouse with BigQuery and OSS

2025-04-10 · Google Cloud Next '25

session

by Elango Ganesan (CME Group), Susheel Kaushik (Google Cloud), Vinod Ramachandran (Google Cloud), Zenul Pomal (CME Group)

AI/ML BigQuery Data Governance Data Lakehouse Iceberg

This session provides a comprehensive guide to building a secure and unified AI lakehouse on BigQuery with the power of open source software (OSS). We’ll explore essential components, including data ingestion, storage, and management; AI and machine learning workflows; pipeline orchestration; data governance; and operational efficiency. Learn about the newest features that support both Apache Spark and Apache Iceberg.

Under the Iceberg: Simple, unified Cloud Storage for analytics data lakes

2025-04-10 · Google Cloud Next '25

session

by Edward Yang (Two Sigma), Vivek Saraswat (Google Cloud), Dave Stiver (Google Cloud)

AI/ML Analytics BigQuery Cloud Computing Cloud Storage Dataproc Iceberg

Modern analytics and AI workloads demand a unified storage layer for structured and unstructured data. Learn how Cloud Storage simplifies building data lakes based on Apache Iceberg. We’ll discuss storage best practices and new capabilities that enable high performance and cost efficiency. We’ll also guide you through real-world examples, including Iceberg data lakes with BigQuery or third-party solutions, data preparation for AI pipelines with Dataproc and Apache Spark, and how customers have built unified analytics and AI solutions on Cloud Storage.

Drive AI workloads with GPU-accelerated data processing, vector indexing and search

2025-04-10 · Google Cloud Next '25

session

by Felix Cheung (NVIDIA), Corey Nolet (NVIDIA)

AI/ML Cloud Computing Dataproc ETL/ELT GCP Vector DB

NVIDIA GPUs accelerate batch ETL workloads, delivering significant cost savings and performance gains. In this session, we will delve into optimizing Apache Spark on GCP Dataproc using the G2 accelerator-optimized series with L4 GPUs via the RAPIDS Accelerator for Apache Spark, showcasing up to 14x speedups and 80% cost reductions for Spark applications. We will demonstrate this acceleration through a reference AI architecture for financial transaction fraud detection and walk through performance measurements.

Unstructured data makes up the majority of all new data, a trend that has been growing exponentially since 2018. At these volumes, vector embeddings require indexes to be trained so that nearest neighbors can be efficiently approximated, avoiding the need for exhaustive lookups. However, training these indexes puts intense demand on vector databases to maintain high ingest throughput. In this session, we will explain how the NVIDIA cuVS library is turbocharging vector database ingest with GPUs, providing speedups of 5-20x and improving data readiness.

This session is hosted by a Google Cloud Next sponsor.