talk-data.com

Speaker

Gera Shegalov

1 talk

Principal Distributed Systems Engineer, NVIDIA

German "Gera" Shegalov is a Principal Systems Engineer on the Apache Spark ETL Acceleration Team at NVIDIA. He received a master's and a PhD in Computer Science from Saarland University. Prior to NVIDIA, Gera scaled Hadoop clusters to hundreds of petabytes and supported thousands of A/B tests daily at Twitter, and architected ML apps on the Einstein Platform at Salesforce. He has contributed to multiple OSS projects, including Hadoop, Scalding, Spark, and TransmogrifAI.

Bio from: Data + AI Summit 2025

Talks & appearances

1 activity · Newest first

GPU Accelerated Spark Connect

Spark Connect, first included for the SQL/DataFrame API in Apache Spark 3.4 and recently extended to MLlib in 4.0, introduced a new way to run Spark applications over a gRPC protocol. This has many benefits, including easier adoption for non-JVM clients, version independence from applications, and increased stability and security of the associated Spark clusters. The recent Spark Connect extension for ML also included a plugin interface for configuring enhanced server-side implementations of the MLlib algorithms when launching the server. In this talk, we will demonstrate how this new interface, together with Spark SQL's existing plugin interface, can be used with NVIDIA GPU-accelerated plugins for ML and SQL to enable no-code-change, end-to-end GPU acceleration of Spark ETL and ML applications over Spark Connect, with performance gains of up to 9x at 80% cost reduction compared to CPU baselines.
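
As a rough illustration of the server-side setup the abstract describes, the sketch below assembles a launch command for a Spark Connect server with both plugins configured. It is a minimal sketch under stated assumptions, not the talk's actual configuration: `spark.plugins=com.nvidia.spark.SQLPlugin` follows the RAPIDS Accelerator documentation, while the ML backend key and class (`spark.connect.ml.backend.classes`, `com.nvidia.rapids.ml.Plugin`) and the jar file names are assumptions that should be checked against the Spark and plugin versions in use. The helper `connect_server_command` is hypothetical.

```python
from typing import Dict, List, Optional

# SQL plugin setting as documented for the RAPIDS Accelerator for Apache Spark.
SQL_PLUGIN_CONF = {"spark.plugins": "com.nvidia.spark.SQLPlugin"}

# ASSUMPTION: key and class name for the Spark Connect ML backend plugin.
# Verify against the documentation of your Spark 4.x and plugin release.
ML_BACKEND_CONF = {"spark.connect.ml.backend.classes": "com.nvidia.rapids.ml.Plugin"}


def connect_server_command(
    jars: List[str],
    extra_conf: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Build an argument list for Spark's sbin/start-connect-server.sh.

    Hypothetical helper: it only assembles the command and does not run it.
    """
    conf = {**SQL_PLUGIN_CONF, **ML_BACKEND_CONF, **(extra_conf or {})}
    cmd = ["sbin/start-connect-server.sh"]
    if jars:
        # spark-submit style options take a comma-separated jar list.
        cmd += ["--jars", ",".join(jars)]
    # Sort keys so the generated command is deterministic.
    for key, value in sorted(conf.items()):
        cmd += ["--conf", f"{key}={value}"]
    return cmd


# Placeholder jar names, for illustration only.
cmd = connect_server_command(["rapids-4-spark.jar", "spark-rapids-ml.jar"])
print(" ".join(cmd))
```

Because all acceleration is configured when the server starts, the client application stays unchanged: a PySpark client would connect as usual with `SparkSession.builder.remote("sc://<host>:15002").getOrCreate()`, where 15002 is the default Spark Connect port.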