Speaker

Sourav Gulati

Senior Resident Solutions Architect, Databricks

Sourav is a Senior Resident Solutions Architect at Databricks with over 13 years of experience in data engineering. He has worked extensively with clients across various industries, delivering robust and scalable data solutions. He is passionate about helping organizations unlock the full potential of Databricks by leveraging data effectively to drive smarter decisions and meaningful outcomes.

Bio from: Data + AI Summit 2025

Talks & appearances

2 activities · Newest first

Breaking Barriers: Building Custom Spark 4.0 Data Connectors with Python

2025-06-11 · Data + AI Summit 2025

talk

with Sourav Gulati (Databricks), Ashish Saraswat (Databricks)

API Java Python Scala Spark Data Streaming

Building a custom Spark data source connector once required Java or Scala expertise, which made it complex and limited who could do it. As a result, many proprietary data sources without public SDKs stayed disconnected from Spark, and data sources that did have Python SDKs couldn't take advantage of Spark's distributed processing. Spark 4.0 changes this with a new Python API for data source connectors, allowing developers to build fully functional connectors without writing any Java or Scala. This opens up new possibilities, from integrating proprietary systems to tapping previously unused data sources. Because the API supports both batch and streaming, it makes data ingestion more flexible than ever. In this talk, we'll demonstrate how to build a Spark connector for Excel using Python, covering schema inference, data reads and writes, and streaming support. Whether you're a data engineer or a Spark enthusiast, you'll gain the knowledge to integrate Spark with any data source, entirely in Python.

Apache Spark 2.x for Java Developers

2017-07-26 · O'Reilly Data Engineering Books

book

with Sourav Gulati (Databricks), Sumit Kumar

data data-engineering apache-spark AI/ML Analytics API

Master big data processing with 'Apache Spark 2.x for Java Developers.' This book is a practical guide to implementing Apache Spark through its Java APIs, letting Java developers use Spark's framework without switching to Scala.

What this book will help you do:
Learn how to process data in formats such as XML, JSON, and CSV using Spark Core.
Implement real-time analytics using Spark Streaming and third-party tools such as Kafka.
Understand data querying with Spark SQL and master SQL schema processing.
Apply machine learning techniques with Spark MLlib to real-world scenarios.
Explore graph processing and analytics using Spark GraphX.

Author(s): Sumit Kumar and Sourav Gulati, experienced professionals in Java development and big data, bring their practical experience and passion for teaching to this book. With a clear and concise writing style, they aim to simplify Spark for Java developers and make big data approachable.

Who is it for? This book is for Java developers who want to expand their skill set into big data processing with Apache Spark. Whether you are a seasoned Spark user or new to big data concepts, the book meets you at your level, and its practical examples and straightforward explanations help you apply Spark in real-world scenarios.