talk-data.com

Sourav Gulati

Speaker

1 talk

Senior Resident Solutions Architect, Databricks

Sourav is a Senior Resident Solutions Architect at Databricks with over 13 years of experience in data engineering. He has worked extensively with clients across various industries, delivering robust and scalable data solutions. He is passionate about helping organizations unlock the full potential of Databricks by leveraging data effectively to drive smarter decisions and meaningful outcomes.

Bio from: Data + AI Summit 2025

Talks & appearances

Breaking Barriers: Building Custom Spark 4.0 Data Connectors with Python

Building a custom Spark data source connector once required Java or Scala expertise, which made the work complex and limited who could do it. As a result, many proprietary data sources without public SDKs stayed disconnected from Spark, and data sources with Python SDKs couldn't harness Spark's distributed power. Spark 4.0 changes this with a new Python API for data source connectors, allowing developers to build fully functional connectors without Java or Scala. This unlocks new possibilities, from integrating proprietary systems to tapping data sources that were previously out of reach. The API supports both batch and streaming, making data ingestion more flexible. In this talk, we'll demonstrate how to build a Spark connector for Excel using Python, showcasing schema inference, data reads and writes, and streaming support. Whether you're a data engineer or a Spark enthusiast, you'll gain the knowledge to integrate Spark with any data source, entirely in Python.
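To give a feel for the Python API the abstract describes, here is a minimal batch-read sketch. It is not the connector from the talk. The class names, the `sheet_sketch` format name, the `limit` option, and the in-memory `SAMPLE_ROWS` (standing in for a parsed Excel worksheet) are all hypothetical. The import falls back to plain stand-in base classes so the reader logic can be tried without PySpark 4.0 installed.

```python
try:
    # Available in PySpark 4.0 and later.
    from pyspark.sql.datasource import DataSource, DataSourceReader
except ImportError:
    # Stand-in base classes (assumption: only for trying the sketch without Spark).
    class DataSource:
        def __init__(self, options):
            self.options = options

    class DataSourceReader:
        pass


# Hypothetical rows standing in for the contents of a parsed worksheet.
SAMPLE_ROWS = [("alice", 30), ("bob", 25), ("carol", 41)]


class SheetReader(DataSourceReader):
    """Produces rows for Spark; each yielded tuple becomes one row."""

    def __init__(self, schema, options):
        self.schema = schema
        self.options = options

    def read(self, partition):
        # "limit" is a made-up option for this sketch; options arrive as strings.
        limit = int(self.options.get("limit", len(SAMPLE_ROWS)))
        for row in SAMPLE_ROWS[:limit]:
            yield row


class SheetDataSource(DataSource):
    """Entry point that Spark looks up by its short format name."""

    @classmethod
    def name(cls):
        return "sheet_sketch"

    def schema(self):
        # A fixed DDL string here; a real connector might infer it from the file.
        return "name string, age int"

    def reader(self, schema):
        return SheetReader(schema, self.options)
```

With a Spark 4.0 session available, a source like this would typically be registered with `spark.dataSource.register(SheetDataSource)` and then read with `spark.read.format("sheet_sketch").load()`. Writing and streaming use separate writer and stream-reader classes, which this sketch leaves out.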