talk-data.com

Speaker

Tathagata Das

Activities

talks

Sr. Staff Software Engineer Databricks

Tathagata Das is a Staff Software Engineer at Databricks and one of the core developers of Apache Spark (especially Structured Streaming) and Delta Lake. He is a member of the Apache Spark PMC and a Delta Lake committer. He is also one of the authors of Learning Spark: Lightning-Fast Data Analytics (2nd edition). Previously, he was a grad student in the AMPLab at UC Berkeley, where he conducted research on data-center processing frameworks and networks with Scott Shenker and Ion Stoica.

Bio from: Databricks DATA + AI Summit 2023

Frequent Collaborators

Denny Lee Databricks 2

Talks & appearances

5 activities · Newest first

Extending the Lakehouse: Power Interoperable Compute With Unity Catalog Open APIs

2025-06-11 · Data + AI Summit 2025 Watch

talk

with Tathagata Das (Databricks) , Michelle Leon (Databricks)

Flink API Data Lakehouse DuckDB Iceberg Cyber Security

The lakehouse is built for storage flexibility, but what about compute? In this session, we'll explore how Unity Catalog enables you to connect and govern multiple compute engines across your data ecosystem. With open APIs and support for the Iceberg REST Catalog, UC lets you extend access to engines like Trino, DuckDB, and Flink while maintaining centralized security, lineage, and interoperability. Through demos, we will show how you can get started today using engines like Apache Spark and Starburst to read from and write to UC managed tables. Learn how to bring flexibility to your compute layer without compromising control.

Open Source Unity Catalog: Getting Started, Best Practices and Governance at Scale

2025-06-10 · Data + AI Summit 2025 Watch

talk

with Tathagata Das (Databricks) , Ben Wilson (Databricks)

AI/ML

This session covers how to use UC OSS, what features are available, and an introduction to the ecosystem. We'll dive into the latest release and get hands-on with demos for working with your UC data and AI assets, including tables, volumes, models, and AI functions.

Delta Kernel: Simplifying Building Connectors for Delta

2023-07-26 · Databricks DATA + AI Summit 2023 Watch

video

with Denny Lee (Databricks) , Tathagata Das (Databricks)

Flink API Data Lakehouse Databricks Delta PySpark

Since the release of Delta 2.0, the project has been growing at breakneck speed. In this session, we will cover all the latest capabilities that make Delta Lake the best format for the lakehouse. Based on lessons learned from this past year, we will introduce Project Aqueduct and how we will simplify building Delta Lake APIs from Rust and Go to Trino, Flink, and PySpark.

Talk by: Tathagata Das and Denny Lee

Delta Lake AMA

2023-07-26 · Databricks DATA + AI Summit 2023 Watch

video

with Robert Pack (Databricks) , Bart Samwel (Databricks) , Allison Portis , Tathagata Das (Databricks)

Databricks Delta

Have some great questions about Delta Lake? Well, come by and ask the experts your questions!

Talk by: Bart Samwel, Tathagata Das, Robert Pack, and Allison Portis

Learning Spark, 2nd Edition

2020-07-16 · O'Reilly Data Engineering Books O'Reilly Amazon

book

with Denny Lee (Databricks) , Brooke Wenig , Jules S. Damji (Anyscale Inc) , Tathagata Das (Databricks)

data data-engineering apache-spark AI/ML Analytics API

Data is bigger, arrives faster, and comes in a variety of formats, and it all needs to be processed at scale for analytics or machine learning. But how can you process such varied workloads efficiently? Enter Apache Spark. Updated to include Spark 3.0, this second edition shows data engineers and data scientists why structure and unification in Spark matter. Specifically, this book explains how to perform simple and complex data analytics and employ machine learning algorithms. Through step-by-step walk-throughs, code snippets, and notebooks, you'll be able to: learn Python, SQL, Scala, or Java high-level Structured APIs; understand Spark operations and SQL Engine; inspect, tune, and debug Spark operations with Spark configurations and Spark UI; connect to data sources: JSON, Parquet, CSV, Avro, ORC, Hive, S3, or Kafka; perform analytics on batch and streaming data using Structured Streaming; build reliable data pipelines with open source Delta Lake and Spark; develop machine learning pipelines with MLlib and productionize models using MLflow.