Speaker

Xiao Li

Engineering Director, Databricks

Xiao Li is an Engineering Director at Databricks and an Apache Spark committer and PMC member. He has a deep interest in Spark and database engines. Previously, he was an IBM Master Inventor and an expert in asynchronous database replication and consistency verification. Xiao earned his Ph.D. from the University of Florida in 2011.

Bio from: Data + AI Summit 2025

Talks & appearances

Declarative Pipelines — Ask Us Anything

2025-06-12 · Data + AI Summit 2025

lightning_talk

with Sandy Ryza (Databricks), Denny Lee (Databricks), Xiao Li (Databricks)

ETL/ELT SQL

Join us for an insightful Ask Me Anything (AMA) session on Declarative Pipelines — a powerful approach to simplify and optimize data workflows. Learn how to define data transformations using high-level, SQL-like semantics, reducing boilerplate code while improving performance and maintainability. Whether you're building ETL processes, feature engineering pipelines, or analytical workflows, this session will cover best practices, real-world use cases and how Declarative Pipelines can streamline your data applications. Bring your questions and discover how to make your data processing more intuitive and efficient!

The Upcoming Apache Spark 4.1: The Next Chapter in Unified Analytics

2025-06-11 · Data + AI Summit 2025

talk

with DB Tsai (Databricks), Xiao Li (Databricks)

Analytics API Data Quality ETL/ELT PySpark Python

Apache Spark has long been recognized as the leading open-source unified analytics engine, combining a simple yet powerful API with a rich ecosystem and top-notch performance. In the upcoming Spark 4.1 release, the community reimagines Spark to excel at both massive cluster deployments and local laptop development. We’ll start with new single-node optimizations that make PySpark even more efficient for smaller datasets. Next, we’ll delve into a major “Pythonizing” overhaul — simpler installation, clearer error messages and Pythonic APIs. On the ETL side, we’ll explore greater data source flexibility (including the simplified Python Data Source API) and a thriving UDF ecosystem. We’ll also highlight enhanced support for real-time use cases, built-in data quality checks and the expanding Spark Connect ecosystem — bridging local workflows with fully distributed execution. Don’t miss this chance to see Spark’s next chapter!

Deep Dive into the New Features of Apache Spark™ 3.4

2023-07-25 · Databricks DATA + AI Summit 2023

video

with Xiao Li (Databricks), Daniel Tenedorio (Databricks)

Databricks Jira PySpark Spark SQL

Join us for this technical deep dive session. In 2022, Apache Spark™ received the prestigious SIGMOD Systems Award in recognition of its role as the de facto standard for data processing.

In this session, we will share the latest progress in the Apache Spark community. Thanks to tremendous contributions from the open source community, Spark 3.4 resolved more than 2,400 Jira tickets. We will cover the major features and improvements in Spark 3.4, including Spark Connect, numerous PySpark and SQL language features, engine performance enhancements, and operational improvements in Spark UX and error handling.

Talk by: Xiao Li and Daniel Tenedorio
