Join us for an interactive Ask Me Anything (AMA) session on the latest advancements in Apache Spark 4, including Spark Connect — the new client-server architecture enabling seamless integration with IDEs, notebooks and custom applications. Learn about performance improvements, enhanced APIs and best practices for leveraging Spark’s next-generation features. Whether you're a data engineer, Spark developer or big data enthusiast, bring your questions on architecture, real-world use cases and how these innovations can optimize your workflows. Don’t miss this chance to dive deep into the future of distributed computing with Spark!
talk-data.com
Speaker
DB Tsai
2
talks
DB Tsai is an engineering leader at the Databricks Spark team. He is an Apache Spark Project Management Committee (PMC) Member and Committer, and he enjoys building teams with great cultures focusing on large scale distributed data infrastructure. Before his transition to a leadership role, he implemented several algorithms including Linear Regression and Binary/Multinomial Logistic Regression with Elastici-Net (L1/L2) regularization using LBFGS/OWL-QN optimizers in Apache Spark project.
Bio from: Data + AI Summit 2025
Filter by Event / Source
Talks & appearances
Showing 2 of 2 activities
Apache Spark has long been recognized as the leading open-source unified analytics engine, combining a simple yet powerful API with a rich ecosystem and top-notch performance. In the upcoming Spark 4.1 release, the community reimagines Spark to excel at both massive cluster deployments and local laptop development. We’ll start with new single-node optimizations that make PySpark even more efficient for smaller datasets. Next, we’ll delve into a major “Pythonizing” overhaul — simpler installation, clearer error messages and Pythonic APIs. On the ETL side, we’ll explore greater data source flexibility (including the simplified Python Data Source API) and a thriving UDF ecosystem. We’ll also highlight enhanced support for real-time use cases, built-in data quality checks and the expanding Spark Connect ecosystem — bridging local workflows with fully distributed execution. Don’t miss this chance to see Spark’s next chapter!