talk-data.com

Speaker

Takuya Ueshin

Sr. Software Engineer, Databricks

Takuya is a software engineer at Databricks and an Apache Spark committer and PMC member. His primary focus is on PySpark, including the Pandas API on Spark and the Spark Connect Python client. Additionally, he has experience working on other Spark components, such as Spark Core and Spark SQL.

Bio from: Data + AI Summit 2025

Talks & appearances

What’s New in PySpark: TVFs, Subqueries, Plots, and Profilers

PySpark’s DataFrame API is evolving to support more expressive and modular workflows. In this session, we’ll introduce two powerful additions: table-valued functions (TVFs) and the new subquery API. You’ll learn how to define custom TVFs using Python User-Defined Table Functions (UDTFs), including support for polymorphism, and how subqueries can simplify complex logic. We’ll also explore how lateral joins connect these features, followed by practical tools for the PySpark developer experience, such as plotting, profiling, and a preview of upcoming capabilities like UDF logging and a Python-native data source API. Whether you’re building production pipelines or extending PySpark itself, this talk will help you take full advantage of the latest features in the PySpark ecosystem.