Speaker

Hyukjin Kwon

Staff Software Engineer, Databricks

Hyukjin is a software engineer at Databricks and the tech lead of the OSS PySpark team. He is an ASF member and an Apache Spark PMC member and committer, and he works on many areas of Apache Spark, such as PySpark, Spark SQL, SparkR, and infrastructure. He is the top contributor to Apache Spark and leads efforts such as Project Zen, Pandas API on Spark, and Python Spark Connect.

Bio from: Databricks DATA + AI Summit 2023

Talks & appearances

No-Code Change in Your Python UDF for Arrow Optimization

2025-06-10 · Data + AI Summit 2025

Lightning talk

Tags: API, Arrow, Pandas, Python, Spark

Apache Spark™ has introduced Arrow-optimized APIs such as Pandas UDFs and the Pandas Functions API, which provide high performance for Python workloads. Yet many users continue to rely on regular Python UDFs because of their simple interface, especially when advanced Python expertise is not readily available. This talk introduces a new feature in Apache Spark that brings Arrow optimization to regular Python UDFs. With this enhancement, users can gain performance without modifying their existing UDFs, simply by enabling a configuration setting or setting a UDF-level parameter. We will also cover practical tips and features for using Arrow-optimized Python UDFs effectively, including their strengths and limitations. Whether you are new to Spark or an experienced user, this session will show you how to get both simplicity and performance from regular Python UDFs in your workflows.