PySpark’s Arrow-based Python UDFs open the door to dramatically faster data processing by avoiding expensive serialization overhead. At the same time, Polars, a high-performance DataFrame library built on Rust, offers zero-copy interoperability with Apache Arrow. This talk shows how combining the two unlocks new performance gains: writing Arrow UDFs with Polars in PySpark can deliver speedups over regular Python UDFs. Attendees will learn how Arrow UDFs work in PySpark, how they can be used with other data processing libraries, and how to apply this approach to real-world Spark pipelines for faster, more efficient workloads.
talk-data.com
Topic
PySpark
big_data
distributed_computing
python
Filtering by:
PyData Seattle 2025