Topic: Polars

Tags: data_manipulation, data_analysis, rust

Activities: 2 tagged

Activity trend: peak of 13 per quarter, 2020-Q1 to 2026-Q2

Top Events

SciPy 2025 · 5
PyData Berlin 2025 · 3
O'Reilly Data Science Books · 3
Data Engineering Central Podcast · 3
PyData Paris 2025 · 2
PyData London 2025 · 2
DataTopics: All Things Data, AI & Tech · 2
PyData Seattle 2025 · 2
PyConDE & PyData Berlin 2023 · 2
PyData Amsterdam 2025 · 2
Databricks DATA + AI Summit 2023 · 2
O'Reilly Data Engineering Books · 1

Top Speakers

Marco Gorelli (Narwhals) · 4
Dr. Jeroen Janssens (Posit) · 3
Thijs Nieuwdorp (VodafoneZiggo) · 2
Daniel Beach · 2
Thomas Bierhance · 1
Bernardo Dionisi · 1
Brodie Vidrine · 1
Guen Prawiroatmodjo · 1
Vyas Ramasubramani · 1
Ritchie Vink (Polars) · 1
Oz Katz (Treeverse) · 1
Joris Bekkers · 1

Activities

Filtered by event: PyData Seattle 2025

Know Your Data(Frame) with Paguro: Declarative and Composable Validation and Metadata using Polars

2025-11-07 · PyData Seattle 2025

talk

by Bernardo Dionisi

Tags: Data Quality

Modern data pipelines are fast and expressive, but ensuring data quality is often less straightforward. This talk introduces Paguro, an open-source, feature-rich validation and metadata library built on top of the Polars DataFrame library. Paguro can validate individual Data(Lazy)Frames as well as collections of Data(Lazy)Frames together, and it provides beautifully formatted terminal diagnostics that explain why and where validation failed. Attendees will learn how to integrate this lightweight, fast, and composable validation toolkit into their workflows, from exploration to production, using familiar Polars-native syntax.

Polars on Spark: Unlocking Performance with Arrow Python UDFs

2025-11-07 · PyData Seattle 2025

talk

by Shujing Yang, Allison Wang (Databricks)

Tags: Arrow, PySpark, Python, Rust, Spark

PySpark’s Arrow-based Python UDFs can make data processing dramatically faster by exchanging data in columnar Arrow batches, avoiding much of the serialization overhead of row-at-a-time Python UDFs. At the same time, Polars, a high-performance DataFrame library written in Rust, offers zero-copy interoperability with Apache Arrow. This talk shows how combining the two unlocks new performance gains: writing Arrow UDFs with Polars in PySpark can deliver speedups over standard Python UDFs. Attendees will learn how Arrow UDFs work in PySpark, how they can be used with other data processing libraries, and how to apply this approach to real-world Spark pipelines for faster, more efficient workloads.