Topic

Parquet

Apache Parquet

columnar_storage big_data compression file_format storage

Activities: 1 tagged

Activity Trend: peak of 5 per quarter, 2020-Q1 to 2026-Q2

Top Events

Data Engineering Podcast: 20
Databricks DATA + AI Summit 2023: 8
O'Reilly Data Engineering Books: 8
O'Reilly Data Science Books: 3
Data Council Austin 2024 - Day 1: 2
Data Council 2023: 2
PyData Boston 2025: 2
PyData Amsterdam 2025: 1
DuckCon #4 Amsterdam 2024: 1
Snowflake World Tour Berlin: 1
DATA MINER Big Data Europe Conference 2020: 1
PyData Paris 2025: 1

Top Speakers

Tobias Macey: 20
Julien Le Dem (Astronomer): 4
Matthew Topol (Voltron Data): 3
Dipti Borkar (Microsoft): 2
Tony Wang: 1
Willy Lulciuc (WeWork): 1
Rodrigo Silva Ferreira: 1
Mohammed Guller: 1
Aneesh Karve (Quilt Data): 1
Rok Mihevc: 1
Ewen Cheslack-Postava (Confluent): 1
Javier Ramirez: 1

Activities

Actionable Techniques for Finding Performance Regressions

2025-09-25 · PyData Amsterdam 2025

talk

by Thijs Nieuwdorp (VodafoneZiggo), Dr. Jeroen Janssens (Posit)

Bash Data Science Git Polars Python

Ever been burned by a mysterious slowdown in your data pipeline? In this session, we'll reveal how a stealthy performance regression in the Polars DataFrame library was hunted down and squashed. Using git bisect, Bash scripting, and uv, we automated commit compilation and benchmarking across two repos to pinpoint the commit that degraded multi-file Parquet loading. The hunt led us to challenge our assumptions and rethink how performance is monitored in Polars.
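
The bisection idea in the abstract can be sketched in a few lines of Python. This is not the speakers' tooling (the talk used git bisect, Bash, and uv); it is a minimal illustration under stated assumptions: the helper names `median_runtime`, `verdict`, and `first_bad_commit` are hypothetical, the 1.5x tolerance is an arbitrary choice, and the history is assumed to flip from fast to slow exactly once. The exit codes follow the `git bisect run` convention, where 0 marks a commit good, 1 marks it bad, and 125 asks git to skip a commit that cannot be tested.

```python
import statistics
import time
from typing import Callable, Sequence

# Exit codes understood by `git bisect run`.
GOOD, BAD, SKIP = 0, 1, 125  # SKIP is for commits that fail to build


def median_runtime(workload: Callable[[], object], repeats: int = 5) -> float:
    """Run the workload several times and return the median wall time in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def verdict(runtime_s: float, baseline_s: float, tolerance: float = 1.5) -> int:
    """Map a measured runtime to a bisect exit code.

    A commit counts as bad when it is slower than baseline * tolerance.
    The tolerance value is an assumption, not taken from the talk.
    """
    return BAD if runtime_s > baseline_s * tolerance else GOOD


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary search for the first bad commit, oldest to newest.

    Assumes the newest commit is bad and that every commit after the
    first bad one is also bad, which is what git bisect assumes too.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[hi] is bad
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid + 1
    return commits[lo]
```

In a real setup, `workload` would be something like reading a directory of Parquet files with the build under test, and a script exiting with `verdict(...)` would be handed to `git bisect run`, so git performs the search that `first_bad_commit` only imitates here.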