Ever been burned by a mysterious slowdown in your data pipeline? In this session, we'll reveal how a stealthy performance regression in the Polars DataFrame library was hunted down and squashed. Using git bisect, Bash scripting, and uv, we automated compiling and benchmarking commits across two repos to pinpoint the commit that degraded multi-file Parquet loading. The hunt led us to challenge our assumptions and rethink how performance is monitored for Polars, the Python data science library.
talk-data.com
Speaker
Thijs Nieuwdorp
5 talks
Talks & appearances
At VodafoneZiggo, we're building digital LLM tools that provide instant information, automate repetitive tasks, and will ultimately serve as a digital buddy. This talk explores how our projects enhance efficiency and transform fieldwork, paving the way for a more effective and informed technical workforce.
Jeroen Janssens and Thijs Nieuwdorp join me to chat about all things Polars. We discuss the evolution of the Polars library, its advantages over pandas, and their journey of writing 'Python Polars: The Definitive Guide.'
Learn how to transform your Python code into a command-line tool. Jeroen Janssens, author of Data Science at the Command Line, guides you through the process of turning your scripts into reusable, executable tools, integrating them into your data workflows and harnessing the power of the Unix command line.