talk-data.com

Topic

Polars

data_manipulation data_analysis rust

Activity Trend: peak of 13 activities per quarter, 2020-Q1 to 2026-Q1

Top Events

SciPy 2025 · 5
PyData Berlin 2025 · 3
O'Reilly Data Science Books · 3
Data Engineering Central Podcast · 3
PyData Paris 2025 · 2
PyData London 2025 · 2
DataTopics: All Things Data, AI & Tech · 2
PyData Seattle 2025 · 2
PyConDE & PyData Berlin 2023 · 2
PyData Amsterdam 2025 · 2
Databricks DATA + AI Summit 2023 · 2
O'Reilly Data Engineering Books · 1

Top Speakers

Marco Gorelli (Narwhals) · 4
Dr. Jeroen Janssens (Posit) · 3
Thijs Nieuwdorp (VodafoneZiggo) · 2
Daniel Beach · 2
Thomas Bierhance · 1
Bernardo Dionisi · 1
Brodie Vidrine · 1
Guen Prawiroatmodjo · 1
Vyas Ramasubramani · 1
Ritchie Vink (Polars) · 1
Oz Katz (Treeverse) · 1
Joris Bekkers · 1

Activities

Filtering by: SciPy 2025

From Legacy to Leading-Edge: Revamping NCEI Software for the Cloud Era

2025-07-11 · SciPy 2025

talk

by Sarah Purpura

AWS CI/CD Cloud Computing DevOps

Extreme weather events threaten industries and economic stability. NOAA’s National Centers for Environmental Information (NCEI) addresses this through the Industry Proving Grounds (IPG), which modernizes data delivery by collaborating with sectors like re/insurance and retail to develop practical, data-driven solutions. This presentation explores IPG’s technical innovations, including implementing Polars for efficient data processing, AWS for scalability, and CI/CD pipelines for streamlined deployment. These tools enhance data accessibility, reduce latency, and support real-time decision-making. By integrating scientific computing, cloud technology, and DevOps, NCEI improves climate resilience and provides a model for leveraging open-source tools to address global challenges.

Accelerated DataFrames for all: Bringing GPU acceleration to pandas and Polars

2025-07-10 · SciPy 2025

talk

by Vyas Ramasubramani

Analytics Data Analytics Pandas Python

In Python, data analytics users often prioritize convenience, flexibility, and familiarity over pure performance. The cuDF DataFrame library provides a pandas-like experience with performance improvements of 10x to 50x, but subtle differences prevent it from being a true drop-in replacement for many users. This talk will showcase the evolution of this library to provide zero-code-change experiences, first for pandas users and now for Polars. We will provide examples of this usage and a high-level overview of how users can take advantage of these capabilities today. We will then delve into the details of how GPU acceleration is implemented differently in pandas and Polars, along with a deep dive into some of the technical challenges encountered for each. This talk will have something for both data practitioners and library developers.

User guides: engaging new users, delighting old ones

2025-07-09 · SciPy 2025

talk

by Michael Chow (RStudio)

API DuckDB

User guides are the piece you often hit right after clicking the "Learn" or "Get Started" button in a package's documentation. They're responsible for onboarding new users and providing a learning path through a package. Surprisingly, while pieces of documentation like the API Reference tend to be the same, the design of user guides tends to differ across packages.

In this talk, I'll discuss how to design an effective user guide for open source software. I'll explain how the guides for Polars, DuckDB, and FastAPI balance working end-to-end like a course, with being browsable like a reference.

Breaking Out of the Loop: Refactoring Legacy Software with Polars

2025-07-09 · SciPy 2025

talk

by Brodie Vidrine

Java Python

Data manipulation libraries like Polars allow us to analyze and process data much faster than with native Python, but that’s only true if you know how to use them properly. When the team working on NCEI's Global Summary of the Month first integrated Polars, they found it was actually slower than the original Java version. In this talk, we'll discuss how our team learned how to think about computing problems like spreadsheet programmers, increasing our products’ processing speed by over 80%. We’ll share tips for rewriting legacy code to take advantage of parallel processing. We’ll also cover how we created custom, pre-compiled functions with Numba when the business requirements were too complex for native Polars expressions.

All the SQL a Pythonista needs to know: an introduction to SQL and DataFrames with DuckDB

2025-07-07 · SciPy 2025

talk

by Guen Prawiroatmodjo, Jacob Matson (MotherDuck), Alex Monahan (MotherDuck)

Cloud Computing DuckDB HTML Pandas Python SQL

Structured Query Language (SQL) is a programming language for managing data in a database system and an essential part of any data engineer’s toolkit. In this tutorial, you will learn how to use SQL to create databases and tables, insert data into them, and write queries that extract, filter, and join data or perform calculations on it. We will use DuckDB, a new open-source, embedded, in-process database system that combines cutting-edge database research with dataframe-inspired ease of use. DuckDB is only a pip install away (with zero dependencies) and runs right on your laptop. You will learn how to use DuckDB with your existing Python tools like Pandas, Polars, and Ibis to simplify and speed up your pipelines. Lastly, you will learn how to use SQL to create fast, interactive data visualizations, and how to teach your data how to fly and share it via the Cloud.