Topic

NumPy

scientific_computing numerical_analysis python

Activities: 2 tagged

Activity Trend

Peak of 16 activities per quarter, 2020-Q1 to 2026-Q2

Top Events

O'Reilly Data Science Books: 41
SciPy 2025: 11
O'Reilly Data Visualization Books: 8
O'Reilly Data Engineering Books: 6
ADSP: Algorithms + Data Structures = Programs: 3
PyData Paris 2025: 3
PyConDE & PyData Berlin 2023: 2
PyData Seattle 2025: 2
Data Engineering Podcast: 2
Introduction AI Mini Bootcamp - Dr. Yasin Ceran: 1
[Online] Contributing to the NumPy Documentation: 1
PyData Paris 2024: 1

Top Speakers

Wes McKinney (Posit): 3
Conor Hoekstra: 3
Bryce Adelstein Lelbach (NVIDIA): 3
Kyran Dale: 2
Ralf Gommers (Quansight Labs): 2
Ivan Idris: 2
Hendrik Makait: 2
Fabio Nelli: 2
Robert Johansson: 2
Martin Czygan: 2
Jake VanderPlas: 2
Ashwin Pajankar: 2

Activities

Showing filtered results

Filtering by: PyConDE & PyData Berlin 2023

Pragmatic ways of using Rust in your data project

2023-04-18 · PyConDE & PyData Berlin 2023

talk

by Christopher Prohm

Pandas Python Rust

Writing efficient data pipelines in Python can be tricky. The standard recommendation is to use vectorized functions implemented in NumPy, pandas, or the like. But what should you do when the processing task does not fit these libraries? Processing in plain Python can be slow, particularly on large data sets.
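As a rough illustration of that contrast (not taken from the talk), the sketch below puts a task that maps directly onto a vectorized NumPy call next to one that does not. The function names and sample data are hypothetical. The second function carries state from one element to the next and clips it at zero, so a plain `np.cumsum` does not reproduce it and a Python-level loop is the straightforward fallback.

```python
import numpy as np


def running_total(deltas):
    """Fits NumPy well: a running sum is a single vectorized call."""
    return np.cumsum(np.asarray(deltas, dtype=float))


def clipped_running_total(deltas):
    """Harder to vectorize: each step depends on the previous clipped value.

    The total is never allowed to drop below zero, so the result is not a
    plain cumulative sum. This version loops in the Python interpreter.
    """
    total = 0.0
    out = []
    for delta in deltas:
        total = max(0.0, total + delta)
        out.append(total)
    return out


# Hypothetical sample data for illustration only.
sample = [1.0, -3.0, 2.0, 2.0]
vectorized = running_total(sample)
looped = clipped_running_total(sample)
```

Loops of this second kind are the sort of step that may be worth moving to a compiled language once the data grows large.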

Rust is a modern, performance-oriented programming language that is already widely used by the Python community. Augmenting data processing steps with Rust can result in substantial speedups. In this talk, I will present strategies for using Rust in a larger Python data processing pipeline, with a particular focus on pragmatism and minimizing integration effort.

Observability for Distributed Computing with Dask

2023-04-18 · PyConDE & PyData Berlin 2023

talk

by Hendrik Makait

AI/ML Cloud Computing Data Engineering Data Science Pandas Python React

Debugging is hard. Distributed debugging is hell.

Dask is a popular library for parallel and distributed computing in Python. Dask is commonly used in data science, actual science, data engineering, and machine learning to distribute workloads onto clusters of many hundreds of workers with ease.

However, when things go wrong, life can become difficult because of all the moving parts. These parts include your code, other PyData libraries like NumPy/pandas, the machines you’re running on, the network between them, storage, the cloud, and of course issues with Dask itself. It can be difficult to understand what is going on, especially when things seem slower than they should be or fail unexpectedly. Observability is the key to sanity and success.

In this talk, we describe the tools Dask offers to help you observe your distributed cluster, analyze performance, and monitor your cluster to react to unexpected changes quickly. We will dive into distributed logging, automated metrics, event-based monitoring, and root-causing problems with diagnostic tooling. Throughout the talk, we will leverage real-world use cases to show how these tools help to identify and solve problems for large-scale users in the wild.

This talk should be particularly insightful for Dask users, but the approaches to observing distributed systems should be relevant to anyone operating at scale in production.