talk-data.com talk-data.com

Event

SciPy 2025

2025-07-07 – 2025-07-13 PyData

Activities tracked

3

Filtering by: Tom Nicholas ×

Sessions & talks

Showing 1–3 of 3 · Newest first

Search within this event →

VirtualiZarr and Icechunk: How to build a cloud-optimised datacube of archival files in 3 lines of xarray

2025-07-10
talk

The best way to distribute large scientific datasets is via the Cloud, in Cloud-Optimized formats. But often this data is stuck in archival pre-Cloud file formats such as netCDF.

VirtualiZarr makes it easy to create "Virtual" Zarr datacubes, allowing performant access to huge archival datasets as if it were in the Cloud-Optimized Zarr format, without duplicating any of the original data.

We will demonstrate using VirtualiZarr to generate references to archival files, combine them into one array datacube using xarray-like syntax, commit them to Icechunk, and read the data back with zarr-python v3.

Cubed: Scalable array processing with bounded-memory in Python

2025-07-09
talk

Cubed is a framework for distributed processing of large arrays without a cluster. Designed to respect memory constraints at all times, Cubed can express any NumPy-like array operation as a series of embarrassingly-parallel, bounded-memory steps. By using Zarr as persistent storage between steps, Cubed can run in a serverless fashion on both a local machine and on a range of Cloud platforms. After explaining Cubed’s model, we will show how Cubed has been integrated with Xarray and demonstrate its performance on various large array geoscience workloads.

Hierarchical Data Analysis with Xarray DataTree & Zarr

2025-07-08
talk

Xarray provides data structures for multi-dimensional labeled arrays and a toolkit for scalable data analysis on large, complex datasets. Many real-world datasets often have hierarchical or heterogeneous structure, and are best organized through groups of related data arrays. Through xarray.DataTree, the xarray data model now supports opening datasets with a hierarchical structure of groups, such as HDF5 files and Zarr stores. This expanded data model is now general enough to manage data across different scientific disciplines, including geosciences and biosciences. This hands-on tutorial focuses on intermediate and advanced workflows using xarray to analyze real-world hierarchical data.