talk-data.com talk-data.com

J

Speaker

Justus Magin

3

talks

Frequent Collaborators

Filtering by: SciPy 2025 ×

Filter by Event / Source

Talks & appearances

Showing 3 of 3 activities

Search activities →

We illustrate the power and flexibility of a new extension point in Xarray's data model: "custom indexes" that allow Xarray users to neatly handle complex grids, and enables at least one new data model (vector data cubes). We present a whirlwind tour of specific examples to illustrate the power of this feature, and aim to stimulate experimentation during the sprints.

Over the past few years, Discrete Global Grid Systems (DGGS) that subdivide the earth into (roughly) equally sized faces have seen a rise in popularity. However, their in-memory representation is different from traditional projection-based data, which is either comprised of evenly shaped rectangular grid (aka raster) or discrete geometries (aka vector), and thus requires specialized tooling. In particular, this includes libraries that can work on the numeric cell ids defined by the specific DGGS.

xdggs is a library that provides a unified interface for xarray that allows working with and visualizing a variety of DGGS-indexed data sets.

Xarray provides data structures for multi-dimensional labeled arrays and a toolkit for scalable data analysis on large, complex datasets. Many real-world datasets often have hierarchical or heterogeneous structure, and are best organized through groups of related data arrays. Through xarray.DataTree, the xarray data model now supports opening datasets with a hierarchical structure of groups, such as HDF5 files and Zarr stores. This expanded data model is now general enough to manage data across different scientific disciplines, including geosciences and biosciences. This hands-on tutorial focuses on intermediate and advanced workflows using xarray to analyze real-world hierarchical data.