2025-07-10 at 22:50
VirtualiZarr and Icechunk: How to build a cloud-optimised datacube of archival files in 3 lines of xarray
Event: SciPy 2025
Description
The best way to distribute large scientific datasets is via the Cloud, in Cloud-Optimized formats. But often this data is stuck in archival pre-Cloud file formats such as netCDF.
VirtualiZarr makes it easy to create "Virtual" Zarr datacubes, allowing performant access to huge archival datasets as if they were in the Cloud-Optimized Zarr format, without duplicating any of the original data.
We will demonstrate using VirtualiZarr to generate references to archival files, combine them into one array datacube using xarray-like syntax, commit them to Icechunk, and read the data back with zarr-python v3.
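The idea behind "references to archival files" can be illustrated with a toy sketch using only the Python standard library. This is not VirtualiZarr's or Icechunk's API. The file layout, the `HEADER` constant, and the helper names (`write_archival_file`, `reference_for`, `combine`, `read_chunk`, `build_demo`) are all hypothetical. It only shows the general pattern: record the path, byte offset, and byte length of each chunk, combine those records into one index, and read data back through the index without copying the original bytes.

```python
import struct
import tempfile
from pathlib import Path

HEADER = b"HDR!"        # hypothetical stand-in for a file format's metadata header
VALUES_PER_FILE = 4     # each toy file holds one chunk of four float64 values


def write_archival_file(path, values):
    """Write a toy 'archival' file: a header followed by little-endian float64 values."""
    path.write_bytes(HEADER + struct.pack(f"<{len(values)}d", *values))


def reference_for(path):
    """Describe where the chunk's bytes live instead of copying them."""
    return {"path": str(path), "offset": len(HEADER), "length": 8 * VALUES_PER_FILE}


def combine(references):
    """Concatenate per-file references along one axis, keyed by chunk index."""
    return {str(i): ref for i, ref in enumerate(references)}


def read_chunk(manifest, index):
    """Read one chunk by seeking to the referenced byte range in the original file."""
    ref = manifest[str(index)]
    with open(ref["path"], "rb") as f:
        f.seek(ref["offset"])
        raw = f.read(ref["length"])
    return list(struct.unpack(f"<{ref['length'] // 8}d", raw))


def build_demo(directory):
    """Create three toy files and return the combined manifest that indexes them."""
    refs = []
    for i in range(3):
        path = Path(directory) / f"file_{i}.bin"
        values = [float(i * VALUES_PER_FILE + j) for j in range(VALUES_PER_FILE)]
        write_archival_file(path, values)
        refs.append(reference_for(path))
    return combine(refs)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        manifest = build_demo(tmp)
        print(read_chunk(manifest, 1))  # [4.0, 5.0, 6.0, 7.0]
```

The manifest here is a plain dictionary, so the only new data written is the index itself. The three toy files are never rewritten, which is the property the description attributes to virtual datacubes.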