Zarr: Scalable Storage of Tensor Data for Use in Parallel and Distributed Computing

Description

Many scientific problems involve computing over large N-dimensional typed arrays of data, and reading or writing data is often the major bottleneck limiting speed or scalability. The Zarr project is developing a simple, scalable approach to storage of such data in a way that is compatible with a range of approaches to distributed and parallel computing. We describe the Zarr protocol and data storage format, and the current state of implementations for various programming languages including Python. We also describe current uses of Zarr in malaria genomics, the Human Cell Atlas, and the Pangeo project.
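To make the storage idea concrete, here is a minimal sketch of chunked array storage over a key-value store, using only the Python standard library. It is not the Zarr library or its API: the helper names `write_chunked` and `read_chunk` are hypothetical, the metadata is heavily simplified, and the array is assumed to be 2-D, to hold 32-bit integers, and to divide evenly into chunks. The key naming (a `.zarray` metadata key plus dot-separated chunk indices) follows the convention of the Zarr version 2 format.

```python
import json
import struct
import zlib


def write_chunked(data, chunk_shape, store):
    """Split a 2-D list of ints into chunks; save each under its own key.

    Hypothetical helper for illustration only. Assumes the array shape is
    an exact multiple of chunk_shape (no edge-chunk padding is handled).
    """
    n_rows, n_cols = len(data), len(data[0])
    c_rows, c_cols = chunk_shape
    # Simplified array metadata, stored under one well-known key.
    store[".zarray"] = json.dumps(
        {"shape": [n_rows, n_cols], "chunks": [c_rows, c_cols], "dtype": "<i4"}
    ).encode()
    for i in range(0, n_rows, c_rows):
        for j in range(0, n_cols, c_cols):
            # Flatten the block in row-major order.
            block = [v for row in data[i:i + c_rows] for v in row[j:j + c_cols]]
            key = f"{i // c_rows}.{j // c_cols}"
            # Each chunk is encoded and compressed independently.
            packed = struct.pack(f"<{len(block)}i", *block)
            store[key] = zlib.compress(packed)


def read_chunk(store, chunk_index):
    """Load one chunk by its grid index without touching any other chunk."""
    meta = json.loads(store[".zarray"])
    c_rows, c_cols = meta["chunks"]
    key = ".".join(str(k) for k in chunk_index)
    flat = struct.unpack(f"<{c_rows * c_cols}i", zlib.decompress(store[key]))
    return [list(flat[r * c_cols:(r + 1) * c_cols]) for r in range(c_rows)]


# A plain dict stands in for a directory, object store, or database.
store = {}
data = [[r * 4 + c for c in range(4)] for r in range(4)]
write_chunked(data, (2, 2), store)
lower_left = read_chunk(store, (1, 0))
```

Because every chunk lives under its own key, separate workers can read or write different chunks at the same time without coordinating, which is what makes this layout a good fit for parallel and distributed computing.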
