File Chunk Store

Hello!

I want to propose adding a new Zarr store type when all array chunks are located in a single binary file. A propotype implementation, named _file chunk store_, is described in this Medium [post](https://medium.com/pangeo/cloud-performant-reading-of-netcdf4-hdf5-data-using-the-zarr-library-1a95c5c92314). In this approach, Zarr metadata (`.zgroup`, `.zarray`, `.zattrs`, or `.zmetadata`) are stored in one of the current Zarr store types while the array chunks are in a binary file. The file chunk store translates array chunk keys into file seek and read operations and therefore only provides read access to the chunk data.

The file chunk store requires a mapping between array chunk keys and their file locations. The prototype implementation put this information for every Zarr array in JSON files named `.zchunkstore`. An example is below:
```json
   {
    "BEAM0001/tx_pulseflag/0": {
        "offset": 94854560,
        "size": 120
    },
    "BEAM0001/tx_pulseflag/1": {
        "offset": 94854680,
        "size": 120
    },
    "BEAM0001/tx_pulseflag/2": {
        "offset": 94854800,
        "size": 120
    },
    "BEAM0001/tx_pulseflag/3": {
        "offset": 94854920,
        "size": 120
    },
    "BEAM0001/tx_pulseflag/4": {
        "offset": 96634038,
        "size": 120
    },
    "BEAM0001/tx_pulseflag/5": {
        "offset": 96634158,
        "size": 123
    },
    "source": {
        "array_name": "/BEAM0001/tx_pulseflag",
        "uri": "https://e4ftl01.cr.usgs.gov/GEDI/GEDI01_B.001/2019.05.26/GEDI01_B_2019146164739_O02560_T04067_02_003_01.h5"
    }
}
```
Array chunk file location is described with the starting byte (`offset`) in the file and the number of bytes to read (`size`). Also included is the file information (`source`) to enable verification of chunk data provenance. The file chunk store prototype uses file-like Python objects, delegating to users the responsibility to arrange access to correct files.

We can discuss specific implementation details If there is enough interest in this new store type.

Thanks!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

File Chunk Store #556

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

File Chunk Store #556

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions