File Chunk Store #556

Closed
@ajelenak

Hello!

I want to propose adding a new Zarr store type for the case where all array chunks are located in a single binary file. A prototype implementation, named file chunk store, is described in this Medium post. In this approach, Zarr metadata (.zgroup, .zarray, .zattrs, or .zmetadata) is stored in one of the current Zarr store types while the array chunks remain in the binary file. The file chunk store translates array chunk keys into file seek and read operations and therefore provides only read access to the chunk data.

The file chunk store requires a mapping between array chunk keys and their file locations. The prototype implementation puts this information, for each Zarr array, in a JSON file named .zchunkstore. An example is below:

{
    "BEAM0001/tx_pulseflag/0": {
        "offset": 94854560,
        "size": 120
    },
    "BEAM0001/tx_pulseflag/1": {
        "offset": 94854680,
        "size": 120
    },
    "BEAM0001/tx_pulseflag/2": {
        "offset": 94854800,
        "size": 120
    },
    "BEAM0001/tx_pulseflag/3": {
        "offset": 94854920,
        "size": 120
    },
    "BEAM0001/tx_pulseflag/4": {
        "offset": 96634038,
        "size": 120
    },
    "BEAM0001/tx_pulseflag/5": {
        "offset": 96634158,
        "size": 123
    },
    "source": {
        "array_name": "/BEAM0001/tx_pulseflag",
        "uri": "https://e4ftl01.cr.usgs.gov/GEDI/GEDI01_B.001/2019.05.26/GEDI01_B_2019146164739_O02560_T04067_02_003_01.h5"
    }
}
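
To make the layout concrete, here is a minimal sketch of how such a document could be parsed. The function name split_chunkstore is hypothetical and not part of the prototype; the only assumption is the structure shown above, where every key except "source" is a chunk key.

```python
import json


def split_chunkstore(text):
    """Split a .zchunkstore-style JSON document into (chunks, source).

    Hypothetical helper for illustration only. It assumes every top-level
    key except "source" is a chunk key with "offset" and "size" fields.
    """
    doc = json.loads(text)
    source = doc.pop("source", None)
    chunks = {key: (entry["offset"], entry["size"]) for key, entry in doc.items()}
    return chunks, source


# Two chunk entries and the source entry taken from the example above.
example = """
{
    "BEAM0001/tx_pulseflag/0": {"offset": 94854560, "size": 120},
    "BEAM0001/tx_pulseflag/1": {"offset": 94854680, "size": 120},
    "source": {
        "array_name": "/BEAM0001/tx_pulseflag",
        "uri": "https://e4ftl01.cr.usgs.gov/GEDI/GEDI01_B.001/2019.05.26/GEDI01_B_2019146164739_O02560_T04067_02_003_01.h5"
    }
}
"""

chunks, source = split_chunkstore(example)
for key, (offset, size) in sorted(chunks.items()):
    print(f"{key}: bytes {offset}..{offset + size - 1}")
```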

Each chunk's location in the file is described by its starting byte (offset) and the number of bytes to read (size). File information (source) is also included so that the provenance of chunk data can be verified. The file chunk store prototype works with file-like Python objects, leaving it to users to arrange access to the correct files.
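
The key-to-bytes translation described above can be sketched roughly as follows. This is not the prototype's code: the class name FileChunkReader and the demo keys and payload are made up for illustration. It assumes only a mapping in the .zchunkstore layout and a seekable, binary file-like object supplied by the user.

```python
import io
from collections.abc import Mapping


class FileChunkReader(Mapping):
    """Read-only mapping from chunk keys to chunk bytes (illustrative sketch)."""

    def __init__(self, chunk_map, fileobj):
        # Drop the "source" entry; everything else is treated as a chunk key.
        self._chunks = {k: v for k, v in chunk_map.items() if k != "source"}
        self._file = fileobj

    def __getitem__(self, key):
        loc = self._chunks[key]  # raises KeyError for unknown keys
        self._file.seek(loc["offset"])
        data = self._file.read(loc["size"])
        if len(data) != loc["size"]:
            raise OSError(f"short read for chunk {key!r}")
        return data

    def __iter__(self):
        return iter(self._chunks)

    def __len__(self):
        return len(self._chunks)


# Demo with an in-memory "file"; the keys and bytes are invented for this sketch.
payload = b"AAAABBBBBBCC"
demo_map = {
    "x/0": {"offset": 0, "size": 4},
    "x/1": {"offset": 4, "size": 6},
    "source": {"array_name": "/x", "uri": "memory"},
}
reader = FileChunkReader(demo_map, io.BytesIO(payload))
print(reader["x/0"], reader["x/1"])
```

Because the class derives from Mapping rather than MutableMapping, it has no item assignment, which matches the read-only nature of the store described above.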

We can discuss specific implementation details if there is enough interest in this new store type.

Thanks!
