
Listing every format that could be represented as virtual zarr #218

Open
@TomNicholas


Let's list all the file formats that could potentially be represented efficiently as "virtual zarr" - i.e. zarr + chunk manifests.
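To make "zarr + chunk manifests" concrete, here is a minimal sketch of a manifest for the simplest possible case. It assumes an uncompressed, C-ordered array stored contiguously in a single file and chunked along its first axis only, so every chunk is exactly one contiguous byte range. The helper name `contiguous_manifest`, the `{"path", "offset", "length"}` entry layout and the example URL are illustrative assumptions, not VirtualiZarr's actual API.

```python
import math


def contiguous_manifest(path, shape, itemsize, rows_per_chunk, header_bytes=0):
    """Hypothetical helper: map zarr chunk keys to byte ranges in one file.

    Assumes a C-ordered, uncompressed array starting at `header_bytes`,
    chunked along axis 0 only, with axis 0 dividing evenly into chunks.
    """
    if shape[0] % rows_per_chunk != 0:
        raise ValueError("sketch assumes axis 0 divides evenly into chunks")
    # Bytes occupied by one index along axis 0 (all trailing axes included).
    row_bytes = itemsize * math.prod(shape[1:])
    manifest = {}
    for i in range(shape[0] // rows_per_chunk):
        # Zarr chunk key: chunk index along axis 0, zero along every other axis.
        key = ".".join([str(i)] + ["0"] * (len(shape) - 1))
        manifest[key] = {
            "path": path,
            "offset": header_bytes + i * rows_per_chunk * row_bytes,
            "length": rows_per_chunk * row_bytes,
        }
    return manifest
```

For example, `contiguous_manifest("s3://bucket/data.bin", (4, 3), 8, 2, header_bytes=16)` gives two entries, `"0.0"` at offset 16 and `"1.0"` at offset 64, each 48 bytes long. Real formats need a format-specific parser to find these offsets and lengths, and that parsing is what differs from one format to the next.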

The important criterion here is that the format must store data in a small number of contiguous chunks, so that accessing it with HTTP range requests to object storage is efficient. This rules out some formats. For example, I don't think we can efficiently access this format that @kmuehlbauer mentioned over in openradar/xradar#187 (comment):
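Concretely, "efficient" here means each chunk can be fetched with a single ranged GET. A minimal sketch, assuming a manifest entry gives a chunk's `offset` and `length` in bytes: the `Range` header uses an inclusive end offset, and a local file read stands in for the object-store request. The helper names are hypothetical.

```python
import urllib.request


def range_header(offset, length):
    """HTTP Range header for `length` bytes starting at `offset`.

    The end position of an HTTP byte range is inclusive, hence the -1.
    """
    return {"Range": f"bytes={offset}-{offset + length - 1}"}


def chunk_request(url, offset, length):
    # Builds (but does not send) a ranged GET; urllib.request.urlopen(req)
    # would perform it against a server that supports range requests.
    return urllib.request.Request(url, headers=range_header(offset, length))


def read_chunk_local(path, offset, length):
    # Local stand-in for the ranged GET: one seek plus one read per chunk.
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read(length)
    if len(data) != length:
        raise EOFError(f"expected {length} bytes at offset {offset}, got {len(data)}")
    return data
```

One request per chunk is only cheap if chunks are reasonably large and few, which is why the contiguity criterion matters.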

file formats where variables are written interleaved within one chunk of data (eg: 100 bytes v1, 100 bytes v2, 100 bytes v3, 100 bytes v1, 100 bytes v2, 100 bytes v3, ...)? Is there something like strides available?
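To see why that layout is a problem, here is a small sketch with hypothetical helpers. It assumes the round-robin layout from the quote (fixed-size blocks of each variable written in turn), lists the byte ranges holding one variable, and merges any ranges that touch end-to-start. With a contiguous layout the ranges collapse into a single request. With interleaving every block stays a separate range, so reading one variable costs either one request per block (each block becoming its own tiny chunk) or one large request that also downloads all the other variables.

```python
def interleaved_ranges(var_index, n_vars, block_bytes, n_blocks):
    # (offset, length) of every block belonging to variable `var_index` when
    # `n_vars` variables are written round-robin in blocks of `block_bytes`.
    stride = n_vars * block_bytes
    return [(i * stride + var_index * block_bytes, block_bytes) for i in range(n_blocks)]


def coalesce(ranges):
    # Merge ranges where one ends exactly where the next begins, i.e. ranges
    # that could be fetched together in a single range request.
    merged = []
    for offset, length in sorted(ranges):
        if merged and merged[-1][0] + merged[-1][1] == offset:
            merged[-1] = (merged[-1][0], merged[-1][1] + length)
        else:
            merged.append((offset, length))
    return merged
```

With three variables in 100-byte blocks, `coalesce(interleaved_ranges(0, 3, 100, 4))` still returns four separate 100-byte ranges, whereas the contiguous case `coalesce(interleaved_ranges(0, 1, 100, 4))` returns the single range `(0, 400)`.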

If we start thinking of Zarr as a "SuperFormat" (super as in superset, not as in super-duper), then this is the list of existing formats that make up that set: the formats whose data can be referenced using chunk manifests (see zarr-developers/zarr-specs#287).


Definitely can support:

Probably can support:

Maybe can support?

Probably can't support:

(The checkboxes indicate whether or not a working implementation already exists, either by going through kerchunk's in-memory format as an intermediate or by creating a ManifestArray directly.)
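For the "kerchunk as an intermediate" route, here is a sketch of pulling one variable's chunk references out of a kerchunk version-1 reference dict (`{"version": 1, "refs": {key: value}}`, where a chunk value is `[url, offset, length]`) into manifest-style entries. The function name and output layout are illustrative assumptions, not VirtualiZarr's real converter. It assumes URLs are already fully expanded (no templates), and it simply rejects inline data and whole-file `[url]` references because they carry no byte range.

```python
def kerchunk_refs_to_manifest(refs, var_name):
    """Hypothetical helper: {chunk_key: {"path", "offset", "length"}} for one variable."""
    prefix = var_name + "/"
    manifest = {}
    for key, ref in refs["refs"].items():
        if not key.startswith(prefix):
            continue  # belongs to another variable or to the root group
        chunk_key = key[len(prefix):]
        if chunk_key.startswith("."):
            continue  # .zarray / .zattrs metadata, not chunk data
        if isinstance(ref, list) and len(ref) == 3:
            path, offset, length = ref
            manifest[chunk_key] = {"path": path, "offset": offset, "length": length}
        else:
            # Inline data (a string) or a whole-file [url] ref has no byte range.
            raise ValueError(f"{key!r} is not a [url, offset, length] reference")
    return manifest
```

Metadata keys are skipped here because, in the chunk-manifest approach, they would become ordinary zarr metadata rather than manifest entries.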

cc @jhamman @d-v-b

Labels: Kerchunk (relating to the kerchunk library / specification itself), help wanted (extra attention is needed), references generation (reading byte ranges from archival files), usage example (real world use case examples)
