RFC: Dataset storage information in HDF5/JSON #75

Open

Description

This is a proposal to add dataset storage information to HDF5/JSON. The JSON key for this is named byteBlocks. The word "block" is hopefully still technically accurate while not being too similar to "chunk".

Below is an example for one dataset. Block JSON keys use the same format as in the HSDS schema, e.g. 0_0 and 0_1. The example has two blocks; the 0_1 block has an optional url key for the case of remote blocks. The url key can also apply to the entire dataset, in which case it cannot appear in the individual blocks.

{
    "datasets": {
        "7f335a2e-7ab1-11e4-87a5-3c15c2da029e": {
            "attributes": [], 
            "dcpl": {
                "fillValue": 0,
                "layout": {
                    "class": "H5D_CHUNKED",
                    "dims": [10, 8]
                }
            },
            "shape": {
                "class": "H5S_SIMPLE",
                "dims": [10, 10], 
                "maxdims": [10, 10]
            }, 
            "type": {
                "base": "H5T_STD_I32BE", 
                "class": "H5T_INTEGER"
            }, 
            "byteBlocks": {
                "0_0": {
                    "offset": 1234,
                    "size": 2567
                },
                "0_1": {
                    "offset": 56789,
                    "size": 1967,
                    "url": "s3://mybucket/path/to/object"
                }
            }
        }
    }
}
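
To make the intended use a bit more concrete, here is a rough Python sketch of how a reader might consume byteBlocks. It is only an illustration, not part of the proposal: the helper names check_url_placement and read_block are made up, offset and size are assumed to be byte counts, and a dataset-wide url is assumed to sit next to byteBlocks in the dataset object (the text above does not pin down where it goes). Fetching remote blocks and decoding the bytes are left out.

```python
import io


def check_url_placement(dataset: dict) -> None:
    """Raise ValueError if "url" is given both dataset-wide and per block."""
    blocks = dataset.get("byteBlocks", {})
    # Assumption: a dataset-wide "url" is a sibling of "byteBlocks".
    if "url" in dataset:
        clashing = [key for key, blk in blocks.items() if "url" in blk]
        if clashing:
            raise ValueError(f"url set on dataset and on blocks: {clashing}")


def read_block(stream, block: dict) -> bytes:
    """Return the raw bytes of one block from a seekable binary stream."""
    # Assumption: "offset" and "size" are both expressed in bytes.
    stream.seek(block["offset"])
    data = stream.read(block["size"])
    if len(data) != block["size"]:
        raise EOFError("stream ended before the block was fully read")
    return data


# Tiny in-memory stand-in for a real file, so the sketch runs on its own.
dataset = {
    "byteBlocks": {
        "0_0": {"offset": 4, "size": 3},
        "0_1": {"offset": 7, "size": 2, "url": "s3://mybucket/path/to/object"},
    }
}
check_url_placement(dataset)  # passes: url appears only at block level
stream = io.BytesIO(b"0123456789")
print(read_block(stream, dataset["byteBlocks"]["0_0"]))  # b'456'
```

With a real file, stream would be the HDF5 file opened in binary mode; a block carrying a url would instead need a ranged read against that location.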

cc: @derobins @jreadey @gheber
