Provide offset for memory mapping / contiguous layout #321

Open
@maartenbreddels

Description

Related: #265 #149
Thanks @rabernat for pointing me to this library.

In vaex (out-of-core dataframes) I use .hdf5 or Arrow files that are memory mapped, which gives really good performance. Arrow supports this natively, and .hdf5 can be used if a contiguous layout is specified. In that case, you can ask for the offset (and size) of the array in the file. Once the offsets, types, endianness and lengths/shapes are collected, the N-d arrays can be memory mapped, wrapped as numpy arrays, and passed around to any library, giving you lazy reading and no wasted memory out of the box.
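To make the idea concrete, here is a minimal sketch of the mapping step using only numpy. The file layout (a 128-byte dummy header followed by the raw array bytes) and the helper names `write_contiguous` and `map_array` are made up for illustration; in a real .hdf5 file the offset would come from the library rather than from a constant, and nothing here is zarr or vaex API.

```python
import os
import tempfile

import numpy as np

# Hypothetical layout for illustration: a fixed-size header, then the raw array bytes.
HEADER_SIZE = 128


def write_contiguous(path, array, header_size=HEADER_SIZE):
    """Write a dummy header, then the array as one contiguous C-ordered block."""
    with open(path, "wb") as f:
        f.write(b"\x00" * header_size)
        f.write(np.ascontiguousarray(array).tobytes())
    return header_size  # byte offset at which the array data starts


def map_array(path, offset, dtype, shape):
    """Memory-map the array in place; pages are only read when accessed."""
    return np.memmap(path, dtype=dtype, mode="r", offset=offset, shape=shape, order="C")


# Little-endian float64, so the endianness is explicit in the dtype string.
data = np.arange(12, dtype="<f8").reshape(3, 4)
path = os.path.join(tempfile.mkdtemp(), "example.bin")

offset = write_contiguous(path, data)
mapped = map_array(path, offset, dtype="<f8", shape=data.shape)
```

The point is that offset, dtype (including endianness) and shape are all that is needed; `mapped` behaves like a read-only numpy array and can be handed to any library that accepts one.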

Would it be an option for zarr to support this layout and provide an API to get this offset? This would make it really easy for me to support zarr in vaex. In the case of chunked storage or compression options, the hdf5 library returns an offset of -1 (h5py translates that to None, I think).
