Description
I often find myself wanting a way to present non-zarr array data as if it were a zarr array, without going through the effort (and data duplication) required to convert the array to zarr. This often comes up when working with unchunked legacy file formats.
A simple way to do this would be via an HTTP server that converts incoming requests for chunk files into whatever operations are necessary to retrieve chunk data from the legacy format. In the simplest formulation, the server would need to be configured with a declaration of the data being "zarred", plus runtime options such as compression.
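To make the idea concrete, here is a rough sketch of the request-to-chunk translation, not a proposed implementation. It assumes the Zarr v2 layout (a `.zarray` metadata document plus dot-separated chunk keys), uncompressed chunks, and a small in-memory list standing in for the legacy file. The names `SOURCE`, `resolve`, `chunk_bytes`, and `ChunkHandler` are all made up for illustration.

```python
import json
import struct
from http.server import BaseHTTPRequestHandler

# Hypothetical stand-in for a legacy file: a 4x6 uint16 image held in memory.
# A real server would read the requested region from the TIFF (or other format).
SHAPE = (4, 6)
CHUNKS = (2, 4)
FILL_VALUE = 0
SOURCE = [[r * SHAPE[1] + c for c in range(SHAPE[1])] for r in range(SHAPE[0])]


def zarray_metadata() -> bytes:
    """Build a Zarr v2 `.zarray` document describing the virtual array."""
    meta = {
        "zarr_format": 2,
        "shape": list(SHAPE),
        "chunks": list(CHUNKS),
        "dtype": "<u2",
        "compressor": None,  # assumption: chunks are served uncompressed
        "fill_value": FILL_VALUE,
        "order": "C",
        "filters": None,
    }
    return json.dumps(meta).encode()


def chunk_bytes(key: str) -> bytes:
    """Translate a chunk key such as "1.0" into raw little-endian chunk bytes."""
    i, j = (int(part) for part in key.split("."))
    grid = tuple(-(-s // c) for s, c in zip(SHAPE, CHUNKS))  # ceil division
    if not (0 <= i < grid[0] and 0 <= j < grid[1]):
        raise KeyError(key)
    values = []
    # Edge chunks are emitted at full chunk size, padded with the fill value.
    for r in range(i * CHUNKS[0], (i + 1) * CHUNKS[0]):
        for c in range(j * CHUNKS[1], (j + 1) * CHUNKS[1]):
            in_bounds = r < SHAPE[0] and c < SHAPE[1]
            values.append(SOURCE[r][c] if in_bounds else FILL_VALUE)
    return struct.pack(f"<{len(values)}H", *values)


def resolve(path: str) -> bytes:
    """Map a request path (relative to the array root) to a response body."""
    if path == ".zarray":
        return zarray_metadata()
    return chunk_bytes(path)


class ChunkHandler(BaseHTTPRequestHandler):
    """Answer GET requests for metadata and chunk keys; anything else is a 404."""

    def do_GET(self):
        try:
            body = resolve(self.path.lstrip("/"))
        except (KeyError, ValueError):
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

Passing `ChunkHandler` to `http.server.HTTPServer(("localhost", 8000), ChunkHandler).serve_forever()` would expose the array over HTTP. The only format-specific part is the body of `chunk_bytes`, which is where a real reader for the legacy format would be plugged in.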
If the program were called zerve (for "zarr serve"), then invocation might look like this:
zerve path/to/array.tif --path my_array
┌──────────────────────────────────────────────────┐
│ │
│ Serving! │
│ │
│ - Local: http://localhost:33907/my_array │
│ - Network: http://192.168.1.160:33907/my_array│
│ │
│ This port was picked because 3000 is in use. │
│ │
│ Copied local address to clipboard! │
│ │
└──────────────────────────────────────────────────┘
I copied the terminal output from the nodejs program serve, which I often use for statically hosting zarr arrays. One can imagine more elaborate JSON configuration for the server that would declare how to embed the array (or multiple arrays) in a virtual group.
The particular use case I described (wrapping a legacy format in a zarr API) is a specific instance of a more general pattern (wrapping X in a zarr API, where X is some arbitrary computation that produces an array). If designed in a modular way, the HTTP proxy I'm proposing would support this broader usage pattern. We can also imagine this functionality being used as a Python library, a la xpublish. In fact, I think xpublish basically does what I want here, so maybe the only work is to pull the zarr proxying out of xpublish?
In terms of where this should live in zarr-python, I think this functionality would be very useful for testing HTTP-based storage, so on those grounds I think it's in scope for zarr-python, but not a pressing need.
cc @jhamman