Description
Sorry for getting distracted at the end of the geo-zarr meeting we just had (for those that were there). Here is a summary of what I was getting at.
(@rabernat, yes I know this has been discussed many times over - apologies)
There are two principal parts to the coordinates problem:
- coordinate transform
- parsing/reading coordinate definitions
Coordinate transform
A mechanism within zarr/xarray to find (each of) the coordinates of a given array position and the (fractional) array location of a given coordinate set. This should be a vectorized operation each way.
Currently, xarray supports explicit coordinate value arrays via the netCDF model well (and "flexible" indexes whose internals I don't understand well).
- I suggest that this should be an extension point, with each transform type associated with its own internal representation (e.g., affine is usually a square matrix, explicit arrays are usually one- or two-dimensional arrays with sizes determined by the data)
- on day 1, we want to support explicit values and affine (linear transform)
- other transforms should be pluggable, and eventually include, for instance, the large number of earth curvature models built into grib
- whether we should have a single affine matrix across all dimensions (lon, lat, time = f(x, y, z)), or if we should split dimensions (lon, lat = f1(x, y); time = f2(z)) is a decision to be taken early.
- the coordinates interface must support slicing and might support units.
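To make the extension-point idea a bit more concrete, here is a minimal sketch, not a proposal for the actual API. All the names (`AffineTransform`, `ExplicitTransform1D`, `to_coords`, `to_index`, `sliced`) are made up for illustration; neither zarr nor xarray provides them. It assumes numpy, treats the affine case as a square matrix plus an offset vector, and assumes the explicit values are strictly increasing so that the inverse lookup is well defined.

```python
import numpy as np


class AffineTransform:
    """Hypothetical plugin: coords = matrix @ index + offset."""

    def __init__(self, matrix, offset):
        self.matrix = np.asarray(matrix, dtype=float)  # shape (n, n)
        self.offset = np.asarray(offset, dtype=float)  # shape (n,)

    def to_coords(self, index):
        # index has shape (..., n), so many positions are mapped at once
        return np.asarray(index, dtype=float) @ self.matrix.T + self.offset

    def to_index(self, coords):
        # inverse mapping; the result is generally fractional
        inverse = np.linalg.inv(self.matrix)
        return (np.asarray(coords, dtype=float) - self.offset) @ inverse.T

    def sliced(self, starts, steps):
        # transform describing array[start::step] along each dimension
        starts = np.asarray(starts, dtype=float)
        steps = np.asarray(steps, dtype=float)
        return AffineTransform(self.matrix * steps,
                               self.matrix @ starts + self.offset)


class ExplicitTransform1D:
    """Hypothetical plugin: one coordinate value stored per array position."""

    def __init__(self, values):
        # assumption: values are strictly increasing
        self.values = np.asarray(values, dtype=float)
        self.positions = np.arange(self.values.size, dtype=float)

    def to_coords(self, index):
        # linear interpolation between stored values for fractional indices
        return np.interp(index, self.positions, self.values)

    def to_index(self, coords):
        return np.interp(coords, self.values, self.positions)


# Same two calls regardless of the internal representation.
grid = AffineTransform([[0.5, 0.0], [0.0, -0.5]], [10.0, 50.0])
index = np.array([[0, 0], [2, 4]])
coords = grid.to_coords(index)
back = grid.to_index(coords)

subset = grid.sliced(starts=[2, 4], steps=[2, 1])
levels = ExplicitTransform1D([0.0, 1.0, 4.0, 9.0])
```

Nothing in here knows about lon/lat: the affine class works for any number of dimensions, and slicing only needs to rescale the matrix columns and shift the offset. The single-matrix versus split-dimensions question above would decide whether one `AffineTransform` covers all dimensions or whether several transforms are composed.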
Crucially, I advocate that the transform mechanism is independent of the data domain, so that we don't treat "lon/lat" as special. This is because zarr and xarray are general purpose libraries, and we don't want to exclude microscopy, genetics and other fields with many users.
Coordinate definitions
In the meeting, a few specific (geo) coordinate definitions were mentioned:
- gdal coefficients
- tiff bounding box
- CRS text/parameters
plus, of course, netCDF explicit arrays (with or without CF). I also mentioned astro WCS as a reference point (which supports explicit, affine, and various analytic forms for arbitrary dimensionality with no geo reference; interestingly, it also applies to fields of tables).
I would suggest that it is the job of geo-zarr to build the converters between these styles of definition and the transform's internal representation, such that you can round-trip coordinate information without losing accuracy.
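As an illustration of what such a converter could look like, here is a small sketch for the gdal coefficients case. The function names (`gdal_to_affine`, `affine_to_gdal`, `apply_affine`) and the `(matrix, offset)` layout are assumptions for this example only. What is taken from gdal is the meaning of the six geotransform coefficients: `x = gt[0] + col * gt[1] + row * gt[2]` and `y = gt[3] + col * gt[4] + row * gt[5]`, with `(col, row) = (0, 0)` at the outer corner of the top-left pixel.

```python
def gdal_to_affine(gt):
    """GDAL geotransform (6 coefficients) -> (matrix, offset) for (col, row) -> (x, y)."""
    x0, dx_col, dx_row, y0, dy_col, dy_row = gt
    matrix = ((dx_col, dx_row), (dy_col, dy_row))
    offset = (x0, y0)
    return matrix, offset


def affine_to_gdal(matrix, offset):
    """Inverse converter: only reorders the numbers, so no precision is lost."""
    (dx_col, dx_row), (dy_col, dy_row) = matrix
    x0, y0 = offset
    return (x0, dx_col, dx_row, y0, dy_col, dy_row)


def apply_affine(matrix, offset, col, row):
    """Evaluate the transform at one (possibly fractional) pixel position."""
    (a, b), (d, e) = matrix
    return (offset[0] + a * col + b * row, offset[1] + d * col + e * row)


# Example values chosen for illustration: 0.25 degree pixels, north-up.
gt = (100.0, 0.25, 0.0, 60.0, 0.0, -0.25)
matrix, offset = gdal_to_affine(gt)
round_tripped = affine_to_gdal(matrix, offset)
corner = apply_affine(matrix, offset, 4, 8)
```

Because the converter only rearranges the stored coefficients, the round trip is exact. A converter that has to derive pixel sizes (for example from a bounding box and an array shape) involves a division, so that is where accuracy would need more care.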