Add zarr v2 endpoints to Tiled #774

Open · genematx wants to merge 54 commits into main


@genematx (Contributor) commented Aug 6, 2024

This PR exposes Tiled data as a zarr collection through a set of new API endpoints under `/zarr/v2/...`. This allows one to use zarr clients directly with Tiled, as if it were an external filesystem accessed through fsspec.

Assuming a demo Tiled server is running on 127.0.0.1:8000 (e.g. started with `tiled serve demo`), one can read its contents with zarr by first creating an fsspec filesystem mapper and then passing it to `zarr.open`:

```python
import zarr
from fsspec import get_mapper

url = "http://localhost:8000/zarr/v2/"
fs_mapper = get_mapper(url)
root = zarr.open(fs_mapper, mode="r")
```
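Under the hood, the fsspec mapper turns each zarr v2 key into a URL under the endpoint root, so a client reading an array requests its metadata (`.zarray`, possibly `.zattrs`) and then one URL per chunk, named by the chunk's grid indices joined with `.`. The standard-library sketch below enumerates those URLs for one array. The helper names and the chunk shape used in the example are hypothetical, and it assumes the server answers at exactly these paths.

```python
import itertools
import math


def chunk_keys(shape, chunks, separator="."):
    """Yield zarr v2 chunk keys, e.g. '0.1', covering an array of `shape`."""
    if not shape:
        # Zero-dimensional arrays keep their single chunk under the key "0".
        yield "0"
        return
    # Number of chunks along each dimension (the last chunk may be partial).
    grid = [math.ceil(s / c) for s, c in zip(shape, chunks)]
    for idx in itertools.product(*(range(n) for n in grid)):
        yield separator.join(str(i) for i in idx)


def key_urls(base_url, array_path, shape, chunks):
    """URLs a zarr v2 client may fetch for one array through an HTTP mapper."""
    prefix = base_url.rstrip("/") + "/" + array_path.strip("/")
    urls = [prefix + "/.zarray", prefix + "/.zattrs"]
    urls += [prefix + "/" + key for key in chunk_keys(shape, chunks)]
    return urls


# Hypothetical chunking: the real chunk shape is whatever the server reports.
for u in key_urls("http://localhost:8000/zarr/v2/", "flat_array", (100,), (100,)):
    print(u)
```

Only the chunk keys change with the array's chunking; the metadata URLs are always fetched from the array's own path.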

The resulting object is a `zarr.Group` representing the root of the Tiled catalog tree; it supports most of the usual operations on zarr groups:

```python
>>> print(root)
<zarr.hierarchy.Group '/' read-only>

>>> list(root.keys())
['dynamic', 'flat_array', 'high_entropy', 'low_entropy',
 'nested', 'scalars', 'structured_data', 'tables']
>>> root.tree()
/
 ├── dynamic (3, 3) float64
 ├── flat_array (100,) float64
 ├── high_entropy (100, 100) int64
 ├── low_entropy (100, 100) int32
 ├── nested
 │   ├── cubes
 │   │   ├── tiny_cube (50, 50, 50) float64
 │   │   └── tiny_hypercube (50, 50, 50, 50, 50) float64
 │   ├── images
 │   │   ├── big_image (10000, 10000) float64
 │   │   ├── medium_image (1000, 1000) float64
 │   │   ├── small_image (300, 300) float64
 │   │   └── tiny_image (50, 50) float64
 │   └── sparse_image (100, 100) float64
 ├── scalars
 │   ├── e_arr (1,) <U7
 │   ├── fortytwo () int64
 │   ├── fsc () <U5
 │   └── pi () float64
 ├── structured_data
 │   ├── pets
 │   └── xarray_dataset
 │       ├── lat (2, 2) float64
 │       ├── lon (2, 2) float64
 │       ├── precipitation (2, 2, 3) float64
 │       ├── temperature (2, 2, 3) float64
 │       └── time (3,) datetime64[ns]
 └── tables
     ├── long_table
     │   ├── A (100000,) float64
     │   ├── B (100000,) float64
     │   └── C (100000,) float64
     ├── short_table
     │   ├── A (100,) uint8
     │   ├── B (100,) uint8
     │   └── C (100,) uint8
     └── wide_table
         ├── A (10,) float64
         ├── B (10,) float64
         ├── C (10,) float64
         ...
         ├── X (10,) float64
         ├── Y (10,) float64
         └── Z (10,) float64
```
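Because zarr fetches data chunk by chunk, slicing a large array such as `nested/images/big_image` should only download the chunks that the slice overlaps. The standard-library sketch below computes which zarr v2 chunk keys a basic `start:stop` selection touches. The `(1000, 1000)` chunk shape is a hypothetical value chosen for illustration, not necessarily what Tiled serves.

```python
import itertools


def chunk_keys_for_slice(chunks, selection, separator="."):
    """Return the zarr v2 chunk keys overlapped by a basic slice.

    chunks:    chunk shape, e.g. (1000, 1000)
    selection: one (start, stop) pair per dimension, stop exclusive, stop > start
    """
    ranges = []
    for (start, stop), size in zip(selection, chunks):
        first = start // size          # chunk holding the first selected element
        last = (stop - 1) // size      # chunk holding the last selected element
        ranges.append(range(first, last + 1))
    return [separator.join(map(str, idx)) for idx in itertools.product(*ranges)]


# Equivalent of root["nested/images/big_image"][900:1100, 0:100] under the
# hypothetical (1000, 1000) chunking: 2 chunks fetched out of 100.
print(chunk_keys_for_slice((1000, 1000), [(900, 1100), (0, 100)]))
```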

NOTE: To access Tiled servers that require authentication, we can pass an API key in the headers of the HTTP requests. With fsspec, this is done by explicitly constructing an `HTTPFileSystem` object and passing its mapper to zarr:

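```python
import zarr
from fsspec.implementations.http import HTTPFileSystem  # requires aiohttp

url = "http://localhost:8000/zarr/v2/"
headers = {"Authorization": "Apikey your-api-key-goes-here",
           "Content-Type": "application/json"}
# client_kwargs are forwarded to the underlying HTTP client session.
fs = HTTPFileSystem(client_kwargs={"headers": headers})
root = zarr.open(fs.get_mapper(url), mode="r")
```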
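When debugging authentication, it can help to see the raw request this configuration amounts to. The sketch below builds, using only `urllib` from the standard library, a GET for one zarr key carrying the same `Authorization: Apikey ...` header. `build_request` and `read_json_key` are hypothetical names, and the sketch assumes the server returns JSON for metadata keys such as `.zgroup`.

```python
import json
import urllib.request


def build_request(base_url, key, api_key):
    """GET request for one zarr key, authenticated with an Apikey header."""
    url = base_url.rstrip("/") + "/" + key
    return urllib.request.Request(url, headers={"Authorization": f"Apikey {api_key}"})


def read_json_key(base_url, key, api_key):
    """Fetch and decode a JSON metadata key such as '.zgroup' (needs a live server)."""
    with urllib.request.urlopen(build_request(base_url, key, api_key)) as resp:
        return json.load(resp)


req = build_request("http://localhost:8000/zarr/v2/", ".zgroup", "your-api-key-goes-here")
print(req.full_url, req.get_header("Authorization"))
```

Building the request separately from sending it lets the header be inspected without a running server.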

Native Tiled data structures are mapped to zarr as follows:

| Tiled | zarr |
|---|---|
| Container | Group |
| Array | Array |
| Sparse Array | Array (dense) |
| Data Frame | Group (of columns) |
| Data Frame Column | Array |
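Two rows of this mapping are less obvious than the others, so here is a standard-library sketch of them: a data frame becomes a zarr group whose children are one 1-D array per column, and a sparse array is served as its dense equivalent. It illustrates the mapping only and is not the PR's implementation. The function names, the uncompressed single-chunk layout, and the COO input format are assumptions.

```python
def zarray_meta(length, dtype):
    """Minimal zarr v2 `.zarray` document for an uncompressed 1-D column in one chunk."""
    return {
        "zarr_format": 2,
        "shape": [length],
        "chunks": [length],
        "dtype": dtype,          # NumPy type string, e.g. "<f8" or "|u1"
        "compressor": None,
        "fill_value": None,
        "order": "C",
        "filters": None,
    }


def table_as_group(name, columns):
    """Data frame -> group of columns: {column: (dtype, length)} -> zarr v2 metadata keys."""
    keys = {f"{name}/.zgroup": {"zarr_format": 2}}
    for col, (dtype, length) in columns.items():
        keys[f"{name}/{col}/.zarray"] = zarray_meta(length, dtype)
    return keys


def densify_2d(shape, coords, data, fill=0.0):
    """Sparse -> dense: expand 2-D COO data (row list, column list, values) to nested lists."""
    rows, cols = shape
    dense = [[fill] * cols for _ in range(rows)]
    for r, c, v in zip(coords[0], coords[1], data):
        dense[r][c] = v
    return dense


meta = table_as_group("short_table", {"A": ("|u1", 100), "B": ("|u1", 100), "C": ("|u1", 100)})
print(sorted(meta))
print(densify_2d((2, 3), ([0, 1], [2, 0]), [5.0, 7.0]))
```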

Addresses issue #562.

Checklist

  • Add a Changelog entry
  • Add the ticket number which this PR closes to the comment section

@joshmoore mentioned this pull request Sep 9, 2024
@genematx changed the title from "Add zarr endpoints to Tiled" to "Add zarr v2 endpoints to Tiled" Oct 22, 2024
@genematx marked this pull request as ready for review October 22, 2024 16:52
Review thread on pyproject.toml (outdated):

```diff
@@ -44,6 +44,7 @@ tiled = "tiled.commandline.main:main"

# This is the union of all optional dependencies.
all = [
"aiohttp",
```
Member:

I cannot find where this is used. Maybe it was needed in some transient state of the PR but is no longer needed.

```
$ git grep aiohttp
pyproject.toml:    "aiohttp",
pyproject.toml:    "aiohttp",
pyproject.toml:    "aiohttp",
```

Member:

Moreover, since we use starlette for the server and httpx for the client, it would be somewhat odd and redundant to use aiohttp as well.

@genematx (Contributor, Author) commented Oct 23, 2024:

It is used by `fsspec.implementations.http.HTTPFileSystem`, which is needed to connect to a Tiled server that requires authentication. I had the same thought yesterday, that this was something I had used earlier but no longer needed, but unfortunately that is not the case. We don't need it in the `all` requirements, though (only for testing); I have fixed that now.
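Since `aiohttp` then becomes an optional, test-time dependency, code or tests that rely on `HTTPFileSystem` can check for it up front rather than failing at import time. A minimal standard-library sketch follows; the function name is hypothetical. Inside a pytest suite, `pytest.importorskip("aiohttp")` achieves the same effect by skipping the test.

```python
import importlib.util


def http_backend_available():
    """True if aiohttp, which fsspec's HTTPFileSystem relies on, is importable."""
    return importlib.util.find_spec("aiohttp") is not None


if not http_backend_available():
    print("aiohttp is not installed; skipping zarr-over-HTTP checks")
```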

Review thread on tiled/_tests/test_zarr.py (outdated, resolved):
```python
# `fs`, `url`, `df`, and `col` are defined earlier in the test (not shown).
arr = zarr.open(fs.get_mapper(url), mode="r")
actual = arr[...]
expected = df[col]
assert numpy.array_equal(actual, expected)
```
Member:

Attempting to write raises a helpful error message:

```
ReadOnlyError: object is read-only
```

This behavior should be tested.

@genematx (Contributor, Author):

Added a couple of test cases here. Note that these errors are raised by the fsspec and zarr client objects, before the request even reaches Tiled.
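The point that writes are rejected on the client, before any request reaches Tiled, can be illustrated with a toy mapper. In the sketch below, `ReadOnlyMapper` and its `ReadOnlyError` are hypothetical stand-ins, not the fsspec or zarr classes: reads go through a `fetch` callable (one HTTP GET in a real client) and are counted, while writes and deletes raise immediately without calling it.

```python
from collections.abc import MutableMapping


class ReadOnlyError(Exception):
    """Hypothetical stand-in for the read-only error raised by the client."""


class ReadOnlyMapper(MutableMapping):
    """Toy read-only key/value view over a remote store."""

    def __init__(self, fetch):
        self._fetch = fetch   # callable: key -> bytes (an HTTP GET in practice)
        self.requests = 0     # how many times the "server" was contacted

    def __getitem__(self, key):
        self.requests += 1
        return self._fetch(key)

    def __setitem__(self, key, value):
        # Rejected locally: no request is issued.
        raise ReadOnlyError("object is read-only")

    def __delitem__(self, key):
        raise ReadOnlyError("object is read-only")

    def __iter__(self):
        return iter(())       # listing is not supported in this toy

    def __len__(self):
        return 0


m = ReadOnlyMapper(lambda key: b'{"zarr_format": 2}')
m[".zgroup"]
try:
    m["flat_array/0"] = b"\x00"
except ReadOnlyError as err:
    print(err, "| requests sent:", m.requests)
```

A test along these lines can assert both that the error is raised and that the request counter did not change.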
