Description
I've been working to make zarrita work with async stores and to use concurrency throughout the library. However, I think that many users will still want to use synchronous methods. So, the implementation uses only async methods internally and provides sync methods by running the async methods on an event loop on a separate thread and blocking the main thread (inspired by fsspec).
I am looking for feedback on how to design the API to accommodate both sync and async functions. Here are the main options that came to my mind:
1. Separate classes
The class methods create either sync or async variants of the Array class. Users need to decide upfront whether to use async or sync methods.
# sync
a = zarrita.Array.create_sync(
    store,
    'array',
    shape=(6, 10),
    dtype='int32',
    chunk_shape=(2, 5),
)
a[:, :] = np.ones((6, 10), dtype='int32') # set
a[:, :] # get
a.reshape((10, 10))
assert isinstance(a, zarrita.ArraySync)
# async
a = await zarrita.Array.create_async(
    store,
    'array',
    shape=(6, 10),
    dtype='int32',
    chunk_shape=(2, 5),
)
await a[:, :].set(np.ones((6, 10), dtype='int32')) # set
await a[:, :].get() # get
await a.reshape((10, 10))
assert isinstance(a, zarrita.Array)
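To make the trade-off concrete, here is a rough sketch of how the two classes could relate, assuming the sync class simply wraps the async one. The toy `Array` and `ArraySync` below only track a shape, `asyncio.run` stands in for the background event loop, and `reshape` returning a new object is an assumption of this sketch.

```python
import asyncio


class Array:
    """Async variant: every I/O method is a coroutine."""

    def __init__(self, shape):
        self.shape = shape

    @classmethod
    async def create_async(cls, shape):
        await asyncio.sleep(0)  # stand-in for writing metadata to storage
        return cls(shape)

    @classmethod
    def create_sync(cls, shape):
        # Blocks until the async creation has finished.
        return ArraySync(asyncio.run(cls.create_async(shape)))

    async def reshape(self, shape):
        await asyncio.sleep(0)  # stand-in for rewriting metadata
        return Array(shape)


class ArraySync:
    """Sync variant: wraps an Array and blocks on each coroutine."""

    def __init__(self, async_array):
        self._async_array = async_array

    @property
    def shape(self):
        return self._async_array.shape

    def reshape(self, shape):
        return ArraySync(asyncio.run(self._async_array.reshape(shape)))
```

With this layout every public method has to be mirrored on the sync class, which is the main maintenance cost of option 1.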
2. Separate methods and properties
Both sync and async methods are available through the same class. There are still separate create and create_async class methods because the creation of an array is async under the hood (i.e. writing metadata to storage).
# sync
a = zarrita.Array.create(
    store,
    'array',
    shape=(6, 10),
    dtype='int32',
    chunk_shape=(2, 5),
)
# async
a = await zarrita.Array.create_async(
    store,
    'array',
    shape=(6, 10),
    dtype='int32',
    chunk_shape=(2, 5),
)
2a. Property-based async
This is a sync-first API, with the async methods available through the async_ property.
# sync
a[:, :] = np.ones((6, 10), dtype='int32') # set
a[:, :] # get
a.reshape((10, 10))
# async
await a.async_[:, :].set(np.ones((6, 10), dtype='int32')) # set
await a.async_[:, :].get() # get
await a.async_.reshape((10, 10))
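One way the async_ property could be wired up is sketched below, assuming indexing only records the selection and the I/O happens in get() and set(). The helper classes `_Selection` and `_AsyncView` are hypothetical, the dict is toy storage, and `asyncio.run` stands in for blocking on a background event loop.

```python
import asyncio


class _Selection:
    """An array plus an index; the I/O happens in get() and set()."""

    def __init__(self, array, key):
        self._array = array
        self._key = repr(key)  # slices are not hashable before Python 3.12

    async def get(self):
        await asyncio.sleep(0)  # stand-in for async chunk reads
        return self._array._data.get(self._key)

    async def set(self, value):
        await asyncio.sleep(0)  # stand-in for async chunk writes
        self._array._data[self._key] = value


class _AsyncView:
    def __init__(self, array):
        self._array = array

    def __getitem__(self, key):
        # Indexing is cheap and synchronous; it only records the selection.
        return _Selection(self._array, key)


class Array:
    def __init__(self):
        self._data = {}  # toy storage keyed by the selection

    @property
    def async_(self):
        return _AsyncView(self)

    def __getitem__(self, key):
        # asyncio.run fails inside a running loop; a real version
        # would block on a background event loop instead.
        return asyncio.run(self.async_[key].get())

    def __setitem__(self, key, value):
        asyncio.run(self.async_[key].set(value))
```

The sync `__getitem__` and `__setitem__` are thin wrappers over the async view, so there is only one implementation of the actual read and write paths.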
2b. Async methods
Similar to 2a, but with _async-suffixed async methods. This feels unpleasant, because the slice syntax [:, :] cannot be used.
# sync
a[:, :] = np.ones((6, 10), dtype='int32') # set
a[:, :] # get
a.reshape((10, 10))
# async
await a.set_async((slice(None), slice(None)), np.ones((6, 10), dtype='int32')) # set
await a.get_async((slice(None), slice(None))) # get
await a.reshape_async((10, 10))
3. Async-first API
Implemented through future objects: every operation returns a future that can either be awaited or blocked on with .result(). Inspired by tensorstore.
# sync
a[:, :].set(np.ones((6, 10), dtype='int32')).result() # set
a[:, :].get().result() # get
a.reshape((10, 10)).result()
# async
await a[:, :].set(np.ones((6, 10), dtype='int32')) # set
await a[:, :].get() # get
await a.reshape((10, 10))
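A future object that supports both styles could look roughly like the sketch below. It assumes the work runs on a background event loop and bridges into the caller's loop with `asyncio.wrap_future`. The `Future` class and the `get` function are hypothetical and do not mirror tensorstore's actual implementation.

```python
import asyncio
import threading

# Background event loop that executes all I/O coroutines.
_loop = asyncio.new_event_loop()
threading.Thread(target=_loop.run_forever, daemon=True).start()


class Future:
    """Handle that can be blocked on with result() or awaited."""

    def __init__(self, coro):
        # Scheduling starts the work immediately on the background loop.
        self._future = asyncio.run_coroutine_threadsafe(coro, _loop)

    def result(self, timeout=None):
        # Sync path: block the calling thread until the work is done.
        return self._future.result(timeout)

    def __await__(self):
        # Async path: bridge the concurrent future into the caller's loop.
        return asyncio.wrap_future(self._future).__await__()


async def _get(value):
    await asyncio.sleep(0.01)  # stand-in for async chunk reads
    return value


def get(value):
    # Plain function: returns a Future instead of being a coroutine.
    return Future(_get(value))
```

With this shape, `get(42).result()` works from synchronous code and `await get(42)` works inside a coroutine, at the cost of sync users having to write `.result()` on every call.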