Sync/Async API design feedback #4

Open
@normanrz

I've been working to make zarrita work with async stores and to use concurrency throughout the library. However, I think that many users will still want to use synchronous methods. So the implementation uses only async methods internally and provides sync methods by running the async methods in an event loop on a separate thread and blocking the main thread (inspired by fsspec).

I am looking for feedback on how to design the API to accommodate both sync and async functions. Here are the main options that came to my mind:

1. Separate classes

The class methods create either sync or async variants of the Array class. Users need to decide upfront whether to use async or sync methods.

# sync
a = zarrita.Array.create_sync(
    store,
    'array',
    shape=(6, 10),
    dtype='int32',
    chunk_shape=(2, 5),
)
a[:, :] = np.ones((6, 10), dtype='int32') # set
a[:, :] # get
a.reshape((10, 10))
assert isinstance(a, zarrita.ArraySync)

# async
a = await zarrita.Array.create_async(
    store,
    'array',
    shape=(6, 10),
    dtype='int32',
    chunk_shape=(2, 5),
)
await a[:, :].set(np.ones((6, 10), dtype='int32')) # set
await a[:, :].get() # get
await a.reshape((10, 10))
assert isinstance(a, zarrita.Array)

2. Separate methods and properties

Both sync and async methods are available through the same class. There are still separate create and create_async class methods because the creation of an array is async under the hood (i.e. writing metadata to storage).

# sync
a = zarrita.Array.create(
    store,
    'array',
    shape=(6, 10),
    dtype='int32',
    chunk_shape=(2, 5),
)

# async
a = await zarrita.Array.create_async(
    store,
    'array',
    shape=(6, 10),
    dtype='int32',
    chunk_shape=(2, 5),
)

2a. Property-based async

This is a sync-first API, with the async methods available through the async_ property.

# sync
a[:, :] = np.ones((6, 10), dtype='int32') # set
a[:, :] # get
a.reshape((10, 10))

# async
await a.async_[:, :].set(np.ones((6, 10), dtype='int32')) # set
await a.async_[:, :].get() # get
await a.async_.reshape((10, 10))
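
To make 2a concrete, here is a rough sketch of how an `async_` property could expose an awaitable view over the same data. `ToyArray`, `_AsyncView` and `_AsyncSelection` are made-up names, and a plain list with 1-D slices stands in for the real array and store.

```python
import asyncio


class _AsyncSelection:
    """Awaitable get/set for one selection (hypothetical helper)."""

    def __init__(self, data, key):
        self._data, self._key = data, key

    async def get(self):
        await asyncio.sleep(0)  # stand-in for async store I/O
        return self._data[self._key]

    async def set(self, value):
        await asyncio.sleep(0)  # stand-in for async store I/O
        self._data[self._key] = value


class _AsyncView:
    """Object returned by ToyArray.async_; indexing yields a selection."""

    def __init__(self, data):
        self._data = data

    def __getitem__(self, key):
        return _AsyncSelection(self._data, key)


class ToyArray:
    def __init__(self, n):
        self._data = [0] * n

    # Sync-first interface.
    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value

    # Async interface through a property.
    @property
    def async_(self):
        return _AsyncView(self._data)
```

With this shape, `a[0:2] = [1, 1]` stays synchronous while `await a.async_[2:4].set([2, 2])` goes through the async path, and both keep the slice syntax.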

2b. Async methods

Similar to 2a, but with _async-suffixed async methods. This feels unpleasant because the slice syntax [:, :] cannot be used.

# sync
a[:, :] = np.ones((6, 10), dtype='int32') # set
a[:, :] # get
a.reshape((10, 10))

# async
await a.set_async((slice(None), slice(None)), np.ones((6, 10), dtype='int32')) # set
await a.get_async((slice(None), slice(None))) # get
await a.reshape_async((10, 10))
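
One possible mitigation, sketched here with a hypothetical helper `S`, is an index object that turns slice syntax into the tuple of `slice` objects that the `_async` methods expect. NumPy's `np.s_` provides the same behavior.

```python
class _IndexHelper:
    """Returns whatever key it is indexed with (hypothetical helper)."""

    def __getitem__(self, key):
        return key


S = _IndexHelper()

# S[:, :] builds the same tuple that had to be spelled out by hand above.
key = S[:, :]
```

A call could then read `await a.get_async(S[:, :])`, which is still more verbose than 2a but avoids writing `slice(None)` explicitly.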

3. Async-first API

Implemented through future objects, inspired by tensorstore.

# sync
a[:, :].set(np.ones((6, 10), dtype='int32')).result() # set
a[:, :].get().result() # get
a.reshape((10, 10)).result()

# async
await a[:, :].set(np.ones((6, 10), dtype='int32')) # set
await a[:, :].get() # get
await a.reshape((10, 10))
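
As a rough illustration of option 3, the sketch below shows one way a returned object could support both `.result()` and `await`. `ToyFuture`, `_read` and `get` are hypothetical names; this is not how tensorstore or zarrita implement it.

```python
import asyncio
import threading

# Background loop that executes all I/O coroutines.
_loop = asyncio.new_event_loop()
threading.Thread(target=_loop.run_forever, daemon=True).start()


class ToyFuture:
    """Wraps work scheduled on the background loop (hypothetical)."""

    def __init__(self, coro):
        # Returns a concurrent.futures.Future.
        self._fut = asyncio.run_coroutine_threadsafe(coro, _loop)

    def result(self, timeout=None):
        # Sync usage: block the calling thread until done.
        return self._fut.result(timeout)

    def __await__(self):
        # Async usage: bridge into the caller's running event loop.
        return asyncio.wrap_future(self._fut).__await__()


async def _read(key):
    await asyncio.sleep(0)  # stand-in for an async store read
    return f"data:{key}"


def get(key):
    return ToyFuture(_read(key))
```

Sync callers write `get("c/0").result()`, while async callers write `await get("c/1")`. In this sketch the operation starts as soon as the future is created, whether or not it is ever awaited.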
