Creating sharded arrays easily

Zarr v3 introduces support for sharding, i.e. saving data in chunks that are themselves divided into subchunks which can be read individually. I expect sharding to be one of the key features that motivates users to switch from zarr v2 to zarr v3, so I think there's good reason to ensure that sharded arrays can be created _easily_ in `zarr-python`.
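To make the chunk / subchunk relationship concrete, here is a minimal sketch in plain Python (no zarr import). The function names `locate` and `inner_chunks_per_shard` are hypothetical and only for illustration, and the sketch assumes each shard dimension is an exact multiple of the subchunk dimension.

```python
from math import prod


def inner_chunks_per_shard(shard_shape, inner_shape):
    """Count the subchunks stored inside one shard.

    Assumes every shard dimension is an exact multiple of the
    corresponding subchunk dimension.
    """
    return prod(s // c for s, c in zip(shard_shape, inner_shape))


def locate(index, shard_shape, inner_shape):
    """Map an array index to (shard coordinates, subchunk coordinates in that shard)."""
    shard = tuple(i // s for i, s in zip(index, shard_shape))
    inner = tuple((i % s) // c for i, s, c in zip(index, shard_shape, inner_shape))
    return shard, inner


# A 64x64 shard split into 32x32 subchunks holds 2 * 2 subchunks.
count = inner_chunks_per_shard((64, 64), (32, 32))

# Element (70, 40) lives in shard (1, 0), in that shard's subchunk (0, 1).
where = locate((70, 40), (64, 64), (32, 32))
```

Reading element `(70, 40)` only needs that one subchunk, not the whole shard, which is the point of being able to read subchunks individually.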

## status quo

In the `v3` branch today, you create a sharded array by including a special codec in the `codecs` kwarg in the array constructor. The sharding codec contains codecs of its own. See this example from the test suite: https://github.com/zarr-developers/zarr-python/blob/ac6c6a3cf88976ab296c94f0891fee2ac7ae1bdb/tests/v3/test_codecs/test_sharding.py#L43-L60

Complaints:
 - This is a lot of code / specification. In zarr v2, you could just say `chunks=(10,10)` and you were done. V3 requires much more work.
 - It requires importing a specific class (`ShardingCodec`), or using the dictionary configuration of that codec, which we don't have autocomplete for right now (e.g., via a `TypedDict`).
 - The array compression configuration (the other codecs) goes in two completely different places depending on whether you are using sharding or not. If you are not using sharding, then all the codecs go in the `codecs` kwarg of the array. If you _are_ using sharding, then the `codecs` kwarg should contain only the sharding codec, with the array compression config sent to the `codecs` kwarg of the sharding codec itself. This may not be intuitive to people, and it's a rather indirect implementation of something that was simple in zarr v2.
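The last complaint is easier to see with the metadata written out. This is a sketch using plain dicts rather than the `zarr-python` classes; `codecs_metadata` is a hypothetical helper, and the dict layout (`sharding_indexed`, `index_codecs`, `index_location`) reflects my reading of the v3 sharding spec, so treat the details as an assumption.

```python
def codecs_metadata(compression, inner_chunk_shape=None):
    """Build the array-level `codecs` list as plain dicts (hypothetical helper)."""
    if inner_chunk_shape is None:
        # No sharding: the compression codecs sit directly on the array.
        return list(compression)
    # Sharding: the array gets exactly one codec, and the compression
    # codecs move inside that codec's own `codecs` field.
    return [
        {
            "name": "sharding_indexed",
            "configuration": {
                "chunk_shape": list(inner_chunk_shape),
                "codecs": list(compression),
                "index_codecs": [
                    {"name": "bytes", "configuration": {"endian": "little"}},
                    {"name": "crc32c"},
                ],
                "index_location": "end",
            },
        }
    ]


compression = [
    {"name": "bytes", "configuration": {"endian": "little"}},
    {"name": "gzip", "configuration": {"level": 5}},
]

plain = codecs_metadata(compression)
sharded = codecs_metadata(compression, inner_chunk_shape=(32, 32))
```

The same `compression` list ends up at the top level in `plain` and one level down in `sharded`, which is the asymmetry users have to know about today.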

## solutions

I have one concrete idea for making this easier, which I have prototyped in #2169. That PR introduces a way to specify both sharding and regular chunking with a single keyword argument to `Array.create`. I think specifying how the chunks of the array are organized with a single data structure is a promising approach, but I am curious to see if anyone has alternative ideas (or if everyone thinks the status quo is fine).


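	# Status quo: the test-suite example linked above. `spath`, `data`, `offset`,
	# `index_location`, and `order_from_dim` come from the test suite.
	from zarr import Array
	from zarr.codecs import BloscCodec, BytesCodec, ShardingCodec, TransposeCodec
	arr = Array.create(
	    spath,
	    shape=tuple(s + offset for s in data.shape),
	    chunk_shape=(64,) * data.ndim,  # shape of each shard
	    dtype=data.dtype,
	    fill_value=6,
	    codecs=[
	        # With sharding, the array-level codecs list holds only the sharding codec.
	        ShardingCodec(
	            chunk_shape=(32,) * data.ndim,  # shape of the subchunks inside each shard
	            # The array compression config lives here instead.
	            codecs=[
	                TransposeCodec(order=order_from_dim("F", data.ndim)),
	                BytesCodec(),
	                BloscCodec(cname="lz4"),
	            ],
	            index_location=index_location,
	        )
	    ],
	)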
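For discussion, here is one way the "single data structure" idea could look. This is a hypothetical sketch, not the API prototyped in #2169: `ChunkLayout` and `expand` are names I made up, the output is a plain dict rather than real codec objects, and the sharding configuration is abbreviated (no index codecs).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ChunkLayout:
    """Hypothetical: one object describing both plain chunking and sharding."""

    shape: Tuple[int, ...]  # shape of each stored chunk (the shard, if sharded)
    inner: Optional[Tuple[int, ...]] = None  # subchunk shape, or None for no sharding


def expand(layout, compression):
    """Translate a ChunkLayout into status-quo style arguments (plain dicts)."""
    if layout.inner is None:
        return {"chunk_shape": layout.shape, "codecs": list(compression)}
    if len(layout.inner) != len(layout.shape) or any(
        s % c for s, c in zip(layout.shape, layout.inner)
    ):
        raise ValueError("inner chunk shape must evenly divide the shard shape")
    sharding = {
        "name": "sharding_indexed",
        "configuration": {
            "chunk_shape": list(layout.inner),
            "codecs": list(compression),
        },
    }
    return {"chunk_shape": layout.shape, "codecs": [sharding]}


compression = [{"name": "bytes"}, {"name": "blosc", "configuration": {"cname": "lz4"}}]

unsharded_args = expand(ChunkLayout((64, 64)), compression)
sharded_args = expand(ChunkLayout((64, 64), inner=(32, 32)), compression)
```

With something like this, the user writes the compression config once and in one place, and whether it ends up nested inside a sharding codec becomes an implementation detail.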

Creating sharded arrays easily #2170
