Releases: google/xarray-beam

v0.11.1

14 Oct 18:02

Allow specifying default chunks per shard in `to_zarr`.

The `zarr_chunks_per_shard` argument in `xbeam.Dataset.to_zarr` now supports using `...` as a key to set a default number of chunks per shard for all dimensions not explicitly listed. When `...` is not given, dimensions not included in the mapping default to 1 chunk per shard. This simplifies specifying Zarr sharding strategies.
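
To make the defaulting rule concrete, here is a minimal pure-Python sketch. It does not call Xarray-Beam; `resolve_chunks_per_shard` and `shard_sizes` are hypothetical helpers written for illustration, and the dimension names and sizes are made up. It also assumes that the shard size along a dimension is the Zarr chunk size multiplied by the number of chunks per shard.

```python
def resolve_chunks_per_shard(chunks_per_shard, dims):
    """Expand a chunks-per-shard mapping to cover every dimension.

    Hypothetical helper: `...` (Ellipsis) supplies the default for unlisted
    dimensions. Without it, unlisted dimensions get 1 chunk per shard.
    """
    default = chunks_per_shard.get(..., 1)
    return {dim: chunks_per_shard.get(dim, default) for dim in dims}


def shard_sizes(zarr_chunks, chunks_per_shard):
    """Assumed relationship: shard size = chunk size * chunks per shard."""
    per_shard = resolve_chunks_per_shard(chunks_per_shard, zarr_chunks)
    return {dim: size * per_shard[dim] for dim, size in zarr_chunks.items()}


# Made-up chunk sizes for three dimensions.
zarr_chunks = {'time': 100, 'lat': 50, 'lon': 50}

# `...` sets 2 chunks per shard for 'lat' and 'lon'.
with_default = shard_sizes(zarr_chunks, {'time': 4, ...: 2})

# Without `...`, 'lat' and 'lon' fall back to 1 chunk per shard.
without_default = shard_sizes(zarr_chunks, {'time': 4})
```

With these inputs, only the `...` entry differs between the two calls, so any difference in the resulting shard sizes for `lat` and `lon` comes from the default alone.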

PiperOrigin-RevId: 819301545

v0.11.0

10 Oct 17:04

Add local staging to Zarr setup in xarray_beam.

Fixes https://github.com/google/xarray-beam/issues/122

This change introduces a `stage_locally` parameter to `setup_zarr`, `ChunksToZarr` and `Dataset.to_zarr`. When enabled, Zarr metadata is first written to a local temporary directory and then copied to the final destination in parallel using `fsspec`. This can significantly speed up the setup process on high-latency filesystems. In one example, I found it sped up Zarr setup by a factor of 25, from 100 seconds to 4 seconds.
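
The staging pattern can be sketched as follows. This is an illustration of the general idea only, not Xarray-Beam's implementation: it uses the standard library (`shutil` and a thread pool) in place of `fsspec`, and `write_metadata`, `copy_tree_parallel` and `setup_with_local_staging` are hypothetical names. The files written are empty stand-ins for Zarr metadata.

```python
import concurrent.futures
import pathlib
import shutil
import tempfile


def write_metadata(root):
    """Stand-in for Zarr setup: write a few small placeholder files."""
    root = pathlib.Path(root)
    for name in ("zarr.json", "foo/zarr.json", "bar/zarr.json"):
        path = root / name
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("{}")


def copy_tree_parallel(src, dst, max_workers=8):
    """Copy every file under src to dst, one file per thread-pool task."""
    src, dst = pathlib.Path(src), pathlib.Path(dst)
    files = [p for p in src.rglob("*") if p.is_file()]

    def copy_one(path):
        target = dst / path.relative_to(src)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(path, target)
        return target

    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        return sorted(pool.map(copy_one, files))


def setup_with_local_staging(destination):
    """Write all metadata to a local temp dir, then copy it over in parallel."""
    with tempfile.TemporaryDirectory() as staging:
        write_metadata(staging)  # many small writes hit fast local disk
        return copy_tree_parallel(staging, destination)


# A second temp dir plays the role of the final (high-latency) destination.
with tempfile.TemporaryDirectory() as final_dir:
    copied = setup_with_local_staging(final_dir)
    copied_names = sorted(p.relative_to(final_dir).as_posix() for p in copied)
```

The many small sequential writes go to local disk, and the slow destination only sees one batch of concurrent copies. That is why latency per operation matters less with staging enabled.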

This adds a hard dependency on fsspec in Xarray-Beam.

Hopefully in the future Xarray will have concurrent writing to stores built in (see https://github.com/pydata/xarray/issues/10622), which will eliminate the primary need for this.

Alternatively, we might be able to eventually leverage Zarr's built-in stores to do this copying rather than fsspec. Zarr has all the necessary functionality (including atomic writes, which would be nice) but does not expose the required public APIs for copying store objects from a synchronous function.

PiperOrigin-RevId: 817684876

v0.10.5

08 Oct 19:15

Allow `Dataset.rechunk` to change `split_vars`.

This is convenient because the optimal ordering of splitting and rechunking is not obvious.

Also make `consolidate_variables()` and `split_variables()` no-ops when appropriate.

PiperOrigin-RevId: 816805573

v0.10.4

04 Oct 05:46

Add `Dataset.from_ptransform`.

This is a variant of the Dataset constructor with extensive validation.

Also add documentation explaining how it works.

PiperOrigin-RevId: 814972825

v0.10.3

02 Oct 23:38

Allow using `...` as a key in chunk specifications.

This change enables specifying a default chunk size for all dimensions not explicitly listed in the `chunks` mapping by using `...` as a key. For example, `{'x': 10, ...: 20}` will chunk dimension 'x' into sizes of 10 and all other dimensions into sizes of 20.
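
The following sketch shows how such a mapping could be expanded over a known set of dimensions. It is plain Python and does not use Xarray-Beam. `resolve_chunks` is a hypothetical helper, the dimension names `'y'` and `'z'` are made up, and the choice to raise on unknown dimension names is an assumption of this sketch.

```python
def resolve_chunks(chunks, dims):
    """Expand a chunks mapping that may use `...` (Ellipsis) as a default key.

    Hypothetical helper for illustration only.
    """
    explicit = {k: v for k, v in chunks.items() if k is not ...}
    unknown = set(explicit) - set(dims)
    if unknown:
        raise ValueError(f"unknown dimensions in chunks: {sorted(unknown)}")
    if ... in chunks:
        default = chunks[...]
        # Every dimension gets a size: its explicit one, else the default.
        return {dim: explicit.get(dim, default) for dim in dims}
    # No default given: only explicitly listed dimensions are returned.
    return {dim: explicit[dim] for dim in dims if dim in explicit}


resolved = resolve_chunks({'x': 10, ...: 20}, dims=('x', 'y', 'z'))
explicit_only = resolve_chunks({'x': 10}, dims=('x', 'y', 'z'))
```

Note that `{'x': 10, ...: 20}` is an ordinary dict literal, because `...` (the `Ellipsis` object) is hashable and can be used as a key.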

PiperOrigin-RevId: 814430585

v0.10.2

01 Oct 17:14

Add `xbeam.normalize_chunks()` and update `xbeam.Dataset` docstrings.

PiperOrigin-RevId: 813800457

v0.10.1

30 Sep 22:49

Update xarray version in docs build

PiperOrigin-RevId: 813456305

v0.10.0

30 Sep 19:38

Add docs for xbeam.Dataset

PiperOrigin-RevId: 813378676

v0.9.3

23 Sep 19:30

Update xarray_beam version to 0.9.3.

PiperOrigin-RevId: 810535844

v0.9.2

12 Sep 19:45

Expose encoding parameter in ChunksToZarr.

PiperOrigin-RevId: 806388507