Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Composite operations on chunked array types not dispatched through chunk manager #9523

Open
5 tasks
jeremiah-corrado opened this issue Sep 19, 2024 · 3 comments
Labels
bug topic-arrays related to flexible array support topic-chunked-arrays Managing different chunked backends, e.g. dask

Comments

@jeremiah-corrado
Copy link

jeremiah-corrado commented Sep 19, 2024

What happened?

The arkouda-xarray chunk manager (found here) is designed to allow Arkouda's distributed arrays to operate as chunked arrays in XArray. (Note Arkouda's array's also implement the array API, which was being used for XArray interoperability before the chunk manager was created).

When executing various operations on XArray DataArrays or DataSets that are backed by the Arkouda chunk manager, I noticed that they are not being routed through the chunk manager itself. For example, calling myArkoudaBackedArray.mean(axis="some-axis") does not get routed through ArkoudaManager.reduction. However, the operation is successful and produces the correct result, indicating that the array-api implementation of mean is being used instead (i.e., the arkouda-backed array is being treated as a duck array).

https://github.com/jeremiah-corrado/arkouda-xarray-seasonal-avgs-example/blob/main/README.md contains a full example where I observed the above behavior. It includes instructions to get Arkouda set up for XArray interoperability.

pinging @TomNicholas, since we discussed this the other day.

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS

commit: None
python: 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:34:54) [Clang 16.0.6 ]
python-bits: 64
OS: Darwin
OS-release: 22.6.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: None

xarray: 2024.5.0
pandas: 2.2.2
numpy: 1.26.4
scipy: 1.13.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: 3.8.0
zarr: None
cftime: 1.6.3
nc_time_axis: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.8.4
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 70.0.0
pip: 24.0
conda: None
pytest: None
mypy: None
IPython: 8.24.0
sphinx: None

@jeremiah-corrado jeremiah-corrado added bug needs triage Issue that has not been reviewed by xarray team member labels Sep 19, 2024
@TomNicholas TomNicholas added topic-chunked-arrays Managing different chunked backends, e.g. dask topic-arrays related to flexible array support and removed needs triage Issue that has not been reviewed by xarray team member labels Sep 21, 2024
@keewis
Copy link
Collaborator

keewis commented Sep 23, 2024

looking at the docstring of the chunk manager base class, it appears this is not used for all reductions:

"""
A general version of array reductions along one or more axes.
Used inside some reductions like nanfirst, which is used by ``groupby.first``.

In other words, this seems to be expected behavior? I wonder if var.reduce(np.mean) would go through the reduction method, though.

@TomNicholas
Copy link
Member

In other words, this seems to be expected behavior?

Reminding myself of how this works, I think yes, this is expected. The array type knows how to do a chunked mean on a chunked array without any further guidance from xarray, so it's fine to just use the array API to call the correct implementation of mean.

I wonder if var.reduce(np.mean) would go through the reduction method, though.

It won't (see NamedArray.reduce)

data = func(self.data, axis=axis, **kwargs)

but I also don't think it needs to either. reduce is really a special case of the more general reduction primitive. reduction accepts the func kwarg that reduce accepts, but also accepts combine_func and aggregate_func. Basically these allow tree-reductions made of alternating steps, rather than just the same function applied over and over. This comes from dask.array.reduction, and is needed in the chunkmanager because it is used in xarray (only once though apparently, in the implementation .first and .last methods).

return chunkmanager.reduction(

If arkouda chose not to implement ArkoudaChunkManager.reduction I think the only thing that would break would be the .first and .last methods. .reduce would in general work, but only if the function you passed into .reduce knew how to handle the Arkouda Array type.

So actually other than documenting / testing this a bit better I believe there is not actually a bug to be fixed here?

@jeremiah-corrado
Copy link
Author

I think the only thing that would break would be the .first and .last methods. .reduce would in general work, but only if the function you passed into .reduce knew how to handle the Arkouda Array type.

Okay that makes sense. Thank you both for digging into the implementation/docs to figure that out. I'll work on implementing reduction properly in Arkouda's chunk manager so that those methods work.

I believe there is not actually a bug to be fixed here?

I think that's correct

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug topic-arrays related to flexible array support topic-chunked-arrays Managing different chunked backends, e.g. dask
Projects
None yet
Development

No branches or pull requests

3 participants