Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding cumsum / cumprod reduction operators #791

Closed
pwolfram opened this issue Mar 11, 2016 · 12 comments
Closed

Adding cumsum / cumprod reduction operators #791

pwolfram opened this issue Mar 11, 2016 · 12 comments

Comments

@pwolfram
Copy link
Contributor

It would be useful to have the cumsum / cumprod reduction operator for DataArray and Dataset, analagous to http://xarray.pydata.org/en/stable/generated/xarray.DataArray.sum.html?highlight=sum#xarray.DataArray.sum
and http://xarray.pydata.org/en/stable/generated/xarray.Dataset.sum.html?highlight=sum#xarray.Dataset.sum

I notice this is on the TODO at https://github.com/pydata/xarray/blob/master/xarray/core/ops.py#L54 and am assuming there is something subtle here about the implementation. I believe the issue was probably with dask, but the issue / PR at dask/dask#923 & dask/dask#925 may have removed the roadblock.

@pwolfram
Copy link
Contributor Author

Advice on the feasibility of adding cumsum / cumprod appreciated.

@jhamman
Copy link
Member

jhamman commented Mar 11, 2016

I don't think this should be too difficult now that cumulative reductions are available in dask and numpy. I think you'll just have to add the cumsum and cumprod names to the list of NAN_REDUCE_METHODS. You may also need modify some of the logic in _create_nan_agg_method so it doesn't prepend a nan in-front of these two methods.

@pwolfram
Copy link
Contributor Author

It may be more involved, e.g., there appears to be a custom nanprod called by https://github.com/pydata/xarray/blob/master/xarray/core/ops.py#L374-L383 at https://github.com/pydata/xarray/blob/master/xarray/core/npcompat.py#L186-L253. I could inject logic to remove the nan prepending for these functions at https://github.com/pydata/xarray/blob/master/xarray/core/ops.py#L310-L352 but this may be somewhat of a hack...

@pwolfram
Copy link
Contributor Author

You are correct that it shouldn't be too difficult.

@pwolfram
Copy link
Contributor Author

Note, numpy doesn't provide cumsum or cumprod that are nan compatible. I just checked https://github.com/numpy/numpy/blob/master/numpy/lib/nanfunctions.py#L4-L17 so injecting logic may be the way to go.

@shoyer
Copy link
Member

shoyer commented Mar 11, 2016

cumsum/cumprod will need a slightly different (simpler) interface than the other reduce methods, because unlike other aggregation functions they don't remove a dimension (the result has the same size as the input). Also, as you point out, NumPy doesn't have a nan-skipping version of these functions.

There was no particular reason why I didn't add these before -- we just never had a compelling enough need to get around to it. I don't think it would be particularly difficult.

@pwolfram
Copy link
Contributor Author

@shoyer, should a custom version supporting nans be implemented too? I think that is more useful for my immediate needs and it would be good to have both options for completeness. However, it will be somewhat of a hack and involve adding nancumsum and nancumprod functions to npcompat.py. Note, however, that these are not standard numpy functions, even in known future versions, so I'm concerned this change falls into the "hack" category. This being said, implementation of the standard numpy functions would still be useful and the nans case could be handled at the application level.

Can you provide some perspective on what you would be willing to admit into xarray? It seems like standard cumsum and cumprod are safe bets but the nan versions are somewhat suspect.

@shoyer
Copy link
Member

shoyer commented Mar 12, 2016

Why not make a PR to add nancumsum and nancumprod to NumPy as well? Then it's pretty clear that a back-port in npcompat.py is appropriate. That's actually how nanprod ended up in NumPy :).

@pwolfram
Copy link
Contributor Author

@shoyer and @jhamman, I've started a branch to address this at https://github.com/pwolfram/xarray/tree/add_cumsum_cumprod, but there appear to be some other issues that may need resolved prior to this implementation, e.g., #807.

Also, 2 quick new developer questions:

  1. Is there any easy way to do xarray development short of doing the ful xarray install via something like pip install --upgrade git+ssh://git@github.com/pwolfram/numpy@nancumsumprod?
  2. How can I run the test suite (I'm thinking it may be something like python runtests.py -v for numpy)?

I suspect there is a simple way to do these things but would appreciate hearing about your workflows. Thanks!

@jhamman
Copy link
Member

jhamman commented Mar 28, 2016

@pwolfram -

We all may have slightly different development workflows but mine goes something like this:

# checkout conda environment with xarray dependencies
source activate xarray_dev34

cd path_to_xarray

# install xarray using setuptools develop option
python setup.py develop

# make changes to source code

# run test suite
py.test

@pwolfram
Copy link
Contributor Author

Thanks @jhamman!

@pwolfram
Copy link
Contributor Author

Please see #812 for the PR stub. Note, I still need to work on this when I get some time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants