Alternative dask-powered histogram algorithm using xarray.groupby and numpy_groupies #60

Open
@TomNicholas

After @shoyer mentioned earlier today that he had an example of dealing with the ND-histogram problem in xarray by using xarray.apply_ufunc and numpy_groupies, I made this notebook to try it out for creating histograms in xarray.

The basic idea is that da.groupby_bins(bins).apply(count) essentially creates a histogram, and numpy_groupies can speed up the groupby_bins hugely.
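
To make the "grouped count is a histogram" idea concrete, here is a minimal NumPy-only sketch. It is not the code from the notebook: `histogram_via_groupby` is a hypothetical helper name, and `np.bincount` stands in for the grouped count that numpy_groupies would perform. The bin-edge handling is an assumption chosen to match `np.histogram` (half-open bins, last bin closed on the right), which is not necessarily the convention `groupby_bins` uses.

```python
import numpy as np


def histogram_via_groupby(values, bin_edges):
    """Count values per bin by labelling each value, then counting per label.

    Hypothetical helper for illustration only; edge handling follows
    np.histogram (half-open bins, last bin closed on the right).
    """
    values = np.asarray(values).ravel()
    bin_edges = np.asarray(bin_edges)
    n_bins = len(bin_edges) - 1

    # "groupby_bins" step: label each value with the index of its bin.
    labels = np.searchsorted(bin_edges, values, side="right") - 1
    # Values exactly on the last edge belong to the last bin.
    labels[values == bin_edges[-1]] = n_bins - 1

    # Values outside the bin range (and NaNs) get no group.
    in_range = (labels >= 0) & (labels < n_bins)

    # "count" step: a grouped count over the integer labels.
    return np.bincount(labels[in_range], minlength=n_bins)


rng = np.random.default_rng(0)
data = rng.normal(size=10_000)
edges = np.linspace(-4, 4, 21)

counts = histogram_via_groupby(data, edges)
expected, _ = np.histogram(data, bins=edges)
```

The point of splitting it this way is that the labelling step is elementwise and the counting step is a plain reduction over integer labels, which is the part a fast grouped-aggregation routine can take over.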

I think it's pretty cool that it even works, but you'll see in the notebook that I don't think the performance compares favourably with xhistogram's dask.blockwise implementation (see #49), though I didn't manage to get numba-powered groupies working yet. The dask task graphs are also not as nice.

@rabernat this is the sort of thing I had in mind originally.

@gjoseph92 you might find this interesting as an alternate solution to your blockwise one.

@dcherian and @max-sixty you might find this example interesting as I know you've been working on using numpy_groupies in pydata/xarray#4473.
