-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sparse and other duck array issues #3245
Comments
I think it is intentional for automatic coercion to NumPy arrays to fail. Making |
The better way to phrase this is:
|
I think We should have a separate API (maybe Basically, we should leave the decision about whether automatic coercion is safe up to the authors of duck array libraries. |
👍 I think |
The main downside of |
I do think the pandas approach is pretty good here - don't break anything relying on |
Xarray's .data is basically the equivalent of pandas's .array.
…On Fri, Aug 23, 2019 at 2:44 PM Maximilian Roos ***@***.***> wrote:
I do think the pandas approach is pretty good here - don't break anything
relying on .values - instead have .array to get the backing array (in
whatever form it might be), and to_numpy() coercing to a raw numpy array
ref
https://pandas.pydata.org/pandas-docs/stable/whatsnew/v0.24.0.html#accessing-the-values-in-a-series-or-index
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#3245?email_source=notifications&email_token=AAJJFVUUGARF55CXIKAQHUDQGBD4NA5CNFSM4IOZW5JKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD5BJCVI#issuecomment-524456277>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAJJFVWSWKSJFWNUUGRO3ODQGBD4NANCNFSM4IOZW5JA>
.
|
As far as I can tell, the proposal here will require either s = pd.Series(...)
xr.DataArray.from_series(s).to_series() or: xr.DataArray.from_series(s, sparse=True).to_dense().to_series() For any code that can't guarantee sparse/non-sparse input, the first will fail sometimes, so it will always be necessary to write the latter everywhere, which IMO is unnecessarily verbose. |
Do we arrive at the consensus here for API to change the sparse or numpy array? To make it sparse array,
To change the backend back from sparse array, I personally like |
I weakly prefer following the upstream API: though Also isn't the function |
I like |
This is a good point. I'm in favour of |
Yes, but I think that's mostly works because it's so short. In general the rule is to use underscores when it improves clarity. |
brilliant |
What is the best way to save One naive way would be to use |
@dcherian pointed me over here after I raised #4234.
I very much agree with this reasoning.
I also agree with this reasoning. In cupy many things will break because For cupy I do think that automatic coercion to numpy for One suggestion is to add an accessor to make things explicit. So it would become |
I am unsure about automatic coercion for But I think we should automatically coerce for |
Given the current guidance in #4212 is to aim for cupy to mostly be handled via accessors I wonder if xarray could be extended a little further to allow Currently with the accessor model a user would need to do something like Enabling accessors to register |
we currently don't really encourage this way of using accessors (especially using too many of these, see #1080 (comment) and #1080 (comment)), but you can use @xr.register_dataarray_accessor("as_cupy")
def _(da):
def as_cupy(self, ...):
# ...
return da
return as_cupy |
Thanks @keewis. I did consider that but I assumed that kind of thing wouldn't be encouraged. Happy to go down that path, but I wonder if life would be easier if there were an extension API for this purpose. I was thinking of something along these lines which would help keep things in check. @xr.register_dataarray_as_method("cupy")
class CupyAsMethod:
def to_cupy(self):
"""Convert all data arrays to cupy."""
for var in self.data_vars:
self.data_vars[var].data = cp.asarray(self.data_vars[var].data)
return self
def to_numpy(self):
"""Convert all data arrays to numpy."""
for var in self.data_vars:
self.data_vars[var].data = self.data_vars[var].data.get()
return self There would need to be some logic around dispatching Perhaps some I could go as you suggest and use the current accessor tooling to test this functionality out. Then upstream it into something more maintainable if it works out ok? |
Sounds good. It might be worth discussing the general strategy for the support of this kind of duck array (sparse, cupy and others) in the dev call on Wednesday. |
It seems like general consensus was (please correct me if any of this is wrong):
|
Thanks for writing that up @dcherian. Sounds good to me! |
+1 for One question about |
This brings up a point I was wondering about as well: how should these |
IMO a good first pass for |
Agreed! Sparse also needs a special case.
…On Wed, Jul 22, 2020 at 6:35 PM Deepak Cherian ***@***.***> wrote:
IMO a good first pass for as_numpy would be to use np.asarray with a
special case for cupy where it would use .get.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#3245 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAJJFVW33SB5YZJWAMIFMR3R46HVVANCNFSM4IOZW5JA>
.
|
Is there any motion on this? This come up in relation to akward array today (see scikit-hep/awkward#749 ). If xarray goes with the I think the |
My impression was that the def as_numpy(self):
# needs cupy special handling
data = self.data
if isinstance(data, cupy_array_type):
raise NotImplementedError
else:
return self.copy(data=np.array(data))
def to_numpy(self):
"""Coerces to and returns a numpy.ndarray"""
return self.as_numpy().data |
I agree, this would be useful. Ideally, that would be implemented by deferring to the data, for example by calling the data's |
I'm trying to implement this (I wanted pint-aware plotting to work) in #5568, but not 100% sure if I'm doing it right.
def as_numpy(self) -> T_DataArray:
"""
Coerces wrapped data into a numpy array, and returns it wrapped in
a DataArray.
"""
data = self.data
if isinstance(data, dask_array_type):
return self.load()
elif isinstance(data, cupy_array_type):
return self.copy(data=data.get())
elif isinstance(data, pint_array_type):
return self.copy(data=data.magnitude)
elif isinstance(data, sparse_array_type):
return self.copy(data=data.to_dense())
else:
return self.copy(data=np.array(data)) but do I actually want multiple passes, like data = self.data
if isinstance(data, dask_array_type):
data = self.load().data
if isinstance(data, cupy_array_type):
data = data.get()
if isinstance(data, pint_array_type):
data = data.magnitude
if isinstance(data, sparse_array_type):
data = data.to_dense()
return self.copy(data=np.array(data))
|
Also I can add a |
right, I would leave these methods to xarray extension libraries like |
Issues I've run into working with @friedrichknuth at the Pangeo meeting trying to use sparse
.plot()
doesn't work. We need to convert to dense before plotting.values
doesn't work. Raises an error about explicitly callingtodense()
align
doesn't seem to work.The text was updated successfully, but these errors were encountered: