Description
A great beauty of NumPy's masked arrays is that they work with any dtype, since they do not rely on NaN. Unfortunately, when I put my data into an xarray.Dataset, integers are converted to float, as shown below:
In [137]: x = arange(30, dtype="i1").reshape(3, 10)
In [138]: xr.Dataset({"count": (["x", "y"], ma.masked_where(x%5>3, x))}, coords={"x": range(3), "y":
...: range(10)})
Out[138]:
<xarray.Dataset>
Dimensions: (x: 3, y: 10)
Coordinates:
* y (y) int64 0 1 2 3 4 5 6 7 8 9
* x (x) int64 0 1 2
Data variables:
count (x, y) float64 0.0 1.0 2.0 3.0 nan 5.0 6.0 7.0 8.0 nan 10.0 ...
This happens in the function _maybe_promote.
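For comparison, the NumPy-only sketch below first shows that the masked array itself keeps its int8 dtype, because the mask is stored separately from the values. It then mimics the effect visible in the output above: masked elements become NaN, which forces a floating-point dtype. The helper `promote_masked` is hypothetical and only approximates the observed result. It is not xarray's `_maybe_promote` or its actual conversion code.

```python
import numpy as np
import numpy.ma as ma

x = np.arange(30, dtype="i1").reshape(3, 10)
masked = ma.masked_where(x % 5 > 3, x)

# The mask lives in a separate boolean array, so the values keep their int8 dtype.
print(masked.dtype)

def promote_masked(arr):
    """Hypothetical helper approximating the observed behaviour:
    integers cannot hold NaN, so cast to float64 and fill masked slots with NaN."""
    return arr.astype(np.float64).filled(np.nan)

promoted = promote_masked(masked)
print(promoted.dtype)
print(int(np.isnan(promoted).sum()))  # number of masked elements now stored as NaN
```

Every value with `x % 5 == 4` is masked, so six elements end up as NaN, matching the `nan` entries in the Dataset output.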
Such type “promotion” is unaffordable for me: the memory consumption of my multi-gigabyte arrays would grow by a factor of 4. Moreover, many of my integer-dtype fields are bit arrays, for which a floating-point representation is not meaningful.
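To put numbers on both points, the sketch below compares the storage of a few integer dtypes against float64 (the dtype in the output above) and checks that NumPy rejects bitwise operations on floating-point data. The array shape is arbitrary; the actual growth factor depends on the original integer width (for example 4 for int16 and 8 for int8 when the target is float64).

```python
import numpy as np

shape = (1000, 1000)  # arbitrary example size

# Bytes after promotion to float64 divided by bytes before.
ratios = {}
for dtype in ("i1", "i2", "i4"):
    original = np.zeros(shape, dtype=dtype)
    promoted = original.astype(np.float64)
    ratios[original.dtype.name] = promoted.nbytes // original.nbytes
    print(f"{original.dtype}: {original.nbytes} B -> float64: {promoted.nbytes} B")

# Bit arrays: bitwise operators are defined for integers but not for floats.
flags = np.array([0b0101, 0b0011], dtype="u1")
print(flags & 0b0001)

try:
    flags.astype(np.float64) & 0b0001
    float_bitwise_ok = True
except TypeError:
    float_bitwise_ok = False
print("bitwise AND on float64 supported:", float_bitwise_ok)
```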
It would greatly benefit xarray if it could use masking while preserving the dtype of the input data.
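Until such support exists, one possible workaround (offered here as an assumption, not as a pattern xarray prescribes) is to keep the integer values and the mask as two separate compact arrays and to recombine them only when needed. In an xarray.Dataset these could be stored as two data variables (hypothetically named `count` and `count_mask`), since plain integer and boolean arrays are stored without promotion. The NumPy-only sketch below shows the split, the memory bookkeeping, and the round trip back to an int8 masked array.

```python
import numpy as np
import numpy.ma as ma

x = np.arange(30, dtype="i1").reshape(3, 10)
masked = ma.masked_where(x % 5 > 3, x)

# Hypothetical layout: raw int8 values plus a boolean mask (True = missing).
values = ma.getdata(masked)     # int8, 1 byte per element
mask = ma.getmaskarray(masked)  # bool, 1 byte per element

print(values.dtype, mask.dtype)
print(values.nbytes + mask.nbytes, "bytes vs",
      values.size * np.dtype(np.float64).itemsize, "bytes as float64")

# If memory is tight, the mask can be bit-packed to 1 bit per element.
packed = np.packbits(mask)
print(packed.nbytes, "bytes for the packed mask")

# Recombine on demand; the result is again an int8 masked array.
restored = ma.masked_array(values, mask=mask)
print(restored.dtype, int(restored.count()), "unmasked elements")
```

Packing trades a little CPU time on access for an eightfold smaller mask; `np.unpackbits(packed, count=mask.size).reshape(mask.shape)` recovers it.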
(See also: Stack Overflow question)