Use masked arrays while preserving int #1194

@gerritholl

A great beauty of NumPy's masked arrays is that they work with any dtype, since they do not rely on NaN. Unfortunately, when I put my data into an xarray.Dataset, it converts ints to float, as shown below:

In [137]: x = arange(30, dtype="i1").reshape(3, 10)

In [138]: xr.Dataset({"count": (["x", "y"], ma.masked_where(x%5>3, x))}, coords={"x": range(3), "y":
     ...: range(10)})
Out[138]:
<xarray.Dataset>
Dimensions:  (x: 3, y: 10)
Coordinates:
  * y        (y) int64 0 1 2 3 4 5 6 7 8 9
  * x        (x) int64 0 1 2
Data variables:
    count    (x, y) float64 0.0 1.0 2.0 3.0 nan 5.0 6.0 7.0 8.0 nan 10.0 ...

This happens in the function _maybe_promote.

Such type “promotion” is unaffordable for me: the memory consumption of my multi-gigabyte arrays would explode by a factor of 4. Secondly, many of my integer-dtype fields are bit arrays, for which a floating-point representation is not desirable.
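To make the cost concrete, here is a minimal NumPy-only sketch of the effect described above. It does not call xarray's _maybe_promote; the helper name promote_like_nan_masking is hypothetical and only mimics the observed result (cast to float64, masked cells become NaN). The growth factor depends on the integer width, and a factor of 4 corresponds to 16-bit integers.

```python
import numpy as np


def promote_like_nan_masking(masked):
    """Hypothetical helper: mimic the observed result, not xarray's code.

    Casts the masked array to float64 and writes NaN into masked cells.
    """
    return np.ma.filled(masked.astype(np.float64), np.nan)


# Same input as in the session above.
x = np.arange(30, dtype="i1").reshape(3, 10)
masked = np.ma.masked_where(x % 5 > 3, x)

promoted = promote_like_nan_masking(masked)
print(masked.dtype, "->", promoted.dtype)
print("bytes before:", x.nbytes, "bytes after:", promoted.nbytes)

# Memory growth factor when an integer dtype is promoted to float64.
growth = {
    name: np.dtype(np.float64).itemsize // np.dtype(name).itemsize
    for name in ("int8", "int16", "int32")
}
print(growth)
```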

It would greatly benefit xarray if it could use masking while preserving the dtype of input data.
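Until something like this exists, one possible workaround is to carry the data and the mask as two plain arrays, so that the integer dtype survives. The sketch below is NumPy-only and is not an xarray feature; split_masked, join_masked and the fill value -1 are hypothetical choices made for illustration. The two arrays could presumably be stored as separate variables in a Dataset, but that part is an assumption and is not shown here.

```python
import numpy as np


def split_masked(masked, fill_value):
    """Hypothetical helper: split a masked array into (data, mask).

    `data` keeps the original dtype; masked cells hold `fill_value`.
    `mask` is a full boolean array, True where the value is masked.
    """
    data = masked.filled(fill_value)
    mask = np.ma.getmaskarray(masked)
    return data, mask


def join_masked(data, mask):
    """Hypothetical helper: rebuild the masked array from its two parts."""
    return np.ma.masked_array(data, mask=mask)


x = np.arange(30, dtype="i1").reshape(3, 10)
masked = np.ma.masked_where(x % 5 > 3, x)

# -1 is an arbitrary sentinel that fits in int8.
data, mask = split_masked(masked, fill_value=-1)
restored = join_masked(data, mask)

print(data.dtype, mask.dtype, restored.dtype)
print("bytes:", data.nbytes + mask.nbytes)
```

For int8 input this costs two bytes per element (one for the value, one for the boolean mask) instead of eight for float64, and bit fields stay in integer form.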

(See also: Stack Overflow question)
