Description
xarray version 0.16.2 / Python 3.8.5
When reading a dataset containing complex variables with xarray.open_zarr, the following warning appears:
/home/.../python3.8/site-packages/xarray/coding/variables.py:218: ComplexWarning: Casting complex values to real discards the imaginary part
The imaginary part is indeed discarded, which is not what I expected.
After a slightly more in-depth analysis, I came across the following function (xarray/coding/variables.py:226):
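The effect can be reproduced with NumPy alone. This is a minimal sketch that assumes the decoder's cast behaves like a plain NumPy cast from a complex dtype to float64; it is not xarray's actual code path.

```python
import warnings

import numpy as np

# Complex data, as it might be stored in a Zarr array.
data = np.array([1 + 2j, 3 - 4j], dtype=np.complex64)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Casting complex to real keeps only the real part and emits a warning.
    as_float = data.astype(np.float64)

print(as_float)
print([str(w.message) for w in caught])
```

The imaginary parts are gone after the cast, which matches what the warning reports.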
def _choose_float_dtype(dtype, has_offset):
    """Return a float dtype that can losslessly represent `dtype` values."""
    # Keep float32 as-is. Upcast half-precision to single-precision,
    # because float16 is "intended for storage but not computation"
    if dtype.itemsize <= 4 and np.issubdtype(dtype, np.floating):
        return np.float32
    # float32 can exactly represent all integers up to 24 bits
    if dtype.itemsize <= 2 and np.issubdtype(dtype, np.integer):
        # A scale factor is entirely safe (vanishing into the mantissa),
        # but a large integer offset could lead to loss of precision.
        # Sensitivity analysis can be tricky, so we just use a float64
        # if there's any offset at all - better unoptimised than wrong!
        if not has_offset:
            return np.float32
    # For all other types and circumstances, we just use float64.
    # (safe because eg. complex numbers are not supported in NetCDF)
    return np.float64
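To see why complex input ends up as float64, the selection logic quoted above can be run in isolation. The sketch below is a standalone copy under the hypothetical name `choose_float_dtype`, not an import from xarray. It relies on the fact that NumPy's complex dtypes are a subtype of neither `np.floating` nor `np.integer`, so they fall through to the final branch.

```python
import numpy as np


def choose_float_dtype(dtype, has_offset):
    """Standalone copy of the quoted selection logic (hypothetical name)."""
    if dtype.itemsize <= 4 and np.issubdtype(dtype, np.floating):
        return np.float32
    if dtype.itemsize <= 2 and np.issubdtype(dtype, np.integer):
        if not has_offset:
            return np.float32
    # Everything else, including complex dtypes, lands here.
    return np.float64


for name in ["float16", "int16", "int32", "complex64"]:
    chosen = choose_float_dtype(np.dtype(name), has_offset=False)
    print(name, "->", np.dtype(chosen).name)
```

With these inputs, `complex64` is mapped to `float64`, which is the cast that triggers the warning.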
This behavior seems strange to me: I would find it more natural to keep the stored dtype rather than systematically converting to a float.
To test this, I modified the decode method (xarray/coding/variables.py:265):
def decode(self, variable, name=None):
    dims, data, attrs, encoding = unpack_for_decoding(variable)
    if "scale_factor" in attrs or "add_offset" in attrs:
        scale_factor = pop_to(attrs, encoding, "scale_factor", name=name)
        add_offset = pop_to(attrs, encoding, "add_offset", name=name)
        # my change
        # dtype = _choose_float_dtype(data.dtype, "add_offset" in attrs)
        dtype = data.dtype
        if np.ndim(scale_factor) > 0:
            scale_factor = scale_factor.item()
        if np.ndim(add_offset) > 0:
            add_offset = add_offset.item()
        transform = partial(
            _scale_offset_decoding,
            scale_factor=scale_factor,
            add_offset=add_offset,
            dtype=dtype,
        )
        data = lazy_elemwise_func(data, transform, dtype)
    return Variable(dims, data, attrs, encoding)
With this change, it works as I expected.
If there is a good reason to keep things as they are, could you explain how to deal with complex data without creating a new variable?
Thank you for your great work; xarray is awesome.
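One caveat with always using the stored dtype is that packed integer data would no longer be upcast before scaling. A possible compromise, sketched below, keeps complex dtypes and falls back to a float for everything else. The names `pick_decoded_dtype` and `scale_offset_decode` are hypothetical, the fallback is simplified to float64, and none of this is xarray's actual implementation.

```python
import numpy as np


def pick_decoded_dtype(dtype):
    """Keep complex dtypes; otherwise fall back to float64 (simplified)."""
    if np.issubdtype(dtype, np.complexfloating):
        return dtype
    return np.float64


def scale_offset_decode(data, scale_factor, add_offset, dtype):
    """Cast to the target dtype first, then apply scale and offset in place."""
    out = np.array(data, dtype=dtype, copy=True)
    if scale_factor is not None:
        out *= scale_factor
    if add_offset is not None:
        out += add_offset
    return out


stored = np.array([1 + 2j, 3 - 4j], dtype=np.complex64)
target = pick_decoded_dtype(stored.dtype)
decoded = scale_offset_decode(stored, scale_factor=2.0, add_offset=1.0, dtype=target)
print(decoded.dtype, decoded)
```

In this sketch the complex values survive decoding, while integer input would still be promoted to a float before the scale factor is applied.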