ComplexWarning: Casting complex values to real discards the imaginary part #4655

Closed
@SebastienDorgan

xarray version 0.16.2 / Python 3.8.5

When reading a dataset that contains complex variables with xarray.open_zarr, the following warning appears:
/home/.../python3.8/site-packages/xarray/coding/variables.py:218: ComplexWarning: Casting complex values to real discards the imaginary part
The imaginary part is indeed discarded, which is not what I expected.
Digging a little deeper, I found this function (xarray/coding/variables.py:226):

def _choose_float_dtype(dtype, has_offset):
    """Return a float dtype that can losslessly represent `dtype` values."""
    # Keep float32 as-is.  Upcast half-precision to single-precision,
    # because float16 is "intended for storage but not computation"
    if dtype.itemsize <= 4 and np.issubdtype(dtype, np.floating):
        return np.float32
    # float32 can exactly represent all integers up to 24 bits
    if dtype.itemsize <= 2 and np.issubdtype(dtype, np.integer):
        # A scale factor is entirely safe (vanishing into the mantissa),
        # but a large integer offset could lead to loss of precision.
        # Sensitivity analysis can be tricky, so we just use a float64
        # if there's any offset at all - better unoptimised than wrong!
        if not has_offset:
            return np.float32
    # For all other types and circumstances, we just use float64.
    # (safe because eg. complex numbers are not supported in NetCDF)
    return np.float64
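
To make the failure mode concrete: for a complex dtype neither `if` branch above matches, so the function falls through to `np.float64`. The following is a minimal NumPy-only sketch of what casting to that dtype does. It does not call xarray, and the array `stored` is made up for illustration.

```python
import warnings

import numpy as np

# Hypothetical stand-in for a complex variable read from storage.
stored = np.array([1 + 2j, 3 - 4j], dtype=np.complex128)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is not filtered out
    # Same kind of cast that a float64 target dtype implies for complex input.
    decoded = stored.astype(np.float64)

# Collect the warning class names without importing ComplexWarning directly,
# since its import location differs between NumPy versions.
warning_names = [w.category.__name__ for w in caught]

print(warning_names)
print(decoded)
```

The cast keeps only the real parts (1.0 and 3.0) and emits a ComplexWarning, which matches the behaviour reported above.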

This behavior seems strange to me: I would find it more natural to keep the stored type rather than systematically converting to a float.
To test this, I modified the decode method (xarray/coding/variables.py:265):

    def decode(self, variable, name=None):
        dims, data, attrs, encoding = unpack_for_decoding(variable)

        if "scale_factor" in attrs or "add_offset" in attrs:
            scale_factor = pop_to(attrs, encoding, "scale_factor", name=name)
            add_offset = pop_to(attrs, encoding, "add_offset", name=name)
            # my change
            # dtype = _choose_float_dtype(data.dtype, "add_offset" in attrs)
            dtype = data.dtype
            if np.ndim(scale_factor) > 0:
                scale_factor = scale_factor.item()
            if np.ndim(add_offset) > 0:
                add_offset = add_offset.item()
            transform = partial(
                _scale_offset_decoding,
                scale_factor=scale_factor,
                add_offset=add_offset,
                dtype=dtype,
            )
            data = lazy_elemwise_func(data, transform, dtype)

        return Variable(dims, data, attrs, encoding)

With this change it works as I expected.
If there is a good reason to keep things as they are, could you explain how to handle complex data without creating a new variable?
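
One caveat with `dtype = data.dtype` is that it would also keep integer storage dtypes, which cannot hold fractional scaled values. A possible middle ground is sketched below, under assumptions: `choose_decoding_dtype` is a hypothetical helper, not part of xarray, that passes complex dtypes through unchanged and otherwise follows the same rules as `_choose_float_dtype`.

```python
import numpy as np


def choose_decoding_dtype(dtype, has_offset):
    """Hypothetical variant of _choose_float_dtype that keeps complex dtypes."""
    dtype = np.dtype(dtype)
    # Assumption: complex data should be decoded in its own precision.
    if np.issubdtype(dtype, np.complexfloating):
        return dtype.type
    # The remaining rules mirror the function quoted in this issue.
    if dtype.itemsize <= 4 and np.issubdtype(dtype, np.floating):
        return np.float32
    if dtype.itemsize <= 2 and np.issubdtype(dtype, np.integer) and not has_offset:
        return np.float32
    return np.float64


# Usage: scale and offset a made-up complex array without losing the imaginary part.
packed = np.array([1 + 2j], dtype=np.complex64)
target = choose_decoding_dtype(packed.dtype, has_offset=True)
decoded = packed.astype(target) * 0.5 + 1.0
print(decoded.dtype, decoded)
```

Integer and real floating-point inputs are still promoted exactly as before, so only the complex case changes.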

Thank you for your great work; xarray is awesome.
