Description
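When a Dataset has variables with different dtypes, there's no way to tell apply_ufunc that the same function applied to different variables will produce different dtypes. In the example below, output_dtypes=[float] makes the lazy result report float64 for both variables, but after compute() variable a is actually int64:
import xarray
ds1 = xarray.Dataset(data_vars={'a': ('x', [1, 2]), 'b': ('x', [3.0, 4.5])}).chunk()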
ds2 = xarray.apply_ufunc(lambda x: x + 1, ds1, dask='parallelized', output_dtypes=[float])
ds2
<xarray.Dataset>
Dimensions: (x: 2)
Dimensions without coordinates: x
Data variables:
a (x) float64 dask.array<shape=(2,), chunksize=(2,)>
b (x) float64 dask.array<shape=(2,), chunksize=(2,)>
ds2.compute()
<xarray.Dataset>
Dimensions: (x: 2)
Dimensions without coordinates: x
Data variables:
a (x) int64 2 3
b (x) float64 4.0 5.5
Proposed solution
When the output is a dataset, apply_ufunc could accept either output_dtypes=[t] (if all output variables will have the same dtype) or output_dtypes=[{var1: t1, var2: t2, ...}]. In the example above, it would be output_dtypes=[{'a': int, 'b': float}].
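To make the proposed semantics concrete, here is a minimal sketch of how such an argument could be resolved per variable. The helper name resolve_output_dtypes is hypothetical and is not part of xarray; it only illustrates the idea that each entry of output_dtypes is either a single dtype shared by all variables or a mapping from variable name to dtype.

```python
from collections.abc import Mapping


def resolve_output_dtypes(output_dtypes, var_names):
    """Hypothetical helper (not an xarray API).

    output_dtypes has one entry per output of the function. Each entry is
    either a single dtype (used for every variable) or a mapping from
    variable name to dtype. Returns {var_name: [dtype_per_output, ...]}.
    """
    resolved = {}
    for name in var_names:
        per_var = []
        for entry in output_dtypes:
            if isinstance(entry, Mapping):
                # Per-variable form: every variable must be listed explicitly.
                if name not in entry:
                    raise KeyError(f"no output dtype given for variable {name!r}")
                per_var.append(entry[name])
            else:
                # Shared form: the same dtype applies to all variables.
                per_var.append(entry)
        resolved[name] = per_var
    return resolved


# Per-variable form from the example above.
per_variable = resolve_output_dtypes([{'a': int, 'b': float}], ['a', 'b'])
# Existing shared form, unchanged in meaning.
shared = resolve_output_dtypes([float], ['a', 'b'])
```

Under these assumptions, per_variable is {'a': [int], 'b': [float]} and shared is {'a': [float], 'b': [float]}, so the existing output_dtypes=[t] spelling would keep working as it does today.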