Description
What happened?
I've recently run into a few datetime issues with xarray. I've provided two separate reproducible examples below that seem to be connected to the same issue.
What did you expect to happen?
#issue 1 - I expect to get a DataArray containing the number of days between the values in my input array (January 5th) and my target day (January 1st). Instead, I get 64-bit integers that represent the nanosecond equivalent of the datetime values.
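For comparison, here is a minimal sketch of two ways to get day counts that avoid `astype("timedelta64[D]")`: dividing by a one-day timedelta, or using the `.dt.days` accessor. The variable names are hypothetical and the array is smaller than in the example; this is a possible workaround, not a statement about the root cause.

```python
import numpy as np
import xarray as xr

# Small stand-in for the 10x10 array used in the example (hypothetical names).
dates = xr.DataArray(
    np.full((2, 2), np.datetime64("2000-01-05", "ns")),
    dims=["latitude", "longitude"],
)
start = np.datetime64("2000-01-01", "ns")

delta = dates - start  # timedelta64 DataArray

# Option 1: divide by a one-day timedelta to get floating-point days.
days_float = delta / np.timedelta64(1, "D")

# Option 2: use the timedelta accessor to get whole days as integers.
days_int = delta.dt.days
```

Both results should hold the value 4 in every cell, which is what I expected from the original expression.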
######################################
#issue 2
I'm getting some unexpected behavior when working with NumPy arrays that should be returned as datetime64 objects. In the issue 2 example, when I set output_dtypes=[datetime64[ns]] I get a TypeError: Cannot cast NumPy timedelta64 scalar from metadata [ns] to according to the rule 'same_kind'.
I have tried many variations of explicitly setting input and output dtypes with no change in the error.
If I set output_dtypes=[] I get back float64 values that I can convert to the expected datetimes after the fact. Although converting after the fact isn't a huge problem, it suggests to me that there is either an underlying issue or some misunderstanding on my part.
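The after-the-fact conversion can be sketched in plain NumPy as follows. This assumes the float64 values are nanoseconds since the Unix epoch and that none of them are missing; the `raw` array is a hypothetical stand-in for the computed result, not actual output.

```python
import numpy as np

# Hypothetical stand-in for the float64 result: nanoseconds since the Unix epoch.
raw = np.array([9.466848e17, 9.467712e17])

# float64 -> int64 nanoseconds -> datetime64[ns]
converted = raw.astype("int64").astype("datetime64[ns]")
```

Missing values (NaN in the float output) would need separate handling before the integer cast.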
If I remove dask by replacing degree_days with a NumPy-backed DataArray, I still get the same error.
Minimal Complete Verifiable Example
#issue 1
import xarray as xr
import numpy as np
# Create a 10x10 array filled with the datetime 2000-01-05
lat = np.arange(10)
lon = np.arange(10)
date = np.datetime64("2000-01-05")
data = np.full((10, 10), date)
# Create the xarray DataArray
data_array = xr.DataArray(
    data, coords={"latitude": lat, "longitude": lon}, dims=["latitude", "longitude"]
)
# Calculate the timedelta in days from 2000-01-01
start_date = np.datetime64("2000-01-01")
timedelta_days = (data_array - start_date).astype("timedelta64[D]").astype(int)
print("Original DataArray:")
print(data_array)
print("\nTimedelta in days from 2000-01-01:")
print(timedelta_days)
#######################################################################################
#######################################################################################
#issue 2
import numpy as np
import pandas as pd
import xarray as xr
import dask.array as da
#############################################
#Function
#############################################
def day_cumsum_reaches_threshold_linear(
    degree_days, start_index, start_time_values, threshold
):
    cumsum = np.cumsum(degree_days[start_index:])
    threshold_reached = np.where(cumsum >= threshold)[0]
    if len(threshold_reached) == 0:
        print("error")
        return np.datetime64("NaT", "ns")
    first_reached_index = threshold_reached[0]
    result_date = start_time_values[start_index + first_reached_index]
    return result_date
#############################################
#Input data
#############################################
vday_cumsum_reaches_threshold_linear = np.vectorize(day_cumsum_reaches_threshold_linear)
time = pd.date_range("2000-01-01", periods=50, freq="D").to_numpy(
    dtype="datetime64[ns]"
)
lat = np.linspace(-90, 90, 10)
lon = np.linspace(-180, 180, 10)
degree_days = xr.DataArray(
    da.random.random((10, 10, 50), chunks=(10, 10, -1)),  # No chunking along time
    coords=[lat, lon, time],
    dims=["lat", "lon", "time"],
)
start_dates = xr.DataArray(
    np.random.choice(time[:5], size=(10, 10)), coords=[lat, lon], dims=["lat", "lon"]
)
start_indices = np.array(
    [np.where(degree_days.time.values == d)[0][0] for d in start_dates.values.flatten()]
).reshape(start_dates.shape)
threshold = 15
#############################################
#Apply function
#############################################
result_raw = xr.apply_ufunc(
    day_cumsum_reaches_threshold_linear,
    degree_days,
    start_indices,
    degree_days.time.values.astype("datetime64[ns]"),
    threshold,
    input_core_dims=[["time"], [], ["time"], []],
    output_core_dims=[[]],
    vectorize=True,
    dask="parallelized",
    output_dtypes=[],
)
result_raw.compute()
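One possible way to sidestep the datetime output dtype entirely is to have the applied function return an integer time index and map it to dates outside apply_ufunc. The sketch below is an untested-against-dask variant with NumPy-backed data and constant inputs; `first_index_reaching_threshold` and the other names are hypothetical, and -1 is an assumed sentinel for "threshold never reached".

```python
import numpy as np
import pandas as pd
import xarray as xr


def first_index_reaching_threshold(degree_days, start_index, threshold):
    """Return the time index where the cumulative sum first reaches
    the threshold, or -1 if it never does (hypothetical helper)."""
    cumsum = np.cumsum(degree_days[start_index:])
    reached = np.flatnonzero(cumsum >= threshold)
    if reached.size == 0:
        return -1
    return start_index + reached[0]


time = pd.date_range("2000-01-01", periods=50, freq="D").to_numpy(
    dtype="datetime64[ns]"
)
degree_days = xr.DataArray(
    np.ones((3, 4, 50)), dims=["lat", "lon", "time"], coords={"time": time}
)
start_indices = xr.DataArray(np.zeros((3, 4), dtype=np.int64), dims=["lat", "lon"])

# Only integers cross the apply_ufunc boundary.
index = xr.apply_ufunc(
    first_index_reaching_threshold,
    degree_days,
    start_indices,
    15,
    input_core_dims=[["time"], [], []],
    output_core_dims=[[]],
    vectorize=True,
    output_dtypes=[np.int64],
)

# Map indices back to dates; the -1 sentinel becomes NaT.
dates = np.where(
    index.values >= 0, time[index.values], np.datetime64("NaT", "ns")
)
result = xr.DataArray(dates, dims=index.dims, coords=index.coords)
```

With a constant degree-day value of 1 and a threshold of 15, every cell should resolve to 2000-01-15. This does not explain the TypeError; it only avoids asking apply_ufunc for a datetime64 output.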
MVCE confirmation
- Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- Complete example — the example is self-contained, including all data and the text of any traceback.
- Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
- New issue — a search of GitHub Issues suggests this is not a duplicate.
- Recent environment — the issue occurs with the latest version of xarray and its dependencies.
Relevant log output
No response
Anything else we need to know?
No response
Environment
Dask version: 2024.7.1
NumPy version: 1.26.4 (used because 2.0 is currently incompatible with netCDF4)
Xarray version: 2024.6.0
Python version: 3.12.4
Operating System: ubuntu 22.04
Install method (conda, pip, source): conda