datetime handling seems broken #9387

Closed
@ThomWorm

Description

What happened?

I've recently run into a few datetime issues with xarray. I've provided two separate reproducible examples below that seem to be connected to the same issue.

What did you expect to happen?

#issue 1 - I expect to get a DataArray containing the number of days between each value in my input array (January 5th) and my target date (January 1st). Instead, I get 64-bit integers that represent the same values in nanoseconds.
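To make the difference between the two results concrete, here is a minimal NumPy-only sketch (not taken from the xarray code base) of what a nanosecond-resolution timedelta looks like when cast straight to an integer, next to one way of getting day counts that does not depend on the `timedelta64[D]` cast. The 2x2 shape and the variable names are illustrative assumptions.

```python
import numpy as np

# Small stand-in for the 10x10 array in the report, stored at nanosecond resolution.
dates = np.full((2, 2), np.datetime64("2000-01-05"), dtype="datetime64[ns]")
start = np.datetime64("2000-01-01")

# Subtracting datetimes gives a timedelta64[ns] array.
delta = dates - start

# Casting that array directly to an integer exposes the raw nanosecond count.
as_ns = delta.astype("int64")

# Dividing by a one-day timedelta gives the number of days as float64,
# regardless of the resolution the timedelta is stored in.
days = delta / np.timedelta64(1, "D")
whole_days = days.astype("int64")

print(as_ns[0, 0])       # 345600000000000
print(whole_days[0, 0])  # 4
```

Dividing by `np.timedelta64(1, "D")` is a common idiom for timedelta arrays. Applying it to the DataArray from the example should behave the same way, although that is an assumption here rather than something this snippet verifies.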

######################################

#issue 2

I'm getting some unexpected behavior when working with NumPy arrays that should be returned as datetime64 objects. In the second example below, when I set output_dtypes=[datetime64[ns]], I get a TypeError: Cannot cast NumPy timedelta64 scalar from metadata [ns] to according to the rule 'same_kind'.

I have tried many variations of explicitly setting input and output dtypes with no change in the error.

If I set output_dtypes=[], I get back float64 values that I can convert to the expected datetimes after the fact. Although converting afterwards isn't a huge problem, it suggests to me that there is either an underlying issue or a misunderstanding on my part.

If I remove dask and replace degree_days with a NumPy-backed DataArray, I get the same error.
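As a point of comparison, the per-cell lookup can also be written as whole-array NumPy operations, which avoids passing datetime values through `np.vectorize` at all. This is a hedged sketch of an alternative technique, not the approach used in the example and not a fix for the reported error. The function name `first_threshold_dates` and the tiny input arrays are hypothetical.

```python
import numpy as np

def first_threshold_dates(values, start_index, time, threshold):
    """Return, per cell, the first date at which the cumulative sum
    (counted from that cell's start index) reaches the threshold.

    values:      float array, shape (..., n_time)
    start_index: integer array, shape (...)
    time:        datetime64 array, shape (n_time,)
    """
    steps = np.arange(values.shape[-1])
    # True for time steps at or after each cell's start index.
    active = steps >= start_index[..., None]
    # Ignore contributions from before the start index.
    cumsum = np.cumsum(np.where(active, values, 0.0), axis=-1)
    reached = active & (cumsum >= threshold)
    # argmax returns the first True along the time axis (0 if there is none).
    first = reached.argmax(axis=-1)
    out = time[first]
    # Cells that never reach the threshold get NaT instead of time[0].
    out[~reached.any(axis=-1)] = np.datetime64("NaT")
    return out

time = np.arange("2000-01-01", "2000-01-11", dtype="datetime64[D]").astype("datetime64[ns]")
values = np.ones((2, 2, 10))
start_index = np.array([[0, 1], [2, 3]])

result = first_threshold_dates(values, start_index, time, threshold=3)
print(result.dtype)   # datetime64[ns]
print(result[0, 0])   # 2000-01-03T00:00:00.000000000
```

Because the result is built by indexing the `time` array, it keeps the `datetime64[ns]` dtype without any cast. Whether this fits a dask-backed workflow depends on chunking and memory, which this sketch does not address.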

Minimal Complete Verifiable Example

#issue 1
import xarray as xr
import numpy as np

# Create a 10x10 array filled with the datetime 2000-01-05
lat = np.arange(10)
lon = np.arange(10)
date = np.datetime64("2000-01-05")

data = np.full((10, 10), date)

# Create the xarray DataArray
data_array = xr.DataArray(
    data, coords={"latitude": lat, "longitude": lon}, dims=["latitude", "longitude"]
)

# Calculate the timedelta in days from 2000-01-01
start_date = np.datetime64("2000-01-01")
timedelta_days = (data_array - start_date).astype("timedelta64[D]").astype(int)

print("Original DataArray:")
print(data_array)
print("\nTimedelta in days from 2000-01-01:")
print(timedelta_days)

#######################################################################################
#######################################################################################
#issue 2

import numpy as np
import pandas as pd
import xarray as xr
import dask.array as da

#############################################
#Function
#############################################
def day_cumsum_reaches_threshold_linear(
    degree_days, start_index, start_time_values, threshold
):
    cumsum = np.cumsum(degree_days[start_index:])
    threshold_reached = np.where(cumsum >= threshold)[0]
    if len(threshold_reached) == 0:
        print("error")
        return np.datetime64("NaT", "ns")
    first_reached_index = threshold_reached[0]
    result_date = start_time_values[start_index + first_reached_index]
    return result_date

#############################################
#Input data
#############################################

vday_cumsum_reaches_threshold_linear = np.vectorize(day_cumsum_reaches_threshold_linear)


time = pd.date_range("2000-01-01", periods=50, freq="D").to_numpy(
    dtype="datetime64[ns]"
)
lat = np.linspace(-90, 90, 10)
lon = np.linspace(-180, 180, 10)
degree_days = xr.DataArray(
    da.random.random((10, 10, 50), chunks=(10, 10, -1)),  # No chunking along time
    coords=[lat, lon, time],
    dims=["lat", "lon", "time"],
)
start_dates = xr.DataArray(
    np.random.choice(time[:5], size=(10, 10)), coords=[lat, lon], dims=["lat", "lon"]
)
start_indices = np.array(
    [np.where(degree_days.time.values == d)[0][0] for d in start_dates.values.flatten()]
).reshape(start_dates.shape)
threshold = 15

#############################################
#Apply function
#############################################


result_raw = xr.apply_ufunc(
    day_cumsum_reaches_threshold_linear,
    degree_days,
    start_indices,
    degree_days.time.values.astype("datetime64[ns]"),
    threshold,
    input_core_dims=[["time"], [], ["time"], []],
    output_core_dims=[[]],
    vectorize=True,
    dask="parallelized",
    output_dtypes=[],
)


result_raw.compute()

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

No response

Environment

Dask version: 2024.7.1
NumPy version: 1.26.4 (used because 2.0 is currently incompatible with netCDF4)
Xarray version: 2024.6.0
Python version: 3.12.4
Operating System: ubuntu 22.04
Install method (conda, pip, source): conda
