Commit 8772355

huard and spencerkclark committed
Add support for CFTimeIndex in get_clean_interp_index (#3631)
* add support for CFTimeIndex in get_clean_interp_index
* black
* added test comparing cftime index with standard index
* added comment
* index in ns instead of days
* pep8
* datetime_to_numeric: convert timedelta objects using np.timedelta64 type conversion. add overflow tests
* added interp test
* switched clean_interp_index resolution to us. Fixed interpolate_na and added support for CFTimeIndex.
* Error message to explain overflow problem.
* switched timedelta64 units from ms to us
* reverted default user-visible resolution to ns. Converts to float, possibly lossy.
* pep8
* black
* special case for older numpy versions
* black
* added xfail for overflow error with numpy < 1.17
* changes following PR comments from spencerclark
* bypass pandas to convert timedeltas to floats. avoids overflow errors.
* black
* removed numpy conversion. added docstrings. renamed tests.
* pep8
* updated whats new
* Update doc/whats-new.rst
  Co-Authored-By: Spencer Clark <spencerkclark@gmail.com>
* update interpolate_na docstrings
* black
* dt conflicts with accessor
* replaced assert_equal by assert_allclose
* Update xarray/core/duck_array_ops.py
  Co-Authored-By: Spencer Clark <spencerkclark@gmail.com>
* Update xarray/core/duck_array_ops.py
  Co-Authored-By: Spencer Clark <spencerkclark@gmail.com>
* renamed array to value in timedelta_to_numeric. Added tests
* removed support for TimedeltaIndex in timedelta_to_numeric
* added tests for np_timedelta64_to_float and pd_timedelta_to_float. renamed array to value for pd_timedelta_to_float. removed pd_timedeltaindex_to_float.
* black
* Fix flake8 error
* black

Co-authored-by: Spencer Clark <spencerkclark@gmail.com>
1 parent cc142f4 commit 8772355

9 files changed, +352 -95 lines changed

doc/whats-new.rst

Lines changed: 9 additions & 4 deletions
@@ -29,8 +29,8 @@ Breaking changes
 - scipy 1.3

 - Remove ``compat`` and ``encoding`` kwargs from ``DataArray``, which
-  have been deprecated since 0.12. (:pull:`3650`).
-  Instead, specify the encoding when writing to disk or set
+  have been deprecated since 0.12. (:pull:`3650`).
+  Instead, specify the encoding when writing to disk or set
   the ``encoding`` attribute directly.
   By `Maximilian Roos <https://github.com/max-sixty>`_
 - :py:func:`xarray.dot`, :py:meth:`DataArray.dot`, and the ``@`` operator now
@@ -67,10 +67,15 @@ New Features
 - :py:meth:`Dataset.swap_dims` and :py:meth:`DataArray.swap_dims`
   now allow swapping to dimension names that don't exist yet. (:pull:`3636`)
   By `Justus Magin <https://github.com/keewis>`_.
-- Extend :py:class:`core.accessor_dt.DatetimeAccessor` properties
-  and support `.dt` accessor for timedelta
+- Extend :py:class:`core.accessor_dt.DatetimeAccessor` properties
+  and support `.dt` accessor for timedelta
   via :py:class:`core.accessor_dt.TimedeltaAccessor` (:pull:`3612`)
   By `Anderson Banihirwe <https://github.com/andersy005>`_.
+- Support CFTimeIndex in :py:meth:`DataArray.interpolate_na`, define 1970-01-01
+  as the default offset for the interpolation index for both DatetimeIndex and
+  CFTimeIndex, use microseconds in the conversion from timedelta objects
+  to floats to avoid overflow errors (:issue:`3641`, :pull:`3631`).
+  By David Huard `<https://github.com/huard>`_.

 Bug fixes
 ~~~~~~~~~
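
The new what's-new entry describes measuring time coordinates from a fixed 1970-01-01 offset and turning the resulting timedeltas into floats. The following is a minimal sketch of that idea using only the standard library. It is not the code from this commit, and `EPOCH` and `to_microseconds` are hypothetical names chosen for illustration.

```python
import datetime as dt

# Fixed reference date, mirroring the 1970-01-01 default offset
# mentioned in the what's-new entry (the name EPOCH is an assumption).
EPOCH = dt.datetime(1970, 1, 1)


def to_microseconds(when, offset=EPOCH):
    """Return the time elapsed since `offset` as a float count of microseconds."""
    delta = when - offset
    # Dividing one timedelta by another yields a plain float.
    return delta / dt.timedelta(microseconds=1)


# A small, irregularly spaced "interpolation index" built from three dates.
index = [to_microseconds(dt.datetime(2000, 1, day)) for day in (1, 2, 4)]
gaps = [later - earlier for earlier, later in zip(index, index[1:])]
print(gaps)
```

Because every value is measured from the same offset, indexes built from different arrays stay directly comparable, which is the practical benefit of a shared default.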

xarray/coding/cftimeindex.py

Lines changed: 8 additions & 1 deletion
@@ -430,7 +430,14 @@ def __sub__(self, other):
         import cftime

         if isinstance(other, (CFTimeIndex, cftime.datetime)):
-            return pd.TimedeltaIndex(np.array(self) - np.array(other))
+            try:
+                return pd.TimedeltaIndex(np.array(self) - np.array(other))
+            except OverflowError:
+                raise ValueError(
+                    "The time difference exceeds the range of values "
+                    "that can be expressed at the nanosecond resolution."
+                )
+
         elif isinstance(other, pd.TimedeltaIndex):
             return CFTimeIndex(np.array(self) - other.to_pytimedelta())
         else:
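
The error message added above refers to the limit of nanosecond resolution: a signed 64-bit count of nanoseconds covers only about 292 years on either side of zero. The sketch below illustrates that limit with the standard library alone. It is an assumption-laden illustration, not how pandas performs the check, and `fits_in_int64_nanoseconds` is a hypothetical helper.

```python
import datetime as dt

INT64_MAX = 2**63 - 1  # largest value a signed 64-bit integer can hold


def fits_in_int64_nanoseconds(delta):
    """Return True if `delta` can be stored as a signed 64-bit nanosecond count."""
    # Floor division of two timedeltas gives an exact integer count of microseconds.
    nanoseconds = (delta // dt.timedelta(microseconds=1)) * 1000
    return abs(nanoseconds) <= INT64_MAX


print(fits_in_int64_nanoseconds(dt.timedelta(days=365 * 200)))  # True
print(fits_in_int64_nanoseconds(dt.timedelta(days=365 * 300)))  # False
```

Differences between cftime dates can easily span more than three centuries, which is why the subtraction is wrapped in a `try` block and re-raised with a more explanatory message.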

xarray/core/dataarray.py

Lines changed: 6 additions & 2 deletions
@@ -18,6 +18,7 @@
     cast,
 )

+import datetime
 import numpy as np
 import pandas as pd

@@ -2041,7 +2042,9 @@ def interpolate_na(
         method: str = "linear",
         limit: int = None,
         use_coordinate: Union[bool, str] = True,
-        max_gap: Union[int, float, str, pd.Timedelta, np.timedelta64] = None,
+        max_gap: Union[
+            int, float, str, pd.Timedelta, np.timedelta64, datetime.timedelta
+        ] = None,
         **kwargs: Any,
     ) -> "DataArray":
         """Fill in NaNs by interpolating according to different methods.
@@ -2073,14 +2076,15 @@ def interpolate_na(
             or None for no limit. This filling is done regardless of the size of
             the gap in the data. To only interpolate over gaps less than a given length,
             see ``max_gap``.
-        max_gap: int, float, str, pandas.Timedelta, numpy.timedelta64, default None.
+        max_gap: int, float, str, pandas.Timedelta, numpy.timedelta64, datetime.timedelta, default None.
             Maximum size of gap, a continuous sequence of NaNs, that will be filled.
             Use None for no limit. When interpolating along a datetime64 dimension
             and ``use_coordinate=True``, ``max_gap`` can be one of the following:

             - a string that is valid input for pandas.to_timedelta
             - a :py:class:`numpy.timedelta64` object
             - a :py:class:`pandas.Timedelta` object
+            - a :py:class:`datetime.timedelta` object

             Otherwise, ``max_gap`` must be an int or a float. Use of ``max_gap`` with unlabeled
             dimensions has not been implemented yet. Gap length is defined as the difference

xarray/core/dataset.py

Lines changed: 6 additions & 2 deletions
@@ -27,6 +27,7 @@
     cast,
 )

+import datetime
 import numpy as np
 import pandas as pd

@@ -3995,7 +3996,9 @@ def interpolate_na(
         method: str = "linear",
         limit: int = None,
         use_coordinate: Union[bool, Hashable] = True,
-        max_gap: Union[int, float, str, pd.Timedelta, np.timedelta64] = None,
+        max_gap: Union[
+            int, float, str, pd.Timedelta, np.timedelta64, datetime.timedelta
+        ] = None,
         **kwargs: Any,
     ) -> "Dataset":
         """Fill in NaNs by interpolating according to different methods.
@@ -4028,14 +4031,15 @@ def interpolate_na(
             or None for no limit. This filling is done regardless of the size of
             the gap in the data. To only interpolate over gaps less than a given length,
             see ``max_gap``.
-        max_gap: int, float, str, pandas.Timedelta, numpy.timedelta64, default None.
+        max_gap: int, float, str, pandas.Timedelta, numpy.timedelta64, datetime.timedelta, default None.
             Maximum size of gap, a continuous sequence of NaNs, that will be filled.
             Use None for no limit. When interpolating along a datetime64 dimension
             and ``use_coordinate=True``, ``max_gap`` can be one of the following:

             - a string that is valid input for pandas.to_timedelta
             - a :py:class:`numpy.timedelta64` object
             - a :py:class:`pandas.Timedelta` object
+            - a :py:class:`datetime.timedelta` object

             Otherwise, ``max_gap`` must be an int or a float. Use of ``max_gap`` with unlabeled
             dimensions has not been implemented yet. Gap length is defined as the difference

xarray/core/duck_array_ops.py

Lines changed: 106 additions & 16 deletions
@@ -372,51 +372,141 @@ def _datetime_nanmin(array):


 def datetime_to_numeric(array, offset=None, datetime_unit=None, dtype=float):
-    """Convert an array containing datetime-like data to an array of floats.
+    """Convert an array containing datetime-like data to numerical values.
+
+    Convert the datetime array to a timedelta relative to an offset.

     Parameters
     ----------
-    da : np.array
-        Input data
-    offset: Scalar with the same type of array or None
-        If None, subtract minimum values to reduce round off error
-    datetime_unit: None or any of {'Y', 'M', 'W', 'D', 'h', 'm', 's', 'ms',
-        'us', 'ns', 'ps', 'fs', 'as'}
-    dtype: target dtype
+    da : array-like
+        Input data
+    offset: None, datetime or cftime.datetime
+        Datetime offset. If None, this is set by default to the array's minimum
+        value to reduce round off errors.
+    datetime_unit: {None, Y, M, W, D, h, m, s, ms, us, ns, ps, fs, as}
+        If not None, convert output to a given datetime unit. Note that some
+        conversions are not allowed due to non-linear relationships between units.
+    dtype: dtype
+        Output dtype.

     Returns
     -------
     array
+        Numerical representation of datetime object relative to an offset.
+
+    Notes
+    -----
+    Some datetime unit conversions won't work, for example from days to years, even
+    though some calendars would allow for them (e.g. no_leap). This is because there
+    is no `cftime.timedelta` object.
     """
     # TODO: make this function dask-compatible?
+    # Set offset to minimum if not given
     if offset is None:
         if array.dtype.kind in "Mm":
             offset = _datetime_nanmin(array)
         else:
             offset = min(array)
+
+    # Compute timedelta object.
+    # For np.datetime64, this can silently yield garbage due to overflow.
+    # One option is to enforce 1970-01-01 as the universal offset.
     array = array - offset

-    if not hasattr(array, "dtype"):  # scalar is converted to 0d-array
+    # Scalar is converted to 0d-array
+    if not hasattr(array, "dtype"):
         array = np.array(array)

+    # Convert timedelta objects to float by first converting to microseconds.
     if array.dtype.kind in "O":
-        # possibly convert object array containing datetime.timedelta
-        array = np.asarray(pd.Series(array.ravel())).reshape(array.shape)
+        return py_timedelta_to_float(array, datetime_unit or "ns").astype(dtype)

-    if datetime_unit:
-        array = array / np.timedelta64(1, datetime_unit)
+    # Convert np.NaT to np.nan
+    elif array.dtype.kind in "mM":

-    # convert np.NaT to np.nan
-    if array.dtype.kind in "mM":
+        # Convert to specified timedelta units.
+        if datetime_unit:
+            array = array / np.timedelta64(1, datetime_unit)
         return np.where(isnull(array), np.nan, array.astype(dtype))
-    return array.astype(dtype)
+
+
+def timedelta_to_numeric(value, datetime_unit="ns", dtype=float):
+    """Convert a timedelta-like object to numerical values.
+
+    Parameters
+    ----------
+    value : datetime.timedelta, numpy.timedelta64, pandas.Timedelta, str
+        Time delta representation.
+    datetime_unit : {Y, M, W, D, h, m, s, ms, us, ns, ps, fs, as}
+        The time units of the output values. Note that some conversions are not allowed due to
+        non-linear relationships between units.
+    dtype : type
+        The output data type.
+
+    """
+    import datetime as dt
+
+    if isinstance(value, dt.timedelta):
+        out = py_timedelta_to_float(value, datetime_unit)
+    elif isinstance(value, np.timedelta64):
+        out = np_timedelta64_to_float(value, datetime_unit)
+    elif isinstance(value, pd.Timedelta):
+        out = pd_timedelta_to_float(value, datetime_unit)
+    elif isinstance(value, str):
+        try:
+            a = pd.to_timedelta(value)
+        except ValueError:
+            raise ValueError(
+                f"Could not convert {value!r} to timedelta64 using pandas.to_timedelta"
+            )
+        return py_timedelta_to_float(a, datetime_unit)
+    else:
+        raise TypeError(
+            f"Expected value of type str, pandas.Timedelta, datetime.timedelta "
+            f"or numpy.timedelta64, but received {type(value).__name__}"
+        )
+    return out.astype(dtype)


 def _to_pytimedelta(array, unit="us"):
     index = pd.TimedeltaIndex(array.ravel(), unit=unit)
     return index.to_pytimedelta().reshape(array.shape)


+def np_timedelta64_to_float(array, datetime_unit):
+    """Convert numpy.timedelta64 to float.
+
+    Notes
+    -----
+    The array is first converted to microseconds, which is less likely to
+    cause overflow errors.
+    """
+    array = array.astype("timedelta64[ns]").astype(np.float64)
+    conversion_factor = np.timedelta64(1, "ns") / np.timedelta64(1, datetime_unit)
+    return conversion_factor * array
+
+
+def pd_timedelta_to_float(value, datetime_unit):
+    """Convert pandas.Timedelta to float.
+
+    Notes
+    -----
+    Built on the assumption that pandas timedelta values are in nanoseconds,
+    which is also the numpy default resolution.
+    """
+    value = value.to_timedelta64()
+    return np_timedelta64_to_float(value, datetime_unit)
+
+
+def py_timedelta_to_float(array, datetime_unit):
+    """Convert a timedelta object to a float, possibly at a loss of resolution.
+    """
+    array = np.asarray(array)
+    array = np.reshape([a.total_seconds() for a in array.ravel()], array.shape) * 1e6
+    conversion_factor = np.timedelta64(1, "us") / np.timedelta64(1, datetime_unit)
+    return conversion_factor * array
+
+
 def mean(array, axis=None, skipna=None, **kwargs):
     """inhouse mean that can handle np.datetime64 or cftime.datetime
     dtypes"""