Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementation of rust based cftime #8322

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

antscloud
Copy link
Contributor

As discussed in #8302, here is a first attempt to implement cftime_rs.

There are a lot of tests and I struggle to understand all the processing in coding/times.py.
However, with this first attempt I've been able to make the test_cf_datetime work (ignoring one test)

@requires_cftime
@pytest.mark.filterwarnings("ignore:Ambiguous reference date string")
@pytest.mark.filterwarnings("ignore:Times can't be serialized faithfully")
@pytest.mark.parametrize(["num_dates", "units", "calendar"], _CF_DATETIME_TESTS)
def test_cf_datetime(num_dates, units, calendar) -> None:

Also there are some key differences betwwen cftime and cftime-rs :

  • A long int is used to represent the timestamp internally, so cftime-rs will not overflow as soon as numpy, python or cftime. It can go from -291,672,107,014 BC to 291,672,107,014 AD approximately and this depends on calendar.
  • There is no only_use_python_datetimes argument. Instead there are 4 distinct functions :
  • These functions only take a python list of one dimension and return a list of one dimension. A conversion should be done before.
  • There is no multiple datetime type (there are hidden) but instead a single object PyCFDatetime
  • There is no conda repository at the moment

Finally, and regardless of this PR, I guess there could be a speed improvement by vectorizing operations by replacing this :

def _encode_datetime_with_cftime(dates, units: str, calendar: str) -> np.ndarray:
"""Fallback method for encoding dates using cftime.
This method is more flexible than xarray's parsing using datetime64[ns]
arrays but also slower because it loops over each element.
"""
if cftime is None:
raise ModuleNotFoundError("No module named 'cftime'")
if np.issubdtype(dates.dtype, np.datetime64):
# numpy's broken datetime conversion only works for us precision
dates = dates.astype("M8[us]").astype(datetime)
def encode_datetime(d):
# Since netCDF files do not support storing float128 values, we ensure
# that float64 values are used by setting longdouble=False in num2date.
# This try except logic can be removed when xarray's minimum version of
# cftime is at least 1.6.2.
try:
return (
np.nan
if d is None
else cftime.date2num(d, units, calendar, longdouble=False)
)
except TypeError:
return np.nan if d is None else cftime.date2num(d, units, calendar)
return np.array([encode_datetime(d) for d in dates.ravel()]).reshape(dates.shape)

by something like this :

def encode_datetime_with_cftime(
dates, units: str, calendar: str
) -> np.ndarray[int | float]:
"""Fallback method for encoding dates using cftime.
This method is more flexible than xarray's parsing using datetime64[ns]
arrays but also slower because it loops over each element.
"""
if cftime is None:
raise ModuleNotFoundError("No module named 'cftime-rs'")
dates = np.array(dates)
if np.issubdtype(dates.dtype, np.datetime64):
# numpy's broken datetime conversion only works for us precision
dates = dates.astype("M8[us]").astype(datetime)
# Find all the none or NaN position
none_position = np.equal(dates, None)
# Remove NaN from the dates
filtered_dates = dates[~none_position]
print(filtered_dates)
# encoded_nums will be the same size as filtered_dates
# Try converting to f64 first to avoid unnecessary conversion to i64
try:
encoded_nums = cftime.pydate2num(
filtered_dates.tolist(), units, calendar, dtype="f64"
)
except TypeError:
encoded_nums = cftime.pydate2num(
filtered_dates.tolist(), units, calendar, dtype="i64"
)
# Create a full matrix of NaN
# And fill the num dates in the not NaN or None position
result = np.full(dates.shape, np.nan)
result[np.nonzero(~none_position)] = encoded_nums
return result

We can use numpy instead of list comprehensions. It takes a bit more of memory though.

@github-actions github-actions bot added topic-cftime CI Continuous Integration tools dependencies Pull requests that update a dependency file labels Oct 17, 2023
@dcherian
Copy link
Contributor

Finally, and regardless of this PR, I guess there could be a speed improvement by vectorizing operations by replacing this

Interesting. Can you open a separate PR with that change please? It'd be nice to get it in ASAP if it works.

antscloud added a commit to antscloud/xarray that referenced this pull request Oct 17, 2023
@Illviljan Illviljan added the run-benchmark Run the ASV benchmark workflow label Oct 17, 2023
antscloud added a commit to antscloud/xarray that referenced this pull request Jul 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI Continuous Integration tools dependencies Pull requests that update a dependency file run-benchmark Run the ASV benchmark workflow topic-cftime
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants