Fix critical literal `np.timedelta64` encoding bugs #10469

spencerkclark · 2025-06-30T00:10:31Z

This PR fixes the critical np.timedelta64 encoding bugs introduced in #10101. I added a check to ensure that modifying the default _FillValue does not inadvertently mark non-missing values as missing as well.

Closes Writing timedelta64 to netCDF3 always raises an error #10466
Closes Cannot load timedelta64 encoded data from disk #10468
Tests added
User visible changes (including notable bug fixes) are documented in whats-new.rst

cc: @shoyer @kmuehlbauer; sorry again for the trouble.

shoyer

Thanks, Spencer!

shoyer · 2025-06-30T01:14:17Z

xarray/coding/times.py

+                # Overwrite the on-disk dtype encoding, which is numeric, with
+                # the dtype attribute stored on disk, which corresponds to
+                # a timedelta64 dtype.
+                encoding["dtype"] = attrs.pop("dtype")


I don't think we want to set datetime64 here as the encoded dtype, because this information is preserved in the dtype of the resulting Variable. We should leave encoding['dtype'] as the raw numeric dtype on disk.

I went back and forth on this a bit in the initial PR. Would we be open to renaming the attribute that encodes the timedelta64 dtype "xarray_dtype"? This way we would not need to worry about overwriting the numeric dtype in the encoding and can still retain information about the timedelta64 encoding method used.

I think it is potentially useful, at least while we still have two different methods of decoding timedelta64 values, to be able to distinguish between them for round-tripping purposes (i.e. so that we do not add new dtype attributes to files that did not have them previously). Maybe that should not be a priority though.

I think it is potentially useful, at least while we still have two different methods of decoding timedelta64 values, to be able to distinguish between them for round-tripping purposes (i.e. so that we do not add new dtype attributes to files that did not have them previously). Maybe that should not be a priority though.

I think users probably do want new dtype attributes saved if they are re-writing data with Xarray. This makes their datasets forward compatible with future versions of Xarray, and silences the deprecation warning about decoding changing in the future, in a way that feels cleaner to me than adding decode_timedelta=True to every open_dataset() call.

Ok great, that should simplify things.

shoyer · 2025-06-30T01:15:08Z

xarray/tests/test_backends.py

+            timedeltas = np.array(
+                [0, np.iinfo(np.int32).min, np.iinfo(np.int64).min]
+            ).astype("timedelta64[s]")


I would suggest doing this test with NaT as well (or instead)

Indeed this test covers that since np.iinfo(np.int64).min is NaT. I'll write the constructor like this instead so that becomes clearer:

timedeltas = np.array([0, np.iinfo(np.int32).min, "NaT"], dtype="timedelta64[s]")

shoyer · 2025-06-30T01:18:54Z

xarray/backends/netcdf3.py

+                    raise ValueError(
+                        f"Could not safely coerce default int64 _FillValue "
+                        f"({default_int64_fill_value}) to the analogous int32 "
+                        f"value ({default_int32_fill_value}), since it "
+                        f"already exists as non-missing within variable "
+                        f"{name!r}. Try explicitly setting "
+                        f"encoding['_FillValue'] to another int32 value."
+                    )


I don't think we need this error. We already have a check in coerce_nc3_dtype that the encoded int32 values match the original int64 values exactly.

I think this is needed to check something different, i.e. that the new fill value does not conflict with an existing value in the array. I believe the downstream check in coerce_nc3_dtype would pass in that circumstance since we replace the int64 fill value with the new int32 fill value upstream of it.

shoyer · 2025-06-30T01:24:48Z

One other strategic question -- do we need this separate logic for decoding into an integer dtype that exactly matches the precision of the datetime64 units?

xarray/xarray/coding/times.py

Lines 1456 to 1487 in 4a581f4

    
           else: 
        
               resolution, _ = np.datetime_data(variable.dtype) 
        
               dtype = np.int64 
        
               attrs_dtype = f"timedelta64[{resolution}]" 
        
               units = _numpy_dtype_to_netcdf_timeunit(variable.dtype) 
        
               safe_setitem(attrs, "dtype", attrs_dtype, name=name) 
        
               # Remove dtype encoding if it exists to prevent it from 
        
               # interfering downstream in NonStringCoder. 
        
               encoding.pop("dtype", None) 
        
               if any( 
        
                   k in encoding for k in _INVALID_LITERAL_TIMEDELTA64_ENCODING_KEYS 
        
               ): 
        
                   raise ValueError( 
        
                       f"Specifying 'add_offset' or 'scale_factor' is not " 
        
                       f"supported when encoding the timedelta64 values of " 
        
                       f"variable {name!r} with xarray's new default " 
        
                       f"timedelta64 encoding approach. To encode {name!r} " 
        
                       f"with xarray's previous timedelta64 encoding " 
        
                       f"approach, which supports the 'add_offset' and " 
        
                       f"'scale_factor' parameters, additionally set " 
        
                       f"encoding['units'] to a unit of time, e.g. " 
        
                       f"'seconds'. To proceed with encoding of {name!r} " 
        
                       f"via xarray's new approach, remove any encoding " 
        
                       f"entries for 'add_offset' or 'scale_factor'." 
        
                   ) 
        
               if "_FillValue" not in encoding and "missing_value" not in encoding: 
        
                   encoding["_FillValue"] = np.iinfo(np.int64).min 
        
           data, units = encode_cf_timedelta(data, units, dtype) 
        
           safe_setitem(attrs, "units", units, name=name) 
        
           return Variable(dims, data, attrs, encoding, fastpath=True)

I am wondering because it seems like this logic is already covered in encode_cf_timedelta(), and adding this explicit check makes means that timedelta64[ns] data cannot be saved directly to netCDF3 files unless the units are converted first. timedelta64[ns] worked via encode_cf_timedelta as long as nanosecond precision isn't really needed, similar to how datetime64[ns] is OK. This was convenient for users of "human" time units.

spencerkclark

Thanks for the quick feedback @shoyer!

I am wondering because it seems like this logic is already covered in encode_cf_timedelta(), and adding this explicit check makes means that timedelta64[ns] data cannot be saved directly to netCDF3 files unless the units are converted first. timedelta64[ns] worked via encode_cf_timedelta as long as nanosecond precision isn't really needed, similar to how datetime64[ns] is OK. This was convenient for users of "human" time units.

I agree this is annoying, and it seems like we should be able to relax it some. I'll think about it some more and see what I can do.

spencerkclark · 2025-06-30T21:09:57Z

xarray/backends/netcdf3.py

+                    raise ValueError(
+                        f"Could not safely coerce default int64 _FillValue "
+                        f"({default_int64_fill_value}) to the analogous int32 "
+                        f"value ({default_int32_fill_value}), since it "
+                        f"already exists as non-missing within variable "
+                        f"{name!r}. Try explicitly setting "
+                        f"encoding['_FillValue'] to another int32 value."
+                    )


I think this is needed to check something different, i.e. that the new fill value does not conflict with an existing value in the array. I believe the downstream check in coerce_nc3_dtype would pass in that circumstance since we replace the int64 fill value with the new int32 fill value upstream of it.

spencerkclark · 2025-06-30T21:10:14Z

xarray/coding/times.py

+                # Overwrite the on-disk dtype encoding, which is numeric, with
+                # the dtype attribute stored on disk, which corresponds to
+                # a timedelta64 dtype.
+                encoding["dtype"] = attrs.pop("dtype")


I went back and forth on this a bit in the initial PR. Would we be open to renaming the attribute that encodes the timedelta64 dtype "xarray_dtype"? This way we would not need to worry about overwriting the numeric dtype in the encoding and can still retain information about the timedelta64 encoding method used.

I think it is potentially useful, at least while we still have two different methods of decoding timedelta64 values, to be able to distinguish between them for round-tripping purposes (i.e. so that we do not add new dtype attributes to files that did not have them previously). Maybe that should not be a priority though.

spencerkclark · 2025-06-30T21:10:36Z

xarray/tests/test_backends.py

+            timedeltas = np.array(
+                [0, np.iinfo(np.int32).min, np.iinfo(np.int64).min]
+            ).astype("timedelta64[s]")


Indeed this test covers that since np.iinfo(np.int64).min is NaT. I'll write the constructor like this instead so that becomes clearer:

timedeltas = np.array([0, np.iinfo(np.int32).min, "NaT"], dtype="timedelta64[s]")

Fix literal timedelta encoding bugs

4a581f4

github-actions bot added topic-backends topic-documentation topic-cftime io labels Jun 30, 2025

shoyer reviewed Jun 30, 2025

View reviewed changes

spencerkclark commented Jun 30, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Fix critical literal `np.timedelta64` encoding bugs #10469

Fix critical literal `np.timedelta64` encoding bugs #10469

spencerkclark commented Jun 30, 2025

Uh oh!

shoyer left a comment

Uh oh!

shoyer Jun 30, 2025

Uh oh!

spencerkclark Jun 30, 2025

Uh oh!

shoyer Jun 30, 2025

Uh oh!

spencerkclark Jun 30, 2025

Uh oh!

shoyer Jun 30, 2025

Uh oh!

spencerkclark Jun 30, 2025

Uh oh!

shoyer Jun 30, 2025

Uh oh!

spencerkclark Jun 30, 2025

Uh oh!

shoyer commented Jun 30, 2025

Uh oh!

spencerkclark left a comment

Uh oh!

spencerkclark Jun 30, 2025

Uh oh!

spencerkclark Jun 30, 2025

Uh oh!

spencerkclark Jun 30, 2025

Uh oh!

Uh oh!

Uh oh!

Fix critical literal np.timedelta64 encoding bugs #10469

Are you sure you want to change the base?

Fix critical literal np.timedelta64 encoding bugs #10469

Conversation

spencerkclark commented Jun 30, 2025

Uh oh!

shoyer left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

shoyer commented Jun 30, 2025

Uh oh!

spencerkclark left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Fix critical literal `np.timedelta64` encoding bugs #10469

Fix critical literal `np.timedelta64` encoding bugs #10469