Time dtype encoding defaulting to int64 when writing netcdf or zarr #3942
Comments
I have run into this problem before. The initial choice of time units is made from the first dataset you write. Note that you can always specify an encoding to make sure that you can append properly.
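For example, a minimal sketch of pinning the encoding at the first write (my own illustration, assuming a local Zarr store named `example.zarr` and a toy `temperature` variable; the same `encoding` argument also works for `to_netcdf`):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Toy dataset whose timestamps all fall at midnight.
ds = xr.Dataset(
    {"temperature": ("time", np.random.rand(3))},
    coords={"time": pd.date_range("2000-01-01", periods=3, freq="D")},
)

# Pin second-resolution units and int64 storage up front, so appending
# data with non-midnight timestamps later does not lose precision.
ds.to_zarr(
    "example.zarr",
    mode="w",
    encoding={"time": {"units": "seconds since 1970-01-01", "dtype": "int64"}},
)
```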
I agree with Deepak. Xarray intelligently chooses its encoding when it writes the initial dataset to make sure it has enough precision to resolve all times. It cannot magically know that, in the future, you plan to append data which requires greater precision. Your options are to specify the encoding manually when writing the initial dataset, or to rewrite the existing store with a finer-resolution time encoding before appending.

I also agree that we should definitely be raising a warning (or even an error) in your situation.
Yep, I managed to overcome this by manually setting encoding parameters; just wondering if there would be any downside in preferring a finer-resolution default (e.g. seconds rather than days).
I think I've bumped into a symptom of this issue (my issue is described in #5969). And I think #3379 may be another symptom of this issue. Perhaps I'm biased (because I work with timeseries which only span a few years), but I wonder if xarray should default to encoding time as int64 nanoseconds since a fixed epoch. If that's no good, then let's definitely add a note to the documentation to say that it might be a good idea for users to manually specify the encoding for datetimes if they wish to append to Zarrs.
👍
Adding this error message would make it obvious that this is happening. PRs are very welcome!
Cool, I agree that an error and a documentation change is likely to be sufficient 🙂 (and I'd be keen to write a PR to help out!) But, before we commit to that path, may I ask: why not have xarray default to encoding time as int64 nanoseconds since a fixed epoch?
It's choosing the coarsest resolution that still exactly represents the data, which has the benefit of allowing the maximum possible time range given the data's frequency (see the relevant logic at lines 317 to 319 in 5871637).

I'm not sure if this is why it was originally chosen, but that is one advantage. Perhaps @spencerkclark has some insight here.
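As a rough back-of-the-envelope illustration (my addition, not part of the comment above) of why coarser units buy a longer representable range for an int64 offset:

```python
import numpy as np

# Span (in years) representable by an int64 offset for a few candidate
# time units. Coarser units => far larger range around the reference date.
int64_max = np.iinfo(np.int64).max
ns_per_unit = {"nanoseconds": 1, "seconds": 10**9, "days": 86_400 * 10**9}
ns_per_year = 1e9 * 86_400 * 365.25

for unit, ns in ns_per_unit.items():
    years = int64_max * ns / ns_per_year
    print(f"{unit:>11}: ~{years:.3g} years on either side of the reference date")
# nanoseconds: ~292 years (roughly 1678-2262 around the Unix epoch)
# seconds:     ~2.9e+11 years
# days:        ~2.5e+16 years
```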
This logic has been around in xarray for a long time (I think it dates back to #12!), so it predates me. If I had to guess, though, it would have to do with how times were decoded back then.

That of course is not true anymore. To be honest, currently it seems the only remaining advantage to choosing a larger time encoding unit and a proximate reference date is that it makes the raw encoded values a little more human-readable. However, encoding dates with units of nanoseconds since a fixed epoch would sidestep precision-loss issues like the one described here.
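To make the trade-off concrete, a small illustrative snippet (my addition, not from the comment above): in-memory `datetime64[ns]` values are already raw int64 nanosecond offsets from 1970-01-01, so a nanosecond-since-epoch encoding round-trips them exactly, at the cost of a limited representable range.

```python
import numpy as np
import pandas as pd

# datetime64[ns] is stored as an int64 count of nanoseconds since 1970-01-01.
t = pd.Timestamp("2000-01-01 12:34:56")
raw = t.value                    # raw int64 nanoseconds since the epoch
print(raw)                       # 946730096000000000
print(np.datetime64(raw, "ns"))  # 2000-01-01T12:34:56

# The price of nanosecond units: the representable range is ~1677 to ~2262.
print(pd.Timestamp.min, pd.Timestamp.max)
```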
Time dtype encoding defaults to "int64" for datasets with only zero-hour times when writing to netcdf or zarr. This constrains the precision of these datasets to the resolution of the time units (in the example below, daily precision, given units are defined as "days since ..."). If we, for instance, create a zarr dataset using this default encoding with such a dataset and subsequently append some non-zero times onto it, we lose the hour/minute/second information from the appended parts.

MCVE Code Sample
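(The original code sample was lost in extraction; the following is a minimal sketch of the kind of reproducer described above, assuming a local store `test.zarr` and a toy variable `x`. On the xarray versions discussed in this issue the appended times are read back truncated to midnight; newer versions may warn or error instead.)

```python
import numpy as np
import pandas as pd
import xarray as xr

# Initial dataset: every timestamp is at midnight, so xarray infers
# "days since ..." units and an int64 dtype for the on-disk time encoding.
ds = xr.Dataset(
    {"x": ("time", np.arange(3))},
    coords={"time": pd.date_range("2000-01-01", periods=3, freq="D")},
)
ds.to_zarr("test.zarr", mode="w")

# Append data whose timestamps fall at 12:00; with day-resolution int64
# units the hour/minute/second information cannot be represented.
ds2 = xr.Dataset(
    {"x": ("time", np.arange(3))},
    coords={"time": pd.date_range("2000-01-04 12:00", periods=3, freq="D")},
)
ds2.to_zarr("test.zarr", mode="a", append_dim="time")

print(xr.open_zarr("test.zarr").time.values)
```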
Expected Output

The appended timestamps retain their hour/minute/second information rather than being silently truncated to midnight.
Problem Description

Perhaps it would be useful to default the time dtype to "float64". Another option could be using a finer time resolution by default than the one xarray infers from the dataset's times (for instance, if the units would be inferred as "days since ...", use "seconds since ..." instead).
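As a sketch of what the float64 alternative looks like when requested explicitly today (my example, not from the issue; the store name and variable are made up), fractional day offsets can carry sub-daily times:

```python
import numpy as np
import pandas as pd
import xarray as xr

ds = xr.Dataset(
    {"x": ("time", np.arange(3))},
    coords={"time": pd.date_range("2000-01-01", periods=3, freq="D")},
)

# Ask for float64 offsets explicitly: a later append at e.g.
# 2000-01-04 12:00 then encodes as 3.5 days instead of being truncated.
ds.to_zarr(
    "test_float.zarr",
    mode="w",
    encoding={"time": {"dtype": "float64", "units": "days since 2000-01-01"}},
)
print(xr.open_zarr("test_float.zarr").time.encoding)
```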