Only copy datetime64 data if it is using non-nanosecond precision. #125
Conversation
Wow, that is nasty! As you can see, we currently have a lot of awkward hacks to work around numpy's semi-broken datetime64, and it looks like this fix broke some of them -- hence the failing Travis builds. Maybe we need some special logic in
Yeah, this gets tricky. I fixed part of the problem by reverting to using np.asarray instead of as_array_or_item in NumpyArrayWrapper. But I'm not sure that's the full solution; like you mentioned, the problem is much deeper, though I don't think pushing the datetime nastiness into higher-level functions (such as concat) is a great option. Also, I've been hoping to get the way dates are handled to be slightly more consistent, since as it currently stands it's hard to know which data type dates are being stored as.
I'm going to attempt getting utils.as_safe_array to convert from datetime objects to datetime64 objects, which should make things a little clearer, but still doesn't solve the whole problem.
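As a rough illustration of the idea, here is a minimal sketch of such a coercion. The body is an assumption for illustration, not xray's actual utils.as_safe_array:

```python
import datetime
import numpy as np

def as_safe_array(values):
    # Hypothetical sketch: coerce object arrays of datetime.datetime
    # to a single consistent dtype (datetime64[ns]) so downstream code
    # only ever sees one datetime representation.
    values = np.asarray(values)
    if (values.dtype.kind == 'O' and values.size
            and isinstance(values.flat[0], datetime.datetime)):
        values = values.astype('datetime64[ns]')
    return values

print(as_safe_array([datetime.datetime(2000, 1, 1)]).dtype)  # datetime64[ns]
```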
Indeed, it would be nice to make that consistent! I was making some effort to not automatically convert python or netCDF4 datetime objects into numpy.datetime64, for the hypothetical situation where people care about dates before 1677 or aren't using standard calendars (but this will change with #126). But if that's too complicated, feel free to insist on the pandas approach of converting everything to nanosecond datetime64.
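For context, the 1677/2262 cutoff falls out of datetime64[ns] storing nanoseconds since the epoch in a signed 64-bit integer; pandas exposes the exact bounds:

```python
import pandas as pd

# The representable range of nanosecond-precision timestamps:
print(pd.Timestamp.min)  # 1677-09-21 00:12:43.145224193
print(pd.Timestamp.max)  # 2262-04-11 23:47:16.854775807
```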
Let me know how this is coming along! I'd like to release v0.1.1 within the next few days, since #129 means that ReadTheDocs wasn't able to build versioned docs for v0.1, and we should have a static version of our docs built for the current release. At this point, adding and/or documenting new features means that the docs get out of sync with the latest release on pypi, which is obviously non-ideal. For example, the tutorial now mentions loading groups from NetCDF files even though that's not in v0.1. I'm going to save #128 and any other highly visible changes for v0.2, but if you think you're close to a fix for this issue, I'd love to get it in v0.1.1. If not, it's no big deal to wait for v0.2, which I'm guessing will follow within a month or so.
…o support virtual_variable indexing, pydata#121
…64 object to valid years (1677<time<2262)
…e ability to fall back to netCDF4.datetime objects, since this solution does not fix the problem for all calendars in all situations.
ReadTheDocs comes with pre-built packages for the basic scientific python stack, but some of these packages are old (e.g., numpy is 1.7.1). The only way to upgrade packages on readthedocs is to use a virtual environment and a requirements.txt. Unfortunately, this means we can't upgrade both numpy and pandas simultaneously, because pandas may get built first and link against the wrong version of numpy. We inadvertently stumbled upon a workaround to build the "latest" docs by first installing numpy in the (cached) virtual environment, and then later (in another commit) adding pandas to the requirements.txt file. However, this is a real hack and makes it impossible to maintain different versions of the docs, such as for tagged releases. Accordingly, this commit relaxes the numpy version requirement so we can use a version that readthedocs already has installed. (We don't actually need a newer version of numpy for any current functionality in xray, although it's nice to have for support for missing value functions like nanmean.)
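For reference, a hypothetical sketch of the relaxed requirements.txt described above; the exact pins are assumptions, not the commit's actual contents:

```
# requirements.txt -- illustrative sketch only
numpy>=1.7    # satisfied by the 1.7.1 that readthedocs already ships, so no rebuild
pandas        # built inside the virtualenv against the preinstalled numpy
```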
Expanded warnings for when non-numpy.datetime64 arrays are returned. Use a context manager for warnings in test_conventions.py.
…uring fallback of single element
an attempt to standardize time handling. NOTE: virtual variable slicing is now broken, so do not submit.
variables get cast to integers.
@@ -68,12 +68,15 @@ def _as_compatible_data(data):
        data = data.values
    except AttributeError:
        pass

    if (data.dtype.kind == 'O' or
        (data.dtype.kind == 'M' and not data.dtype == '<M8[ns]')):
Please indent by an extra tab to separate the condition from the following statement
I'm not sure what you mean. PEP 8 presents several options here, one of which is what I have. That said, feel free to modify as you see fit.
Fine :)
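For readers following along, a runnable sketch contrasting the two indentation styles under discussion; the condition comes from the diff above, while the astype body line is an assumption added to make the snippet complete:

```python
import numpy as np

data = np.array(['2000-01-01'], dtype='datetime64[us]')

# Continuation aligned with the body that follows (what the PR had):
if (data.dtype.kind == 'O' or
    (data.dtype.kind == 'M' and not data.dtype == '<M8[ns]')):
    data = data.astype('datetime64[ns]')  # body sits at the same indent

# Extra indent on the continuation (the reviewer's suggestion), which keeps
# the condition visually distinct from the statement it guards:
if (data.dtype.kind == 'O' or
        (data.dtype.kind == 'M' and not data.dtype == '<M8[ns]')):
    data = data.astype('datetime64[ns]')
```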
Maybe best to open a new PR so it's clear what is new and what is just from the rebase?
Did you end up adding your example (concatenating the time objects) as a regression test? That seems like a good idea to ensure that this stays fixed!
self.assertEqual(type(x.values[0]), type(value))
self.assertEqual(type(x[0].values), type(value))
self.assertEqual(type(x.values[0]), type(expected))
if not x.dtype.kind == 'M':
I'm not so sure about making this exception... it leaves us returning non-datetime-datetimes like:
(Pdb) x[0].values
array(946684800000000000, dtype='datetime64[ns]')
I've been playing around with this but don't have a full fix yet. Here is my regression test for your bug:
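The test body itself was not preserved on this page. As a rough illustration only, a regression test for the concatenation bug might look like the following; the import path, Variable constructor, and concat signature are assumptions about xray's API at the time:

```python
import datetime
import numpy as np
from xray import Variable  # the package was named xray at the time

def test_concat_preserves_datetimes():
    # Hypothetical regression test for pydata/xray#125: concatenating
    # variables holding datetime data should round-trip the values exactly.
    times = [datetime.datetime(2000, 1, d) for d in range(1, 4)]
    v = Variable(['t'], np.array(times, dtype='datetime64[ns]'))
    actual = Variable.concat([v[:2], v[2:]], 't')
    assert np.array_equal(actual.values, v.values)
```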
This is a more direct fix for the bug @akleeman reported in pydata#125. Previously, it did not always work to modify Variable.values in-place (as done in Variable.concat), because as_array_or_item could return a copy of an ndarray instead of a view. @akleeman I didn't incorporate your tests for the
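To illustrate why in-place modification fails when a copy sneaks in, a small demonstration using plain numpy (not xray internals):

```python
import numpy as np

values = np.array(['2000-01-01', '2000-01-02'], dtype='datetime64[ns]')

view = np.asarray(values)           # same dtype: returns the array itself
copy = values.astype(values.dtype)  # astype copies by default, even for a no-op

view[0] = np.datetime64('1999-01-01', 'ns')
print(values[0])  # 1999-01-01: the view shares memory, so the edit sticks

copy[1] = np.datetime64('1999-12-31', 'ns')
print(values[1])  # still 2000-01-02: in-place edits to a copy are silently lost
```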
In an attempt to coerce all datetime arrays to nanosecond resolution, utils.as_safe_array() was creating copies of every datetime64 array (via the astype method). This was causing unexpected behavior (bugs) for things such as concatenation over times. (see below)
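A minimal sketch of the fix named in the PR title, copying only when the precision actually differs; the helper name is invented for illustration and is not xray's actual code:

```python
import numpy as np

def _to_nanosecond(arr):
    # Copy datetime64 data only when it is *not* already nanosecond
    # precision; otherwise hand back the same array so views stay views.
    if arr.dtype.kind == 'M' and arr.dtype != np.dtype('<M8[ns]'):
        return arr.astype('datetime64[ns]')  # conversion necessarily copies
    return arr

ns = np.array(['2000-01-01'], dtype='datetime64[ns]')
us = np.array(['2000-01-01'], dtype='datetime64[us]')
assert _to_nanosecond(ns) is ns                        # no spurious copy
assert _to_nanosecond(us).dtype == np.dtype('<M8[ns]') # converted as needed
```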