Only copy datetime64 data if it is using non-nanosecond precision. #125

Closed
akleeman wants to merge 17 commits

Conversation

@akleeman (Contributor)

In an attempt to coerce all datetime arrays to nanosecond resolution, utils.as_safe_array() was creating copies of any datetime64 array (via the astype method). This was causing unexpected behavior (bugs) for operations such as concatenation over times (see below).

import xray
import pandas as pd

ds = xray.Dataset()
ds['time'] = ('time', pd.date_range('2011-09-01', '2011-09-11'))
times = [ds.indexed(time=[i]) for i in range(10)]
ret = xray.Dataset.concat(times, 'time')
print ret['time']

<xray.DataArray 'time' (time: 10)>
array(['1970-01-02T07:04:40.718526408-0800',
       '1969-12-31T16:00:00.099966608-0800',
       '1969-12-31T16:00:00.041748384-0800',
       '1969-12-31T16:00:00.041748360-0800',
       '1969-12-31T16:00:00.041748336-0800',
       '1969-12-31T16:00:00.041748312-0800',
       '1969-12-31T16:00:00.041748288-0800',
       '1969-12-31T16:00:00.041748264-0800',
       '1969-12-31T16:00:00.041748240-0800',
       '1969-12-31T16:00:00.041748216-0800'], dtype='datetime64[ns]')
Attributes:
    Empty

@shoyer added the bug label May 12, 2014
@shoyer (Member) commented May 12, 2014

Wow, that is nasty! As you can see, we currently have a lot of awkward hacks to work around numpy's semi-broken datetime64, and it looks like this fix broke some of them -- hence the failing Travis builds.

Maybe we need some special logic in Variable.concat instead? My guess is that the trouble here is related to all these datetime64 objects (plain np.datetime64 objects, not arrays with dtype='datetime64', because 0-dimensional datetime64 arrays are broken) being put into an array directly, which at some point implicitly converts them into integers.
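
A minimal numpy sketch of the implicit-integer failure mode described above (illustrative only, not xray's actual code path): once a datetime64 is reduced to its integer payload, reinterpreting that integer under the nanosecond unit lands just after the 1970 epoch, much like the values in the bug output.

import numpy as np

# A datetime64 silently reduced to its integer payload (here, a count
# of seconds since the epoch):
seconds = np.datetime64('2011-09-02T00:00:00', 's').astype('int64')

# Reinterpreting that integer as a count of *nanoseconds* yields an
# epoch-adjacent timestamp, like the garbage values in the bug report:
print seconds.astype('datetime64[ns]')  # ~1970-01-01T00:00:01.3...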

@akleeman (Contributor, Author)

Yeah, this gets tricky. I fixed part of the problem by reverting to np.asarray instead of as_array_or_item in NumpyArrayWrapper. But I'm not sure that's the full solution; as you mentioned, the problem runs much deeper, though I don't think pushing the datetime nastiness into higher-level functions (such as concat) is a great option.

Also, I've been hoping to make the way dates are handled slightly more consistent, since as it currently stands it's hard to know which data type dates are being stored as.

> from datetime import datetime
> import numpy as np
> import xray
> d = datetime(2000, 1, 1)  # any plain Python datetime (value assumed for the example)
> d64 = np.datetime64(d)
> print xray.Variable(['time'], [d]).dtype
dtype('O')
> print xray.Variable(['time'], [d64]).dtype
dtype('<M8[ns]')
> print d64.dtype
dtype('<M8[us]')

I'm going to attempt getting utils.as_safe_array to convert from datetime objects to datetime64 objects, which should make things a little clearer but still doesn't solve the whole problem. A sketch of the idea follows.
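
A rough sketch of the coercion being proposed; the helper name and the use of pandas here are assumptions for illustration, not the actual patch:

import numpy as np
import pandas as pd

def _to_datetime64_ns(values):
    # Hypothetical helper (assumes 1-d input): coerce datetime.datetime
    # objects and non-nanosecond datetime64 arrays to a uniform
    # datetime64[ns] dtype, following the pandas convention discussed
    # in this thread.
    values = np.asarray(values)
    if (values.dtype.kind == 'O' or
            (values.dtype.kind == 'M' and
             values.dtype != np.dtype('<M8[ns]'))):
        values = pd.to_datetime(values).values  # always datetime64[ns]
    return values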

@shoyer (Member) commented May 13, 2014

Indeed, it would be nice to make that consistent!

I was making some effort to not automatically convert python or netCDF4 datetime objects into numpy.datetime64, for the hypothetical situation where people care about dates before 1677 or aren't using standard calendars (but this will change with #126). But if that's too complicated, feel free to insist on the pandas approach of converting everything to nanosecond datetime64.
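
For context on the 1677 cutoff: datetime64[ns] stores an int64 count of nanoseconds since the epoch, which bounds the representable range. A quick check, assuming pandas exposes these bounds (as current versions do via Timestamp.min/max):

import pandas as pd

# int64 nanoseconds since 1970 only span roughly 1677-09-21 through
# 2262-04-11; dates outside that window must stay as object arrays of
# datetime/netCDF4 objects.
print pd.Timestamp.min  # ~1677-09-21
print pd.Timestamp.max  # ~2262-04-11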

@shoyer (Member) commented May 16, 2014

Let me know how this is coming along!

I'd like to release v0.1.1 within the next few days, since #129 means that ReadTheDocs wasn't able to build versioned docs for v0.1, and we should have a static version of our docs built for the current release. At this point, adding and/or documenting new features means that the docs get out of sync with the latest release on pypi, which is obviously non-ideal. For example, the tutorial now mentions loading groups from NetCDF files even though that's not in v0.1.

I'm going to save #128 and any other highly visible changes for v0.2, but if you think you're close to a fix for this issue, I'd love to get it into 0.1.1. If not, it's no big deal to wait for 0.2, which I'm guessing will follow within a month or so.

Joe Hamman and others added 17 commits May 16, 2014 01:17
…e ability to fallback to netCDF4.datetime objects since this solution does not fix the problem for all calendars in all situations.
ReadTheDocs comes with pre-built packages for the basic scientific python
stack, but some of these packages are old (e.g., numpy is 1.7.1). The only way
to upgrade packages on readthedocs is to use a virtual environment and a
requirements.txt.

Unfortunately, this means we can't upgrade both numpy and pandas
simultaneously, because pandas may get built first and link against the wrong
version of numpy. We inadvertently stumbled upon a workaround to build the
"latest" docs by first installing numpy in the (cached) virtual environment,
and then later (in another commit), adding pandas to the requirements.txt
file.

However, this is a real hack and makes it impossible to maintain different
versions of the docs, such as for tagged releases. Accordingly, this commit
relaxes the numpy version requirement so we can use a version that readthedocs
already has installed. (We actually don't really need a newer version of numpy
for any current functionality in xray, although it's nice to have for support
for missing value functions like nanmean.)
Expanded warnings for when non numpy.datetime64 arrays are returned.
Use context manager for warnings in test_conventions.py.
An attempt to standardize time handling.

NOTE: virtual variable slicing is now broken, so do not submit.
@@ -68,12 +68,15 @@ def _as_compatible_data(data):
        data = data.values
    except AttributeError:
        pass

    if (data.dtype.kind == 'O' or
        (data.dtype.kind == 'M' and not data.dtype == '<M8[ns]')):
Member

Please indent by an extra tab to separate the condition from the following statement

Contributor Author

I'm not sure what you mean. PEP 8 presents several options here, one of which is what I have. That said, feel free to modify as you see fit.

Member

Fine :)

@shoyer (Member) commented May 16, 2014

Maybe best to open a new PR so it's clear what is new and what is just from the rebase?

@shoyer (Member) commented May 17, 2014

Did you end up adding your example (concatenating the time objects) as a regression test? That seems like a good idea to ensure that this stays fixed!

    self.assertEqual(type(x.values[0]), type(value))
    self.assertEqual(type(x[0].values), type(value))
    self.assertEqual(type(x.values[0]), type(expected))
    if not x.dtype.kind == 'M':
Member

I'm not so sure about making this exception... it leaves us returning non-datetime-datetimes like:

(Pdb) x[0].values
array(946684800000000000, dtype='datetime64[ns]')
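
The distinction being probed, sketched directly in numpy (the bare-integer display of a 0-d datetime64[ns] array is numpy-version-dependent):

import numpy as np

arr = np.array(['2000-01-01'], dtype='datetime64[ns]')
print type(arr[0])           # numpy.datetime64 -- a true scalar
zero_d = np.asarray(arr[0])  # a 0-d array, not a scalar
print zero_d.ndim            # 0
# On the numpy of this era, repr(zero_d) shows the raw integer payload,
# e.g. array(946684800000000000, dtype='datetime64[ns]')
print repr(zero_d)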

@shoyer (Member) commented May 17, 2014

I've been playing around with this but don't have a full fix yet. Here is my regression test for your bug:

    def test_index_and_concat_datetime64(self):
        # regression test for #125
        expected = Variable('t', pd.date_range('2011-09-01', periods=10))
        times = [expected[[i]] for i in range(10)]
        actual = Variable.concat(times, 't')
        self.assertArrayEqual(expected, actual)

shoyer added a commit to shoyer/xarray that referenced this pull request May 19, 2014
This is a more direct fix for the bug @akleeman reported in pydata#125.

Previously, it did not always work to modify Variable.values in-place (as done
in Variable.concat), because as_array_or_item could return a copy of an
ndarray instead of a view.

@akleeman I didn't incorporate your tests for the
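
A minimal sketch of the view-versus-copy distinction the commit message describes (illustrative, not the actual xray code): astype() copies by default even when the dtype already matches, so in-place writes to its result never reach the original array, while np.asarray() hands back the original.

import numpy as np

a = np.array(['2011-09-01', '2011-09-02'], dtype='datetime64[ns]')
copied = a.astype('datetime64[ns]')  # a copy, though the dtype matches
same = np.asarray(a)                 # the original array, no copy

copied[0] = np.datetime64('1970-01-01', 'ns')
print a[0]  # unchanged: the write went to the copy
same[0] = np.datetime64('1970-01-01', 'ns')
print a[0]  # now 1970-01-01: asarray aliased the original data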
@shoyer closed this May 20, 2014
keewis pushed a commit to keewis/xarray that referenced this pull request Jan 17, 2024
Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 2 to 3.
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](actions/upload-artifact@v2...v3)

---
updated-dependencies:
- dependency-name: actions/upload-artifact
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>