Only copy datetime64 data if it is using non-nanosecond precision. #125

Closed
akleeman wants to merge 17 commits

Conversation

@akleeman (Contributor)

In an attempt to coerce all datetime arrays to nanosecond resolution, utils.as_safe_array() was creating copies of any datetime64 array (via the astype method). This was causing unexpected behavior (bugs) for operations such as concatenation over times (see below).

import xray
import pandas as pd

ds = xray.Dataset()
ds['time'] = ('time', pd.date_range('2011-09-01', '2011-09-11'))
times = [ds.indexed(time=[i]) for i in range(10)]
ret = xray.Dataset.concat(times, 'time')
print ret['time']

<xray.DataArray 'time' (time: 10)>
array(['1970-01-02T07:04:40.718526408-0800',
       '1969-12-31T16:00:00.099966608-0800',
       '1969-12-31T16:00:00.041748384-0800',
       '1969-12-31T16:00:00.041748360-0800',
       '1969-12-31T16:00:00.041748336-0800',
       '1969-12-31T16:00:00.041748312-0800',
       '1969-12-31T16:00:00.041748288-0800',
       '1969-12-31T16:00:00.041748264-0800',
       '1969-12-31T16:00:00.041748240-0800',
       '1969-12-31T16:00:00.041748216-0800'], dtype='datetime64[ns]')
Attributes:
    Empty

@shoyer added the bug label May 12, 2014
@shoyer (Member) commented May 12, 2014

Wow, that is nasty! As you can see, we currently have a lot of awkward hacks to work around numpy's semi-broken datetime64, and it looks like this fix broke some of them -- hence the failing Travis builds.

Maybe we need some special logic in Variable.concat instead? My guess is that the trouble here is related to all these datetime64 objects (plain np.datetime64 objects, not arrays with dtype='datetime64', because 0-dimensional datetime64 arrays are broken) being put into an array directly, which at some point implicitly converts them into integers.
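
A minimal numpy sketch of the implicit-integer failure mode described above (illustrative only, not xray's actual code path): once a datetime64 is reduced to its integer payload, reinterpreting that integer under the nanosecond unit lands just after the 1970 epoch, much like the values in the bug output.

import numpy as np

# A datetime64 silently reduced to its integer payload (here, a count
# of seconds since the epoch):
seconds = np.datetime64('2011-09-02T00:00:00', 's').astype('int64')

# Reinterpreting that integer as a count of *nanoseconds* yields an
# epoch-adjacent timestamp, like the garbage values in the bug report:
print seconds.astype('datetime64[ns]')  # ~1970-01-01T00:00:01.3...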

@akleeman (Contributor, Author)

Yeah, this gets tricky. I fixed part of the problem by reverting to np.asarray instead of as_array_or_item in NumpyArrayWrapper. But I'm not sure that's the full solution; as you mentioned, the problem runs much deeper, though I don't think pushing the datetime nastiness into higher-level functions (such as concat) is a great option.

Also, I've been hoping to make the way dates are handled slightly more consistent, since as it currently stands it's hard to know which data type dates are being stored as.

> from datetime import datetime
> import numpy as np
> import xray
> d = datetime(2000, 1, 1)  # any plain Python datetime (value assumed for the example)
> d64 = np.datetime64(d)
> print xray.Variable(['time'], [d]).dtype
dtype('O')
> print xray.Variable(['time'], [d64]).dtype
dtype('<M8[ns]')
> print d64.dtype
dtype('<M8[us]')

I'm going to attempt getting utils.as_safe_array to convert from datetime objects to datetime64 objects, which should make things a little clearer but still doesn't solve the whole problem. A sketch of the idea follows.
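
A rough sketch of the coercion being proposed; the helper name and the use of pandas here are assumptions for illustration, not the actual patch:

import numpy as np
import pandas as pd

def _to_datetime64_ns(values):
    # Hypothetical helper (assumes 1-d input): coerce datetime.datetime
    # objects and non-nanosecond datetime64 arrays to a uniform
    # datetime64[ns] dtype, following the pandas convention discussed
    # in this thread.
    values = np.asarray(values)
    if (values.dtype.kind == 'O' or
            (values.dtype.kind == 'M' and
             values.dtype != np.dtype('<M8[ns]'))):
        values = pd.to_datetime(values).values  # always datetime64[ns]
    return values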

@shoyer (Member) commented May 13, 2014

Indeed, it would be nice to make that consistent!

I was making some effort to not automatically convert python or netCDF4 datetime objects into numpy.datetime64, for the hypothetical situation where people care about dates before 1677 or aren't using standard calendars (but this will change with #126). But if that's too complicated, feel free to insist on the pandas approach of converting everything to nanosecond datetime64.
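
For context on the 1677 cutoff: datetime64[ns] stores an int64 count of nanoseconds since the epoch, which bounds the representable range. A quick check, assuming pandas exposes these bounds (as current versions do via Timestamp.min/max):

import pandas as pd

# int64 nanoseconds since 1970 only span roughly 1677-09-21 through
# 2262-04-11; dates outside that window must stay as object arrays of
# datetime/netCDF4 objects.
print pd.Timestamp.min  # ~1677-09-21
print pd.Timestamp.max  # ~2262-04-11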

@shoyer (Member) commented May 16, 2014

Let me know how this is coming along!

I'd like to release v0.1.1 within the next few days, since #129 means that ReadTheDocs wasn't able to build versioned docs for v0.1, and we should have a static version of our docs built for the current release. At this point, adding and/or documenting new features means that the docs get out of sync with the latest release on pypi, which is obviously non-ideal. For example, the tutorial now mentions loading groups from NetCDF files even though that's not in v0.1.

I'm going to save #128 and any other highly visible changes for v0.2, but if you think you're close to a fix for this issue, I'd love to get it into 0.1.1. If not, it's no big deal to wait for 0.2, which I'm guessing will follow within a month or so.

Joe Hamman and others added 17 commits May 16, 2014 01:17
…e ability to fallback to netCDF4.datetime objects since this solution does not fix the problem for all calendars in all situations.
ReadTheDocs comes with pre-built packages for the basic scientific python
stack, but some of these packages are old (e.g., numpy is 1.7.1). The only way
to upgrade packages on readthedocs is to use a virtual environment and a
requirements.txt.

Unfortunately, this means we can't upgrade both numpy and pandas
simultaneously, because pandas may get built first and link against the wrong
version of numpy. We inadvertently stumbled upon a workaround to build the
"latest" docs by first installing numpy in the (cached) virtual environment,
and then later (in another commit), adding pandas to the requirements.txt
file.

However, this is a real hack and makes it impossible to maintain different
versions of the docs, such as for tagged releases. Accordingly, this commit
relaxes the numpy version requirement so we can use a version that readthedocs
already has installed. (We actually don't really need a newer version of numpy
for any current functionality in xray, although it's nice to have for support
for missing value functions like nanmean.)
Expanded warnings for when non numpy.datetime64 arrays are returned.
Use context manager for warnings in test_conventions.py.
An attempt to standardize time handling.

NOTE: virtual variable slicing is now broken, so do not submit.
@@ -68,12 +68,15 @@ def _as_compatible_data(data):
        data = data.values
    except AttributeError:
        pass

    if (data.dtype.kind == 'O' or
        (data.dtype.kind == 'M' and not data.dtype == '<M8[ns]')):
Member

Please indent by an extra tab to separate the condition from the following statement

Contributor Author

I'm not sure what you mean. PEP 8 presents several options here, one of which is what I have. That said, feel free to modify as you see fit.

Member

Fine :)

@shoyer (Member) commented May 16, 2014

Maybe best to open a new PR so it's clear what is new and what is just from the rebase?

@shoyer (Member) commented May 17, 2014

Did you end up adding your example (concatenating the time objects) as a regression test? That seems like a good idea to ensure that this stays fixed!

    self.assertEqual(type(x.values[0]), type(value))
    self.assertEqual(type(x[0].values), type(value))
    self.assertEqual(type(x.values[0]), type(expected))
    if not x.dtype.kind == 'M':
Member

I'm not so sure about making this exception... it leaves us returning non-datetime-datetimes like:

(Pdb) x[0].values
array(946684800000000000, dtype='datetime64[ns]')
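
The distinction being probed, sketched directly in numpy (the bare-integer display of a 0-d datetime64[ns] array is numpy-version-dependent):

import numpy as np

arr = np.array(['2000-01-01'], dtype='datetime64[ns]')
print type(arr[0])           # numpy.datetime64 -- a true scalar
zero_d = np.asarray(arr[0])  # a 0-d array, not a scalar
print zero_d.ndim            # 0
# On the numpy of this era, repr(zero_d) shows the raw integer payload,
# e.g. array(946684800000000000, dtype='datetime64[ns]')
print repr(zero_d)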

@shoyer (Member) commented May 17, 2014

I've been playing around with this but don't have a full fix yet. Here is my regression test for your bug:

    def test_index_and_concat_datetime64(self):
        # regression test for #125
        expected = Variable('t', pd.date_range('2011-09-01', periods=10))
        times = [expected[[i]] for i in range(10)]
        actual = Variable.concat(times, 't')
        self.assertArrayEqual(expected, actual)

shoyer added a commit to shoyer/xarray that referenced this pull request May 19, 2014
This is a more direct fix for the bug @akleeman reported in pydata#125.

Previously, it did not always work to modify Variable.values in-place (as done
in Variable.concat), because as_array_or_item could return a copy of an
ndarray instead of a view.

@akleeman I didn't incorporate your tests for the
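
A minimal sketch of the view-versus-copy distinction the commit message describes (illustrative, not the actual xray code): astype() copies by default even when the dtype already matches, so in-place writes to its result never reach the original array, while np.asarray() hands back the original.

import numpy as np

a = np.array(['2011-09-01', '2011-09-02'], dtype='datetime64[ns]')
copied = a.astype('datetime64[ns]')  # a copy, though the dtype matches
same = np.asarray(a)                 # the original array, no copy

copied[0] = np.datetime64('1970-01-01', 'ns')
print a[0]  # unchanged: the write went to the copy
same[0] = np.datetime64('1970-01-01', 'ns')
print a[0]  # now 1970-01-01: asarray aliased the original data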
@shoyer closed this May 20, 2014
keewis pushed a commit to keewis/xarray that referenced this pull request Jan 17, 2024
Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 2 to 3.
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](actions/upload-artifact@v2...v3)

---
updated-dependencies:
- dependency-name: actions/upload-artifact
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>