Problems parsing time variable using open_dataset #118

Closed
jhamman opened this issue May 8, 2014 · 4 comments · Fixed by #119
Comments

@jhamman
Member

jhamman commented May 8, 2014

I'm noticing a problem parsing the time variable, at least for the noleap calendar, even though the time dimension is properly formatted. Any thoughts on why this is?

ncdump -c -t sample_for_xray.nc 
netcdf sample_for_xray {
dimensions:
    time = UNLIMITED ; // (4 currently)
    y = 205 ;
    x = 275 ;
variables:
    double Wind(time, y, x) ;
        Wind:units = "m/s" ;
        Wind:long_name = "Wind speed" ;
        Wind:coordinates = "latitude longitude" ;
        Wind:dimensions = "2" ;
        Wind:type_preferred = "double" ;
        Wind:time_rep = "instantaneous" ;
        Wind:_FillValue = 9.96920996838687e+36 ;
    double time(time) ;
        time:calendar = "noleap" ;
        time:dimensions = "1" ;
        time:long_name = "time" ;
        time:type_preferred = "int" ;
        time:units = "days since 0001-01-01 0:0:0" ;

// global attributes:
        ...
data:

 time = "1979-09-16 12", "1979-10-17", "1979-11-16 12", "1979-12-17" ;
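For reference, here is a minimal stdlib sketch of what decoding these values involves under the `noleap` (365-day) calendar, assuming the units `days since 0001-01-01 0:0:0` shown above. The helpers `noleap_num2date` and `noleap_date2num` are hypothetical illustrations, not part of xray or netCDF4. The sketch assumes non-negative values with whole-hour precision.

```python
# Month lengths in the CF "noleap" (365_day) calendar: February always has 28 days.
NOLEAP_MONTH_DAYS = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
DAYS_PER_YEAR = 365  # equals sum(NOLEAP_MONTH_DAYS)


def noleap_num2date(days):
    """Decode 'days since 0001-01-01' into (year, month, day, hour).

    Hypothetical helper: assumes days >= 0 and whole-hour precision.
    """
    whole_days = int(days)
    hour = round((days - whole_days) * 24)
    # Every noleap year has exactly 365 days, so the year falls out of divmod.
    year, day_of_year = divmod(whole_days, DAYS_PER_YEAR)
    month = 1
    for length in NOLEAP_MONTH_DAYS:
        if day_of_year < length:
            break
        day_of_year -= length
        month += 1
    return (year + 1, month, day_of_year + 1, hour)


def noleap_date2num(year, month, day, hour=0):
    """Inverse of noleap_num2date (hypothetical helper)."""
    days = (year - 1) * DAYS_PER_YEAR + sum(NOLEAP_MONTH_DAYS[: month - 1]) + day - 1
    return days + hour / 24


# "1979-09-16 12" from the ncdump output encodes to 722228.5 days.
print(noleap_date2num(1979, 9, 16, 12))
```

Because every year has the same length, no leap-year rules are needed: this regularity is exactly what numpy's Gregorian-only datetime64 cannot represent.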
ds = xray.open_dataset('sample_for_xray.nc')
print ds['time']
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-46-65c280e7a283> in <module>()
      1 ds = xray.open_dataset('sample_for_xray.nc')
----> 2 print ds['time']

/home/jhamman/anaconda/lib/python2.7/site-packages/xray/common.pyc in __repr__(self)
     40 
     41     def __repr__(self):
---> 42         return array_repr(self)
     43 
     44     def _iter(self):

/home/jhamman/anaconda/lib/python2.7/site-packages/xray/common.pyc in array_repr(arr)
    122     summary = ['<xray.%s %s(%s)>'% (type(arr).__name__, name_str, dim_summary)]
    123     if arr.size < 1e5 or arr._in_memory():
--> 124         summary.append(repr(arr.values))
    125     else:
    126         summary.append('[%s values with dtype=%s]' % (arr.size, arr.dtype))

/home/jhamman/anaconda/lib/python2.7/site-packages/xray/data_array.pyc in values(self)
    147     def values(self):
    148         """The variables's data as a numpy.ndarray"""
--> 149         return self.variable.values
    150 
    151     @values.setter

/home/jhamman/anaconda/lib/python2.7/site-packages/xray/variable.pyc in values(self)
    217     def values(self):
    218         """The variable's data as a numpy.ndarray"""
--> 219         return utils.as_array_or_item(self._data_cached())
    220 
    221     @values.setter

/home/jhamman/anaconda/lib/python2.7/site-packages/xray/utils.pyc in as_array_or_item(values, dtype)
     56         # converted into an integer instead :(
     57         return values
---> 58     values = as_safe_array(values, dtype=dtype)
     59     if values.ndim == 0 and values.dtype.kind == 'O':
     60         # unpack 0d object arrays to be consistent with numpy

/home/jhamman/anaconda/lib/python2.7/site-packages/xray/utils.pyc in as_safe_array(values, dtype)
     40     """Like np.asarray, but convert all datetime64 arrays to ns precision
     41     """
---> 42     values = np.asarray(values, dtype=dtype)
     43     if values.dtype.kind == 'M':
     44         # np.datetime64

/home/jhamman/anaconda/lib/python2.7/site-packages/numpy/core/numeric.pyc in asarray(a, dtype, order)
    458 
    459     """
--> 460     return array(a, dtype, copy=False, order=order)
    461 
    462 def asanyarray(a, dtype=None, order=None):

/home/jhamman/anaconda/lib/python2.7/site-packages/xray/variable.pyc in __array__(self, dtype)
    121         if dtype is None:
    122             dtype = self.dtype
--> 123         return self.array.values.astype(dtype)
    124 
    125     def __getitem__(self, key):

TypeError: Cannot cast datetime.date object from metadata [D] to [ns] according to the rule 'same_kind'

This file is available here: ftp://ftp.hydro.washington.edu/pub/jhamman/sample_for_xray.nc

@shoyer
Member

shoyer commented May 8, 2014

Ouch! Thanks for filing the report and providing the sample file -- I will take a look.

For now, turn off automatic date decoding by calling xray.open_dataset('sample_for_xray.nc', decode_cf=False).

I'm guessing that part of the trouble is that numpy and pandas provide poor support for alternative calendars (and honestly, I haven't tested them very much). I attempted to fall back on making arrays of Python datetime objects, but in this case it looks like that didn't work: somehow things got converted into a native numpy datetime64 array anyway.
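A minimal sketch of the fallback described above, assuming numpy: for non-standard calendars, build the result explicitly as an object-dtype array so numpy keeps the Python objects instead of converting them to datetime64. The helper `decode_to_array` and the `STANDARD_CALENDARS` set are hypothetical and are not xray's actual code.

```python
import datetime

import numpy as np

# Calendars this sketch treats as compatible with numpy's datetime64
# (an assumption; CF "standard" differs from proleptic Gregorian before 1582).
STANDARD_CALENDARS = {"standard", "gregorian", "proleptic_gregorian"}


def decode_to_array(dates, calendar):
    """Return datetime64[ns] for standard calendars, object dtype otherwise."""
    if calendar in STANDARD_CALENDARS:
        return np.array(dates, dtype="datetime64[ns]")
    # Preallocate an object array and fill it element by element, so numpy
    # stores each Python object as-is rather than inferring a new dtype.
    out = np.empty(len(dates), dtype=object)
    for i, d in enumerate(dates):
        out[i] = d
    return out


sample = [datetime.datetime(1979, 9, 16, 12), datetime.datetime(1979, 10, 17)]
print(decode_to_array(sample, "noleap").dtype)  # object
```

The key design point is that the object-dtype path must never pass through a datetime64 cast such as `astype("datetime64[ns]")`.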

@jhamman
Member Author

jhamman commented May 8, 2014

Thanks, the decode_cf keyword should get me around the problem for now.

I've made a habit of immediately converting netCDF4.datetime objects to true datetime.datetime objects, since netCDF4 only returns real datetime objects for the Gregorian calendars.

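import datetime

import netCDF4

f = netCDF4.Dataset('sample_for_xray.nc')
# Decode with the file's own units and calendar ("noleap" here); for
# non-Gregorian calendars this yields netCDF4.datetime objects.
decoded_times = netCDF4.num2date(f.variables['time'][:],
                                 f.variables['time'].units,
                                 f.variables['time'].calendar)
# Replace each one with a real datetime.datetime built from (Y, M, D, h, m, s).
for i, t in enumerate(decoded_times):
    decoded_times[i] = datetime.datetime(*t.timetuple()[:6])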

The important thing to remember when doing this is that you have to be very careful about how you calculate timedeltas between these dates, since the converted datetime.datetime objects assume the Gregorian calendar. I usually just keep an ordinal-based time array around for that reason.
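To make the timedelta caveat concrete, here is a minimal stdlib sketch. The hypothetical `noleap_ordinal` helper stands in for the ordinal-based time array. Across a Gregorian leap day, differences between converted `datetime.datetime` objects disagree with the noleap calendar.

```python
import datetime

# Non-leap month lengths used by the CF "noleap" calendar.
NOLEAP_MONTH_DAYS = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)


def noleap_ordinal(year, month, day):
    """Days since 0001-01-01 in the noleap (365-day) calendar (hypothetical)."""
    return (year - 1) * 365 + sum(NOLEAP_MONTH_DAYS[: month - 1]) + day - 1


# 2000 is a Gregorian leap year, but the noleap calendar has no 29 February,
# so these two dates are consecutive days in noleap.
start, end = (2000, 2, 28), (2000, 3, 1)

noleap_delta = noleap_ordinal(*end) - noleap_ordinal(*start)
gregorian_delta = (datetime.datetime(*end) - datetime.datetime(*start)).days

print(noleap_delta, gregorian_delta)  # 1 2
```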

shoyer added a commit that referenced this issue May 9, 2014
These calendars now result in arrays with object dtype.

Should fix #118.
shoyer reopened this May 9, 2014
@shoyer
Member

shoyer commented May 9, 2014

OK, I just merged a fix into master.

Unfortunately, it's not terribly useful to have arrays decoded as netCDF4.datetime objects, because they can't be used with label-based indexing: they are not hashable (see Unidata/netcdf4-python#255).

Just out of curiosity, why do you usually convert netCDF4.datetime objects into real datetime objects? I'm guessing it's not because you want objects you can put in a dictionary?

If there is a better type than netCDF4.datetime to use for decoded dates, I'm definitely willing to consider it. As I said before, we don't make much use of non-standard dates.
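Label-based indexing relies on hashing labels, for example into a dict from label to position. As an illustration only, here is a minimal sketch of a hypothetical hashable date type for non-standard calendars. `NoLeapDate` is not a netCDF4 or xray class, and it does no calendar validation.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class NoLeapDate:
    """Hypothetical immutable date label for the noleap calendar.

    frozen=True (with the default eq=True) makes dataclasses generate
    __hash__ from the fields; order=True adds comparisons for sorting.
    """
    year: int
    month: int
    day: int
    hour: int = 0


times = [NoLeapDate(1979, 9, 16, 12), NoLeapDate(1979, 10, 17)]

# A label-based lookup table: each hashable label maps to its position.
index = {t: i for i, t in enumerate(times)}
print(index[NoLeapDate(1979, 10, 17)])  # 1
```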

It's also certainly possible (in principle) to keep around another array with the original, encoded dates. Right now all the decoding according to CF conventions is done in one large function with no options, but I would love for it to be more flexible and modular.

shoyer added the bug label May 9, 2014
@jhamman
Member Author

jhamman commented May 11, 2014

@shoyer - my experience is that the dummy netCDF4.datetime objects don't play nicely with setting up a pandas time index, so an intermediate conversion step is necessary. I haven't looked into exactly why this is.

I just tried the new decoding and it seems to work.
