ERR: HDF5 serialization of datelike-object dtypes should raise #8887

Open
cowpig opened this issue Nov 24, 2014 · 4 comments
Labels
Datetime (Datetime data dtype), Enhancement, Error Reporting (Incorrect or improved errors from pandas), IO HDF5 (read_hdf, HDFStore)

Comments

@cowpig

cowpig commented Nov 24, 2014

UPDATE:

In [157]: problem_date = old.dob.loc[4231354]

In [158]: problem_date
Out[158]: datetime.date(2939, 6, 2)

In [159]: test_series = pd.Series([problem_date])

In [160]: pd.to_datetime(test_series)
Out[160]: 
0    2939-06-02
dtype: object

This seems to be the source of the problem. I think there may be other dates in my dataset that are breaking the to_datetime method.

UPDATE 2:

It seems that maybe it's that the date is later than 2900 that's causing the problem?

In [194]: pd.to_datetime(old.dob[8230866])                
Out[194]: datetime.date(2955, 8, 22)

In [195]: another_bad_date = old.dob.loc[8230866]         

In [196]: pd.to_datetime(pd.Series([another_bad_date]))
Out[196]: 
0    2955-08-22
dtype: object

original issue:

The column in question came from a read_sql query, and the column has datetimes. It consists solely of pandas datetime objects and NoneType objects. I have iterated over the Series to be sure. The column has 11 million rows.

I've tried casting with to_datetime (and the dtype remains object--shouldn't the dtype change after that call?), to no avail.

Here's some stuff I get from poking around after sticking an import pdb; pdb.set_trace() into line 3329 of pytables.py (after except (NotImplementedError, ValueError, TypeError) as e:):

(Pdb) b

(Pdb) i

3

(Pdb) blocks[3]

ObjectBlock: [1, 2, 3, 4, 9, 12, 13, 14], 8 x 8255524, dtype: object

(Pdb) blk_items[3]

Index([u'dob', u'City', u'Region', u'Zip', u'lang', u'UnsubscribedDate', u'BadAddressDate', u'ISP'], dtype='object')

(Pdb) existing_col

(Pdb) col

name->values_block_3,cname->values_block_3,dtype->None,shape->None

(Pdb) b

(Pdb) type(b)

<class 'pandas.core.internals.ObjectBlock'>

(Pdb) block_items

*** NameError: name 'block_items' is not defined

(Pdb) b_items

Index([u'dob', u'City', u'Region', u'Zip', u'lang', u'UnsubscribedDate', u'BadAddressDate', u'ISP'], dtype='object')

(Pdb) existing_col

(Pdb) e

TypeError('Cannot serialize the column [dob] because\nits data contents are [mixed] object dtype',)

(Pdb) type(col)

<class 'pandas.io.pytables.DataCol'>

(Pdb) lib

<module 'pandas.lib' from '/home/mmccrea/anaconda/lib/python2.7/site-packages/pandas/lib.so'>

My debugging kind of hits a wall here, because infer_dtype seems to be throwing the error, and it lives in lib.so, which is a compiled binary; I'm not sure how to look into that to figure out what's going on. I would love a suggestion about how to deal with that in the future, in addition to some answers about what's going on in this case.
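For readers who hit the same wall: a rough way to see what the type inference reports, without stepping into the compiled module, is to call it directly on a small sample. The sketch below assumes a later pandas release that exposes the function publicly as `pandas.api.types.infer_dtype` with a `skipna` argument (the thread itself used `pandas.lib`); the sample Series are made up for illustration and are not the data from this issue.

```python
import datetime

import pandas as pd
from pandas.api.types import infer_dtype

# Hypothetical samples standing in for object-dtype columns.
dates_only = pd.Series([datetime.date(2939, 6, 2), None], dtype=object)
dates_and_text = pd.Series([datetime.date(2939, 6, 2), "Berlin"], dtype=object)

# infer_dtype returns a short string label describing the contents.
kind_dates = infer_dtype(dates_only, skipna=True)
kind_mixed = infer_dtype(dates_and_text, skipna=True)

print(kind_dates)
print(kind_mixed)
```

Running this on a slice of the offending column shows which label the inference assigns, which is usually enough to tell whether the column holds dates, strings, or a mixture.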

@cowpig cowpig changed the title HDF5Store: TypeError: Cannot serialize the column [dob] because TypeError: Cannot serialize the column [bid] because its data contents are [mixed] object dtype HDF5Store: TypeError: Cannot serialize the column [dob] because its data contents are [mixed] object dtype Nov 24, 2014
@cowpig cowpig changed the title HDF5Store: TypeError: Cannot serialize the column [dob] because its data contents are [mixed] object dtype to_datetime fails on a specific date Nov 24, 2014
@jreback
Contributor

jreback commented Nov 24, 2014

see here: http://pandas.pydata.org/pandas-docs/stable/gotchas.html#minimum-and-maximum-timestamps

these are out of range of the high performance datetime impl, so these revert to object dtypes.
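As a rough illustration of why those dates fall out of range: the linked gotchas page describes timestamps as 64-bit nanosecond counts, so the representable span is about 292 years on either side of 1970. The standard-library sketch below recomputes that span; `fits_ns_range` is a hypothetical helper name, and the bounds are truncated to microseconds because `datetime` cannot hold nanoseconds.

```python
import datetime

EPOCH = datetime.datetime(1970, 1, 1)
INT64_MAX_NS = 2**63 - 1

# datetime only resolves microseconds, so drop the sub-microsecond part.
SPAN = datetime.timedelta(microseconds=INT64_MAX_NS // 1000)
LOWER = EPOCH - SPAN
UPPER = EPOCH + SPAN


def fits_ns_range(d):
    """Return True if the date fits in a signed 64-bit nanosecond count."""
    moment = datetime.datetime(d.year, d.month, d.day)
    return LOWER <= moment <= UPPER


print(LOWER, UPPER)
print(fits_ns_range(datetime.date(1980, 1, 15)))
print(fits_ns_range(datetime.date(2939, 6, 2)))
```

Both dates reported in this issue (2939 and 2955) lie past the upper bound, which is consistent with them being left as plain objects.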

An alternative is to use Periods (though there is an open issue with storing these in HDF5; it's not difficult, it just needs a bit of work, see here).
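A minimal sketch of the Period approach, assuming daily frequency is what the `dob` column needs. `to_day_period` is a hypothetical helper, not part of pandas, and this does not address the HDF5 storage problem mentioned above.

```python
import datetime

import pandas as pd


def to_day_period(value):
    """Convert a datetime.date (or None) to a daily Period; None becomes NaT."""
    if value is None:
        return pd.NaT
    return pd.Period(year=value.year, month=value.month, day=value.day, freq="D")


p = to_day_period(datetime.date(2939, 6, 2))
print(p)
```

Periods count whole days rather than nanoseconds, so a year like 2939 is representable.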

So this should raise ATM in HDF5. These cannot be serialized in table format at all (Object block is restricted to actual strings). I think fixed format might work.

That said, if you would like to work on the Period repr, that would be great.

@jreback jreback changed the title to_datetime fails on a specific date ERR: HDF5 serialization of datelike-object dtypes should raise Nov 24, 2014
@jreback jreback added Dtype Conversions Unexpected or buggy dtype conversions IO HDF5 read_hdf, HDFStore Error Reporting Incorrect or improved errors from pandas labels Nov 24, 2014
@jreback jreback added this to the 0.16.0 milestone Nov 24, 2014
@rockg
Contributor

rockg commented Nov 25, 2014

Just curious, what are you doing that you need dates out to 2900?

@cowpig
Author

cowpig commented Nov 25, 2014

OK, so I'm thinking that the first problem is that to_datetime ignores errors by default, and I'll put in a pull request to fix that.
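To make the failure visible instead of silently getting an object column back, one option is to pass the `errors` argument explicitly. The sketch below uses `errors="coerce"` and then lists the rows that were present before conversion but missing afterwards. The sample data is invented, and which rows (if any) end up as NaT may depend on the pandas version and the datetime resolution it picks, so treat the output as version-dependent.

```python
import datetime

import pandas as pd

# Hypothetical stand-in for the dob column: valid date, missing value, bad date.
dob = pd.Series(
    [datetime.date(1980, 1, 15), None, datetime.date(2939, 6, 2)],
    dtype=object,
)

# With errors="coerce", values that cannot be represented become NaT
# rather than causing the whole column to stay as object dtype.
converted = pd.to_datetime(dob, errors="coerce")

# Rows that had a value on input but are missing after conversion.
lost = dob[dob.notna() & converted.isna()]

print(converted.dtype)
print(lost)
```

Inspecting `lost` before writing to HDF5 gives a list of the database errors to fix or drop.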

I might look closer at the Periods thing later this week.

@cowpig
Author

cowpig commented Nov 25, 2014

oh, and @rockg it's a database of time travelers
(jk they're database errors)

@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015
@mroeschke mroeschke added Enhancement Datetime Datetime data dtype and removed Dtype Conversions Unexpected or buggy dtype conversions labels May 16, 2020
@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022