
DataFrame.to_records() converts DatetimeIndex to datetime object, not np.datetime64 #18160

Closed
jzwinck opened this issue Nov 8, 2017 · 11 comments · Fixed by #18902
Labels: Bug · Dtype Conversions (Unexpected or buggy dtype conversions) · Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)
Milestone: 0.23.0

Comments

@jzwinck
Contributor

jzwinck commented Nov 8, 2017

This code:

import pandas as pd

df = pd.DataFrame([pd.to_datetime('2017-11-07')], [pd.to_datetime('2017-11-08')])
df.to_records()

Produces a totally different column type for the index vs the data:

rec.array([(datetime.datetime(2017, 11, 8, 0, 0), '2017-11-07T00:00:00.000000000')], 
          dtype=[('index', 'O'), ('0', '<M8[ns]')])

It should produce:

rec.array([('2017-11-08T00:00:00.000000000', '2017-11-07T00:00:00.000000000')], 
          dtype=[('index', '<M8[ns]'), ('0', '<M8[ns]')])

I'm using Pandas 0.20.3, NumPy 1.13.1, and Python 3.5.3.

@jreback
Contributor

jreback commented Nov 8, 2017

seems buggy
would appreciate a PR

@jreback jreback added the Bug, Difficulty Intermediate, Dtype Conversions, and Reshaping labels on Nov 8, 2017
@jreback jreback added this to the Next Major Release milestone Nov 8, 2017
@jzwinck
Contributor Author

jzwinck commented Nov 10, 2017

@jreback I just discovered this was broken by #1908 (commit 38333f4) which added the to_records(convert_datetime64) option with the horrible default of True!

It looks like the option was added in order to work around some now-ancient NumPy bug. Setting convert_datetime64=False now works fine (for my test case and for the one in #1908).

What do you think about changing the default from True to False? The current default loses precision and is inefficient.
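
For reference, a minimal sketch of that workaround on 0.20.x, assuming only the convert_datetime64 keyword of DataFrame.to_records() discussed above:

import pandas as pd

df = pd.DataFrame([pd.to_datetime('2017-11-07')], [pd.to_datetime('2017-11-08')])

# Current default (convert_datetime64=True): the DatetimeIndex is converted
# to datetime.datetime objects, so the 'index' field comes back with dtype 'O'.
df.to_records()

# Workaround: keep the index as np.datetime64 values, i.e. dtype '<M8[ns]'.
df.to_records(convert_datetime64=False)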

@jreback
Contributor

jreback commented Nov 10, 2017

i would be in favor of deprecating that option entirely

@jzwinck
Contributor Author

jzwinck commented Nov 10, 2017

Does that require a DeprecationWarning first, then later removal, or should we just remove the option at once?

@jreback
Contributor

jreback commented Nov 10, 2017

FutureWarning

@jreback jreback modified the milestones: Next Major Release, 0.23.0 Dec 23, 2017
@H0R5E
Contributor

H0R5E commented Aug 7, 2018

I actually get a bad conversion with to_records() now unless I set convert_datetime64=True. The docs do say "By default, timestamps are converted to datetime.datetime".

> conda info

     active environment : _dtocean_dev
    active env location : C:\Anaconda\envs\_dtocean_dev
            shell level : 1
       user config file : C:\Users\Work\.condarc
 populated config files : C:\Users\Work\.condarc
          conda version : 4.5.9
    conda-build version : 3.0.30
         python version : 2.7.14.final.0
       base environment : C:\Anaconda  (writable)
           channel URLs : https://repo.anaconda.com/pkgs/main/win-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/free/win-64
                          https://repo.anaconda.com/pkgs/free/noarch
                          https://repo.anaconda.com/pkgs/r/win-64
                          https://repo.anaconda.com/pkgs/r/noarch
                          https://repo.anaconda.com/pkgs/pro/win-64
                          https://repo.anaconda.com/pkgs/pro/noarch
                          https://repo.anaconda.com/pkgs/msys2/win-64
                          https://repo.anaconda.com/pkgs/msys2/noarch
                          https://conda.anaconda.org/conda-forge/win-64
                          https://conda.anaconda.org/conda-forge/noarch
                          https://conda.anaconda.org/dataonlygreater/win-64
                          https://conda.anaconda.org/dataonlygreater/noarch
                          https://conda.anaconda.org/t/<TOKEN>/topper/win-64
                          https://conda.anaconda.org/t/<TOKEN>/topper/noarch
          package cache : C:\Anaconda\pkgs
                          C:\Users\Work\AppData\Local\conda\conda\pkgs
       envs directories : C:\Anaconda\envs
                          C:\Users\Work\AppData\Local\conda\conda\envs
                          C:\Users\Work\.conda\envs
               platform : win-64
             user-agent : conda/4.5.9 requests/2.18.4 CPython/2.7.14 Windows/7 Windows/6.1.7601
          administrator : False
             netrc file : None
           offline mode : False
> conda list
...
numpy                     1.11.3           py27he0c0ee4_9
...
pandas                    0.23.3           py27h39f3610_0
...

@reidy-p
Contributor

reidy-p commented Aug 17, 2018

@H0R5E is this still an issue for you? If so, can you post a reproducible example and I will try to see if I can find the problem.

@H0R5E
Contributor

H0R5E commented Aug 18, 2018

@reidy-p, sure, here you go:

import numpy as np
import pandas as pd

rng = pd.date_range('1/1/2011', periods=2, freq='H')
data = {"a": np.random.randn(len(rng)),
        "b": np.random.randn(len(rng))}
df = pd.DataFrame(data, index=rng)

type(df.to_records()[0][0]) # returns numpy.datetime64, expecting datetime.datetime
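
To make the contrast explicit, a self-contained sketch of what is being reported, assuming pandas 0.23.x where convert_datetime64 still exists but is deprecated:

import numpy as np
import pandas as pd

rng = pd.date_range('1/1/2011', periods=2, freq='H')
df = pd.DataFrame({"a": np.random.randn(len(rng)),
                   "b": np.random.randn(len(rng))}, index=rng)

# New default: the index values come back as np.datetime64.
type(df.to_records()[0][0])                          # numpy.datetime64

# Passing the now-deprecated keyword restores the documented behaviour;
# per the deprecation above it is expected to emit a FutureWarning.
type(df.to_records(convert_datetime64=True)[0][0])   # datetime.datetime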

@H0R5E
Contributor

H0R5E commented Aug 18, 2018

My point is that you've made a significant change of behaviour without correcting your docs. The examples show the expected output to be datetime.datetime. There is also no test for which type it should be.

        By default, timestamps are converted to `datetime.datetime`:
        >>> df.index = pd.date_range('2018-01-01 09:00', periods=2, freq='min')
        >>> df
                             A     B
        2018-01-01 09:00:00  1  0.50
        2018-01-01 09:01:00  2  0.75
        >>> df.to_records()
        rec.array([(datetime.datetime(2018, 1, 1, 9, 0), 1, 0.5 ),
                   (datetime.datetime(2018, 1, 1, 9, 1), 2, 0.75)],
                  dtype=[('index', 'O'), ('A', '<i8'), ('B', '<f8')])

I'm surprised I'm the only one to have been caught out by this.
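
For comparison, a sketch of what that docstring example would show under the new default; the dtypes follow the expected output earlier in this thread, and the exact repr is illustrative:

        >>> df.to_records()
        rec.array([('2018-01-01T09:00:00.000000000', 1, 0.5 ),
                   ('2018-01-01T09:01:00.000000000', 2, 0.75)],
                  dtype=[('index', '<M8[ns]'), ('A', '<i8'), ('B', '<f8')])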

@TomAugspurger
Contributor

@H0R5E could you submit a PR fixing the docs? Thanks.

@TomAugspurger
Contributor

You can remove to_records from this list too:

-k"-assign -axes -combine -isin -itertuples -join -nlargest -nsmallest -nunique -pivot_table -quantile -query -reindex -reindex_axis -replace -round -set_index -stack -to_dict -to_records -to_stata -transform"
so that it's tested.
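
Presumably the adjusted filter is just the same expression with the -to_records entry dropped, i.e.:

-k"-assign -axes -combine -isin -itertuples -join -nlargest -nsmallest -nunique -pivot_table -quantile -query -reindex -reindex_axis -replace -round -set_index -stack -to_dict -to_stata -transform"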

H0R5E added a commit to H0R5E/dtocean-core that referenced this issue Mar 10, 2019
Note this was caused by a change which conflicts with the documentation. See:

pandas-dev/pandas#18160
H0R5E added a commit to DTOcean/dtocean-core that referenced this issue Mar 12, 2019
Note this was caused by a change which conflicts with the documentation. See:

pandas-dev/pandas#18160