Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API: indexing DatetimeIndex with date object #35830

Open
2 of 3 tasks
Macfly opened this issue Aug 20, 2020 · 13 comments
Open
2 of 3 tasks

API: indexing DatetimeIndex with date object #35830

Macfly opened this issue Aug 20, 2020 · 13 comments
Labels
Bug datetime.date stdlib datetime.date support Datetime Datetime data dtype Indexing Related to indexing on series/frames, not to indexes themselves Needs Discussion Requires discussion from core team before further action Regression Functionality that used to work in a prior pandas version

Comments

@Macfly
Copy link

Macfly commented Aug 20, 2020

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Note: Please read this guide detailing how to provide the necessary information for us to reproduce your bug.

Code Sample, a copy-pastable example

down load this file
bug.xlsx

or create an excel file with the follwoing:

col1 8/21/2020 12:00:00 AM
value 10.1

then run the following code

    import pandas as pd
    import datetime as dt

    df = pd.read_excel("bug.xlsx", index_col=0)
    toto = df[dt.date(2020, 8, 21)]

Problem description

I have this error on 1.1.0 and 1.1.1:

  File "C:/Users/snoof/PycharmProjects/palms_db_script/palms_db.py", line 58, in <module>
    toto = df[dt.date(2020, 8, 21)]
  File "C:\Users\snoof\PycharmProjects\palms_db_script\venv\lib\site-packages\pandas\core\frame.py", line 2899, in __getitem__
    indexer = self.columns.get_loc(key)
  File "C:\Users\snoof\PycharmProjects\palms_db_script\venv\lib\site-packages\pandas\core\indexes\datetimes.py", line 622, in get_loc
    raise KeyError(key)
KeyError: datetime.date(2020, 8, 21)

Expected Output

no error, with pandas 1.0.5, it finds the element.

Output of pd.show_versions()

INSTALLED VERSIONS

commit : f2ca0a2
python : 3.7.3.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19041
machine : AMD64
processor : Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None
pandas : 1.1.1
numpy : 1.19.1
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1.1
setuptools : 49.1.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : None
numba : Non

@Macfly Macfly added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Aug 20, 2020
@jbrockmendel
Copy link
Member

The issue is unrelated to Excel, just in how to index with a DatetimeIndex.

Minimal reproducer:

df = pd.DataFrame([[10.1]], columns=pd.DatetimeIndex(["8/21/2020 12:00:00 AM"]))
key = df.columns[0].date()

>>> df[date]
KeyError: datetime.date(2020, 8, 21)

In general, you should use datetime or Timestamp objects when indexing on a DatetimeIndex, as in general date objects are not supported (we are not as consistent about this as we'd like to be)

@jbrockmendel jbrockmendel added Indexing Related to indexing on series/frames, not to indexes themselves Datetime Datetime data dtype and removed Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Sep 2, 2020
@jbrockmendel jbrockmendel changed the title BUG: read_excel with index_col as as datetime API: indexing DatetimeIndex with date object Sep 2, 2020
@simonjayhawkins
Copy link
Member

In general, you should use datetime or Timestamp objects when indexing on a DatetimeIndex, as in general date objects are not supported (we are not as consistent about this as we'd like to be)

since this was working in 1.0.5 will mark as regression

>>> pd.__version__
'1.0.5'
>>> df = pd.DataFrame([[10.1]], columns=pd.DatetimeIndex(["8/21/2020 12:00:00 AM"]))
>>> key = df.columns[0].date()
>>>
>>> df[key]
0    10.1
Name: 2020-08-21 00:00:00, dtype: float64
>>>

@simonjayhawkins simonjayhawkins added the Regression Functionality that used to work in a prior pandas version label Sep 3, 2020
@simonjayhawkins simonjayhawkins added this to the 1.1.2 milestone Sep 3, 2020
@simonjayhawkins simonjayhawkins modified the milestones: 1.1.2, 1.1.3 Sep 7, 2020
@simonjayhawkins
Copy link
Member

moved off 1.1.2 milestone (scheduled for this week) as no PRs to fix in the pipeline

simonjayhawkins added a commit to simonjayhawkins/pandas that referenced this issue Sep 7, 2020
@simonjayhawkins
Copy link
Member

first bad commit: [9b0ef5d] refactor DTI.get_loc (#31023)

https://github.com/simonjayhawkins/pandas/runs/1082023137?check_suite_focus=true

see also #35466

@simonjayhawkins
Copy link
Member

moved off 1.1.3 milestone (overdue) as no PRs to fix in the pipeline

@seth-p
Copy link
Contributor

seth-p commented Oct 7, 2020

Any chance of this getting fixed?

@TomAugspurger
Copy link
Contributor

#35478 is perhaps addressing it, though it's not clear that the current behavior is incorrect. IMO, we __getitem__ should follow the semantics of dict.__getitem__, where date objects aren't treated as datetime objects at midnight.

This is complicated a bit by partial string indexing, but I still think there's reason to be stricter here, given the ambiguity of:

In [6]: s = pd.Series([1, 2], index=pd.DatetimeIndex(['2000-01-01 00:00:00', '2000-01-01 00:00:01']))

In [7]: s[datetime.date(2000, 1, 1)]

Does that behave like a datetime at midnight, selecting just the first row? Or is it similar to partial string indexing, selecting all rows on that day regardless of the time?

@seth-p
Copy link
Contributor

seth-p commented Oct 7, 2020

My thoughts:

  1. s[datetime.date(YYYY, MM, DD)] should behave the same as s['YYYY-MM-DD']. As for what that behavior should be when the DatetimeIndex contains non-midnight timestamps, I don't know.
  2. In my case (which I suspect is quite common) the DatetimeIndex consists solely of 'YYYY-MM-DD 00:00:00' timestamps, so the ambiguity doesn't arise.

By the way, note that s[datetime.date(YYYY, MM, DD):datetime.date(YYYY, MM, DD)] does work. So clearly there is already some (implicit?) logic for interpreting datetime.date objects in a DatetimeIndex. Interestingly, there appears to be an inconsistency betwee [11] and [12]:

In [9]: s = pd.Series([1, 2], index=pd.DatetimeIndex(['2000-01-01 00:00:00', '2000-01-01 00:00:01']))

In [10]: s['2000-01-01']
Out[10]: 
2000-01-01 00:00:00    1
2000-01-01 00:00:01    2
dtype: int64

In [11]: s['2000-01-01':'2000-01-01']
Out[11]: 
2000-01-01 00:00:00    1
2000-01-01 00:00:01    2
dtype: int64

In [12]: s[dt.date(2000, 1, 1):dt.date(2000, 1, 1)]
Out[12]: 
2000-01-01    1
dtype: int64

@jbrockmendel
Copy link
Member

By the way, note that s[datetime.date(YYYY, MM, DD):datetime.date(YYYY, MM, DD)] does work.

This is incorrect as is, should be deprecated.

@simonjayhawkins simonjayhawkins modified the milestones: 1.1.4, 1.1.5 Oct 29, 2020
@simonjayhawkins
Copy link
Member

moved off 1.1.4 milestone (scheduled for release tomorrow) as no PRs to fix in the pipeline

@jreback jreback modified the milestones: 1.1.5, Contributions Welcome Nov 25, 2020
@davidia
Copy link

davidia commented Jan 18, 2021

Hello, Indexing DateTime index by date as above has worked forever and we have hundreds of loc that depend on this behaviour. Would you be open to a PR that promotes date to datetime in get_loc?

        """
        Get integer location for requested label

        Returns
        -------
        loc : int
        """
        if not is_scalar(key):
            raise InvalidIndexError(key)

        orig_key = key
        if is_valid_nat_for_dtype(key, self.dtype):
            key = NaT

        if isinstance(key, date):
            # GH35830
            key = datetime.combine(key, time.min)

        if isinstance(key, self._data._recognized_scalars):
            # needed to localize naive datetimes
            self._deprecate_mismatched_indexing(key)
            key = self._maybe_cast_for_get_loc(key)

@davidia
Copy link

davidia commented Jan 24, 2021

@TomAugspurger that's a fair point but the problem here is that this behaviour has been around for ever and is likely relied upon by many people using daily data, a very common use-case. I don't unintentionally breaking this without warning is in the spirit of the contribution guidelines.

@TomAugspurger
Copy link
Contributor

IIRC we were inconsistent earlier. @jbrockmendel would know best.

@jbrockmendel jbrockmendel added the datetime.date stdlib datetime.date support label Jun 5, 2021
@mroeschke mroeschke added Bug Needs Discussion Requires discussion from core team before further action labels Aug 10, 2021
@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug datetime.date stdlib datetime.date support Datetime Datetime data dtype Indexing Related to indexing on series/frames, not to indexes themselves Needs Discussion Requires discussion from core team before further action Regression Functionality that used to work in a prior pandas version
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants