BUG: quantile sometimes using interpolation at endpoint with datetime data #49110

Open
@jrbourbeau

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
import numpy as np

s = pd.Series(
    [
        pd.Timestamp("2019-09-05 17:16:30.155681"), # us resolution
        pd.Timestamp("2019-09-05 17:16:33.155681"),
    ]
)

for interpolation in ["nearest", "lower", "higher", "linear"]:
    print(f"\n{interpolation = }")
    pd_result = s.quantile(q=0, interpolation=interpolation)
    np_result = np.percentile(s, q=0, interpolation=interpolation)
    print(f"{str(s.min()) = }")
    print(f"pandas result: {pd_result}")
    print(f"numpy result: {np_result}")
    assert pd_result == np_result == s.min()

Issue Description

When calculating the q=0 quantile on datetime data, Series.quantile sometimes does not return the min() value when interpolation="linear". This differs from NumPy's behavior for q=0, which always returns the minimum value found in the Series / array. It also differs from what I would naively expect from reading the interpolation= parameter description in the quantile docstring:

This optional parameter specifies the interpolation method to use, when the desired quantile lies between two data points i and j

as q=0 is a special quantile that doesn't lie between two data points.

I should also note that when using other interpolation methods (e.g. "nearest", "lower", "higher"), Series.quantile does return the min() value. The discrepancy observed with interpolation="linear" also doesn't always happen on datetime data. For example, the snippet above uses us-resolution timestamps; if we instead use ns-resolution timestamps:

s = pd.Series(
    [
        pd.Timestamp("1970-01-19 03:28:23.790155681"), # ns resolution
        pd.Timestamp("1970-01-19 03:28:23.793155681"),
    ]
)

Then interpolation="linear" does return the min() value and is consistent with NumPy.
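
A possible explanation for why the two examples behave differently is that the linear path goes through float64 arithmetic at some point. This is an assumption and has not been verified against the pandas implementation. The sketch below uses only the standard library and checks whether each example's minimum, expressed as int64 nanoseconds since the epoch, survives a round trip through float64. The constant names and the helper `survives_float64_round_trip` are hypothetical and exist only for this illustration.

```python
# Minimum timestamps from the two examples, as nanoseconds since 1970-01-01.
US_EXAMPLE_MIN_NS = 1_567_703_790_155_681_000  # 2019-09-05 17:16:30.155681
NS_EXAMPLE_MIN_NS = 1_567_703_790_155_681      # 1970-01-19 03:28:23.790155681

# float64 has a 53-bit significand, so not every integer above 2**53
# can be represented exactly.
FLOAT64_EXACT_INT_LIMIT = 2**53


def survives_float64_round_trip(value: int) -> bool:
    """Return True if value is unchanged after int -> float64 -> int."""
    return int(float(value)) == value


for label, value in [
    ("us example", US_EXAMPLE_MIN_NS),
    ("ns example", NS_EXAMPLE_MIN_NS),
]:
    print(
        f"{label}: above 2**53 = {value > FLOAT64_EXACT_INT_LIMIT}, "
        f"exact round trip = {survives_float64_round_trip(value)}"
    )
```

The first value is above 2**53 and does not survive the round trip, while the second is below 2**53 and does. That matches the pattern reported here, but it only shows the pattern is consistent with float64 rounding, not that this is the actual cause.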

Expected Behavior

For q=0, I expect Series.quantile to always return the min() value, regardless of the interpolation method being used or the resolution of the datetime data in the series.
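
To make the expectation concrete, here is a minimal reference sketch of a linear quantile over sorted integer values that returns an existing data point unchanged whenever the requested quantile lands exactly on one, which includes q=0 and q=1. The function `linear_quantile` is a hypothetical helper written for this illustration; it is not how pandas or NumPy implement quantiles.

```python
import math


def linear_quantile(sorted_values: list[int], q: float) -> int:
    """Linear-interpolated quantile of pre-sorted ints (illustrative only)."""
    if not sorted_values:
        raise ValueError("sorted_values must not be empty")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")

    # Fractional index of the requested quantile within the sorted data.
    position = q * (len(sorted_values) - 1)
    lower = math.floor(position)
    fraction = position - lower

    # The quantile sits exactly on a data point (always true for q=0 and
    # q=1), so return that point without doing any arithmetic on it.
    if fraction == 0.0:
        return sorted_values[lower]

    # Otherwise interpolate on the (small) gap between the two neighbors
    # and add the result back to the lower neighbor as an integer.
    gap = sorted_values[lower + 1] - sorted_values[lower]
    return sorted_values[lower] + round(gap * fraction)


# The two timestamps from the first example, as nanoseconds since the epoch.
values = [1_567_703_790_155_681_000, 1_567_703_793_155_681_000]
assert linear_quantile(values, 0.0) == min(values)
assert linear_quantile(values, 1.0) == max(values)
```

Interpolating on the gap rather than on the absolute values keeps the floating-point part of the computation small, which is one way to avoid disturbing the endpoints.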

Installed Versions

------------------
commit           : 87cfe4e38bafe7300a6003a1d18bd80f3f77c763
python           : 3.10.4.final.0
python-bits      : 64
OS               : Darwin
OS-release       : 21.6.0
Version          : Darwin Kernel Version 21.6.0: Mon Aug 22 20:17:10 PDT 2022; root:xnu-8020.140.49~2/RELEASE_X86_64
machine          : x86_64
processor        : i386
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.5.0
numpy            : 1.21.6
pytz             : 2022.1
dateutil         : 2.8.2
setuptools       : 59.8.0
pip              : 22.0.4
Cython           : None
pytest           : 7.1.3
hypothesis       : None
sphinx           : 4.5.0
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : 1.1
pymysql          : None
psycopg2         : None
jinja2           : 3.1.2
IPython          : 8.2.0
pandas_datareader: None
bs4              : 4.11.1
bottleneck       : None
brotli           :
fastparquet      : 0.8.2
fsspec           : 2022.8.2
gcsfs            : None
matplotlib       : 3.5.1
numba            : 0.55.1
numexpr          : 2.8.0
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : 10.0.0.dev271
pyreadstat       : None
pyxlsb           : None
s3fs             : 0.6.0
scipy            : 1.8.1
snappy           :
sqlalchemy       : 1.4.35
tables           : 3.7.0
tabulate         : None
xarray           : 2022.3.0
xlrd             : None
xlwt             : None
zstandard        : None
tzdata           : None
