Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
import numpy as np

s = pd.Series(
    [
        pd.Timestamp("2019-09-05 17:16:30.155681"),  # us resolution
        pd.Timestamp("2019-09-05 17:16:33.155681"),
    ]
)

for interpolation in ["nearest", "lower", "higher", "linear"]:
    print(f"\n{interpolation = }")
    pd_result = s.quantile(q=0, interpolation=interpolation)
    np_result = np.percentile(s, q=0, interpolation=interpolation)
    print(f"{str(s.min()) = }")
    print(f"pandas result: {pd_result}")
    print(f"numpy result: {np_result}")
    assert pd_result == np_result == s.min()
Issue Description
When calculating the q=0 quantile on datetime data, Series.quantile will sometimes not return the min() value when interpolation="linear". This differs from NumPy's behavior for q=0, which always returns the minimum value found in the Series / array. This also differs from what I would naively expect from reading the interpolation= parameter description in the quantile docstring:
"This optional parameter specifies the interpolation method to use, when the desired quantile lies between two data points i and j"
as q=0 is a special quantile that doesn't lie between two data points.
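To illustrate why q=0 should not need any interpolation, here is a minimal sketch of a textbook linear-interpolation quantile on plain integers. The helper linear_quantile is hypothetical and written only for this illustration; it is not the pandas or NumPy implementation. The values are the two timestamps from the example above, expressed as integer microseconds since the epoch.

```python
import math


def linear_quantile(sorted_values, q):
    """Linear-interpolation quantile of a sorted, non-empty sequence.

    Hypothetical helper for illustration only; not how pandas or NumPy
    compute quantiles internally.
    """
    # Fractional position of the requested quantile within the sorted data.
    position = q * (len(sorted_values) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    if fraction == 0:
        # The quantile lands exactly on a data point: nothing to interpolate.
        return sorted_values[lower]
    low_value = sorted_values[lower]
    high_value = sorted_values[upper]
    return low_value + (high_value - low_value) * fraction


# The two timestamps from the report, as integer microseconds since the epoch.
values_us = [1567703790155681, 1567703793155681]

q0 = linear_quantile(values_us, 0)    # position 0.0, so the minimum itself
q1 = linear_quantile(values_us, 1)    # position 1.0, so the maximum itself
mid = linear_quantile([0, 10], 0.5)   # position 0.5, halfway between 0 and 10
```

For q=0 the position is exactly index 0, so the minimum is returned unchanged and no arithmetic is done on the values, whichever interpolation method is requested.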
I should also note that when using other interpolation methods (e.g. "nearest", "lower", "higher"), Series.quantile does return the min() value, and that the discrepancy observed when interpolation="linear" doesn't always happen on datetime data. For example, the snippet above uses us-resolution timestamps; however, if we instead use ns-resolution timestamps
s = pd.Series(
    [
        pd.Timestamp("1970-01-19 03:28:23.790155681"),  # ns resolution
        pd.Timestamp("1970-01-19 03:28:23.793155681"),
    ]
)
Then interpolation="linear" does return the min() value and is consistent with NumPy.
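One possible explanation, which is an assumption on my part and not something confirmed above, is floating-point precision: if the linear path converts the underlying int64 nanosecond values to float64 at some point, integers above 2**53 can no longer be represented exactly. The sketch below uses only the standard library to check whether the first timestamp of each example survives an int -> float64 -> int round trip. The helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_microseconds(dt):
    """Exact integer microseconds since the Unix epoch."""
    return (dt - EPOCH) // timedelta(microseconds=1)


def survives_float64(n):
    """True if integer n is unchanged by a round trip through float64."""
    return int(float(n)) == n


# First example: 2019-09-05 17:16:30.155681, as nanoseconds since the epoch.
first_ns = 1000 * epoch_microseconds(
    datetime(2019, 9, 5, 17, 16, 30, 155681, tzinfo=timezone.utc)
)

# Second example: 1970-01-19 03:28:23.790155681, as nanoseconds since the epoch.
# datetime only stores microseconds, so the trailing 681 ns are added by hand.
second_ns = 681 + 1000 * epoch_microseconds(
    datetime(1970, 1, 19, 3, 28, 23, 790155, tzinfo=timezone.utc)
)

first_is_exact = survives_float64(first_ns)    # False: value is above 2**53
second_is_exact = survives_float64(second_ns)  # True: value is below 2**53
```

Under that assumption, the difference between the two examples would come from the magnitude of the nanosecond values, not from the resolution at which the timestamps were written.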
Expected Behavior
For q=0, I expect Series.quantile to always return the min() value, regardless of the interpolation method being used or the resolution of the datetime data in the series.
Installed Versions
------------------
commit : 87cfe4e38bafe7300a6003a1d18bd80f3f77c763
python : 3.10.4.final.0
python-bits : 64
OS : Darwin
OS-release : 21.6.0
Version : Darwin Kernel Version 21.6.0: Mon Aug 22 20:17:10 PDT 2022; root:xnu-8020.140.49~2/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.5.0
numpy : 1.21.6
pytz : 2022.1
dateutil : 2.8.2
setuptools : 59.8.0
pip : 22.0.4
Cython : None
pytest : 7.1.3
hypothesis : None
sphinx : 4.5.0
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.2.0
pandas_datareader: None
bs4 : 4.11.1
bottleneck : None
brotli :
fastparquet : 0.8.2
fsspec : 2022.8.2
gcsfs : None
matplotlib : 3.5.1
numba : 0.55.1
numexpr : 2.8.0
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 10.0.0.dev271
pyreadstat : None
pyxlsb : None
s3fs : 0.6.0
scipy : 1.8.1
snappy :
sqlalchemy : 1.4.35
tables : 3.7.0
tabulate : None
xarray : 2022.3.0
xlrd : None
xlwt : None
zstandard : None
tzdata : None