Resampling consistency issue/question #33171

@MMCMA

Description

Code Sample, a copy-pastable example if possible

import pandas as pd

periods = 7
start = '20100131'
dates = pd.date_range(start=start, freq='M', periods=periods)
df = pd.DataFrame(range(periods), index=dates, columns=['a'])

print('Original: freq="M"')
print(df)
print('\n-> "b" -> "BM"')
print(df.resample('b').last().resample('BM').last())
print('\n-> "BM"')
print(df.resample('BM').last())

print('\n"M"-date_range:')
print(pd.date_range(start=start, freq='M', periods=periods))
print('\n"b"-date_range:')
print(pd.date_range(start=start, freq='b', end=dates[-1]))
print('\n"BM"-date_range:')
print(pd.date_range(start=start, freq='BM', end=dates[-1]))

Original: freq="M"
            a
2010-01-31  0
2010-02-28  1
2010-03-31  2
2010-04-30  3
2010-05-31  4
2010-06-30  5
2010-07-31  6

-> "b" -> "BM"
              a
2010-01-29  0.0
2010-02-26  1.0
2010-03-31  2.0
2010-04-30  3.0
2010-05-31  4.0
2010-06-30  5.0
2010-07-30  6.0

-> "BM"
              a
2010-02-26  0.0
2010-03-31  2.0
2010-04-30  3.0
2010-05-31  4.0
2010-06-30  5.0
2010-07-30  NaN
2010-08-31  6.0

"M"-date_range:
DatetimeIndex(['2010-01-31', '2010-02-28', '2010-03-31', '2010-04-30',
'2010-05-31', '2010-06-30', '2010-07-31'],
dtype='datetime64[ns]', freq='M')

"b"-date_range:
DatetimeIndex(['2010-02-01', '2010-02-02', '2010-02-03', '2010-02-04',
'2010-02-05', '2010-02-08', '2010-02-09', '2010-02-10',
'2010-02-11', '2010-02-12',
...
'2010-07-19', '2010-07-20', '2010-07-21', '2010-07-22',
'2010-07-23', '2010-07-26', '2010-07-27', '2010-07-28',
'2010-07-29', '2010-07-30'],
dtype='datetime64[ns]', length=130, freq='B')

"BM"-date_range:
DatetimeIndex(['2010-02-26', '2010-03-31', '2010-04-30', '2010-05-31',
'2010-06-30', '2010-07-30'],
dtype='datetime64[ns]', freq='BM')
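
The rows where the two resampling paths disagree line up with month-ends that fall on a weekend. The following is a small check, not part of the original report, that lists the weekday of each date in the "M" index. It uses `pd.offsets.MonthEnd()` instead of the `'M'` alias only so that it does not depend on which alias spelling the installed pandas version accepts; the names `summary` and `weekend_month_ends` are illustrative.

```python
import pandas as pd

# Same seven month-end dates as in the report, built from an offset object.
dates = pd.date_range(start="2010-01-31", periods=7, freq=pd.offsets.MonthEnd())

# Weekday name and a business-day flag (Monday=0 ... Friday=4) for each date.
summary = pd.DataFrame(
    {
        "weekday": dates.day_name(),
        "is_business_day": dates.dayofweek < 5,
    },
    index=dates,
)
print(summary)

# Month-ends that are not business days; these are the dates that must be
# moved to some neighbouring business day by any "b" or "BM" resampling.
weekend_month_ends = summary[~summary["is_business_day"]].index
print(weekend_month_ends)
```

Three of the seven dates (2010-01-31, 2010-02-28 and 2010-07-31) fall on a Saturday or Sunday, which is where the "b" -> "BM" and the direct "BM" outputs above differ.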

Problem description

I am not sure if this behaviour is desired.

I would assume that the conversion from M->b would not include a date predating the earliest of the M dates. But if it does, I would expect the conversion from M->BM to also include '2010-01-29'. The same argument applies to '2010-07-30'. It feels a bit inconsistent, but it could also be that I am missing part of the logic.
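
One way to see the two behaviours side by side is to snap each month-end to a business month-end in both directions. This is a sketch under assumptions, not a statement of how `resample` is implemented: it uses the `rollback` and `rollforward` methods of `pd.offsets.BMonthEnd`, and the names `rolled` and `aligned` are illustrative. The forward-rolled dates coincide with the labels in the direct "BM" output, which would be consistent with the documented default of right-closed, right-labelled bins for month-end frequencies; the backward-rolled dates coincide with the "b" -> "BM" output.

```python
import pandas as pd

periods = 7
dates = pd.date_range(start="2010-01-31", periods=periods, freq=pd.offsets.MonthEnd())
df = pd.DataFrame({"a": range(periods)}, index=dates)

bme = pd.offsets.BMonthEnd()

# For every calendar month-end, the business month-end at or before it
# (rollback) and at or after it (rollforward). Dates already on a business
# month-end are left unchanged by both.
rolled = pd.DataFrame(
    {
        "rolled_back": [bme.rollback(d) for d in dates],
        "rolled_forward": [bme.rollforward(d) for d in dates],
    },
    index=dates,
)
print(rolled)

# Possible workaround (an assumption about the desired result): relabel each
# row with the business month-end at or before its date, without resampling.
aligned = df.copy()
aligned.index = pd.DatetimeIndex(rolled["rolled_back"].to_numpy())
print(aligned)
```

With the relabelling approach no row is merged or dropped, so the value for 2010-02-28 is kept under 2010-02-26 instead of being overwritten in the bin that ends on 2010-03-31.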

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.7.4.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None

pandas : 1.0.3
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 46.1.3.post20200330
Cython : 0.29.15
pytest : 5.4.1
hypothesis : 5.5.4
sphinx : 2.4.4
blosc : None
feather : None
xlsxwriter : 1.2.8
lxml.etree : 4.5.0
html5lib : 1.0.1
pymysql : 0.9.3
psycopg2 : None
jinja2 : 2.11.1
IPython : 7.13.0
pandas_datareader: None
bs4 : 4.8.2
bottleneck : 1.3.2
fastparquet : None
gcsfs : None
lxml.etree : 4.5.0
matplotlib : 3.1.3
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.3
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 5.4.1
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.3.15
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.8
numba : 0.48.0

Labels: Bug, Needs Discussion, Resample