Description
Code Sample, a copy-pastable example if possible
import pandas as pd
periods = 7
start = '20100131'
dates = pd.date_range(start=start, freq='M', periods=periods)
df = pd.DataFrame(range(periods), index=dates, columns=['a'])
print('Original: freq="M"')
print(df)
print('\n-> "b" -> "BM"')
print(df.resample('b').last().resample('BM').last())
print('\n-> "BM"')
print(df.resample('BM').last())
print('\n"M"-date_range:')
print(pd.date_range(start=start, freq='M', periods=periods))
print('\n"b"-date_range:')
print(pd.date_range(start=start, freq='b', end=dates[-1]))
print('\n"BM"-date_range:')
print(pd.date_range(start=start, freq='BM', end=dates[-1]))
Original: freq="M"
a
2010-01-31 0
2010-02-28 1
2010-03-31 2
2010-04-30 3
2010-05-31 4
2010-06-30 5
2010-07-31 6
-> "b" -> "BM"
a
2010-01-29 0.0
2010-02-26 1.0
2010-03-31 2.0
2010-04-30 3.0
2010-05-31 4.0
2010-06-30 5.0
2010-07-30 6.0
-> "BM"
a
2010-02-26 0.0
2010-03-31 2.0
2010-04-30 3.0
2010-05-31 4.0
2010-06-30 5.0
2010-07-30 NaN
2010-08-31 6.0
"M"-date_range:
DatetimeIndex(['2010-01-31', '2010-02-28', '2010-03-31', '2010-04-30',
'2010-05-31', '2010-06-30', '2010-07-31'],
dtype='datetime64[ns]', freq='M')
"b"-date_range:
DatetimeIndex(['2010-02-01', '2010-02-02', '2010-02-03', '2010-02-04',
'2010-02-05', '2010-02-08', '2010-02-09', '2010-02-10',
'2010-02-11', '2010-02-12',
...
'2010-07-19', '2010-07-20', '2010-07-21', '2010-07-22',
'2010-07-23', '2010-07-26', '2010-07-27', '2010-07-28',
'2010-07-29', '2010-07-30'],
dtype='datetime64[ns]', length=130, freq='B')
"BM"-date_range:
DatetimeIndex(['2010-02-26', '2010-03-31', '2010-04-30', '2010-05-31',
'2010-06-30', '2010-07-30'],
dtype='datetime64[ns]', freq='BM')
Problem description
I am not sure whether this behaviour is intended.
I would assume that the conversion from M -> b would not include a date that predates the minimum of the M dates. But if it does, I would expect the conversion from M -> BM to also include '2010-01-29'. The same argument applies to '2010-07-30'. This feels a bit inconsistent, but it could also be that I am missing part of the logic.
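One possible reading of the output above, offered as a sketch rather than a description of the resampler's internals: `resample` documents `closed='right'` and `label='right'` as the default for 'BM', so each timestamp would land in the bin ending at the next business month end, while the 'b' path appears to move weekend timestamps back to the preceding business day. The snippet below mimics both mappings with `rollforward` and `rollback` on offset objects. The variable names are illustrative only.

```python
import pandas as pd

# Same month-end dates as in the report, built from an offset object
# instead of the 'M' alias string.
dates = pd.date_range(start="2010-01-31", freq=pd.offsets.MonthEnd(), periods=7)

bmonth_end = pd.offsets.BMonthEnd()
bday = pd.offsets.BDay()

# Assumption: a right-closed, right-labelled 'BM' bin behaves like rolling
# each timestamp forward to the next business month end (unless it is one).
bm_labels = [bmonth_end.rollforward(ts) for ts in dates]

# Assumption: the 'b' path behaves like rolling weekend timestamps back to
# the preceding business day, which matches the labels in the output above.
b_labels = [bday.rollback(ts) for ts in dates]

for ts, bm, b in zip(dates, bm_labels, b_labels):
    print(ts.date(), "| BM bin:", bm.date(), "| b bin:", b.date())
```

Under these assumptions, '2010-01-31' (a Sunday) maps to the '2010-02-26' BM bin, '2010-02-28' and '2010-03-31' share the '2010-03-31' bin (so `last()` keeps only the value 2), and '2010-07-31' maps to '2010-08-31', leaving the '2010-07-30' bin empty. That would account for the NaN and the missing value 1 in the direct M -> BM result.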
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.7.4.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None
pandas : 1.0.3
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 46.1.3.post20200330
Cython : 0.29.15
pytest : 5.4.1
hypothesis : 5.5.4
sphinx : 2.4.4
blosc : None
feather : None
xlsxwriter : 1.2.8
lxml.etree : 4.5.0
html5lib : 1.0.1
pymysql : 0.9.3
psycopg2 : None
jinja2 : 2.11.1
IPython : 7.13.0
pandas_datareader: None
bs4 : 4.8.2
bottleneck : 1.3.2
fastparquet : None
gcsfs : None
lxml.etree : 4.5.0
matplotlib : 3.1.3
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.3
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 5.4.1
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.3.15
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.8
numba : 0.48.0