Closed
Description
Groupby is missing "semester"
import numpy as np
import pandas as pd

# Daily index that starts mid-semester (May), not on a semester boundary
days = pd.date_range(start="2017-05-17", end="2017-11-29", freq="1D")
df = pd.DataFrame({'DTIME': days, 'DATA': np.random.randint(50, high=80, size=len(days))})
df.set_index('DTIME', inplace=True)
grouped = df.groupby(pd.Grouper(freq='2QS'))  # group by 2 quarters, start
print("Groups date start:")
for dtime, group in grouped:
    print(dtime)
    # print(group)
The code above returns groups anchored on the first datetime in the index, not on the calendar semesters that begin on January 1st and July 1st:
Groups date start:
2017-04-01 00:00:00 <=== this is because the first datetime index is in May, 2017
2017-10-01 00:00:00
while I would expect:
Groups date start:
2017-01-01 00:00:00
2017-07-01 00:00:00
This issue is difficult to spot, because the behaviour depends on the dataset when it should be consistent. I didn't notice it with my first dataset, which started in January.
The same problem shows up when grouping by 6MS (six months, start).
A semester frequency is missing from pandas' offset aliases.
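A possible workaround, as a minimal sketch rather than a fix for the Grouper itself: build the grouping key by hand so that every row maps to the start of its calendar semester. This assumes semesters start on January 1st and July 1st. `semester_start` is a hypothetical helper written for this example, not a pandas API.

```python
import numpy as np
import pandas as pd


def semester_start(index):
    """Hypothetical helper: map each timestamp to the start of its calendar semester."""
    # Months 1-6 belong to the semester starting in January, 7-12 to the one starting in July
    months = np.where(index.month <= 6, 1, 7)
    return pd.DatetimeIndex(
        [pd.Timestamp(year=int(y), month=int(m), day=1) for y, m in zip(index.year, months)]
    )


# Same date range as in the report; deterministic data instead of random values
days = pd.date_range(start="2017-05-17", end="2017-11-29", freq="1D")
df = pd.DataFrame({"DATA": np.arange(len(days))}, index=days)

# Group on the computed semester start instead of pd.Grouper(freq='2QS')
grouped = df.groupby(semester_start(df.index))
starts = [key for key, _ in grouped]

print("Groups date start:")
for key in starts:
    print(key)
```

With the date range from the report, the group keys are 2017-01-01 and 2017-07-01 regardless of where the index begins, because the key is computed from each row's own month rather than from the first timestamp.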