Add "semester" as a time/date component to DatetimeIndex #22362

Closed
@Nemecsek

Groupby is missing "semester"

import numpy as np
import pandas as pd

# Daily index that starts mid-semester (May), which exposes the problem
days = pd.date_range(start="2017-05-17",
                     end="2017-11-29",
                     freq="1D")
df = pd.DataFrame({'DTIME': days, 'DATA': np.random.randint(50, high=80, size=len(days))})
df.set_index('DTIME', inplace=True)

grouped = df.groupby(pd.Grouper(freq='2QS'))  # group by 2 quarters, start
print("Groups date start:")
for dtime, group in grouped:
    print(dtime)
    # print(group)

This returns groups anchored to the quarter that contains the first timestamp in the index, not to the calendar semesters that begin on January 1st and July 1st:

Groups date start:
2017-04-01 00:00:00    <=== because the first timestamp in the index is in May 2017
2017-10-01 00:00:00

while I would expect:

Groups date start:
2017-01-01 00:00:00   
2017-07-01 00:00:00

This issue is difficult to spot because the behaviour depends on the dataset, when it should be consistent. I didn't notice it with my first dataset, which started in January.

The same problem appears when grouping by 6MS (six months, start).

A semester frequency is missing from pandas' offset aliases.
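
Until such an alias exists, one possible workaround is to compute the semester start for each timestamp by hand and group on that. The sketch below is only an illustration: `semester_start` is a hypothetical helper, not part of pandas, and it assumes semesters are the calendar halves starting on January 1st and July 1st.

```python
import numpy as np
import pandas as pd


def semester_start(index):
    """Map each timestamp to the first day of its calendar semester.

    Hypothetical helper (not a pandas API): months 1-6 map to January 1st,
    months 7-12 map to July 1st of the same year.
    """
    months = np.where(index.month <= 6, 1, 7)
    parts = pd.DataFrame({"year": index.year, "month": months, "day": 1})
    # pd.to_datetime assembles timestamps from year/month/day columns
    return pd.DatetimeIndex(pd.to_datetime(parts), name="SEMESTER")


days = pd.date_range(start="2017-05-17", end="2017-11-29", freq="1D")
df = pd.DataFrame({"DATA": np.arange(len(days))}, index=days)

# Group on the derived semester start instead of a frequency alias
grouped = df.groupby(semester_start(df.index))
starts = [key for key, _ in grouped]
sizes = grouped.size().tolist()

print("Groups date start:")
for start in starts:
    print(start)
```

Because the group key is derived from each timestamp's own month, the result does not depend on where the dataset happens to start.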
