BUG: groupby.tshift inconsistent behavior with other groupby transformations #34452
Description
I discovered this while trying to tackle issue #32344, where @ryankarlos mentioned that groupby.transform('tshift', ...)
seems to behave incorrectly.
However, before we can address #32344, we probably need to address this one first.
# on current master
>>> import pandas as pd
>>> import numpy as np
>>> pd.__version__
'1.1.0.dev0+1708.g043b60920'
>>> df = pd.DataFrame(
...     {
...         "A": ["foo", "foo", "foo", "foo", "bar", "bar", "baz"],
...         "B": [1, 2, np.nan, 3, 3, np.nan, 4],
...     },
...     index=pd.date_range('2020-01-01', '2020-01-07')
... )
>>> df
              A    B
2020-01-01  foo  1.0
2020-01-02  foo  2.0
2020-01-03  foo  NaN
2020-01-04  foo  3.0
2020-01-05  bar  3.0
2020-01-06  bar  NaN
2020-01-07  baz  4.0
>>> df.groupby("A").tshift(1, "D")
                  B
A
bar 2020-01-06  3.0
    2020-01-07  NaN
baz 2020-01-08  4.0
foo 2020-01-02  1.0
    2020-01-03  2.0
    2020-01-04  NaN
    2020-01-05  3.0
>>> df.groupby("A").ffill()
              B
2020-01-01  1.0
2020-01-02  2.0
2020-01-03  2.0
2020-01-04  3.0
2020-01-05  3.0
2020-01-06  3.0
2020-01-07  4.0
>>> df.groupby("A").cumsum()
              B
2020-01-01  1.0
2020-01-02  3.0
2020-01-03  NaN
2020-01-04  6.0
2020-01-05  3.0
2020-01-06  NaN
2020-01-07  4.0
We can see that groupby.tshift is inconsistent with the other groupby transformations. It retains the groupby column (as an index level) and, more importantly, reorders the data by group.
Since 0.25 there has been a deliberate effort to make all groupby transformations consistent; see https://pandas.pydata.org/pandas-docs/stable/whatsnew/v0.25.0.html#dataframe-groupby-ffill-bfill-no-longer-return-group-labels
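To make "consistent" concrete, here is a minimal sketch of the two properties the other transformations share: the result keeps the original index in its original order, and the group labels are not returned. It assumes pandas 0.25 or later and reuses the frame from the example above; the `checks` dict is only an illustrative name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "A": ["foo", "foo", "foo", "foo", "bar", "bar", "baz"],
        "B": [1, 2, np.nan, 3, 3, np.nan, 4],
    },
    index=pd.date_range("2020-01-01", "2020-01-07"),
)

filled = df.groupby("A").ffill()
summed = df.groupby("A").cumsum()

# A like-indexed transformation keeps the caller's index and row order,
# and does not hand back the grouping column.
checks = {
    "ffill keeps index": filled.index.equals(df.index),
    "cumsum keeps index": summed.index.equals(df.index),
    "ffill drops group labels": list(filled.columns) == ["B"],
    "cumsum drops group labels": list(summed.columns) == ["B"],
}
print(checks)
```

groupby.tshift fails both checks in the output shown earlier: the index gains an "A" level and the rows come back sorted by group.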
Following this thinking, I would expect the returned data to look more like:
>>> df.groupby("A").tshift(1, "D") # this is actually the result of df.tshift(1, "D").drop(columns='A')
              B
2020-01-02  1.0
2020-01-03  2.0
2020-01-04  NaN
2020-01-05  3.0
2020-01-06  3.0
2020-01-07  NaN
2020-01-08  4.0
However, if we make groupby.tshift consistent with the other groupby transformations as above, it becomes no different from df.tshift(1, "D").drop(columns='A'), and the groupby has lost its meaning here.
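The reason grouping adds nothing is that a frequency-based shift only moves the index labels, row by row, so it cannot depend on which group a row belongs to. A rough sketch of that argument follows. It uses shift(1, freq="D") as a stand-in for tshift(1, "D"), on the assumption that the two are equivalent (tshift is not available in recent pandas releases); `whole`, `per_group` and `same` are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "A": ["foo", "foo", "foo", "foo", "bar", "bar", "baz"],
        "B": [1, 2, np.nan, 3, 3, np.nan, 4],
    },
    index=pd.date_range("2020-01-01", "2020-01-07"),
)

# Shift the index of the whole frame by one day; the values stay where they are.
whole = df.shift(1, freq="D").drop(columns="A")

# Do the same shift one group at a time, then restore chronological order.
per_group = (
    pd.concat([grp.shift(1, freq="D") for _, grp in df.groupby("A")])
    .sort_index()
    .drop(columns="A")
)

# Both routes give the same frame, so the grouping step changes nothing.
same = whole.equals(per_group)
print(same)
```

This differs from a value-based groupby.shift, where the group boundaries matter because values must not leak from one group into the next.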
Perhaps we should just deprecate groupby.tshift entirely? I know #11631 discussed deprecating tshift, but that has been stalled for a long time.