API: groupby.resample *maybe* can return a deferred operation #12486

Closed

Description (@jreback)

xref #12448 / #12449, and related questions on Stack Overflow.

In [1]: df = pd.DataFrame({'date': pd.date_range(start='2016-01-01',
   ...:                                          periods=4,
   ...:                                          freq='W'),
   ...:                    'group': [1, 1, 2, 2],
   ...:                    'val': [5, 6, 7, 8]}).set_index('date')

In [2]: df
Out[2]: 
            group  val
date                  
2016-01-03      1    5
2016-01-10      1    6
2016-01-17      2    7
2016-01-24      2    8

This replicates the 0.17.1 behaviour (it is slightly off in that it also includes the grouper column):

In [3]: df.groupby('group').apply(lambda x: x.resample('1D').ffill())[['val']]
Out[3]: 
                  val
group date           
1     2016-01-03    5
      2016-01-04    5
      2016-01-05    5
      2016-01-06    5
      2016-01-07    5
      2016-01-08    5
      2016-01-09    5
      2016-01-10    6
2     2016-01-17    7
      2016-01-18    7
      2016-01-19    7
      2016-01-20    7
      2016-01-21    7
      2016-01-22    7
      2016-01-23    7
      2016-01-24    8

# ideally this would work directly; it's possible, but requires filling intelligently within each group level
In [4]: df.groupby('group').resample('1D').ffill()
Out[4]: 
            group  val
date                  
2016-01-03      1    5
2016-01-10      1    6
2016-01-17      2    7
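
In the meantime something close to this can be built with apply today. A minimal workaround sketch, assuming the grouping column should simply be dropped so it is not duplicated in the output:

# forward-fill each group at daily frequency without re-emitting the 'group' column
filled = (df.groupby('group')
            .apply(lambda g: g.drop('group', axis=1).resample('1D').ffill()))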

A pure asfreq operation

data = [['2010-01-01', 'A', 2], ['2010-01-02', 'A', 3], ['2010-01-05', 'A', 8], 
        ['2010-01-10', 'A', 7], ['2010-01-13', 'A', 3], ['2010-01-01', 'B', 5], 
        ['2010-01-03', 'B', 2], ['2010-01-04', 'B', 1], ['2010-01-11', 'B', 7], 
        ['2010-01-14', 'B', 3]]

df = pd.DataFrame(data, columns=['Date', 'ID', 'Score'])
df.Date = pd.to_datetime(df.Date)

In [27]: df.groupby('ID').apply(lambda x: x.set_index('Date').Score.resample('D').asfreq())
Out[27]: 
ID  Date      
A   2010-01-01    2.0
    2010-01-02    3.0
    2010-01-03    NaN
    2010-01-04    NaN
    2010-01-05    8.0
    2010-01-06    NaN
    2010-01-07    NaN
    2010-01-08    NaN
    2010-01-09    NaN
    2010-01-10    7.0
    2010-01-11    NaN
    2010-01-12    NaN
    2010-01-13    3.0
B   2010-01-01    5.0
    2010-01-02    NaN
    2010-01-03    2.0
    2010-01-04    1.0
    2010-01-05    NaN
    2010-01-06    NaN
    2010-01-07    NaN
    2010-01-08    NaN
    2010-01-09    NaN
    2010-01-10    NaN
    2010-01-11    7.0
    2010-01-12    NaN
    2010-01-13    NaN
    2010-01-14    3.0
Name: Score, dtype: float64
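
The same result can also be produced without resample by reindexing each group onto its own daily range. A rough sketch (the daily_asfreq helper is just for illustration, not a pandas function):

def daily_asfreq(g):
    # upsample one group to daily frequency by reindexing onto its own date span
    s = g.set_index('Date')['Score']
    return s.reindex(pd.date_range(s.index.min(), s.index.max(),
                                   freq='D', name='Date'))

df.groupby('ID').apply(daily_asfreq)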

It would be nice for this to work as well:

df.groupby(['ID', pd.Grouper(key='Date', freq='D')]).asfreq()
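
As a rough illustration of the deferred-operation idea in the title, the object returned by groupby(...).resample(rule) could hold the groupby and the rule and only compute when a method is called. None of the names below exist in pandas; this is only a sketch that re-applies resample within each group:

class DeferredGroupResampler:
    """Hypothetical deferred wrapper: nothing runs until a method is called."""

    def __init__(self, grouped, rule):
        self._grouped = grouped   # a Series/DataFrame GroupBy over DatetimeIndex-ed groups
        self._rule = rule         # e.g. 'D' or '1D'

    def _dispatch(self, how, *args, **kwargs):
        # run the requested resample method inside each group, then concatenate
        return self._grouped.apply(
            lambda g: getattr(g.resample(self._rule), how)(*args, **kwargs))

    def ffill(self, *args, **kwargs):
        return self._dispatch('ffill', *args, **kwargs)

    def asfreq(self, *args, **kwargs):
        return self._dispatch('asfreq', *args, **kwargs)

# e.g. DeferredGroupResampler(df.set_index('Date').groupby('ID')['Score'], 'D').asfreq()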
