QST: best practices to combine groupby rolling and apply #55681

Closed
@randomgambit

Description

Research

  • I have searched the [pandas] tag on StackOverflow for similar questions.

  • I have asked my usage related question on StackOverflow.

Link to question on StackOverflow

https://stackoverflow.com/questions/76969408/how-to-combine-groupby-rolling-and-apply-in-pandas

Question about pandas

Hello there,

Apologies if this turns out to be very simple, but I confess I am not sure what the current best practice is (with the latest pandas version) for a groupby.rolling.apply() operation.

A typical use case would be applying pandas qcut() (or any function that has no native rolling version, unlike rolling.max() for instance) in a rolling fashion within each group of the DataFrame.
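To make the use case concrete, here is a minimal sketch of a "rolling qcut" on a single ungrouped Series. The helper name last_obs_bin, the window size of 3, the two quantile bins, and the sample values are all assumptions chosen for illustration; the values within each window are distinct so that the quantile edges are unique.

```python
import pandas as pd

def last_obs_bin(window: pd.Series) -> float:
    # Hypothetical helper: bin the whole window into 2 quantile bins
    # and return the bin code of the most recent observation only,
    # because rolling.apply must return one number per window.
    codes = pd.qcut(window, q=2, labels=False)
    return float(codes.iloc[-1])

# Illustrative data, not taken from the question
s = pd.Series([10, 20, 40, 30])

# raw=False (the default) passes each window as a Series,
# which is what pd.qcut and .iloc need here
rolling_bins = s.rolling(3).apply(last_obs_bin, raw=False)
print(rolling_bins)
```

The first two entries are NaN because a window of 3 is not yet full. Windows containing repeated values may need the duplicates argument of qcut, which this sketch does not handle.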

Below is a concrete example where I use .apply() with a simple plus-one operation (it does not actually need the rolling part, but it keeps the example simple). Note that the groups are already correctly ordered by time (if you were to print each group while iterating over dd.groupby('group'), the observations would be ordered by time).

import pandas as pd

dd = pd.DataFrame({'mynum': [1, 2, 3, 4, 4, 5, 3],
                   'time': [1, 1, 1, 2, 3, 2, 2],
                   'group': ['a', 'c', 'b', 'a', 'a', 'b', 'c']})

Out[82]: 
   mynum  time group
0      1     1     a
1      2     1     c
2      3     1     b
3      4     2     a
4      4     3     a
5      5     2     b
6      3     2     c

# iloc[-1] is necessary because apply must return a single number per window
dd.groupby('group', as_index = False).rolling(2).mynum.apply(lambda x: (x+1).iloc[-1])
Out[89]: 
group   
a      0    NaN
       3    5.0
       4    5.0
b      2    NaN
       5    6.0
c      1    NaN
       6    4.0
Name: mynum, dtype: float64

However, assigning the result to a new column raises an error:

dd['var'] = dd.groupby('group', as_index = False).rolling(2).mynum.apply(lambda x: (x+1).iloc[-1])
TypeError: incompatible index of inserted column with frame index
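The error appears to come from the index mismatch: the rolling result is indexed by (group, original row label), while dd has a plain row index. A minimal sketch of one way to realign, assuming the group level is the first index level as in the output shown earlier (the names rolled and aligned are illustrative):

```python
import pandas as pd

dd = pd.DataFrame({'mynum': [1, 2, 3, 4, 4, 5, 3],
                   'time': [1, 1, 1, 2, 3, 2, 2],
                   'group': ['a', 'c', 'b', 'a', 'a', 'b', 'c']})

# Result carries a two-level index: (group, original row label)
rolled = (dd.groupby('group')['mynum']
            .rolling(2)
            .apply(lambda x: x.iloc[-1] + 1))

# Drop the group level so only the original row labels remain;
# column assignment then aligns on those labels, not on position
aligned = rolled.droplevel(0)
dd['var'] = aligned
print(dd.sort_values(by='group'))
```

Because the assignment aligns on the index, the order in which the groups come back no longer matters.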

Trying to be smarter and extracting the values (as suggested in the SO question) silently misaligns the result, because the values come back in group order and are then assigned by position to rows in the original order:

dd['wrongvar'] = dd.groupby('group', as_index = False).rolling(2).mynum.apply(lambda x: (x+1).iloc[-1]).values

dd.sort_values(by = 'group')
Out[102]: 
   mynum  time group  wrongvar
0      1     1     a       NaN
3      4     2     a       NaN
4      4     3     a       6.0
2      3     1     b       5.0
5      5     2     b       NaN
1      2     1     c       5.0
6      3     2     c       4.0
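An alternative that sidesteps the positional assignment is to run the rolling computation inside groupby.transform, which returns a result indexed like the original frame. This is a sketch rather than an established best practice, and the column name rolled_plus_one is an illustrative assumption:

```python
import pandas as pd

dd = pd.DataFrame({'mynum': [1, 2, 3, 4, 4, 5, 3],
                   'time': [1, 1, 1, 2, 3, 2, 2],
                   'group': ['a', 'c', 'b', 'a', 'a', 'b', 'c']})

# transform calls the lambda once per group with that group's Series
# and stitches the pieces back together on the original row index
dd['rolled_plus_one'] = dd.groupby('group')['mynum'].transform(
    lambda s: s.rolling(2).apply(lambda x: x.iloc[-1] + 1)
)
print(dd.sort_values(by='group'))
```

With this approach the first row of each group is NaN (the window of 2 is not yet full) and the later rows hold the last value of the window plus one, in the same rows as the input.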

What am I supposed to do here?
Thank you so much for your help!

Metadata

Labels

Apply (Apply, Aggregate, Transform, Map), Usage Question, Window (rolling, ewma, expanding)
