Description
Research
- I have searched the [pandas] tag on StackOverflow for similar questions.
- I have asked my usage related question on StackOverflow.
Link to question on StackOverflow
https://stackoverflow.com/questions/76969408/how-to-combine-groupby-rolling-and-apply-in-pandas
Question about pandas
Hello there,
Apologies if this turns out to be very simple, but I confess I am not quite sure what the current best practices are (with the latest pandas version) for a groupby.rolling.apply() operation.
A typical use case would be to apply pandas qcut() (or any function that does not have a native rolling() version, unlike rolling.max() for instance) in a rolling fashion within each group of the dataframe.
Below is a concrete example where I use .apply() with a simple plus-one operation (which does not actually need the rolling part, but it keeps the example simple). Note that the groups are already correctly ordered by time: if you were to print each group while iterating over dd.groupby('group'), the observations would be ordered by time.
import pandas as pd
dd = pd.DataFrame({'mynum': [1, 2, 3, 4, 4, 5, 3],
                   'time': [1, 1, 1, 2, 3, 2, 2],
                   'group': ['a', 'c', 'b', 'a', 'a', 'b', 'c']})
Out[82]:
mynum time group
0 1 1 a
1 2 1 c
2 3 1 b
3 4 2 a
4 4 3 a
5 5 2 b
6 3 2 c
# .iloc[-1] is necessary because the applied function must return a single number per window
dd.groupby('group', as_index = False).rolling(2).mynum.apply(lambda x: (x+1).iloc[-1])
Out[89]:
group
a 0 NaN
3 5.0
4 5.0
b 2 NaN
5 6.0
c 1 NaN
6 4.0
Name: mynum, dtype: float64
However, assigning the result to a new column raises an error:
dd['var'] = dd.groupby('group', as_index = False).rolling(2).mynum.apply(lambda x: (x+1).iloc[-1])
TypeError: incompatible index of inserted column with frame index
Trying to be smarter by extracting the values (as suggested in the SO question) silently misaligns the results instead:
dd['wrongvar'] = dd.groupby('group', as_index = False).rolling(2).mynum.apply(lambda x: (x+1).iloc[-1]).values
dd.sort_values(by = 'group')
Out[102]:
mynum time group wrongvar
0 1 1 a NaN
3 4 2 a NaN
4 4 3 a 6.0
2 3 1 b 5.0
5 5 2 b NaN
1 2 1 c 5.0
6 3 2 c 4.0
What am I supposed to do here?
Thank you so much for your help!
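For reference, here is a minimal sketch of one possible way around the misalignment, not necessarily the recommended practice. It assumes the rolling result carries a two-level index (group, original row label), as in the output shown above, so dropping the group level should let pandas align the new column on the original row labels. The names `rolled` and `aligned` are only illustrative.

```python
import pandas as pd

dd = pd.DataFrame({
    'mynum': [1, 2, 3, 4, 4, 5, 3],
    'time':  [1, 1, 1, 2, 3, 2, 2],
    'group': ['a', 'c', 'b', 'a', 'a', 'b', 'c'],
})

# Rolling apply within each group; the result is indexed by
# (group, original row label).
rolled = (
    dd.groupby('group')['mynum']
      .rolling(2)
      .apply(lambda x: (x + 1).iloc[-1])
)

# Drop the group level so that only the original row labels remain.
aligned = rolled.droplevel(0)

# Assignment now aligns on the index instead of on position.
dd['var'] = aligned

print(dd.sort_values(by='group'))
```

With this approach the first row of each group should come out as NaN (the window of 2 is not yet full) and the remaining rows should hold the value for their own row, in contrast to the positional assignment via .values.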