Apologies if this turns out to be very simple, but I confess I am not quite sure what the current best practices are (with the latest pandas version) for a `groupby.rolling.apply()` operation.

A typical use case would be applying pandas `qcut()` (or any function that has no native rolling implementation, unlike `rolling.max()`, for instance) in a rolling fashion within each group of the dataframe.

Below is a concrete example where I use `.apply()` with a simple plus-one operation (which does not need the rolling part, but it keeps the example simple). Note that the groups are already correctly ordered by time: if you were to print each group of `dd.groupby('group')`, the observations would be ordered by `time`.
```python
import pandas as pd

dd = pd.DataFrame({'mynum': [1, 2, 3, 4, 4, 5, 3],
                   'time': [1, 1, 1, 2, 3, 2, 2],
                   'group': ['a', 'c', 'b', 'a', 'a', 'b', 'c']})
dd
```

```
Out[82]:
   mynum  time group
0      1     1     a
1      2     1     c
2      3     1     b
3      4     2     a
4      4     3     a
5      5     2     b
6      3     2     c
```
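Since `rolling(2)` counts rows rather than `time` units, the result relies on the within-group ordering described above. A minimal sketch of a sanity check for that assumption (an illustrative addition, not part of the original report):

```python
import pandas as pd

dd = pd.DataFrame({'mynum': [1, 2, 3, 4, 4, 5, 3],
                   'time': [1, 1, 1, 2, 3, 2, 2],
                   'group': ['a', 'c', 'b', 'a', 'a', 'b', 'c']})

# rolling() windows follow the row order inside each group, so check that
# 'time' is non-decreasing within every group before rolling.
is_sorted = dd.groupby('group')['time'].apply(lambda s: s.is_monotonic_increasing)
all_sorted = bool(is_sorted.all())
print(all_sorted)
```

If the check failed, calling `dd.sort_values(['group', 'time'])` before grouping would put each group back in time order while keeping the original row labels.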
```python
# iloc[-1] is necessary as we need to return just one number per row
dd.groupby('group', as_index=False).rolling(2).mynum.apply(lambda x: (x + 1).iloc[-1])
```

```
Out[89]:
group   
a      0    NaN
       3    5.0
       4    5.0
b      2    NaN
       5    6.0
c      1    NaN
       6    4.0
Name: mynum, dtype: float64
```
But assigning the result to a new column raises an error:

```python
dd['var'] = dd.groupby('group', as_index=False).rolling(2).mynum.apply(lambda x: (x + 1).iloc[-1])
```

```
TypeError: incompatible index of inserted column with frame index
```
Trying to be smarter by extracting the values instead (as suggested in the SO question) leads to a silent, wrong realignment:

```python
dd['wrongvar'] = dd.groupby('group', as_index=False).rolling(2).mynum.apply(lambda x: (x + 1).iloc[-1]).values
dd.sort_values(by='group')
```

```
Out[102]:
   mynum  time group  wrongvar
0      1     1     a       NaN
3      4     2     a       NaN
4      4     3     a       6.0
2      3     1     b       5.0
5      5     2     b       NaN
1      2     1     c       5.0
6      3     2     c       4.0
```
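Both symptoms trace back to the shape of the result: it is indexed by the group key plus the original row label, and its rows come in group order rather than in `dd`'s row order. The sketch below inspects this; it selects the column before `.rolling` and omits `as_index=False`, a small variation on the calls above that yields the same two-level layout shown in `Out[89]`.

```python
import pandas as pd

dd = pd.DataFrame({'mynum': [1, 2, 3, 4, 4, 5, 3],
                   'time': [1, 1, 1, 2, 3, 2, 2],
                   'group': ['a', 'c', 'b', 'a', 'a', 'b', 'c']})

rolled = (dd.groupby('group')['mynum']
            .rolling(2)
            .apply(lambda x: (x + 1).iloc[-1]))

# Two index levels: the group key, then dd's original row label.
n_levels = rolled.index.nlevels
group_keys = rolled.index.get_level_values(0).tolist()
row_labels = rolled.index.get_level_values(1).tolist()
print(n_levels, group_keys, row_labels)

# dd's own row labels are 0..6 in order. Assigning rolled.values pairs the
# k-th value with the k-th row of dd by position, ignoring row_labels.
print(dd.index.tolist())
```

Because `row_labels` is `[0, 3, 4, 2, 5, 1, 6]` rather than `[0, 1, ..., 6]`, a positional assignment puts, for example, the value computed for row 3 (group `a`) on row 1 (group `c`), which is exactly the `5.0` seen in `wrongvar` above.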
What am I supposed to do here?
Thank you so much for your help!
Though the OP is using .apply, they could be using .agg. This is another case where it seems best if we adhere to the semantics of a transform in groupby and not add the groups to the result's index.
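Until the result carries transform semantics directly, one way to get output aligned with `dd`'s index is to run the rolling step inside `groupby(...).transform`. This is a swapped-in workaround, not the change proposed in the comment; `plus_one_last` is a hypothetical helper standing in for the per-window function.

```python
import pandas as pd

dd = pd.DataFrame({'mynum': [1, 2, 3, 4, 4, 5, 3],
                   'time': [1, 1, 1, 2, 3, 2, 2],
                   'group': ['a', 'c', 'b', 'a', 'a', 'b', 'c']})

def plus_one_last(window: pd.Series) -> float:
    # Hypothetical stand-in for any per-window function (e.g. a qcut-based
    # label) that has no native rolling implementation.
    return (window + 1).iloc[-1]

# transform passes each group's 'mynum' Series to the lambda. The rolled
# Series keeps that group's original row labels, and transform returns a
# result aligned with dd.index, so direct column assignment works.
dd['var'] = (dd.groupby('group')['mynum']
               .transform(lambda s: s.rolling(2).apply(plus_one_last)))
print(dd)
```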
Very interesting, thanks @rhshadrach! I see now: `reset_index()` (with `level=0, drop=True`) is actually the way to go, as it drops the outer level of the index (the grouping variable) while keeping the original index values (here, 0 to 6). So the assignment still realigns rows correctly after `reset_index()`, whereas `.values` strips the index entirely, the values are assigned by position, and the ordering ends up being incorrect.
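A minimal sketch of that fix, assuming (as in the outputs above) that the grouping key is the outermost index level of the rolled result; it selects the column before `.rolling` rather than passing `as_index=False`.

```python
import pandas as pd

dd = pd.DataFrame({'mynum': [1, 2, 3, 4, 4, 5, 3],
                   'time': [1, 1, 1, 2, 3, 2, 2],
                   'group': ['a', 'c', 'b', 'a', 'a', 'b', 'c']})

rolled = (dd.groupby('group')['mynum']
            .rolling(2)
            .apply(lambda x: (x + 1).iloc[-1]))

# Drop only the outer 'group' level. The remaining level holds dd's original
# row labels, so the assignment aligns by label rather than by position.
dd['var'] = rolled.reset_index(level=0, drop=True)
print(dd.sort_values(by='group'))
```

`rolled.droplevel(0)` is an equivalent spelling of the same step.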
Research

- I have searched the [pandas] tag on StackOverflow for similar questions.
- I have asked my usage-related question on StackOverflow.

Link to question on StackOverflow: https://stackoverflow.com/questions/76969408/how-to-combine-groupby-rolling-and-apply-in-pandas