BUG: dataframe returned by groupby.apply() is missing outer group indices if row indices of returned dataframe are identical to those of the input dataframe
#46041
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
import numpy as np

F = pd.DataFrame({ "a" : np.r_[0:10], "b" : np.r_[0:10]//5 })

# this works as expected: the outer group indices are present
F.groupby("b").apply(lambda x : x.iloc[:2])

# this does not work as expected: the outer group indices are gone
F.groupby("b").apply(lambda x : x.iloc[:])
Issue Description
groupby.apply() omits the outer group index in the dataframe it returns if the row indices of the returned dataframe are identical to the indices of the input dataframe. For example, given the dataframe F from the reproducible example, if I group by column b and take only the first two rows from each group,
In [1]: F.groupby("b").apply(lambda x : x.iloc[:2])
Out[1]:
     a  b
b
0 0  0  0
  1  1  0
1 5  5  1
  6  6  1
the group index b becomes the outer index of the dataframe, as expected. However, if I instead take all the rows from each group,
In [2]: F.groupby("b").apply(lambda x : x.iloc[:])
Out[2]:
   a  b
0  0  0
1  1  0
2  2  0
3  3  0
4  4  0
5  5  1
6  6  1
7  7  1
8  8  1
9  9  1
the outer group index disappears. It seems that any time the row indices of the dataframe returned by groupby.apply() are identical to those of the input dataframe, the outer group index disappears:
In [3]: F.groupby("b").apply(lambda x : x)
Out[3]:
   a  b
0  0  0
1  1  0
2  2  0
3  3  0
4  4  0
5  5  1
6  6  1
7  7  1
8  8  1
9  9  1
In [4]: F.groupby("b").apply(lambda x : x + 1)
Out[4]:
    a  b
0   1  1
1   2  1
2   3  1
3   4  1
4   5  1
5   6  2
6   7  2
7   8  2
8   9  2
9  10  2
If the row indices of the returned dataframe differ from those of the input dataframe, the outer group index is present:
In [5]: F.groupby("b").apply(lambda x : x.iloc[np.r_[0, 4, 1, 2, 3]])
Out[5]:
     a  b
b
0 0  0  0
  4  4  0
  1  1  0
  2  2  0
  3  3  0
1 5  5  1
  9  9  1
  6  6  1
  7  7  1
  8  8  1
In [6]: F.groupby("b").apply(lambda x : x.iloc[::-1])
Out[6]:
     a  b
b
0 4  4  0
  3  3  0
  2  2  0
  1  1  0
  0  0  0
1 9  9  1
  8  8  1
  7  7  1
  6  6  1
  5  5  1
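The pattern above can also be checked programmatically. The following is only a minimal sketch: the helper name summarize is hypothetical (not part of pandas), and what it reports for the "all rows" case depends on the installed pandas version, so no expected output is shown.

```python
import numpy as np
import pandas as pd

F = pd.DataFrame({"a": np.r_[0:10], "b": np.r_[0:10] // 5})

def summarize(func):
    """Hypothetical helper: apply func per group and report
    (result index equals input index, number of index levels)."""
    out = F.groupby("b").apply(func)
    return out.index.equals(F.index), out.index.nlevels

cases = {
    "first two rows": lambda x: x.iloc[:2],
    "all rows": lambda x: x.iloc[:],
    "reversed": lambda x: x.iloc[::-1],
}
for name, func in cases.items():
    # Two index levels mean the outer group index is present.
    print(name, summarize(func))
```

On an affected version, the "all rows" case should report a matching index and a single index level, while the other two cases report two levels.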
Expected Behavior
I expect the outer group indices to always be present:
In [1]: F = pd.DataFrame({ "a" : np.r_[0:10], "b" : np.r_[0:10]//5 })
In [2]: F.groupby("b").apply(lambda x : x.iloc[:])
Out[2]:
     a  b
b
0 0  0  0
  1  1  0
  2  2  0
  3  3  0
  4  4  0
1 5  5  1
  6  6  1
  7  7  1
  8  8  1
  9  9  1
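As a possible workaround until apply() behaves this way, the expected result can be assembled by hand. This is a sketch, not how pandas builds the result internally: it iterates over the groups and concatenates them with pd.concat, using the group labels as the outer index level. The variable names pieces and expected are just for illustration.

```python
import numpy as np
import pandas as pd

F = pd.DataFrame({"a": np.r_[0:10], "b": np.r_[0:10] // 5})

# One piece per group, keyed by the group label.
pieces = {key: group.iloc[:] for key, group in F.groupby("b")}

# The dict keys become the outer index level, which is named "b" here.
expected = pd.concat(pieces, names=["b"])
print(expected)
```

Because the outer level comes from the dict keys rather than from apply(), it is present regardless of whether the inner row indices match those of the input.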
Thanks for the report! apply is inferring the op is a transform in cases where the group is not present in the result. Transforms have an index that matches the index of input. This is the reason why the groups are not being included.
Users should have control of this through the groupby(.., group_keys=) argument, but that is currently ignored in the case that apply infers a transform. This would be fixed by #34998.
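To see whether group_keys is honored on a given installation, something like the following sketch can be used. It assumes nothing beyond the existing group_keys argument of DataFrame.groupby; on versions where the argument is still ignored for inferred transforms, both results will have a single index level.

```python
import numpy as np
import pandas as pd

F = pd.DataFrame({"a": np.r_[0:10], "b": np.r_[0:10] // 5})

# Request the group keys explicitly, then suppress them explicitly.
with_keys = F.groupby("b", group_keys=True).apply(lambda x: x.iloc[:])
without_keys = F.groupby("b", group_keys=False).apply(lambda x: x.iloc[:])

# If group_keys is respected, the first result has two index levels
# (group label plus original row label) and the second has one.
print(with_keys.index.nlevels, without_keys.index.nlevels)
```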