BUG: dataframe returned by groupby.apply() is missing outer group indices if row indices of returned dataframe are identical to those of the input dataframe #46041

Closed
3 tasks done
julianhess opened this issue Feb 17, 2022 · 1 comment
Labels
Apply, Bug, Duplicate Report, Groupby

Comments

@julianhess

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
import numpy as np

F = pd.DataFrame({ "a" : np.r_[0:10], "b" : np.r_[0:10]//5 })

# this works as expected: the outer group indices are present
F.groupby("b").apply(lambda x : x.iloc[:2])

# this does not work as expected: the outer group indices are gone
F.groupby("b").apply(lambda x : x.iloc[:])

Issue Description

groupby.apply() omits the outer group index in the dataframe it returns if the row indices of the returned dataframe are identical to the indices of the input dataframe. For example, given the following dataframe:

 F = pd.DataFrame({ "a" : np.r_[0:10], "b" : np.r_[0:10]//5 })

   a  b
0  0  0
1  1  0
2  2  0
3  3  0
4  4  0
5  5  1
6  6  1
7  7  1
8  8  1
9  9  1

if I group by column b and take only the first two rows from each group,

In [1]: F.groupby("b").apply(lambda x : x.iloc[:2])
Out[1]:
     a  b
b
0 0  0  0
  1  1  0
1 5  5  1
  6  6  1

the group index b becomes the outer index of the dataframe, as expected. However, if I instead take all the rows from each group

In [2]: F.groupby("b").apply(lambda x : x.iloc[:])
Out[2]:
   a  b
0  0  0
1  1  0
2  2  0
3  3  0
4  4  0
5  5  1
6  6  1
7  7  1
8  8  1
9  9  1

the outer group index disappears. It seems that any time the row indices of the dataframe returned by groupby.apply() are identical to those of the input dataframe, the outer group index disappears:

In [3]: F.groupby("b").apply(lambda x : x)
Out[3]:
   a  b
0  0  0
1  1  0
2  2  0
3  3  0
4  4  0
5  5  1
6  6  1
7  7  1
8  8  1
9  9  1

In [4]: F.groupby("b").apply(lambda x : x + 1)
Out[4]:
    a  b
0   1  1
1   2  1
2   3  1
3   4  1
4   5  1
5   6  2
6   7  2
7   8  2
8   9  2
9  10  2

If the row indices of the returned dataframe differ from those of the input dataframe, the outer group index is present:

In [5]: F.groupby("b").apply(lambda x : x.iloc[np.r_[0, 4, 1, 2, 3]])
Out[5]:
     a  b
b
0 0  0  0
  4  4  0
  1  1  0
  2  2  0
  3  3  0
1 5  5  1
  9  9  1
  6  6  1
  7  7  1
  8  8  1
  
In [6]: F.groupby("b").apply(lambda x : x.iloc[::-1])
Out[6]:
     a  b
b
0 4  4  0
  3  3  0
  2  2  0
  1  1  0
  0  0  0
1 9  9  1
  8  8  1
  7  7  1
  6  6  1
  5  5  1
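
The pattern above can also be checked programmatically rather than by eye. This is a minimal sketch, assuming the same dataframe F as in the report; the helper `has_group_level` and the `cases` dict are hypothetical names introduced only for illustration. No expected output is shown because the result depends on the installed pandas version.

```python
import numpy as np
import pandas as pd

F = pd.DataFrame({"a": np.r_[0:10], "b": np.r_[0:10] // 5})


def has_group_level(result, key="b"):
    """Return True if `key` is one of the index level names of `result`."""
    return key in result.index.names


# Functions taken from the examples in this report.
cases = {
    "first two rows": lambda x: x.iloc[:2],
    "all rows": lambda x: x.iloc[:],
    "identity": lambda x: x,
    "reversed": lambda x: x.iloc[::-1],
}

# For each function, record whether the outer group index survived apply().
report = {
    name: has_group_level(F.groupby("b").apply(func))
    for name, func in cases.items()
}

for name, present in report.items():
    print(f"{name}: outer group index present = {present}")
```

On the version reported below, the "all rows" and "identity" cases should print False while the other two print True.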

Expected Behavior

I expect the outer group indices to always be present:

In [1]: F = pd.DataFrame({ "a" : np.r_[0:10], "b" : np.r_[0:10]//5 })

In [2]: F.groupby("b").apply(lambda x : x.iloc[:])
Out[2]:
     a  b
b
0 0  0  0
  1  1  0
  2  2  0
  3  3  0
  4  4  0
1 5  5  1
  6  6  1
  7  7  1
  8  8  1
  9  9  1

Installed Versions

INSTALLED VERSIONS
------------------
commit           : 06d230151e6f18fdb8139d09abf539867a8cd481
python           : 3.8.5.final.0
python-bits      : 64
OS               : Linux
OS-release       : 4.4.0-210-generic
Version          : #242-Ubuntu SMP Fri Apr 16 09:57:56 UTC 2021
machine          : x86_64
processor        : x86_64
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.4.1
numpy            : 1.20.3
pytz             : 2020.4
dateutil         : 2.8.1
pip              : 20.3.3
setuptools       : 51.0.0.post20201207
Cython           : 0.29.22
pytest           : None
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 2.11.3
IPython          : 8.0.1
pandas_datareader: None
bs4              : None
bottleneck       : None
fastparquet      : None
fsspec           : 0.9.0
gcsfs            : 0.8.0
matplotlib       : 3.4.2
numba            : 0.53.1
numexpr          : 2.7.2
odfpy            : None
openpyxl         : 3.0.6
pandas_gbq       : None
pyarrow          : 3.0.0
pyreadstat       : None
pyxlsb           : None
s3fs             : None
scipy            : 1.5.4
sqlalchemy       : None
tables           : 3.6.1
tabulate         : 0.8.7
xarray           : None
xlrd             : 2.0.1
xlwt             : None
zstandard        : None
julianhess added the Bug and Needs Triage labels Feb 17, 2022
julianhess added a commit to getzlab/canine that referenced this issue Feb 17, 2022
julianhess added a commit to getzlab/CApy that referenced this issue Feb 17, 2022
@rhshadrach
Member

Thanks for the report! apply is inferring the op is a transform in cases where the group is not present in the result. Transforms have an index that matches the index of input. This is the reason why the groups are not being included.
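
To illustrate what "an index that matches the index of input" means, here is a small sketch using `groupby(...).transform` directly rather than `apply`. It assumes the dataframe F from the report; the centering function is an arbitrary example, not something from the original report.

```python
import numpy as np
import pandas as pd

F = pd.DataFrame({"a": np.r_[0:10], "b": np.r_[0:10] // 5})

# A transform returns one value per input row, so the result is indexed
# like the input and carries no group level.
centered = F.groupby("b")["a"].transform(lambda s: s - s.mean())

print(centered.index.equals(F.index))
```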

Users should have control of this through the groupby(.., group_keys=) argument, but that is currently ignored in the case that apply infers a transform. This would be fixed by #34998.

Closing as a duplicate (issues in the linked PR)
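
In the meantime, one way to get the outer group index regardless of what apply infers is to build it explicitly. This is only a sketch of a workaround, not something proposed in this thread: it swaps `groupby.apply` for a plain loop over the groups plus `pd.concat`, and the helper name `apply_with_group_index` is hypothetical.

```python
import numpy as np
import pandas as pd

F = pd.DataFrame({"a": np.r_[0:10], "b": np.r_[0:10] // 5})


def apply_with_group_index(df, by, func):
    """Apply `func` to each group and always prepend the group key
    as an outer index level (hypothetical helper, assumes `by` is a
    single column name)."""
    # Iterating a groupby yields (key, sub-frame) pairs.
    pieces = {key: func(group) for key, group in df.groupby(by)}
    # Passing a dict to concat uses its keys as the outer index level;
    # `names` labels that level.
    return pd.concat(pieces, names=[by])


result = apply_with_group_index(F, "b", lambda x: x.iloc[:])
print(result)
```

Because the outer level is added by `pd.concat` rather than by apply, the result does not depend on whether the returned rows share the input's index.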

rhshadrach added the Apply, Groupby, and Duplicate Report labels and removed the Needs Triage label Feb 24, 2022