DataFrame.multiply(Series, axis=1) with 10k+ rows raises AttributeError #28171

Closed
JoElfner opened this issue Aug 27, 2019 · 2 comments
Labels: Duplicate Report (duplicate issue or pull request)

Comments

@JoElfner
Contributor

Code sample raising the error:

import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.random.rand(15000, 5), columns=['x', 'a', 'ab', 'b', 'c'])
ser = pd.Series(data=[1.1, 9, 17.8], index=['a', 'b', 'c'])
df.multiply(ser, axis=1)

No error is raised when at most 10000 rows are used:

# NO error:
df.iloc[:10000].multiply(ser, axis=1)
# error:
df.iloc[:10001].multiply(ser, axis=1)

Problem description

Multiplying a DataFrame that has more than 10k rows by a Series with the keyword axis=1 raises the following error:

AttributeError: 'numpy.dtype' object has no attribute 'value_counts'

It seems that a different code path is taken when more than 10k rows are multiplied.
Version 0.24.2 does NOT reproduce the error; it occurs as of 0.25.0.

Omitting the axis keyword seems to avoid the problem, but is it safe to do so in all cases?

Any ideas for a workaround other than processing the data in chunks?
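
On the axis question: DataFrame.multiply defaults to axis='columns', so for a Series operand, leaving the keyword out should request the same column-wise alignment as axis=1. The following is a minimal sketch of that equivalence, not a statement about the internals behind the error. It assumes a pandas version without this bug and deliberately uses a small frame (100 rows, an arbitrary choice) to stay well below the 10k-row threshold described above.

```python
import numpy as np
import pandas as pd

# Small frame so the sketch stays far below the 10k-row threshold from the report.
df = pd.DataFrame(np.random.rand(100, 5), columns=['x', 'a', 'ab', 'b', 'c'])
ser = pd.Series([1.1, 9, 17.8], index=['a', 'b', 'c'])

with_axis = df.multiply(ser, axis=1)  # explicit: align ser.index with df.columns
without_axis = df.multiply(ser)       # default axis is 'columns', the same alignment
operator_form = df * ser              # the * operator also aligns a Series on columns

# All three forms agree. Columns absent from ser ('x', 'ab') come out as NaN.
pd.testing.assert_frame_equal(with_axis, without_axis)
pd.testing.assert_frame_equal(with_axis, operator_form)
```

Because the three calls are equivalent, dropping the keyword changes how the call is written, not what is computed; whether it also sidesteps the error on 0.25.0 is only what the report above observed.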

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit : None
python : 3.7.4.final.0
python-bits : 64
OS : Windows
OS-release : 7
machine : AMD64
processor : Intel64 Family 6 Model 63 Stepping 2, GenuineIntel
byteorder : little
LC_ALL : None
LANG : de
LOCALE : de_DE.ISO8859-1

pandas : 0.25.0
numpy : 1.16.4
pytz : 2019.2
dateutil : 2.8.0
pip : 19.2.2
setuptools : 41.0.1
Cython : 0.29.13
pytest : 5.0.1
hypothesis : None
sphinx : 2.1.2
blosc : None
feather : None
xlsxwriter : 1.1.8
lxml.etree : 4.4.1
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.1
IPython : 7.7.0
pandas_datareader: None
bs4 : 4.8.0
bottleneck : 1.2.1
fastparquet : None
gcsfs : None
lxml.etree : 4.4.1
matplotlib : 3.1.0
numexpr : 2.7.0
odfpy : None
openpyxl : 2.6.2
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.3.1
sqlalchemy : 1.3.7
tables : 3.5.2
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.1.8

@TomAugspurger
Contributor

Fixed in 0.25.1: #27636
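
To tell whether an environment is on the affected release, one option is to inspect pd.__version__. The sketch below is illustrative only: release_tuple is a hypothetical helper, not part of pandas, and it flags just 0.25.0 because that is the only version this thread identifies as affected (0.24.2 reportedly works, and 0.25.1 reportedly carries the fix).

```python
import re

import pandas as pd


def release_tuple(version):
    """Return the leading integers of the first three dot-separated parts."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    return tuple(parts[:3])


# Hypothetical check: only 0.25.0 is named in this thread as showing the error.
affected = release_tuple(pd.__version__) == (0, 25, 0)
print("pandas", pd.__version__, "matches the affected release:", affected)
```

If the check prints True, upgrading to 0.25.1 or later is the remedy suggested by the reply above.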

TomAugspurger added the Duplicate Report label on Aug 27, 2019
@JoElfner
Contributor Author

Ouch, I must have missed all the issues/PRs/changes between 0.25.0 and 0.25.1 when looking for existing issues/PRs...
