BUG: estimate_pandas_size on arrow based pandas dataframe raises pyarrow.lib.ArrowInvalid: offset overflow while concatenating arrays #629

Open
codingl2k1 opened this issue Jul 28, 2023 · 0 comments
Labels
bug Something isn't working

Comments

@codingl2k1
Contributor

codingl2k1 commented Jul 28, 2023

Describe the bug

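Calling estimate_pandas_size on an Arrow-backed pandas DataFrame raises pyarrow.lib.ArrowInvalid: offset overflow while concatenating arrays. The error is raised while the sampled rows are selected with iloc[indices]: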

  File "/Users/codingl2k1/Work/xorbits/python/xorbits/_mars/utils.py", line 496, in calc_data_size
    return estimate_pandas_size(dt)
  File "/Users/codingl2k1/Work/xorbits/python/xorbits/_mars/utils.py", line 571, in estimate_pandas_size
    sample_size = sys.getsizeof(iloc[indices])
  File "/Users/codingl2k1/.pyenv/versions/3.11.4/lib/python3.11/site-packages/pandas/core/indexing.py", line 1103, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
  File "/Users/codingl2k1/.pyenv/versions/3.11.4/lib/python3.11/site-packages/pandas/core/indexing.py", line 1647, in _getitem_axis
    return self._get_list_axis(key, axis=axis)
  File "/Users/codingl2k1/.pyenv/versions/3.11.4/lib/python3.11/site-packages/pandas/core/indexing.py", line 1618, in _get_list_axis
    return self.obj._take_with_is_copy(key, axis=axis)
  File "/Users/codingl2k1/.pyenv/versions/3.11.4/lib/python3.11/site-packages/pandas/core/generic.py", line 3948, in _take_with_is_copy
    result = self._take(indices=indices, axis=axis)
  File "/Users/codingl2k1/.pyenv/versions/3.11.4/lib/python3.11/site-packages/pandas/core/generic.py", line 3932, in _take
    new_data = self._mgr.take(
  File "/Users/codingl2k1/.pyenv/versions/3.11.4/lib/python3.11/site-packages/pandas/core/internals/managers.py", line 963, in take
    return self.reindex_indexer(
  File "/Users/codingl2k1/.pyenv/versions/3.11.4/lib/python3.11/site-packages/pandas/core/internals/managers.py", line 747, in reindex_indexer
    new_blocks = [
  File "/Users/codingl2k1/.pyenv/versions/3.11.4/lib/python3.11/site-packages/pandas/core/internals/managers.py", line 748, in <listcomp>
    blk.take_nd(
  File "/Users/codingl2k1/.pyenv/versions/3.11.4/lib/python3.11/site-packages/pandas/core/internals/blocks.py", line 945, in take_nd
    new_values = algos.take_nd(
  File "/Users/codingl2k1/.pyenv/versions/3.11.4/lib/python3.11/site-packages/pandas/core/array_algos/take.py", line 114, in take_nd
    return arr.take(indexer, fill_value=fill_value, allow_fill=allow_fill)
  File "/Users/codingl2k1/.pyenv/versions/3.11.4/lib/python3.11/site-packages/pandas/core/arrays/arrow/array.py", line 1035, in take
    return type(self)(self._data.take(indices))
  File "pyarrow/table.pxi", line 1029, in pyarrow.lib.ChunkedArray.take
  File "/Users/codingl2k1/.pyenv/versions/3.11.4/lib/python3.11/site-packages/pyarrow/compute.py", line 482, in take
    return call_function('take', [data, indices], options, memory_pool)
  File "pyarrow/_compute.pyx", line 572, in pyarrow._compute.call_function
  File "pyarrow/_compute.pyx", line 367, in pyarrow._compute.Function.call
  File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: [address=127.0.0.1:59388, pid=61917] offset overflow while concatenating arrays

To Reproduce

Environment:

Python 3.11.4
pandas 2.0.3
pyarrow 12.0.1

The Xorbits version and a minimized reproduction script were not provided. The full stack of the error is given under "Describe the bug".

Expected behavior

estimate_pandas_size should return a size estimate for an Arrow-backed pandas DataFrame without raising an error.

Additional context

Similar issue: apache/arrow#33049
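
For context on the error message, here is a minimal sketch of the arithmetic that may be behind it. It assumes the affected columns are Arrow string arrays, which store 32-bit offsets, and that ChunkedArray.take concatenates the chunks before gathering rows, as the traceback and error text suggest. The helper name concat_overflows_int32 and the chunk sizes are hypothetical and are not part of pyarrow or Xorbits.

```python
# Largest offset that an Arrow array with 32-bit offsets (string, binary) can store.
INT32_MAX = 2**31 - 1


def concat_overflows_int32(chunk_data_bytes):
    """Return True if concatenating chunks whose value buffers have the given
    sizes (in bytes) would require an offset larger than INT32_MAX."""
    return sum(chunk_data_bytes) > INT32_MAX


# Hypothetical chunk sizes: 1 GiB of string data per chunk.
GIB = 1024**3

print(concat_overflows_int32([GIB]))            # False: a single chunk fits
print(concat_overflows_int32([GIB, GIB, GIB]))  # True: the concatenation does not
```

If this is indeed the cause, each chunk is valid on its own and only the combined array exceeds the limit. Types with 64-bit offsets, such as large_string, would not be subject to this bound, although that has not been verified against this report.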

@XprobeBot XprobeBot added the bug Something isn't working label Jul 28, 2023
@XprobeBot XprobeBot added this to the v0.5.0 milestone Jul 28, 2023
@codingl2k1 codingl2k1 changed the title BUG: estimate_pandas_size on arrow based pandas dataframe raises error BUG: estimate_pandas_size on arrow based pandas dataframe raises pyarrow.lib.ArrowInvalid: offset overflow while concatenating arrays Jul 28, 2023
@XprobeBot XprobeBot modified the milestones: v0.5.0, v0.5.1 Jul 28, 2023
@XprobeBot XprobeBot modified the milestones: v0.5.1, v0.5.2 Aug 14, 2023
@XprobeBot XprobeBot modified the milestones: v0.5.2, Temp, v0.6.0, v0.6.1 Sep 8, 2023
@XprobeBot XprobeBot modified the milestones: v0.6.1, v0.6.2, v0.6.3 Sep 15, 2023
@XprobeBot XprobeBot modified the milestones: v0.6.3, v0.7.0 Sep 25, 2023
@XprobeBot XprobeBot modified the milestones: v0.7.0, v0.7.1 Oct 23, 2023
@XprobeBot XprobeBot modified the milestones: v0.7.1, v0.7.2 Nov 21, 2023
@XprobeBot XprobeBot modified the milestones: v0.7.2, v0.7.3 Jan 5, 2024
@XprobeBot XprobeBot modified the milestones: v0.7.3, v0.7.4 Aug 22, 2024
@luweizheng luweizheng removed this from the v0.7.4 milestone Dec 16, 2024