
BUG: Impossible creation of array with dtype=string #61263


Open
wants to merge 19 commits into base: main
Changes from 11 commits
3474043
DOC: Update warning in Index.values docstring to clarify index modifi…
Manju080 Mar 6, 2025
d070b06
DOC: Update warning in Index.values docstring to clarify index modifi…
Manju080 Mar 7, 2025
b00ba12
Update pandas/core/indexes/base.py
Manju080 Mar 8, 2025
390c8be
DOC : Fixing the whitespace which was causing error
Manju080 Mar 10, 2025
e58f383
Fixed docstring validation and formatting issues
Manju080 Mar 11, 2025
a505d35
BUG: Fix array creation for string dtype with inconsistent list lengt…
Manju080 Apr 9, 2025
6f5c4d4
BUG: Fix array creation for string dtype with inconsistent list lengt…
Manju080 Apr 9, 2025
fc4653d
BUG fix GH#61155 v2
Manju080 Apr 15, 2025
ae36cf7
BUG fix GH#61155 with test case for list of lists handling
Manju080 Apr 15, 2025
fb965e7
Fix formatting in test_string_array.py (pre-commit autofix)
Manju080 Apr 16, 2025
d3bbeaf
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 16, 2025
4bf8a07
Add test for list of lists handling in ensure_string_array (GH#61155)
Manju080 May 6, 2025
e81e1da
Merge branch 'bugfix-61155' of https://github.com/Manju080/pandas int…
Manju080 May 7, 2025
0ca4a18
fixing checks
Manju080 May 7, 2025
8a4a54d
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 8, 2025
90a74ef
Update pandas/tests/libs/test_lib.py
Manju080 May 8, 2025
4db751d
Remove pandas/tests/arrays/test_string_array.py as requested
Manju080 May 8, 2025
9979a8d
wrong fiel base.py
Manju080 May 8, 2025
71f7adc
Remove check for nested lists in scalars in string_.py first try
Manju080 May 8, 2025
5 changes: 4 additions & 1 deletion pandas/_libs/lib.pyx
@@ -769,7 +769,10 @@ cpdef ndarray[object] ensure_string_array(
             return out
         arr = arr.to_numpy(dtype=object)
     elif not util.is_array(arr):
-        arr = np.array(arr, dtype="object")
+        # GH#61155: Guarantee a 1-d result when array is a list of lists
+        input_arr = arr
+        arr = np.empty(len(arr), dtype="object")
+        arr[:] = input_arr

     result = np.asarray(arr, dtype="object")

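The effect of the lib.pyx change above can be reproduced with NumPy alone. The following is a minimal sketch, not the pandas implementation: `to_1d_object_array` is a hypothetical helper name introduced here for illustration. It shows that `np.array(..., dtype="object")` turns equal-length nested lists into a 2-d array, while pre-allocating with `np.empty` and slice-assigning keeps one slot per top-level element.

```python
import numpy as np


def to_1d_object_array(values):
    """Hypothetical helper: build a 1-d object array from a list-like.

    Pre-allocating fixes the result at one dimension, so nested lists
    are stored as elements instead of being unpacked into extra axes.
    """
    out = np.empty(len(values), dtype="object")
    out[:] = values
    return out


equal_length = [list("test"), list("word")]
ragged = [list("test"), list("words")]

# np.array infers a second dimension when the inner lists have equal length.
naive_equal = np.array(equal_length, dtype="object")
# With ragged inner lists and dtype=object, np.array stays 1-d.
naive_ragged = np.array(ragged, dtype="object")

# The pre-allocate-and-assign pattern is 1-d in both cases.
fixed_equal = to_1d_object_array(equal_length)
fixed_ragged = to_1d_object_array(ragged)

print(naive_equal.ndim, naive_ragged.ndim, fixed_equal.ndim, fixed_ragged.ndim)
```

The inconsistency between the two naive results is what makes the equal-length case fail downstream, since the string array code expects 1-d input.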
2 changes: 2 additions & 0 deletions pandas/core/arrays/string_.py
@@ -655,6 +655,8 @@ def _from_sequence(
                 #  zero_copy_only to True which caused problems see GH#52076
                 scalars = np.array(scalars)
             # convert non-na-likes to str, and nan-likes to StringDtype().na_value
+            if isinstance(scalars, list) and all(isinstance(x, list) for x in scalars):
+                scalars = [str(x) for x in scalars]
Member

@rhshadrach Apr 13, 2025

Instead of this, can you modify ensure_string_array in pandas/_libs/lib.pyx as follows? Instead of

elif not util.is_array(arr):
    arr = np.array(arr, dtype="object")

do

elif not util.is_array(arr):
    # GH#61155: Guarantee a 1-d result when array is a list of lists
    input_arr = arr
    arr = np.empty(len(arr), dtype="object")
    arr[:] = input_arr

This will have almost no performance impact.

Contributor Author

Thank you very much for the suggestions. I have made the necessary changes as per the guidance.

Member

Thanks for making the update. The changes in this file should now be reverted.

Member

I think this still needs to be reverted.

             result = lib.ensure_string_array(scalars, na_value=na_value, copy=copy)

         # Manually creating new array avoids the validation step in the __init__, so is
4 changes: 4 additions & 0 deletions pandas/core/indexes/base.py
@@ -4912,6 +4912,10 @@ def values(self) -> ArrayLike:
            :meth:`Index.to_numpy`, depending on whether you need
            a reference to the underlying data or a NumPy array.

+        .. versionchanged:: 3.0.0
+
+            The returned array is read-only.
+
         Returns
         -------
         array: numpy.ndarray or ExtensionArray
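To illustrate what "read-only" means for a NumPy-backed result, here is a small sketch that uses only NumPy's `writeable` flag. It does not call pandas, and the variable names are assumptions made for the example. Writing into a read-only view raises `ValueError`, while a copy can be modified freely.

```python
import numpy as np

values = np.arange(3)

# Create a view of the same data and mark it read-only.
readonly = values.view()
readonly.flags.writeable = False

try:
    readonly[0] = 10
    write_failed = False
except ValueError:
    # NumPy refuses assignment into a read-only array.
    write_failed = True

# A copy owns its own data and is writeable again.
writable_copy = readonly.copy()
writable_copy[0] = 10

print(write_failed, values[0], writable_copy[0])
```

Code that needs to mutate such a result would therefore take an explicit copy first, which also leaves the original data untouched.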
4 changes: 4 additions & 0 deletions pandas/tests/arrays/test_string_array.py
@@ -0,0 +1,4 @@
+import pandas as pd
+
+print(pd.array([list("test"), list("words")], dtype="string"))
+print(pd.array([list("test"), list("word")], dtype="string"))
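The two inputs in this file differ only in whether the inner lists have equal length. As a rough, NumPy-only emulation of the intended behaviour (assuming each non-missing element is converted with `str`, as the comment in string_.py describes), the sketch below stringifies each top-level element. `stringify_elements` is a hypothetical name, it ignores missing-value handling, and it is not the pandas code path.

```python
import numpy as np


def stringify_elements(values):
    """Hypothetical emulation: one string per top-level element.

    Missing-value handling is deliberately left out of this sketch.
    """
    # Keep the input 1-d even when the inner lists have equal length.
    arr = np.empty(len(values), dtype="object")
    arr[:] = values
    return np.array([str(x) for x in arr], dtype="object")


ragged_result = stringify_elements([list("test"), list("words")])
equal_result = stringify_elements([list("test"), list("word")])

print(ragged_result)
print(equal_result)
```

Both calls return a 1-d array of length 2, so the equal-length case no longer behaves differently from the ragged one.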