
fix: Add fallback for incompatible string concatenation in pandas #3548

Open
MarcoGorelli wants to merge 3 commits into narwhals-dev:main from
MarcoGorelli:binary-pyarrow-comparisons

Conversation

@MarcoGorelli
Member

closes #3546

Description

What type of PR is this? (check all applicable)

  • 💾 Refactor
  • ✨ Feature
  • 🐛 Bug Fix
  • 🔧 Optimization
  • 📝 Documentation
  • ✅ Test
  • 🐳 Other

Related issues

  • Related issue #<issue number>
  • Closes #<issue number>

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

@dangotbanned
Member

dangotbanned commented Apr 14, 2026

Note

Edited to increase the range

Have you thought about handling the dtypes somewhere around here instead?

def concat_str(
    self, *exprs: PandasLikeExpr, separator: str, ignore_nulls: bool
) -> PandasLikeExpr:
    string = self._version.dtypes.String()
    def func(df: PandasLikeDataFrame) -> list[PandasLikeSeries]:
        expr_results = [s for _expr in exprs for s in _expr(df)]
        series = [s.cast(string) for s in expr_results]

E.g. you could do 2 passes where:

  1. Collect all the native dtypes
  2. Decide on what's needed to make them compatible & then do the casts at the native-level
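A minimal sketch of the two-pass idea, assuming every input has already been cast to *some* string dtype and that dtypes are identified by plain string labels modelled loosely on pandas dtype names. The precedence order and the helpers `common_string_dtype` and `cast_to_common` are hypothetical illustrations, not narwhals code.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")

# Hypothetical precedence: the first label found among the inputs wins.
# Both the labels and their ordering are assumptions for this sketch.
STRING_DTYPE_PRECEDENCE = ("string[pyarrow]", "string[python]", "object")


def common_string_dtype(native_dtypes: Sequence[str]) -> str:
    """Pick one target dtype that every input column can be cast to."""
    for candidate in STRING_DTYPE_PRECEDENCE:
        if candidate in native_dtypes:
            return candidate
    raise ValueError(f"no known string dtype among {list(native_dtypes)}")


def cast_to_common(
    columns: Sequence[T],
    get_dtype: Callable[[T], str],
    cast: Callable[[T, str], T],
) -> list[T]:
    # Pass 1: collect every native dtype before touching any column.
    dtypes = [get_dtype(col) for col in columns]
    # Pass 2: decide on a single target, then cast only the columns that differ.
    target = common_string_dtype(dtypes)
    return [
        col if dtype == target else cast(col, target)
        for col, dtype in zip(columns, dtypes)
    ]
```

With real pandas objects, `cast` could wrap `Series.astype`; how to label the dtypes (for example, telling `pd.StringDtype("pyarrow")` apart from `pd.ArrowDtype(pa.string())`) is exactly the decision the second pass would have to make.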

This is reminding me a bit of #3398 (comment) 😄

@MarcoGorelli
Member Author

not sure about doing it for all ops tbh, but for string concatenation at least it seems to be necessary so i'd suggest starting with that, we can make it more generally available later if necessary

MarcoGorelli marked this pull request as ready for review April 14, 2026 15:19
@dangotbanned
Member

> not sure about doing it for all ops tbh, but for string concatenation at least it seems to be necessary so i'd suggest starting with that, we can make it more generally available later if necessary

Sorry if I wasn't clear, I was talking about concat_str - have edited (#3548 (comment))

@MarcoGorelli
Member Author

we'd then need to do the same casts for __add__, so i think it's cleaner to just do it in one place?
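A toy sketch of the "one place" argument, assuming plain Python lists stand in for native columns and `None` stands in for null. The functions `_to_common_string`, `concat_str` and `add` here are hypothetical illustrations, not the narwhals implementations; the point is that string `+` can route through the same code path as `concat_str` with `separator=""`, so any dtype fix only has to live once.

```python
from typing import Optional, Sequence

# Toy stand-in for a native string column; None plays the role of null.
Column = list[Optional[str]]


def _to_common_string(column: Sequence[object]) -> Column:
    # Hypothetical single home for dtype fixes. Here it just turns every
    # non-null value into a Python str.
    return [None if value is None else str(value) for value in column]


def concat_str(
    *columns: Sequence[object], separator: str, ignore_nulls: bool = False
) -> Column:
    coerced = [_to_common_string(col) for col in columns]
    result: Column = []
    for row in zip(*coerced):
        if not ignore_nulls and None in row:
            result.append(None)  # any null in the row makes the result null
        else:
            # Skip nulls and join whatever values remain.
            result.append(separator.join(v for v in row if v is not None))
    return result


def add(left: Sequence[object], right: Sequence[object]) -> Column:
    # String `+` is concat_str with an empty separator, so it reuses the
    # same coercion rather than duplicating it.
    return concat_str(left, right, separator="")
```

For example, `add(["a", "b"], ["x", "y"])` gives `["ax", "by"]`, and `concat_str(["a", None], ["x", "y"], separator="-")` gives `["a-x", None]`.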

@dangotbanned
Member

> we'd then need to do the same casts for __add__

I was a bit confused by this, I had no idea polars did it too 😂

>>> import polars as pl
>>> pl.Series(["a", "b", "c"]) + pl.Series(["d", "e", "f"])
shape: (3,)
Series: '' [str]
[
	"ad"
	"be"
	"cf"
]



Development

Successfully merging this pull request may close these issues.

[Bug]: Concatenating nw.lit('string') column with PyArrow string column raises TypeError

2 participants