Incorrect PolarsInefficientMapWarning for string containment with in #17182

@henryharbeck

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl

string = "qwerty"
substring = "we"

df = pl.DataFrame({"substring": [substring]})

df.select(
    pl.col("substring").map_elements(
        lambda substring: substring in string,
        return_dtype=pl.Boolean,
    )
)

Originally raised on Discord.

Somewhat related to #14055 as it also reports issues with different operator behaviour based on input types.
If you believe this is too closely related to that issue, feel free to close.

Log output

<ipython-input-40-a1f88d5d4bbb>:7: PolarsInefficientMapWarning: 
Expr.map_elements is significantly slower than the native expressions API.
Only use if you absolutely CANNOT implement your logic otherwise.
Replace this expression...
  - pl.col("substring").map_elements(lambda substring: ...)
with this one instead:
  + pl.col("substring").is_in(string)

Issue description

When PolarsInefficientMapWarning builds its suggested replacement, Python's in operator should be translated to Expr.str.contains(...) rather than Expr.is_in(...) when the right operand of substring in string is a str.
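
For context, a small plain-Python illustration of why the type of the right operand matters. The variable names below are illustrative only and are not part of the report:

```python
substring = "we"

# Right operand is a str: `in` performs a substring test.
in_str = substring in "qwerty"

# Right operand is a list: `in` performs an element membership test,
# which is the behaviour that is_in mirrors.
in_list = substring in ["qwerty"]
in_list_exact = substring in ["we", "qwerty"]

print(in_str, in_list, in_list_exact)  # True False True
```

The same operator therefore needs a different native expression depending on what sits on its right-hand side.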

Expected behavior

Provide a correct suggestion as part of the warning. The current suggestion raises an error.

Installed versions

--------Version info---------
Polars:               0.20.31
Index type:           UInt32
Platform:             Linux-6.1.85+-x86_64-with-glibc2.35
Python:               3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          2.2.1
connectorx:           <not installed>
deltalake:            <not installed>
fastexcel:            <not installed>
fsspec:               2023.6.0
gevent:               <not installed>
hvplot:               <not installed>
matplotlib:           3.7.1
nest_asyncio:         1.6.0
numpy:                1.25.2
openpyxl:             3.1.4
pandas:               2.0.3
pyarrow:              14.0.2
pydantic:             2.7.4
pyiceberg:            <not installed>
pyxlsb:               <not installed>
sqlalchemy:           2.0.31
torch:                2.3.0+cu121
xlsx2csv:             <not installed>
xlsxwriter:           <not installed>

Metadata

Labels: P-medium, black magic, bug, python
Status: Ready