feat: Add {Expr,Series}.str.to_time #3538

Conversation
Some backends natively allow casting directly from strings to time types; others must step through a datetime (for parsing). These are exposed in Narwhals as the `nw.Time()` type.

Supported backends: pyarrow, pandas[pyarrow], duckdb, ibis, polars

Unsupported backends:
- pandas & dask: no native time type
- _spark_like: time type is not widely used; Spark has a native time type, but sqlframe does not
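For backends without a native string-to-time cast, the "step through a datetime" fallback can be sketched with the standard library (the helper name and format default here are illustrative, not the Narwhals implementation):

```python
from datetime import datetime, time


def to_time_via_datetime(values: list[str], fmt: str = "%H:%M:%S") -> list[time]:
    # Parse each string into a full datetime first, then drop the date
    # component -- mirroring backends that must round-trip through a
    # datetime dtype because they lack a direct string-to-time cast.
    return [datetime.strptime(v, fmt).time() for v in values]


print(to_time_via_datetime(["12:59:34", "01:02:03"]))
# [datetime.time(12, 59, 34), datetime.time(1, 2, 3)]
```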
Test Failures

@MarcoGorelli some failures are due to an older version of Polars not auto-inferring the %H:%M format, which was added in pola-rs/polars#22606 (released in Polars 1.30.0). Should we:
I went with option 2 from the above: xfail for this particular time format for versions of Polars < 1.30.0. Reverting is easy if we want to switch.
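The version gate behind that xfail can be sketched as a plain predicate (names here are hypothetical; the real test suite wires this into a pytest marker):

```python
def needs_xfail(polars_version: tuple[int, ...], fmt: str) -> bool:
    # Auto-inferral of the %H:%M time format landed in Polars 1.30.0
    # (pola-rs/polars#22606); only that format on older versions xfails.
    return fmt == "%H:%M" and polars_version < (1, 30, 0)


print(needs_xfail((1, 29, 0), "%H:%M"))     # True
print(needs_xfail((1, 30, 0), "%H:%M"))     # False
print(needs_xfail((1, 29, 0), "%H:%M:%S"))  # False
```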
On further thought, I think we should just xfail this. The glue code necessary here would be fragile, since Polars does the time-format inferral step within the Rust layer, making it hard to configure from our Python API. Also, auto-inferral of time formats is a nice-to-have. For some more details:
MREs

Polars <= 1.29.0 fails to auto-infer HH:MM (but successfully auto-infers HH:MM:SS):

❯ python << EOF
import polars as pl
print(f'{pl.__version__ = }')
print(pl.Series(['12:59:34']).str.to_time())
print(pl.Series(['12:59']).str.to_time())
EOF
pl.__version__ = '1.29.0'
shape: (1,)
Series: '' [time]
[
12:59:34
]
Traceback (most recent call last):
File "<stdin>", line 4, in <module>
File "/home/cameron/repos/opensource/narwhals-dev/.venv/lib/python3.14/site-packages/polars/series/utils.py", line 106, in wrapper
return s.to_frame().select_seq(f(*args, **kwargs)).to_series()
~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^
File "/home/cameron/repos/opensource/narwhals-dev/.venv/lib/python3.14/site-packages/polars/dataframe/frame.py", line 9682, in select_seq
return self.lazy().select_seq(*exprs, **named_exprs).collect(_eager=True)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
File "/home/cameron/repos/opensource/narwhals-dev/.venv/lib/python3.14/site-packages/polars/_utils/deprecation.py", line 93, in wrapper
return function(*args, **kwargs)
File "/home/cameron/repos/opensource/narwhals-dev/.venv/lib/python3.14/site-packages/polars/lazyframe/frame.py", line 2224, in collect
return wrap_df(ldf.collect(engine, callback))
~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
polars.exceptions.ComputeError: could not find an appropriate format to parse times, please define a format

Polars <= 1.29.0: supplying an explicit format is fine:

❯ python << EOF
import polars as pl
print(f'{pl.__version__ = }')
print(pl.Series(['12:59:34']).str.to_time())
print(pl.Series(['12:59']).str.to_time(format='%H:%M'))
EOF
pl.__version__ = '1.29.0'
shape: (1,)
Series: '' [time]
[
12:59:34
]
shape: (1,)
Series: '' [time]
[
12:59:00
]
Polars > 1.29.0 successfully auto-infers HH:MM (as well as HH:MM:SS). Upgrading to a more recent Polars:

❯ uv pip install polars --upgrade
Resolved 2 packages in 119ms
Prepared 1 package in 0.21ms
Uninstalled 1 package in 3ms
Installed 1 package in 4ms
- polars==1.29.0
+ polars==1.39.3
❯ python << EOF
import polars as pl
print(f'{pl.__version__ = }')
print(pl.Series(['12:59:34']).str.to_time())
print(pl.Series(['12:59']).str.to_time())
EOF
pl.__version__ = '1.39.3'
shape: (1,)
Series: '' [time]
[
12:59:34
]
shape: (1,)
Series: '' [time]
[
12:59:00
]
Unfortunately `.str.to_datetime` doesn't auto-infer the time formats either:

❯ python << EOF
import polars as pl
print(f'{pl.__version__ = }')
print(pl.Series(['12:59:34']).str.to_datetime())
print(pl.Series(['12:59']).str.to_datetime())
EOF
pl.__version__ = '1.39.3'
Traceback (most recent call last):
File "<stdin>", line 3, in <module>
File "/home/cameron/repos/opensource/narwhals-dev/.venv/lib/python3.14/site-packages/polars/series/string.py", line 168, in to_datetime
self._s.str_to_datetime_infer(
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
time_unit,
^^^^^^^^^^
...<2 lines>...
ambiguous_s._s,
^^^^^^^^^^^^^^^
)
^
polars.exceptions.ComputeError: could not find an appropriate format to parse dates, please define a format
yup definitely
is it ok to implement it for spark but xfail for sqlframe, and raise a feature request to sqlframe?
Just added a commit for Spark's new time type.

In fact, I even had to "opt-in" to Spark's support for our tests, so I'm not sure this is something we want to expose to users just yet.
ah hadn't realised yet, thanks for explaining - ok to keep it unimplemented for spark then |
This reverts commit 917b462.
Once this lands, I can rebase a different branch that has the pyspark-relevant code, which we can hold onto in a draft PR for posterity.
Thanks @camriddell 🚀 it looks great! Opened a discussion for what we should do in the pandas-like case and a tiny suggestion for the docstrings
if not is_dtype_pyarrow(self.native.dtype):
    msg = (
        "This operation requires a pyarrow-backed series. "
        "Please refer to https://narwhals-dev.github.io/narwhals/api-reference/narwhals/#narwhals.maybe_convert_dtypes "
        "and ensure you are using dtype_backend='pyarrow'. "
        "Additionally, make sure you have pandas version 1.5+ and pyarrow installed."
    )
    raise TypeError(msg)
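In isolation, the dtype guard quoted above amounts to detecting a pyarrow-backed pandas dtype. A rough stdlib-only sketch (the string-suffix check is an assumption based on pyarrow-backed pandas dtypes rendering like "int64[pyarrow]"; the real `is_dtype_pyarrow` helper lives in Narwhals internals):

```python
def looks_pyarrow_backed(dtype_str: str) -> bool:
    # pandas ArrowDtype instances stringify with a "[pyarrow]" suffix,
    # e.g. "int64[pyarrow]" or "string[pyarrow]" (assumption for this sketch).
    return dtype_str.endswith("[pyarrow]")


print(looks_pyarrow_backed("string[pyarrow]"))  # True
print(looks_pyarrow_backed("object"))           # False
```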
Should we follow the same "pattern" that we use when converting from any pandas-like dtype to one that is supported only with pyarrow instead of directly requiring that the dtype is already pyarrow-backed?
I refer to: `narwhals/narwhals/_pandas_like/utils.py`, lines 549 to 563 in 13e6024.
I was following the pattern in `_pandas_like...str.split`.
Shall I change both this instance and the one in my PR to follow the reference in `_pandas_like.utils.narwhals_to_native_arrow_dtype`?
Thanks for pointing that out! I guess it's some inconsistency!
I would be ok with casting on behalf of the user in these cases, but happy to hear what others think about this.
Updated; I let the mechanics of `.cast(Time)` handle the auto-conversion after stepping through the datetime dtype.
I also added an xfail for testing environments that are running the non-pyarrow pandas-like constructors but do not have pyarrow installed/available.
Co-authored-by: Francesco Bruzzesi <42817048+FBruzzesi@users.noreply.github.com>
FBruzzesi left a comment:
Thanks for adjusting @camriddell - just a minor comment on how to skip tests for old pandas, and I think we can merge once addressed!
@MarcoGorelli was just thinking that I could leave the
FBruzzesi left a comment:
Thanks for bearing with me @camriddell - It's a +1 from me!
