
Releases: pola-rs/polars

Python Polars 1.38.0

04 Feb 12:01
e1612c2


⚠️ Deprecations

  • Deprecate retries=n in favor of storage_options={"max_retries": n} (#26155)
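
A minimal before/after sketch of this deprecation, using `scan_parquet` as an illustrative scan function (the S3 path is a placeholder):

```python
import polars as pl

# Deprecated: passing `retries` directly to the scan function.
# lf = pl.scan_parquet("s3://my-bucket/data/*.parquet", retries=3)

# Preferred as of 1.38: configure retries via storage_options (#26155).
lf = pl.scan_parquet(
    "s3://my-bucket/data/*.parquet",
    storage_options={"max_retries": 3},
)
```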

🚀 Performance improvements

  • Enable zero-copy object_store put upload for IPC sink (#26288)
  • Resolve file schemas and metadata concurrently (#26325)
  • Run elementwise CSEE for the streaming engine (#26278)
  • Disable morsel splitting for fast-count on streaming engine (#26245)
  • Implement streaming decompression for scan_ndjson and scan_lines (#26200)
  • Improve string slicing performance (#26206)
  • Refactor scan_delta to use python dataset interface (#26190)
  • Add dedicated kernel for group-by arg_max/arg_min (#26093)
  • Add streaming merge-join (#25964)
  • Generalize Bitmap::new_zeroed opt for Buffer::zeroed (#26142)
  • Reduce fs stat calls in path expansion (#26173)
  • Lower streaming group_by n_unique to unique().len() (#26109)

✨ Enhancements

  • Avoid OOM for scan_ndjson and scan_lines if input is compressed and negative slice (#26396)
  • Support anonymous agg in-mem (#26376)
  • Add unstable arrow_schema parameter to sink_parquet (#26323)
  • Improve error message formatting for structs (#26349)
  • Remove parquet field overwrites (#26236)
  • Enable zero-copy object_store put upload for IPC sink (#26288)
  • Improved disambiguation for qualified wildcard columns in SQL projections (#26301)
  • Expose upload_concurrency through env var (#26263)
  • Allow quantile to compute multiple quantiles at once (#25516); see the sketch after this list
  • Allow empty LazyFrame in LazyFrame.group_by(...).map_groups (#26275)
  • Use delta file statistics for batch predicate pushdown (#26242)
  • Add streaming UnorderedUnion (#26240)
  • Implement compression support for sink_ndjson (#26212)
  • Add unstable record batch statistics flags to {sink/scan}_ipc (#26254)
  • Support CSE for python UDFs on the same address (#26253)
  • Cloud retry/backoff configuration via storage_options (#26204)
  • Use same sort order for expanded paths across local / cloud / directory / glob (#26191)
  • Add streaming merge-join (#25964)
  • Serialize optimization flags for cloud plan (#26168)
  • Add compression support to write_csv and sink_csv (#26111)
  • Add scan_lines (#26112)
  • Support regex in str.split (#26060)
  • Add unstable IPC Statistics read/write to scan_ipc/sink_ipc (#26079)
  • Add unstable height parameter to DataFrame/LazyFrame (#26014)
  • Remove old partition sink API (#26100)
  • Expose ArrowStreamExportable on python collect batches iterator (#26074)
  • Add nulls support for all rolling_by operations (#26081)
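
A hedged sketch of the multi-quantile support called out above (#25516), assuming `Expr.quantile` now accepts a sequence of quantile values; the exact output shape may differ:

```python
import polars as pl

df = pl.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 5.0]})

# Assumed usage per #25516: request several quantiles in a single expression
# instead of one .quantile() call per value.
out = df.select(pl.col("x").quantile([0.25, 0.5, 0.75]))
print(out)
```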

🐞 Bug fixes

  • Correct off-by-one in RLE row counting for nullable dictionary-encoded columns (#26411)
  • Support very large integers in env var limits (#26399)
  • Fix PlPath panic from incorrect slicing of UTF8 boundaries (#26389)
  • Fix Float dtype for spearman correlation (#26392)
  • Fix optimizer panic in right joins with type coercion (#26365)
  • Don't serialize retry config from local environment vars (#26289)
  • Fix PartitionBy with scalar key expressions and diff() (#26370)
  • Add {Float16, Float32} -> Float32 lossless upcast (#26373)
  • Fix panic using with_columns and collect_all (#26366)
  • Add multi-page support for writing dictionary-encoded Parquet columns (#26360)
  • Ensure slice advancement when skipping non-inlinable values in is_in with inlinable needles (#26361)
  • Pin xlsx2csv version temporarily (#26352)
  • Bugs in ViewArray total_bytes_len (#26328)
  • Overflow in i128::abs in Decimal fits check (#26341)
  • Make Expr.hash on Categorical mapping-independent (#26340)
  • Clone shared GroupBy node before mutation in physical plan creation (#26327)
  • Fixed "sheet_name" typing for read_ods and read_excel (#26317)
  • Improve Polars dtype inference from Python Union typing (#26303)
  • Consider the "current location" of an item when computing rolling_rank_by (#26287)
  • Reset is_count_star flag between queries in collect_all (#26256)
  • Fix incorrect is_between filter on scan_parquet (#26284)
  • Make polars compatible with ty (#26270)
  • Lower AnonymousStreamingAgg in group-by as aggregate (#26258)
  • Avoid overflow in pl.duration scalar arguments case (#26213)
  • Broadcast arr.get on single array with multiple indices (#26219)
  • Fix panic on CSPE with sorts (#26231)
  • Eager DataFrame.slice with negative offset and length=None (#26215)
  • Use correct schema side for streaming merge join lowering (#26218)
  • Overflow panic in scan_csv with multiple files and skip_rows + n_rows larger than total row count (#26128)
  • Respect allow_object flag after cache (#26196)
  • Raise error on non-elementwise PartitionBy keys (#26194)
  • Allow ordered categorical dictionary in scan_parquet (#26180)
  • Allow excess bytes on IPC bitmap compressed length (#26176)
  • Address a macOS-specific compile issue (#26172)
  • Fix deadlock on hash_rows() of 0-width DataFrame (#26154)
  • Fix NameError filtering pyarrow dataset (#26166)
  • Fix concat_arr panic when using categoricals/enums (#26146)
  • Fix NDJSON/scan_lines negative slice splitting with extremely long lines (#26132)
  • Incorrect group_by min/max fast path (#26139)
  • Remove a source of non-determinism from lowering (#26137)
  • Error when with_row_index or unpivot create duplicate columns on a LazyFrame (#26107)
  • Panics on shift with head (#26099)

📖 Documentation

  • Fix Expr.get referencing incorrect dtype for index parameter (#26364)
  • Fix Expr.quantile formatting (#26351)
  • Drop sphinx-llms-txt extension (#26285)
  • Remove deprecated cublet_id (#26260)
  • Update for new release (#26255)
  • Update MCP server section with new URL (#26241)
  • Fix unmatched paren and punctuation in pandas migration guide (#26251)
  • Add observatory database_path to docs (#26201)
  • Note plugins in Python user-defined functions (#26138)

📦 Build system

  • Address remaining Python 3.14 issues with make requirements-all (#26195)
  • Address a macOS-specific compile issue (#26172)

🛠️ Other improvements

  • Ensure local doctests skip from_torch if module not installed (#26405)
  • Change linked timezones in test suite to canonical timezones (#26310)
  • Implement various deprecations (#26314)
  • Rename Operator::Divide to RustDivide (#26339)
  • Properly disable the Pyodide tests (#26382)
  • Remove unused field (#26367)
  • Fix runtime nesting (#26359)
  • Remove xlsx2csv dependency pin (#26355)
  • Use outer runtime if exists in to_alp (#26353)
  • Make CategoricalMapping::new pub(crate) to avoid misuse (#26308)
  • Clarify IPC buffer read limit/length parameter (#26334)
  • Add dtype test coverage for delta predicate filter (#26291)
  • Add AI policy (#26286)
  • Unpin "pandas<3" in dev dependencies (#26249)
  • Remove all non CSV fast-count paths (#26233)
  • Pin pandas to 2.x for now (#26221)
  • Remove unnecessary xfail (#26199)
  • Ensure optimization flag modification happens locally (#26185)
  • Simplify IcebergDataset (#26165)
  • Reorganize unit tests into logical subdirectories (#26149)
  • Lint leftover fixme (#26122)
  • Improve backtrace for POLARS_PANIC_ON_ERR (#26125)
  • Fix Python docs build (#26117)
  • Disable unused-ignore mypy lint (#26110)
  • Ignore mypy warning (#26105)
  • Raise error on file://hostname/path (#26061)
  • Disable debug info for docs workflow (#26086)
  • Update docs for next polars cloud release (#26091)
  • Support Python 3.14 in dev environment (#26073)

Thank you to all our contributors for making this release possible!
@Atarust, @EndPositive, @Kevin-Patyk, @LeeviLindgren, @MarcoGorelli, @Matt711, @MrAttoAttoAtto, @Voultapher, @WaffleLapkin, @agossard, @alex-gregory-ds, @alexander-beedie, @azimafroozeh, @bayoumi17m, @c-peters, @carnarez, @dependabot[bot], @dsprenkels, @hallmason17, @hamdanal, @ion-elgreco, @kdn36, @lun3x, @mcrumiller, @nameexhaustion, @orlp, @qxzcode, @r-brink, @ritchie46 and @sweb

Python Polars 1.37.1

12 Jan 23:27
bb79993


🚀 Performance improvements

  • Speed up SQL interface "UNION" clauses (#26039)

🐞 Bug fixes

  • Optimize slicing support on compressed IPC (#26071)
  • CPU check for musl builds (#26076)
  • Propagate C Stream import errors instead of panicking (#26036)
  • Fix slicing on compressed IPC (#26066)

📖 Documentation

  • Clarify min_by/max_by behavior on ties (#26077)

🛠️ Other improvements

  • Mark top slow normal tests as slow (#26080)
  • Update breaking deps (#26055)
  • Fix for upstream url bug and update deps (#26052)
  • Properly pin chrono (#26051)
  • Don't run rust doctests (#26046)
  • Update deps (#26042)
  • Ignore very slow test (#26041)

Thank you to all our contributors for making this release possible!
@Voultapher, @alexander-beedie, @kdn36, @nameexhaustion, @orlp, @ritchie46 and @wtn

Python Polars 1.37.0

10 Jan 12:28
1674b37


🚀 Performance improvements

  • Speed up SQL interface "ORDER BY" clauses (#26037)
  • Add fast kernel for is_nan and use it for numpy NaN->null conversion (#26034)
  • Optimize ArrayFromIter implementations for ObjectArray (#25712)
  • New streaming NDJSON sink pipeline (#25948)
  • New streaming CSV sink pipeline (#25900)
  • Dispatch partitioned usage of sink_* functions to new-streaming by default (#25910)
  • Replace ryu with faster zmij (#25885)
  • Reduce memory usage for .item() count in grouped first/last (#25787)
  • Skip schema inference if schema provided for scan_csv/ndjson (#25757)
  • Add width-aware chunking to prevent degradation with wide data (#25764)
  • Use new sink pipeline for write/sink_ipc (#25746)
  • Reduce memory usage when scanning multiple parquet files in streaming (#25747)
  • Don't call cluster_with_columns optimization if not needed (#25724)

✨ Enhancements

  • Add new pl.PartitionBy API (#26004)
  • ArrowStreamExportable and sink_delta (#25994)
  • Release musl builds (#25894)
  • Implement streaming decompression for CSV COUNT(*) fast path (#25988)
  • Add nulls support for rolling_mean_by (#25917)
  • Add lazy collect_all (#25991)
  • Add streaming decompression for NDJSON schema inference (#25992)
  • Improved handling of unqualified SQL JOIN columns that are ambiguous (#25761)
  • Drop Python 3.9 support (#25984)
  • Expose record batch size in {sink,write}_ipc (#25958)
  • Add null_on_oob parameter to expr.get (#25957)
  • Suggest correct timezone if timezone validation fails (#25937)
  • Support streaming IPC scan from S3 object store (#25868)
  • Implement streaming CSV schema inference (#25911)
  • Support hashing of meta expressions (#25916)
  • Improve SQLContext recognition of possible table objects in the Python globals (#25749)
  • Add pl.Expr.(min|max)_by (#25905); see the sketch after this list
  • Improve MemSlice Debug impl (#25913)
  • Implement or fix json encode/decode for (U)Int128, Categorical, Enum, Decimal (#25896)
  • Expand scatter to more dtypes (#25874)
  • Implement streaming CSV decompression (#25842)
  • Add Series sql method for API consistency (#25792)
  • Mark Polars as safe for free-threading (#25677)
  • Support Binary and Decimal in arg_(min|max) (#25839)
  • Allow Decimal parsing in str.json_decode (#25797)
  • Add shift support for Object data type (#25769)
  • Add missing Series.arr.mean (#25774)
  • Allow scientific notation when parsing Decimals (#25711)
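
A hedged sketch of the new `min_by`/`max_by` expressions noted above (#25905), assuming an `Expr.max_by(by)` signature that returns the value of the calling expression at the row where `by` is maximal (the 1.37.1 notes above clarify tie behaviour):

```python
import polars as pl

df = pl.DataFrame({
    "sensor": ["a", "a", "b", "b"],
    "ts": [1, 2, 1, 2],
    "value": [10.0, 12.0, 7.0, 5.0],
})

# Assumed usage per #25905: take `value` at the row where `ts` is largest
# within each group, i.e. the most recent reading per sensor.
out = df.group_by("sensor").agg(pl.col("value").max_by("ts"))
print(out)
```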

🐞 Bug fixes

  • Release GIL on collect_batches (#26033)
  • Missing buffer update in String is_in Parquet pushdown (#26019)
  • Make struct.with_fields data model coherent (#25610)
  • Incorrect output order for order sensitive operations after join_asof (#25990)
  • Use SeriesExport for pyo3-polars FFI (#26000)
  • Add pl.Schema to type signature for DataFrame.cast (#25983)
  • Don't write Parquet min/max statistics for i128 (#25986)
  • Ensure chunk consistency in in-memory join (#25979)
  • Fix varying block metadata length in IPC reader (#25975)
  • Implement collect_batches properly in Rust (#25918)
  • Fix panic on arithmetic with bools in list (#25898)
  • Convert to index type with strict cast in some places (#25912)
  • Empty dataframe in streaming non-strict hconcat (#25903)
  • Infer large u64 in json as i128 (#25904)
  • Set http client timeouts to 10 minutes (#25902)
  • Correct lexicographic ordering for Parquet BYTE_ARRAY statistics (#25886)
  • Raise error on duplicate group_by names in upsample() (#25811)
  • Correctly export view buffer sizes nested in Extension types (#25853)
  • Fix DataFrame.estimated_size not handling overlapping chunks correctly (#25775)
  • Ensure Kahan sum does not introduce NaN from infinities (#25850)
  • Trim excess bytes in parquet decode (#25829)
  • Fix panic/deadlock sinking parquet with rows larger than 64MB estimated size (#25836)
  • Fix quantile midpoint interpolation (#25824)
  • Don't use cast when converting from physical in list.get (#25831)
  • Invalid null count on int -> categorical cast (#25816)
  • Update groups in list.eval (#25826)
  • Use downcast before FFI conversion in PythonScan (#25815)
  • Double-counting of row metrics (#25810)
  • Cast nulls to expected type in streaming union node (#25802)
  • Incorrect slice pushdown into map_groups (#25809)
  • Fix panic writing parquet with single bool column (#25807)
  • Fix upsample with group_by incorrectly introducing NULLs on group key columns (#25794)
  • Panic in top_k pruning (#25798)
  • Fix incorrect collect_schema for unpivot followed by join (#25782)
  • Verify arr namespace is called from array column (#25650)
  • Ensure LazyFrame.serialize() unchanged after collect_schema() (#25780)
  • Function map_(rows|elements) with return_dtype = pl.Object (#25753)
  • Fix incorrect cargo sub-feature (#25738)

📖 Documentation

  • Fix display of deprecation warning (#26010)
  • Document null behaviour for rank (#25887)
  • Add QUALIFY clause and SUBSTRING function to the SQL docs (#25779)
  • Update mixed-offset datetime parsing example in user guide (#25915)
  • Update bare-metal docs for mounted anonymous results (#25801)
  • Fix credential parameter name in cloud-storage.py (#25788)
  • Configuration options update (#25756)

🛠️ Other improvements

  • Update rust compiler (#26017)
  • Improve csv test coverage (#25980)
  • Ramp up CSV read size (#25997)
  • Mark lazy parameter to collect_all as unstable (#25999)
  • Update ruff action and simplify version handling (#25940)
  • Run python lint target as part of pre-commit (#25982)
  • Disable HTTP timeout for receiving response body (#25970)
  • Fix mypy lint (#25963)
  • Add AI contribution policy (#25956)
  • Fix failing scan delta S3 test (#25932)
  • Improve MemSlice Debug impl (#25913)
  • Remove and deprecate batched csv reader (#25884)
  • Remove unused AnonymousScan functions (#25872)
  • Filter DeprecationWarning from pyparsing indirectly through pyiceberg (#25854)
  • Various small improvements (#25835)
  • Clear venv with appropriate version of Python (#25851)
  • Skip schema inference if schema provided for scan_csv/ndjson (#25757)
  • Ensure proper async connection cleanup on DB test exit (#25766)
  • Ensure we uninstall other Polars runtimes in CI (#25739)
  • Make 'make requirements' more robust (#25693)
  • Remove duplicate compression level types (#25723)

Thank you to all our contributors for making this release possible!
@AndreaBozzo, @EndPositive, @Kevin-Patyk, @MarcoGorelli, @Voultapher, @alexander-beedie, @anosrepenilno, @arlyon, @azimafroozeh, @carnarez, @dependabot[bot], @dsprenkels, @edizeqiri, @eitanf, @gab23r, @henryharbeck, @hutch3232, @ion-elgreco, @jqnatividad, @kdn36, @lun3x, @m1guelperez, @mcrumiller, @nameexhaustion, @orlp, @ritchie46, @sachinn854 and @yonikremer

Python Polars 1.36.1

10 Dec 01:15
2a151c1


🚀 Performance improvements

  • Tune partitioned sink_parquet cloud performance (#25687)

✨ Enhancements

  • Allow creation of Object literal (#25690)
  • Don't collect schema in SQL union processing (#25675)

🐞 Bug fixes

  • Don't invalidate node in cluster-with-columns (#25714)
  • Move boto3 extra from s3fs in dev requirements (#25667)
  • Add missing type stubs for bin_slice, bin_head, and bin_tail (#25697)
  • Binary slice methods missing from Series and docs (#25683)
  • Mix-up of variable_name/value_name in unpivot (#25685)
  • Invalid usage of drop_first in to_dummies when nulls present (#25435)

📖 Documentation

  • Fix typos in Excel and Pandas migration guides (#25709)
  • Add "right" to how options in join() docstrings (#25678)

🛠️ Other improvements

  • Move Object lit fix earlier in the function (#25713)
  • Remove unused decimal file (#25701)
  • Move boto3 extra from s3fs in dev requirements (#25667)
  • Upgrade to latest version of sqlparser-rs (#25673)
  • Update slab to version without RUSTSEC (#25686)
  • Fix typo (#25684)

Thank you to all our contributors for making this release possible!
@AndreaBozzo, @Kevin-Patyk, @alexander-beedie, @dsprenkels, @jamesfricker, @mcrumiller, @nameexhaustion, @orlp and @ritchie46

Python Polars 1.36.0

08 Dec 17:12
d28f504


🏆 Highlights

  • Add Extension types (#25322)

✨ Enhancements

  • Add SQL support for the QUALIFY clause (#25652); see the sketch after this list
  • Add bin.slice(), bin.head(), and bin.tail() methods (#25647)
  • Add SQL syntax support for CROSS JOIN UNNEST(col) (#25623)
  • Add separate env var to log tracked metrics (#25586)
  • Expose fields for generating physical plan visualization data (#25562)
  • Allow pl.Object in pivot value (#25533)
  • Minor improvement for as_struct repr (#25529)
  • Temporal quantile in rolling context (#25479)
  • Add quantile for missing temporals (#25464)
  • Add strict parameter to pl.concat(how='horizontal') (#25452)
  • Support decimals in search_sorted (#25450)
  • Expose and document pl.Categories (#25443)
  • Use reference to Graph pipes when flushing metrics (#25442)
  • Extend SQL UNNEST support to handle multiple array expressions (#25418)
  • Add SQL support for ROW_NUMBER, RANK, and DENSE_RANK functions (#25409)
  • Allow elementwise Expr.over in aggregation context (#25402)
  • Add SQL support for named WINDOW references (#25400)
  • Add leftmost option to str.replace_many / str.find_many / str.extract_many (#25398)
  • Automatically Parquet dictionary encode floats (#25387)
  • Support unique_counts for all datatypes (#25379)
  • Add maintain_order to Expr.mode (#25377)
  • Allow hash for all List dtypes (#25372)
  • Add empty_as_null and keep_nulls to {Lazy,Data}Frame.explode (#25369)
  • Display function of streaming physical plan map node (#25368)
  • Allow slice on scalar in aggregation context (#25358)
  • Allow implode and aggregation in aggregation context (#25357)
  • Move GraphMetrics into StreamingQuery (#25310)
  • Documentation on Polars Cloud manifests (#25295)
  • Add empty_as_null and keep_nulls flags to Expr.explode (#25289)
  • Allow Expr.unique on List/Array with non-numeric types (#25285)
  • Raise suitable error on non-integer "n" value for clear (#25266)
  • Allow Expr.rolling in aggregation contexts (#25258)
  • Allow bare .row() on a single-row DataFrame, equivalent to .item() on a single-element DataFrame (#25229)
  • Support additional forms of SQL CREATE TABLE statements (#25191)
  • Add support for Float16 dtype (#25185)
  • Support column-positional SQL "UNION" operations (#25183)
  • Add unstable Schema.to_arrow() (#25149)
  • Make DSL-hash skippable (#25140)
  • Improve error message on unsupported SQL subquery comparisons (#25135)
  • Support arbitrary expressions in SQL JOIN constraints (#25132)
  • Allow arbitrary expressions as the Expr.rolling index_column (#25117)
  • Set polars/ user-agent (#25112)
  • Support ewm_var/std in streaming engine (#25109)
  • Rewrite IR::Scan to IR::DataFrameScan in expand_datasets when applicable (#25106)
  • Add ignore_nulls to first / last (#25105)
  • Allow arbitrary Expressions in "subset" parameter of unique frame method (#25099)
  • Add BIT_NOT support to the SQL interface (#25094)
  • Streaming {Expr,LazyFrame}.rolling (#25058)
  • Add LazyFrame.pivot (#25016)
  • Add SQL support for LEAD and LAG functions (#23956)
  • Add having to group_by context (#23550)
  • Add show methods for DataFrame and LazyFrame (#19634)
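
A hedged sketch of the new SQL `QUALIFY` support noted above (#25652), combined with the `ROW_NUMBER` window function (#25409); this assumes `pl.sql` resolves `df` from the surrounding scope as usual, and the exact accepted syntax may differ:

```python
import polars as pl

df = pl.DataFrame({"grp": ["a", "a", "b", "b"], "x": [1, 3, 2, 5]})

# QUALIFY filters on window-function results after they are computed;
# here it keeps the row with the highest `x` per group (assumed per #25652).
out = pl.sql(
    """
    SELECT grp, x
    FROM df
    QUALIFY ROW_NUMBER() OVER (PARTITION BY grp ORDER BY x DESC) = 1
    """
).collect()
print(out)
```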

🚀 Performance improvements

  • Set parallelization threshold in take_unchecked_impl (#25672)
  • New single file IO sink pipeline enabled for sink_parquet (#25670)
  • Correct overly eager local predicate insertion for unpivot (#25644)
  • New partitioned IO sink pipeline enabled for sink_parquet (#25629)
  • Use strong hash instead of traversal for CSPE equality (#25537)
  • Reduce HuggingFace API calls (#25521)
  • Fix panic in is_between support in streaming Parquet predicate push down (#25476)
  • Faster kernels for rle_lengths (#25448)
  • Mark output of more non-order-maintaining ops as unordered (#25419)
  • Enable predicate expressions on unsigned integers (#25416)
  • Allow detecting plan sortedness in more cases (#25408)
  • Add parquet prefiltering for string regexes (#25381)
  • Fast find start window in group_by_dynamic with large offset (#25376)
  • Use fast path for agg_min/agg_max when nulls present (#25374)
  • Add streaming native LazyFrame.group_by_dynamic (#25342)
  • Fuse positive slice into streaming LazyFrame.rolling (#25338)
  • Mark Expr.reshape((-1,)) as row separable (#25326)
  • Return references from aexpr_to_leaf_names_iter (#25319)
  • Use bitmap instead of Vec in first/last w. skip_nulls (#25318)
  • Lazy gather for {forward,backward}_fill in group-by contexts (#25115)
  • Add streaming sorted Group-By (#25013)

🐞 Bug fixes

  • Rechunk on nested dtypes in take_unchecked_impl parallel path (#25662)
  • Fix streaming SchemaMismatch panic on list.drop_nulls (#25661)
  • Fix panic on Boolean rolling_sum calculation for list or array eval (#25660)
  • Fix "dtype is unknown" panic in cross joins with literals (#25658)
  • Fix panic edge-case when scanning hive partitioned data (#25656)
  • Fix "unreachable code" panic in UDF dtype inference (#25655)
  • Address potential "batch_size" parameter collision in scan_pyarrow_dataset (#25654)
  • Fix empty format handling (#25638)
  • Improve SQL GROUP BY and ORDER BY expression resolution, handling aliasing edge-cases (#25637)
  • Preserve List inner dtype during chunked take operations (#25634)
  • Fix lifetime for AmortSeries lazy group iterator (#25620)
  • Fix spearman panicking on nulls (#25619)
  • Properly resolve HAVING clause during SQL GROUP BY operations (#25615)
  • Prevent false positives in is_in for large integers (#25608)
  • Differentiate between empty list and no list for unpivot (#25597)
  • Bug in boolean unique_counts (#25587)
  • Hang in multi-chunk DataFrame .rows() (#25582)
  • Correct arr_to_any_value for object arrays (#25581)
  • Have PySeries::new_f16 receive pf16s instead of f32s (#25579)
  • Set Float16 parquet schema type to Float16 (#25578)
  • Fix incorrect .list.eval after slicing operations (#25540)
  • Strict conversion AnyValue to Struct (#25536)
  • Rolling mean/median for temporals (#25512)
  • Add .rolling_rank() support for temporal types and pl.Boolean (#25509)
  • Fix occurrence of exact matches of .join_asof(strategy="nearest", allow_exact_matches=False, ...) (#25506)
  • Always respect return_dtype in map_elements and map_rows (#25504)
  • Fix group lengths check in sort_by with AggregatedScalar (#25503)
  • Fix dictionary replacement error in write_ipc() (#25497)
  • Fix expr slice pushdown causing shape error on literals (#25485)
  • Allow empty list in sort_by in list.eval context (#25481)
  • Raise error on out-of-range dates in temporal operations (#25471)
  • Validate list.slice parameters are not lists (#25458)
  • Make sum on strings error in group_by context (#25456)
  • Prevent panic when joining sorted LazyFrame with itself (#25453)
  • Apply CSV dict overrides by name only (#25436)
  • Incorrect result in aggregated first/last with ignore_nulls (#25414)
  • Fix off-by-one bug in ColumnPredicates generation for inequalities operating on integer columns (#25412)
  • Use Cargo.template.toml to prevent git dependencies from using template (#25392)
  • Fix arr.{eval,agg} in aggregation context (#25390)
  • Support AggregatedList in list.{eval,agg} context (#25385)
  • Nested dtypes in streaming first_non_null/last_non_null (#25375)
  • Remove Expr casts in pl.lit invocations (#25373)
  • Optimize projection pushdown through HConcat (#25371)
  • Revert pl.format behavior with nulls (#25370)
  • Correct eq_missing for struct with nulls (#25363)
  • Resolve edge-case with SQL aggregates that have the same name as one of the GROUP BY keys (#25362)
  • Unique on literal in aggregation context (#25359)
  • Aggregation with drop_nulls on literal (#25356)
  • SQL NATURAL joins should coalesce the key columns (#25353)
  • Mark {forward,backward}_fill as length_preserving (#25352)
  • Correct drop_items for scalar input (#25351)
  • Schema mismatch with list.agg, unique and scalar (#25348)
  • AnyValue::to_physical for categoricals (#25341)
  • Bugs in pl.from_repr with signed exponential floats and line wrapping (#25331)
  • Remove ClosableFile (#25330)
  • Increase precision when constructing float Series (#25323)
  • Fix link errors reported by markdown-link-check (#25314)
  • Parquet is_in for mixed validity pages (#25313)
  • Fix building polars-plan with features lazy,concat_str (but no strings) (#25306)
  • Fix building polars-mem-engine with the async feature (#25300)
  • Nested dtypes in streaming first/last (#25298)
  • Fix length preserving check for eval expressions in streaming engine (#25294)
  • Panic exception when calling Expr.rolling in .over (#25283)
  • Don't quietly allow unsupported SQL SELECT clauses (#25282)
  • Reverse on chunked struct (#25281)
  • Correct {first,last}_non_null if there are empty chunks (#25279)
  • Incorrect results for aggregated {n_,}unique on bools (#25275)
  • Run async DB queries with regular asyncio if not inside a running loop (#25268)
  • Fix small bug with PyExpr to PyObject conversion (#25265)
  • Fix building polars-expr without timezones feature (#25254)
  • Correctly prune projected columns in hints (#25250)
  • Address multiple issues with SQL OVER clause behaviour for window functions (#25249)
  • Allow Null dtype values in scatter (#25245)
  • Make str.json_decode output deterministic with lists (#25240)
  • Correctly handle requested stops in streaming shift (#25239)
  • Use (i64, u64) for VisualizationData (offset, length) slices (#25203)
  • Fix serialization of lazyframes containing huge tables (#25190)
  • Fix single-column CSV header duplication with leading empty lines (#25186)
  • Enhanced column resolution/tracking through multi-way SQL joins (#25181)
  • Fix format_str in case of multiple chunks (#25162)
  • Handle some unusual pl.col.<colname> edge-cases (#25153)
  • Fix incorrect reshape on sliced lists (#25139)
  • Support "index" as column name in group_by iterator (#25138)
  • Fix panic in dt.truncate for invalid duration strings (#25124)
  • DSL_SCHEMA_HASH should not be changed by line endings (#25123)

Python Polars 1.36.0-beta.2

02 Dec 12:42
8035d57


Pre-release

🏆 Highlights

  • Add Extension types (#25322)

✨ Enhancements

  • Add SQL support for ROW_NUMBER, RANK, and DENSE_RANK functions (#25409)
  • Add SQL support for named WINDOW references (#25400)
  • Add BIT_NOT support to the SQL interface (#25094)
  • Add LazyFrame.pivot (#25016)
  • Add allow_empty flag to item (#25048)
  • Add empty_as_null and keep_nulls flags to Expr.explode (#25289)
  • Add empty_as_null and keep_nulls to {Lazy,Data}Frame.explode (#25369)
  • Add having to group_by context (#23550)
  • Add ignore_nulls to first / last (#25105)
  • Add maintain_order to Expr.mode (#25377)
  • Add quantile for missing temporals (#25464)
  • Add leftmost option to str.replace_many / str.find_many / str.extract_many (#25398)
  • Add strict parameter to pl.concat(how='horizontal') (#25452)
  • Add support for Float16 dtype (#25185)
  • Add unstable Schema.to_arrow (#25149)
  • Allow Expr.rolling in aggregation contexts (#25258)
  • Allow Expr.unique on List/Array with non-numeric types (#25285)
  • Allow glimpse to return a DataFrame (#24803)
  • Allow hash for all List dtypes (#25372)
  • Allow implode and aggregation in aggregation context (#25357)
  • Allow slice on scalar in aggregation context (#25358)
  • Allow arbitrary Expressions in "subset" parameter of unique frame method (#25099)
  • Allow arbitrary expressions as the Expr.rolling index_column (#25117)
  • Allow bare .row on a single-row DataFrame, equivalent to .item on a single-element DataFrame (#25229)
  • Allow elementwise Expr.over in aggregation context (#25402)
  • Allow pl.Object in pivot value (#25533)
  • Automatically Parquet dictionary encode floats (#25387)
  • Display function of streaming physical plan map node (#25368)
  • Documentation on Polars Cloud manifests (#25295)
  • Expose and document pl.Categories (#25443)
  • Expose fields for generating physical plan visualization data (#25562)
  • Extend SQL UNNEST support to handle multiple array expressions (#25418)
  • Improve SQL UNNEST behaviour (#22546)
  • Improve error message on unsupported SQL subquery comparisons (#25135)
  • Make DSL-hash skippable (#25140)
  • Minor improvement for as_struct repr (#25529)
  • Move GraphMetrics into StreamingQuery (#25310)
  • Raise suitable error on non-integer "n" value for clear (#25266)
  • Rewrite IR::Scan to IR::DataFrameScan in expand_datasets when applicable (#25106)
  • Set polars/ user-agent (#25112)
  • Streaming {Expr,LazyFrame}.rolling (#25058)
  • Support BYTE_ARRAY backed Decimals in Parquet (#25076)
  • Support ewm_var/std in streaming engine (#25109)
  • Support unique_counts for all datatypes (#25379)
  • Support additional forms of SQL "CREATE TABLE" statements (#25191)
  • Support arbitrary expressions in SQL JOIN constraints (#25132)
  • Support column-positional SQL "UNION" operations (#25183)
  • Support decimals in search_sorted (#25450)
  • Temporal quantile in rolling context (#25479)
  • Use reference to Graph pipes when flushing metrics (#25442)

🚀 Performance improvements

  • Add parquet prefiltering for string regexes (#25381)
  • Add streaming native LazyFrame.group_by_dynamic (#25342)
  • Add streaming sorted Group-By (#25013)
  • Allow detecting plan sortedness in more cases (#25408)
  • Don't recompute full rolling moment window when NaNs/nulls leave the window (#25078)
  • Enable predicate expressions on unsigned integers (#25416)
  • Fast find start window in group_by_dynamic with large offset (#25376)
  • Faster kernels for rle_lengths (#25448)
  • Fuse positive slice into streaming LazyFrame.rolling (#25338)
  • Lazy gather for {forward,backward}_fill in group-by contexts (#25115)
  • Mark Expr.reshape((-1,)) as row separable (#25326)
  • Mark output of more non-order-maintaining ops as unordered (#25419)
  • Optimize ipc stream read performance (#24671)
  • Reduce HuggingFace API calls (#25521)
  • Return references from aexpr_to_leaf_names_iter (#25319)
  • Skip filtering scan IR if no paths were filtered (#25037)
  • Use bitmap instead of Vec in first/last w. skip_nulls (#25318)
  • Use fast path for agg_min/agg_max when nulls present (#25374)
  • Use strong hash instead of traversal for CSPE equality (#25537)

🐞 Bug fixes

  • Add .rolling_rank support for temporal types and pl.Boolean (#25509)
  • Address issues with SQL OVER clause behaviour for window functions (#25249)
  • Aggregation with drop_nulls on literal (#25356)
  • Allow Null dtype values in scatter (#25245)
  • Allow broadcast in group_by for ApplyExpr and BinaryExpr (#25053)
  • Allow empty list in sort_by in list.eval context (#25481)
  • Allow for negative time in group_by_dynamic iterator (#25041)
  • Always respect return_dtype in map_elements and map_rows (#25504)
  • AnyValue::to_physical for categoricals (#25341)
  • Apply CSV dict overrides by name only (#25436)
  • Block predicate pushdown when group_by key values are changed (#25032)
  • Bugs in pl.from_repr with signed exponential floats and line wrapping (#25331)
  • Correct drop_items for scalar input (#25351)
  • Correct eq_missing for struct with nulls (#25363)
  • Correct {first,last}_non_null if there are empty chunks (#25279)
  • Correctly handle requested stops in streaming shift (#25239)
  • Correctly prune projected columns in hints (#25250)
  • DSL_SCHEMA_HASH should not be changed by line endings (#25123)
  • Don't push down predicates past inserted cache nodes (#25042)
  • Don't quietly allow unsupported SQL SELECT clauses (#25282)
  • Don't trigger DeprecationWarning from SQL "IN" constraints that use subqueries (#25111)
  • Enhanced column resolution/tracking through multi-way SQL joins (#25181)
  • Ensure SQL table alias resolution checks against CTE aliases on fallback (#25071)
  • Ensure out-of-range integers and other edge case values don't give wrong results for index_of (#24369)
  • Fix CSV select(len) off by 1 with comment prefix (#25069)
  • Fix arr.{eval,agg} in aggregation context (#25390)
  • Fix format_str in case of multiple chunks (#25162)
  • Fix groups update on slices with different offsets (#25097)
  • Fix assertion panic on group_by (#25179)
  • Fix building polars-expr without timezones feature (#25254)
  • Fix building polars-mem-engine with the async feature (#25300)
  • Fix building polars-plan with features lazy,concat_str (but no strings) (#25306)
  • Fix dictionary replacement error in write_ipc (#25497)
  • Fix expr slice pushdown causing shape error on literals (#25485)
  • Fix field metadata for nested categorical PyCapsule export (#25052)
  • Fix group lengths check in sort_by with AggregatedScalar (#25503)
  • Fix handling Null dtype in ApplyExpr on group_by (#25077)
  • Fix incorrect .list.eval after slicing operations (#25540)
  • Fix incorrect reshape on sliced lists (#25139)
  • Fix length preserving check for eval expressions in streaming engine (#25294)
  • Fix occurrence of exact matches of .join_asof(strategy="nearest", allow_exact_matches=False, ...) (#25506)
  • Fix off-by-one bug in ColumnPredicates generation for inequalities operating on integer columns (#25412)
  • Fix panic if scan predicate produces 0 length mask (#25089)
  • Fix panic in dt.truncate for invalid duration strings (#25124)
  • Fix panic in is_between support in streaming Parquet predicate push down (#25476)
  • Fix panic when using struct field as join key (#25059)
  • Fix serialization of lazyframes containing huge tables (#25190)
  • Fix single-column CSV header duplication with leading empty lines (#25186)
  • Fix small bug with PyExpr to PyObject conversion (#25265)
  • Group-By aggregation problems caused by AmortSeries (#25043)
  • Handle some unusual pl.col.<colname> edge-cases (#25153)
  • Incorrect result in aggregated first/last with ignore_nulls (#25414)
  • Incorrect results for aggregated {n_,}unique on bools (#25275)
  • Invert drop_nans filtering in group-by context (#25146)
  • Make str.json_decode output deterministic with lists (#25240)
  • Mark {forward,backward}_fill as length_preserving (#25352)
  • Minor improvement to internal is_pycapsule utility function (#25073)
  • Nested dtypes in streaming first_non_null/last_non_null (#25375)
  • Nested dtypes in streaming first/last (#25298)
  • Panic exception when calling Expr.rolling in .over (#25283)
  • Panic in group_by_dynamic with group_by and multiple chunks (#25075)
  • Parquet is_in for mixed validity pages (#25313)
  • Prevent panic when joining sorted LazyFrame with itself (#25453)
  • Raise error for all/any on list instead of panic (#25018)
  • Raise error on out-of-range dates in temporal operations (#25471)
  • Remove Expr casts in pl.lit invocations (#25373)
  • Resolve edge-case with SQL aggregates that have the same name as one of the "GROUP BY" keys (#25362)
  • Return the correct string-case Expr reprs (#25101)
  • Reverse on chunked struct (#25281)
  • Revert pl.format behavior with nulls (#25370)
  • Rolling mean/median for temporals (#25512)
  • Run async DB queries with regular asyncio if not inside a running loop (#25268)
  • SQL "NATURAL" joins should coalesce the key columns (#25353)
  • Schema mismatch with list.agg, unique and scalar (#25348)
  • Solve multiple issues relating to arena mutation in SQL subqueries (#25110)
  • Strict conversion AnyValue to Struct (#25536)
  • Support "index" as column name in group_by iterator (#25138)
  • Support AggregatedList in list.{eval,agg} context (#25385)
  • The SQL interface should use logical, not bitwise, behaviour for unary "NOT" operator (#25091)
  • Unique key names in streaming sort/top_k (#25082)
  • Unique on literal in aggregation context (#25359)
  • Use (i64, u64) for VisualizationData (offset, length) slices (#25203)
  • Use Cargo.template.toml to prevent git dependencies from using template (#25392)
  • Validate list.slice parameters are not lists (#25458)

Python Polars 1.35.2

09 Nov 13:20


🐞 Bug fixes

  • Fix incorrect drop_nans() result when used in group_by() / over() (#25146)
  • Fix handling Null dtype in ApplyExpr on group_by (#25077)
  • Fix assertion panic on group_by (#25179)
  • Fix wide-table join performance regression (#25222)

Thank you to all our contributors for making this release possible!
@coastalwhite, @kdn36, @nameexhaustion and @ritchie46

Rust Polars 0.52.0

03 Nov 15:18
ed23bd6


🏆 Highlights

  • Add LazyFrame.{sink,collect}_batches (#23980); see the sketch after this list
  • Deterministic import order for Python Polars package variants (#24531)
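
The `collect_batches` highlight above (#23980) also has a Python-side counterpart (see the `collect_batches` entries in the 1.37.0 and 1.38.0 notes earlier on this page). A hedged Python sketch, assuming `LazyFrame.collect_batches()` yields `DataFrame` chunks as the streaming engine produces them:

```python
import polars as pl

lf = pl.LazyFrame({"x": range(1_000)}).with_columns((pl.col("x") * 2).alias("y"))

# Assumed usage per #23980: consume the result incrementally as DataFrame
# batches instead of materializing everything with a single collect().
for batch in lf.collect_batches():
    print(batch.shape)
```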

🚀 Performance improvements

  • Lazy gather for {forward,backward}_fill in group-by contexts (#25115)
  • Don't recompute full rolling moment window when NaNs/nulls leave the window (#25078)
  • Skip filtering scan IR if no paths were filtered (#25037)
  • Optimize ipc stream read performance (#24671)
  • Bump foldhash to 0.2.0 and hashbrown to 0.16.0 (#25014)
  • Lower unique to native group-by and speed up n_unique in group-by context (#24976)
  • Better parallelize take{_slice,}_unchecked (#24980)
  • Implement native skew and kurtosis in group-by context (#24961)
  • Use native group-by aggregations for bitwise_* operations (#24935)
  • Address group_by_dynamic slowness in sparse data (#24916)
  • Native filter/drop_nulls/drop_nans in group-by context (#24897)
  • Implement cumulative_eval using the group-by engine (#24889)
  • Prevent generation of copies of Dataframes in DslPlan serialization (#24852)
  • Implement native null_count, any and all group-by aggregations (#24859)
  • Speed up reverse in group-by context (#24855)
  • Prune unused categorical values when exporting to arrow/parquet/IPC/pickle (#24829)
  • Don't check duplicates on streaming simple projection in release mode (#24830)
  • Lower approx_n_unique to the streaming engine (#24821)
  • Duration/interval string parsing optimisation (2-5x faster) (#24771)
  • Use native reducer for first/last on Decimals, Categoricals and Enums (#24786)
  • Implement indexed method for BitMapIter::nth (#24766)
  • Pushdown slices on plans within unions (#24735)
  • Optimize gather_every(n=1) to slice (#24704)
  • Lower null count to streaming engine (#24703)
  • Native streaming gather_every (#24700)
  • Pushdown filter with strptime if input is literal (#24694)
  • Avoid copying expanded paths (#24669)
  • Relax filter expr ordering (#24662)
  • Remove unnecessary groups call in aggregated (#24651)
  • Skip files in scan_iceberg with filter based on metadata statistics (#24547)
  • Push row_index predicate for all scan types (#24537)
  • Perform integer in-filtering for Parquet inequality predicates (#24525)
  • Stop caching Parquet metadata after 8 files (#24513)

✨ Enhancements

  • Improve error message on unsupported SQL subquery comparisons (#25135)
  • Rewrite IR::Scan to IR::DataFrameScan in expand_datasets when applicable (#25106)
  • Support ewm_var/std in streaming engine (#25109)
  • Make DSL-hash skippable (#25140)
  • Streaming {Expr,LazyFrame}.rolling (#25058)
  • Set polars/<version> user-agent (#25112)
  • Add BIT_NOT support to the SQL interface (#25094)
  • Support BYTE_ARRAY backed Decimals in Parquet (#25076)
  • Add allow_empty flag to item (#25048)
  • Support ewm_mean() in streaming engine (#25003)
  • Improve row-count estimates (#24996)
  • Remove filtered scan paths in IR when possible (#24974)
  • Introduce remote Polars MCP server (#24977)
  • Allow local scans on polars cloud (configurable) (#24962)
  • Add Expr.item to strictly extract a single value from an expression (#24888)
  • Add environment variable to roundtrip empty struct in Parquet (#24914)
  • Add glob parameter to scan_ipc (#24898)
  • Prevent generation of copies of Dataframes in DslPlan serialization (#24852)
  • Add list.agg and arr.agg (#24790)
  • Implement {Expr,Series}.rolling_rank() (#24776)
  • Support MergeSorted in CSPE (#24805)
  • Duration/interval string parsing optimisation (2-5x faster) (#24771)
  • Recursively apply CSPE (#24798)
  • Add streaming engine per-node metrics (#24788)
  • Add arr.eval (#24472)
  • Improve rolling_(sum|mean) accuracy (#24743)
  • Add nth_set_bit_u64() with unit test (#24035)
  • Add separator to {Data,Lazy}Frame.unnest (#24716)
  • Add union() function for unordered concatenation (#24298)
  • Add name.replace to the set of column rename options (#17942)
  • Allow duration strings with leading "+" (#24737)
  • Drop now-unnecessary post-init "schema_overrides" cast on DataFrame load from list of dicts (#24739)
  • Add support for UInt128 to pyo3-polars (#24731)
  • Implement maintain_order for cross join (#24665)
  • Add support to output dt.total_{}() duration values as fractionals (#24598)
  • Support scanning from file:/path URIs (#24603)
  • Log which file the schema was sourced from, and which file caused an extra column error (#24621)
  • Add LazyFrame.{sink,collect}_batches (#23980)
  • Deterministic import order for Python Polars package variants (#24531)
  • Add support to display lazy query plan in marimo notebooks without needing to install matplotlib or mermaid (#24540)
  • Add unstable hidden_file_prefix parameter to scan_parquet (#24507)
  • Use fixed-scale Decimals (#24542)
  • Add support for unsigned 128-bit integers (#24346)

🐞 Bug fixes

  • Fix CSV select(len()) off by 1 with comment prefix (#25069)
  • Fix incorrect reshape on sliced lists (#25139)
  • Support "index" as column name in group_by iterator (#25138)
  • DSL_SCHEMA_HASH should not be changed by line endings (#25123)
  • Solve multiple issues relating to arena mutation in SQL subqueries (#25110)
  • Fix panic in dt.truncate for invalid duration strings (#25124)
  • Don't trigger DeprecationWarning from SQL "IN" constraints that use subqueries (#25111)
  • Return the correct string-case Expr reprs (#25101)
  • Fix groups update on slices with different offsets (#25097)
  • Fix handling Null dtype in ApplyExpr on group_by (#25077)
  • Raise error for all/any on list instead of panic (#25018)
  • Unique key names in streaming sort/top_k (#25082)
  • The SQL interface should use logical, not bitwise, behaviour for unary "NOT" operator (#25091)
  • Fix panic if scan predicate produces 0 length mask (#25089)
  • Ensure SQL table alias resolution checks against CTE aliases on fallback (#25071)
  • Panic in group_by_dynamic with group_by and multiple chunks (#25075)
  • Fix panic when using struct field as join key (#25059)
  • Allow broadcast in group_by for ApplyExpr and BinaryExpr (#25053)
  • Fix field metadata for nested categorical PyCapsule export (#25052)
  • Block predicate pushdown when group_by key values are changed (#25032)
  • Group-By aggregation problems caused by AmortSeries (#25043)
  • Don't push down predicates past inserted cache nodes (#25042)
  • Allow for negative time in group_by_dynamic iterator (#25041)
  • Re-enable CPU feature check before import (#25010)
  • Correctness any(ignore_nulls) and OOB in all (#25005)
  • Streaming any/all with ignore_nulls=False (#25008)
  • Fix incorrect join_asof on a casted expression (#25006)
  • Optimize memory on rolling groups in ApplyExpr (#24709)
  • Fallback Pyarrow scan to in-memory engine (#24991)
  • Make Operator::swap_operands return correct operators for Plus, Minus, Multiply and Divide (#24997)
  • Capitalize letters after numbers in to_titlecase (#24993)
  • Preserve null values in pct_change (#24952)
  • Raise length mismatch on over with sliced groups (#24887)
  • Check duplicate name in transpose (#24956)
  • Follow Kleene logic in any / all for group-by (#24940)
  • Do not optimize cross join to iejoin if order maintaining (#24950)
  • Broadcast partition_by columns in over expression (#24874)
  • Clear index cache on stacked df.filter expressions (#24870)
  • Fix 'explode' mapping strategy on scalar value (#24861)
  • Fix repeated with_row_index() after scan() silently ignored (#24866)
  • Correctly return min and max for enums in groupby aggregation (#24808)
  • Refactor BinaryExpr in group_by dispatch logic (#24548)
  • Fix aggstate for gather (#24857)
  • Keep scalars for length preserving functions in group_by (#24819)
  • Have range feature depend on dtype-array feature (#24853)
  • Fix duplicate select panic (#24836)
  • Inconsistency of list.sum() result type with None values (#24476)
  • Division by zero in Expr.dt.truncate (#24832)
  • Potential deadlock in __arrow_c_stream__ (#24831)
  • Allow double aggregations in group-by contexts (#24823)
  • Series.shrink_dtype for i128/u128 (#24833)
  • Fix dtype in EvalExpr (#24650)
  • Allow aggregations on AggState::LiteralScalar (#24820)
  • Dispatch to group_aware for fallible expressions with masked out elements (#24815)
  • Fix error for arr.sum() on small integer Array dtypes containing nulls (#24478)
  • Fix XOR did not follow Kleene logic when one side is unit-length (#24810)
  • Incorrect precision in Series.str.to_decimal (#24804)
  • Use overlapping instead of rolling (#24787)
  • Fix iterable on dynamic_group_by and rolling object (#24740)
  • Use Kahan summation for in-memory groupby sum/mean (#24774)
  • Release GIL in PythonScan predicate evaluation (#24779)
  • Type error in bitmask::nth_set_bit_u64 (#24775)
  • Add Expr.sign for Decimal datatype (#24717)
  • Correct str.replace with missing pattern (#24768)
  • Support decimal_comma on Decimal type in write_csv (#24718)
  • Parse Decimal with comma as decimal separator in CSV (#24685)
  • Make Categories pickleable (#24691)
  • Shift on array within list (#24678)
  • Fix handling of AggregatedScalar in ApplyExpr single input (#24634)
  • Support reading of mixed compressed/uncompressed IPC buffers (#24674)
  • Overflow in slice-slice optimization (#24658)
  • Package discovery for setuptools (#24656)
  • Add type assertion to prevent out-of-bounds in GenericFirstLastGroupedReduction (#24590)
  • Remove inclusion of polars dir in runtime sdist/wheel (#24654)
  • Method dt.month_end was unnecessarily raising when the month-start timestamp was ambiguous (#24647)
  • Fix unsupported arrow type Dictionary error in scan_iceberg() (#24573)
  • Raise Exception instead of panic when unnest on non-struct column (#24471)
  • Include missing feature dependency from polars-stream/diff to `polars-pla...

Python Polars 1.35.1

30 Oct 12:13
a99ad34


🚀 Performance improvements

  • Don't recompute full rolling moment window when NaNs/nulls leave the window (#25078)
  • Skip filtering scan IR if no paths were filtered (#25037)
  • Optimize ipc stream read performance (#24671)

✨ Enhancements

  • Support BYTE_ARRAY backed Decimals in Parquet (#25076)
  • Allow glimpse to return a DataFrame (#24803)
  • Add allow_empty flag to item (#25048)

🐞 Bug fixes

  • The SQL interface should use logical, not bitwise, behaviour for unary "NOT" operator (#25091)
  • Fix panic if scan predicate produces 0 length mask (#25089)
  • Ensure SQL table alias resolution checks against CTE aliases on fallback (#25071)
  • Panic in group_by_dynamic with group_by and multiple chunks (#25075)
  • Minor improvement to internal is_pycapsule utility function (#25073)
  • Fix panic when using struct field as join key (#25059)
  • Allow broadcast in group_by for ApplyExpr and BinaryExpr (#25053)
  • Fix field metadata for nested categorical PyCapsule export (#25052)
  • Block predicate pushdown when group_by key values are changed (#25032)
  • Group-By aggregation problems caused by AmortSeries (#25043)
  • Don't push down predicates past inserted cache nodes (#25042)
  • Allow for negative time in group_by_dynamic iterator (#25041)

📖 Documentation

  • Fix typo in public dataset URL (#25044)

🛠️ Other improvements

  • Disable recursive CSPE for now (#25085)
  • Change group length mismatch error to ShapeError (#25004)
  • Update toolchain (#25007)

Thank you to all our contributors for making this release possible!
@Kevin-Patyk, @Liyixin95, @alexander-beedie, @coastalwhite, @kdn36, @nameexhaustion, @orlp, @r-brink, @ritchie46 and @stijnherfst

Python Polars 1.35.0

26 Oct 20:05


🚀 Performance improvements

  • Bump foldhash to 0.2.0 and hashbrown to 0.16.0 (#25014)
  • Lower unique to native group-by and speed up n_unique in group-by context (#24976)
  • Better parallelize take{_slice,}_unchecked (#24980)
  • Implement native skew and kurtosis in group-by context (#24961)
  • Use native group-by aggregations for bitwise_* operations (#24935)
  • Address group_by_dynamic slowness in sparse data (#24916)
  • Push filters to PyIceberg (#24910)
  • Native filter/drop_nulls/drop_nans in group-by context (#24897)
  • Implement cumulative_eval using the group-by engine (#24889)
  • Prevent generation of copies of Dataframes in DslPlan serialization (#24852)
  • Implement native null_count, any and all group-by aggregations (#24859)
  • Speed up reverse in group-by context (#24855)
  • Prune unused categorical values when exporting to arrow/parquet/IPC/pickle (#24829)
  • Don't check duplicates on streaming simple projection in release mode (#24830)
  • Lower approx_n_unique to the streaming engine (#24821)
  • Duration/interval string parsing optimisation (2-5x faster) (#24771)
  • Use native reducer for first/last on Decimals, Categoricals and Enums (#24786)
  • Implement indexed method for BitMapIter::nth (#24766)
  • Pushdown slices on plans within unions (#24735)

✨ Enhancements

  • Stabilize decimal (#25020)
  • Support ewm_mean() in streaming engine (#25003)
  • Improve row-count estimates (#24996)
  • Remove filtered scan paths in IR when possible (#24974)
  • Introduce remote Polars MCP server (#24977)
  • Allow local scans on polars cloud (configurable) (#24962)
  • Add Expr.item to strictly extract a single value from an expression (#24888)
  • Add environment variable to roundtrip empty struct in Parquet (#24914)
  • Fast-count for scan_iceberg().select(len()) (#24602)
  • Add glob parameter to scan_ipc (#24898)
  • Prevent generation of copies of Dataframes in DslPlan serialization (#24852)
  • Add list.agg and arr.agg (#24790)
  • Implement {Expr,Series}.rolling_rank() (#24776)
  • Don't require PyArrow for read_database_uri if ADBC engine version supports PyCapsule interface (#24029)
  • Make Series init consistent with DataFrame init for string values declared with temporal dtype (#24785)
  • Support MergeSorted in CSPE (#24805)
  • Duration/interval string parsing optimisation (2-5x faster) (#24771)
  • Recursively apply CSPE (#24798)
  • Add streaming engine per-node metrics (#24788)
  • Add arr.eval (#24472)
  • Drop PyArrow requirement for non-batched usage of read_database with the ADBC engine and support iter_batches with the ADBC engine (#24180)
  • Improve rolling_(sum|mean) accuracy (#24743)
  • Add separator to {Data,Lazy}Frame.unnest (#24716); see the sketch after this list
  • Add union() function for unordered concatenation (#24298)
  • Add name.replace to the set of column rename options (#17942)
  • Support np.ndarray -> AnyValue conversion (#24748)
  • Allow duration strings with leading "+" (#24737)
  • Drop now-unnecessary post-init "schema_overrides" cast on DataFrame load from list of dicts (#24739)
  • Add support for UInt128 to pyo3-polars (#24731)
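
A hedged sketch of the new `separator` parameter on `unnest` noted above (#24716), assuming it prefixes the emitted field columns with the parent column name and the separator (so a struct column `point` with fields `x`/`y` unnests to `point.x`/`point.y` rather than bare `x`/`y`):

```python
import polars as pl

df = pl.DataFrame({"point": [{"x": 1, "y": 2}, {"x": 3, "y": 4}]})

# Assumed behaviour per #24716: keep the parent name as a prefix to avoid
# column-name collisions when several struct columns share field names.
out = df.unnest("point", separator=".")
print(out.columns)  # expected to be something like ["point.x", "point.y"]
```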

🐞 Bug fixes

  • Re-enable CPU feature check before import (#25010)
  • Implement read_excel workaround for fastexcel/calamine issue loading a column subset from a named table (#25012)
  • Correctness any(ignore_nulls) and OOB in all (#25005)
  • Streaming any/all with ignore_nulls=False (#25008)
  • Fix incorrect join_asof on a casted expression (#25006)
  • Optimize memory on rolling groups in ApplyExpr (#24709)
  • Fallback Pyarrow scan to in-memory engine (#24991)
  • Make Operator::swap_operands return correct operators for Plus, Minus, Multiply and Divide (#24997)
  • Capitalize letters after numbers in to_titlecase (#24993)
  • Preserve null values in pct_change (#24952)
  • Raise length mismatch on over with sliced groups (#24887)
  • Check duplicate name in transpose (#24956)
  • Follow Kleene logic in any / all for group-by (#24940)
  • Do not optimize cross join to iejoin if order maintaining (#24950)
  • Fix typing of scan_parquet partially unknown (#24928)
  • Properly release the GIL for read_parquet_metadata (#24922)
  • Broadcast partition_by columns in over expression (#24874)
  • Clear index cache on stacked df.filter expressions (#24870)
  • Fix 'explode' mapping strategy on scalar value (#24861)
  • Fix repeated with_row_index() after scan() silently ignored (#24866)
  • Correctly return min and max for enums in groupby aggregation (#24808)
  • Refactor BinaryExpr in group_by dispatch logic (#24548)
  • Fix aggstate for gather (#24857)
  • Keep scalars for length preserving functions in group_by (#24819)
  • Have range feature depend on dtype-array feature (#24853)
  • Fix duplicate select panic (#24836)
  • Inconsistency of list.sum() result type with None values (#24476)
  • Division by zero in Expr.dt.truncate (#24832)
  • Potential deadlock in __arrow_c_stream__ (#24831)
  • Allow double aggregations in group-by contexts (#24823)
  • Series.shrink_dtype for i128/u128 (#24833)
  • Fix dtype in EvalExpr (#24650)
  • Allow aggregations on AggState::LiteralScalar (#24820)
  • Dispatch to group_aware for fallible expressions with masked out elements (#24815)
  • Fix error for arr.sum() on small integer Array dtypes containing nulls (#24478)
  • Fix regression on write_database() to Snowflake due to unsupported string view type (#24622)
  • Fix XOR did not follow Kleene logic when one side is unit-length (#24810)
  • Make Series init consistent with DataFrame init for string values declared with temporal dtype (#24785)
  • Incorrect precision in Series.str.to_decimal (#24804)
  • Use overlapping instead of rolling (#24787)
  • Fix iterable on dynamic_group_by and rolling object (#24740)
  • Use Kahan summation for in-memory groupby sum/mean (#24774)
  • Release GIL in PythonScan predicate evaluation (#24779)
  • Type error in bitmask::nth_set_bit_u64 (#24775)
  • Add Expr.sign for Decimal datatype (#24717)
  • Correct str.replace with missing pattern (#24768)
  • Ensure schema_overrides is respected when loading iterable row data (#24721)
  • Support decimal_comma on Decimal type in write_csv (#24718)

📖 Documentation

  • Introduce remote Polars MCP server (#24977)
  • Add {arr,list}.agg API references (#24970)
  • Support LLM in docs (#24958)
  • Update Cloud docs with correct fn argument order (#24939)
  • Update name.replace examples (#24941)
  • Add i128 and u128 features to user guide (#24938)
  • Add partitioning examples for sink_* methods (#24918)
  • Add more {unique,value}_counts examples (#24927)
  • Indent the versionchanged (#24783)
  • Relax fsspec wording (#24881)
  • Add pl.field into the api docs (#24846)
  • Fix duplicated article in SECURITY.md (#24762)
  • Document output name determination in when/then/otherwise (#24746)
  • Specify that precision=None becomes 38 for Decimal (#24742)
  • Mention polars[rt64] and polars[rtcompat] instead of u64-idx and lts-cpu (#24749)
  • Fix source mapping (#24736)

📦 Build system

  • Ensure build_feature_flags.py is included in artifact (#25024)
  • Update pyo3 and numpy crates to version 0.26 (#24760)

🛠️ Other improvements

  • Fix benchmark ci (#25019)
  • Fix non-deterministic test (#25009)
  • Fix makefile arch detection (#25011)
  • Make LazyFrame.set_sorted into a FunctionIR::Hint (#24981)
  • Remove symbolic links (#24982)
  • Deprecate Expr.agg_groups() and pl.groups() (#24919)
  • Dispatch to no-op rayon thread-pool from streaming (#24957)
  • Unpin pydantic (#24955)
  • Ensure safety of scan fast-count IR lowering in streaming (#24953)
  • Re-use iterators in set_ operations (#24850)
  • Remove GroupByPartitioned and dispatch to streaming engine (#24903)
  • Turn element() into {A,}Expr::Element (#24885)
  • Pass ScanOptions to new_from_ipc (#24893)
  • Update tests to be index type agnostic (#24891)
  • Unset Context in Window expression (#24875)
  • Fix failing delta test (#24867)
  • Move FunctionExpr dispatch from plan to expr (#24839)
  • Fix SQL test giving wrong error message (#24835)
  • Consolidate dtype paths in ApplyExpr (#24825)
  • Add days_in_month to documentation (#24822)
  • Enable ruff D417 lint (#24814)
  • Turn pl.format into proper elementwise expression (#24811)
  • Fix remote benchmark by no-longer saving builds (#24812)
  • Refactor ApplyExpr in group_by context on multiple inputs (#24520)
  • IR text plan graph generator (#24733)
  • Temporarily pin pydantic to fix CI (#24797)
  • Extend and rename rolling groups to overlapping (#24577)
  • Refactor DataType proptest strategies (#24763)
  • Add union to documentation (#24769)

Thank you to all our contributors for making this release possible!
@EndPositive, @EnricoMi, @JakubValtar, @Kevin-Patyk, @MarcoGorelli, @Object905, @alexander-beedie, @borchero, @carnarez, @cmdlineluser, @coastalwhite, @craigalodon, @dsprenkels, @eitsupi, @etrotta, @henryharbeck, @jordanosborn, @kdn36, @math-hiyoko, @mjanssen, @nameexhaustion, @orlp, @pavelzw, @r-brink, @ritchie46, @thomasjpfan and @williambdean