docs: update user-defined-functions for 0.19.x #13071

MarcoGorelli · 2023-12-16T11:53:35Z

I'd like to make a separate PR to overhaul this page: it's quite complex, and I'm not too keen on it starting off with a gotcha. But that's for later; first I'd suggest just updating the syntax.

The Rust examples are similar to the Python ones, but don't match 100% (this is already the case, regardless of this PR). For example, the Python dataframe has i64 values, whereas the Rust one has i32. No big deal, just something I noticed.

This is what the rewritten part in the middle looks like now:

Preview (click here)

MarcoGorelli · 2023-12-16T11:56:00Z

docs/src/python/user-guide/expressions/user-defined-functions.py

+print(df)
+# --8<-- [end:dataframe]

+# --8<-- [start:shift_map_batches]
 out = df.group_by("keys", maintain_order=True).agg(
-    pl.col("values").map_batches(lambda s: s.shift()).alias("shift_map"),
+    pl.col("values").map_batches(lambda s: s.shift()).alias("shift_map_batches"),
    pl.col("values").shift().alias("shift_expression"),
 )
-print(df)
-# --8<-- [end:dataframe]
+print(out)
+# --8<-- [end:shift_map_batches]


Currently, this snippet creates `df`, then creates `out`, then prints `df`. But `out` is never used: in the .md file, the output of `out` is hard-coded instead.

I'm suggesting we instead split the snippet into two:

1. create `df`, and show it
2. create `out`, and show it, without hard-coding any output

Sounds good.

MarcoGorelli · 2023-12-16T20:19:41Z

docs/user-guide/expressions/user-defined-functions.md

 achieve the same goals.

 ### Adding a counter

 In this example we create a global `counter` and then add the integer `1` to the global state at every element processed.
 Every iteration the result of the increment will be added to the element value.

-> Note, this example isn't provided in Rust. The reason is that the global `counter` value would lead to data races when this apply is evaluated in parallel. It would be possible to wrap it in a `Mutex` to protect the variable, but that would be obscuring the point of the example. This is a case where the Python Global Interpreter Lock's performance tradeoff provides some safety guarantees.
+> Note, this example isn't provided in Rust. The reason is that the global `counter` value would lead to data races when this `apply` is evaluated in parallel. It would be possible to wrap it in a `Mutex` to protect the variable, but that would be obscuring the point of the example. This is a case where the Python Global Interpreter Lock's performance tradeoff provides some safety guarantees.


Keeping this one as `apply` because that's still the name on the Rust side.
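For readers without the docs page open, the counter idea can be sketched in plain Python. This is only an illustration under assumptions: `add_counter`, the sample `values`, and the list comprehension standing in for an element-by-element UDF call are hypothetical and are not the snippet from the docs or Polars itself.

```python
# Hypothetical stand-in for the "Adding a counter" example: plain Python,
# no Polars. A list comprehension plays the role of an elementwise UDF call.
counter = 0  # global state shared by every call


def add_counter(val: int) -> int:
    """Increment the global counter, then add its new value to the element."""
    global counter
    counter += 1
    return counter + val


values = [10, 7, 1]  # illustrative data, not taken from the PR
out = [add_counter(v) for v in values]  # calls happen strictly in order
print(out, counter)
```

The result depends on the order in which elements are processed, which is why the note in the diff says unsynchronized parallel evaluation would be a data race in Rust.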

MarcoGorelli · 2023-12-16T21:26:13Z

docs/user-guide/expressions/user-defined-functions.md

@@ -45,9 +41,9 @@ df.with_columns([
 ])
 ```

-Use cases for `map` in the `group_by` context are slim. They are only used for performance reasons, but can quite easily lead to incorrect results. Let me explain why.
+Use cases for `map_batches` in the `group_by` context are slim. They are only used for performance reasons, but can quite easily lead to incorrect results. Let me explain why.


To be honest, I don't really understand this phrase to begin with: what are the performance reasons to use `map_batches`? Or is that only on the Rust side, referring to `map`?

You could do elementwise operations with `map_batches`. E.g. `lambda x: x * 2` would be correct in both.

MarcoGorelli · 2023-12-17T08:41:44Z

docs/user-guide/expressions/user-defined-functions.md

-{{code_block('user-guide/expressions/user-defined-functions','dataframe',['map'])}}
+{{code_block('user-guide/expressions/user-defined-functions','dataframe',[])}}


Now this snippet just creates a dataframe, so I've removed the `map` reference and put it in the next snippet (as `map_batches`).

MarcoGorelli · 2023-12-17T09:07:47Z

docs/user-guide/expressions/user-defined-functions.md

+=== ":fontawesome-brands-python: Python"
+[:material-api: `map_elements`](https://pola-rs.github.io/polars/py-polars/html/reference/expressions/api/polars.Expr.map_elements.html)

-{{code_block('user-guide/expressions/user-defined-functions','apply',['apply'])}}
+{{code_block('user-guide/expressions/user-defined-functions','map_elements',[])}}


`map_elements` (and `map_batches`) don't appear in the Rust API docs.

To avoid warnings in the docs build, I've just added the Python-only link to `map_elements` above.

ritchie46 · 2023-12-17T14:34:52Z

docs/src/python/user-guide/expressions/user-defined-functions.py

 out = df.group_by("keys", maintain_order=True).agg(
-    pl.col("values").map_batches(lambda s: s.shift()).alias("shift_map"),
+    pl.col("values").map_batches(lambda s: s.shift()).alias("shift_map_batches"),


Shouldn't this be `map_elements`? A shift here would be incorrect? (I didn't read the context.)

I think the purpose of this section is to show how using map_batches within group_by leads to incorrect (or at least, unexpected) results

So although this should be `map_elements`, the way it's written, it:

- shows that the "wrong" one (`map_batches`) gives unexpected results
- shows that the "correct" one (`map_elements`) gives the expected results
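The difference being discussed can be modelled in plain Python. This is a conceptual sketch only, assuming the whole column is passed to the function before grouping (as the section under review describes for `map_batches` in `group_by`); the helper names and the sample data are hypothetical, and none of this is Polars' actual implementation.

```python
# Conceptual model: "apply to the whole column, then group" versus
# "group first, then apply per group". Pure Python, no Polars.

def shift(values):
    """Shift a list down by one position, filling the gap with None."""
    return [None] + values[:-1] if values else []


def double(values):
    """Elementwise operation: each output depends only on its own input."""
    return [v * 2 for v in values]


def group(keys, values):
    """Split values into per-key lists, preserving first-seen key order."""
    groups = {}
    for k, v in zip(keys, values):
        groups.setdefault(k, []).append(v)
    return groups


def whole_column_then_group(keys, values, func):
    """Run func once on the full column, then split the result by key."""
    return group(keys, func(values))


def group_then_apply(keys, values, func):
    """Split by key first, then run func on each group separately."""
    return {k: func(vs) for k, vs in group(keys, values).items()}


keys = ["a", "a", "b"]   # illustrative data, not taken from the PR
values = [10, 7, 1]

# shift: a value from group "a" leaks into group "b" in the first variant.
batch_shift = whole_column_then_group(keys, values, shift)
group_shift = group_then_apply(keys, values, shift)

# double: elementwise, so both evaluation orders agree.
batch_double = whole_column_then_group(keys, values, double)
group_double = group_then_apply(keys, values, double)

print(batch_shift, group_shift)
print(batch_double, group_double)
```

In this model the shift differs only for group `b`, which receives `7` from group `a` when the function sees the whole column, while the doubling agrees in both variants, matching the earlier remark that an elementwise function would be correct in both.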

Right! Reviewing lost snippets is hard. 😅

github-actions bot added the documentation, python, and rust labels Dec 16, 2023

MarcoGorelli commented Dec 16, 2023


MarcoGorelli force-pushed the update-udf-docs branch 2 times, most recently from 557fe67 to 4c045c1 Compare December 16, 2023 12:34

MarcoGorelli mentioned this pull request Dec 16, 2023

feat(rust!): rename map to map_batches and appy to map_elements #13075

Closed

docs: update user-defined-functions for 0.19.x

356bd52

MarcoGorelli force-pushed the update-udf-docs branch 3 times, most recently from c21cb76 to 356bd52 Compare December 16, 2023 20:18

MarcoGorelli commented Dec 16, 2023


MarcoGorelli commented Dec 17, 2023


MarcoGorelli marked this pull request as ready for review December 17, 2023 08:42

MarcoGorelli requested review from ritchie46, c-peters and stinodego as code owners December 17, 2023 08:42

MarcoGorelli marked this pull request as draft December 17, 2023 08:46

avoid invalid references in Rust docs

baef62d

MarcoGorelli commented Dec 17, 2023


MarcoGorelli marked this pull request as ready for review December 17, 2023 09:08

ritchie46 reviewed Dec 17, 2023


ritchie46 approved these changes Dec 17, 2023


ritchie46 merged commit d21713a into pola-rs:main Dec 17, 2023
12 checks passed


MarcoGorelli commented Dec 16, 2023 (edited by stinodego)
