Merge branch 'main' into docs-blog-pointblank-intro

posit-dev · Jan 31, 2025 · 11582a3
2 parents b3a0230 + c983857
commit 11582a3
Showing 7 changed files with 140 additions and 11 deletions.
diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS
@@ -0,0 +1 @@
+*  @rich-iannone @machow
diff --git a/.github/workflows/ci-tests.yaml b/.github/workflows/ci-tests.yaml
@@ -103,6 +103,7 @@ jobs:
   release-pypi:
     name: "Release to pypi"
     runs-on: ubuntu-latest
+    environment: deploy-pypi
     needs: [build, test-pandas]
     if: github.event_name == 'release'
     steps:

diff --git a/docs/blog/locbody-mask/index.qmd b/docs/blog/locbody-mask/index.qmd
@@ -0,0 +1,107 @@
+---
+title: "Style Table Body with `mask=` in `loc.body()`"
+html-table-processing: none
+author: Jerry Wu
+date: 2025-01-24
+freeze: true
+jupyter: python3
+format:
+  html:
+    code-summary: "Show the Code"
+---
+
+In Great Tables `0.16.0`, we introduced the `mask=` parameter in `loc.body()`, which lets users apply conditional styling to rows on a per-column basis more efficiently when working with a Polars DataFrame. This post demonstrates how it works and compares it with the "old-fashioned" approach:
+
+* **Leveraging the `mask=` parameter in `loc.body()`:** Use Polars expressions for streamlined styling.
+* **Utilizing the `locations=` parameter in `GT.tab_style()`:** Pass a list of `loc.body()` objects.
+
+Let’s dive in.
+
+### Preparations
+We'll use the built-in dataset `gtcars` to create a Polars DataFrame. Next, we'll select the columns `mfr`, `drivetrain`, `year`, and `hp` to create a small pivoted table named `df_mini`. Finally, we'll pass `df_mini` to the `GT` object to create a table named `gt`, using `drivetrain` as the `rowname_col=` and `mfr` as the `groupname_col=`, as shown below:
+```{python}
+# | code-fold: true
+import polars as pl
+from great_tables import GT, loc, style
+from great_tables.data import gtcars
+from polars import selectors as cs
+
+year_cols = ["2014", "2015", "2016", "2017"]
+df_mini = (
+    pl.from_pandas(gtcars)
+    .filter(pl.col("mfr").is_in(["Ferrari", "Lamborghini", "BMW"]))
+    .sort("drivetrain")
+    .pivot(on="year", index=["mfr", "drivetrain"], values="hp", aggregate_function="mean")
+    .select(["mfr", "drivetrain", *year_cols])
+)
+
+gt = GT(df_mini).tab_stub(rowname_col="drivetrain", groupname_col="mfr").opt_stylize(color="cyan")
+gt
+```
+
+The numbers in the cells represent the average horsepower for each combination of `mfr` and `drivetrain` for a specific year.
+
+### Leveraging the `mask=` parameter in `loc.body()`
+The `mask=` parameter in `loc.body()` accepts a Polars expression that evaluates to a boolean result for each cell.
+
+Here’s how we can use it to achieve two goals:
+
+* Highlight the cell text in red if the column datatype is numerical and the cell value exceeds 650.
+* Fill the cell background with lightgrey if the cell value is missing in the last two columns (`2016` and `2017`).
+
+```{python}
+(
+    gt.tab_style(
+        style=style.text(color="red"),
+        locations=loc.body(mask=cs.numeric().gt(650))
+    ).tab_style(
+        style=style.fill(color="lightgrey"),
+        locations=loc.body(mask=pl.nth(-2, -1).is_null()),
+    )
+)
+```
+
+In this example:
+
+* `cs.numeric()` targets numerical columns, and `.gt(650)` checks if the cell value is greater than 650.
+* `pl.nth(-2, -1)` targets the last two columns, and `.is_null()` identifies missing values.
+
+Did you notice that we can use Polars selectors and expressions to dynamically identify columns at runtime? This is especially useful for pivoted tables, where the column names depend on the data.
+
+The `mask=` parameter acts as syntactic sugar, streamlining the process and removing the need to loop through columns manually.
+
+::: {.callout-warning collapse="false"}
+## Using `mask=` Independently
+`mask=` should not be used in combination with the `columns` or `rows` arguments. Attempting to do so will raise a `ValueError`.
+:::
+
+### Utilizing the `locations=` parameter in `GT.tab_style()`
+A more "old-fashioned" approach involves passing a list of `loc.body()` objects to the `locations=` parameter in `GT.tab_style()`:
+```{python}
+# | eval: false
+(
+    gt.tab_style(
+        style=style.text(color="red"),
+        locations=[loc.body(columns=col, rows=pl.col(col).gt(650))
+                   for col in year_cols],
+    ).tab_style(
+        style=style.fill(color="lightgrey"),
+        locations=[loc.body(columns=col, rows=pl.col(col).is_null())
+                   for col in year_cols[-2:]],
+    )
+)
+```
+
+This approach, though functional, demands additional effort:
+
+* Explicitly preparing the column names in advance.
+* Specifying the `columns=` and `rows=` arguments for each `loc.body()` in the loop.
+
+While effective, it is more verbose and less efficient than the first approach.
+
+### Wrapping up
+With the introduction of the `mask=` parameter in `loc.body()`, users can now style the table body in a more vectorized manner, akin to using `df.apply()` in Pandas, which improves the overall user experience.
+
+We extend our gratitude to [@igorcalabria](https://github.com/igorcalabria) for suggesting this feature in [#389](https://github.com/posit-dev/great-tables/issues/389) and providing an insightful explanation of its utility. A special thanks to [@henryharbeck](https://github.com/henryharbeck) for providing the second approach.
+
+We hope you enjoy this new functionality as much as we do! Have ideas to make Great Tables even better? Share them with us via [GitHub Issues](https://github.com/posit-dev/great-tables/issues). We're always amazed by the creativity of our users! Until the next great table!
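The per-cell idea behind `mask=` in the post above can be sketched in plain Python. This is only a conceptual illustration, not how Great Tables or Polars implement it: the `cells_matching` helper and the dict-of-lists table are hypothetical stand-ins, and the sample values are made up. It shows a boolean test being applied to every cell and the matches being collected as `(column, row, colname)` tuples, the same shape `resolve_mask()` collects in `cellpos_data`.

```python
from typing import Callable, Optional


def cells_matching(
    table: dict[str, list[Optional[float]]],
    predicate: Callable[[Optional[float]], bool],
) -> list[tuple[int, int, str]]:
    """Return (column index, row index, column name) for each cell passing `predicate`.

    Hypothetical helper: a plain-Python stand-in for evaluating a boolean
    mask over every cell of a table.
    """
    positions: list[tuple[int, int, str]] = []
    for col_idx, (name, values) in enumerate(table.items()):
        for row_idx, value in enumerate(values):
            if predicate(value):
                positions.append((col_idx, row_idx, name))
    return positions


# Made-up sample data: column name -> cell values, with None for missing cells.
table = {
    "2014": [562.0, None],
    "2015": [700.0, 610.0],
    "2016": [None, 661.0],
}

# Analogue of "numeric and greater than 650".
over_650 = cells_matching(table, lambda v: v is not None and v > 650)

# Analogue of "is null".
missing = cells_matching(table, lambda v: v is None)

print(over_650)
print(missing)
```

With `mask=`, the nested loop and the column bookkeeping are expressed by a single Polars expression, which is what removes the per-column list comprehension used in the older approach.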
diff --git a/docs/get-started/basic-styling.qmd b/docs/get-started/basic-styling.qmd
@@ -228,6 +228,17 @@ gt_pl_air.tab_style(
 )
 ```
 
+Lastly, you can use **Polars** selectors or expressions to conditionally select rows on a per-column basis.
+```{python}
+import polars.selectors as cs
+
+gt_pl_air.tab_style(
+    style=style.fill(color="yellow"),
+    locations=loc.body(mask=cs.all().eq(cs.all().max())),
+)
+```
+
+
 ## Learning more
 
 * API Docs:

diff --git a/docs/get-started/loc-selection.qmd b/docs/get-started/loc-selection.qmd
@@ -84,10 +84,9 @@ For example, `loc.header()` includes both `loc.title()` and `loc.subtitle()`.
 )
 ```
 
-## Body columns and rows
-
-Use `loc.body()` to style specific cells in the table body.
+## Body columns, rows and mask
 
+Use `columns=` and `rows=` in `loc.body()` to style specific cells in the table body.
 ```{python}
 (
     GT(pl_exibble).tab_style(
@@ -100,6 +99,16 @@ Use `loc.body()` to style specific cells in the table body.
 )
 ```
 
+Alternatively, use `mask=` in `loc.body()` to apply conditional styling to rows on a per-column basis.
+```{python}
+(
+    GT(pl_exibble).tab_style(
+        style.fill("yellow"),
+        loc.body(mask=cs.string().str.contains("p")),
+    )
+)
+```
+
 This is discussed in detail in [Styling the Table Body](./basic-styling.qmd).
 
 ## Column labels

diff --git a/great_tables/_locations.py b/great_tables/_locations.py
@@ -494,6 +494,10 @@ class LocBody(Loc):
     custom styling with the [`tab_style()`](`great_tables.GT.tab_style`) method. That method has a
     `locations=` argument and this class should be used there to perform the targeting.
 
+    :::{.callout-warning}
+    `mask=` is still experimental.
+    :::
+
     Parameters
     ----------
     columns
@@ -507,10 +511,6 @@ class LocBody(Loc):
         you can pass a Polars expression for cell-based selection. This argument must be used
         exclusively and cannot be combined with the `columns=` or `rows=` arguments.
 
-    :::{.callout-warning}
-    `mask=` is still experimental.
-    :::
-
     Returns
     -------
     LocBody
@@ -864,8 +864,8 @@ def resolve_mask(
     if masked.height != frame.height:
         raise ValueError(
             "The DataFrame length after applying `mask` differs from the original."
-            "\n\n* Original length: {frame.height}"
-            "\n* Mask length: {masked.height}"
+            f"\n\n* Original length: {frame.height}"
+            f"\n* Mask length: {masked.height}"
         )
 
     cellpos_data: list[tuple[int, int, str]] = []  # column, row, colname for `CellPos`
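The change to the error message in this hunk relies on a detail of Python string literals: when adjacent literals are concatenated, only the ones carrying an `f` prefix are interpolated. A minimal sketch of the difference, using made-up variable names and values rather than the real `frame` and `masked` objects:

```python
# Hypothetical values standing in for frame.height and masked.height.
original_height, mask_height = 5, 3

# Before the fix: no literal has an f prefix, so the braces are kept verbatim.
broken = (
    "The lengths differ."
    "\n* Original length: {original_height}"
    "\n* Mask length: {mask_height}"
)

# After the fix: each literal that contains a placeholder gets its own f prefix.
fixed = (
    "The lengths differ."
    f"\n* Original length: {original_height}"
    f"\n* Mask length: {mask_height}"
)

print(broken)
print(fixed)
```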

diff --git a/great_tables/data/__init__.py b/great_tables/data/__init__.py
@@ -32,7 +32,7 @@
 _gtcars_dtype = {
     "mfr": "object",
     "model": "object",
-    "year": "float64",
+    "year": "int64",
     "trim": "object",
     "bdy_style": "object",
     "hp": "float64",
@@ -414,7 +414,7 @@
 Columns: 15
 $ mfr         <str> 'Ford', 'Ferrari', 'Ferrari'
 $ model       <str> 'GT', '458 Speciale', '458 Spider'
-$ year        <f64> 2017.0, 2015.0, 2015.0
+$ year        <i64> 2017, 2015, 2015
 $ trim        <str> 'Base Coupe', 'Base Coupe', 'Base'
 $ bdy_style   <str> 'coupe', 'coupe', 'convertible'
 $ hp          <f64> 647.0, 597.0, 562.0