docs: add post on use of Great Tables in Pointblank library #595
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

```
@@           Coverage Diff           @@
##             main     #595   +/-   ##
=======================================
  Coverage   90.71%   90.71%
=======================================
  Files          46       46
  Lines        5417     5417
=======================================
  Hits         4914     4914
  Misses        503      503
```

☔ View full report in Codecov by Sentry.
small typo :)
docs/blog/pointblank-intro/index.qmd
Outdated
jupyter: python3
---

The Great Tables package allows you to make tables, and they're really great when part of a report, a book, or a web page. The API is meant to be easy to work with so DataFrames could be made into publication-qualty tables without a lot of hassle. And having nice-looking tables in the mix elevates the quality of the medium you're working in.
publication-quality?
Pointblank looks amazing! I'm curious—could it potentially be integrated into the test suite for Great Tables?
Rich walked me through the narwhals CI, which tests some things in its downstream, so we could always do something similar?! https://github.com/narwhals-dev/narwhals/blob/main/.github/workflows/downstream_tests.yml
This is looking great! I added some suggestions for setting up examples and directing readers' attention right after examples.
Thoughts that aren't critical
One thing I noticed is the term report is used 14 times, in these ways:
- reporting objects
- reporting tables
- "the main reporting table"
- "the reporting being a table"
- "Report for validation step 1"
- "the use of a table for reporting is..."
- step report table
- Great Tables makes sense for reporting
It's not clear to me what report means exactly here. What is a reporting object? I think the article is good as is, but it might be helpful to define this a bit in the future / tighten up usage. A related fix might be to say what job reports do in this context (e.g. monitoring, diagnosing, documenting, reassuring?!)
When you mean something more specific than "report" I think you should use the more specific term. For example, we have the main table labeled as Validation Report in our controlled vocabulary on Miro. If that's the correct term, we should use that (or change it in Miro).
.github/workflows/ci-docs.yaml
Outdated
```
@@ -14,12 +14,15 @@ jobs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Get tags
```
I think we can remove this now
docs/blog/pointblank-intro/index.qmd
Outdated
Just like Great Tables, Pointblank's primary input is a table and the goal of that library is to perform checks of the tabular data. Other libraries in this domain include [Great Expectations](https://github.com/great-expectations/great_expectations), [pandera](https://github.com/unionai-oss/pandera), and [Soda](https://github.com/sodadata/soda-core?tab=readme-ov-file), and [PyDeequ](https://github.com/awslabs/python-deequ). Let's look at the main reporting table that users are likely to see quite often.
Prepping people for what they'll be seeing in the example
Just like Great Tables, Pointblank's primary input is a table and the goal of that library is to perform checks of the tabular data. Other libraries in this domain include [Great Expectations](https://github.com/great-expectations/great_expectations), [pandera](https://github.com/unionai-oss/pandera), and [Soda](https://github.com/sodadata/soda-core?tab=readme-ov-file), and [PyDeequ](https://github.com/awslabs/python-deequ). Let's look at the main reporting table that users are likely to see quite often.
Just like Great Tables, Pointblank's primary input is a table and the goal of that library is to perform checks of the tabular data. Other libraries in this domain include [Great Expectations](https://github.com/great-expectations/great_expectations), [pandera](https://github.com/unionai-oss/pandera), [Soda](https://github.com/sodadata/soda-core?tab=readme-ov-file), and [PyDeequ](https://github.com/awslabs/python-deequ).
Below is the main validation report table that users are likely to see quite often. Each row is a validation step, with columns reporting details about each step and their results.
Thanks! Adding this in.
```
validation
```

The table is chock full of the information you need when doing data validation tasks. And it's also easy on the eyes. Some cool features include:
Directed attention at example
The table is chock full of the information you need when doing data validation tasks. And it's also easy on the eyes. Some cool features include:
The first validation step (`col_vals_gt()`) checks the `d` column in the data to ensure each value is greater than `1000`. Notice that the red bar on the left indicates it failed, and the `FAIL` column says it has 6 failing values out of 13 `UNITS`.
The table is chock full of the information you need when doing data validation tasks. And it's also easy on the eyes. Some cool features include:
Thanks! This will be added in.
docs/blog/pointblank-intro/index.qmd
Outdated
```
validation
```

Pointblank makes it easy to get started by giving you a simple entry point (`Validate()`), allowing you to define as many validation steps as needed.
Pointblank makes it easy to get started by giving you a simple entry point (`Validate()`), allowing you to define as many validation steps as needed.
Pointblank makes it easy to get started by giving you a simple entry point (`Validate()`), allowing you to define as many validation steps as needed. Each validation step is specified by calling methods like `.col_vals_gt()`, which is short for checking that "column values are greater than" some specified value.
This is good. Using it!
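To make the entry point concrete, here is a minimal sketch of the flow described above. The method names follow Pointblank's documented API, but the exact signatures and the use of the `small_table` demo dataset are assumptions to verify against the library's docs:

```python
import pointblank as pb

# Build a validation plan against the 13-row demo dataset, then run it.
validation = (
    pb.Validate(data=pb.load_dataset("small_table"))
    .col_vals_gt(columns="d", value=1000)  # step 1: each value in `d` > 1000
    .col_vals_lt(columns="c", value=10)    # step 2: each value in `c` < 10
    .interrogate()                         # execute all validation steps
)

# Displaying the object renders the validation report table (a GT object).
validation
```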
docs/blog/pointblank-intro/index.qmd
Outdated
### Exploring data validation failures

Note that the above validation showed 6 failures in the first step. You might want to know exactly *what* failed, giving you a chance to fix the underlying data quality issues. To do that, you can use the `get_step_report()` method:
Note that the above validation showed 6 failures in the first step. You might want to know exactly *what* failed, giving you a chance to fix the underlying data quality issues. To do that, you can use the `get_step_report()` method:
Note that the above validation report table showed 6 failures in the first validation step. You might want to know exactly *what* failed, giving you a chance to fix the underlying data quality issues. To do that, you can use the `get_step_report()` method:
Much clearer! Adding it in.
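As a sketch of how that might look in practice, continuing from the `validation` object in the earlier snippet (the `i=` argument name is an assumption to check against `get_step_report()`'s signature):

```python
# Pull a step report for validation step 1, which lists the rows that
# failed the `d > 1000` check (`i` = step number, assumed parameter name).
step_report = validation.get_step_report(i=1)
step_report
```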
docs/blog/pointblank-intro/index.qmd
Outdated
### Previewing datasets across backends

Because Pointblank supports many backends, with varying ways for displaying the underlying data, we provide the `preview()` function. With that you can get a beautiful and consistent view of any data table. Here is how it looks against a 2,000 row DuckDB table that's included in the package (`game_revenue`):
Tweaked a bit to clarify that backends vary (not pointblank), which motivates preview()
Because Pointblank supports many backends, with varying ways for displaying the underlying data, we provide the `preview()` function. With that you can get a beautiful and consistent view of any data table. Here is how it looks against a 2,000 row DuckDB table that's included in the package (`game_revenue`):
Because many of the backends Pointblank supports have varying ways to view the underlying data, we provide a unified `preview()` function. It gives you a beautiful and consistent view of any data table. Here is how it looks against a 2,000 row DuckDB table that's included in the package (`game_revenue`):
Nice! Definitely adding this in.
```
pb.preview(pb.load_dataset(dataset="game_revenue", tbl_type="duckdb"))
```

The `preview()` function had a few design goals in mind:
Directed people's attention at example:
The `preview()` function had a few design goals in mind:
Notice that the table displays only 10 rows by default, 5 from the top and 5 from the bottom. The grey text on the left of the table indicates the row number, and a blue line helps demarcate top and bottom rows.
The `preview()` function had a few design goals in mind:
This is great! Will add it in.
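Since `preview()` is meant to look the same across backends, one quick way to see that is to load the same dataset with a different `tbl_type`. Whether `"polars"` is an accepted value here is an assumption to check against `load_dataset()`'s docs:

```python
import pointblank as pb

# Same dataset, different backend: the preview layout should match the
# DuckDB version shown earlier ("polars" as a tbl_type is an assumption).
pb.preview(pb.load_dataset(dataset="game_revenue", tbl_type="polars"))
```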
Thanks! And regarding testing of it, I think that's something we could do down the line (like how Narwhals has their GH workflows for testing downstream libraries).
LGTM, thanks this is really great!
This adds a blog post that describes how package maintainers can use Great Tables to provide tabular reporting outputs. We demonstrate this by way of Pointblank, a new Python package that returns GT objects as reporting artifacts.