docs: add post on use of Great Tables in Pointblank library #595

Merged
rich-iannone merged 10 commits into main from docs-blog-pointblank-intro on Feb 11, 2025

Member

@rich-iannone rich-iannone commented Jan 31, 2025

This adds a blog post that describes how package maintainers can use Great Tables to provide tabular reporting outputs. We demonstrate this by way of pointblank, a new Python package that returns GT objects as reporting artifacts.


codecov bot commented Jan 31, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 90.71%. Comparing base (e538fbc) to head (20458df).
Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #595   +/-   ##
=======================================
  Coverage   90.71%   90.71%           
=======================================
  Files          46       46           
  Lines        5417     5417           
=======================================
  Hits         4914     4914           
  Misses        503      503           


@github-actions github-actions bot temporarily deployed to pr-595 February 10, 2025 20:54 Destroyed
@rich-iannone rich-iannone marked this pull request as ready for review February 10, 2025 21:13
@rich-iannone rich-iannone requested a review from machow as a code owner February 10, 2025 21:13
@github-actions github-actions bot temporarily deployed to pr-595 February 10, 2025 21:20 Destroyed
Collaborator

@jrycw jrycw left a comment

small typo :)

jupyter: python3
---

The Great Tables package allows you to make tables, and they're really great when part of a report, a book, or a web page. The API is meant to be easy to work with so DataFrames could be made into publication-qualty tables without a lot of hassle. And having nice-looking tables in the mix elevates the quality of the medium you're working in.
Collaborator

publication-quality?

Collaborator

jrycw commented Feb 11, 2025

Pointblank looks amazing! I'm curious—could it potentially be integrated into the test suite for Great Tables?

@github-actions github-actions bot temporarily deployed to pr-595 February 11, 2025 05:30 Destroyed
Collaborator

machow commented Feb 11, 2025

> Pointblank looks amazing! I'm curious—could it potentially be integrated into the test suite for Great Tables?

Rich walked me through the Narwhals CI, which runs tests against some of its downstream libraries, so we could always do something similar?!

https://github.com/narwhals-dev/narwhals/blob/main/.github/workflows/downstream_tests.yml

Collaborator

@machow machow left a comment

This is looking great! I added some suggestions for setting up examples and directing readers' attention right after examples.

Thoughts that aren't critical

One thing I noticed is the term report is used 14 times, in these ways:

  • reporting objects
  • reporting tables
  • "the main reporting table"
  • "the reporting being a table"
  • "Report for validation step 1"
  • "the use of a table for reporting is..."
  • step report table
  • Great Tables makes sense for reporting

It's not clear to me what report means exactly here. What is a reporting object? I think the article is good as is, but it might be helpful to define this a bit in the future / tighten up usage. Relatedly, it might help to say what job reports do in this context (e.g. monitoring, diagnosing, documenting, reassuring?!).

When you mean something more specific than "report", I think you should use the more specific term. For example, we have the main table labeled as Validation Report in our controlled vocabulary on Miro. If that's the correct term, we should use that (or change it in Miro).

@@ -14,12 +14,15 @@ jobs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Get tags
Collaborator

I think we can remove this now

Comment on lines 20 to 21
Just like Great Tables, Pointblank's primary input is a table and the goal of that library is to perform checks of the tabular data. Other libraries in this domain include [Great Expectations](https://github.com/great-expectations/great_expectations), [pandera](https://github.com/unionai-oss/pandera), and [Soda](https://github.com/sodadata/soda-core?tab=readme-ov-file), and [PyDeequ](https://github.com/awslabs/python-deequ). Let's look at the main reporting table that users are likely to see quite often.

Collaborator

Prepping readers for what they'll see in the example

Suggested change
Just like Great Tables, Pointblank's primary input is a table and the goal of that library is to perform checks of the tabular data. Other libraries in this domain include [Great Expectations](https://github.com/great-expectations/great_expectations), [pandera](https://github.com/unionai-oss/pandera), [Soda](https://github.com/sodadata/soda-core?tab=readme-ov-file), and [PyDeequ](https://github.com/awslabs/python-deequ).
Below is the main validation report table that users are likely to see quite often. Each row is a validation step, with columns reporting details about each step and their results.

Member Author

Thanks! Adding this in.

validation
```

The table is chock full of the information you need when doing data validation tasks. And it's also easy on the eyes. Some cool features include:
Collaborator

Directed attention at example

Suggested change
The first validation step (`col_vals_gt()`) checks the `d` column in the data to ensure each value is greater than `1000`. Notice that the red bar on the left indicates it failed, and the `FAIL` column shows 6 failing values out of 13 `UNITS`.
The table is chock full of the information you need when doing data validation tasks. And it's also easy on the eyes. Some cool features include:

Member Author

Thanks! This will be added in.

validation
```

Pointblank makes it easy to get started by giving you a simple entry point (`Validate()`), allowing you to define as many validation steps as needed.
Collaborator

Suggested change
Pointblank makes it easy to get started by giving you a simple entry point (`Validate()`), allowing you to define as many validation steps as needed. Each validation step is specified by calling methods like `.col_vals_gt()`, which is short for checking that "column values are greater than" some specified value.

Member Author

This is good. Using it!
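
To make the "column values are greater than" naming concrete, here is a minimal, dependency-free sketch of what such a validation step has to compute: the number of test units, and how many of them pass or fail. This is not Pointblank's implementation. The `StepResult` class, the `check_col_vals_gt` function, and the sample rows are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class StepResult:
    """Summary of one validation step (hypothetical structure, not Pointblank's)."""

    column: str
    threshold: float
    units: int   # number of values tested
    passed: int  # values strictly greater than the threshold
    failed: int  # values that are not


def check_col_vals_gt(rows, column, value):
    """Count how many values in `column` are strictly greater than `value`."""
    values = [row[column] for row in rows]
    passed = sum(1 for v in values if v > value)
    return StepResult(column, value, len(values), passed, len(values) - passed)


# Made-up sample data: a table represented as a list of row dictionaries.
rows = [{"d": 3400}, {"d": 950}, {"d": 1200}, {"d": 100}]
result = check_col_vals_gt(rows, column="d", value=1000)
print(result)
```

A validation report table could then be assembled by collecting one such summary per step, one row each.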


### Exploring data validation failures

Note that the above validation showed 6 failures in the first step. You might want to know exactly *what* failed, giving you a chance to fix the underlying data quality issues. To do that, you can use the `get_step_report()` method:
Collaborator

Suggested change
Note that the above validation report table showed 6 failures in the first validation step. You might want to know exactly *what* failed, giving you a chance to fix the underlying data quality issues. To do that, you can use the `get_step_report()` method:

Member Author

Much clearer! Adding it in.
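
The idea behind finding out exactly *what* failed can be sketched without any library: keep the rows that fail the check, together with their original row numbers, so the problem records can be located in the source data. The function name `failing_rows_gt` and the sample rows are hypothetical; this only illustrates the concept and says nothing about how `get_step_report()` is implemented.

```python
def failing_rows_gt(rows, column, value):
    """Return (row_number, row) pairs where `column` is NOT greater than `value`.

    Row numbers are 1-based so they match how a person would count rows.
    """
    return [
        (number, row)
        for number, row in enumerate(rows, start=1)
        if not row[column] > value
    ]


# Made-up sample data for illustration only.
rows = [
    {"a": 2, "d": 3400},
    {"a": 3, "d": 950},
    {"a": 6, "d": 1200},
    {"a": 2, "d": 100},
]

failures = failing_rows_gt(rows, column="d", value=1000)
for number, row in failures:
    print(number, row)
```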


### Previewing datasets across backends

Because Pointblank supports many backends, with varying ways for displaying the underlying data, we provide the `preview()` function. With that you can get a beautiful and consistent view of any data table. Here is how it looks against a 2,000 row DuckDB table that's included in the package (`game_revenue`):
Collaborator

Tweaked a bit to clarify that the backends vary (not Pointblank), which motivates `preview()`

Suggested change
Because many of the backends Pointblank supports have varying ways to view the underlying data, we provide a unified `preview()` function. It gives you a beautiful and consistent view of any data table. Here is how it looks against a 2,000 row DuckDB table that's included in the package (`game_revenue`):

Member Author

Nice! Definitely adding this in.

pb.preview(pb.load_dataset(dataset="game_revenue", tbl_type="duckdb"))
```

The `preview()` function had a few design goals in mind:
Collaborator

Directed people's attention at example:

Suggested change
Notice that the table displays only 10 rows by default, 5 from the top and 5 from the bottom. The grey text on the left of the table indicates the row number, and a blue line helps demarcate top and bottom rows.
The `preview()` function had a few design goals in mind:

Member Author

This is great! Will add it in.
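
The head-and-tail behavior described above (5 rows from the top, 5 from the bottom, each labeled with its row number) can be sketched in plain Python. This is an illustration of the idea only, not how `preview()` works internally. The function `head_tail_preview`, its parameters, and the sample table are hypothetical.

```python
def head_tail_preview(rows, n_head=5, n_tail=5):
    """Return (row_number, row) pairs for the first and last rows of a table.

    If the table is short enough that the head and tail would overlap,
    every row is returned exactly once.
    """
    numbered = list(enumerate(rows, start=1))
    if len(numbered) <= n_head + n_tail:
        return numbered
    # Guard against n_tail == 0, since numbered[-0:] would be the whole list.
    tail = numbered[-n_tail:] if n_tail > 0 else []
    return numbered[:n_head] + tail


# A made-up 2,000-row table of single-value rows.
table = [{"value": i} for i in range(2000)]
preview_rows = head_tail_preview(table)
print([number for number, _ in preview_rows])
```

Keeping the original row numbers alongside the rows is what lets a display show where the top rows end and the bottom rows begin.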

@rich-iannone
Member Author

> Pointblank looks amazing! I'm curious—could it potentially be integrated into the test suite for Great Tables?

Thanks! And regarding testing of it, I think that's something we could do down the line (like how Narwhals has their GH workflows for testing downstream libraries).

Collaborator

@machow machow left a comment

LGTM, thanks this is really great!

@rich-iannone rich-iannone merged commit 3d6ad09 into main Feb 11, 2025
14 checks passed
@rich-iannone rich-iannone deleted the docs-blog-pointblank-intro branch February 11, 2025 18:41