docs: add post on use of Great Tables in Pointblank library #595
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

```
@@           Coverage Diff           @@
##             main     #595   +/-   ##
=======================================
  Coverage   90.71%   90.71%
=======================================
  Files          46       46
  Lines        5417     5417
=======================================
  Hits         4914     4914
  Misses        503      503
```

☔ View full report in Codecov by Sentry.
small typo :)
docs/blog/pointblank-intro/index.qmd
Outdated
jupyter: python3
---

The Great Tables package allows you to make tables, and they're really great when part of a report, a book, or a web page. The API is meant to be easy to work with so DataFrames could be made into publication-qualty tables without a lot of hassle. And having nice-looking tables in the mix elevates the quality of the medium you're working in.
publication-quality?
Pointblank looks amazing! I'm curious—could it potentially be integrated into the test suite for Great Tables?
Rich walked me through the narwhals CI, which tests some things in its downstream, so we could always do something similar?! https://github.com/narwhals-dev/narwhals/blob/main/.github/workflows/downstream_tests.yml
This is looking great! I added some suggestions for setting up examples and directing readers' attention right after examples.
Thoughts that aren't critical
One thing I noticed is the term report is used 14 times, in these ways:
- reporting objects
- reporting tables
- "the main reporting table"
- "the reporting being a table"
- "Report for validation step 1"
- "the use of a table for reporting is..."
- step report table
- Great Tables makes sense for reporting
It's not clear to me what report means exactly here. What is a reporting object? I think the article is good as is, but it might be helpful to define this a bit in the future / tighten up usage. A related fix might be to say what job reports do in this context (e.g. monitoring, diagnosing, documenting, reassuring?!)
When you mean something more specific than "report" I think you should use the more specific term. For example, we have the main table labeled as Validation Report in our controlled vocabulary on Miro. If that's the correct term, we should use that (or change it in Miro).
.github/workflows/ci-docs.yaml
Outdated
```
@@ -14,12 +14,15 @@ jobs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Get tags
```
I think we can remove this now
docs/blog/pointblank-intro/index.qmd
Outdated
Just like Great Tables, Pointblank's primary input is a table and the goal of that library is to perform checks of the tabular data. Other libraries in this domain include [Great Expectations](https://github.com/great-expectations/great_expectations), [pandera](https://github.com/unionai-oss/pandera), and [Soda](https://github.com/sodadata/soda-core?tab=readme-ov-file), and [PyDeequ](https://github.com/awslabs/python-deequ). Let's look at the main reporting table that users are likely to see quite often.
Prepping people for what they'll be seeing in the example
Just like Great Tables, Pointblank's primary input is a table and the goal of that library is to perform checks of the tabular data. Other libraries in this domain include [Great Expectations](https://github.com/great-expectations/great_expectations), [pandera](https://github.com/unionai-oss/pandera), and [Soda](https://github.com/sodadata/soda-core?tab=readme-ov-file), and [PyDeequ](https://github.com/awslabs/python-deequ). Let's look at the main reporting table that users are likely to see quite often.
Just like Great Tables, Pointblank's primary input is a table and the goal of that library is to perform checks of the tabular data. Other libraries in this domain include [Great Expectations](https://github.com/great-expectations/great_expectations), [pandera](https://github.com/unionai-oss/pandera), [Soda](https://github.com/sodadata/soda-core?tab=readme-ov-file), and [PyDeequ](https://github.com/awslabs/python-deequ).
Below is the main validation report table that users are likely to see quite often. Each row is a validation step, with columns reporting details about each step and their results.
Thanks! Adding this in.
```
validation
```

The table is chock full of the information you need when doing data validation tasks. And it's also easy on the eyes. Some cool features include:
Directed attention at example
The table is chock full of the information you need when doing data validation tasks. And it's also easy on the eyes. Some cool features include:
The first validation step (`col_vals_gt()`) checks the `d` column in the data to ensure each value is greater than `1000`. Notice that the red bar on the left indicates it failed, and the `FAIL` column says it has 6 failing values out of 13 `UNITS`.
The table is chock full of the information you need when doing data validation tasks. And it's also easy on the eyes. Some cool features include:
Thanks! This will be added in.
docs/blog/pointblank-intro/index.qmd
Outdated
```
validation
```

Pointblank makes it easy to get started by giving you a simple entry point (`Validate()`), allowing you to define as many validation steps as needed.
Pointblank makes it easy to get started by giving you a simple entry point (`Validate()`), allowing you to define as many validation steps as needed.
Pointblank makes it easy to get started by giving you a simple entry point (`Validate()`), allowing you to define as many validation steps as needed. Each validation step is specified by calling methods like `.col_vals_gt()`, which is short for checking that "column values are greater than" some specified value.
This is good. Using it!
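To make the entry point concrete, here is a minimal sketch of the flow described above. The method names follow Pointblank's documented API, but the exact signatures and the use of the `small_table` demo dataset are assumptions to verify against the library's docs:

```python
import pointblank as pb

# Build a validation plan against the 13-row demo dataset, then run it.
validation = (
    pb.Validate(data=pb.load_dataset("small_table"))
    .col_vals_gt(columns="d", value=1000)  # step 1: each value in `d` > 1000
    .col_vals_lt(columns="c", value=10)    # step 2: each value in `c` < 10
    .interrogate()                         # execute all validation steps
)

# Displaying the object renders the validation report table (a GT object).
validation
```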
docs/blog/pointblank-intro/index.qmd
Outdated
### Exploring data validation failures

Note that the above validation showed 6 failures in the first step. You might want to know exactly *what* failed, giving you a chance to fix the underlying data quality issues. To do that, you can use the `get_step_report()` method:
Note that the above validation showed 6 failures in the first step. You might want to know exactly *what* failed, giving you a chance to fix the underlying data quality issues. To do that, you can use the `get_step_report()` method:
Note that the above validation report table showed 6 failures in the first validation step. You might want to know exactly *what* failed, giving you a chance to fix the underlying data quality issues. To do that, you can use the `get_step_report()` method:
Much clearer! Adding it in.
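As a sketch of how that might look in practice, continuing from the `validation` object in the earlier snippet (the `i=` argument name is an assumption to check against `get_step_report()`'s signature):

```python
# Pull a step report for validation step 1, which lists the rows that
# failed the `d > 1000` check (`i` = step number, assumed parameter name).
step_report = validation.get_step_report(i=1)
step_report
```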
docs/blog/pointblank-intro/index.qmd
Outdated
### Previewing datasets across backends

Because Pointblank supports many backends, with varying ways for displaying the underlying data, we provide the `preview()` function. With that you can get a beautiful and consistent view of any data table. Here is how it looks against a 2,000 row DuckDB table that's included in the package (`game_revenue`):
Tweaked a bit to clarify that backends vary (not pointblank), which motivates preview()
Because Pointblank supports many backends, with varying ways for displaying the underlying data, we provide the `preview()` function. With that you can get a beautiful and consistent view of any data table. Here is how it looks against a 2,000 row DuckDB table that's included in the package (`game_revenue`):
Because many of the backends Pointblank supports have varying ways to view the underlying data, we provide a unified `preview()` function. It gives you a beautiful and consistent view of any data table. Here is how it looks against a 2,000 row DuckDB table that's included in the package (`game_revenue`):
Nice! Definitely adding this in.
```
pb.preview(pb.load_dataset(dataset="game_revenue", tbl_type="duckdb"))
```

The `preview()` function had a few design goals in mind:
Directed people's attention at example:
The `preview()` function had a few design goals in mind:
Notice that the table displays only 10 rows by default, 5 from the top and 5 from the bottom. The grey text on the left of the table indicates the row number, and a blue line helps demarcate top and bottom rows.
The `preview()` function had a few design goals in mind:
This is great! Will add it in.
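Since `preview()` is meant to look the same across backends, one quick way to see that is to load the same dataset with a different `tbl_type`. Whether `"polars"` is an accepted value here is an assumption to check against `load_dataset()`'s docs:

```python
import pointblank as pb

# Same dataset, different backend: the preview layout should match the
# DuckDB version shown earlier ("polars" as a tbl_type is an assumption).
pb.preview(pb.load_dataset(dataset="game_revenue", tbl_type="polars"))
```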
Thanks! And regarding testing of it, I think that's something we could do down the line (like how Narwhals has their GH workflows for testing downstream libraries).
LGTM, thanks this is really great!
This adds a blog post that describes how package maintainers can use Great Tables to provide tabular reporting outputs. We demonstrate this by way of Pointblank, a new Python package that returns GT objects as reporting artifacts.