Commit 630d51b

Merge pull request #47 from cparmet/modernize
Modernize package
2 parents 15ac324 + d02516e commit 630d51b

13 files changed: +2572 -2703 lines changed

.pre-commit-config.yaml

Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@ repos:
     rev: 22.12.0
     hooks:
       - id: black
-        language_version: python3.8
+        language_version: python3.9
 
   - repo: https://github.com/pre-commit/mirrors-mypy
     rev: v1.5.0

.python-version

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+3.11

README.md

Lines changed: 2 additions & 2 deletions
@@ -1,7 +1,7 @@
 # Pandas Checks
 ![PyPI - Python Version](https://img.shields.io/pypi/pyversions/pandas-checks)
 
-<img src="https://raw.githubusercontent.com/cparmet/pandas-checks/main/static/pandas-check-gh-social.jpg" alt="Banner image for Pandas Checks">
+<img src="https://raw.githubusercontent.com/cparmet/pandas-checks/main/static/pandas-check-gh-social.jpg" alt="Banner image for Pandas Checks" style="max-height: 125px; width: auto;">
 
 ## What is it?
 **Pandas Checks** is a Python package for data science and data engineering. It adds non-invasive health checks for Pandas method chains.
@@ -121,7 +121,7 @@ New methods in Pandas Checks:
 - `.check.ncols()`: Count columns - [DataFrame](https://cparmet.github.io/pandas-checks/API%20reference/DataFrameChecks/#pandas_checks.DataFrameChecks.DataFrameChecks.ncols) | [Series](https://cparmet.github.io/pandas-checks/API%20reference/SeriesChecks/#pandas_checks.SeriesChecks.SeriesChecks.ncols)
 - `.check.ndups()`: Count rows with duplicate values - [DataFrame](https://cparmet.github.io/pandas-checks/API%20reference/DataFrameChecks/#pandas_checks.DataFrameChecks.DataFrameChecks.ndups) | [Series](https://cparmet.github.io/pandas-checks/API%20reference/SeriesChecks/#pandas_checks.SeriesChecks.SeriesChecks.ndups)
 - `.check.nnulls()`: Count rows with null values - [DataFrame](https://cparmet.github.io/pandas-checks/API%20reference/DataFrameChecks/#pandas_checks.DataFrameChecks.DataFrameChecks.nnulls) | [Series](https://cparmet.github.io/pandas-checks/API%20reference/SeriesChecks/#pandas_checks.SeriesChecks.SeriesChecks.nnulls)
-- `.check.nrows()` - [DataFrame](https://cparmet.github.io/pandas-checks/API%20reference/DataFrameChecks/#pandas_checks.DataFrameChecks.DataFrameChecks.nrows) | [Series](https://cparmet.github.io/pandas-checks/API%20reference/SeriesChecks/#pandas_checks.SeriesChecks.SeriesChecks.nrows)
+- `.check.nrows()`: Count rows - [DataFrame](https://cparmet.github.io/pandas-checks/API%20reference/DataFrameChecks/#pandas_checks.DataFrameChecks.DataFrameChecks.nrows) | [Series](https://cparmet.github.io/pandas-checks/API%20reference/SeriesChecks/#pandas_checks.SeriesChecks.SeriesChecks.nrows)
 - `.check.print()`: Print a string, a variable, or the current dataframe - [DataFrame](https://cparmet.github.io/pandas-checks/API%20reference/DataFrameChecks/#pandas_checks.DataFrameChecks.DataFrameChecks.print) | [Series](https://cparmet.github.io/pandas-checks/API%20reference/SeriesChecks/#pandas_checks.SeriesChecks.SeriesChecks.print)
 
 ### Export interim files

docs/index.md

Lines changed: 18 additions & 5 deletions
@@ -1,7 +1,11 @@
-# About
-
-<img src="https://raw.githubusercontent.com/cparmet/pandas-checks/main/static/pandas-check-gh-social.jpg" alt="Banner image for Pandas Checks">
+---
+title: About
+---
+
+<img src="https://raw.githubusercontent.com/cparmet/pandas-checks/main/static/pandas-check-gh-social.jpg" alt="Banner image for Pandas Checks" style="max-height: 200px; width: auto;">
 
+[TOC]
+
 ## What is it?
 
 **Pandas Checks** is a Python package for data science and data engineering. It adds non-invasive health checks for [Pandas](https://github.com/pandas-dev/pandas/) method chains.
@@ -24,8 +28,17 @@ As Fleetwood Mac says, you would never break the chain.
 
 If you run into trouble or have questions, I'd love to know. Please [open an issue](https://github.com/cparmet/pandas-checks/issues).
 
-Contributions are appreciated! Please open an [issue](https://github.com/cparmet/pandas-checks/issues) or submit a [pull request](https://github.com/cparmet/pandas-checks/pulls). Pandas Checks uses the wonderful libraries [poetry](https://python-poetry.org) for package and dependency management, [nox](https://nox.thea.codes/en/stable/) for test automation, and [mkdocs](https://www.mkdocs.org/) for docs.
-
+Contributions are appreciated! Please open an [issue](https://github.com/cparmet/pandas-checks/issues) or submit a [pull request](https://github.com/cparmet/pandas-checks/pulls). To run the tests, run `uv run --group dev nox`
+
+## Acknowledgments
+
+Pandas Checks uses the following wonderful libraries:
+
+- [uv](https://github.com/astral-sh/uv) for package and dependency management
+- [nox](https://nox.thea.codes/en/stable/) for test automation
+- [mkdocs](https://www.mkdocs.org/) for...making docs!
+- [pre-commit hooks](https://pre-commit.com/)
+- [black](https://black.readthedocs.io/en/stable/) for code formatting
 
 ## License
 

docs/usage.md

Lines changed: 10 additions & 14 deletions
@@ -1,4 +1,8 @@
-# Usage
+---
+title: Usage
+---
+
+[TOC]
 
 ## Installation
 First make Pandas Check available in your environment.
@@ -93,7 +97,7 @@ New methods in Pandas Checks:
 - `.check.ncols()`: Count columns - [DataFrame](https://cparmet.github.io/pandas-checks/API%20reference/DataFrameChecks/#pandas_checks.DataFrameChecks.DataFrameChecks.ncols) | [Series](https://cparmet.github.io/pandas-checks/API%20reference/SeriesChecks/#pandas_checks.SeriesChecks.SeriesChecks.ncols)
 - `.check.ndups()`: Count rows with duplicate values - [DataFrame](https://cparmet.github.io/pandas-checks/API%20reference/DataFrameChecks/#pandas_checks.DataFrameChecks.DataFrameChecks.ndups) | [Series](https://cparmet.github.io/pandas-checks/API%20reference/SeriesChecks/#pandas_checks.SeriesChecks.SeriesChecks.ndups)
 - `.check.nnulls()`: Count rows with null values - [DataFrame](https://cparmet.github.io/pandas-checks/API%20reference/DataFrameChecks/#pandas_checks.DataFrameChecks.DataFrameChecks.nnulls) | [Series](https://cparmet.github.io/pandas-checks/API%20reference/SeriesChecks/#pandas_checks.SeriesChecks.SeriesChecks.nnulls)
-- `.check.nrows()` - [DataFrame](https://cparmet.github.io/pandas-checks/API%20reference/DataFrameChecks/#pandas_checks.DataFrameChecks.DataFrameChecks.nrows) | [Series](https://cparmet.github.io/pandas-checks/API%20reference/SeriesChecks/#pandas_checks.SeriesChecks.SeriesChecks.nrows)
+- `.check.nrows()`: Count rows - [DataFrame](https://cparmet.github.io/pandas-checks/API%20reference/DataFrameChecks/#pandas_checks.DataFrameChecks.DataFrameChecks.nrows) | [Series](https://cparmet.github.io/pandas-checks/API%20reference/SeriesChecks/#pandas_checks.SeriesChecks.SeriesChecks.nrows)
 - `.check.print()`: Print a string, a variable, or the current dataframe - [DataFrame](https://cparmet.github.io/pandas-checks/API%20reference/DataFrameChecks/#pandas_checks.DataFrameChecks.DataFrameChecks.print) | [Series](https://cparmet.github.io/pandas-checks/API%20reference/SeriesChecks/#pandas_checks.SeriesChecks.SeriesChecks.print)
 
 ### Export interim files
@@ -120,17 +124,20 @@ These methods can be used to disable subsequent Pandas Checks methods, either te
 
 ### Validate data
 Custom:
+
 - `.check.assert_data()`: Check that data passes an arbitrary condition - [DataFrame](https://cparmet.github.io/pandas-checks/API%20reference/DataFrameChecks/#pandas_checks.DataFrameChecks.DataFrameChecks.assert_data) | [Series](https://cparmet.github.io/pandas-checks/API%20reference/SeriesChecks/#pandas_checks.SeriesChecks.SeriesChecks.assert_data)
 
 Types:
+
 - `.check.assert_datetime()` - [DataFrame](https://cparmet.github.io/pandas-checks/API%20reference/DataFrameChecks/#pandas_checks.DataFrameChecks.DataFrameChecks.assert_datetime) | [Series](https://cparmet.github.io/pandas-checks/API%20reference/SeriesChecks/#pandas_checks.SeriesChecks.SeriesChecks.assert_datetime)
 - `.check.assert_float()` - [DataFrame](https://cparmet.github.io/pandas-checks/API%20reference/DataFrameChecks/#pandas_checks.DataFrameChecks.DataFrameChecks.assert_float) | [Series](https://cparmet.github.io/pandas-checks/API%20reference/SeriesChecks/#pandas_checks.SeriesChecks.SeriesChecks.assert_float)
 - `.check.assert_int()` - [DataFrame](https://cparmet.github.io/pandas-checks/API%20reference/DataFrameChecks/#pandas_checks.DataFrameChecks.DataFrameChecks.assert_int) | [Series](https://cparmet.github.io/pandas-checks/API%20reference/SeriesChecks/#pandas_checks.SeriesChecks.SeriesChecks.assert_int)
 - `.check.assert_str()` - [DataFrame](https://cparmet.github.io/pandas-checks/API%20reference/DataFrameChecks/#pandas_checks.DataFrameChecks.DataFrameChecks.assert_str) | [Series](https://cparmet.github.io/pandas-checks/API%20reference/SeriesChecks/#pandas_checks.SeriesChecks.SeriesChecks.assert_str)
 - `.check.assert_timedelta()` - [DataFrame](https://cparmet.github.io/pandas-checks/API%20reference/DataFrameChecks/#pandas_checks.DataFrameChecks.DataFrameChecks.assert_timedelta) | [Series](https://cparmet.github.io/pandas-checks/API%20reference/SeriesChecks/#pandas_checks.SeriesChecks.SeriesChecks.assert_timedelta)
 - `.check.assert_type()` - [DataFrame](https://cparmet.github.io/pandas-checks/API%20reference/DataFrameChecks/#pandas_checks.DataFrameChecks.DataFrameChecks.assert_type) | [Series](https://cparmet.github.io/pandas-checks/API%20reference/SeriesChecks/#pandas_checks.SeriesChecks.SeriesChecks.assert_type)
-
+
 Values:
+
 - `.check.assert_all_nulls()` - [DataFrame](https://cparmet.github.io/pandas-checks/API%20reference/DataFrameChecks/#pandas_checks.DataFrameChecks.DataFrameChecks.assert_all_nulls) | [Series](https://cparmet.github.io/pandas-checks/API%20reference/SeriesChecks/#pandas_checks.SeriesChecks.SeriesChecks.assert_all_nulls)
 - `.check.assert_less_than()` - [DataFrame](https://cparmet.github.io/pandas-checks/API%20reference/DataFrameChecks/#pandas_checks.DataFrameChecks.DataFrameChecks.assert_less_than) | [Series](https://cparmet.github.io/pandas-checks/API%20reference/SeriesChecks/#pandas_checks.SeriesChecks.SeriesChecks.assert_less_than)
 - `.check.assert_greater_than()` - [DataFrame](https://cparmet.github.io/pandas-checks/API%20reference/DataFrameChecks/#pandas_checks.DataFrameChecks.DataFrameChecks.assert_greater_than) | [Series](https://cparmet.github.io/pandas-checks/API%20reference/SeriesChecks/#pandas_checks.SeriesChecks.SeriesChecks.assert_greater_than)
@@ -210,14 +217,3 @@ You can also adjust settings within a method chain by bookending the chain, like
 .check.enable_checks() # Turn it back on for the next code
 )
 ```
-
-
-### Hybrid EDA-Production data processing
-
-Exploratory Data Analysis is often taught as a one-time step we do to plan our production data processing. But sometimes EDA is a cyclical process we go back to for deeper inspection during debugging, code edits, or changes in the input data. If explorations were useful in EDA, they may be useful again.
-
-Unfortunately, it's hard to go back to the original EDA code. It's too out of sync. The prod data processing pipeline has usually evolved too much, making the EDA code a historical artifact full of cobwebs that we can't easily fire up again.
-
-But if you use Pandas Checks during EDA, you could roll your `.check` methods into your first production code. Then in prod mode, disable Pandas Checks when you don't need it, to save compute and streamline output. When you ever need to pull out those EDA tools, enable Pandas Checks globally or locally.
-
-This can make your prod pipline more transparent and easier to inspect.

mkdocs.yml

Lines changed: 2 additions & 0 deletions
@@ -9,3 +9,5 @@ plugins:
 theme:
   name: material
 repo_url: https://github.com/cparmet/pandas-checks/
+markdown_extensions:
+  - toc

noxfile.py

Lines changed: 18 additions & 8 deletions
@@ -1,13 +1,23 @@
-from nox_poetry import Session, session
+from nox import Session, options
+from nox_uv import session
 
+options.default_venv_backend = "uv"
 
-@session(python=["3.8", "3.9", "3.10", "3.11"])
-def tests(session: Session) -> None:
+
+@session(
+    python=["3.9", "3.10", "3.11", "3.12", "3.13"],
+    uv_groups=["test"],
+)
+def test(s: Session) -> None:
     """Run the test suite."""
+    s.install(".")  # Install pandas-checks
+    s.run("pytest")
+
+
+# def tests(session: Session) -> None:
 
-    session.install(".")  # Install pandas-checks
-    session.install(
-        "pytest", "pytest-cases", "pyarrow", "openpyxl"
-    )  # Install test packages
+# # session.install(
+# # "pytest", "pytest-cases", "pyarrow", "openpyxl"
+# # ) # Install test packages
 
-    session.run("pytest")
+# session.run("pytest")

pandas_checks/DataFrameChecks.py

Lines changed: 4 additions & 4 deletions
@@ -120,7 +120,7 @@ def assert_data(
 iris
 .check.assert_data(lambda df: df.shape[0]>0)
 
-# Or customize the message displayed when alert fails
+# Or customize the message displayed when assert fails
 .check.assert_data(lambda df: df.shape[0]>0, "Assertion failed, DataFrame has no rows!")
 
 # Or show a warning instead of raising an exception
@@ -1788,7 +1788,7 @@ def unique(
 ) -> pd.DataFrame:
 """Displays the unique values in a column, without modifying the DataFrame itself.
 
-See Pandas docs for [unique()]((https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.unique.html)) for additional usage information, including more configuration options you can pass to this Pandas Checks method.
+See Pandas docs for [unique()](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.unique.html) for additional usage information, including more configuration options you can pass to this Pandas Checks method.
 
 Example:
 ```python
@@ -1832,7 +1832,7 @@ def value_counts(
 ) -> pd.DataFrame:
 """Displays the value counts for a column, without modifying the DataFrame itself.
 
-See Pandas docs for [value_counts()]((https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.value_counts.html)) for additional usage information, including more configuration options you can pass to this Pandas Checks method.
+See Pandas docs for [value_counts()](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.value_counts.html) for additional usage information, including more configuration options you can pass to this Pandas Checks method.
 
 Example:
 ```python
@@ -1887,7 +1887,7 @@ def write(
 - .tsv # Tab-separated data file
 - .xlsx
 
-This functions uses the corresponding Pandas export function, such as `to_csv()` and `to_feather()`. See [Pandas docs for those corresponding export functions][Pandas docs for those export functions](https://pandas.pydata.org/docs/reference/io.html) for additional usage information, including more configuration options you can pass to this Pandas Checks method.
+This functions uses the corresponding Pandas export function, such as `to_csv()` and `to_feather()`. See [Pandas docs for those corresponding export functions](https://pandas.pydata.org/docs/reference/io.html) for additional usage information, including more configuration options you can pass to this Pandas Checks method.
 
 Example:
 ```python

pandas_checks/SeriesChecks.py

Lines changed: 1 addition & 1 deletion
@@ -113,7 +113,7 @@ def assert_data(
 # Validate that a Series has at least 1 row:
 .check.assert_data(lambda s: s.shape[0]>0)
 
-# Or customize the message displayed when alert fails
+# Or customize the message displayed when assert fails
 .check.assert_data(lambda df: s.shape[0]>0, "Assertion failed, Series has no rows!")
 
 # Or show a warning instead of raising an exception
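
The docstring lines in this hunk pass a callable condition to `.check.assert_data()`. As a rough illustration of that pattern only, the sketch below applies the same kind of lambda to a Series in plain pandas. `check_condition` is a hypothetical stand-in written for this example, not the Pandas Checks implementation. It names the lambda parameter `s` consistently; the unchanged context line in the hunk declares `df` but reads `s`, which would only work if an outer variable `s` happened to exist.

```python
import pandas as pd


def check_condition(data, condition, message="Assertion failed"):
    """Hypothetical helper: raise if `condition(data)` is falsy, else return the data."""
    if not condition(data):
        raise AssertionError(message)
    return data  # Returning the input unchanged keeps a method chain intact


s = pd.Series([1, 2, 3])

# The lambda's parameter name matches the name used in its body
result = check_condition(
    s, lambda s: s.shape[0] > 0, "Assertion failed, Series has no rows!"
)
```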

pandas_checks/utils.py

Lines changed: 3 additions & 3 deletions
@@ -70,9 +70,9 @@ def _series_is_type(s: pd.Series, dtype: Type[Any]) -> bool:
     if dtype in [str, "str"]:
         return pd.api.types.is_string_dtype(s)
     elif dtype in [datetime, "datetime", "date"]:
-        return pd.api.types.is_datetime64_any_dtype(
-            s
-        ) or pd.api.types.is_datetime64tz_dtype(s)
+        return pd.api.types.is_datetime64_any_dtype(s) or isinstance(
+            s, pd.DatetimeTZDtype
+        )
     elif dtype in [timedelta, "timedelta"]:
         return pd.api.types.is_timedelta64_dtype(s)
     else:
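
The added lines call `isinstance(s, pd.DatetimeTZDtype)` on the Series itself. In pandas, `DatetimeTZDtype` is a dtype class, so a Series is never an instance of it; the timezone-aware dtype lives on `s.dtype`. A minimal sketch of the difference, assuming a hypothetical helper `is_datetime_series` that is not part of this commit:

```python
import pandas as pd


def is_datetime_series(s: pd.Series) -> bool:
    """Hypothetical helper: True for tz-naive and tz-aware datetime Series."""
    # Check the Series' dtype, not the Series object itself
    return pd.api.types.is_datetime64_any_dtype(s) or isinstance(
        s.dtype, pd.DatetimeTZDtype
    )


naive = pd.Series(pd.to_datetime(["2024-01-01", "2024-01-02"]))
aware = naive.dt.tz_localize("UTC")
numbers = pd.Series([1, 2, 3])

naive_ok = is_datetime_series(naive)
aware_ok = is_datetime_series(aware)
numbers_ok = is_datetime_series(numbers)

# The dtype is a DatetimeTZDtype; the Series object is not
dtype_matches = isinstance(aware.dtype, pd.DatetimeTZDtype)
series_matches = isinstance(aware, pd.DatetimeTZDtype)
```

Because `is_datetime64_any_dtype` already accepts timezone-aware data, the second operand acts only as a fallback in this sketch.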
