This Polars plugin is a wrapper around the T-Digest Rust implementation. It not only provides means to compute an estimated qunatile, but exposes the tdigest creation and merging functionality so it can be used to estimate quantiles in a distributed environment.
For an example see the Yellow Taxi Notebook. Note that this example is a bit artifical as it doesn't distribute the computation. It is mainly meant to show how to use the plugin with multiple partitions of a dataset. It does not make sense to use this plugin for computations on a single machine as the tdigest computation essentially adds overhead to the percentile computation and is therefore slower than computing the actual percentile.
Setup your virtual environment with a python version >=3.8
, e.g. use
python -m venv .env
source .env/bin/activate
``` .
Install the python dependencies used for development:
```bash
python -m pip install -r requirements.txt
Install Rust.
In order to build the package, please run maturin develop
. If you want to test performance, run maturin develop --release
.
Cargo commands (e.g. cargo build
, cargo test
) don't work out of the box.
In order to use cargo instead of maturin for local development, remove extension-module
from cargo.toml
:
replace
pyo3 = { version = "0.21.2", features = ["extension-module", "abi3-py38"] }
with
pyo3 = { version = "0.21.2", features = ["abi3-py38"] }
Before committing and pushing your work, make sure to run
cargo fmt --all && cargo clippy --all-features
python -m ruff check . --fix --exit-non-zero-on-fix
and resolve any errors.