Thank you for your interest in contributing to pg_textsearch! This document provides guidelines for contributing to the project.
- Clone the repository:

  ```shell
  git clone https://github.com/timescale/pg_textsearch
  cd pg_textsearch
  ```

- Install PostgreSQL 17 or 18 with development headers:

  ```shell
  # Ubuntu/Debian
  sudo apt install postgresql-server-dev-17  # or 18

  # macOS with Homebrew
  brew install postgresql@17  # or @18
  ```

- Build the extension:

  ```shell
  make
  make install  # may need sudo
  ```

- Run tests:

  ```shell
  make installcheck
  ```
Install pre-commit hooks to automatically check formatting:

```shell
# macOS
brew install pre-commit && pre-commit install

# Linux/pip
pip install pre-commit && pre-commit install
```

We follow PostgreSQL coding conventions. Key points:
- Line limit: 79 characters
- Indentation: Tabs
- Brace style: Allman (opening braces on new lines)
- Naming: snake_case for functions and variables
- Comments: 2 spaces before trailing comments
- Headers: Use `#pragma once` instead of include guards
- Includes: `postgres.h` must be the first include, followed by standard library headers (with `<>`), then project headers (with `""`)
See the PostgreSQL coding conventions for more details.
Format your code before committing:

```shell
make format        # auto-format all source files
make format-check  # check formatting without changes
```

Before submitting a pull request:

- Build: `make` must succeed without errors
- Tests: `make installcheck` must pass
- Concurrency: `make test-concurrency` should pass
- Formatting: `make format-check` must pass
If you modify error messages, update the corresponding expected output files in `test/expected/`.
Automated benchmarks run weekly and can be triggered on-demand. The benchmark suite uses public IR datasets to measure indexing and query performance.
Trigger benchmarks manually using the GitHub CLI:

```shell
# Full MS MARCO benchmark (8.8M passages, ~13 minutes)
gh workflow run benchmark.yml -f dataset=msmarco -f msmarco_size=full

# Quick test with smaller subset (1M passages, ~4 minutes)
gh workflow run benchmark.yml -f dataset=msmarco -f msmarco_size=1M

# Run all datasets (MS MARCO + Wikipedia)
gh workflow run benchmark.yml -f dataset=all

# Wikipedia only (configurable size: 10K, 100K, 1M, full)
gh workflow run benchmark.yml -f dataset=wikipedia -f wikipedia_size=100K
```

Check benchmark status and results:
```shell
# List recent benchmark runs
gh run list --workflow=benchmark.yml

# View a specific run
gh run view <run-id>

# Download benchmark artifacts (includes JSON metrics)
gh run download <run-id>
```

Each run produces:

- `benchmark_results.txt` - Full output log
- `benchmark_metrics.json` - Structured metrics for comparison
- `benchmark_summary.md` - Formatted summary
Run benchmarks locally using the benchmark runner:

```shell
cd benchmarks

# Run Cranfield (quick validation, ~1,400 docs)
./runner/run_benchmark.sh cranfield --download --load --query

# Run MS MARCO locally (requires ~4GB disk space for full dataset)
./runner/run_benchmark.sh msmarco --download --load --query --report
```

| Dataset | Documents | Description |
|---|---|---|
| Cranfield | 1,400 | Classic IR test collection (quick validation) |
| MS MARCO | 8.8M | Microsoft passage ranking dataset |
| Wikipedia | Configurable | Wikipedia article extracts |
Historical benchmark results are tracked and published to GitHub Pages:
Dashboard URL: https://timescale.github.io/pg_textsearch/benchmarks/
The dashboard shows:
- Index Build Time - Time to build the BM25 index
- Query Latencies - Per-query execution times (short, medium, long queries)
- Average Throughput - Mean latency across 20 representative queries
Performance is automatically monitored:
- PRs: Cranfield benchmarks run on every PR touching `src/` or `benchmarks/`. Results are posted as PR comments comparing against the baseline.
- Weekly: Full MS MARCO benchmarks run every Sunday, updating the baseline.
- Releases: A benchmark gate runs before each release with a stricter 120% threshold. Releases are blocked if performance regresses significantly.
Alert thresholds:
- PRs and weekly: 150% of baseline (warn but don't fail)
- Releases: 120% of baseline (blocks release)
- Write clear, concise commit messages
- Focus on the "why" rather than the "what"
- Reference related issues when applicable
- Fork the repository and create a feature branch
- Make your changes with appropriate tests
- Ensure all tests pass locally
- Submit a pull request to the `main` branch
All pull requests are automatically tested against PostgreSQL 17 and 18.
Include:
- A brief summary of changes
- Testing steps or notes
- Any breaking changes or migration notes
When reporting bugs, please include:
- PostgreSQL version
- pg_textsearch version
- Operating system
- Steps to reproduce
- Expected vs actual behavior
- Relevant error messages or logs
For feature requests, describe:
- The problem you're trying to solve
- Your proposed solution (if any)
- Any alternatives you've considered
Be respectful and constructive in all interactions. We're building something together.
- Open a GitHub Discussion for general questions
- Check existing issues before opening new ones
By contributing to pg_textsearch, you agree that your contributions will be licensed under the PostgreSQL License.