Contributing to pg_textsearch

Thank you for your interest in contributing to pg_textsearch! This document provides guidelines for contributing to the project.

Getting Started

Development Setup

  1. Clone the repository:

    git clone https://github.com/timescale/pg_textsearch
    cd pg_textsearch
  2. Install PostgreSQL 17 or 18 with development headers:

    # Ubuntu/Debian
    sudo apt install postgresql-server-dev-17  # or 18
    
    # macOS with Homebrew
    brew install postgresql@17  # or @18
  3. Build the extension:

    make
    make install  # may need sudo
  4. Run tests:

    make installcheck

Pre-commit Hooks (Recommended)

Install pre-commit hooks to automatically check formatting:

# macOS
brew install pre-commit && pre-commit install

# Linux/pip
pip install pre-commit && pre-commit install
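
# Either way, you can run all hooks manually across the whole repo
pre-commit run --all-files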

Making Changes

Code Style

We follow PostgreSQL coding conventions. Key points:

  • Line limit: 79 characters
  • Indentation: Tabs
  • Brace style: Allman (opening braces on new lines)
  • Naming: snake_case for functions and variables
  • Comments: 2 spaces before trailing comments
  • Headers: Use #pragma once instead of include guards
  • Includes: postgres.h must be the first include, followed by standard library headers (with <>), then project headers (with "")

See the PostgreSQL coding conventions for more details.
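
To make these concrete, here is a minimal sketch of a source file following the conventions above (the function and header names are hypothetical, not part of the extension):

/* postgres.h always comes first */
#include "postgres.h"

/* standard library headers next, with <> */
#include <string.h>

/* project headers last, with "" (hypothetical header) */
#include "example_internal.h"

/*
 * example_count_terms
 *		Count space-separated terms in a query string.
 */
static int
example_count_terms(const char *query)
{
	int		count = 0;
	bool	in_term = false;  /* 2 spaces before a trailing comment */

	for (const char *p = query; *p != '\0'; p++)
	{
		if (*p == ' ')
			in_term = false;
		else if (!in_term)
		{
			in_term = true;
			count++;
		}
	}

	return count;
}

Headers themselves should start with #pragma once rather than an include guard.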

Format your code before committing:

make format        # auto-format all source files
make format-check  # check formatting without changes

Testing Requirements

Before submitting a pull request:

  1. Build: make must succeed without errors
  2. Tests: make installcheck must pass
  3. Concurrency: make test-concurrency should pass
  4. Formatting: make format-check must pass

If you modify error messages, update the corresponding expected output files in test/expected/.
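For example, after rewording an error message, re-run make installcheck, confirm the change in regression.diffs, and copy the regenerated file from the results directory (test/results/ under the standard pg_regress layout) over its counterpart in test/expected/.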

Benchmarks

Automated benchmarks run weekly and can be triggered on-demand. The benchmark suite uses public IR datasets to measure indexing and query performance.

Running Benchmarks On-Demand

Trigger benchmarks manually using the GitHub CLI:

# Full MS MARCO benchmark (8.8M passages, ~13 minutes)
gh workflow run benchmark.yml -f dataset=msmarco -f msmarco_size=full

# Quick test with smaller subset (1M passages, ~4 minutes)
gh workflow run benchmark.yml -f dataset=msmarco -f msmarco_size=1M

# Run all datasets (MS MARCO + Wikipedia)
gh workflow run benchmark.yml -f dataset=all

# Wikipedia only (configurable size: 10K, 100K, 1M, full)
gh workflow run benchmark.yml -f dataset=wikipedia -f wikipedia_size=100K

Viewing Results

Check benchmark status and results:

# List recent benchmark runs
gh run list --workflow=benchmark.yml

# View a specific run
gh run view <run-id>

# Download benchmark artifacts (includes JSON metrics)
gh run download <run-id>

Each run produces:

  • benchmark_results.txt - Full output log
  • benchmark_metrics.json - Structured metrics for comparison
  • benchmark_summary.md - Formatted summary

Local Benchmarks

Run benchmarks locally using the benchmark runner:

cd benchmarks

# Run Cranfield (quick validation, ~1400 docs)
./runner/run_benchmark.sh cranfield --download --load --query

# Run MS MARCO locally (requires ~4GB disk space for full dataset)
./runner/run_benchmark.sh msmarco --download --load --query --report

Benchmark Datasets

Dataset     Documents     Description
---------   ------------  ---------------------------------------------
Cranfield   1,400         Classic IR test collection (quick validation)
MS MARCO    8.8M          Microsoft passage ranking dataset
Wikipedia   Configurable  Wikipedia article extracts

Performance Dashboard

Historical benchmark results are tracked and published to GitHub Pages:

Dashboard URL: https://timescale.github.io/pg_textsearch/benchmarks/

The dashboard shows:

  • Index Build Time - Time to build the BM25 index
  • Query Latencies - Per-query execution times (short, medium, long queries)
  • Average Throughput - Overall performance summarized as mean latency across 20 representative queries

Regression Alerts

Performance is automatically monitored:

  • PRs: Cranfield benchmarks run on every PR touching src/ or benchmarks/. Results are posted as PR comments comparing against the baseline.
  • Weekly: Full MS MARCO benchmarks run every Sunday, updating the baseline.
  • Releases: A benchmark gate runs before each release with a stricter 120% threshold. Releases are blocked if performance regresses significantly.

Alert thresholds:

  • PRs and weekly: 150% of baseline (warn but don't fail)
  • Releases: 120% of baseline (blocks release)
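
For example, with a baseline mean latency of 100 ms, a PR or weekly run measuring 155 ms produces a warning, while a release gate run measuring 125 ms blocks the release.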

Commit Guidelines

  • Write clear, concise commit messages
  • Focus on the "why" rather than the "what"
  • Reference related issues when applicable
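
For example, a message in this spirit (the change it describes is hypothetical):

    Fix crash when querying an empty index

    Queries issued before any documents were indexed dereferenced an
    uninitialized posting list. Initialize the list at index creation
    time and add a regression test. Fixes #<issue>.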

Pull Request Process

  1. Fork the repository and create a feature branch
  2. Make your changes with appropriate tests
  3. Ensure all tests pass locally
  4. Submit a pull request to the main branch

All pull requests are automatically tested against PostgreSQL 17 and 18.

PR Description

Include:

  • A brief summary of changes
  • Testing steps or notes
  • Any breaking changes or migration notes

Reporting Issues

Bug Reports

When reporting bugs, please include:

  • PostgreSQL version
  • pg_textsearch version
  • Operating system
  • Steps to reproduce
  • Expected vs actual behavior
  • Relevant error messages or logs

Feature Requests

For feature requests, describe:

  • The problem you're trying to solve
  • Your proposed solution (if any)
  • Any alternatives you've considered

Code of Conduct

Be respectful and constructive in all interactions. We're building something together.

Questions?

  • Open a GitHub Discussion for general questions
  • Check existing issues before opening new ones

License

By contributing to pg_textsearch, you agree that your contributions will be licensed under the PostgreSQL License.