
feat: add precision-summary field with clean QPS, P50, P95 metrics #34

Merged · 2 commits · Jul 13, 2025
131 changes: 78 additions & 53 deletions README.md
@@ -16,6 +16,84 @@ scenario against which it should be tested. A specific scenario may assume
running the server in a single or distributed mode, a different client
implementation and the number of client instances.


## Quick Start

### Quick Start with Docker

The easiest way to run vector-db-benchmark is to use Docker. We provide pre-built images on Docker Hub.

```bash
# Pull the latest image
docker pull filipe958/vector-db-benchmark:latest

# Run with help
docker run --rm filipe958/vector-db-benchmark:latest run.py --help

# Check which datasets are available
docker run --rm filipe958/vector-db-benchmark:latest run.py --describe datasets

# Basic Redis benchmark with local Redis
docker run --rm -v $(pwd)/results:/app/results --network=host \
filipe958/vector-db-benchmark:latest \
run.py --host localhost --engines redis-default-simple --datasets glove-25-angular

# At the end of the run, you will find the results in the `results` directory.
# Let's open the summary file and inspect the precision summary:

$ jq ".precision_summary" results/*-summary.json
{
"0.91": {
"qps": 1924.5,
"p50": 49.828,
"p95": 58.427
},
"0.94": {
"qps": 1819.9,
"p50": 51.68,
"p95": 66.83
},
"0.9775": {
"qps": 1477.8,
"p50": 65.368,
"p95": 73.849
},
"0.9950": {
"qps": 1019.8,
"p50": 95.115,
"p95": 106.73
}
}
```
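
If you'd rather post-process these files in Python than with `jq`, a minimal sketch along these lines should work (it assumes only the `precision_summary` layout shown above; the glob pattern mirrors the `jq` invocation):

```python
# Minimal sketch: print the precision_summary from each summary file.
# Assumes only the JSON layout shown in the jq output above.
import glob
import json

for path in sorted(glob.glob("results/*-summary.json")):
    with open(path) as f:
        summary = json.load(f).get("precision_summary", {})
    for precision, metrics in sorted(summary.items()):
        print(f"{path}  precision={precision}  qps={metrics['qps']}  "
              f"p50={metrics['p50']} ms  p95={metrics['p95']} ms")
```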

### Using with Redis

For testing with Redis, start a Redis container first:

```bash
# Start Redis container
docker run -d --name redis-test -p 6379:6379 redis:8.2-rc1-bookworm

# Run benchmark against Redis

docker run --rm -v $(pwd)/results:/app/results --network=host \
filipe958/vector-db-benchmark:latest \
run.py --host localhost --engines redis-default-simple --datasets random-100

# Or use the convenience script
./docker-run.sh -H localhost -e redis-default-simple -d random-100


# Clean up Redis container when done
docker stop redis-test && docker rm redis-test
```
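
Before launching the benchmark, it can help to confirm that Redis is actually reachable on the expected port. A quick check with the `redis` Python package (assuming it is installed; this is illustrative, not part of this repo's tooling):

```python
# Quick connectivity check against the Redis container started above.
# Requires `pip install redis`; illustrative, not part of the benchmark.
import redis

client = redis.Redis(host="localhost", port=6379)
print(client.ping())  # True if the redis-test container is up
```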

### Available Docker Images

- **Latest**: `filipe958/vector-db-benchmark:latest`

For detailed Docker setup and publishing information, see [DOCKER_SETUP.md](DOCKER_SETUP.md).


## Data sets

We have a number of precomputed data sets. All data sets have been pre-split into train/test and include ground truth data for the top-100 nearest neighbors.
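
For context, the precision numbers reported by the benchmark compare retrieved neighbors against this ground truth; a toy illustration (the names here are illustrative, not the benchmark's internal API):

```python
# Toy precision@k: fraction of the k retrieved ids that appear in the
# ground-truth top-k neighbors. Illustrative only.
def precision_at_k(retrieved: list[int], ground_truth: list[int], k: int) -> float:
    return len(set(retrieved[:k]) & set(ground_truth[:k])) / k

print(precision_at_k([1, 2, 3, 4, 5], [1, 2, 3, 9, 8], k=5))  # 0.6
```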
@@ -71,59 +149,6 @@ We have a number of precomputed data sets. All data sets have been pre-split int
| Random Match Keyword Small Vocab-256: Small vocabulary keyword matching (no filters) | 256 | 1,000,000 | 10,000 | 100 | Cosine |


## 🐳 Docker Usage

The easiest way to run vector-db-benchmark is using Docker. We provide pre-built images on Docker Hub.

### Quick Start with Docker

```bash
# Pull the latest image
docker pull filipe958/vector-db-benchmark:latest

# Run with help
docker run --rm filipe958/vector-db-benchmark:latest run.py --help


# Basic Redis benchmark with local Redis (recommended)
docker run --rm -v $(pwd)/results:/app/results --network=host \
filipe958/vector-db-benchmark:latest \
run.py --host localhost --engines redis-default-simple --dataset random-100

# Without results output
docker run --rm --network=host filipe958/vector-db-benchmark:latest \
run.py --host localhost --engines redis-default-simple --dataset random-100

```

### Using with Redis

For testing with Redis, start a Redis container first:

```bash
# Start Redis container
docker run -d --name redis-test -p 6379:6379 redis:8.2-rc1-bookworm

# Run benchmark against Redis

docker run --rm -v $(pwd)/results:/app/results --network=host \
filipe958/vector-db-benchmark:latest \
run.py --host localhost --engines redis-default-simple --dataset random-100

# Or use the convenience script
./docker-run.sh -H localhost -e redis-default-simple -d random-100


# Clean up Redis container when done
docker stop redis-test && docker rm redis-test
```

### Available Docker Images

- **Latest**: `filipe958/vector-db-benchmark:latest`

For detailed Docker setup and publishing information, see [DOCKER_SETUP.md](DOCKER_SETUP.md).

## How to run a benchmark?

Benchmarks are implemented in server-client mode, meaning that the server is
Expand Down
29 changes: 25 additions & 4 deletions engine/base_client/client.py
@@ -34,9 +34,16 @@ def format_precision_key(precision_value: float) -> str:
    return f"{rounded:.4f}"


def analyze_precision_performance(search_results: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Analyze search results to find best RPS at each actual precision level achieved."""
def analyze_precision_performance(search_results: Dict[str, Any]) -> tuple[Dict[str, Dict[str, Any]], Dict[str, Dict[str, float]]]:
    """Analyze search results to find best RPS at each actual precision level achieved.

    Returns:
        tuple: (precision_dict, precision_summary_dict)
            - precision_dict: Full precision analysis with config details
            - precision_summary_dict: Simplified summary with just QPS, P50, P95
    """
    precision_dict = {}
    precision_summary_dict = {}

    # First, collect all actual precision levels achieved by experiments and format them
    precision_mapping = {}  # Maps formatted precision to actual precision
@@ -53,6 +60,8 @@ def analyze_precision_performance(search_results: Dict[str, Any]) -> Dict[str, D
        best_rps = 0
        best_config = None
        best_experiment_id = None
        best_p50_time = 0
        best_p95_time = 0

        for experiment_id, experiment_data in search_results.items():
            mean_precision = experiment_data["results"]["mean_precisions"]
@@ -66,16 +75,26 @@ def analyze_precision_performance(search_results: Dict[str, Any]) -> Dict[str, D
                    "search_params": experiment_data["params"]["search_params"]
                }
                best_experiment_id = experiment_id
                best_p50_time = experiment_data["results"]["p50_time"]
                best_p95_time = experiment_data["results"]["p95_time"]

        # Add to precision dict with the formatted precision as key
        if best_config is not None:
            # Full precision analysis (existing format)
            precision_dict[formatted_precision] = {
                "rps": best_rps,
                "config": best_config,
                "experiment_id": best_experiment_id
            }

    return precision_dict
            # Simplified precision summary
            precision_summary_dict[formatted_precision] = {
                "qps": round(best_rps, 1),
                "p50": round(best_p50_time * 1000, 3),  # Convert to ms
                "p95": round(best_p95_time * 1000, 3)  # Convert to ms
            }

    return precision_dict, precision_summary_dict
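
Condensed into a self-contained form, the summary logic added here amounts to the sketch below; the toy `search_results` dict mimics only the fields the function reads, and the key formatting is a simplified stand-in for `format_precision_key`:

```python
# Self-contained sketch of the precision_summary computation: per precision
# level, keep the best-RPS experiment and report QPS plus p50/p95 in ms.
search_results = {  # toy data with just the fields read above
    "exp-a": {"results": {"mean_precisions": 0.91, "rps": 1924.5,
                          "p50_time": 0.049828, "p95_time": 0.058427}},
    "exp-b": {"results": {"mean_precisions": 0.91, "rps": 1500.0,
                          "p50_time": 0.060, "p95_time": 0.070}},
}

best_rps: dict[str, float] = {}
precision_summary: dict[str, dict[str, float]] = {}
for data in search_results.values():
    res = data["results"]
    key = f"{res['mean_precisions']:.4f}"  # simplified stand-in for format_precision_key
    if res["rps"] > best_rps.get(key, 0):
        best_rps[key] = res["rps"]
        precision_summary[key] = {
            "qps": round(res["rps"], 1),
            "p50": round(res["p50_time"] * 1000, 3),  # seconds -> ms
            "p95": round(res["p95_time"] * 1000, 3),
        }

print(precision_summary)
# {'0.9100': {'qps': 1924.5, 'p50': 49.828, 'p95': 58.427}}
```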

warnings.filterwarnings("ignore", category=DeprecationWarning)

@@ -285,10 +304,12 @@ def run_experiment(

        # Add precision analysis if search results exist
        if results["search"]:
            precision_analysis = analyze_precision_performance(results["search"])
            precision_analysis, precision_summary = analyze_precision_performance(results["search"])
            if precision_analysis:  # Only add if we have precision data
                results["precision"] = precision_analysis
                results["precision_summary"] = precision_summary
                print(f"Added precision analysis with {len(precision_analysis)} precision thresholds")
                print(f"Added precision summary with {len(precision_summary)} precision levels")

        summary_file = f"{self.name}-{dataset.config.name}-summary.json"
        summary_path = RESULTS_DIR / summary_file
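
After this change a written summary should carry both fields; a quick sanity check might look like the sketch below (the file name is a placeholder following the `{name}-{dataset}-summary.json` pattern above, and the results directory is assumed to be `results/`):

```python
# Hypothetical sanity check on a freshly written summary file.
import json
from pathlib import Path

path = Path("results/redis-default-simple-glove-25-angular-summary.json")  # placeholder
data = json.loads(path.read_text())
assert "precision" in data and "precision_summary" in data
for level, m in sorted(data["precision_summary"].items()):
    print(level, m["qps"], m["p50"], m["p95"])
```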