This guide explains how to evaluate your model and submit results to the CASE Benchmark leaderboard.
- Install case-benchmark:

  ```bash
  pip install case-benchmark
  ```

- Download the benchmark data:

  ```bash
  case-benchmark download --output-dir /path/to/benchmark
  ```

- Verify the download:

  ```bash
  python -m case_benchmark.download --verify --output-dir /path/to/benchmark
  ```
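If you prefer to verify files programmatically, here is a minimal sketch. It assumes a hypothetical `checksums.json` manifest mapping relative paths to SHA-256 digests; this manifest name and layout are illustrative, not part of the package's documented API:

```python
import hashlib
import json
from pathlib import Path


def verify_benchmark(benchmark_dir: str, manifest_name: str = "checksums.json") -> list:
    """Return the files that are missing or fail their SHA-256 check.

    Assumes a manifest mapping relative file paths to hex digests
    (hypothetical layout -- adapt to whatever manifest you actually have).
    """
    root = Path(benchmark_dir)
    manifest = json.loads((root / manifest_name).read_text())
    failures = []
    for rel_path, expected in manifest.items():
        f = root / rel_path
        if not f.exists():
            failures.append(rel_path)
            continue
        digest = hashlib.sha256(f.read_bytes()).hexdigest()
        if digest != expected:
            failures.append(rel_path)
    return failures
```

An empty return value means every listed file is present and matches its digest.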
For supported models (SpeechBrain, WeSpeaker, pyannote, NeMo, Resemblyzer):

```bash
# Install model dependencies
pip install case-benchmark[speechbrain]

# Run evaluation
case-benchmark evaluate \
    --model speechbrain \
    --benchmark-dir /path/to/benchmark \
    --output-dir results/ \
    --device cpu
```

For other models, create a model wrapper implementing the EmbeddingModel interface:
```python
from pathlib import Path

import numpy as np

from case_benchmark import CASEBenchmark
from case_benchmark.models.base import EmbeddingModel


class MyModel(EmbeddingModel):
    def load(self, device: str = "cpu") -> None:
        # Load your model (load_my_model is a placeholder for your own code)
        self.model = load_my_model(device)
        self._device = device
        self._loaded = True

    def extract_embedding(self, audio_path: Path) -> np.ndarray:
        # Extract an embedding from a single audio file
        audio = load_audio(audio_path)  # Your audio loading
        embedding = self.model.encode(audio)
        return embedding.numpy()

    @property
    def embedding_dim(self) -> int:
        return 192  # Your embedding dimension

    @property
    def name(self) -> str:
        return "My Custom Model"


# Run evaluation
benchmark = CASEBenchmark("/path/to/benchmark")
model = MyModel()
model.load("cuda")
results = benchmark.evaluate(model)
results.print_summary()
results.save("results/my_model.json")
```

Your results JSON file should contain:
```json
{
  "model_name": "My Model",
  "clean_eer": 0.0058,
  "absolute_eer": 0.0301,
  "degradation_factor": 0.0243,
  "case_score_v1": 5.03,
  "config": {
    "benchmark_dir": "/path/to/benchmark",
    "device": "cuda"
  },
  "category_breakdown": {
    "clean": 0.0058,
    "codec": 0.0173,
    "mic": 0.0059,
    "noise": 0.0073,
    "reverb": 0.0588,
    "playback": 0.0857
  },
  "protocol_results": {
    "clean_clean": {"eer": 0.0058, "min_dcf": 0.018, "num_trials": 10000},
    "clean_codec_gsm": {"eer": 0.0210, "min_dcf": 0.198, "num_trials": 10000},
    ...
  }
}
```

Key metrics for leaderboard ranking:
- `clean_eer`: Baseline performance (lower is better)
- `degradation_factor`: Robustness to carrier effects (lower is better)

See Metrics for a full explanation of each metric.
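For intuition: the EER is the operating point where the false-accept and false-reject rates are equal, and, consistent with the sample JSON above (0.0301 − 0.0058 = 0.0243), the degradation factor appears to be simply the absolute EER minus the clean EER. A rough sketch of both follows; this is not the benchmark's internal implementation:

```python
import numpy as np


def compute_eer(genuine_scores: np.ndarray, impostor_scores: np.ndarray) -> float:
    """Equal error rate: the threshold where false-accept rate == false-reject rate."""
    thresholds = np.sort(np.concatenate([genuine_scores, impostor_scores]))
    far = np.array([(impostor_scores >= t).mean() for t in thresholds])
    frr = np.array([(genuine_scores < t).mean() for t in thresholds])
    idx = np.argmin(np.abs(far - frr))  # closest crossing point
    return float((far[idx] + frr[idx]) / 2)


# Degradation factor as implied by the sample results above
clean_eer, absolute_eer = 0.0058, 0.0301
degradation_factor = absolute_eer - clean_eer  # 0.0243
```

With perfectly separated score distributions the sketch returns an EER of 0; real systems sit somewhere above that.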
To submit to the leaderboard, you need:

- A JSON file with all protocol results:
  - Must include all 24 protocols
  - Generated by `case-benchmark evaluate` or compatible code
- A markdown file describing your model:
```markdown
# Model Name

## Architecture
- Type: ECAPA-TDNN / ResNet / Transformer / etc.
- Parameters: X million
- Embedding dimension: 192

## Training
- Data: VoxCeleb2 (no overlap with VoxCeleb1-O test set)
- Augmentations: [list augmentations used]
- Loss: AAM-Softmax / Contrastive / etc.
- Training time: X GPU-hours

## Preprocessing
- Sample rate: 16kHz
- Features: 80-dim mel spectrogram
- Duration: variable / fixed X seconds

## Reproducibility
- Code: [link to code if available]
- Checkpoint: [link to weights if available]
```

We require that:
- Your training data does NOT include VoxCeleb1-O test speakers
- Results are reproducible (we may re-run evaluation)
- Model card accurately describes the system
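Before submitting, you can sanity-check your results file yourself. A minimal sketch: the 24-protocol count comes from the requirement above, and the key names follow the sample JSON earlier in this guide; the benchmark may perform stricter checks of its own:

```python
import json

# Top-level keys taken from the sample results JSON in this guide
REQUIRED_KEYS = {"model_name", "clean_eer", "absolute_eer",
                 "degradation_factor", "protocol_results"}


def check_results(path: str, expected_protocols: int = 24) -> list:
    """Return a list of human-readable problems; an empty list means the file looks OK."""
    with open(path) as f:
        results = json.load(f)
    problems = [f"missing key: {k}" for k in sorted(REQUIRED_KEYS - results.keys())]
    protocols = results.get("protocol_results", {})
    if len(protocols) != expected_protocols:
        problems.append(f"expected {expected_protocols} protocols, found {len(protocols)}")
    for name, entry in protocols.items():
        if "eer" not in entry:
            problems.append(f"protocol {name} has no eer")
    return problems
```

Running this before opening a PR or issue catches the most common rejection reason: a results file with missing protocols.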
To submit via pull request:

- Fork the `gittb/case-benchmark` repository
- Add your results to `results/<model_name>/`:
  - `results.json` - evaluation results
  - `model_card.md` - model description
- Open a pull request
Alternatively, to submit via issue:

- Open an issue in the repository
- Attach your results JSON and model card
- Include contact information for verification
All submissions must follow these rules:

- No VoxCeleb1-O training: Models must not be trained on VoxCeleb1-O test set speakers
- Reproducibility: Results must be reproducible
- Single model: No ensembles (unless clearly labeled)
- No test-time augmentation: Standard inference only
- 16kHz input: All models must accept 16kHz audio
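Because all benchmark audio is 16 kHz, models built around other sample rates need a resampling step first. A minimal numpy-only sketch, using linear interpolation purely for illustration; a real pipeline should use a proper polyphase resampler such as scipy, torchaudio, or librosa:

```python
import numpy as np


def resample_to_16k(audio: np.ndarray, orig_sr: int, target_sr: int = 16000) -> np.ndarray:
    """Resample a mono signal by linear interpolation.

    Fine as a sketch; prefer a polyphase resampler (scipy/torchaudio/librosa)
    for real submissions, since linear interpolation aliases high frequencies.
    """
    if orig_sr == target_sr:
        return audio
    n_out = int(round(len(audio) * target_sr / orig_sr))
    t_in = np.arange(len(audio))
    t_out = np.linspace(0, len(audio) - 1, n_out)
    return np.interp(t_out, t_in, audio)


# Example: one second of 48 kHz audio becomes 16000 samples
audio_16k = resample_to_16k(np.random.randn(48000), 48000)
```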
**Can I train on additional data?** Yes, as long as it doesn't include VoxCeleb1-O test speakers.

**Can I train with augmentations that simulate carrier conditions?** Yes, and we encourage it! The CASE Benchmark specifically measures robustness to carrier conditions.

**What if my model expects a different sample rate?** Resample to 16kHz before evaluation. The benchmark audio is all 16kHz.

**Can I submit more than one model?** Yes, each model should be submitted separately with its own model card.

**How quickly is the leaderboard updated?** We aim to update within 1 week of submission verification.