
Request for comment: batched inference API for high-throughput redaction #6

@solomonneas

Background

The current Python API exposes a single-input redaction surface:

  • opf.redact(text: str) -> str (module-level convenience)
  • OPF.redact(text: str, *, decode: DecodeOptions | None = None) -> str | RedactionResult
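For illustration, the current surface in use (the OPF() construction below is an assumption about the constructor, which I have not verified; only the two call signatures above are confirmed):

import opf

# Module-level convenience: plain string in, redacted string out.
clean: str = opf.redact("ticket from alice@example.com")

# Method form; per the signature above the return is str | RedactionResult.
engine = opf.OPF()  # assumed no-arg constructor, for illustration only
result = engine.redact("ticket from alice@example.com", decode=None)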

Internally, predict_text in opf/_core/runtime.py builds a batch-of-one tensor per call:

window_tokens = torch.tensor(
    [list(window.tokens)],
    device=runtime.device,
    dtype=torch.int32,
)

The CLI wrapper in opf/__main__.py also iterates inputs one at a time via iter_inputs().
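A batched forward pass would instead need to pad each example's windows to a common length and stack them. A rough sketch of what the tensor build could look like (build_batch, pad_id, and the mask handling are my assumptions, not existing code in opf):

import torch

def build_batch(
    windows: list[list[int]],
    pad_id: int,
    device: torch.device,
) -> tuple[torch.Tensor, torch.Tensor]:
    # Pad variable-length token windows to the longest one and stack them.
    max_len = max(len(w) for w in windows)
    token_ids = torch.full((len(windows), max_len), pad_id, dtype=torch.int32, device=device)
    mask = torch.zeros((len(windows), max_len), dtype=torch.bool, device=device)
    for i, w in enumerate(windows):
        token_ids[i, : len(w)] = torch.tensor(w, dtype=torch.int32, device=device)
        mask[i, : len(w)] = True
    return token_ids, mask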

Motivation

The README positions OPF for "high-throughput data sanitization workflows." For that use case, batch-size-1 inference leaves a lot on the table, particularly for this architecture:

  • The model is a sparse MoE (128 experts total, top-4 per token, 1.5B total / 50M active)
  • Attention is banded with band size 128 (effective window 257), so per-token cost is relatively stable and doesn't scale quadratically with sequence length
  • Throughput is largely bounded by expert dispatch/gather overhead rather than per-token compute, so amortizing that overhead across a batch should give a meaningful speedup on short-to-medium inputs.

Concretely, realistic workflows that would benefit:

  • Sanitizing a corpus of chat logs / support tickets / log lines (thousands of small inputs)
  • Pipeline preprocessors that redact in a streaming fashion
  • CI-style batch sweeps (e.g., find . -name "*.md" | xargs -P ... opf, which today serializes anyway because each invocation reloads the runtime)

Proposed scope (for discussion, not committing to any shape yet)

A public redact_many or redact_batch entrypoint:

def redact_many(
    self,
    texts: Sequence[str],
    *,
    decode: DecodeOptions | None = None,
    batch_size: int | None = None,
) -> list[str | RedactionResult]: ...

And optionally a matching CLI mode so cat inputs.txt | opf --stdin-mode line can batch internally instead of serializing window-by-window.
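To anchor the discussion, one naive shape for the Python side (purely illustrative: the _chunks helper and the default batch size are my inventions, and the interesting work would be replacing the inner per-item loop with a single batched forward pass in predict_text):

from collections.abc import Iterator, Sequence

def _chunks(texts: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    # Hypothetical helper: split the input into fixed-size chunks.
    for start in range(0, len(texts), size):
        yield texts[start : start + size]

def redact_many(self, texts, *, decode=None, batch_size=None):
    # Sketch of the method body as it might live on OPF.
    size = batch_size or 32  # a sensible default is itself an open question
    results = []
    for chunk in _chunks(texts, size):
        # Placeholder: today this would just call self.redact() per item;
        # the point of the proposal is one batched forward pass per chunk.
        results.extend(self.redact(text, decode=decode) for text in chunk)
    return results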

Open questions for maintainers

Before filing a PR, I would appreciate guidance on:

  1. Appetite: is a batched public API in-scope for this repo, or would you prefer users drive batching externally (e.g., construct their own batches via the private predict_text path)?
  2. API surface: redact_many(texts) returning a list, or a streaming generator redact_iter(texts) that yields each result as its windows complete?
  3. Batching axis: fixed batch_size vs. token-budget packing (pack until a budget of N tokens is reached; see the packing sketch after this list), vs. both?
  4. Windowing interaction: examples with different window counts complicate batching. Is it acceptable to pad the shorter examples, or is per-example sequential windowing with batched token-classification forward passes a better split?
  5. CLI exposure: should a batched mode be exposed via a flag (e.g. --batch-size N), or kept as Python-API-only initially?
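To clarify what I mean by token-budget packing in (3): a greedy packer that closes a batch once the padded cost (batch size times longest example) would exceed a budget. count_tokens is a hypothetical hook, not an existing opf API:

from collections.abc import Callable, Iterator, Sequence

def pack_by_token_budget(
    texts: Sequence[str],
    count_tokens: Callable[[str], int],  # hypothetical tokenizer hook
    max_tokens: int = 4096,
) -> Iterator[list[str]]:
    batch: list[str] = []
    max_len = 0
    for text in texts:
        n = count_tokens(text)
        new_max = max(max_len, n)
        # Padded cost of the batch if this example were added.
        if batch and (len(batch) + 1) * new_max > max_tokens:
            yield batch
            batch, max_len = [], 0
            new_max = n
        batch.append(text)
        max_len = new_max
    if batch:
        yield batch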

Happy to prototype whichever shape aligns with maintainer preference. I'd rather ask than submit a large PR that touches the public API in a direction you'd push back on.

Not requesting in this issue

  • Changes to the Viterbi decoder or label taxonomy
  • Async / multi-GPU / model-parallel inference
  • Any change to the default CLI single-input behavior
