Background
The current Python API exposes a single-input redaction surface:
- `opf.redact(text: str) -> str` (module-level convenience)
- `OPF.redact(text: str, *, decode: DecodeOptions | None = None) -> str | RedactionResult`
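For reference, the current call pattern is strictly one text per call; a minimal example of the module-level helper (the input string is just illustrative):

```python
import opf

# Each call drives the single-input path described below.
clean = opf.redact("Contact Jane Doe at jane.doe@example.com for access.")
print(clean)
```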
Internally, `predict_text` in `opf/_core/runtime.py` builds a batch-of-one tensor per call:

```python
window_tokens = torch.tensor(
    [list(window.tokens)],
    device=runtime.device,
    dtype=torch.int32,
)
```
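For contrast, a batched path would presumably pad several windows to a common length and stack them along a batch dimension. A rough sketch only; `windows`, `pad_id`, and the surrounding names are assumptions, not existing runtime internals:

```python
# Hypothetical batched build (not existing code): stack N windows into one
# (N, max_len) tensor, padding shorter windows with an assumed pad_id so a
# single forward pass can cover all of them.
max_len = max(len(w.tokens) for w in windows)
batch_tokens = torch.full(
    (len(windows), max_len),
    pad_id,  # assumed padding token id
    device=runtime.device,
    dtype=torch.int32,
)
for i, w in enumerate(windows):
    batch_tokens[i, : len(w.tokens)] = torch.tensor(
        list(w.tokens), device=runtime.device, dtype=torch.int32
    )
```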
The CLI wrapper in `opf/__main__.py` also iterates inputs one at a time via `iter_inputs()`.
Motivation
The README positions OPF for "high-throughput data sanitization workflows." For that use case, batch-size-1 inference leaves a lot on the table, particularly for this architecture:
- The model is a sparse MoE (128 experts total, top-4 per token, 1.5B total / 50M active parameters)
- Attention is banded with band size 128 (effective window 257), so per-token cost is relatively stable and doesn't scale quadratically with sequence length
- Throughput is largely bounded by expert dispatch/gather overhead, not per-token compute. Amortizing that overhead across a batch should give meaningful speedup on short-to-medium inputs.
Concretely, realistic workflows that would benefit:
- Sanitizing a corpus of chat logs / support tickets / log lines (thousands of small inputs)
- Pipeline preprocessors that redact in a streaming fashion
- CI-style batch sweeps (e.g., `find . -name "*.md" | xargs -P ... opf`, which today serializes anyway because each call re-loads the runtime)
Proposed scope (for discussion, not committing to any shape yet)
A public `redact_many` or `redact_batch` entrypoint:

```python
def redact_many(
    self,
    texts: Sequence[str],
    *,
    decode: DecodeOptions | None = None,
    batch_size: int | None = None,
) -> list[str | RedactionResult]: ...
```
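For illustration, the call pattern this would enable. Hypothetical usage only: `redact_many` does not exist yet, and the import/construction of `OPF` is assumed rather than taken from the current docs:

```python
from opf import OPF  # assuming OPF is importable from the top-level package

engine = OPF()  # however an OPF instance is normally constructed/loaded
tickets = [
    "Reach me at 555-0142 after 5pm.",
    "My email is jane.doe@example.com, account ref ABC-123.",
]
# One batched call instead of len(tickets) single-input calls.
redacted = engine.redact_many(tickets, batch_size=32)
for line in redacted:
    print(line)
```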
And optionally a matching CLI mode so `cat inputs.txt | opf --stdin-mode line` can batch internally instead of serializing window-by-window.
Open questions for maintainers
Before filing a PR, I would appreciate guidance on:
- Appetite: is a batched public API in scope for this repo, or would you prefer users drive batching externally (e.g., construct their own batches via the private `predict_text` path)?
- API surface: `redact_many(texts)` returning a list, or a streaming generator `redact_iter(texts)` that yields as windows complete?
- Batching axis: fixed `batch_size` vs. token-budget packing (pack until N tokens), vs. both? (A rough sketch of the packing variant follows this list.)
- Windowing interaction: examples with different window counts complicate batching. Is it acceptable to pad the shorter ones, or is per-example sequential windowing with batched token-classification forward passes a better split?
- CLI exposure: should a batched mode be exposed via a flag (e.g. `--batch-size N`), or kept as Python-API-only initially?
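To make the token-budget option concrete, a minimal packing sketch under stated assumptions: `count_tokens` is a hypothetical callable (e.g. the tokenizer's length function), not part of opf, and nothing here reflects existing internals.

```python
from typing import Callable, Iterator, Sequence


def pack_by_token_budget(
    texts: Sequence[str],
    max_tokens: int,
    count_tokens: Callable[[str], int],  # hypothetical helper, not an opf API
) -> Iterator[list[str]]:
    """Greedily group texts until adding the next one would exceed the budget."""
    batch: list[str] = []
    used = 0
    for text in texts:
        n = count_tokens(text)
        if batch and used + n > max_tokens:
            yield batch
            batch, used = [], 0
        batch.append(text)
        used += n
    if batch:
        yield batch
```

A fixed `batch_size` is simpler; packing keeps per-batch padding waste bounded when input lengths vary widely.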
Happy to prototype whichever shape aligns with maintainer preference. I'd rather ask than submit a large PR that touches the public API in a direction you'd push back on.
Not requesting in this issue
- Changes to the Viterbi decoder or label taxonomy
- Async / multi-GPU / model-parallel inference
- Any change to the default CLI single-input behavior
Related
- The earlier change adding `--stdin-mode {line,whole}` to the redact CLI. Orthogonal to this, but the `line` mode is a natural consumer of a future internal batched path.