feat(library): add Polygraf PII detection and masking integration #1657
DataMonarch wants to merge 4 commits into
Add `PolygrafDetectionOptions` and `PolygrafDetection` pydantic models to support configuring Polygraf as a PII detection provider, following the same pattern as the PrivateAI and GLiNER integrations. The config block supports `server_endpoint` and per-stage (input, output, retrieval) entity lists for selective PII detection.

Co-authored-by: Cursor <cursoragent@cursor.com>
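A minimal sketch of the configuration shape described above, assuming each stage carries a plain list of entity type names. The PR defines these as pydantic models; this sketch uses stdlib `dataclasses` instead so it runs without extra dependencies. The class names mirror the PR, but the `entities` field, the `get_entities` helper, the entity names, and the example URL are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PolygrafDetectionOptions:
    """Per-stage options: which entity types to act on (hypothetical shape)."""

    entities: List[str] = field(default_factory=list)


@dataclass
class PolygrafDetection:
    """Top-level Polygraf config block (sketch; mirrors the PR's class names)."""

    server_endpoint: str
    input: PolygrafDetectionOptions = field(default_factory=PolygrafDetectionOptions)
    output: PolygrafDetectionOptions = field(default_factory=PolygrafDetectionOptions)
    retrieval: PolygrafDetectionOptions = field(default_factory=PolygrafDetectionOptions)

    def get_entities(self, source: str) -> List[str]:
        """Return the entity types enabled for "input", "output", or "retrieval"."""
        if source not in ("input", "output", "retrieval"):
            raise ValueError(f"unknown source: {source!r}")
        options: PolygrafDetectionOptions = getattr(self, source)
        return options.entities


# Usage: only EMAIL and PHONE are checked on input; other stages check nothing.
config = PolygrafDetection(
    server_endpoint="http://localhost:8000/v1/pii/text-detect",  # hypothetical URL
    input=PolygrafDetectionOptions(entities=["EMAIL", "PHONE"]),
)
```

Keeping an independent entity list per stage is what allows, for example, strict checks on user input while leaving retrieved chunks untouched.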
Implement the Polygraf library module with:

- `request.py`: async HTTP client for the Polygraf PII text-detect API, using aiohttp with Bearer token auth via the `POLYGRAF_API_KEY` env var.
- `actions.py`: `polygraf_detect_pii` and `polygraf_mask_pii` actions that read config, call the API, and filter/mask entities by type. Masking replaces detected spans with `<ENTITY_TYPE>` placeholders.
- `flows.v1.co`: Colang v1.0 subflows for detect/mask on input, output, and retrieval stages.
- `flows.co`: Colang v2.x flows for the same stages.

Follows the established patterns from the GLiNER and PrivateAI integrations for action signatures, error handling, and flow structure.

Co-authored-by: Cursor <cursoragent@cursor.com>
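A minimal sketch of what `request.py` is described as doing, assuming `server_endpoint` is the full text-detect URL, the request body is `{"text": ...}` (as the sequence diagram suggests), and the response holds an `entities` list. The helper names `build_request` and `polygraf_detect` and the response shape are hypothetical, not the PR's code.

```python
import os
from typing import Any, Dict, List, Optional, Tuple


def build_request(text: str, api_key: Optional[str]) -> Tuple[Dict[str, str], Dict[str, Any]]:
    """Build the headers and JSON body for a text-detect call."""
    headers: Dict[str, str] = {"Content-Type": "application/json"}
    if api_key:
        # Same header the PR sends; its format is questioned in review.
        headers["API_Key"] = f"Bearer {api_key}"
    return headers, {"text": text}


async def polygraf_detect(
    server_endpoint: str, text: str, api_key: Optional[str] = None
) -> List[Dict[str, Any]]:
    """POST text to Polygraf and return the detected entities."""
    import aiohttp  # lazy import: build_request() stays usable without aiohttp

    if api_key is None:
        api_key = os.environ.get("POLYGRAF_API_KEY")
    headers, payload = build_request(text, api_key)
    async with aiohttp.ClientSession() as session:
        async with session.post(server_endpoint, headers=headers, json=payload) as resp:
            if resp.status != 200:
                body = await resp.text()
                raise ValueError(f"Polygraf request failed ({resp.status}): {body}")
            data = await resp.json()
    # Assumption: the response carries an "entities" list; adjust to the real schema.
    return data.get("entities", [])
```

Splitting header/body construction from the network call keeps the auth logic unit-testable without a live server.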
Add 7 tests covering:

- No-op when no detection/masking flows are configured
- Input/output/retrieval PII detection (blocking)
- Input/output/retrieval PII masking (entity replacement)

Tests use mock actions registered directly with the app (matching the GLiNER test pattern) to avoid depending on a live Polygraf server.

Co-authored-by: Cursor <cursoragent@cursor.com>
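The mock-action approach can be sketched with the standard library alone. `mock_polygraf_action` and `FakeActionRegistry` are hypothetical; in the actual tests the mock would instead be registered on the rails app through its `register_action` method under the real action names, so the flows call the mock rather than the Polygraf API.

```python
import asyncio
from typing import Any, Callable, Dict


def mock_polygraf_action(return_value: Any) -> Callable[..., Any]:
    """Build an async stand-in for a Polygraf action that ignores its arguments."""

    async def _mock(*args: Any, **kwargs: Any) -> Any:
        return return_value

    return _mock


class FakeActionRegistry:
    """Hypothetical stand-in for the app object the real tests register on."""

    def __init__(self) -> None:
        self.actions: Dict[str, Callable[..., Any]] = {}

    def register_action(self, action: Callable[..., Any], name: str) -> None:
        self.actions[name] = action


registry = FakeActionRegistry()
# Detection test: pretend PII was found, so the flow should block the message.
registry.register_action(mock_polygraf_action(True), "polygraf_detect_pii")
# Masking test: pretend the API replaced an email address with a placeholder.
registry.register_action(mock_polygraf_action("Reach me at <EMAIL>"), "polygraf_mask_pii")

detected = asyncio.run(registry.actions["polygraf_detect_pii"](source="input", text="a@b.com"))
```

Because the mock ignores its inputs, each test pins the action's outcome and checks only how the flows react to it.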
- Add `docs/user-guides/community/polygraf.md` with setup, configuration, and entity type reference for the Polygraf PII integration.
- Update `guardrail-catalog.md` with a Polygraf PII Detection section alongside the existing PrivateAI and GLiNER entries.
- Update `overview.md` to list Polygraf under PII detection providers.
- Add example configs for the `pii_detection` and `pii_masking` use cases.

Co-authored-by: Cursor <cursoragent@cursor.com>
Greptile Summary

Added Polygraf as a new PII detection and masking provider, following the established patterns of the PrivateAI and GLiNER integrations.
| Filename | Overview |
|---|---|
| nemoguardrails/library/polygraf/actions.py | Implements core PII detection and masking actions with proper error handling and entity filtering |
| nemoguardrails/library/polygraf/request.py | Handles HTTP requests to the Polygraf API; verify the `API_Key` header format (it uses a `Bearer` prefix) |
| nemoguardrails/rails/llm/config.py | Adds Polygraf configuration schema matching existing PII provider patterns |
| tests/test_polygraf.py | Comprehensive test suite with 7 tests covering detection, masking, and all flow types |
Sequence Diagram

```mermaid
sequenceDiagram
    participant User
    participant NemoGuardrails
    participant PolygrafAction
    participant PolygrafAPI
    User->>NemoGuardrails: Send message with PII
    NemoGuardrails->>PolygrafAction: polygraf_detect_pii(source, text, config)
    PolygrafAction->>PolygrafAction: Get config (server_endpoint, entities)
    PolygrafAction->>PolygrafAction: Get POLYGRAF_API_KEY from env
    PolygrafAction->>PolygrafAPI: POST /v1/pii/text-detect<br/>{text, headers with API_Key}
    PolygrafAPI-->>PolygrafAction: Return detected entities
    PolygrafAction->>PolygrafAction: Filter by enabled entities
    PolygrafAction-->>NemoGuardrails: Return has_pii boolean
    alt PII Detected
        NemoGuardrails->>User: "I can't answer that"
    else No PII
        NemoGuardrails->>User: Continue normal flow
    end
    Note over User,PolygrafAPI: Masking Flow
    User->>NemoGuardrails: Send message with PII
    NemoGuardrails->>PolygrafAction: polygraf_mask_pii(source, text, config)
    PolygrafAction->>PolygrafAPI: POST /v1/pii/text-detect
    PolygrafAPI-->>PolygrafAction: Return detected entities
    PolygrafAction->>PolygrafAction: Replace entities with <ENTITY_TYPE>
    PolygrafAction-->>NemoGuardrails: Return masked text
    NemoGuardrails->>User: Process with masked content
```
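The filter, detect, and mask steps in the diagram can be sketched as pure functions. This assumes each detection is a dict with hypothetical `entity_type`, `start`, and `end` keys (character offsets, end exclusive) and that spans do not overlap; the real Polygraf response format may differ.

```python
from typing import Any, Dict, Iterable, List


def filter_entities(entities: List[Dict[str, Any]], enabled: Iterable[str]) -> List[Dict[str, Any]]:
    """Keep only detections whose type is enabled for this stage."""
    enabled_set = set(enabled)
    return [e for e in entities if e["entity_type"] in enabled_set]


def has_pii(entities: List[Dict[str, Any]], enabled: Iterable[str]) -> bool:
    """Detection path: True if any enabled entity type was found."""
    return bool(filter_entities(entities, enabled))


def mask_pii(text: str, entities: List[Dict[str, Any]], enabled: Iterable[str]) -> str:
    """Masking path: replace each enabled span with <ENTITY_TYPE>."""
    masked = text
    # Replace right-to-left so the offsets of earlier spans stay valid.
    for e in sorted(filter_entities(entities, enabled), key=lambda e: e["start"], reverse=True):
        masked = masked[: e["start"]] + f"<{e['entity_type']}>" + masked[e["end"]:]
    return masked


# Usage with a hand-built detection result (offsets computed for this string).
text = "Email jane@example.com or call 555-0100."
detected = [
    {"entity_type": "EMAIL", "start": 6, "end": 22},
    {"entity_type": "PHONE", "start": 31, "end": 39},
]
```

Filtering before masking means an entity type disabled for a stage passes through untouched even when the API reports it.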
Last reviewed commit: 20f2588
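```python
import os
from typing import Dict

api_key = os.environ.get("POLYGRAF_API_KEY")  # key source per the PR description
headers: Dict[str, str] = {"Content-Type": "application/json"}

if api_key:
    headers["API_Key"] = f"Bearer {api_key}"
```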
Verify that the `API_Key` header name and the `Bearer` prefix match what the Polygraf API expects. Many APIs instead use `Authorization: Bearer <key>` or `X-API-Key: <key>` (without a `Bearer` prefix).
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Path: `nemoguardrails/library/polygraf/request.py`, line 46
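One way to act on this review comment is to isolate header construction so the auth format can be changed in one place once Polygraf's expected format is confirmed. The `auth_headers` helper and its scheme names are hypothetical; only the `api_key_bearer` branch reproduces what the PR currently sends, and which scheme Polygraf accepts is unverified here.

```python
from typing import Dict


def auth_headers(api_key: str, scheme: str = "api_key_bearer") -> Dict[str, str]:
    """Build request headers for one of three common auth conventions."""
    headers: Dict[str, str] = {"Content-Type": "application/json"}
    if scheme == "api_key_bearer":
        # What the PR sends today.
        headers["API_Key"] = f"Bearer {api_key}"
    elif scheme == "authorization_bearer":
        # Standard bearer-token header.
        headers["Authorization"] = f"Bearer {api_key}"
    elif scheme == "x_api_key":
        # Raw key, no Bearer prefix.
        headers["X-API-Key"] = api_key
    else:
        raise ValueError(f"unknown auth scheme: {scheme!r}")
    return headers
```

If the API accepts exactly one format, hard-coding that single branch is simpler than exposing a scheme option in config.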
Codecov Report
❌ Patch coverage is
Force-pushed from a6be550 to c69efe5.
Closing in favor of #1693.
Summary
Adds Polygraf as a PII detection and masking provider, following the same patterns as the existing PrivateAI and GLiNER integrations.
Includes unit tests (7 passing), user guide documentation, and example configs.
Testing
`pytest tests/test_polygraf.py`