diff --git a/docs/about/overview.mdx b/docs/about/overview.mdx index ba37f29470..3ddc6f59b2 100644 --- a/docs/about/overview.mdx +++ b/docs/about/overview.mdx @@ -103,6 +103,7 @@ The NeMo Guardrails library supports PII detection through multiple integrations - **Gliner**: Use [NVIDIA GLiNER-PII](/configure-guardrails/guardrail-catalog/third-party/gliner) for detecting entities such as names, email addresses, phone numbers, social security numbers, and more. - **Presidio-based detection**: Use [Microsoft Presidio](/configure-guardrails/guardrail-catalog/third-party/presidio) for detecting entities such as names, email addresses, phone numbers, social security numbers, and more. - **Private AI**: Integrate with [Private AI](/configure-guardrails/guardrail-catalog/third-party/privateai) for advanced PII detection and masking. +- **Polygraf**: Integrate with [Polygraf](/configure-guardrails/guardrail-catalog/third-party/polygraf) for advanced PII detection and masking. - **AutoAlign**: Use [AutoAlign PII detection](/configure-guardrails/guardrail-catalog/third-party/auto-align) with customizable entity types. - **GuardrailsAI**: Access [GuardrailsAI PII validators](/configure-guardrails/guardrail-catalog/third-party/guardrails-ai) from the Guardrails Hub. diff --git a/docs/configure-rails/guardrail-catalog/community/polygraf.mdx b/docs/configure-rails/guardrail-catalog/community/polygraf.mdx new file mode 100644 index 0000000000..a6a4ce7178 --- /dev/null +++ b/docs/configure-rails/guardrail-catalog/community/polygraf.mdx @@ -0,0 +1,126 @@ +--- +# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +title: "Polygraf Integration" +--- +[Polygraf](https://polygraf.ai/) offers a state-of-the-art PII detection and masking API designed to help identify and protect sensitive information in your text. 
This integration enables NeMo Guardrails to use Polygraf for PII detection and masking in input, output, and retrieval flows. + +## Setup + +1. Obtain a Polygraf API key and set it as an environment variable so the integration can authenticate cloud requests: + + ```bash + export POLYGRAF_API_KEY="your-polygraf-api-key" + ``` + +2. Pick the endpoint that matches your deployment: + + - For **Polygraf cloud**, use `https://governance.api.polygraf.ai/gcp/pii/text-detect`. + - For **self-hosted** deployments, set `server_endpoint` to your service endpoint (the local default is typically `http://localhost:8000/v1/pii/text-detect`). + +3. Update your `config.yml` file to include the Polygraf settings. + + **PII detection config** + + ```yaml + rails: + config: + polygraf: + server_endpoint: "https://governance.api.polygraf.ai/gcp/pii/text-detect" + input: + entities: # If no entity is specified here, any detected PII will trigger the rail. + - Email + - Person + - Phone + output: + entities: + - Email + - Person + - Phone + input: + flows: + - polygraf detect pii on input + output: + flows: + - polygraf detect pii on output + ``` + + The detection flow blocks the input, output, or retrieval text when Polygraf detects an entity whose type is listed under `entities` (or any entity, if the list is empty). + + **PII masking config** + + ```yaml + rails: + config: + polygraf: + server_endpoint: "https://governance.api.polygraf.ai/gcp/pii/text-detect" + input: + entities: + - Email + - Person + - Phone + output: + entities: + - Email + - Person + - Phone + input: + flows: + - polygraf mask pii on input + output: + flows: + - polygraf mask pii on output + ``` + + The masking flow replaces each detected PII span with an `<entity_type>` placeholder, such as `<Person>` or `<Email>`. For example, `Hi John, my email is john@example.com` becomes `Hi <Person>, my email is <Email>`. 
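The span replacement performed by the masking flow can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the integration's code path: `mask_spans` is a hypothetical helper, the entity dictionaries assume the `entity_type`/`start`/`end` fields used elsewhere in this change, and the spans are assumed to be valid and non-overlapping. Spans are applied from right to left so that earlier offsets stay valid as the text changes length.

```python
from typing import Dict, List, Union

Entity = Dict[str, Union[str, int]]


def mask_spans(text: str, entities: List[Entity]) -> str:
    """Replace each (start, end) span with an <entity_type> placeholder.

    Hypothetical helper for illustration only; assumes every span is
    valid for `text` and that spans do not overlap.
    """
    masked = text
    # Process the right-most span first so earlier offsets remain valid.
    for entity in sorted(entities, key=lambda e: e["start"], reverse=True):
        start, end = int(entity["start"]), int(entity["end"])
        masked = masked[:start] + f"<{entity['entity_type']}>" + masked[end:]
    return masked


text = "Hi John, my email is john@example.com"
entities = [
    {"entity_type": "Person", "start": 3, "end": 7},
    {"entity_type": "Email", "start": 21, "end": 37},
]
print(mask_spans(text, entities))  # Hi <Person>, my email is <Email>
```

Replacing from the end of the string avoids having to recompute offsets after each substitution, which would be necessary if the placeholders were inserted left to right.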
+ +### Retrieval Flows + +To detect or mask PII in retrieved documents, configure the `retrieval` entities and enable the retrieval flow variant: + +```yaml +rails: + config: + polygraf: + server_endpoint: "https://governance.api.polygraf.ai/gcp/pii/text-detect" + retrieval: + entities: + - Email + - Person + - Phone + + retrieval: + flows: + - polygraf detect pii on retrieval + # or for masking: + # - polygraf mask pii on retrieval +``` + +## Usage + +Once configured, the Polygraf integration can automatically: + +1. Detect or mask PII in user inputs before they are processed by the LLM. +2. Detect or mask PII in LLM outputs before they are sent back to the user. +3. Detect or mask PII in retrieved chunks before they are sent to the LLM. + +The `polygraf_detect_pii` and `polygraf_mask_pii` actions in `nemoguardrails/library/polygraf/actions.py` handle detection and masking, respectively. + +## Entity Types + +You can customize the PII handling behavior by modifying the `entities` lists under `input`, `output`, and `retrieval`. Entity labels should match the labels returned by your Polygraf deployment. Common entities include: + +- `Person` +- `Email` +- `Phone` + +For a complete list of supported entity types, refer to the [Polygraf documentation](https://polygraf.ai/api-agents/). + +## Failure Handling + +The integration is fail-closed: a Polygraf failure must not allow potentially-PII text to pass through the rail. + +- **Provider/network failure** (timeout, DNS, TLS, non-200 response, invalid JSON, malformed response shape). The underlying HTTP helper raises `ValueError`, which the actions catch internally. `polygraf detect pii on …` returns `True` (the rail blocks the message). `polygraf mask pii on …` replaces the entire payload with the failsafe redaction placeholder (`FAILSAFE_MASK_PLACEHOLDER`). The actions log a structural warning (failure category only); request bodies, response bodies, and entity values are never logged. 
+- **Malformed entity span** (Polygraf returns an entry without a known `entity_type`, or with non-integer offsets, or with offsets outside `0 <= start < end <= len(text)`). The actions also fail closed: detection blocks the message and masking redacts the whole payload, rather than silently skipping the malformed span and forwarding the rest. +- **Default timeout**: `30` seconds per call. Slow or unreachable endpoints cannot hang the rail pipeline. +- **Missing API key**: if `POLYGRAF_API_KEY` is not set, the integration logs a warning since cloud endpoints typically reject unauthenticated requests, and proceeds to call the endpoint without an `Authorization` header. diff --git a/docs/index.yml b/docs/index.yml index a6a1b2c9f2..19aa15eb56 100644 --- a/docs/index.yml +++ b/docs/index.yml @@ -179,6 +179,9 @@ navigation: - page: PolicyAI path: configure-rails/guardrail-catalog/community/policyai.mdx slug: policyai + - page: Polygraf + path: configure-rails/guardrail-catalog/community/polygraf.mdx + slug: polygraf - page: Presidio path: configure-rails/guardrail-catalog/community/presidio.mdx slug: presidio diff --git a/examples/configs/polygraf/pii_detection/config.yml b/examples/configs/polygraf/pii_detection/config.yml new file mode 100644 index 0000000000..853d31c389 --- /dev/null +++ b/examples/configs/polygraf/pii_detection/config.yml @@ -0,0 +1,30 @@ +models: + - type: main + engine: openai + model: gpt-3.5-turbo-instruct + +rails: + config: + polygraf: + # For Polygraf cloud, use the governance endpoint. For self-hosted deployments, + # set this to your service endpoint (the default local example is typically + # "http://localhost:8000/v1/pii/text-detect"). 
+ server_endpoint: "https://governance.api.polygraf.ai/gcp/pii/text-detect" + input: + entities: + - Email + - Person + - Phone + output: + entities: + - Email + - Person + - Phone + + input: + flows: + - polygraf detect pii on input + + output: + flows: + - polygraf detect pii on output diff --git a/examples/configs/polygraf/pii_masking/config.yml b/examples/configs/polygraf/pii_masking/config.yml new file mode 100644 index 0000000000..b758858d67 --- /dev/null +++ b/examples/configs/polygraf/pii_masking/config.yml @@ -0,0 +1,30 @@ +models: + - type: main + engine: openai + model: gpt-3.5-turbo-instruct + +rails: + config: + polygraf: + # For Polygraf cloud, use the governance endpoint. For self-hosted deployments, + # set this to your service endpoint (the default local example is typically + # "http://localhost:8000/v1/pii/text-detect"). + server_endpoint: "https://governance.api.polygraf.ai/gcp/pii/text-detect" + input: + entities: + - Email + - Person + - Phone + output: + entities: + - Email + - Person + - Phone + + input: + flows: + - polygraf mask pii on input + + output: + flows: + - polygraf mask pii on output diff --git a/nemoguardrails/library/polygraf/__init__.py b/nemoguardrails/library/polygraf/__init__.py new file mode 100644 index 0000000000..6c7f64065d --- /dev/null +++ b/nemoguardrails/library/polygraf/__init__.py @@ -0,0 +1,14 @@ +# SPDX-FileCopyrightText: Copyright (c) 2023-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. diff --git a/nemoguardrails/library/polygraf/actions.py b/nemoguardrails/library/polygraf/actions.py new file mode 100644 index 0000000000..3f57b43ee6 --- /dev/null +++ b/nemoguardrails/library/polygraf/actions.py @@ -0,0 +1,272 @@ +# SPDX-FileCopyrightText: Copyright (c) 2023-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""PII detection using Polygraf.""" + +import logging +import os +from typing import Any, Dict, List, Optional, Tuple + +from nemoguardrails import RailsConfig +from nemoguardrails.actions.actions import action +from nemoguardrails.library.polygraf.request import polygraf_request +from nemoguardrails.rails.llm.config import PolygrafDetection + +log = logging.getLogger(__name__) + +# Placeholder returned when masking cannot complete safely (provider failure +# or a configured entity span is malformed). Replacing the entire payload is +# the only fail-closed option that guarantees no raw PII passes downstream. +FAILSAFE_MASK_PLACEHOLDER = "" + + +def detect_pii_mapping(result: bool) -> bool: + """ + Mapping for polygraf_detect_pii. + + Since the function returns True when PII is detected, + we block if result is True. 
+ """ + return result + + +def _get_polygraf_api_key() -> Optional[str]: + api_key = os.environ.get("POLYGRAF_API_KEY") + if not api_key: + log.warning( + "POLYGRAF_API_KEY environment variable is not set. " + "Polygraf cloud endpoints may reject unauthenticated requests." + ) + return api_key + + +def _entity_shape(entity: Any) -> str: + """Return a PII-free structural description of an entity for logging.""" + if isinstance(entity, dict): + return f"dict(keys={sorted(entity.keys())})" + return type(entity).__name__ + + +def _is_int(value: Any) -> bool: + """Strict integer check. + + ``bool`` is a subclass of ``int`` in Python, so a plain ``isinstance(x, int)`` + would accept ``True``/``False`` as valid offsets. Explicitly reject booleans + so a Polygraf response cannot smuggle bogus span coordinates past validation. + """ + return isinstance(value, int) and not isinstance(value, bool) + + +def _classify_entities( + entities: List[Any], + enabled_entities: Optional[List[str]], + text_length: int, +) -> Tuple[List[Tuple[int, int, str]], bool]: + """Split Polygraf entities into safe spans and report whether any + span is unsafe enough to require failing closed. + + Args: + entities: Raw entity records returned by Polygraf. + enabled_entities: The configured entity-type filter (or ``None`` to + accept every Polygraf-reported type). + text_length: Length of the original payload, used to validate that + integer offsets actually point inside the text. + + Returns: + (safe_spans, has_malformed_selected) + + - safe_spans: ``(start, end, entity_type)`` triples for entities that + pass the entity-type filter AND have a non-empty type AND have + strict integer offsets satisfying ``0 <= start < end <= text_length``. + - has_malformed_selected: ``True`` if any entity that *might* be a + selected PII span is malformed. Callers must treat this as a + fail-closed signal because there is no safe way to either trust + a missing-type entity or silently drop it. 
+ """ + safe_spans: List[Tuple[int, int, str]] = [] + has_malformed_selected = False + + for entity in entities: + if not isinstance(entity, dict): + # Non-dict entries can't carry an entity_type or offsets we can + # validate, so they always count as a malformed-selected span. + # Log only the shape, never the value. + log.warning("Skipping malformed Polygraf entity: shape=%s", _entity_shape(entity)) + has_malformed_selected = True + continue + + entity_type = entity.get("entity_type") + start = entity.get("start") + end = entity.get("end") + + invalid_fields: List[str] = [] + if not isinstance(entity_type, str) or not entity_type: + invalid_fields.append("entity_type") + if not _is_int(start): + invalid_fields.append("start") + if not _is_int(end): + invalid_fields.append("end") + + if invalid_fields: + log.warning( + "Skipping malformed Polygraf entity: invalid_fields=%s keys=%s", + invalid_fields, + sorted(entity.keys()), + ) + # Fail closed conservatively: + # - Unknown entity_type: we cannot tell whether the filter would + # have selected it, so assume it would have. + # - Known entity_type missing from the filter: silently skip. + if "entity_type" in invalid_fields: + has_malformed_selected = True + elif enabled_entities is None or entity_type in enabled_entities: + has_malformed_selected = True + continue + + # Offsets are now known-good integers. Validate they actually point + # inside the text and form a non-empty, non-reversed span. 
+ if not (0 <= start < end <= text_length): + log.warning( + "Skipping malformed Polygraf entity: out-of-range span (start=%d end=%d text_length=%d) keys=%s", + start, + end, + text_length, + sorted(entity.keys()), + ) + if enabled_entities is None or entity_type in enabled_entities: + has_malformed_selected = True + continue + + if enabled_entities and entity_type not in enabled_entities: + continue + + safe_spans.append((start, end, entity_type)) + + return safe_spans, has_malformed_selected + + +def _resolve_source_config(config: RailsConfig, source: str) -> Tuple[PolygrafDetection, Any, Optional[List[str]]]: + """Resolve the Polygraf config and per-source entity filter, validating ``source``.""" + polygraf_config: PolygrafDetection = getattr(config.rails.config, "polygraf") + source_config = getattr(polygraf_config, source, None) + if source_config is None: + valid_sources = ["input", "output", "retrieval"] + raise ValueError( + f"Polygraf can only be defined in the following flows: {valid_sources}. " + f"The current flow, '{source}', is not allowed." + ) + enabled_entities = source_config.entities if source_config.entities else None + return polygraf_config, source_config, enabled_entities + + +@action(is_system_action=False, output_mapping=detect_pii_mapping) +async def polygraf_detect_pii( + source: str, + text: str, + config: RailsConfig, + **kwargs, +) -> bool: + """Checks whether the provided text contains any PII using Polygraf. + + Args: + source: The source for the text, i.e. "input", "output", "retrieval". + text: The text to check. + config: The rails configuration object. + + Returns: + True if PII is detected (or if the detection cannot complete safely), + False otherwise. + + Raises: + ValueError: Only if ``source`` is not one of the allowed flows. + Provider/network failures are caught and treated as fail-closed + (the action returns True so the rail blocks the message). 
+ """ + polygraf_config, _source_config, enabled_entities = _resolve_source_config(config, source) + server_endpoint = polygraf_config.server_endpoint + api_key = _get_polygraf_api_key() + session = kwargs.get("session") + + try: + entities: List[Dict[str, Any]] = await polygraf_request(text, server_endpoint, api_key, session=session) + except ValueError as err: + # Fail closed: a provider failure must not allow potentially-PII text + # through. Log only the failure category, never the input text or + # exception chain (which can contain response bodies with PII). + log.warning("Polygraf detection failed (%s); failing closed and blocking text.", type(err).__name__) + return True + + if not entities: + return False + + safe_spans, has_malformed_selected = _classify_entities(entities, enabled_entities, len(text)) + + # If a *selected* entity was malformed, treat the whole result as untrusted + # and fail closed even if other valid entities had no enabled match. + if has_malformed_selected: + log.warning("Polygraf returned a malformed selected entity; failing closed and blocking text.") + return True + + return len(safe_spans) > 0 + + +@action(is_system_action=False) +async def polygraf_mask_pii(source: str, text: str, config: RailsConfig, **kwargs) -> str: + """Masks any detected PII in the provided text using Polygraf. + + Args: + source: The source for the text, i.e. "input", "output", "retrieval". + text: The text to check. + config: The rails configuration object. + + Returns: + The altered text with PII masked. Returns ``FAILSAFE_MASK_PLACEHOLDER`` + when masking cannot complete safely (provider failure or a configured + entity span is malformed), so raw PII is never sent downstream. + + Raises: + ValueError: Only if ``source`` is not one of the allowed flows. + Provider/network failures are caught and treated as fail-closed. 
+ """ + polygraf_config, _source_config, enabled_entities = _resolve_source_config(config, source) + server_endpoint = polygraf_config.server_endpoint + api_key = _get_polygraf_api_key() + session = kwargs.get("session") + + try: + entities: List[Dict[str, Any]] = await polygraf_request(text, server_endpoint, api_key, session=session) + except ValueError as err: + # Fail closed: if we cannot run masking at all, redact the entire text + # rather than risk forwarding raw PII downstream. + log.warning("Polygraf masking failed (%s); replacing payload with redaction placeholder.", type(err).__name__) + return FAILSAFE_MASK_PLACEHOLDER + + if not entities: + return text + + safe_spans, has_malformed_selected = _classify_entities(entities, enabled_entities, len(text)) + + if has_malformed_selected: + # A configured entity was reported with invalid offsets / type. We + # cannot guarantee in-place masking, so fail closed by redacting the + # entire payload instead of returning partially-masked text. 
+ log.warning("Polygraf returned a malformed selected entity; replacing payload with redaction placeholder.") + return FAILSAFE_MASK_PLACEHOLDER + + masked_text = text + for start, end, entity_type in sorted(safe_spans, key=lambda x: x[0], reverse=True): + masked_text = masked_text[:start] + f"<{entity_type}>" + masked_text[end:] + + return masked_text diff --git a/nemoguardrails/library/polygraf/flows.co b/nemoguardrails/library/polygraf/flows.co new file mode 100644 index 0000000000..95a1211942 --- /dev/null +++ b/nemoguardrails/library/polygraf/flows.co @@ -0,0 +1,62 @@ +#### POLYGRAF PII DETECTION RAILS #### + +# INPUT RAILS + +flow polygraf detect pii on input + """Check if the user input has PII using Polygraf.""" + global $user_message + $has_pii = await PolygrafDetectPiiAction(source="input", text=$user_message) + + if $has_pii + bot inform answer unknown + abort + + +# OUTPUT RAILS + +flow polygraf detect pii on output + """Check if the bot output has PII using Polygraf.""" + global $bot_message + $has_pii = await PolygrafDetectPiiAction(source="output", text=$bot_message) + + if $has_pii + bot inform answer unknown + abort + + +# RETRIEVAL RAILS + +flow polygraf detect pii on retrieval + """Check if the relevant chunks from the knowledge base have any PII using Polygraf.""" + global $relevant_chunks + $has_pii = await PolygrafDetectPiiAction(source="retrieval", text=$relevant_chunks) + + if $has_pii + bot inform answer unknown + abort + + +#### POLYGRAF PII MASKING RAILS #### + +# INPUT RAILS + +flow polygraf mask pii on input + """Mask any detected PII in the user input using Polygraf.""" + global $user_message + $user_message = await PolygrafMaskPiiAction(source="input", text=$user_message) + + +# OUTPUT RAILS + +flow polygraf mask pii on output + """Mask any detected PII in the bot output using Polygraf.""" + global $bot_message + $bot_message = await PolygrafMaskPiiAction(source="output", text=$bot_message) + + +# RETRIEVAL RAILS + +flow polygraf mask 
pii on retrieval + """Mask any detected PII in the relevant chunks from the knowledge base using Polygraf.""" + global $relevant_chunks + $relevant_chunks = await PolygrafMaskPiiAction(source="retrieval", text=$relevant_chunks) diff --git a/nemoguardrails/library/polygraf/flows.v1.co b/nemoguardrails/library/polygraf/flows.v1.co new file mode 100644 index 0000000000..0e5dd7078d --- /dev/null +++ b/nemoguardrails/library/polygraf/flows.v1.co @@ -0,0 +1,58 @@ +#### POLYGRAF PII DETECTION RAILS #### + +# INPUT RAILS + +define subflow polygraf detect pii on input + """Check if the user input has PII using Polygraf.""" + $has_pii = execute polygraf_detect_pii(source="input", text=$user_message) + + if $has_pii + bot inform answer unknown + stop + + +# OUTPUT RAILS + +define subflow polygraf detect pii on output + """Check if the bot output has PII using Polygraf.""" + $has_pii = execute polygraf_detect_pii(source="output", text=$bot_message) + + if $has_pii + bot inform answer unknown + stop + + +# RETRIEVAL RAILS + +define subflow polygraf detect pii on retrieval + """Check if the relevant chunks from the knowledge base have any PII using Polygraf.""" + $has_pii = execute polygraf_detect_pii(source="retrieval", text=$relevant_chunks) + + if $has_pii + bot inform answer unknown + stop + + +#### POLYGRAF PII MASKING RAILS #### + +# INPUT RAILS + +define subflow polygraf mask pii on input + """Mask any detected PII in the user input using Polygraf.""" + $masked_input = execute polygraf_mask_pii(source="input", text=$user_message) + + $user_message = $masked_input + + +# OUTPUT RAILS + +define subflow polygraf mask pii on output + """Mask any detected PII in the bot output using Polygraf.""" + $bot_message = execute polygraf_mask_pii(source="output", text=$bot_message) + + +# RETRIEVAL RAILS + +define subflow polygraf mask pii on retrieval + """Mask any detected PII in the relevant chunks from the knowledge base using Polygraf.""" + $relevant_chunks = execute 
polygraf_mask_pii(source="retrieval", text=$relevant_chunks) diff --git a/nemoguardrails/library/polygraf/request.py b/nemoguardrails/library/polygraf/request.py new file mode 100644 index 0000000000..1a4092c280 --- /dev/null +++ b/nemoguardrails/library/polygraf/request.py @@ -0,0 +1,127 @@ +# SPDX-FileCopyrightText: Copyright (c) 2023-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Module for handling Polygraf PII detection requests.""" + +import asyncio +from typing import Any, Dict, List, Optional + +import aiohttp + +# Default per-request timeout for Polygraf calls. Matches the timeout pattern +# used by other community guardrail integrations and prevents hung rails when +# the Polygraf endpoint is unresponsive. +DEFAULT_TIMEOUT_SECONDS = 30 + + +async def polygraf_request( + text: str, + server_endpoint: str, + api_key: Optional[str] = None, + session: Optional[aiohttp.ClientSession] = None, + timeout: float = DEFAULT_TIMEOUT_SECONDS, +) -> List[Dict[str, Any]]: + """Send a PII detection request to the Polygraf API. + + Args: + text: The text to analyze. + server_endpoint: The API endpoint URL. + api_key: The API key for the Polygraf service. + session: Optional shared aiohttp session. Passing a session lets callers + reuse connections across multiple PII checks. + timeout: Per-request timeout in seconds. 
Applied to both caller-provided + and internally created sessions. + + Returns: + The list of entities detected by the Polygraf server. + + Raises: + ValueError: If the API call fails, times out, or the response cannot + be parsed as JSON. + """ + # Polygraf request payload. Some deployments accept/require additional flags + # controlling PII/PID detection and aggregation. + payload = { + "text": text, + # NOTE: Kept as `detect_pid` to match the working Polygraf API format + # provided by users of this integration. + "detect_pid": True, + "pid_granularity": 3, + "aggregate_entities": True, + } + headers: Dict[str, str] = {"Content-Type": "application/json"} + + if api_key: + headers["Authorization"] = f"Bearer {api_key}" + + client_timeout = aiohttp.ClientTimeout(total=timeout) + + if session is not None: + return await _send_polygraf_request(session, server_endpoint, payload, headers, client_timeout) + + async with aiohttp.ClientSession(timeout=client_timeout) as request_session: + return await _send_polygraf_request(request_session, server_endpoint, payload, headers, client_timeout) + + +async def _send_polygraf_request( + session: aiohttp.ClientSession, + server_endpoint: str, + payload: Dict[str, Any], + headers: Dict[str, str], + timeout: aiohttp.ClientTimeout, +) -> List[Dict[str, Any]]: + try: + try: + post_ctx = session.post(server_endpoint, json=payload, headers=headers, timeout=timeout) + except TypeError: + # Some test doubles do not accept a `timeout` kwarg; fall back to the + # session-level timeout instead. + post_ctx = session.post(server_endpoint, json=payload, headers=headers) + + async with post_ctx as resp: + if resp.status != 200: + raise ValueError(f"Polygraf call failed with status code {resp.status}.\nDetails: {await resp.text()}") + + try: + data = await resp.json() + except aiohttp.ContentTypeError as err: + raise ValueError( + f"Failed to parse Polygraf response as JSON. 
Status: {resp.status}, Content: {await resp.text()}" + ) from err + except asyncio.TimeoutError as err: + # `aiohttp.ClientTimeout` surfaces timeouts as asyncio.TimeoutError on + # both the connect and read paths. Normalize so callers see a single + # ValueError contract instead of asyncio plumbing exceptions. + raise ValueError(f"Polygraf call timed out after {timeout.total} seconds.") from err + except aiohttp.ClientError as err: + # DNS failures, connection resets, TLS errors, etc. should also surface + # as ValueError so the documented contract holds across all network + # failure modes. + raise ValueError(f"Polygraf call failed: {type(err).__name__}: {err}") from err + + # Polygraf may return either a raw list of entities or a wrapper object. + if isinstance(data, list): + return data + if isinstance(data, dict): + if "entities" in data: + entities = data["entities"] + if entities is None: + return [] + if isinstance(entities, list): + return entities + + raise ValueError( + "Invalid response from Polygraf service: expected a list of entities or an object with an 'entities' list." 
+ ) diff --git a/nemoguardrails/rails/llm/config.py b/nemoguardrails/rails/llm/config.py index 7ffc635cde..75efd2e04b 100644 --- a/nemoguardrails/rails/llm/config.py +++ b/nemoguardrails/rails/llm/config.py @@ -392,6 +392,40 @@ class GLiNERDetection(BaseModel): ) +class PolygrafDetectionOptions(BaseModel): + """Configuration options for Polygraf.""" + + model_config = ConfigDict(extra="forbid") + + entities: List[str] = Field( + default_factory=list, + description="The list of entities that should be detected.", + ) + + +class PolygrafDetection(BaseModel): + """Configuration for Polygraf PII detection.""" + + model_config = ConfigDict(extra="forbid") + + server_endpoint: str = Field( + default="http://localhost:8000/v1/pii/text-detect", + description="The endpoint for the Polygraf detection server.", + ) + input: PolygrafDetectionOptions = Field( + default_factory=PolygrafDetectionOptions, + description="Configuration of the entities to be detected on the user input.", + ) + output: PolygrafDetectionOptions = Field( + default_factory=PolygrafDetectionOptions, + description="Configuration of the entities to be detected on the bot output.", + ) + retrieval: PolygrafDetectionOptions = Field( + default_factory=PolygrafDetectionOptions, + description="Configuration of the entities to be detected on retrieved relevant chunks.", + ) + + class _HFClassifierBase(BaseModel): """Shared fields for all HuggingFace classifier engines.""" @@ -1308,6 +1342,11 @@ class RailsConfigData(BaseModel): description="Configuration for GLiNER PII detection.", ) + polygraf: Optional[PolygrafDetection] = Field( + default_factory=PolygrafDetection, + description="Configuration for Polygraf PII detection.", + ) + fiddler: Optional[FiddlerGuardrails] = Field( default_factory=FiddlerGuardrails, description="Configuration for Fiddler Guardrails.", diff --git a/tests/test_polygraf.py b/tests/test_polygraf.py new file mode 100644 index 0000000000..9594661bdd --- /dev/null +++ 
b/tests/test_polygraf.py @@ -0,0 +1,1062 @@ +# SPDX-FileCopyrightText: Copyright (c) 2023-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from typing import Any, Dict, List, Optional + +import pytest + +from nemoguardrails import RailsConfig +from nemoguardrails.actions.actions import ActionResult, action +from nemoguardrails.library.polygraf.actions import ( + FAILSAFE_MASK_PLACEHOLDER, + polygraf_detect_pii, + polygraf_mask_pii, +) +from nemoguardrails.library.polygraf.request import polygraf_request +from tests.utils import TestChat + + +def create_polygraf_mock_response( + text: str, + entities_to_detect: Optional[List[str]] = None, +) -> List[Dict[str, Any]]: + """Create a mock Polygraf response based on the input text and entities to detect.""" + detected_entities = [] + + entity_patterns = { + "Person": ["John"], + "Email": ["test@gmail.com"], + } + + for entity_type, patterns in entity_patterns.items(): + if entities_to_detect and entity_type not in entities_to_detect: + continue + + for pattern in patterns: + start = 0 + while True: + pos = text.find(pattern, start) + if pos == -1: + break + detected_entities.append( + { + "entity_type": entity_type, + "entity_text": pattern, + "start": pos, + "end": pos + len(pattern), + "score": 0.99, + } + ) + start = pos + 1 + + return detected_entities + + +def create_mock_polygraf_detect_pii(entities_to_detect: 
Optional[List[str]] = None): + """Create a mock polygraf_detect_pii action that returns True when PII is detected.""" + + async def mock_polygraf_detect_pii(source: str, text: str, config, **kwargs): + entities = create_polygraf_mock_response(text, entities_to_detect) + return len(entities) > 0 + + return mock_polygraf_detect_pii + + +def create_mock_polygraf_mask_pii(entities_to_detect: Optional[List[str]] = None): + """Create a mock polygraf_mask_pii action that masks PII in text.""" + + async def mock_polygraf_mask_pii(source: str, text: str, config, **kwargs): + entities = create_polygraf_mock_response(text, entities_to_detect) + if not entities: + return text + + masked_text = text + for entity in sorted(entities, key=lambda x: x["start"], reverse=True): + start = entity["start"] + end = entity["end"] + entity_type = entity["entity_type"] + masked_text = masked_text[:start] + f"<{entity_type}>" + masked_text[end:] + + return masked_text + + return mock_polygraf_mask_pii + + +@action() +def retrieve_relevant_chunks(): + context_updates = {"relevant_chunks": "Mock retrieved context."} + + return ActionResult( + return_value=context_updates["relevant_chunks"], + context_updates=context_updates, + ) + + +def _polygraf_config(): + return RailsConfig.from_content( + yaml_content=""" + models: [] + rails: + config: + polygraf: + server_endpoint: http://localhost:8000/v1/pii/text-detect + input: + entities: + - Email + - Person + output: + entities: + - Email + - Person + retrieval: + entities: + - Email + - Person + """, + ) + + +class _FakeResponse: + def __init__(self, status=200, payload=None, text=""): + self.status = status + self._payload = payload if payload is not None else [] + self._text = text + + async def json(self): + return self._payload + + async def text(self): + return self._text + + +class _FakePostContextManager: + def __init__(self, response): + self.response = response + + async def __aenter__(self): + return self.response + + async def 
__aexit__(self, exc_type, exc, tb): + return False + + +class _FakeSession: + def __init__(self, response): + self.response = response + self.requests = [] + + def post(self, server_endpoint, json, headers): + self.requests.append( + { + "server_endpoint": server_endpoint, + "json": json, + "headers": headers, + } + ) + return _FakePostContextManager(self.response) + + +@pytest.mark.asyncio +async def test_polygraf_request_uses_shared_session_and_bearer_auth(): + session = _FakeSession( + _FakeResponse( + payload=[ + { + "entity_type": "Person", + "entity_text": "John", + "start": 0, + "end": 4, + "score": 0.99, + } + ] + ) + ) + + entities = await polygraf_request("John", "http://polygraf.example/pii", "secret", session=session) + + assert entities[0]["entity_type"] == "Person" + assert session.requests[0]["headers"]["Authorization"] == "Bearer secret" + assert session.requests[0]["json"]["detect_pid"] is True + assert session.requests[0]["json"]["aggregate_entities"] is True + + +@pytest.mark.asyncio +async def test_polygraf_request_accepts_wrapped_entities_response(): + session = _FakeSession(_FakeResponse(payload={"entities": [{"entity_type": "Email"}]})) + + entities = await polygraf_request("test@gmail.com", "http://polygraf.example/pii", None, session=session) + + assert entities == [{"entity_type": "Email"}] + + +@pytest.mark.asyncio +async def test_polygraf_request_accepts_null_entities_as_empty_response(): + session = _FakeSession(_FakeResponse(payload={"entities": None})) + + entities = await polygraf_request("hello", "http://polygraf.example/pii", None, session=session) + + assert entities == [] + + +@pytest.mark.asyncio +async def test_polygraf_request_raises_for_invalid_response_shape(): + session = _FakeSession(_FakeResponse(payload={"unexpected": []})) + + with pytest.raises(ValueError, match="Invalid response from Polygraf service"): + await polygraf_request("John", "http://polygraf.example/pii", None, session=session) + + +@pytest.mark.asyncio 
+async def test_polygraf_request_raises_for_non_200_response(): + session = _FakeSession(_FakeResponse(status=401, text="missing token")) + + with pytest.raises(ValueError, match="Polygraf call failed with status code 401"): + await polygraf_request("John", "http://polygraf.example/pii", None, session=session) + + +class _FakeSessionWithTimeoutKwarg: + def __init__(self, response): + self.response = response + self.timeouts = [] + + def post(self, server_endpoint, json, headers, timeout=None): + self.timeouts.append(timeout) + return _FakePostContextManager(self.response) + + +@pytest.mark.asyncio +async def test_polygraf_request_forwards_timeout_to_post(): + import aiohttp + + session = _FakeSessionWithTimeoutKwarg(_FakeResponse(payload=[])) + + await polygraf_request("hello", "http://polygraf.example/pii", None, session=session, timeout=7) + + assert isinstance(session.timeouts[0], aiohttp.ClientTimeout) + assert session.timeouts[0].total == 7 + + +class _FakeRaisingSession: + """Test double whose .post() raises a configurable exception when entered.""" + + def __init__(self, exc: BaseException): + self.exc = exc + + def post(self, *args, **kwargs): + async def _raise(): + raise self.exc + + class _Ctx: + def __init__(self, raise_fn): + self._raise_fn = raise_fn + + async def __aenter__(self_inner): + await self_inner._raise_fn() + + async def __aexit__(self_inner, *a): + return False + + return _Ctx(_raise) + + +@pytest.mark.asyncio +async def test_polygraf_request_normalizes_timeout_as_value_error(): + import asyncio + + session = _FakeRaisingSession(asyncio.TimeoutError()) + + with pytest.raises(ValueError, match="timed out"): + await polygraf_request("hello", "http://polygraf.example/pii", None, session=session, timeout=3) + + +@pytest.mark.asyncio +async def test_polygraf_request_normalizes_client_error_as_value_error(): + import aiohttp + + session = _FakeRaisingSession(aiohttp.ClientConnectionError("dns failure")) + + with pytest.raises(ValueError, 
match="Polygraf call failed"): + await polygraf_request("hello", "http://polygraf.example/pii", None, session=session) + + +def test_polygraf_config_rejects_unknown_keys(): + """Unknown Polygraf config keys must be rejected (extra='forbid').""" + + with pytest.raises(Exception) as excinfo: + RailsConfig.from_content( + yaml_content=""" + models: [] + rails: + config: + polygraf: + server_endpoint: http://localhost:8000/v1/pii/text-detect + unknown_field: 42 + """, + ) + assert "unknown_field" in str(excinfo.value) or "extra" in str(excinfo.value).lower() + + +# --------------------------------------------------------------------------- +# Colang 2.0 flow coverage +# --------------------------------------------------------------------------- + + +def _load_polygraf_v2_flows(): + """Parse the shipped Polygraf Colang 2.x flow file and return flow dicts.""" + import importlib.resources as resources + + from nemoguardrails.colang import parse_colang_file + + flows_path = resources.files("nemoguardrails.library.polygraf").joinpath("flows.co") + content = flows_path.read_text(encoding="utf-8") + parsed = parse_colang_file(filename="flows.co", content=content, version="2.x", include_source_mapping=False) + return [flow.to_dict() for flow in parsed["flows"]] + + +def _flow_global_vars(flow_dict): + """Return the set of variable names declared `global` in a parsed Colang 2 flow.""" + globals_found = set() + for el in flow_dict.get("elements", []): + if el.get("_type") == "global": + name = el.get("var_name") + if name: + globals_found.add(name) + # Some parser variants attach `global` as a spec_op; collect those too. 
+ if el.get("_type") == "spec_op" and el.get("op") == "global": + spec = el.get("spec") or {} + name = spec.get("var_name") or spec.get("name") + if name: + globals_found.add(name) + return globals_found + + +def test_polygraf_v2_flows_parse_successfully(): + """flows.co must be valid Colang 2.x and define all six expected flows.""" + + flows = _load_polygraf_v2_flows() + flow_names = sorted(f.get("name") or "" for f in flows) + assert flow_names == [ + "polygraf detect pii on input", + "polygraf detect pii on output", + "polygraf detect pii on retrieval", + "polygraf mask pii on input", + "polygraf mask pii on output", + "polygraf mask pii on retrieval", + ] + + +@pytest.mark.unit +def test_polygraf_v2_input_flow_passes_actual_user_message_to_action(): + """End-to-end Colang 2 regression test. + + Reproduces the bug Pouyanpi flagged: a Colang 2 flow that reads a rails + variable (``$user_message``) without a ``global`` declaration ends up + sending ``text=null`` to the action. By registering a mock that records + the ``text`` the masking action received and running it through the + actual shipped ``polygraf mask pii on input`` flow body, we lock in the + fix end-to-end. + + If the ``global $user_message`` line is removed from ``flows.co``, the + Colang 2 runtime sends ``text=None`` and this test fails. + """ + captured = {} + + async def fake_polygraf_mask_pii(source: str, text, **kwargs): + captured["source"] = source + captured["text"] = text + return f"" + + # We use the SHIPPED polygraf flow body verbatim (read from flows.co) + # and wire it into a v2 input rail using the standard guardrails pattern + # used in tests/v2_x/test_input_output_rails_transformations.py. We do + # not go through `rails.input.flows` here because that codepath emits a + # deprecation warning and pulls in the whole library namespace, making + # the test less direct. 
+ import importlib.resources as resources + + flows_co = resources.files("nemoguardrails.library.polygraf").joinpath("flows.co").read_text(encoding="utf-8") + + # Sanity check: this test relies on the shipped flow text being present. + assert "flow polygraf mask pii on input" in flows_co + assert "global $user_message" in flows_co + + colang_content = ( + """ +import core +import guardrails + +""" + + flows_co + + """ + +flow input rails $input_text + polygraf mask pii on input + +flow main + await user said "John" + bot say "done" +""" + ) + + config = RailsConfig.from_content( + colang_content=colang_content, + yaml_content=""" + colang_version: "2.x" + models: [] + """, + ) + + chat = TestChat(config, llm_completions=[]) + chat.app.register_action(fake_polygraf_mask_pii, "polygraf_mask_pii") + + chat >> "John" + chat << "done" + + # The action must have been called with the actual user text, not None. + # If the `global $user_message` declaration is missing from flows.co, the + # Colang 2 runtime would have sent text=None and this assertion would fail. + assert captured.get("text") == "John", ( + f"Polygraf v2 input rail invoked action with text={captured.get('text')!r} " + "instead of the actual user message. The most likely cause is a missing " + "`global $user_message` declaration in flows.co." + ) + assert captured.get("source") == "input" + + +def test_polygraf_v2_flows_declare_required_globals(): + """Each Polygraf v2 flow must declare the rails variable it reads as `global`. + + Without this, the Colang 2 runtime sends ``text: null`` to the action, + letting PII through (regression guarded by this test). 
+ """ + + flows = _load_polygraf_v2_flows() + expected = { + "polygraf detect pii on input": "$user_message", + "polygraf detect pii on output": "$bot_message", + "polygraf detect pii on retrieval": "$relevant_chunks", + "polygraf mask pii on input": "$user_message", + "polygraf mask pii on output": "$bot_message", + "polygraf mask pii on retrieval": "$relevant_chunks", + } + by_name = {f["name"]: f for f in flows} + + for flow_name, required_var in expected.items(): + flow = by_name.get(flow_name) + assert flow is not None, f"Flow {flow_name!r} missing from flows.co" + + # Serialize the flow YAML and check the global declaration appears + # before any action invocation that reads the variable. We use a + # textual search because the parser emits global declarations in a + # few different shapes depending on the Colang 2 lexer state. + import yaml as _yaml + + from nemoguardrails.utils import CustomDumper + + flow_yaml = _yaml.dump(flow, sort_keys=False, Dumper=CustomDumper, width=1000) + assert required_var in flow_yaml, f"Flow {flow_name!r} does not reference {required_var}" + + # The text "global" should appear in the flow body. This is a + # smoke check that the declaration is present in some form. + assert "global" in flow_yaml.lower(), ( + f"Flow {flow_name!r} is missing a `global` declaration for {required_var}; " + "Colang 2 would otherwise send text=null to the Polygraf action." + ) + + +@pytest.mark.asyncio +async def test_polygraf_mask_pii_fails_closed_on_malformed_selected_entity(monkeypatch, caplog): + """A configured (selected) entity with bad offsets must fail closed, not silently skip.""" + + sensitive_email = "test@gmail.com" + sensitive_input = f"John lives here. 
Email: {sensitive_email}" + + async def mock_request(text, server_endpoint, api_key, session=None): + return [ + {"entity_type": "Person", "start": 0, "end": 4}, + # Email is in the configured entities and has malformed offsets -> + # the action must fail closed instead of returning partially masked text. + {"entity_type": "Email", "entity_text": sensitive_email}, + ] + + monkeypatch.setenv("POLYGRAF_API_KEY", "secret") + monkeypatch.setattr("nemoguardrails.library.polygraf.actions.polygraf_request", mock_request) + caplog.set_level("WARNING") + + result = await polygraf_mask_pii("input", sensitive_input, _polygraf_config()) + + assert result == FAILSAFE_MASK_PLACEHOLDER + # The original sensitive value must never appear in the returned text. + assert sensitive_email not in result + # Log warnings must only carry structural metadata, not the PII value. + assert sensitive_email not in caplog.text + assert "Skipping malformed Polygraf entity" in caplog.text + assert "invalid_fields" in caplog.text + + +@pytest.mark.asyncio +async def test_polygraf_mask_pii_skips_unselected_malformed_entity(monkeypatch, caplog): + """A *known-type* malformed entity that does NOT match the entity filter is skipped, not failed.""" + + async def mock_request(text, server_endpoint, api_key, session=None): + return [ + {"entity_type": "Person", "start": 0, "end": 4}, + # CreditCard is not in the configured entities for `input`; even though + # it's malformed, it must not trigger a fail-closed. 
+ {"entity_type": "CreditCard"}, + ] + + monkeypatch.setenv("POLYGRAF_API_KEY", "secret") + monkeypatch.setattr("nemoguardrails.library.polygraf.actions.polygraf_request", mock_request) + caplog.set_level("WARNING") + + result = await polygraf_mask_pii("input", "John lives here", _polygraf_config()) + + assert result == " lives here" + + +@pytest.mark.asyncio +async def test_polygraf_mask_pii_fails_closed_on_missing_entity_type(monkeypatch, caplog): + """An entity with no entity_type cannot be safely classified -> fail closed even with a filter set.""" + + sensitive = "John lives here" + + async def mock_request(text, server_endpoint, api_key, session=None): + return [ + {"start": 0, "end": 4, "entity_text": "John"}, + ] + + monkeypatch.setenv("POLYGRAF_API_KEY", "secret") + monkeypatch.setattr("nemoguardrails.library.polygraf.actions.polygraf_request", mock_request) + caplog.set_level("WARNING") + + result = await polygraf_mask_pii("input", sensitive, _polygraf_config()) + + assert result == FAILSAFE_MASK_PLACEHOLDER + assert "John" not in caplog.text + + +@pytest.mark.parametrize( + "bad_offsets", + [ + {"start": True, "end": 4}, # bool start (subclass of int) must be rejected + {"start": 0, "end": False}, # bool end + {"start": -1, "end": 4}, # negative start + {"start": 5, "end": 3}, # reversed + {"start": 0, "end": 9999}, # end past text length + {"start": 0, "end": 0}, # empty span + ], +) +@pytest.mark.asyncio +async def test_polygraf_mask_pii_fails_closed_on_out_of_range_offsets(monkeypatch, bad_offsets): + """Invalid offsets (bool, negative, reversed, beyond text, empty) must fail closed for a selected entity.""" + + async def mock_request(text, server_endpoint, api_key, session=None): + return [{"entity_type": "Person", **bad_offsets}] + + monkeypatch.setenv("POLYGRAF_API_KEY", "secret") + monkeypatch.setattr("nemoguardrails.library.polygraf.actions.polygraf_request", mock_request) + + result = await polygraf_mask_pii("input", "John lives here", 
_polygraf_config()) + + assert result == FAILSAFE_MASK_PLACEHOLDER + + +@pytest.mark.asyncio +async def test_polygraf_detect_pii_fails_closed_on_missing_entity_type(monkeypatch): + """detect must block when an entity has no entity_type (cannot prove it's safe).""" + + async def mock_request(text, server_endpoint, api_key, session=None): + return [{"start": 0, "end": 4, "entity_text": "John"}] + + monkeypatch.setenv("POLYGRAF_API_KEY", "secret") + monkeypatch.setattr("nemoguardrails.library.polygraf.actions.polygraf_request", mock_request) + + result = await polygraf_detect_pii("input", "John lives here", _polygraf_config()) + + assert result is True + + +@pytest.mark.asyncio +async def test_polygraf_mask_pii_fails_closed_on_provider_error(monkeypatch, caplog): + """A timeout / network error from the request layer must redact the entire payload.""" + + sensitive_text = "John lives at 1 Main St; email test@gmail.com" + + async def mock_request(text, server_endpoint, api_key, session=None): + raise ValueError("Polygraf call timed out after 30 seconds.") + + monkeypatch.setenv("POLYGRAF_API_KEY", "secret") + monkeypatch.setattr("nemoguardrails.library.polygraf.actions.polygraf_request", mock_request) + caplog.set_level("WARNING") + + result = await polygraf_mask_pii("input", sensitive_text, _polygraf_config()) + + assert result == FAILSAFE_MASK_PLACEHOLDER + # Even on failure, the caller's text must not leak into logs. 
+ assert sensitive_text not in caplog.text + assert "test@gmail.com" not in caplog.text + assert "Polygraf masking failed" in caplog.text + + +@pytest.mark.asyncio +async def test_polygraf_detect_pii_fails_closed_on_provider_error(monkeypatch, caplog): + """A request-layer ValueError must cause detect to block (return True).""" + + async def mock_request(text, server_endpoint, api_key, session=None): + raise ValueError("Polygraf call failed: ClientConnectorError: ...") + + monkeypatch.setenv("POLYGRAF_API_KEY", "secret") + monkeypatch.setattr("nemoguardrails.library.polygraf.actions.polygraf_request", mock_request) + caplog.set_level("WARNING") + + result = await polygraf_detect_pii("input", "John lives here", _polygraf_config()) + + assert result is True + assert "Polygraf detection failed" in caplog.text + + +@pytest.mark.asyncio +async def test_polygraf_detect_pii_fails_closed_on_malformed_selected_entity(monkeypatch, caplog): + """detect must block when a configured entity is reported with bad shape.""" + + async def mock_request(text, server_endpoint, api_key, session=None): + return [ + # Email is in the configured filter and missing offsets -> fail closed. 
+ {"entity_type": "Email"}, + ] + + monkeypatch.setenv("POLYGRAF_API_KEY", "secret") + monkeypatch.setattr("nemoguardrails.library.polygraf.actions.polygraf_request", mock_request) + caplog.set_level("WARNING") + + result = await polygraf_detect_pii("input", "Some text", _polygraf_config()) + + assert result is True + assert "Polygraf returned a malformed selected entity" in caplog.text + + +@pytest.mark.asyncio +async def test_polygraf_actions_warn_when_api_key_missing(monkeypatch, caplog): + async def mock_request(text, server_endpoint, api_key, session=None): + assert api_key is None + return [] + + monkeypatch.delenv("POLYGRAF_API_KEY", raising=False) + monkeypatch.setattr("nemoguardrails.library.polygraf.actions.polygraf_request", mock_request) + caplog.set_level("WARNING") + + result = await polygraf_detect_pii("input", "John", _polygraf_config()) + + assert result is False + assert "POLYGRAF_API_KEY environment variable is not set" in caplog.text + + +@pytest.mark.asyncio +async def test_polygraf_mask_pii_accepts_extra_kwargs_and_shared_session(monkeypatch): + sentinel_session = object() + + async def mock_request(text, server_endpoint, api_key, session=None): + assert api_key == "secret" + assert session is sentinel_session + return [{"entity_type": "Person", "entity_text": "John", "start": 0, "end": 4, "score": 0.99}] + + monkeypatch.setenv("POLYGRAF_API_KEY", "secret") + monkeypatch.setattr("nemoguardrails.library.polygraf.actions.polygraf_request", mock_request) + + result = await polygraf_mask_pii("input", "John", _polygraf_config(), session=sentinel_session, extra="ignored") + + assert result == "" + + +@pytest.mark.unit +def test_polygraf_pii_detection_no_active_pii_detection(): + config = RailsConfig.from_content( + yaml_content=""" + models: [] + rails: + config: + polygraf: + server_endpoint: http://localhost:8000/v1/pii/text-detect + """, + colang_content=""" + define user express greeting + "hi" + + define flow + user express greeting + bot 
express greeting + + define bot inform answer unknown + "I can't answer that." + """, + ) + + chat = TestChat( + config, + llm_completions=[ + " express greeting", + ' "Hi! My name is John as well."', + ], + ) + + chat.app.register_action(retrieve_relevant_chunks, "retrieve_relevant_chunks") + chat.app.register_action(create_mock_polygraf_detect_pii(), "polygraf_detect_pii") + chat.app.register_action(create_mock_polygraf_mask_pii(), "polygraf_mask_pii") + + chat >> "Hi! I am Mr. John! And my email is test@gmail.com" + chat << "Hi! My name is John as well." + + +@pytest.mark.unit +def test_polygraf_pii_detection_input(): + config = RailsConfig.from_content( + yaml_content=""" + models: [] + rails: + config: + polygraf: + server_endpoint: http://localhost:8000/v1/pii/text-detect + input: + entities: + - Email + - Person + input: + flows: + - polygraf detect pii on input + """, + colang_content=""" + define user express greeting + "hi" + + define flow + user express greeting + bot express greeting + + define bot inform answer unknown + "I can't answer that." + """, + ) + + chat = TestChat( + config, + llm_completions=[ + " express greeting", + ' "Hi! My name is John as well."', + ], + ) + + chat.app.register_action(retrieve_relevant_chunks, "retrieve_relevant_chunks") + chat.app.register_action( + create_mock_polygraf_detect_pii(["Email", "Person"]), + "polygraf_detect_pii", + ) + chat.app.register_action( + create_mock_polygraf_mask_pii(["Email", "Person"]), + "polygraf_mask_pii", + ) + + chat >> "Hi! I am Mr. John! And my email is test@gmail.com" + chat << "I can't answer that." 
+ + +@pytest.mark.unit +def test_polygraf_pii_detection_output(): + config = RailsConfig.from_content( + yaml_content=""" + models: [] + rails: + config: + polygraf: + server_endpoint: http://localhost:8000/v1/pii/text-detect + output: + entities: + - Email + - Person + output: + flows: + - polygraf detect pii on output + """, + colang_content=""" + define user express greeting + "hi" + + define flow + user express greeting + bot express greeting + + define bot inform answer unknown + "I can't answer that." + """, + ) + + chat = TestChat( + config, + llm_completions=[ + " express greeting", + ' "Hi! My name is John as well."', + ], + ) + + chat.app.register_action(retrieve_relevant_chunks, "retrieve_relevant_chunks") + chat.app.register_action( + create_mock_polygraf_detect_pii(["Email", "Person"]), + "polygraf_detect_pii", + ) + chat.app.register_action( + create_mock_polygraf_mask_pii(["Email", "Person"]), + "polygraf_mask_pii", + ) + + chat >> "Hi!" + chat << "I can't answer that." + + +@pytest.mark.unit +def test_polygraf_pii_detection_retrieval_with_no_pii(): + config = RailsConfig.from_content( + yaml_content=""" + models: [] + rails: + config: + polygraf: + server_endpoint: http://localhost:8000/v1/pii/text-detect + retrieval: + entities: + - Email + - Person + retrieval: + flows: + - polygraf detect pii on retrieval + """, + colang_content=""" + define user express greeting + "hi" + + define flow + user express greeting + bot express greeting + + define bot inform answer unknown + "I can't answer that." + """, + ) + + chat = TestChat( + config, + llm_completions=[ + " express greeting", + ' "Hi! My name is John as well."', + ], + ) + + chat.app.register_action(retrieve_relevant_chunks, "retrieve_relevant_chunks") + chat.app.register_action( + create_mock_polygraf_detect_pii(["Email", "Person"]), + "polygraf_detect_pii", + ) + chat.app.register_action( + create_mock_polygraf_mask_pii(["Email", "Person"]), + "polygraf_mask_pii", + ) + + chat >> "Hi!" 
+ chat << "Hi! My name is John as well." + + +@pytest.mark.unit +def test_polygraf_pii_masking_on_output(): + config = RailsConfig.from_content( + yaml_content=""" + models: [] + rails: + config: + polygraf: + server_endpoint: http://localhost:8000/v1/pii/text-detect + output: + entities: + - Email + - Person + output: + flows: + - polygraf mask pii on output + """, + colang_content=""" + define user express greeting + "hi" + + define flow + user express greeting + bot express greeting + + define bot inform answer unknown + "I can't answer that." + """, + ) + + chat = TestChat( + config, + llm_completions=[ + " express greeting", + ' "Hi! I am John."', + ], + ) + + chat.app.register_action(retrieve_relevant_chunks, "retrieve_relevant_chunks") + chat.app.register_action( + create_mock_polygraf_detect_pii(["Email", "Person"]), + "polygraf_detect_pii", + ) + chat.app.register_action( + create_mock_polygraf_mask_pii(["Email", "Person"]), + "polygraf_mask_pii", + ) + + chat >> "Hi!" + response = chat.app.generate(messages=[{"role": "user", "content": "Hi!"}]) + assert "John" not in response["content"] + assert "<Person>" in response["content"] + + +@pytest.mark.unit +def test_polygraf_pii_masking_on_input(): + config = RailsConfig.from_content( + yaml_content=""" + models: [] + rails: + config: + polygraf: + server_endpoint: http://localhost:8000/v1/pii/text-detect + input: + entities: + - Email + - Person + input: + flows: + - polygraf mask pii on input + - check user message + """, + colang_content=""" + define user express greeting + "hi" + + define flow + user express greeting + bot express greeting + + define bot inform answer unknown + "I can't answer that." + + define flow check user message + execute check_user_message(user_message=$user_message) + """, + ) + + chat = TestChat( + config, + llm_completions=[ + " express greeting", + ' "Hi!
Nice to meet you.', + ], + ) + + @action() + def check_user_message(user_message: str): + """Check if the user message has PII masked.""" + assert "John" not in user_message + assert "<Person>" in user_message + + chat.app.register_action(retrieve_relevant_chunks, "retrieve_relevant_chunks") + chat.app.register_action(check_user_message, "check_user_message") + chat.app.register_action( + create_mock_polygraf_detect_pii(["Email", "Person"]), + "polygraf_detect_pii", + ) + chat.app.register_action( + create_mock_polygraf_mask_pii(["Email", "Person"]), + "polygraf_mask_pii", + ) + + chat >> "Hi there! Are you John?" + + +@pytest.mark.unit +def test_polygraf_pii_masking_on_retrieval(): + config = RailsConfig.from_content( + yaml_content=""" + models: [] + rails: + config: + polygraf: + server_endpoint: http://localhost:8000/v1/pii/text-detect + retrieval: + entities: + - Email + - Person + retrieval: + flows: + - polygraf mask pii on retrieval + - check relevant chunks + """, + colang_content=""" + define user express greeting + "hi" + + define flow + user express greeting + bot express greeting + + define bot inform answer unknown + "I can't answer that."
+ + define flow check relevant chunks + execute check_relevant_chunks(relevant_chunks=$relevant_chunks) + """, + ) + + chat = TestChat( + config, + llm_completions=[ + " express greeting", + " Sorry, I don't have that in my knowledge base.", + ], + ) + + @action() + def check_relevant_chunks(relevant_chunks: str): + """Check if the relevant chunks have PII masked.""" + assert "test@gmail.com" not in relevant_chunks + assert "<Email>" in relevant_chunks + + @action() + def retrieve_relevant_chunk_for_masking(): + context_updates = {"relevant_chunks": "John's Email: test@gmail.com"} + return ActionResult( + return_value=context_updates["relevant_chunks"], + context_updates=context_updates, + ) + + chat.app.register_action(retrieve_relevant_chunk_for_masking, "retrieve_relevant_chunks") + chat.app.register_action(check_relevant_chunks) + chat.app.register_action( + create_mock_polygraf_detect_pii(["Email", "Person"]), + "polygraf_detect_pii", + ) + chat.app.register_action( + create_mock_polygraf_mask_pii(["Email", "Person"]), + "polygraf_mask_pii", + ) + + chat >> "Hey! Can you help me get John's email?"