Feat: Add masking support to the regex rail #1944

Open
RobGeada wants to merge 4 commits into NVIDIA-NeMo:develop from RobGeada:RegexRedact

Conversation

@RobGeada RobGeada commented May 29, 2026

Contributor

Description

Adds redaction support to the regex rail, mirroring the mask action in the sensitive_data_detection rail.

Related Issue(s)

Checklist

  • I've read the CONTRIBUTING guidelines.
  • I've updated the documentation if applicable.
  • I've added tests if applicable.
  • @mentions of the person or team responsible for reviewing proposed changes.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added regex-based content redaction to mask sensitive information matching configurable patterns across user input, bot output, and retrieval chunks.
    • Introduced per-pattern custom redaction masks, allowing different replacement tokens for different pattern types (default: <REDACTED>).
  • Tests

    • Added comprehensive test coverage for redaction functionality, including edge cases and end-to-end flow validation.

greptile-apps Bot commented May 29, 2026

Contributor

Greptile Summary

This PR adds regex-based content redaction (masking) to the existing regex rail, mirroring the mask action from sensitive_data_detection. The new redact_regex_pattern action replaces matched spans with a configurable per-pattern mask token (default <REDACTED>), and the common config-loading logic is factored into a _get_regex_options helper shared between detection and redaction.

  • config.py: Adds RegexPatternConfig model supporting an optional mask_token per pattern; RegexDetectionOptions.patterns now accepts either plain strings or RegexPatternConfig objects, with normalization and pre-compilation in a model_validator.
  • actions.py: Extracts _get_regex_options helper and adds redact_regex_pattern action that uses lambda _: mask in re.sub to guarantee the mask token is never interpreted as a backreference template.
  • flows.co / flows.v1.co: Adds three new regex redact {input,output,retrieval} flow definitions; correctly applies global for context-variable mutation in Colang v2.
  • tests/test_regex_detection.py: Adds 9 new unit/e2e tests covering default tokens, custom tokens, mixed patterns, empty text, extra kwargs, and full round-trip for all three sources.

Confidence Score: 5/5

Safe to merge — the new redaction path is well-isolated, correctly handles edge cases, and follows the same patterns established by the existing sensitive_data_detection flows.

The implementation mirrors the existing sensitive_data_detection mask action closely and correctly. The lambda _: mask pattern in re.sub prevents backref interpretation of mask tokens. Colang v2 flows correctly use global for context-variable mutation. Tests cover default and per-pattern tokens, mixed configurations, empty text, extra kwargs, and end-to-end flows for all three sources.
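The backreference point above can be reproduced with the standard `re` module alone. This is a minimal sketch, not the PR's code: the SSN-like pattern and the mask token `<\1>` are made up for illustration.

```python
import re

# Hypothetical pattern with one capture group (illustration only).
ssn = re.compile(r"(\d{3})-\d{2}-\d{4}")
text = "SSN: 123-45-6789"

# A mask token that happens to contain a backslash-digit sequence.
mask = r"<\1>"

# Passed as a string, the mask is treated as a replacement template,
# so \1 expands to the first captured group.
as_template = ssn.sub(mask, text)

# Passed through a callable, the return value is inserted literally.
as_literal = ssn.sub(lambda _: mask, text)

print(as_template)  # SSN: <123>
print(as_literal)   # SSN: <\1>
```

With a string replacement, part of the sensitive value leaks back into the output; with the callable, the mask token is copied as-is regardless of its contents.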

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| nemoguardrails/library/regex/actions.py | Adds _get_regex_options helper to reduce duplication and redact_regex_pattern action; lambda-based re.sub replacement correctly prevents backref interpretation of mask tokens. |
| nemoguardrails/rails/llm/config.py | Introduces RegexPatternConfig with pattern and mask_token fields; updates RegexDetectionOptions to accept Union[str, RegexPatternConfig] with correct normalization and pre-compilation via model_validator. |
| nemoguardrails/library/regex/flows.co | Adds regex redact flows for input, output, and retrieval in Colang v2; correctly uses global for context-variable mutation, consistent with sensitive_data_detection flow patterns. |
| nemoguardrails/library/regex/flows.v1.co | Adds Colang v1 subflow definitions for regex redact input, output, and retrieval; v1 does not require global, consistent with existing v1 check flows. |
| tests/test_regex_detection.py | Adds 9 comprehensive tests for redaction covering default tokens, custom tokens, mixed patterns, no-match pass-through, empty text, extra kwargs dispatch, and e2e flows for all three sources. |

Sequence Diagram

sequenceDiagram
    participant User
    participant Guardrails as NeMo Guardrails
    participant Flow as regex redact flow
    participant Action as redact_regex_pattern()
    participant Config as RegexDetectionOptions

    User->>Guardrails: send message
    Guardrails->>Flow: trigger regex redact flow
    Flow->>Action: RedactRegexPatternAction(source, text)
    Action->>Config: _get_regex_options(source, config)
    Config-->>Action: compiled_patterns + normalized_patterns or None
    alt options is None or no patterns
        Action-->>Flow: return original text unchanged
    else patterns configured
        loop for each compiled pattern
            Action->>Action: compiled.search(redacted)?
            alt match found
                Action->>Action: "redacted = compiled.sub(lambda, redacted)"
            end
        end
        Action-->>Flow: return redacted text
    end
    Flow->>Guardrails: update global context variable
    Guardrails-->>User: continue with redacted content

Reviews (3): Last reviewed commit: "Address action name and testing issues"

Comment thread nemoguardrails/library/regex/actions.py Outdated
Comment thread nemoguardrails/library/regex/actions.py

coderabbitai Bot commented May 29, 2026

Contributor

Walkthrough

This PR extends NeMo Guardrails with per-pattern content redaction. A new RegexPatternConfig model allows specifying custom mask tokens per pattern. The redact_regex_pattern action uses shared regex option lookup to replace matched substrings in user input, bot output, and retrieval chunks. Integration tests validate unit behavior and end-to-end flows.

Changes

Regex redaction feature

| Layer / File(s) | Summary |
| --- | --- |
| Regex pattern configuration with mask tokens: nemoguardrails/rails/llm/config.py | RegexPatternConfig model with pattern and optional mask_token (default "<REDACTED>"). RegexDetectionOptions.patterns now accepts strings or config objects; validator normalizes to RegexPatternConfig and pre-compiles regexes with case-insensitive flag support. New normalized_patterns property exposes the normalized list. |
| Regex redaction and detection actions: nemoguardrails/library/regex/actions.py | New _get_regex_options(source, config) helper validates source and retrieves source-specific options with logging. detect_regex_pattern refactored to use the helper with early return for empty text. New redact_regex_pattern action iteratively applies compiled regex substitutions with configured mask_token values, returning the redacted text. |
| Flow integration for input, output, and retrieval redaction: nemoguardrails/library/regex/flows.co, nemoguardrails/library/regex/flows.v1.co | Three new rails and subflows inject redaction after existing regex-check steps for user input ($user_message), bot output ($bot_message), and knowledge base chunks ($relevant_chunks), calling redact_regex_pattern with appropriate source values. |
| Unit and end-to-end tests for redaction: tests/test_regex_detection.py | Import of redact_regex_pattern. Seven async unit tests validate default mask replacement, custom mask_token override, mixed plain/object patterns, passthrough on no match, multi-pattern redaction, empty text preservation, and extra dispatcher kwargs. Three sync end-to-end tests exercise complete flows for input, output (custom mask), and retrieval redaction. |
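The normalization and redaction steps summarized above can be sketched with the standard library only. This is an illustration under stated assumptions, not the PR's implementation: `PatternSketch`, `normalize`, and `redact` are hypothetical names, a dataclass stands in for the Pydantic model, and patterns are compiled inside `redact` rather than pre-compiled in a validator.

```python
import re
from dataclasses import dataclass
from typing import List, Union

DEFAULT_MASK = "<REDACTED>"  # default token named in the PR summary


@dataclass
class PatternSketch:
    """Hypothetical stand-in for RegexPatternConfig (not the real model)."""
    pattern: str
    mask_token: str = DEFAULT_MASK


def normalize(patterns: List[Union[str, dict, PatternSketch]]) -> List[PatternSketch]:
    """Turn plain strings and dicts into PatternSketch objects."""
    out = []
    for p in patterns:
        if isinstance(p, PatternSketch):
            out.append(p)
        elif isinstance(p, str):
            out.append(PatternSketch(pattern=p))
        else:
            out.append(PatternSketch(**p))
    return out


def redact(text: str, patterns, case_insensitive: bool = False) -> str:
    """Apply each pattern in order, replacing matches with its mask token."""
    if not text or not patterns:
        return text  # nothing to do: return the input unchanged
    flags = re.IGNORECASE if case_insensitive else 0
    redacted = text
    for cfg in normalize(patterns):
        token = cfg.mask_token
        # Callable replacement keeps the token literal (no template expansion).
        redacted = re.compile(cfg.pattern, flags).sub(lambda _m, t=token: t, redacted)
    return redacted


mixed = [r"\d{3}-\d{2}-\d{4}", {"pattern": r"[\w.]+@[\w.]+", "mask_token": "<EMAIL>"}]
result = redact("SSN 123-45-6789, mail bob@example.com", mixed)
print(result)  # SSN <REDACTED>, mail <EMAIL>
```

Because patterns are applied sequentially to the already-redacted text, a later pattern can in principle match a mask token inserted by an earlier one, so pattern order matters when tokens and patterns overlap.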

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Test Results For Major Changes | ⚠️ Warning | PR description lacks test results/testing information for major changes (new regex redaction feature). Tests exist in code but are not documented in PR description as required. | Add test results or testing information to PR description documenting that the 10 new unit tests and 3 e2e tests pass, validating the redaction feature works correctly. |
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 90.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Title check | ✅ Passed | The title 'Feat: Add masking support to the regex rail' directly and clearly summarizes the main change—adding a new masking/redaction feature to the regex rail component. |

@coderabbitai coderabbitai Bot left a comment

Contributor


Actionable comments posted: 2

🧹 Nitpick comments (2)
nemoguardrails/library/regex/actions.py (2)

121-125: ⚡ Quick win

Remove redundant search() check before sub().

The compiled.search(redacted) check is unnecessary because compiled.sub() already returns the original string unchanged when there are no matches. This adds an extra regex pass for every pattern.

Also consider adding strict=True to zip() for consistency with defensive coding practices.

♻️ Proposed fix
     redacted = text
-    for compiled, pcfg in zip(options.compiled_patterns, options.normalized_patterns):
-        if compiled.search(redacted):
-            log.info("Regex pattern redacted: %s", pcfg.pattern)
-            redacted = compiled.sub(pcfg.mask_token, redacted)
+    for compiled, pcfg in zip(options.compiled_patterns, options.normalized_patterns, strict=True):
+        new_redacted = compiled.sub(pcfg.mask_token, redacted)
+        if new_redacted != redacted:
+            log.info("Regex pattern redacted: %s", pcfg.pattern)
+            redacted = new_redacted
 
     return redacted
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@nemoguardrails/library/regex/actions.py` around lines 121 - 125, Remove the
redundant compiled.search(redacted) check in the loop since
compiled.sub(pcfg.mask_token, redacted) is a no-op when there are no matches;
simply iterate over the pattern pairs and always assign redacted =
compiled.sub(pcfg.mask_token, redacted). Also make the zip defensive by using
zip(options.compiled_patterns, options.normalized_patterns, strict=True) to
ensure pattern lists are the same length; reference the variables compiled,
pcfg, options.compiled_patterns, options.normalized_patterns, redacted, and
pcfg.mask_token when applying this change.
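One way to keep the log line while making a single regex pass is `re.Pattern.subn`, which returns the new string together with the number of substitutions made. The sketch below uses only the standard library; `redact_once` is a hypothetical helper name, and it keeps a callable replacement so the mask token stays literal.

```python
import re


def redact_once(compiled: re.Pattern, mask: str, text: str):
    """Return (new_text, count) using a single pass over `text`."""
    # subn performs the substitution and reports how many matches were replaced.
    return compiled.subn(lambda _m: mask, text)


digits = re.compile(r"\d+")
hit_text, hit_count = redact_once(digits, "<NUM>", "call 555 or 777")
miss_text, miss_count = redact_once(digits, "<NUM>", "no numbers here")

print(hit_text, hit_count)    # call <NUM> or <NUM> 2
print(miss_text, miss_count)  # no numbers here 0
```

Checking the count rather than comparing strings also covers the corner case where a match is replaced by identical text, which a `new != old` comparison would report as no redaction.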

86-89: 💤 Low value

Consider adding strict=True to zip() for defensive safety.

While compiled_patterns and normalized_patterns are created in lockstep by the validator, adding strict=True would catch any accidental misalignment early rather than silently dropping items.

♻️ Proposed fix
-    for compiled, pcfg in zip(options.compiled_patterns, options.normalized_patterns):
+    for compiled, pcfg in zip(options.compiled_patterns, options.normalized_patterns, strict=True):
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@nemoguardrails/library/regex/actions.py` around lines 86 - 89, The for-loop
pairing options.compiled_patterns with options.normalized_patterns using zip may
silently drop items if their lengths diverge; update the loop that iterates "for
compiled, pcfg in zip(options.compiled_patterns, options.normalized_patterns):"
to use strict=True (i.e., zip(..., strict=True)) so mismatched lengths raise an
error early and surface unintended misalignment between compiled_patterns and
normalized_patterns.
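The difference between the default `zip()` and `zip(..., strict=True)` is easy to see in isolation. This sketch uses placeholder strings instead of compiled patterns; note that `strict` requires Python 3.10 or newer, so the suggestion only applies if the project's minimum supported version allows it.

```python
import sys

compiled = ["p1", "p2", "p3"]   # placeholders standing in for compiled patterns
configs = ["c1", "c2"]          # one config entry is missing

# Default zip stops at the shorter input, silently dropping "p3".
pairs = list(zip(compiled, configs))
print(pairs)  # [('p1', 'c1'), ('p2', 'c2')]

# strict=True (Python 3.10+) raises ValueError on a length mismatch instead.
raised = None
if sys.version_info >= (3, 10):
    try:
        list(zip(compiled, configs, strict=True))
        raised = False
    except ValueError:
        raised = True
    print(raised)  # True
```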
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@nemoguardrails/library/regex/flows.co`:
- Around line 12-15: The Colang2 flows in nemoguardrails/library/regex/flows.co
reference DetectRegexMatchAction which has no registered `@action`; update the
flow to use DetectRegexPatternAction (the registered detection action) or add a
DetectRegexMatchAction alias in actions.py so the dispatcher resolves correctly;
specifically, change occurrences of DetectRegexMatchAction in flows.co to
DetectRegexPatternAction (or add a wrapper/action with the name
detect_regex_match that delegates to detect_regex_pattern) so Colang2 "regex
check" flows resolve at runtime (note: regex redact flows already use
RedactRegexPatternAction correctly).

In `@tests/test_regex_detection.py`:
- Around line 889-921: The test test_regex_redact_input_e2e currently only
checks the canned LLM response and doesn’t verify that the user message was
redacted; change the test so it asserts what the LLM actually receives by
capturing the processed prompt or flow variable (e.g., $user_message) after the
regex redact input flow. Concretely, modify TestChat usage to either (a)
register an action or callback in the test harness that records the prompt sent
to the LLM (reference TestChat and llm_completions) and assert the recorded
prompt contains the redacted value (masking or removing the SSN), or (b) add an
explicit flow action in the RailsConfig.from_content scenario that stores the
post-redaction $user_message to a test-accessible place and assert that stored
value no longer contains "123-45-6789". Ensure the assertion fails if redaction
is not applied.
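The testing concern above, asserting on what the model receives rather than on its canned reply, can be illustrated without the project's test harness. Everything here is hypothetical: `RecordingLLM` and `handle` are toy stand-ins and do not reflect the TestChat API.

```python
import re

SSN = re.compile(r"\d{3}-\d{2}-\d{4}")


class RecordingLLM:
    """Hypothetical fake model that stores every prompt it receives."""

    def __init__(self, canned_reply: str):
        self.canned_reply = canned_reply
        self.prompts = []

    def complete(self, prompt: str) -> str:
        self.prompts.append(prompt)
        return self.canned_reply


def handle(user_message: str, llm: RecordingLLM) -> str:
    """Toy pipeline: redact the input, then call the model."""
    redacted = SSN.sub(lambda _m: "<REDACTED>", user_message)
    return llm.complete(redacted)


llm = RecordingLLM("Noted.")
reply = handle("My SSN is 123-45-6789", llm)

# Asserting only on `reply` would pass even with redaction disabled.
assert reply == "Noted."
# Asserting on the recorded prompt fails if redaction is skipped.
assert "123-45-6789" not in llm.prompts[0]
assert llm.prompts[0] == "My SSN is <REDACTED>"
```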

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: c9f33abf-0ebe-438e-be1e-15642a7a806d

📥 Commits

Reviewing files that changed from the base of the PR and between 64660a9 and b8795ca.

📒 Files selected for processing (5)
  • nemoguardrails/library/regex/actions.py
  • nemoguardrails/library/regex/flows.co
  • nemoguardrails/library/regex/flows.v1.co
  • nemoguardrails/rails/llm/config.py
  • tests/test_regex_detection.py

Comment thread nemoguardrails/library/regex/flows.co
Comment thread tests/test_regex_detection.py

codecov Bot commented May 29, 2026


Codecov Report

❌ Patch coverage is 88.46154% with 6 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| nemoguardrails/library/regex/actions.py | 82.85% | 6 Missing ⚠️ |

@RobGeada RobGeada changed the title from "[Draft]: Add masking support to the regex rail" to "Feat: Add masking support to the regex rail" on Jun 11, 2026

2 participants