Feat: Add masking support to the regex rail #1944
Greptile Summary
This PR adds regex-based content redaction (masking) to the existing regex rail, mirroring the
| Filename | Overview |
|---|---|
| nemoguardrails/library/regex/actions.py | Adds _get_regex_options helper to reduce duplication and redact_regex_pattern action; lambda-based re.sub replacement correctly prevents backref interpretation of mask tokens. |
| nemoguardrails/rails/llm/config.py | Introduces RegexPatternConfig with pattern and mask_token fields; updates RegexDetectionOptions to accept Union[str, RegexPatternConfig] with correct normalization and pre-compilation via model_validator. |
| nemoguardrails/library/regex/flows.co | Adds regex redact flows for input, output, and retrieval in Colang v2; correctly uses global for context-variable mutation, consistent with sensitive_data_detection flow patterns. |
| nemoguardrails/library/regex/flows.v1.co | Adds Colang v1 subflow definitions for regex redact input, output, and retrieval; v1 does not require global, consistent with existing v1 check flows. |
| tests/test_regex_detection.py | Adds 9 comprehensive tests for redaction covering default tokens, custom tokens, mixed patterns, no-match pass-through, empty text, extra kwargs dispatch, and e2e flows for all three sources. |
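The note on `actions.py` above says a lambda-based `re.sub` replacement keeps mask tokens from being read as backreferences. The following is a minimal sketch of that behavior using only the standard `re` module; the `redact` helper and the sample SSN pattern are illustrative assumptions, not code from this PR.

```python
import re


def redact(pattern: str, mask_token: str, text: str) -> str:
    """Replace every match of `pattern` with `mask_token`, taken literally.

    Hypothetical helper for illustration only. A callable replacement is
    used as-is, so backslash sequences in the token are not expanded.
    """
    compiled = re.compile(pattern)
    return compiled.sub(lambda _match: mask_token, text)


ssn_pattern = r"\d{3}-\d{2}-\d{4}"
text = "SSN: 123-45-6789"
tricky_token = r"\g<0>"  # looks like a "whole match" group reference

# String replacement: the token is parsed as a template, so the match is
# substituted with itself and the sensitive value survives.
leaky = re.sub(ssn_pattern, tricky_token, text)

# Callable replacement: the token is inserted verbatim.
safe = redact(ssn_pattern, tricky_token, text)

print(leaky)  # SSN: 123-45-6789
print(safe)   # SSN: \g<0>
```

A token such as `\1` passed as a plain string would instead raise `re.error` when the pattern has no group 1, which is the other failure mode the callable form avoids.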
Sequence Diagram
sequenceDiagram
participant User
participant Guardrails as NeMo Guardrails
participant Flow as regex redact flow
participant Action as redact_regex_pattern()
participant Config as RegexDetectionOptions
User->>Guardrails: send message
Guardrails->>Flow: trigger regex redact flow
Flow->>Action: RedactRegexPatternAction(source, text)
Action->>Config: _get_regex_options(source, config)
Config-->>Action: compiled_patterns + normalized_patterns or None
alt options is None or no patterns
Action-->>Flow: return original text unchanged
else patterns configured
loop for each compiled pattern
Action->>Action: compiled.search(redacted)?
alt match found
Action->>Action: "redacted = compiled.sub(lambda, redacted)"
end
end
Action-->>Flow: return redacted text
end
Flow->>Guardrails: update global context variable
Guardrails-->>User: continue with redacted content
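The loop in the diagram above can be sketched in plain Python. This is a simplified, self-contained approximation: `PatternConfig` is a hypothetical stand-in for `RegexPatternConfig`, the `<REDACTED>` default token is an assumption, and patterns are compiled inline rather than pre-compiled by a validator as the PR describes.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PatternConfig:
    """Hypothetical stand-in for RegexPatternConfig (pattern + mask_token)."""

    pattern: str
    mask_token: str = "<REDACTED>"  # assumed default, for illustration


def redact_text(text: str, patterns: Optional[List[PatternConfig]]) -> str:
    """Apply each pattern in order, replacing matches with its mask token."""
    if not patterns:
        # No options or no patterns configured: return the text unchanged.
        return text

    redacted = text
    for pcfg in patterns:
        compiled = re.compile(pcfg.pattern)
        if compiled.search(redacted):
            # Callable replacement so the token is inserted literally.
            redacted = compiled.sub(
                lambda _match, token=pcfg.mask_token: token, redacted
            )
    return redacted


example = redact_text(
    "call 555-1234 or mail a@b.co",
    [PatternConfig(r"\d{3}-\d{4}", "[PHONE]"), PatternConfig(r"\S+@\S+")],
)
print(example)  # call [PHONE] or mail <REDACTED>
```

Because each pattern runs on the output of the previous one, pattern order can matter when one pattern could match another pattern's mask token.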
Reviews (3): Last reviewed commit: "Address action name and testing issues"
📝 Walkthrough
This PR extends NeMo Guardrails with per-pattern content redaction. A new
Changes
Regex redaction feature
🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 5 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (5 passed)
Actionable comments posted: 2
🧹 Nitpick comments (2)
nemoguardrails/library/regex/actions.py (2)
121-125: ⚡ Quick win
Remove redundant `search()` check before `sub()`.
The `compiled.search(redacted)` check is unnecessary because `compiled.sub()` already returns the original string unchanged when there are no matches. This adds an extra regex pass for every pattern.
Also consider adding `strict=True` to `zip()` for consistency with defensive coding practices.
♻️ Proposed fix
     redacted = text
    -for compiled, pcfg in zip(options.compiled_patterns, options.normalized_patterns):
    -    if compiled.search(redacted):
    -        log.info("Regex pattern redacted: %s", pcfg.pattern)
    -        redacted = compiled.sub(pcfg.mask_token, redacted)
    +for compiled, pcfg in zip(options.compiled_patterns, options.normalized_patterns, strict=True):
    +    new_redacted = compiled.sub(pcfg.mask_token, redacted)
    +    if new_redacted != redacted:
    +        log.info("Regex pattern redacted: %s", pcfg.pattern)
    +        redacted = new_redacted
     return redacted
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@nemoguardrails/library/regex/actions.py` around lines 121 - 125, Remove the redundant compiled.search(redacted) check in the loop since compiled.sub(pcfg.mask_token, redacted) is a no-op when there are no matches; simply iterate over the pattern pairs and always assign redacted = compiled.sub(pcfg.mask_token, redacted). Also make the zip defensive by using zip(options.compiled_patterns, options.normalized_patterns, strict=True) to ensure pattern lists are the same length; reference the variables compiled, pcfg, options.compiled_patterns, options.normalized_patterns, redacted, and pcfg.mask_token when applying this change.
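For the single-pass idea in this nitpick, one possible variant (not what the reviewer proposed, and not code from the PR) is `Pattern.subn`, which returns both the new string and the number of substitutions, so logging can key off the count instead of a string comparison. The `redact_once` helper below is hypothetical.

```python
import logging
import re

log = logging.getLogger(__name__)


def redact_once(compiled: re.Pattern, mask_token: str, text: str) -> str:
    """Substitute in a single regex pass and log only when something matched.

    Hypothetical helper for illustration only.
    """
    # subn returns (new_string, number_of_substitutions_made).
    new_text, count = compiled.subn(lambda _match: mask_token, text)
    if count:
        log.info("Regex pattern redacted: %s (%d matches)", compiled.pattern, count)
    return new_text


digits = re.compile(r"\d+")
print(redact_once(digits, "#", "a1b22"))  # a#b#
print(redact_once(digits, "#", "abc"))    # abc
```

The count also distinguishes "no match" from "matched text that already equals the mask token", which a string comparison cannot.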
86-89: 💤 Low value
Consider adding `strict=True` to `zip()` for defensive safety.
While `compiled_patterns` and `normalized_patterns` are created in lockstep by the validator, adding `strict=True` would catch any accidental misalignment early rather than silently dropping items.
♻️ Proposed fix
    -for compiled, pcfg in zip(options.compiled_patterns, options.normalized_patterns):
    +for compiled, pcfg in zip(options.compiled_patterns, options.normalized_patterns, strict=True):
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@nemoguardrails/library/regex/actions.py` around lines 86 - 89, The for-loop pairing options.compiled_patterns with options.normalized_patterns using zip may silently drop items if their lengths diverge; update the loop that iterates "for compiled, pcfg in zip(options.compiled_patterns, options.normalized_patterns):" to use strict=True (i.e., zip(..., strict=True)) so mismatched lengths raise an error early and surface unintended misalignment between compiled_patterns and normalized_patterns.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@nemoguardrails/library/regex/flows.co`:
- Around line 12-15: The Colang2 flows in nemoguardrails/library/regex/flows.co
reference DetectRegexMatchAction which has no registered `@action`; update the
flow to use DetectRegexPatternAction (the registered detection action) or add a
DetectRegexMatchAction alias in actions.py so the dispatcher resolves correctly;
specifically, change occurrences of DetectRegexMatchAction in flows.co to
DetectRegexPatternAction (or add a wrapper/action with the name
detect_regex_match that delegates to detect_regex_pattern) so Colang2 "regex
check" flows resolve at runtime (note: regex redact flows already use
RedactRegexPatternAction correctly).
In `@tests/test_regex_detection.py`:
- Around line 889-921: The test test_regex_redact_input_e2e currently only
checks the canned LLM response and doesn’t verify that the user message was
redacted; change the test so it asserts what the LLM actually receives by
capturing the processed prompt or flow variable (e.g., $user_message) after the
regex redact input flow. Concretely, modify TestChat usage to either (a)
register an action or callback in the test harness that records the prompt sent
to the LLM (reference TestChat and llm_completions) and assert the recorded
prompt contains the redacted value (masking or removing the SSN), or (b) add an
explicit flow action in the RailsConfig.from_content scenario that stores the
post-redaction $user_message to a test-accessible place and assert that stored
value no longer contains "123-45-6789". Ensure the assertion fails if redaction
is not applied.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: c9f33abf-0ebe-438e-be1e-15642a7a806d
📒 Files selected for processing (5)
nemoguardrails/library/regex/actions.py
nemoguardrails/library/regex/flows.co
nemoguardrails/library/regex/flows.v1.co
nemoguardrails/rails/llm/config.py
tests/test_regex_detection.py
Codecov Report
❌ Patch coverage is
Description
Adds redact support to the regex rail, mirroring the `mask` action inside of the sensitive_data_detection.
Related Issue(s)
Checklist
Summary by CodeRabbit
Release Notes
New Features
<REDACTED>).
Tests