feat(iorails): Add checks to run input or output rails only#2059
feat(iorails): Add checks to run input or output rails only#2059tgasser-nv wants to merge 2 commits into
Conversation
|
@greptile-apps Review this PR |
|
@coderabbitai Review this PR |
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: #!/bin/bash
# Get the PR diff and file list
gh pr view 2059 --repo NVIDIA-NeMo/Guardrails --json files,additions,deletions,title,body | jq '{title, additions, deletions, files: [.files[].path]}'Length of output: 1016 🏁 Script executed: #!/bin/bash
# Get the full diff
gh pr diff 2059 --repo NVIDIA-NeMo/GuardrailsLength of output: 45412 🏁 Script executed: #!/bin/bash
# Look at key files changed
fd -e py . --search-path . | xargs grep -l "checks_async\|checks\b" | grep -v __pycache__ | head -30Length of output: 1823 🏁 Script executed: #!/bin/bash
# Check what start() vs _ensure_started() looks like in IORails
rg -n "def start|def _ensure_started|async def start|async def _ensure_started" nemoguardrails/guardrails/iorails.pyLength of output: 370 🏁 Script executed: #!/bin/bash
# Check how LLMRails implements check/check_async (the duplicated functions)
rg -n "_determine_rails_from_messages\|_get_last_content_by_role\|def check\|def check_async" nemoguardrails/rails/llm/llmrails.py | head -40Length of output: 369 🏁 Script executed: #!/bin/bash
# Look at BaseGuardrails interface
rg -n "def check\|def check_async\|abstractmethod" nemoguardrails/guardrails/ --include="*.py" | grep -v test | head -30Length of output: 748 🏁 Script executed: #!/bin/bash
# Check RailsResult definition to understand the 'rail' field
rg -n "class RailsResult\|rail:" nemoguardrails/rails/llm/options.py | head -20Length of output: 292 🏁 Script executed: #!/bin/bash
# Check for the reversed() in _determine_rails_from_messages more closely
# Also look at LLMRails version for comparison
ast-grep --pattern 'def _determine_rails_from_messages($_) {
$$$
}'Length of output: 539 🏁 Script executed: #!/bin/bash
# Look at the _ensure_started usage
rg -n "_ensure_started\|self\.start\b" nemoguardrails/guardrails/iorails.py | head -20Length of output: 273 🏁 Script executed: #!/bin/bash
# Check the check() sync method in guardrails.py more carefully - _ensure_started removed
rg -n "_ensure_started\|def check\b\|def check_async" nemoguardrails/guardrails/guardrails.pyLength of output: 532 🏁 Script executed: #!/bin/bash
# Find RailsResult definition
rg -n "class RailsResult\|RailsResult\|rail:" nemoguardrails/rails/llm/options.py | head -30Length of output: 273 🏁 Script executed: #!/bin/bash
# Find BaseGuardrails and protocol
rg -rn "class BaseGuardrails\|def check\b\|def check_async\|abstractmethod" nemoguardrails/guardrails/ | grep -v "test_\|__pycache__" | head -40Length of output: 330 🏁 Script executed: #!/bin/bash
# Look at what LLMRails defines for check/check_async
cat nemoguardrails/rails/llm/llmrails.py | grep -n "def check\|_determine_rails\|_get_last_content" | head -20Length of output: 757 🏁 Script executed: #!/bin/bash
# Find where _determine_rails_from_messages is in the full codebase
rg -rn "_determine_rails_from_messages\|_get_last_content_by_role" | grep -v __pycache__Length of output: 505 🏁 Script executed: #!/bin/bash
# Inspect IORails check_async to see if it calls start() or something else
sed -n '610,730p' nemoguardrails/guardrails/iorails.pyLength of output: 6314 🏁 Script executed: #!/bin/bash
# Inspect guardrails.py check/check_async methods
grep -n "def check\|_ensure_started\|rails_engine\|IORails\|LLMRails" nemoguardrails/guardrails/guardrails.py | head -40Length of output: 3092 🏁 Script executed: #!/bin/bash
# Find RailsResult with rail field
rg -rn "class RailsResult\|^\s*rail\s*[=:]" nemoguardrails/ | grep -v __pycache__ | grep -v ".pyc" | head -20Length of output: 295 🏁 Script executed: #!/bin/bash
# Look at IORails.start() implementation
sed -n '305,340p' nemoguardrails/guardrails/iorails.pyLength of output: 2157 🏁 Script executed: #!/bin/bash
# Look at generate_async in IORails for comparison to check_async
grep -n "async def generate_async\|await self.start\|_ensure_started\|_running" nemoguardrails/guardrails/iorails.py | head -20Length of output: 1045 🏁 Script executed: #!/bin/bash
# Look at LLMRails check_async and the helper functions
sed -n '2030,2080p' nemoguardrails/rails/llm/llmrails.pyLength of output: 1866 🏁 Script executed: #!/bin/bash
# Look at LLMRails check_async implementation for comparison
sed -n '1600,1710p' nemoguardrails/rails/llm/llmrails.pyLength of output: 4912 🏁 Script executed: #!/bin/bash
# Find RailsResult class definition
grep -rn "class RailsResult\|dataclass.*RailsResult\|RailsResult.*dataclass" nemoguardrails/ | grep -v __pycache__Length of output: 370 🏁 Script executed: #!/bin/bash
# Find options.py and get its content around RailsResult
fd options.py nemoguardrails/ | xargs grep -n "RailsResult\|class Rails\|rail:" | head -30Length of output: 439
SummaryThe implementation is well-structured and the test coverage is thorough. The core pipeline ( 1.
|
Greptile SummaryThis PR adds
|
| Filename | Overview |
|---|---|
| nemoguardrails/guardrails/iorails.py | Core new feature: adds check() and check_async() to IORails, routing through the existing queue/tracing infrastructure; has an edge-case false-positive block when output-only check is called with no assistant message. |
| nemoguardrails/guardrails/guardrails.py | Removes NotImplementedError guards for IORails check/check_async and delegates directly to rails_engine; clean and correct. |
| nemoguardrails/guardrails/guardrails_types.py | Adds triggered_rail field to RailResult dataclass; straightforward additive change. |
| nemoguardrails/guardrails/rails_manager.py | Populates triggered_rail from the flow name when a blocking rail doesn't set it itself; correct defensive fallback. |
| nemoguardrails/guardrails/rail_action.py | Adds _last_user_content_or_empty() for output-only checks where user message may be absent; well-scoped addition. |
| nemoguardrails/guardrails/actions/content_safety_action.py | Switches ContentSafetyOutputAction from _last_user_content (raises) to _last_user_content_or_empty (returns ""); correct for output-only check use case. |
| tests/guardrails/test_iorails_check.py | New test file with comprehensive coverage of check_async auto-detect, explicit rail_types, blocked result details, sync check(), and lifecycle; missing coverage for output-only check with no assistant message. |
| tests/guardrails/test_guardrails.py | Removes check/check_async from the LLMRails-only list, adds delegation tests, and adds full end-to-end IORails check tests; well structured. |
| tests/guardrails/test_rails_manager.py | Adds TestTriggeredRail covering blocking rails populating triggered_rail and passing rails leaving it None; good coverage. |
| tests/guardrails/test_rail_action.py | Tests for the new _last_user_content_or_empty() helper; covers normal, no-user, and empty-list cases. |
| tests/guardrails/test_content_safety_iorails_actions.py | Adds test for output-only extract with no user message; confirms empty string is returned rather than raising. |
Sequence Diagram
%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
participant C as Caller
participant G as Guardrails (facade)
participant IO as IORails
participant Q as AsyncWorkQueue
participant RM as RailsManager
participant RA as RailAction
C->>G: check_async(messages, rail_types)
G->>IO: _ensure_started() + check_async()
IO->>IO: start()
IO->>Q: submit(_run_check, messages, rail_types)
Q->>IO: _run_check()
IO->>IO: traced_request() span
IO->>IO: _do_check(messages, rail_types, req_id)
alt rail_types is None
IO->>IO: _determine_rails_from_messages()
note over IO: auto-detect from roles
else rail_types provided
IO->>IO: "rails_to_run = [rt.value for rt in rail_types]"
end
opt input in rails_to_run
IO->>RM: is_input_safe(messages)
RM->>RA: run(flow, messages, None)
RA-->>RM: RailResult(is_safe, triggered_rail)
RM-->>IO: RailResult
alt input blocked
IO-->>G: RailsResult(BLOCKED, REFUSAL_MESSAGE, rail)
end
end
opt output in rails_to_run
IO->>RM: is_output_safe(messages, bot_response)
RM->>RA: run(flow, messages, bot_response)
RA-->>RM: RailResult(is_safe, triggered_rail)
RM-->>IO: RailResult
alt output blocked
IO-->>G: RailsResult(BLOCKED, REFUSAL_MESSAGE, rail)
end
end
IO-->>G: RailsResult(PASSED, pass_content)
G-->>C: RailsResult
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
participant C as Caller
participant G as Guardrails (facade)
participant IO as IORails
participant Q as AsyncWorkQueue
participant RM as RailsManager
participant RA as RailAction
C->>G: check_async(messages, rail_types)
G->>IO: _ensure_started() + check_async()
IO->>IO: start()
IO->>Q: submit(_run_check, messages, rail_types)
Q->>IO: _run_check()
IO->>IO: traced_request() span
IO->>IO: _do_check(messages, rail_types, req_id)
alt rail_types is None
IO->>IO: _determine_rails_from_messages()
note over IO: auto-detect from roles
else rail_types provided
IO->>IO: "rails_to_run = [rt.value for rt in rail_types]"
end
opt input in rails_to_run
IO->>RM: is_input_safe(messages)
RM->>RA: run(flow, messages, None)
RA-->>RM: RailResult(is_safe, triggered_rail)
RM-->>IO: RailResult
alt input blocked
IO-->>G: RailsResult(BLOCKED, REFUSAL_MESSAGE, rail)
end
end
opt output in rails_to_run
IO->>RM: is_output_safe(messages, bot_response)
RM->>RA: run(flow, messages, bot_response)
RA-->>RM: RailResult(is_safe, triggered_rail)
RM-->>IO: RailResult
alt output blocked
IO-->>G: RailsResult(BLOCKED, REFUSAL_MESSAGE, rail)
end
end
IO-->>G: RailsResult(PASSED, pass_content)
G-->>C: RailsResult
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 2
nemoguardrails/guardrails/iorails.py:712-722
**False-positive block for output-only check with no assistant message**
When `rail_types=[RailType.OUTPUT]` is requested (explicitly or auto-detected) but the messages contain no assistant message, `_get_last_content_by_role(messages, "assistant")` returns `""`. `ContentSafetyOutputAction._extract_messages` then raises `RuntimeError("bot_response is required …")`, which `RailAction.run()` catches and converts to `RailResult(is_safe=False)`. `_do_check` then returns `RailsResult(status=BLOCKED, content=REFUSAL_MESSAGE)` — a false-positive block indistinguishable from a real safety verdict. Adding an early guard (`if not bot_response: return RailsResult(status=RailStatus.PASSED, content="")` or raising `ValueError`) before calling `is_output_safe` would surface the right outcome.
### Issue 2 of 2
nemoguardrails/guardrails/iorails.py:174
The `reversed()` call inside a set comprehension is a no-op — set construction is order-independent — and just allocates an extra iterator. A plain `for msg in messages` produces an identical set.
```suggestion
roles = {msg.get("role") for msg in messages}
```
Reviews (1): Last reviewed commit: "Initial checkin of checks (no tool suppo..." | Re-trigger Greptile
| if "output" in rails_to_run: | ||
| bot_response = _get_last_content_by_role(messages, "assistant") | ||
| log.info("[%s] Running output rails", req_id) | ||
| output_result = await self.rails_manager.is_output_safe(messages, bot_response) | ||
| if not output_result.is_safe: | ||
| log.info("[%s] Output blocked: %s", req_id, output_result.reason) | ||
| if self._metrics_enabled: | ||
| record_request_blocked(RailDirection.OUTPUT) | ||
| return RailsResult( | ||
| status=RailStatus.BLOCKED, content=REFUSAL_MESSAGE, rail=output_result.triggered_rail | ||
| ) |
There was a problem hiding this comment.
False-positive block for output-only check with no assistant message
When rail_types=[RailType.OUTPUT] is requested (explicitly or auto-detected) but the messages contain no assistant message, _get_last_content_by_role(messages, "assistant") returns "". ContentSafetyOutputAction._extract_messages then raises RuntimeError("bot_response is required …"), which RailAction.run() catches and converts to RailResult(is_safe=False). _do_check then returns RailsResult(status=BLOCKED, content=REFUSAL_MESSAGE) — a false-positive block indistinguishable from a real safety verdict. Adding an early guard (if not bot_response: return RailsResult(status=RailStatus.PASSED, content="") or raising ValueError) before calling is_output_safe would surface the right outcome.
Prompt To Fix With AI
This is a comment left during a code review.
Path: nemoguardrails/guardrails/iorails.py
Line: 712-722
Comment:
**False-positive block for output-only check with no assistant message**
When `rail_types=[RailType.OUTPUT]` is requested (explicitly or auto-detected) but the messages contain no assistant message, `_get_last_content_by_role(messages, "assistant")` returns `""`. `ContentSafetyOutputAction._extract_messages` then raises `RuntimeError("bot_response is required …")`, which `RailAction.run()` catches and converts to `RailResult(is_safe=False)`. `_do_check` then returns `RailsResult(status=BLOCKED, content=REFUSAL_MESSAGE)` — a false-positive block indistinguishable from a real safety verdict. Adding an early guard (`if not bot_response: return RailsResult(status=RailStatus.PASSED, content="")` or raising `ValueError`) before calling `is_output_safe` would surface the right outcome.
How can I resolve this? If you propose a fix, please make it concise.| Returns ``{"rails": [...]}`` or ``None`` when there is no user/assistant | ||
| message to check. | ||
| """ | ||
| roles = {msg.get("role") for msg in reversed(messages)} |
There was a problem hiding this comment.
The
reversed() call inside a set comprehension is a no-op — set construction is order-independent — and just allocates an extra iterator. A plain for msg in messages produces an identical set.
| roles = {msg.get("role") for msg in reversed(messages)} | |
| roles = {msg.get("role") for msg in messages} |
Prompt To Fix With AI
This is a comment left during a code review.
Path: nemoguardrails/guardrails/iorails.py
Line: 174
Comment:
The `reversed()` call inside a set comprehension is a no-op — set construction is order-independent — and just allocates an extra iterator. A plain `for msg in messages` produces an identical set.
```suggestion
roles = {msg.get("role") for msg in messages}
```
How can I resolve this? If you propose a fix, please make it concise.Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Description
This PR adds support for the
checks()andchecks_async()methods on IORails to match LLMRails. This is exposed to the server via the LLMRails aliasing in nemoguardrails/init.py.This PR doesn't support tool-calling checks, and isn't stacked on top of any of the tool-calling PRs (#2030 , #2058 ). There will be a stacked PR to support tool-calling checks in the separate stack
Related Issue(s)
Verification
Pre-commit
Unit-test
Integration test with Chat
AI Assistance
Checklist