test(recorded): add rails library coverage (5/5)#1978
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
e9ed8b3 to
e3e8722
Compare
05feb29 to
4fbda3a
Compare
Greptile SummaryThis is the final PR in a 5-part stack adding recorded end-to-end test coverage for NeMo Guardrails library rails. It also ships a runtime fix to
|
| Filename | Overview |
|---|---|
| nemoguardrails/rails/llm/llmrails.py | Fixes _get_latest_user_message to return the message content string instead of the full dict; cassettes confirm the streaming prompt now renders correctly. |
| tests/recorded/rails/library/helpers.py | Provides check_rails, generate_with_fake_main, and stream_with_fake_main helpers; streaming helper intentionally passes no FakeLLMModel and relies on options={"rails": ["output"]} to prevent main-model calls. |
| tests/recorded/rails/library/test_composition.py | Composition-order tests covering regex→self-check→jailbreak→content safety→topic control ordering; cassettes confirm each rail short-circuits correctly before the next rail runs. |
| tests/recorded/rails/library/configs/full_stack/config.yml | Full-stack config now has consistent jailbreak-before-content-safety ordering matching full_stack_no_topic (previous ordering discrepancy resolved). |
| tests/recorded/rails/library/test_content_safety.py | Covers safe-pass, block, streaming block, generation block, and provider-error scenarios for NIM content safety; streaming cassette confirms the user_input fix. |
| tests/recorded/rails/library/test_injection.py | Tests injection detection (block and omit modes) plus exception propagation when enable_rails_exceptions=True; all regex-based, no cassettes needed. |
| tests/recorded/rails/library/configs.py | Centralises RailsConfigSource constants and the shared JAILBREAK_PROMPT test fixture. |
Sequence Diagram
%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
participant T as Test
participant R as LLMRails
participant Regex as regex check
participant SC as self check (OpenAI)
participant JB as jailbreak detect (NIM)
participant CS as content safety (NIM)
participant TC as topic safety (NIM)
T->>R: "check_async(messages, rail_types=[INPUT])"
R->>Regex: detect_regex_pattern
Regex-->>R: passed
R->>SC: self_check_input prompt
SC-->>R: safe → passed
R->>JB: nemoguard-jailbreak-detect
JB-->>R: "jailbreak=true → BLOCKED (stops here)"
note over CS,TC: not reached when jailbreak blocks
T->>R: "stream_async(messages, generator, rails=[output])"
note over R: generator provides main content
R->>CS: content_safety_check_output (NIM)
CS-->>R: unsafe → stream error chunk emitted
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
participant T as Test
participant R as LLMRails
participant Regex as regex check
participant SC as self check (OpenAI)
participant JB as jailbreak detect (NIM)
participant CS as content safety (NIM)
participant TC as topic safety (NIM)
T->>R: "check_async(messages, rail_types=[INPUT])"
R->>Regex: detect_regex_pattern
Regex-->>R: passed
R->>SC: self_check_input prompt
SC-->>R: safe → passed
R->>JB: nemoguard-jailbreak-detect
JB-->>R: "jailbreak=true → BLOCKED (stops here)"
note over CS,TC: not reached when jailbreak blocks
T->>R: "stream_async(messages, generator, rails=[output])"
note over R: generator provides main content
R->>CS: content_safety_check_output (NIM)
CS-->>R: unsafe → stream error chunk emitted
Reviews (10): Last reviewed commit: "test(recorded): fix self-check-facts fal..." | Re-trigger Greptile
e3e8722 to
3b279b4
Compare
4fbda3a to
100ca5d
Compare
3b279b4 to
27732c7
Compare
039b4e1 to
9e5bfa4
Compare
27732c7 to
a759e09
Compare
9e5bfa4 to
e7713a2
Compare
a759e09 to
bbc9455
Compare
0092cf2 to
86ca895
Compare
efee56d to
d7a1262
Compare
0c5207e to
e25239a
Compare
d7a1262 to
19ec9e4
Compare
e25239a to
dac2f3c
Compare
0f531ea to
e6370fc
Compare
dac2f3c to
16ec0ef
Compare
16ec0ef to
80962e0
Compare
399c914 to
5d395f8
Compare
80962e0 to
3f40196
Compare
5d395f8 to
54bea6d
Compare
3f40196 to
c211bc1
Compare
tgasser-nv
left a comment
There was a problem hiding this comment.
Only question is on the changes to llmrails.py which seem to be out-of-scope for an integration-testing PR. The description says there were no runtime changes
| def _get_latest_user_message( | ||
| messages: Optional[List[dict]] = None, | ||
| ) -> dict: | ||
| ) -> Any: |
There was a problem hiding this comment.
Why is this change to the core runtime in the PR? It seems like a bugfix and out-of-scope for adding rails library coverage. It contradicts the PR description of "no runtime changes"
de9ab3c to
4222f94
Compare
c211bc1 to
cf71284
Compare
cf71284 to
8b07eb3
Compare
Fold library/helpers.py onto the shared build_rails()/async_chunks() instead of its own LLMRails(load_config(...)) + local _chunks (D11/F), and assert the content-safety output block via assert_blocked_generation (refusal + rail stop) rather than the weak assert_generation_response non-empty check (D6).
The test passed a list for relevant_chunks, which crashes retrieve_relevant_chunks (list + str) after the self_check_facts LLM call; the swallowed error produced the generic internal-error refusal, and the snapshot pinned that string as the expected fact-check block. A refactor that fixed the crash would have flipped this test red. Pass relevant_chunks as its canonical string form so the rail completes and the snapshot pins the real fact-check refusal. The underlying retrieve_relevant_chunks list+str bug is pre-existing on develop and tracked separately.
69745f6 to
81ef48f
Compare
8b07eb3 to
6b8a76e
Compare
Summary
Adds recorded coverage for library rails and their composition order.
Why
Built-in rails need replayable end-to-end coverage so future rail changes can be reviewed against stable behavior.
What Changed
Review Notes
This is the final stack PR and includes no additional runtime changes.
Stack Position
Part 5 of 5.
Stack Context
This stack decomposes recorded end-to-end replay coverage into reviewable slices. The PRs should be reviewed against their parent branch in the stack.
Please review each PR against its parent branch, not directly against the root base branch, except for part 1.
stack/recorded-tests-01-harnessdevelopstack/recorded-tests-02-deterministic-library-loadstack/recorded-tests-01-harnessstack/recorded-tests-03-clientsstack/recorded-tests-02-deterministic-library-loadstack/recorded-tests-04-public-apistack/recorded-tests-03-clientsstack/recorded-tests-05-library-railsstack/recorded-tests-04-public-apiValidation