test(recorded): add replay harness (1/5)#1974
Conversation
Documentation preview |
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
59c19ea to
c0ff0c7
Compare
📝 WalkthroughWalkthroughThis PR introduces a comprehensive recorded/cassette-based integration testing framework using pytest-recording and VCR. It adds cassette body processing with normalization, pytest/VCR integration with YAML serialization and request matching, test assertions and helpers, validation utilities, cassette inspection tooling, Rails-specific configuration, and extensive documentation. ChangesRecorded Cassette Testing Framework
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Suggested labels
🚥 Pre-merge checks | ✅ 4 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
⚔️ Resolve merge conflicts
Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (2)
Makefile (1)
61-75: ⚡ Quick winDocument the new
record-teststarget in the help section.The help output does not mention the newly added
record-teststarget. Users runningmake helpwill not discover this workflow.📝 Proposed addition to help output
help: `@echo` '----' `@echo` 'test - run unit tests' `@echo` 'tests - run unit tests' `@echo` 'test TEST_FILE=<test_file> - run all tests in given file' `@echo` 'test_watch - run unit tests in watch mode' `@echo` 'test_coverage - run unit tests with coverage' + `@echo` 'record-tests - refresh recorded-test cassettes against live providers (requires API keys)' `@echo` 'docs - build docs, if you installed the docs dependencies'🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@Makefile` around lines 61 - 75, The help target is missing the newly added record-tests make target; update the help recipe (the help: target) to include a line describing "record-tests - run tests in record mode" (or similar) so users see it when running make help; locate the help target block and add a new echo line for "record-tests" matching the existing formatting used for other targets.tests/recorded/inspect_cassette.py (1)
25-25: ⚡ Quick winConsider making the imported functions public or restructuring.
The import of
_decode_body_json,_decode_body_text, and_stream_payloads_from_bodycouples this module to internal implementation details of the cassette module. Since these functions are being reused in a related tool, consider either:
- Making these functions public in
cassette.py(removing the underscore prefix)- Moving cassette inspection functionality into
cassette.pyas a public APIThis concern was previously raised in an earlier review.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/recorded/inspect_cassette.py` at line 25, This module imports private functions _decode_body_json, _decode_body_text, and _stream_payloads_from_body from cassette, which couples tests/recorded/inspect_cassette.py to cassette's internals; either make those functions public by renaming to decode_body_json, decode_body_text, and stream_payloads_from_body in cassette (and update callers), or move the cassette inspection utilities into a new public API in cassette (expose functions with those public names) and update inspect_cassette to import the public symbols instead; update any references to the old underscored names to use the new public function names (decode_body_json, decode_body_text, stream_payloads_from_body) so the dependency is on the public API rather than private internals.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/recorded/cassette.py`:
- Around line 123-147: The SSE helpers _sse_body_payloads and _sse_payloads_body
must only attempt a structured parse/rehydration when the input exactly matches
the simple single-line-per-event "data: <json>" format; update
_sse_body_payloads to validate the entire stream (reject any lines starting with
"event:", "id:", ":" comment lines, blank lines between events, or multi-line
data: blocks) and return None to force raw-body storage if it isn't a strict
single-line JSON-per-data event stream, and update _sse_payloads_body to only
rehydrate when given payloads that were produced by that strict parser
(otherwise avoid emitting a transformed body); apply the same strict-format
guard to the other SSE helpers referenced in the diff so replay always falls
back to raw-body unless a lossless round-trip is guaranteed.
In `@tests/recorded/conftest.py`:
- Around line 249-255: The current logic only scrubs when
_decode_json_body(request.body) returns a dict, leaving non-object JSON (lists,
strings, numbers, booleans) un-scrubbed; change the branch so that whenever
_decode_json_body succeeds (returns any JSON value), you call
_scrub_request_json(data) and then set request.body =
_encode_body_like(request.body, <scrubbed>), i.e. remove the isinstance(data,
dict) guard and always re-encode the scrubbed result (still keeping the existing
except block for decode errors) so batch payloads like lists get scrubbed too.
In `@tests/recorded/normalization.py`:
- Line 82: The code sets text = chunk.get("text") if it's a string but falls
back to chunk.get("content") without validation; update the assignment so the
fallback is only used when chunk.get("content") is also a string (e.g., text =
chunk.get("text") if isinstance(chunk.get("text"), str) else
chunk.get("content") if isinstance(chunk.get("content"), str) else None or ""),
ensuring both chunk.get("text") and chunk.get("content") are type-checked before
assigning to text.
---
Nitpick comments:
In `@Makefile`:
- Around line 61-75: The help target is missing the newly added record-tests
make target; update the help recipe (the help: target) to include a line
describing "record-tests - run tests in record mode" (or similar) so users see
it when running make help; locate the help target block and add a new echo line
for "record-tests" matching the existing formatting used for other targets.
In `@tests/recorded/inspect_cassette.py`:
- Line 25: This module imports private functions _decode_body_json,
_decode_body_text, and _stream_payloads_from_body from cassette, which couples
tests/recorded/inspect_cassette.py to cassette's internals; either make those
functions public by renaming to decode_body_json, decode_body_text, and
stream_payloads_from_body in cassette (and update callers), or move the cassette
inspection utilities into a new public API in cassette (expose functions with
those public names) and update inspect_cassette to import the public symbols
instead; update any references to the old underscored names to use the new
public function names (decode_body_json, decode_body_text,
stream_payloads_from_body) so the dependency is on the public API rather than
private internals.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: c85977e8-5813-462b-90cc-5fc52b1fe6d4
⛔ Files ignored due to path filters (1)
poetry.lockis excluded by!**/*.lock
📒 Files selected for processing (20)
Makefilepyproject.tomlpytest.initests/recorded/README.mdtests/recorded/__init__.pytests/recorded/assertions.pytests/recorded/cassette.pytests/recorded/conftest.pytests/recorded/fake_cassettes.pytests/recorded/inspect_cassette.pytests/recorded/normalization.pytests/recorded/rails/__init__.pytests/recorded/rails/conftest.pytests/recorded/rails_config.pytests/recorded/sanitization.pytests/recorded/snapshots.pytests/recorded/test_cassette_sanitization.pytests/recorded/test_fake_cassettes.pytests/recorded/test_inspect_cassette.pytests/recorded/utils.py
Greptile SummaryThis PR introduces the recorded-test replay harness: cassette serialization with readable
|
| Filename | Overview |
|---|---|
| tests/recorded/cassette.py | Core cassette helpers: deepcopy mutation-safety, isinstance guard on yaml.safe_load, or-[] on interactions, defensive try/except in _stream_payloads — all previously flagged issues are addressed. |
| tests/recorded/conftest.py | Fixtures, VCR config, scrubbing hooks: logic is sound; the autouse asyncio.run() teardown is function-scoped and enforced by monkeypatch dependency. |
| tests/recorded/sanitization.py | Sanitization constants and regex patterns look comprehensive; SECRET_PATTERNS cover OpenAI keys, NVIDIA keys, bearer tokens, org/project IDs. |
| tests/recorded/utils.py | set_api_key_for_record_mode correctly sets the real env key in record mode but always returns dummy_value — fixture consumers that use the return value directly receive the wrong key (previously flagged). |
| tests/recorded/normalization.py | normalize_stream_chunks calls json.loads without try/except on error-prefix strings; a malformed error chunk raises JSONDecodeError instead of a clean assertion failure. |
| tests/recorded/assertions.py | assert_blocked_stream_error uses an unguarded json.loads list comprehension on error-prefix chunks; same class of issue as normalization.py. |
| Makefile | record-tests target is well-structured: record → snapshot-fill → verify; fake_cassette exclusion is correctly applied only to the record step. |
| tests/recorded/rails_config.py | RailsConfigSource descriptor and lru_cache-backed loader with model_copy deep copy look correct. |
| tests/recorded/fake_cassettes.py | Fake-cassette header parsing and metadata validation are straightforward and correct. |
| tests/recorded/inspect_cassette.py | cassette_summary uses isinstance(data, dict) guard (from the previous fix); CLI entry point looks clean. |
Sequence Diagram
%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
participant Dev as Developer
participant Make as make record-tests
participant VCR as pytest-recording (VCR)
participant Provider as LLM Provider
participant Cassette as cassette .yaml
Dev->>Make: make record-tests
Make->>VCR: "pytest --record-mode=all -m "not fake_cassette""
VCR->>Provider: real HTTP request (scrubbed via before_record_request)
Provider-->>VCR: real HTTP response
VCR->>VCR: before_record_response (sanitize secrets, normalize SSE)
VCR->>Cassette: write readable YAML (parsed_body blocks)
Make->>VCR: "pytest --block-network --inline-snapshot=create"
VCR->>Cassette: read + cassette_with_rehydrated_bodies
VCR-->>VCR: replay recorded response
VCR->>VCR: assertions (recorded_chat_response, cassette_request_jsons)
Make->>VCR: pytest --block-network (verify snapshots)
VCR->>Cassette: read + replay
VCR-->>Dev: all tests pass
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
participant Dev as Developer
participant Make as make record-tests
participant VCR as pytest-recording (VCR)
participant Provider as LLM Provider
participant Cassette as cassette .yaml
Dev->>Make: make record-tests
Make->>VCR: "pytest --record-mode=all -m "not fake_cassette""
VCR->>Provider: real HTTP request (scrubbed via before_record_request)
Provider-->>VCR: real HTTP response
VCR->>VCR: before_record_response (sanitize secrets, normalize SSE)
VCR->>Cassette: write readable YAML (parsed_body blocks)
Make->>VCR: "pytest --block-network --inline-snapshot=create"
VCR->>Cassette: read + cassette_with_rehydrated_bodies
VCR-->>VCR: replay recorded response
VCR->>VCR: assertions (recorded_chat_response, cassette_request_jsons)
Make->>VCR: pytest --block-network (verify snapshots)
VCR->>Cassette: read + replay
VCR-->>Dev: all tests pass
Reviews (8): Last reviewed commit: "test(recorded): make replay proxy-indepe..." | Re-trigger Greptile
b3c164e to
f2aa414
Compare
|
Staged Fern docs preview: https://nvidia-preview-pr-1974.docs.buildwithfern.com/nemo/guardrails |
d44488d to
68db0d2
Compare
ab362c6 to
4aa1590
Compare
6bf95dc to
b6d7a5b
Compare
tgasser-nv
left a comment
There was a problem hiding this comment.
Looks good, just a few minor comments before merging
Foundation for converging the recorded suite's cross-surface drift, consumed by the public_api and library layers above: - rails/helpers.py: shared build_rails() construction helper + async_chunks() (replaces the LLMRails(load_config(...)) boilerplate inlined per test, D11/F). - assertions.py: assert_blocked_generation() asserts refusal + rail stop semantics, not just non-empty text (D6).
Replay under --block-network must not depend on ambient proxy env: a SOCKS proxy makes httpx raise ImportError (missing socksio) on a cassette hit, turning a deterministic replay into a shell-dependent error. Add an autouse fixture that strips proxy vars during replay (record_mode == none) while leaving them intact for recording. Also fix the README 'Adding a test' snippet to include the imports it relies on (LLMRails, load_config, suite-local snapshot, OPENAI_BASELINE_CONFIG) so a new contributor can copy-paste it and land on the intended snapshot re-export.
e06a4fe to
e522751
Compare
Summary
Adds the recorded-test replay harness, cassette sanitization, pytest markers, refresh target, and harness self-tests.
Why
The replay infrastructure needs to be reviewed separately from provider-specific coverage so cassette handling and sanitization rules are clear.
What Changed
Review Notes
No provider coverage is introduced in this PR beyond helper tests for the harness itself.
Stack Position
Part 1 of 5.
developStack Context
This stack decomposes recorded end-to-end replay coverage into reviewable slices. The PRs should be reviewed against their parent branch in the stack.
Please review each PR against its parent branch, not directly against the root base branch, except for part 1.
stack/recorded-tests-01-harnessdevelopstack/recorded-tests-02-deterministic-library-loadstack/recorded-tests-01-harnessstack/recorded-tests-03-clientsstack/recorded-tests-02-deterministic-library-loadstack/recorded-tests-04-public-apistack/recorded-tests-03-clientsstack/recorded-tests-05-library-railsstack/recorded-tests-04-public-apiValidation
Summary by CodeRabbit
New Features
Tests
Documentation
Chores
record-testsbuild target and updated project configuration for cassette recording.