feat(correctness): fan out from one millstone process#1724
Conversation
Millstone now holds multiple sinks per process and writes the same payload bytes to all configured targets. Eliminates per-payload divergence between Agent (baseline) and Agent+ADP (comparison) by construction. - millstone: `targets:` named map replaces single `target:`; in-process fan-out via TargetSender; fail-fast with named-target errors. - panoramic: single millstone process per test for both Docker and k8s paths; shared helpers for $GROUP placeholder resolution and socket enumeration. - 19 `test/correctness/*/millstone.yaml` migrated to `targets:` shape. - 4 new millstone unit tests + 6 new panoramic unit tests.
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4c353efc6f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Binary Size Analysis (Agent Data Plane)Baseline: f907c91 · Comparison: 5e9ddc9 · diff ✅ Binary size difference within thresholdChanges by Module
Detailed Symbol Changes |
Regression Detector (Agent Data Plane)Run ID: Optimization Goals: ✅ No significant changes detectedFine details of change detection per experiment (35)Experiments configured
Bounds Checks: ✅ Passed (5)
ExplanationA change is flagged as a regression when |Δ mean %| > 5.00% in the regressing direction for its optimization goal AND SMP marks the experiment as a regression ( |
tobz
left a comment
There was a problem hiding this comment.
Left a comment about trying to simplify things so that we're not adding more boilerplate.
| /// The targets to send payloads to. | ||
| /// | ||
| /// Each generated payload is sent to every target in turn (fan-out), in declared order, before | ||
| /// the next payload is generated. Keys are free-form target names (used in logs and errors); | ||
| /// values are address strings parsed the same way as a single target. At least one target is | ||
| /// required. | ||
| // TODO: the names of the target can only be: baseline and/or comparison | ||
| pub targets: IndexMap<String, TargetAddress>, |
There was a problem hiding this comment.
Any reason why not to just make it baseline_target and comparison_target in this PR? Or, alternatively, create a dead simple TargetConfiguration struct with baseline: TargetAddress, etc.
There was a problem hiding this comment.
Or, maybe even simpler: just make it use the same target address definition for both baseline and comparison automatically?
The updated correctness test configurations all have identical target addresses, so it doesn't seem like we actually need to differentiate at all? Like just do it internally, implicitly.
Summary
Reduce payload timing divergence by running a single millstone process that fans out to both
baselineandcomparisonin correctness tests.Before this change, the millstone container ran
sh -c '... & P1=$! ... & P2=$! ...'to fork twoindependent millstones - one per target which is one place where drift may be introduced.
In follow up PRs I intend to extend the solution to include
Key changes:
bin/correctness/millstone:Configacceptstargets:as a named map (baseline:/comparison:) instead of a singletarget:;TargetSenderfans out each generated payload toall configured sinks; driver and corpus collapsed to a single send loop; errors include the
target name.
bin/correctness/panoramic: Docker (runner.rs) and k8s (k8s.rs) paths each spawn onemillstone container/pod per test instead of two. New shared helpers in
correctness/config.rs(
resolve_group_placeholders,millstone_targets_all_sockets,millstone_first_network_port)substitute the
$GROUPplaceholder per-target on the host. The resolved YAML is written underthe per-test
log_dir(deliberately notmounts_dir, which is overlaid into the agentcontainers).
test/correctness/*/millstone.yamlmigrated mechanically to thetargets:shape.Change Type
How did you test this PR
Locally on macOS against rebuilt
correctness-toolsanddatadog-agentimages at this commit:-d test/integration/cases -d test/correctness) parallel (-p 4default): 51/52pass. Single failure:
dsd-origin-detection-matrix/unified-high-cardinality— window-edgedivergence, the residual window boundary failure class that subsequent PRs intend to target.
-p 1): 52/52 pass. This is slow! Parallelization is a good idea if we can fix it.Compare to the pre-fanout parallel baseline of 50/52 with all four
dsd-origin-detection-matrix/*variants plus
dsd-service-checksfailing under value-divergence.No leaked airlock resources after either run. I did see leaked airlock resources on an aborted run, and that is something I also intend to work on downstream.
References
Not directly, but documenting my ongoing sensitivity to local flakes:
dsd-mapper-blocklistfails locally. #1579