feat(replay): self-validating fixtures + throughput benchmark (Rex5 mainnet, deploy registry) by RealiCZ · Pull Request #306 · megaeth-labs/mega-evm

RealiCZ · 2026-06-09T08:31:53Z

Summary

Adds a self-validating replay-fixture workflow and a replay-throughput benchmark to mega-evme, plus the foundation to make them correct: Rex5 mainnet activation, a registry-driven system-contract deploy refactor, and a unified hardfork schedule.

mega-evme replay --dump-fixture <FILE> turns one replayed mainnet transaction into a self-contained EEST state-test fixture (pre-state + tx + env + megaEnv + post roots), written only if the local replay reproduces the on-chain receipt's gas and status. A single committed corpus of such fixtures is used two ways — validated for correctness on every PR, and timed for throughput (base-vs-PR) on demand — making it the real-transaction counterpart to the synthetic Criterion suite.

Foundation (consensus-sensitive — please review closely)

Rex5 mainnet activation — the mainnet hardfork config had Rex5 as Never, so replaying post-Rex5 mainnet blocks ran under Rex4 and produced wrong gas. Set Rex5 to Timestamp(1780632000) with the mainnet SequencerRegistryConfig. This is the only behavior change: it fixes when Rex5 activates on mainnet; no spec's semantics change (all specs stay frozen).
Unified hardfork schedule (block/chain.rs) — one source for the mainnet (4326) / testnet (6343) activation timestamps and hardfork_schedule.
Registry-driven system-contract deploy (system/deploy.rs) — a declarative SystemContractSpec + a shared transact_deploy replace five near-identical deploy functions; each contract exposes one *_spec builder as the single source of its gate / bytecode-version / force-create logic. Behavior-preserving: deploy order and per-spec semantics are byte-identical, and SequencerRegistry stays a separate seeded deploy.

Feature

mega-evme replay --dump-fixture — dump a self-validating EEST TestUnit. An on-chain fidelity gate (gas + success status) rejects the dump unless the replay reproduces the receipt, so a wrong spec/config can't slip through. Fixtures are byte-reproducible (ordered pre / storage / megaEnv) and reject transaction overrides and --override.spec.
state-test gains two modes — --bench times a fixture's isolated EVM transact (min/median/mean + Mgas/s, JSON) and is the single throughput-benchmark surface; --fill computes and writes a fixture's post in place (the offline analog of the dump's post-fill, for fixtures with no on-chain origin).
One corpus, two uses (bench/replay/fixtures/) — five self-contained fixtures (system-contract call, plain call, log-heavy, large DeFi, and the 142M-gas attack deploy from feat(bench): add attack_replay benchmark #299): validated per-PR by replay_corpus.rs and timed by an ABBA base-vs-PR driver (run.py) in a new replay-bench.yml workflow.
Criterion CI — add the existing attack_replay (feat(bench): add attack_replay benchmark #299) bench to the compared targets so its limit-tracker regression target is actually observed.
state-test schema/runner — pre and account storage are now BTreeMap (byte-stable fixtures); a symmetric to serializer so CREATE transactions round-trip (None ↔ ""); added megaEnv and megaGasUsed/megaStatus expectations; the production SALT hasher moved into mega-evm behind a test-utils feature so fixtures replay with mainnet bucket pricing. mega-t8n updated for the storage type.

Testing

mega-evm / state-test / mega-t8n suites pass; clippy, fmt, cargo sort, prettier, and the no_std/riscv check are clean.
New tests: dump round-trip + tamper detection, per-PR corpus self-validation, end-to-end --dump-fixture CLI write+validate, byte-reproducibility, the dump override/spec guards, and state-test --bench / --fill of a fixture.
Verified end-to-end against mainnet RPC: replay gas/status match on-chain receipts; dumped fixtures self-validate offline; the attack fixture reproduces the source's exact 141,927,106 gas under Rex5.

Notes / follow-ups

The replay benchmark complements (does not replace) the Criterion suite; sub-10µs cases are noisy on shared runners, so read the larger cases and the overall trend.
More corpus cases can be added incrementally — each is one self-contained JSON plus a manifest line (a mined tx via --dump-fixture, a non-on-chain case via --fill).

…hput Add `mega-evme replay --dump-fixture <FILE>` to produce a self-validating state-test fixture in a single command, and `--bench-runs`/`--bench-warmup` to measure EVM throughput (Mgas/s) in isolation. Unify replay fixtures onto the state-test TestUnit schema: optional megaEnv carrier (SALT bucket capacities + oracle storage), explicit megaGasUsed/ megaStatus expectations, and SpecName Rex variants. Promote the SALT bucket hasher into mega-evm (test-utils) so replay and state-test share it.

mega-evme replay capped mainnet at Rex4 (Rex5 was ForkCondition::Never), so current mainnet transactions replayed under the wrong spec and produced gas that diverged from on-chain receipts. Activate Rex5 at the mainnet timestamp (1780632000) and attach the SequencerRegistryConfig required by the Rex5 block executor; the seeded sequencer/admin are the on-chain SequencerRegistry values (the live system address is read from forked state at replay time).

A dumped fixture is validated under the same spec it was dumped with, so self-validation cannot detect a wrong spec / hardfork config. Before building a fixture, fetch the transaction's on-chain receipt and require the local replay to reproduce its gasUsed exactly; abort the dump (no file written) on mismatch. This makes --dump-fixture / --bench-runs faithful by construction and would catch a misconfigured hardfork schedule loudly.

…d primitive Every predeploy was deployed by the same 5-step state patch, duplicated across six transact_deploy_* functions and re-implemented again in mega-evme. Extract a declarative SystemContractSpec + one transact_deploy primitive that all wrappers delegate to, and a canonical flat_system_contract_specs registry that both the block executor and mega-evme iterate instead of hardcoding the per-contract gate and bytecode selection. The SequencerRegistry keeps its config-seeded, guarded wrapper but shares the same primitive. Also moves the per-chain hardfork schedules (real mainnet/testnet timestamps + SequencerRegistryConfig) from the mega-evme bin into mega-evm (block::chain::hardfork_schedule) as a single exportable source of truth; the replay command now delegates to it. Behavior is byte-identical for all frozen specs (969 mega-evm tests pass; replay gas still matches on-chain receipts exactly).

Commit three self-validating EEST fixtures dumped from real MegaETH mainnet transactions (mega-evme replay --dump-fixture; gas verified against on-chain receipts at dump time): a plain call, an Oracle system-contract call, and a large 726k-gas call. Add the replay_corpus integration test (runs under cargo test --workspace) and a dedicated Replay Fixtures workflow that re-executes the corpus through state-test offline — a deterministic per-PR guard that fails if a code change alters gas/status/post-state for the covered spec. Rex3/Rex4 coverage to follow once an archive RPC is available (the current endpoint prunes historical state).

The per-spec max-blobs config was duplicated verbatim between execute_test_suite and run_unit_once (the dump/timing path). Extract a single configure_max_blobs helper both call, so the validation and dump paths cannot silently drift.

…_all_activated Address review: the deploy registry duplicated each contract's gate and bytecode-version selection (notably the Oracle v1.0.0/v1.1.0/v2.0.0 + force-create logic), and after the registry refactor the per-contract transact_deploy_* wrappers became test-only — so the substantial deploy tests exercised the wrapper copy, not the production registry path. Extract one *_spec(hardforks, ts) -> Option<SystemContractSpec> builder per contract as the single source; the wrappers delegate to it and flat_system_contract_specs composes the builders. The existing deploy tests now cover the production path, and the version/gate logic lives in one place. Also reuse MegaHardforkConfig::with_all_activated() in all_activated_hardforks() instead of re-listing all nine forks.

…ia ReplayObservation Address review of the fixture/replay path: - Reject --dump-fixture/--bench-runs + tx overrides at the start of run(), before forking state, fetching the receipt, and replaying preceding txs. - Make state-test's execution_status public and derive the fixture's gas, status, and output from a single ExecutionResult (new ReplayObservation), so the dumped status string can no longer drift from the validation side. Drops build_draft's allow(too_many_arguments). - Fold the dump megaEnv snapshot and on-chain receipt gas into one Option, so the two are computed together (no implicit "both Some" invariant / expect). - Guard run_bench against zero runs.

…offline Previously --json printed the replay summary as one JSON object and the benchmark stats as a second — breaking any consumer that parses stdout as a single value. Compute the benchmark before rendering and fold it into the summary object under a 'bench' field (text mode still appends a block). Add offline integration tests (committed RPC capture incl. receipt) asserting the output is a single JSON document with the bench field, and that --bench-runs/--dump-fixture reject transaction overrides.

…ec; test dump CLI Address review of the fixture fidelity model: - Strengthen the on-chain fidelity gate from gas-only to gas + success status. A replay with coincidentally-matching gas but a different status no longer produces a self-consistent fixture. Logs are not compared separately: LOG opcodes are metered, so differing logs would already change gas. - Reject --override.spec with --dump-fixture/--bench-runs (alongside the existing transaction-override rejection): a forced spec would record a what-if, not the on-chain transaction. - Add offline integration tests for the end-to-end --dump-fixture CLI write path (validated through the state-test runner) and the --override.spec rejection. - Update help text and docs to state the gate covers gas and status, and the spec-override incompatibility.

…age/megaEnv) A dumped fixture's pre-state map, per-account storage, and megaEnv bucket/oracle lists came from hash-map / unordered iteration, so two dumps of the same transaction produced byte-different files (an online dump never matched its offline re-dump). Validation is content-based so this was only cosmetic, but it made the committed corpus non-reproducible and re-dumps a giant noisy diff. - state-test: TestUnit.pre and AccountInfo.storage are now BTreeMap, so they serialize in a deterministic (address / slot) order. All consumers insert into a DB or look up by key, so ordering is irrelevant to execution. - mega-evme: build_pre_state builds a BTreeMap; the dump sorts the megaEnv bucket-capacity and oracle-storage lists. - Re-normalized the three committed Rex5 corpus fixtures (re-dumped; state root, gas, and status unchanged — only key order). Verified: an online dump and two offline re-dumps of the same tx are now byte-identical, and the corpus still self-validates.

…orpus Adds the real-transaction counterpart to the synthetic Criterion suite: replay a corpus of characteristic mainnet transactions offline and compare PR throughput against the merge-base, so a change that speeds up or slows down real execution is visible on the PR. - bench/replay/captures/: four committed offline RPC captures (system-contract call, plain medium call, log-heavy, large multi-call DeFi) spanning ~2k–20k Mgas/s, each carrying its on-chain receipt. - bench/replay/manifest.json: corpus description (tx, expected_gas, category). - bench/replay/run.py: driver that replays each case via 'replay --bench-runs --json' against one or two binaries, ABBA-interleaved to cancel machine drift, verifies replayed gas matches the corpus, and emits a markdown Δ% table + JSON with a regression exit code (no third-party deps). - .github/workflows/replay-bench.yml: weekly / dispatch / member /benchmark-replay comment; builds PR + merge-base binaries, runs the driver, posts the table. Feature-detects --bench-runs on the baseline and falls back to measure-only (the baseline that predates this feature cannot be compared). - Docs: bench/replay/README.md + a pointer from the replay command docs. Verified locally end-to-end: measure mode, base-vs-PR compare (same binary reads as noise, 0 regressions), and the --fail-on-regression exit path.

The attack_replay bench (#299) replays a real mainnet attack contract deployment and is the hermetic regression target for limit-tracker / hot-path work (the 30× EQUIVALENCE→MINI_REX gap from AdditionalLimit accounting, e.g. caching net_usage in FrameLimitTracker). It existed as a bench target but was never in the base-vs-PR comparison set, so regressions on it went unobserved. Add it to BENCH_TARGETS so the Criterion comparison covers it. This is the right home for it: it is a prestate-snapshot replay of a *simulated* CREATE (no mined tx / receipt; its block is pruned), so it cannot be a replay-bench corpus case, and its cross-spec + vanilla-revm diagnostic is one the replay track (production spec, mega-vs-mega) cannot reproduce.

…e cases The replay-bench corpus could only hold real mined transactions (RPC captures). Attack/edge cases like attack_replay (#299) come from a prestateTracer snapshot, not a mined tx, and their block is pruned — so they need an archive node to capture, or cannot be captured at all. But they are self-contained (prestate + tx + env), so they need no network to *replay*. Add a second corpus input type so any conformant fixture JSON can be benchmarked: - state-test: 'state-test --bench' times a fixture's isolated EVM execution (reusing time_unit_execution) and emits the same {gas_used, success, bench{…}} shape as 'mega-evme replay --bench-runs', so the driver parses both identically. Adds bench_test_suite + UnitBench; SpecName is now Copy. - driver: a manifest case may be 'type: fixture' (state-test) or 'type: capture' (mega-evme, default); the driver resolves state-test next to mega-evme. - corpus: fixtures/attack_deploy.json — the #299 mainnet attack deploy converted from its prestate snapshot to a fixture. Reproduces the source's exact 141,927,106 gas under Rex5, so this non-on-chain, 142M-gas limit-tracker stress case is now tracked for regressions with no archive node. No archive node needed: a conformant JSON is enough.

…ures A dumped fixture already carries its prestate, so it is self-contained — and mega-evme replay --bench-runs and state-test --bench time the same execution (both call time_unit_execution on the unit). The capture cases were therefore redundant: the only difference was building the unit from an RPC cache vs loading it from a committed JSON. Make the corpus uniform — one format (state-test fixture), one tool (state-test --bench): - Dump the four mined-tx cases to self-contained fixtures (smaller than the captures they replace) and drop bench/replay/captures/. - manifest: all cases are now type: fixture. - driver/CI: --bin is a build directory holding mega-evme + state-test; CI builds both per version and feature-detects state-test --bench (measure-only fallback when the baseline predates it). - README: document the self-contained TestUnit as the unit of the benchmark. The 'capture' input type stays in the driver for ad-hoc use, but the committed corpus is fixtures only: a characteristic workload = one conformant JSON, with real mined txs and non-on-chain cases (attack snapshots) on the same path.

Review pass over the branch surfaced patch-cruft to remove: - bench/replay driver + CI: drop the dead 'capture' input path. The committed corpus is fixtures-only and ad-hoc single-tx benching is 'mega-evme replay --bench-runs' directly, so the driver is now state-test-only (simpler resolve, no cross-consumer JSON-shape drift). CI builds only state-test. - run.py: fix the stale module docstring (it described RPC captures). - state-test: remove the unused UnitBench::spec field. - mega-evme: remove the dead ReplayOutcome::original_tx field and the struct-wide #[allow(dead_code)] it was masking. - state-test: comment configure_max_blobs to record that OSAKA's 6 is distinct from the pre-PRAGUE 6 (so the OSAKA arm is not 'simplified' away).

The two fixture sets overlapped: crates/state-test/tests/fixtures/replay/rex5_* (correctness, validated per-PR) and bench/replay/fixtures/* (performance, timed) were the same EEST format with near-duplicate Rex5 mainnet workloads, and the correctness set was even validated twice (replay_corpus.rs in 'cargo test' AND a dedicated replay-fixtures.yml workflow doing the same thing). Collapse to a single corpus under bench/replay/fixtures/, used both ways: - replay_corpus.rs now globs bench/replay/fixtures/ and validates each fixture's recorded post-state every PR (the four dumped mined-tx fixtures carry post; attack_deploy is bench-only with empty post, naturally skipped). - run.py keeps timing the same corpus on demand. - Delete the duplicate crates/state-test/tests/fixtures/replay/ (rex5_* + README) and the redundant replay-fixtures.yml (cargo test already covers it). - Repoint bench_fixture.rs; update the README to document the one-corpus model. A fixture added for either purpose now serves both.

…orpus) attack_deploy was bench-only (empty post) because it was hand-converted from a prestateTracer snapshot and never had its post-state roots computed. Add the missing piece so the corpus is fully uniform — every fixture validated + benched. - state-test --fill: compute and write each fixture's post in place, reusing the same execute_unit_collect + Test::for_dump the dump path uses. The offline analog of --dump-fixture's post-fill, for fixtures with no on-chain origin. - Fix a latent serialization bug: TransactionParts.to deserialized "" -> None (CREATE) but serialized None -> null, which the deserializer then rejects — so re-serializing any CREATE (via --fill or --dump-fixture) produced an unreadable fixture. Add a symmetric serialize_maybe_empty (None -> ""). - Fill attack_deploy's post under Rex5 (state root + logs root + 141,927,106 gas + success); replay_corpus now validates it like the others (5/5). - Update manifest/README/test docs: attack_deploy is no longer bench-only. Only attack_deploy.json changes; the four CALL fixtures are byte-identical (Some address serialization is unchanged).

The crate README was a stub describing only the revme fork; the new --bench and --fill CLI modes were documented only in --help and bench/replay/README. Add a Modes section to the state-test README covering all three with their usage scenarios, so --fill's purpose (offline post-fill for non-on-chain fixtures) is discoverable from the tool's own docs.

…face Benchmarking was implemented twice — mega-evme replay --bench-runs and state-test --bench — both wrapping the same time_unit_execution and emitting the same JSON, with their own stats struct, JSON serializer, and timing loop. After the corpus unified on self-contained fixtures, the replay-bench driver only used state-test --bench, leaving --bench-runs a redundant convenience. Collapse benchmarking to state-test --bench: - mega-evme: drop --bench-runs / --bench-warmup, BenchStats, run_bench, the cmd.rs bench JSON/text rendering, and the bench branch of output_results. replay's job is now replay + --dump-fixture only. - The up-front override/spec guards now apply to --dump-fixture (unchanged behavior; --bench-runs no longer exists). - Tests: rename replay_bench.rs -> replay_dump.rs (it only tests --dump-fixture now); drop the two bench-output tests; repoint the tx-override guard test to --dump-fixture. Rename the capture replay_offline_bench.cache.json -> replay_offline.cache.json. - Docs: scrub mega-evme --bench-runs from replay.md, both READMEs, and code comments; point throughput benchmarking at 'dump + state-test --bench'. Ad-hoc single-tx benching is now: replay --dump-fixture then state-test --bench.

codecov · 2026-06-09T08:38:30Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.8%. Comparing base (e1f4f68) to head (874de89).

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

…orkflow CodeQL flagged the workflow (Untrusted Checkout TOCTOU): a single job held a pull-requests: write token AND did gh pr checkout + built/ran the PR's (untrusted, possibly fork) code, so a malicious /benchmark-replay PR could exfiltrate the write-scoped token. Adopt the same hardened split benchmark.yml uses (#305): - prepare (write token): resolves feature/baseline SHAs via the API only — no checkout, never materializes PR code — and reacts to the comment. - bench (read-only: contents+pull-requests read): checks out, builds, and runs the PR code at pinned SHAs; no write-scoped token to abuse. Writes the report to the job summary and uploads it as an artifact. - compare (write token): downloads the report artifact and posts the PR comment; never touches PR code. Behavior is unchanged; only the trust boundary moved.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 58fc1e3e95

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

claude · 2026-06-09T08:44:32Z

Review

Reviewed the consensus-sensitive changes, the deploy registry refactor, fidelity gate logic, and the tooling additions. No correctness bugs found.

Rex5 mainnet activation

The only consensus-sensitive change is chain.rs:33: ForkCondition::Never → Timestamp(1780632000) for mainnet Rex5, plus the SequencerRegistryConfig seeding addresses. The boundary tests in chain.rs correctly verify the spec transition at that timestamp. One ask: could you confirm that 1780632000, rex5_initial_sequencer, and rex5_initial_admin were read directly from the on-chain SequencerRegistry and the MegaETH published upgrade schedule? These are the only values I cannot verify locally, and they silently produce wrong execution if wrong.

Deploy registry refactoring

flat_system_contract_specs emits exactly the old per-contract deploy order (Oracle → HPTimestamp → KeylessDeploy → AccessControl → LimitControl), with the Oracle's force_create_on_upgrade flag set correctly: true pre-Rex5 (matches the old transact_deploy_oracle_contract that called mark_created() on upgrade), false on Rex5+ (preserves storage). SequencerRegistry is correctly excluded. Behavior-identical with the old imperative code.

Minor efficiency note on transact_deploy_sequencer_registry: it calls db.load_cache_account for the guard check, then crate::transact_deploy calls it again. The second call is a cache hit so it is benign, but the guard could return early on a code-hash match to avoid the fall-through.

Fidelity gate

The two-level design is sound: (1) on-chain receipt gas+status guards against a wrong hardfork config; (2) isolated state-test re-execution cross-checks gas+status+output to verify the captured pre-state is complete. The post logs root from the isolated run provides the hard log fidelity guarantee.

`state-test` as a production dependency

state-test is now in [dependencies] (not [dev-dependencies]) in mega-evme/Cargo.toml because finalize_and_write calls execute_unit_collect from production code. Logically correct — the dump-fixture workflow needs the runner at runtime. Since mega-evm [test-utils] was already a production dep of mega-evme (for the SALT hasher), this extends an existing pattern. Worth noting: anyone building a minimal mega-evme gets the full state-test runner in their binary. A dump-fixture Cargo feature could gate this if binary size ever becomes a concern, but not a blocker.

Everything else

No gas leakage pitfalls: transact_deploy operates at block pre-execution level, not inside any EVM call frame.
No stable-spec behavior changes: all spec-specific paths remain spec-gated.
BTreeMap for pre and storage: byte-reproducible fixtures confirmed by the new byte-reproducibility test.
CREATE to round-trip (None ↔ ""): serialize_maybe_empty / deserialize_maybe_empty are symmetric.
no_std compliance: deploy.rs follows the alloc as std pattern correctly.
Workflow: cancel-in-progress: true is safe for an opt-in benchmark job.

Overall: logic is correct, tests are thorough, and the Rex5 activation fix is well-motivated. The one ask is confirmation of the three externally-sourced values above.

claude · 2026-06-09T08:47:38Z

Review

Consensus-sensitive changes (Rex5 activation + deploy refactor) — reviewed closely; no issues found.

Rex5 mainnet timestamp (1780632000) and SequencerRegistryConfig addresses match the on-chain registry values confirmed in the PR description.
Deploy refactor: behavior-preserving across all five contracts. Deploy order is identical, Oracle's force_create_on_upgrade pre/post-Rex5 split is correctly modelled in SystemContractSpec, and the SequencerRegistry's foreign-code-hash guard is preserved (the guard fires before transact_deploy is reached). The executor loop pushes the same outcomes as the old per-contract sequence.
block/chain.rs canonical schedule is clean; tests cover the spec boundary at 1780631999 vs 1780632000.

Feature (dump-fixture + bench/fill) — design is sound.

Two-phase fidelity gate (on-chain receipt + isolated re-execution cross-check) correctly prevents a misconfigured spec from producing a silently wrong fixture.
Pre-state closure (build_pre_state): called before commit_transaction_outcome, so db_ref() still reflects the pre-target-TX cache; zero-value slot omission is correct (unset storage defaults to zero, and within-TX warmup is replayed identically).
AHashBucketHasher is correctly threaded through external_envs_for so bucket IDs in the fixture match those recorded during replay.

CI workflow (replay-bench.yml) — security design is correct. The bench job runs untrusted code read-only; prepare/compare hold write scope but never check out PR code. The issue_comment trigger is properly gated on MEMBER/COLLABORATOR/OWNER association.

One minor inline note on SpecName::to_spec_id() — the new Equivalence variant falls through the _ => MegaSpecId::EQUIVALENCE catch-all. Works correctly today but could silently absorb a future MegaETH variant. Suggested making it explicit.

Otherwise LGTM.

Acting on reviewer feedback on #306: - replay --dump-fixture: reject transactions that read block hashes (BLOCKHASH). state-test does not seed env.block_hashes, so an isolated re-execution would read default hashes; a BLOCKHASH-dependent tx whose hash only affects storage/logs (not gas/status) would otherwise self-validate against the wrong roots. Guard via the executor's get_accessed_block_hashes() — the corpus has none, so no behavior change for existing fixtures. (codex P2) - replay-bench.yml: the comment-reaction endpoint needs Issues write — add issues: write to the prepare job and wrap the reaction non-fatally so it can never block the benchmark even if the permission is missing. (codex P2) - fixture.rs: correct the fidelity-gate comment — gas does not by itself imply log fidelity (same-length, different-content logs cost the same); the real pin is the logs root recorded by finalize_and_write and validated by the corpus. (claude) - AGENTS.md: document block/chain.rs as the source of truth for per-chain hardfork schedules under Core Source Layout. (claude doc-impact) Not changed: receipt-log comparison in the gate (deliberately gas+status only; cannot diverge for a deterministic replay) and an explicit Equivalence match arm (the suggestion keeps the catch-all, so it does not add exhaustiveness).

claude · 2026-06-09T09:15:22Z

LGTM. Prior reviews covered all the material areas; no new issues found on the current head. The Rex5 activation fix, deploy registry refactor, and fixture workflow all look correct.

RealiCZ · 2026-06-09T09:16:50Z

@codex re: your P2 "Check receipt logs before accepting fixtures" (fixture.rs) — not adding it. The gate stays gas + status; logs are already pinned by logs_root (recorded in finalize_and_write, re-validated by replay_corpus.rs every PR), and a deterministic replay can't diverge in logs. Closing as intentional.

chatgpt-codex-connector · 2026-06-09T09:19:03Z

Summary

Acknowledged the trigger: I did not add a receipt-log comparison gate.
The current fixture code already documents the intended design: the on-chain fidelity gate stays limited to gas and success status, while logs are not compared directly against the receipt because deterministic replay with identical pre-state and transaction cannot emit different logs; instead, finalize_and_write re-executes the isolated fixture and records the resulting logs_root into post for corpus validation. bin/mega-evme/src/replay/fixture.rsL120-L130
The isolated fixture finalization path records executed.logs_root into the dumped post expectation, which is the root that replay-corpus validation later checks. bin/mega-evme/src/replay/fixture.rsL174-L205
No code changes were needed, so I did not create a commit or PR.

Testing / Checks

✅ pwd && rg --files -g 'AGENTS.md' -g '!target' -g '!**/.git/**' | sort && git status --short --branch
✅ cat bin/mega-evme/AGENTS.md && nl -ba bin/mega-evme/src/replay/fixture.rs | sed -n '260,340p'
✅ rg -n "finalize_and_write|logs_root|megaStatus|receipt|gas" bin/mega-evme/src/replay/fixture.rs
✅ nl -ba bin/mega-evme/src/replay/fixture.rs | sed -n '1,220p'

View task →

flyq

Stale docs: system/AGENTS.md still describes per-contract hook calls in the executor (line 49) and omits deploy.rs; the BLOCKHASH dump rejection is missing from replay.md's incompatibility list (line 112).

Acting on reviewer feedback on #306 (flyq + claude): - block/chain.rs: testnet (6343) Rex5 was ForkCondition::Never, contradicting the published schedule (docs/spec/upgrades/overview.md: 1780459200) and the live testnet, so every testnet replay after Jun 3 silently ran under Rex4. Set Rex5 to Timestamp(1780459200) with the on-chain SequencerRegistryConfig seed, and fix the test that codified the wrong behavior. - replay --dump-fixture: the fidelity gate now compares the replay's logs root against the on-chain receipt (not just gas + status). LOG gas depends on topic count and data length, never content, so identical gas can hide different log payloads; finalize_and_write additionally cross-checks the isolated run's logs root against the replay, closing the L1-data-fee channel where the isolated run (which zeroes the L1 fee) could diverge while gas/status/output still match. - replay --dump-fixture: reject EIP-7702 (type-4) transactions explicitly rather than silently dropping the authorization list. - replay-bench.yml: add persist-credentials: false to the untrusted bench job's checkout, and have the write-privileged compare job re-render the table from the bench JSON (escaping the PR-controlled case names) instead of posting the PR's markdown verbatim. - system/deploy.rs: debug_assert that a SystemContractSpec's code_hash matches its code, restoring the invariant the per-contract deploy functions enforced.

…ixture

claude · 2026-06-10T07:55:43Z

LGTM on the code. No new correctness issues on the current head.

The two documentation gaps @flyq raised are still open and should be addressed before merge:

system/AGENTS.md line 49 — "WHERE TO LOOK" still says to add a hook call in block/executor.rs::pre_execution_changes; with the deploy registry refactor the correct path is to add a *_spec() builder and register it in flat_system_contract_specs() in deploy.rs.
docs/mega-evme/commands/replay.md line 113 — The --dump-fixture incompatibility list (transaction overrides, --override.spec, deposit transactions) is missing the BLOCKHASH-access rejection implemented in fixture.rs:488-500.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2b1f83d6af

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

…test CLI

…replay-bench trigger, docs)

…ture doc accuracy)

claude · 2026-06-10T09:04:13Z

LGTM. All previously raised issues are addressed on the current head: the three documentation gaps (@flyq's stale system/AGENTS.md, missing chain.rs in block/AGENTS.md, and the BLOCKHASH rejection missing from replay.md) are all fixed. No new issues found.

claude · 2026-06-10T09:07:52Z

LGTM. All prior review threads have been addressed (Rex5 activation values, deploy registry refactor, documentation gaps in system/AGENTS.md and replay.md). No new issues on the current head.

claude · 2026-06-11T02:33:06Z

Documentation Impact

This PR adds crates/mega-evm/src/block/chain.rs as the canonical source of per-chain hardfork activation timestamps. The spec docs and tool docs are already correct, but the skill mapping tables need a new entry to cover this file.

Spec Documentation

No gaps. docs/spec/upgrades/overview.md already has the Rex5 mainnet (1780632000) and testnet (1780459200) activation timestamps, docs/mega-evme/commands/replay.md is updated in this PR with the --dump-fixture feature and the BLOCKHASH rejection, and the "currently equivalent to Rex5" sentence is corrected.

Agent / Skill Files

File	Reason
`.claude/skills/doc-impact-check/SKILL.md`	The code-to-doc mapping covers `crates/mega-evm/src/block/` but maps it only to `docs/spec/hardfork-spec.md` and `docs/spec/evm/overview.md`, not `docs/spec/upgrades/overview.md`. A future PR that only changes an activation timestamp in `chain.rs` (e.g. promoting Rex5 mainnet from `Never` to `Timestamp(1780632000)` as this PR does) will not be flagged as needing an `upgrades/overview.md` update. Add a dedicated row: `crates/mega-evm/src/block/chain.rs` — per-chain activation timestamps — `docs/spec/upgrades/overview.md`.
`.claude/skills/doc-freshness/SKILL.md`	Same gap in its code-to-doc mapping table — `block/` maps to `hardfork-spec.md` and `evm/overview.md` but not `upgrades/overview.md`. Add the same row.

These updates can be included in this PR or in a follow-up.

…re-review mega-state-test is now a published library, so CLI-era failure modes became public-API hazards. Library: no panic on unreadable suite files, unknown fork names (to_spec_id returns Result), or out-of-range nonce/fee fixture values; run() returns TestsFailed instead of process::exit(1) (CLI still exits 1); secret keys redacted from all error output and tx-build errors keep their real cause; expectException only accepts transactions invalid by construction, not structural fixture defects; expected-output validation fails when execution produced no output; currentChainID overflow is an error instead of silently running under 6342; env.blockHashes are injected into execution; --fill preserves unknown unit fields and rejects SpecName::Unknown for --bench-spec. mega-evme replay: fixture writes are tmp+rename atomic; the RPC capture file persists even when execution or the dump gate fails; the receipt anchor must belong to the replayed block (reorg guard); halted targets are anchored by exact halt reason, not just the coarse status; legacy gas-price absence is an error instead of a silent 0.

claude · 2026-06-11T03:13:48Z

LGTM. All prior review threads are resolved, flyq has approved, and no new issues on the current head.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 874de8902f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

RealiCZ added 20 commits June 9, 2026 14:48

RealiCZ requested review from Troublor, flyq and vincent-k2026 as code owners June 9, 2026 08:31

github-advanced-security AI found potential problems Jun 9, 2026

View reviewed changes

Comment thread .github/workflows/replay-bench.yml Fixed

Comment thread .github/workflows/replay-bench.yml Fixed

Comment thread .github/workflows/replay-bench.yml Fixed

chatgpt-codex-connector Bot reviewed Jun 9, 2026

View reviewed changes

Comment thread bin/mega-evme/src/replay/fixture.rs

Comment thread bin/mega-evme/src/replay/fixture.rs Outdated

Comment thread .github/workflows/replay-bench.yml Outdated

claude Bot reviewed Jun 9, 2026

View reviewed changes

Comment thread bin/mega-evme/src/replay/fixture.rs Outdated

claude Bot reviewed Jun 9, 2026

View reviewed changes

Comment thread crates/mega-state-test/src/types/spec.rs Outdated

flyq reviewed Jun 10, 2026

View reviewed changes

RealiCZ added 2 commits June 10, 2026 15:23

Merge remote-tracking branch 'origin/main' into cz/feat/replay-dump-f…

2b1f83d

…ixture

chatgpt-codex-connector Bot reviewed Jun 10, 2026

View reviewed changes

Comment thread bin/mega-evme/src/replay/cmd.rs

Comment thread crates/mega-evm/src/block/chain.rs Outdated

Comment thread .github/workflows/replay-bench.yml Outdated

RealiCZ added 4 commits June 10, 2026 16:15

refactor: extract published mega-state-test library crate from state-…

4daeb3f

…test CLI

fix: address PR review (per-tx BLOCKHASH gate, fallback Rex5 params, …

3b931b1

…replay-bench trigger, docs)

fix: address full-diff review (CI comment injection, fill safety, fix…

138a9ac

…ture doc accuracy)

fix(state-test): list valid spec names in --bench-spec error

06a6387

github-advanced-security AI found potential problems Jun 10, 2026

View reviewed changes

Comment thread crates/mega-evm/tests/block_executor/accessed_block_hashes.rs Dismissed

Comment thread crates/mega-evm/tests/block_executor/accessed_block_hashes.rs Dismissed

Comment thread crates/mega-evm/tests/block_executor/accessed_block_hashes.rs Dismissed

style: prettier-format AGENTS.md workspace table

105a079

flyq approved these changes Jun 10, 2026

View reviewed changes

Merge branch 'main' into cz/feat/replay-dump-fixture

1a517f6

chatgpt-codex-connector Bot reviewed Jun 11, 2026

View reviewed changes

Comment thread bin/mega-evme/src/replay/fixture.rs

Comment thread crates/mega-state-test/src/runner.rs

Comment thread crates/mega-state-test/src/runner.rs

Conversation

RealiCZ commented Jun 9, 2026

Summary

Foundation (consensus-sensitive — please review closely)

Feature

Testing

Notes / follow-ups

Uh oh!

Uh oh!

Uh oh!

Uh oh!

codecov Bot commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

claude Bot commented Jun 9, 2026

Review

Rex5 mainnet activation

Deploy registry refactoring

Fidelity gate

state-test as a production dependency

Everything else

Uh oh!

Uh oh!

Uh oh!

claude Bot commented Jun 9, 2026

Review

Uh oh!

claude Bot commented Jun 9, 2026

Uh oh!

RealiCZ commented Jun 9, 2026

Uh oh!

chatgpt-codex-connector Bot commented Jun 9, 2026

Summary

Uh oh!

flyq left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

claude Bot commented Jun 10, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

claude Bot commented Jun 10, 2026

Uh oh!

claude Bot commented Jun 10, 2026

Uh oh!

claude Bot commented Jun 11, 2026

Documentation Impact

Spec Documentation

Agent / Skill Files

Uh oh!

claude Bot commented Jun 11, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Uh oh!

codecov Bot commented Jun 9, 2026 •

edited

Loading

`state-test` as a production dependency