Skip to content

feat(replay): self-validating fixtures + throughput benchmark (Rex5 mainnet, deploy registry)#306

Open
RealiCZ wants to merge 31 commits into
mainfrom
cz/feat/replay-dump-fixture
Open

feat(replay): self-validating fixtures + throughput benchmark (Rex5 mainnet, deploy registry)#306
RealiCZ wants to merge 31 commits into
mainfrom
cz/feat/replay-dump-fixture

Conversation

@RealiCZ

@RealiCZ RealiCZ commented Jun 9, 2026

Copy link
Copy Markdown
Collaborator

Summary

Adds a self-validating replay-fixture workflow and a replay-throughput benchmark to mega-evme, plus the foundation to make them correct: Rex5 mainnet activation, a registry-driven system-contract deploy refactor, and a unified hardfork schedule.

mega-evme replay --dump-fixture <FILE> turns one replayed mainnet transaction into a self-contained EEST state-test fixture (pre-state + tx + env + megaEnv + post roots), written only if the local replay reproduces the on-chain receipt's gas and status. A single committed corpus of such fixtures is used two ways — validated for correctness on every PR, and timed for throughput (base-vs-PR) on demand — making it the real-transaction counterpart to the synthetic Criterion suite.

Foundation (consensus-sensitive — please review closely)

  • Rex5 mainnet activation — the mainnet hardfork config had Rex5 as Never, so replaying post-Rex5 mainnet blocks ran under Rex4 and produced wrong gas. Set Rex5 to Timestamp(1780632000) with the mainnet SequencerRegistryConfig. This is the only behavior change: it fixes when Rex5 activates on mainnet; no spec's semantics change (all specs stay frozen).
  • Unified hardfork schedule (block/chain.rs) — one source for the mainnet (4326) / testnet (6343) activation timestamps and hardfork_schedule.
  • Registry-driven system-contract deploy (system/deploy.rs) — a declarative SystemContractSpec + a shared transact_deploy replace five near-identical deploy functions; each contract exposes one *_spec builder as the single source of its gate / bytecode-version / force-create logic. Behavior-preserving: deploy order and per-spec semantics are byte-identical, and SequencerRegistry stays a separate seeded deploy.

Feature

  • mega-evme replay --dump-fixture — dump a self-validating EEST TestUnit. An on-chain fidelity gate (gas + success status) rejects the dump unless the replay reproduces the receipt, so a wrong spec/config can't slip through. Fixtures are byte-reproducible (ordered pre / storage / megaEnv) and reject transaction overrides and --override.spec.
  • state-test gains two modes--bench times a fixture's isolated EVM transact (min/median/mean + Mgas/s, JSON) and is the single throughput-benchmark surface; --fill computes and writes a fixture's post in place (the offline analog of the dump's post-fill, for fixtures with no on-chain origin).
  • One corpus, two uses (bench/replay/fixtures/) — five self-contained fixtures (system-contract call, plain call, log-heavy, large DeFi, and the 142M-gas attack deploy from feat(bench): add attack_replay benchmark #299): validated per-PR by replay_corpus.rs and timed by an ABBA base-vs-PR driver (run.py) in a new replay-bench.yml workflow.
  • Criterion CI — add the existing attack_replay (feat(bench): add attack_replay benchmark #299) bench to the compared targets so its limit-tracker regression target is actually observed.
  • state-test schema/runnerpre and account storage are now BTreeMap (byte-stable fixtures); a symmetric to serializer so CREATE transactions round-trip (None""); added megaEnv and megaGasUsed/megaStatus expectations; the production SALT hasher moved into mega-evm behind a test-utils feature so fixtures replay with mainnet bucket pricing. mega-t8n updated for the storage type.

Testing

  • mega-evm / state-test / mega-t8n suites pass; clippy, fmt, cargo sort, prettier, and the no_std/riscv check are clean.
  • New tests: dump round-trip + tamper detection, per-PR corpus self-validation, end-to-end --dump-fixture CLI write+validate, byte-reproducibility, the dump override/spec guards, and state-test --bench / --fill of a fixture.
  • Verified end-to-end against mainnet RPC: replay gas/status match on-chain receipts; dumped fixtures self-validate offline; the attack fixture reproduces the source's exact 141,927,106 gas under Rex5.

Notes / follow-ups

  • The replay benchmark complements (does not replace) the Criterion suite; sub-10µs cases are noisy on shared runners, so read the larger cases and the overall trend.
  • More corpus cases can be added incrementally — each is one self-contained JSON plus a manifest line (a mined tx via --dump-fixture, a non-on-chain case via --fill).

RealiCZ added 20 commits June 9, 2026 14:48
…hput

Add `mega-evme replay --dump-fixture <FILE>` to produce a self-validating
state-test fixture in a single command, and `--bench-runs`/`--bench-warmup`
to measure EVM throughput (Mgas/s) in isolation.

Unify replay fixtures onto the state-test TestUnit schema: optional megaEnv
carrier (SALT bucket capacities + oracle storage), explicit megaGasUsed/
megaStatus expectations, and SpecName Rex variants. Promote the SALT bucket
hasher into mega-evm (test-utils) so replay and state-test share it.
mega-evme replay capped mainnet at Rex4 (Rex5 was ForkCondition::Never),
so current mainnet transactions replayed under the wrong spec and produced
gas that diverged from on-chain receipts. Activate Rex5 at the mainnet
timestamp (1780632000) and attach the SequencerRegistryConfig required by
the Rex5 block executor; the seeded sequencer/admin are the on-chain
SequencerRegistry values (the live system address is read from forked
state at replay time).
A dumped fixture is validated under the same spec it was dumped with, so
self-validation cannot detect a wrong spec / hardfork config. Before building
a fixture, fetch the transaction's on-chain receipt and require the local
replay to reproduce its gasUsed exactly; abort the dump (no file written) on
mismatch. This makes --dump-fixture / --bench-runs faithful by construction
and would catch a misconfigured hardfork schedule loudly.
…d primitive

Every predeploy was deployed by the same 5-step state patch, duplicated across
six transact_deploy_* functions and re-implemented again in mega-evme. Extract
a declarative SystemContractSpec + one transact_deploy primitive that all
wrappers delegate to, and a canonical flat_system_contract_specs registry that
both the block executor and mega-evme iterate instead of hardcoding the
per-contract gate and bytecode selection. The SequencerRegistry keeps its
config-seeded, guarded wrapper but shares the same primitive.

Also moves the per-chain hardfork schedules (real mainnet/testnet timestamps +
SequencerRegistryConfig) from the mega-evme bin into mega-evm
(block::chain::hardfork_schedule) as a single exportable source of truth; the
replay command now delegates to it.

Behavior is byte-identical for all frozen specs (969 mega-evm tests pass;
replay gas still matches on-chain receipts exactly).
Commit three self-validating EEST fixtures dumped from real MegaETH mainnet
transactions (mega-evme replay --dump-fixture; gas verified against on-chain
receipts at dump time): a plain call, an Oracle system-contract call, and a
large 726k-gas call.

Add the replay_corpus integration test (runs under cargo test --workspace) and
a dedicated Replay Fixtures workflow that re-executes the corpus through
state-test offline — a deterministic per-PR guard that fails if a code change
alters gas/status/post-state for the covered spec. Rex3/Rex4 coverage to follow
once an archive RPC is available (the current endpoint prunes historical state).
The per-spec max-blobs config was duplicated verbatim between execute_test_suite
and run_unit_once (the dump/timing path). Extract a single configure_max_blobs
helper both call, so the validation and dump paths cannot silently drift.
…_all_activated

Address review: the deploy registry duplicated each contract's gate and
bytecode-version selection (notably the Oracle v1.0.0/v1.1.0/v2.0.0 +
force-create logic), and after the registry refactor the per-contract
transact_deploy_* wrappers became test-only — so the substantial deploy tests
exercised the wrapper copy, not the production registry path.

Extract one *_spec(hardforks, ts) -> Option<SystemContractSpec> builder per
contract as the single source; the wrappers delegate to it and
flat_system_contract_specs composes the builders. The existing deploy tests now
cover the production path, and the version/gate logic lives in one place.

Also reuse MegaHardforkConfig::with_all_activated() in all_activated_hardforks()
instead of re-listing all nine forks.
…ia ReplayObservation

Address review of the fixture/replay path:
- Reject --dump-fixture/--bench-runs + tx overrides at the start of run(),
  before forking state, fetching the receipt, and replaying preceding txs.
- Make state-test's execution_status public and derive the fixture's gas,
  status, and output from a single ExecutionResult (new ReplayObservation),
  so the dumped status string can no longer drift from the validation side.
  Drops build_draft's allow(too_many_arguments).
- Fold the dump megaEnv snapshot and on-chain receipt gas into one Option, so
  the two are computed together (no implicit "both Some" invariant / expect).
- Guard run_bench against zero runs.
…offline

Previously --json printed the replay summary as one JSON object and the
benchmark stats as a second — breaking any consumer that parses stdout as a
single value. Compute the benchmark before rendering and fold it into the
summary object under a 'bench' field (text mode still appends a block).

Add offline integration tests (committed RPC capture incl. receipt) asserting
the output is a single JSON document with the bench field, and that
--bench-runs/--dump-fixture reject transaction overrides.
…ec; test dump CLI

Address review of the fixture fidelity model:
- Strengthen the on-chain fidelity gate from gas-only to gas + success status.
  A replay with coincidentally-matching gas but a different status no longer
  produces a self-consistent fixture. Logs are not compared separately: LOG
  opcodes are metered, so differing logs would already change gas.
- Reject --override.spec with --dump-fixture/--bench-runs (alongside the existing
  transaction-override rejection): a forced spec would record a what-if, not the
  on-chain transaction.
- Add offline integration tests for the end-to-end --dump-fixture CLI write path
  (validated through the state-test runner) and the --override.spec rejection.
- Update help text and docs to state the gate covers gas and status, and the
  spec-override incompatibility.
…age/megaEnv)

A dumped fixture's pre-state map, per-account storage, and megaEnv bucket/oracle
lists came from hash-map / unordered iteration, so two dumps of the same
transaction produced byte-different files (an online dump never matched its
offline re-dump). Validation is content-based so this was only cosmetic, but it
made the committed corpus non-reproducible and re-dumps a giant noisy diff.

- state-test: TestUnit.pre and AccountInfo.storage are now BTreeMap, so they
  serialize in a deterministic (address / slot) order. All consumers insert into
  a DB or look up by key, so ordering is irrelevant to execution.
- mega-evme: build_pre_state builds a BTreeMap; the dump sorts the megaEnv
  bucket-capacity and oracle-storage lists.
- Re-normalized the three committed Rex5 corpus fixtures (re-dumped; state root,
  gas, and status unchanged — only key order).

Verified: an online dump and two offline re-dumps of the same tx are now
byte-identical, and the corpus still self-validates.
…orpus

Adds the real-transaction counterpart to the synthetic Criterion suite: replay a
corpus of characteristic mainnet transactions offline and compare PR throughput
against the merge-base, so a change that speeds up or slows down real execution
is visible on the PR.

- bench/replay/captures/: four committed offline RPC captures (system-contract
  call, plain medium call, log-heavy, large multi-call DeFi) spanning ~2k–20k
  Mgas/s, each carrying its on-chain receipt.
- bench/replay/manifest.json: corpus description (tx, expected_gas, category).
- bench/replay/run.py: driver that replays each case via 'replay --bench-runs
  --json' against one or two binaries, ABBA-interleaved to cancel machine drift,
  verifies replayed gas matches the corpus, and emits a markdown Δ% table + JSON
  with a regression exit code (no third-party deps).
- .github/workflows/replay-bench.yml: weekly / dispatch / member /benchmark-replay
  comment; builds PR + merge-base binaries, runs the driver, posts the table.
  Feature-detects --bench-runs on the baseline and falls back to measure-only
  (the baseline that predates this feature cannot be compared).
- Docs: bench/replay/README.md + a pointer from the replay command docs.

Verified locally end-to-end: measure mode, base-vs-PR compare (same binary reads
as noise, 0 regressions), and the --fail-on-regression exit path.
The attack_replay bench (#299) replays a real mainnet attack contract deployment
and is the hermetic regression target for limit-tracker / hot-path work (the 30×
EQUIVALENCE→MINI_REX gap from AdditionalLimit accounting, e.g. caching net_usage
in FrameLimitTracker). It existed as a bench target but was never in the
base-vs-PR comparison set, so regressions on it went unobserved. Add it to
BENCH_TARGETS so the Criterion comparison covers it.

This is the right home for it: it is a prestate-snapshot replay of a *simulated*
CREATE (no mined tx / receipt; its block is pruned), so it cannot be a
replay-bench corpus case, and its cross-spec + vanilla-revm diagnostic is one the
replay track (production spec, mega-vs-mega) cannot reproduce.
…e cases

The replay-bench corpus could only hold real mined transactions (RPC captures).
Attack/edge cases like attack_replay (#299) come from a prestateTracer snapshot,
not a mined tx, and their block is pruned — so they need an archive node to
capture, or cannot be captured at all. But they are self-contained (prestate +
tx + env), so they need no network to *replay*.

Add a second corpus input type so any conformant fixture JSON can be benchmarked:
- state-test: 'state-test --bench' times a fixture's isolated EVM execution
  (reusing time_unit_execution) and emits the same {gas_used, success, bench{…}}
  shape as 'mega-evme replay --bench-runs', so the driver parses both identically.
  Adds bench_test_suite + UnitBench; SpecName is now Copy.
- driver: a manifest case may be 'type: fixture' (state-test) or 'type: capture'
  (mega-evme, default); the driver resolves state-test next to mega-evme.
- corpus: fixtures/attack_deploy.json — the #299 mainnet attack deploy converted
  from its prestate snapshot to a fixture. Reproduces the source's exact
  141,927,106 gas under Rex5, so this non-on-chain, 142M-gas limit-tracker stress
  case is now tracked for regressions with no archive node.

No archive node needed: a conformant JSON is enough.
…ures

A dumped fixture already carries its prestate, so it is self-contained — and
mega-evme replay --bench-runs and state-test --bench time the same execution
(both call time_unit_execution on the unit). The capture cases were therefore
redundant: the only difference was building the unit from an RPC cache vs loading
it from a committed JSON.

Make the corpus uniform — one format (state-test fixture), one tool (state-test
--bench):
- Dump the four mined-tx cases to self-contained fixtures (smaller than the
  captures they replace) and drop bench/replay/captures/.
- manifest: all cases are now type: fixture.
- driver/CI: --bin is a build directory holding mega-evme + state-test; CI builds
  both per version and feature-detects state-test --bench (measure-only fallback
  when the baseline predates it).
- README: document the self-contained TestUnit as the unit of the benchmark.

The 'capture' input type stays in the driver for ad-hoc use, but the committed
corpus is fixtures only: a characteristic workload = one conformant JSON, with
real mined txs and non-on-chain cases (attack snapshots) on the same path.
Review pass over the branch surfaced patch-cruft to remove:
- bench/replay driver + CI: drop the dead 'capture' input path. The committed
  corpus is fixtures-only and ad-hoc single-tx benching is 'mega-evme replay
  --bench-runs' directly, so the driver is now state-test-only (simpler
  resolve, no cross-consumer JSON-shape drift). CI builds only state-test.
- run.py: fix the stale module docstring (it described RPC captures).
- state-test: remove the unused UnitBench::spec field.
- mega-evme: remove the dead ReplayOutcome::original_tx field and the
  struct-wide #[allow(dead_code)] it was masking.
- state-test: comment configure_max_blobs to record that OSAKA's 6 is distinct
  from the pre-PRAGUE 6 (so the OSAKA arm is not 'simplified' away).
The two fixture sets overlapped: crates/state-test/tests/fixtures/replay/rex5_*
(correctness, validated per-PR) and bench/replay/fixtures/* (performance, timed)
were the same EEST format with near-duplicate Rex5 mainnet workloads, and the
correctness set was even validated twice (replay_corpus.rs in 'cargo test' AND a
dedicated replay-fixtures.yml workflow doing the same thing).

Collapse to a single corpus under bench/replay/fixtures/, used both ways:
- replay_corpus.rs now globs bench/replay/fixtures/ and validates each fixture's
  recorded post-state every PR (the four dumped mined-tx fixtures carry post;
  attack_deploy is bench-only with empty post, naturally skipped).
- run.py keeps timing the same corpus on demand.
- Delete the duplicate crates/state-test/tests/fixtures/replay/ (rex5_* + README)
  and the redundant replay-fixtures.yml (cargo test already covers it).
- Repoint bench_fixture.rs; update the README to document the one-corpus model.

A fixture added for either purpose now serves both.
…orpus)

attack_deploy was bench-only (empty post) because it was hand-converted from a
prestateTracer snapshot and never had its post-state roots computed. Add the
missing piece so the corpus is fully uniform — every fixture validated + benched.

- state-test --fill: compute and write each fixture's post in place, reusing the
  same execute_unit_collect + Test::for_dump the dump path uses. The offline
  analog of --dump-fixture's post-fill, for fixtures with no on-chain origin.
- Fix a latent serialization bug: TransactionParts.to deserialized "" -> None
  (CREATE) but serialized None -> null, which the deserializer then rejects — so
  re-serializing any CREATE (via --fill or --dump-fixture) produced an unreadable
  fixture. Add a symmetric serialize_maybe_empty (None -> "").
- Fill attack_deploy's post under Rex5 (state root + logs root + 141,927,106 gas
  + success); replay_corpus now validates it like the others (5/5).
- Update manifest/README/test docs: attack_deploy is no longer bench-only.

Only attack_deploy.json changes; the four CALL fixtures are byte-identical (Some
address serialization is unchanged).
The crate README was a stub describing only the revme fork; the new --bench and
--fill CLI modes were documented only in --help and bench/replay/README. Add a
Modes section to the state-test README covering all three with their usage
scenarios, so --fill's purpose (offline post-fill for non-on-chain fixtures) is
discoverable from the tool's own docs.
…face

Benchmarking was implemented twice — mega-evme replay --bench-runs and
state-test --bench — both wrapping the same time_unit_execution and emitting the
same JSON, with their own stats struct, JSON serializer, and timing loop. After
the corpus unified on self-contained fixtures, the replay-bench driver only used
state-test --bench, leaving --bench-runs a redundant convenience.

Collapse benchmarking to state-test --bench:
- mega-evme: drop --bench-runs / --bench-warmup, BenchStats, run_bench, the
  cmd.rs bench JSON/text rendering, and the bench branch of output_results.
  replay's job is now replay + --dump-fixture only.
- The up-front override/spec guards now apply to --dump-fixture (unchanged
  behavior; --bench-runs no longer exists).
- Tests: rename replay_bench.rs -> replay_dump.rs (it only tests --dump-fixture
  now); drop the two bench-output tests; repoint the tx-override guard test to
  --dump-fixture. Rename the capture replay_offline_bench.cache.json ->
  replay_offline.cache.json.
- Docs: scrub mega-evme --bench-runs from replay.md, both READMEs, and code
  comments; point throughput benchmarking at 'dump + state-test --bench'.

Ad-hoc single-tx benching is now: replay --dump-fixture then state-test --bench.
Comment thread .github/workflows/replay-bench.yml Fixed
Comment thread .github/workflows/replay-bench.yml Fixed
Comment thread .github/workflows/replay-bench.yml Fixed
@RealiCZ RealiCZ added comp:mega-evme Changes to the `mega-evme` tool comp:core Changes to the `mega-evm` core crate spec:unchanged No change to any `mega-evm`'s behavior comp:misc Changes to the miscellaneous part of this repo api:unchanged No change to the public interface or API dependencies Pull requests that update a dependency file labels Jun 9, 2026
@codecov

codecov Bot commented Jun 9, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.8%. Comparing base (e1f4f68) to head (874de89).

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

…orkflow

CodeQL flagged the workflow (Untrusted Checkout TOCTOU): a single job held a
pull-requests: write token AND did gh pr checkout + built/ran the PR's
(untrusted, possibly fork) code, so a malicious /benchmark-replay PR could
exfiltrate the write-scoped token.

Adopt the same hardened split benchmark.yml uses (#305):
- prepare (write token): resolves feature/baseline SHAs via the API only — no
  checkout, never materializes PR code — and reacts to the comment.
- bench (read-only: contents+pull-requests read): checks out, builds, and runs
  the PR code at pinned SHAs; no write-scoped token to abuse. Writes the report
  to the job summary and uploads it as an artifact.
- compare (write token): downloads the report artifact and posts the PR comment;
  never touches PR code.

Behavior is unchanged; only the trust boundary moved.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 58fc1e3e95

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread bin/mega-evme/src/replay/fixture.rs
Comment thread bin/mega-evme/src/replay/fixture.rs Outdated
Comment thread .github/workflows/replay-bench.yml Outdated
@claude

claude Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Review

Reviewed the consensus-sensitive changes, the deploy registry refactor, fidelity gate logic, and the tooling additions. No correctness bugs found.

Rex5 mainnet activation

The only consensus-sensitive change is chain.rs:33: ForkCondition::Never → Timestamp(1780632000) for mainnet Rex5, plus the SequencerRegistryConfig seeding addresses. The boundary tests in chain.rs correctly verify the spec transition at that timestamp. One ask: could you confirm that 1780632000, rex5_initial_sequencer, and rex5_initial_admin were read directly from the on-chain SequencerRegistry and the MegaETH published upgrade schedule? These are the only values I cannot verify locally, and they silently produce wrong execution if wrong.

Deploy registry refactoring

flat_system_contract_specs emits exactly the old per-contract deploy order (Oracle → HPTimestamp → KeylessDeploy → AccessControl → LimitControl), with the Oracle's force_create_on_upgrade flag set correctly: true pre-Rex5 (matches the old transact_deploy_oracle_contract that called mark_created() on upgrade), false on Rex5+ (preserves storage). SequencerRegistry is correctly excluded. Behavior-identical with the old imperative code.

Minor efficiency note on transact_deploy_sequencer_registry: it calls db.load_cache_account for the guard check, then crate::transact_deploy calls it again. The second call is a cache hit so it is benign, but the guard could return early on a code-hash match to avoid the fall-through.

Fidelity gate

The two-level design is sound: (1) on-chain receipt gas+status guards against a wrong hardfork config; (2) isolated state-test re-execution cross-checks gas+status+output to verify the captured pre-state is complete. The post logs root from the isolated run provides the hard log fidelity guarantee.

state-test as a production dependency

state-test is now in [dependencies] (not [dev-dependencies]) in mega-evme/Cargo.toml because finalize_and_write calls execute_unit_collect from production code. Logically correct — the dump-fixture workflow needs the runner at runtime. Since mega-evm [test-utils] was already a production dep of mega-evme (for the SALT hasher), this extends an existing pattern. Worth noting: anyone building a minimal mega-evme gets the full state-test runner in their binary. A dump-fixture Cargo feature could gate this if binary size ever becomes a concern, but not a blocker.

Everything else

  • No gas leakage pitfalls: transact_deploy operates at block pre-execution level, not inside any EVM call frame.
  • No stable-spec behavior changes: all spec-specific paths remain spec-gated.
  • BTreeMap for pre and storage: byte-reproducible fixtures confirmed by the new byte-reproducibility test.
  • CREATE to round-trip (None""): serialize_maybe_empty / deserialize_maybe_empty are symmetric.
  • no_std compliance: deploy.rs follows the alloc as std pattern correctly.
  • Workflow: cancel-in-progress: true is safe for an opt-in benchmark job.

Overall: logic is correct, tests are thorough, and the Rex5 activation fix is well-motivated. The one ask is confirmation of the three externally-sourced values above.

Comment thread bin/mega-evme/src/replay/fixture.rs Outdated
Comment thread crates/mega-state-test/src/types/spec.rs Outdated
@claude

claude Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Review

Consensus-sensitive changes (Rex5 activation + deploy refactor) — reviewed closely; no issues found.

  • Rex5 mainnet timestamp (1780632000) and SequencerRegistryConfig addresses match the on-chain registry values confirmed in the PR description.
  • Deploy refactor: behavior-preserving across all five contracts. Deploy order is identical, Oracle's force_create_on_upgrade pre/post-Rex5 split is correctly modelled in SystemContractSpec, and the SequencerRegistry's foreign-code-hash guard is preserved (the guard fires before transact_deploy is reached). The executor loop pushes the same outcomes as the old per-contract sequence.
  • block/chain.rs canonical schedule is clean; tests cover the spec boundary at 1780631999 vs 1780632000.

Feature (dump-fixture + bench/fill) — design is sound.

  • Two-phase fidelity gate (on-chain receipt + isolated re-execution cross-check) correctly prevents a misconfigured spec from producing a silently wrong fixture.
  • Pre-state closure (build_pre_state): called before commit_transaction_outcome, so db_ref() still reflects the pre-target-TX cache; zero-value slot omission is correct (unset storage defaults to zero, and within-TX warmup is replayed identically).
  • AHashBucketHasher is correctly threaded through external_envs_for so bucket IDs in the fixture match those recorded during replay.

CI workflow (replay-bench.yml) — security design is correct. The bench job runs untrusted code read-only; prepare/compare hold write scope but never check out PR code. The issue_comment trigger is properly gated on MEMBER/COLLABORATOR/OWNER association.

One minor inline note on SpecName::to_spec_id() — the new Equivalence variant falls through the _ => MegaSpecId::EQUIVALENCE catch-all. Works correctly today but could silently absorb a future MegaETH variant. Suggested making it explicit.

Otherwise LGTM.

Acting on reviewer feedback on #306:

- replay --dump-fixture: reject transactions that read block hashes (BLOCKHASH).
  state-test does not seed env.block_hashes, so an isolated re-execution would
  read default hashes; a BLOCKHASH-dependent tx whose hash only affects
  storage/logs (not gas/status) would otherwise self-validate against the wrong
  roots. Guard via the executor's get_accessed_block_hashes() — the corpus has
  none, so no behavior change for existing fixtures. (codex P2)

- replay-bench.yml: the comment-reaction endpoint needs Issues write — add
  issues: write to the prepare job and wrap the reaction non-fatally so it can
  never block the benchmark even if the permission is missing. (codex P2)

- fixture.rs: correct the fidelity-gate comment — gas does not by itself imply
  log fidelity (same-length, different-content logs cost the same); the real
  pin is the logs root recorded by finalize_and_write and validated by the
  corpus. (claude)

- AGENTS.md: document block/chain.rs as the source of truth for per-chain
  hardfork schedules under Core Source Layout. (claude doc-impact)

Not changed: receipt-log comparison in the gate (deliberately gas+status only;
cannot diverge for a deterministic replay) and an explicit Equivalence match arm
(the suggestion keeps the catch-all, so it does not add exhaustiveness).
@claude

claude Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

LGTM. Prior reviews covered all the material areas; no new issues found on the current head. The Rex5 activation fix, deploy registry refactor, and fixture workflow all look correct.

@RealiCZ

RealiCZ commented Jun 9, 2026

Copy link
Copy Markdown
Collaborator Author

@codex re: your P2 "Check receipt logs before accepting fixtures" (fixture.rs) — not adding it. The gate stays gas + status; logs are already pinned by logs_root (recorded in finalize_and_write, re-validated by replay_corpus.rs every PR), and a deterministic replay can't diverge in logs. Closing as intentional.

@chatgpt-codex-connector

Copy link
Copy Markdown

Summary

  • Acknowledged the trigger: I did not add a receipt-log comparison gate.
  • The current fixture code already documents the intended design: the on-chain fidelity gate stays limited to gas and success status, while logs are not compared directly against the receipt because deterministic replay with identical pre-state and transaction cannot emit different logs; instead, finalize_and_write re-executes the isolated fixture and records the resulting logs_root into post for corpus validation. bin/mega-evme/src/replay/fixture.rsL120-L130
  • The isolated fixture finalization path records executed.logs_root into the dumped post expectation, which is the root that replay-corpus validation later checks. bin/mega-evme/src/replay/fixture.rsL174-L205
  • No code changes were needed, so I did not create a commit or PR.

Testing / Checks

  • pwd && rg --files -g 'AGENTS.md' -g '!target' -g '!**/.git/**' | sort && git status --short --branch
  • cat bin/mega-evme/AGENTS.md && nl -ba bin/mega-evme/src/replay/fixture.rs | sed -n '260,340p'
  • rg -n "finalize_and_write|logs_root|megaStatus|receipt|gas" bin/mega-evme/src/replay/fixture.rs
  • nl -ba bin/mega-evme/src/replay/fixture.rs | sed -n '1,220p'

View task →

@flyq flyq left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Stale docs: system/AGENTS.md still describes per-contract hook calls in the executor (line 49) and omits deploy.rs; the BLOCKHASH dump rejection is missing from replay.md's incompatibility list (line 112).

Comment thread crates/mega-evm/src/block/chain.rs Outdated
Comment thread bin/mega-evme/Cargo.toml Outdated
Comment thread bin/mega-evme/src/replay/fixture.rs
Comment thread .github/workflows/replay-bench.yml Outdated
Comment thread bin/mega-evme/src/replay/fixture.rs
Comment thread crates/mega-evm/src/system/deploy.rs
RealiCZ added 2 commits June 10, 2026 15:23
Acting on reviewer feedback on #306 (flyq + claude):

- block/chain.rs: testnet (6343) Rex5 was ForkCondition::Never, contradicting
  the published schedule (docs/spec/upgrades/overview.md: 1780459200) and the
  live testnet, so every testnet replay after Jun 3 silently ran under Rex4. Set
  Rex5 to Timestamp(1780459200) with the on-chain SequencerRegistryConfig seed,
  and fix the test that codified the wrong behavior.

- replay --dump-fixture: the fidelity gate now compares the replay's logs root
  against the on-chain receipt (not just gas + status). LOG gas depends on topic
  count and data length, never content, so identical gas can hide different log
  payloads; finalize_and_write additionally cross-checks the isolated run's logs
  root against the replay, closing the L1-data-fee channel where the isolated run
  (which zeroes the L1 fee) could diverge while gas/status/output still match.

- replay --dump-fixture: reject EIP-7702 (type-4) transactions explicitly rather
  than silently dropping the authorization list.

- replay-bench.yml: add persist-credentials: false to the untrusted bench job's
  checkout, and have the write-privileged compare job re-render the table from
  the bench JSON (escaping the PR-controlled case names) instead of posting the
  PR's markdown verbatim.

- system/deploy.rs: debug_assert that a SystemContractSpec's code_hash matches
  its code, restoring the invariant the per-contract deploy functions enforced.
@claude

claude Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

LGTM on the code. No new correctness issues on the current head.

The two documentation gaps @flyq raised are still open and should be addressed before merge:

  1. system/AGENTS.md line 49 — "WHERE TO LOOK" still says to add a hook call in block/executor.rs::pre_execution_changes; with the deploy registry refactor the correct path is to add a *_spec() builder and register it in flat_system_contract_specs() in deploy.rs.
  2. docs/mega-evme/commands/replay.md line 113 — The --dump-fixture incompatibility list (transaction overrides, --override.spec, deposit transactions) is missing the BLOCKHASH-access rejection implemented in fixture.rs:488-500.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2b1f83d6af

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread bin/mega-evme/src/replay/cmd.rs
Comment thread crates/mega-evm/src/block/chain.rs Outdated
Comment thread .github/workflows/replay-bench.yml Outdated
Comment thread crates/mega-evm/tests/block_executor/accessed_block_hashes.rs Dismissed
Comment thread crates/mega-evm/tests/block_executor/accessed_block_hashes.rs Dismissed
Comment thread crates/mega-evm/tests/block_executor/accessed_block_hashes.rs Dismissed
@claude

claude Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

LGTM. All previously raised issues are addressed on the current head: the three documentation gaps (@flyq's stale system/AGENTS.md, missing chain.rs in block/AGENTS.md, and the BLOCKHASH rejection missing from replay.md) are all fixed. No new issues found.

@claude

claude Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

LGTM. All prior review threads have been addressed (Rex5 activation values, deploy registry refactor, documentation gaps in system/AGENTS.md and replay.md). No new issues on the current head.

@claude

claude Bot commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Documentation Impact

This PR adds crates/mega-evm/src/block/chain.rs as the canonical source of per-chain hardfork activation timestamps. The spec docs and tool docs are already correct, but the skill mapping tables need a new entry to cover this file.

Spec Documentation

No gaps. docs/spec/upgrades/overview.md already has the Rex5 mainnet (1780632000) and testnet (1780459200) activation timestamps, docs/mega-evme/commands/replay.md is updated in this PR with the --dump-fixture feature and the BLOCKHASH rejection, and the "currently equivalent to Rex5" sentence is corrected.

Agent / Skill Files

File Reason
.claude/skills/doc-impact-check/SKILL.md The code-to-doc mapping covers crates/mega-evm/src/block/ but maps it only to docs/spec/hardfork-spec.md and docs/spec/evm/overview.md, not docs/spec/upgrades/overview.md. A future PR that only changes an activation timestamp in chain.rs (e.g. promoting Rex5 mainnet from Never to Timestamp(1780632000) as this PR does) will not be flagged as needing an upgrades/overview.md update. Add a dedicated row: crates/mega-evm/src/block/chain.rs — per-chain activation timestamps — docs/spec/upgrades/overview.md.
.claude/skills/doc-freshness/SKILL.md Same gap in its code-to-doc mapping table — block/ maps to hardfork-spec.md and evm/overview.md but not upgrades/overview.md. Add the same row.

These updates can be included in this PR or in a follow-up.

…re-review

mega-state-test is now a published library, so CLI-era failure modes became public-API hazards. Library: no panic on unreadable suite files, unknown fork names (to_spec_id returns Result), or out-of-range nonce/fee fixture values; run() returns TestsFailed instead of process::exit(1) (CLI still exits 1); secret keys redacted from all error output and tx-build errors keep their real cause; expectException only accepts transactions invalid by construction, not structural fixture defects; expected-output validation fails when execution produced no output; currentChainID overflow is an error instead of silently running under 6342; env.blockHashes are injected into execution; --fill preserves unknown unit fields and rejects SpecName::Unknown for --bench-spec.

mega-evme replay: fixture writes are tmp+rename atomic; the RPC capture file persists even when execution or the dump gate fails; the receipt anchor must belong to the replayed block (reorg guard); halted targets are anchored by exact halt reason, not just the coarse status; legacy gas-price absence is an error instead of a silent 0.
@claude

claude Bot commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

LGTM. All prior review threads are resolved, flyq has approved, and no new issues on the current head.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 874de8902f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread bin/mega-evme/src/replay/fixture.rs
Comment thread crates/mega-state-test/src/runner.rs
Comment thread crates/mega-state-test/src/runner.rs
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api:compatible Only new interface or API is introduced. Existing software is compatible. comp:core Changes to the `mega-evm` core crate comp:doc Changes in the documentation comp:mega-evme Changes to the `mega-evme` tool comp:misc Changes to the miscellaneous part of this repo dependencies Pull requests that update a dependency file rust Pull requests that update rust code spec:stable Touches stable spec code — must not change behavior

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants