chore: adopt powdr's staged APC build + split GenerateConfig/SelectConfig#2830
Open
georgwiese wants to merge 3 commits into
Open
chore: adopt powdr's staged APC build + split GenerateConfig/SelectConfig#2830georgwiese wants to merge 3 commits into
georgwiese wants to merge 3 commits into
Conversation
powdr-labs/powdr#3752 removes `PgoAdapter::filter_blocks_and_create_apcs_with_pgo` and exposes `detect_blocks` + `generate_apcs` + `select_apcs` instead, so the build cache survives changes to `autoprecompiles` / `skip`. Inline the three steps here. Behavior preserved by setting `apc_candidates = autoprecompiles + skip` when unset — same final selection as the old fused call. Requires bumping the powdr pin in Cargo.toml once #3752 lands. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
georgwiese
added a commit
to powdr-labs/powdr
that referenced
this pull request
Jun 2, 2026
PR #3763 changes the powdr autoprecompile API. The pre-existing pins target sp1 and openvm-eth commits that use the old API, so CI fails even though my refactor itself is correct. - sp1-benchmarks/Cargo.toml: bump sp1 dep branch from powdr-labs/generate-apcs-api → powdr-labs/split-powdr-config (succinctlabs/sp1#2830 — adopts the new powdr API). - .github/actions/patch-openvm-eth: bump ref from 05ed9cd3 (main) → e716b51a (powdr-labs/openvm-eth#8 tip, adopts the new powdr API). The "Verify openvm-eth ref is on main" check will (intentionally) fail until openvm-eth#8 merges — the comment in pr-tests-with-secrets.yml already accounts for this case, and the actual reth test runs against the compatible code.
987b65d to
676a63c
Compare
georgwiese
added a commit
to powdr-labs/powdr
that referenced
this pull request
Jun 3, 2026
The sp1 adoption (succinctlabs/sp1#2830, branch powdr-labs/split-powdr-config) was slimmed to the irreducible GenerateConfig/SelectConfig migration — the staged artifact cache was dropped (no sp1 consumer used it). This empty commit re-runs CI so sp1-benchmarks builds against the new sp1 tip 676a63c8. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
powdr#3763 replaces `PowdrConfig` with per-stage `GenerateConfig` + `SelectConfig` and renames the `PgoConfig` enum to `PgoData`. Adopt it: - `sp1_configs(apc, skip, pgo) -> (GenerateConfig, SelectConfig)` builds the pair and resolves the `apc_candidates` cap up front via `GenerateConfig::with_select_defaults` (which needs the PGO mode). The returned configs are complete, so there's a single, obvious owner of the cap-default policy and no separate step for callers to remember. - `CompiledProgram::new(elf, generate, select, pgo_data)` takes the halves separately and no longer applies any defaulting. - The free `select_apcs` collapses to `select_apcs(ranked, select)`. - Examples, the sdk test, the gpu perf bin, and the autoprecompiles CLI all get the matching shape. - Bump the powdr pin to `be183c60e`. Deliberately does NOT adopt powdr's staged artifact cache: every sp1 consumer builds APCs once per config via the uncached `CompiledProgram::new`, so only the dev CLI would have benefited. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
676a63c to
5448340
Compare
github-merge-queue Bot
pushed a commit
to powdr-labs/powdr
that referenced
this pull request
Jun 9, 2026
## Summary Building on #3750, this PR splits the `select-apcs` command into: - `generate-apcs`: Builds a ranked list of APCs - `select-apcs`: Select the exact APCs to be added to the VM. This step is very simple (`ranked.into_iter().skip(skip).take(num_apcs).collect()`) The main benefit of this is that the expensive `generate-apcs` artifact can be cached, meaning it only has to be computed once when sweeping over different number of APCs. In particular, for `--pgo cell` (the default!), we build all APCs anyway. With this PR, they are cached! This PR also refactors the caching a bit, moving it from `main.rs` into `powdr_openvm`, which means that `openvm-eth` (and I think also `powdr-wasm`) can benefit from it. Some parts of the `powdr_autoprecompiles` are also refactored; in particular, the monolithic `PowdrConfig` is split up into a config type per pipeline stage,, improving modularity. This PR requires two companion PRs which should be reviewed as part of this PR: - powdr-labs/openvm-eth#8 - succinctlabs/sp1#2830 --- ## Part 1 — the staged pipeline `select-apcs` used to build APC candidates *and* pick the top N in one shot. This PR splits those into two cached stages: | Command | Stages run | | --------------- | -------------------------------------------------------------- | | `generate-apcs` | Profile + build & rank APC candidates | | `select-apcs` | … + trim the ranking to `--autoprecompiles` (after `--skip`) | | `setup` | … + assemble the program with selected APCs | | `execute` | … + run the guest in interpreted mode | | `prove` | … + STARK proof, optionally with `--recursion` (compression) | `--pgo` and `--max-columns` shape the ranking, so they live on the generate stage; `--autoprecompiles` and `--skip` are the trim, so they live on select. A new `--apc-candidates <N>` caps how many candidates `generate-apcs` builds. **`--apc-candidates` resolution:** - `--pgo cell` always builds **every** eligible candidate (its density ranking needs each candidate's post-opt cost), so a user-set `--apc-candidates` is warned-about and ignored. Crucially, the cap is *not* a function of `--autoprecompiles`, so the generate hash is identical across a sweep. - `--pgo instruction|none`: standalone `generate-apcs` builds all by default; in a fused pipeline (`select-apcs` and beyond) the cap defaults to `--autoprecompiles + --skip`. Pin it explicitly to over-build once and sweep selection cheaply. **Cache hierarchy** (`generate → select → setup`): each stage hashes only its own args plus a fingerprint of the transpiled guest `VmExe`. A later-stage flag (`--input`, `--mock`, `--recursion`, `--metrics`) never invalidates an earlier-stage cache; editing the guest invalidates everything downstream of it. The cache is opt-in via the global `--artifacts-dir <DIR>`; with no dir, every stage runs inline. See [`cli-openvm-riscv/README.md`](cli-openvm-riscv/README.md) for the full surface and `--profile-input` vs `--input` semantics. --- ## Part 2 — the config split `PowdrConfig` bundled fields from every pipeline stage into one struct. That forced generate-stage code to share a hash space with select-stage flags — making it impossible for the library to safely hash the config as a cache key, which in turn forced `StagedPipeline` to expose a stringly-typed `cache_key: &impl Hash` argument that each external caller (CLI, openvm-eth, sp1) fingerprinted on its own. Three callers, three conventions — easy to drop a field and silently serve stale cache hits. This PR splits `PowdrConfig` along the natural pipeline boundary: - **`GenerateConfig`** (`autoprecompiles/src/lib.rs`) holds only the fields the generate stage reads (`apc_candidates`, `superblock_max_bb_count`, `apc_max_instructions`, `apc_exec_count_cutoff`, `apc_candidates_dir_path`, `should_use_optimistic_precompiles`, `degree_bound`). Derives `Hash`. - **`SelectConfig`** holds `autoprecompiles` + `skip`. Derives `Hash`. - **`PgoConfig`** holds `pgo_type`, `max_columns`, and the opaque `inputs` byte string (serialized stdin). Derives `Hash`, so it participates in both the generate and select/setup hashes. - **`degree_bound`** keeps being passed plain into `setup` (no new wrapper). - **`GenerateConfig::with_select_defaults(pgo, select)`** is the single policy definition for the `apc_candidates` cap default. Callers chain it in one line when they know both halves — `compile_exe`, the CLI shim, openvm-eth, sp1 all share it. - **`InstructionPgo`** / **`NonePgo`** short-circuit on `apc_candidates == Some(0)` (mirroring Cell), set by `with_select_defaults` when `select.autoprecompiles == 0`. The stale `autoprecompiles == 0` field check is gone. `StagedPipeline`'s API drops `cache_key`; callers supply only an `input_fp: &impl Hash` covering anything hidden behind the closures (PGO stdin contents, RPC chain id, …). The library hashes `(guest_hash, generate, pgo)` for generate and `(guest_hash, generate, pgo, select)` for select/setup. Generate's hash omits `select` by construction, so a Cell-PGO `--autoprecompiles N` sweep automatically reuses the candidate ranking — Cell's `with_select_defaults` returns `apc_candidates = None` regardless of selection size, so the generate hash is identical across the sweep. `compile_exe` becomes a thin wrapper over the staged generate/select/setup functions (no caching). `PowdrConfig` and `default_powdr_openvm_config` are deleted. --- ## Verifying the cache All commands below use the `setup` subcommand (runs `generate → select → setup` with no proving, so they're fast) and `RUST_LOG=info` to surface the cache log lines. `cached()` logs `cache hit: <stage>/<hash>` on a hit and runs the stage inline on a miss. The hashes are toolchain/guest-specific — what matters is *which* stage logs a hit. Substitute any guest (e.g. `guest-keccak`) for heavier, more realistic candidate counts. The `cargo run` invocation is spelled out in full on each line (rather than stashed in a `$BIN` variable) so the blocks paste cleanly into both bash and zsh. Because the cache is hierarchical and short-circuits at the **outermost** hit, an identical rerun logs only `cache hit: setup/<h>` (select and generate are never consulted), whereas a run that only changes selection logs `cache hit: generate/<h>` and re-runs select + setup. ### 1. Cell PGO: `--autoprecompiles` sweep reuses one generate entry ```sh rm -rf /tmp/powdr-cache # Cold: builds & ranks every eligible candidate. RUST_LOG=info cargo run -r --bin powdr_openvm_riscv -- --artifacts-dir /tmp/powdr-cache \ setup guest --profile-input 10 --autoprecompiles 2 --pgo cell # Different --autoprecompiles: generate is reused, only select + setup re-run. RUST_LOG=info cargo run -r --bin powdr_openvm_riscv -- --artifacts-dir /tmp/powdr-cache \ setup guest --profile-input 10 --autoprecompiles 10 --pgo cell ``` Run 1 logs `Generating autoprecompiles for all N blocks in parallel` (cold build). Run 2 logs: ``` INFO cache hit: generate/<h> <- ranking reused across the sweep INFO Adjust the program ... <- select + setup re-run (cheap) INFO Setup completed. ``` A third identical rerun of run 2 instead logs a single `cache hit: setup/<h>` — the whole pipeline short-circuits. ### 2. Instruction PGO: sweeping rebuilds unless `--apc-candidates` is pinned Under `--pgo instruction|none` the build cap defaults to `--autoprecompiles + --skip`, so it lives in the generate hash and a naive sweep rebuilds every iteration: ```sh rm -rf /tmp/powdr-cache-instr # ap=2 then ap=5, no --apc-candidates → generate hash differs each time. RUST_LOG=info cargo run -r --bin powdr_openvm_riscv -- --artifacts-dir /tmp/powdr-cache-instr \ setup guest --profile-input 10 --autoprecompiles 2 --pgo instruction RUST_LOG=info cargo run -r --bin powdr_openvm_riscv -- --artifacts-dir /tmp/powdr-cache-instr \ setup guest --profile-input 10 --autoprecompiles 5 --pgo instruction ``` Both runs log `Generating up to N autoprecompiles in parallel` — **no** `cache hit: generate`. Pin `--apc-candidates` to the top of the sweep to share one generate entry: ```sh rm -rf /tmp/powdr-cache-instr2 RUST_LOG=info cargo run -r --bin powdr_openvm_riscv -- --artifacts-dir /tmp/powdr-cache-instr2 \ setup guest --profile-input 10 --autoprecompiles 2 --apc-candidates 5 --pgo instruction RUST_LOG=info cargo run -r --bin powdr_openvm_riscv -- --artifacts-dir /tmp/powdr-cache-instr2 \ setup guest --profile-input 10 --autoprecompiles 5 --apc-candidates 5 --pgo instruction ``` Now the second run logs `cache hit: generate/<h>`. ### 3. Later-stage flags don't invalidate earlier stages ```sh # After any cached setup run above, re-proving with a new --input / --mock # hits the setup cache rather than rebuilding the program: RUST_LOG=info cargo run -r --bin powdr_openvm_riscv -- --artifacts-dir /tmp/powdr-cache \ prove guest --profile-input 10 --autoprecompiles 10 --pgo cell --input 123 --mock ``` `--input` / `--mock` are not part of any stage hash, so this logs `cache hit: setup/<h>` and proceeds straight to the (mock) prover. > All three scenarios above were run locally on this branch (release build, `guest` crate) and produced exactly the logged behavior described. --- ## Nightly run Manually-triggered nightly run [completed successfully](https://github.com/powdr-labs/powdr/actions/runs/26893147698) with the following bench results: - [`2026-06-03-2010`](https://github.com/powdr-labs/bench-results/tree/gh-pages/results/2026-06-03-2010) - [`2026-06-03-1723-gpu`](https://github.com/powdr-labs/bench-results/tree/gh-pages/results/2026-06-03-1723-gpu) Compared against the latest scheduled nightly on `main` ([run](https://github.com/powdr-labs/powdr/actions/runs/26919982316) → [`2026-06-04-0509`](https://github.com/powdr-labs/bench-results/tree/gh-pages/results/2026-06-04-0509), [`2026-06-04-0219-gpu`](https://github.com/powdr-labs/bench-results/tree/gh-pages/results/2026-06-04-0219-gpu)) via `scripts/analyze_nightly.py` (GPU run only benches reth, so `--benchmarks reth_gpu`): - **Behavior-preserving.** For every reth / reth_gpu config, `powdr_rows` and `app_proof_cells` match `main` *exactly* — same APCs selected, same circuit proven. Total-proof-time deltas are ≤1.5% in both directions (machine noise); the two configs `analyze_nightly.py` flags (reth `apc100` +1.5%, reth_gpu `apc030` +1.3%) have byte-identical workloads. Guest benches: −0.3% … −1.4%. - **No extra work — less, in fact.** In the reth compile sweep `apc ∈ {0,3,10,30,100}` (cell PGO), `main` rebuilds all 11110 candidates **4×** (~28 min, 0 cache hits); this branch builds them **once**, then logs `cache hit: generate/<h>` for apc 10/30/100 and `cache hit: setup/<h>` on every `prove-stark`. ~20 min of redundant rebuilding eliminated, as intended. - The `cargo bench` `optimize-keccak` figure (24.1s vs 20.1s) is noise: it optimizes a fixed pre-built APC untouched by this PR, and `main` itself ranges 17.7–24.1s across runs. --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR #3763 (split GenerateConfig/SelectConfig + refactored staged caching) landed on powdr main as 77712e655. Point the powdr git deps at that commit and refresh Cargo.lock to match. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Collaborator
Author
|
I updated the powdr hash to point to the latest |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adapts to powdr-labs/powdr#3763