chore: adopt powdr's staged APC build + split GenerateConfig/SelectConfig by georgwiese · Pull Request #2830 · succinctlabs/sp1

georgwiese · 2026-06-02T14:44:12Z

Summary

powdr-labs/powdr#3752 removes `PgoAdapter::filter_blocks_and_create_apcs_with_pgo` and exposes `detect_blocks` + `generate_apcs` + `select_apcs` instead, so the build cache survives changes to `autoprecompiles` / `skip`. Inline the three steps here. Behavior preserved by setting `apc_candidates = autoprecompiles + skip` when unset — same final selection as the old fused call. Requires bumping the powdr pin in Cargo.toml once #3752 lands. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

PR #3763 changes the powdr autoprecompile API. The pre-existing pins target sp1 and openvm-eth commits that use the old API, so CI fails even though my refactor itself is correct. - sp1-benchmarks/Cargo.toml: bump sp1 dep branch from powdr-labs/generate-apcs-api → powdr-labs/split-powdr-config (succinctlabs/sp1#2830 — adopts the new powdr API). - .github/actions/patch-openvm-eth: bump ref from 05ed9cd3 (main) → e716b51a (powdr-labs/openvm-eth#8 tip, adopts the new powdr API). The "Verify openvm-eth ref is on main" check will (intentionally) fail until openvm-eth#8 merges — the comment in pr-tests-with-secrets.yml already accounts for this case, and the actual reth test runs against the compatible code.

The sp1 adoption (succinctlabs/sp1#2830, branch powdr-labs/split-powdr-config) was slimmed to the irreducible GenerateConfig/SelectConfig migration — the staged artifact cache was dropped (no sp1 consumer used it). This empty commit re-runs CI so sp1-benchmarks builds against the new sp1 tip 676a63c8. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

powdr#3763 replaces `PowdrConfig` with per-stage `GenerateConfig` + `SelectConfig` and renames the `PgoConfig` enum to `PgoData`. Adopt it: - `sp1_configs(apc, skip, pgo) -> (GenerateConfig, SelectConfig)` builds the pair and resolves the `apc_candidates` cap up front via `GenerateConfig::with_select_defaults` (which needs the PGO mode). The returned configs are complete, so there's a single, obvious owner of the cap-default policy and no separate step for callers to remember. - `CompiledProgram::new(elf, generate, select, pgo_data)` takes the halves separately and no longer applies any defaulting. - The free `select_apcs` collapses to `select_apcs(ranked, select)`. - Examples, the sdk test, the gpu perf bin, and the autoprecompiles CLI all get the matching shape. - Bump the powdr pin to `be183c60e`. Deliberately does NOT adopt powdr's staged artifact cache: every sp1 consumer builds APCs once per config via the uncached `CompiledProgram::new`, so only the dev CLI would have benefited. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

## Summary Building on #3750, this PR splits the `select-apcs` command into: - `generate-apcs`: Builds a ranked list of APCs - `select-apcs`: Select the exact APCs to be added to the VM. This step is very simple (`ranked.into_iter().skip(skip).take(num_apcs).collect()`) The main benefit of this is that the expensive `generate-apcs` artifact can be cached, meaning it only has to be computed once when sweeping over different number of APCs. In particular, for `--pgo cell` (the default!), we build all APCs anyway. With this PR, they are cached! This PR also refactors the caching a bit, moving it from `main.rs` into `powdr_openvm`, which means that `openvm-eth` (and I think also `powdr-wasm`) can benefit from it. Some parts of the `powdr_autoprecompiles` are also refactored; in particular, the monolithic `PowdrConfig` is split up into a config type per pipeline stage,, improving modularity. This PR requires two companion PRs which should be reviewed as part of this PR: - powdr-labs/openvm-eth#8 - succinctlabs/sp1#2830 --- ## Part 1 — the staged pipeline `select-apcs` used to build APC candidates *and* pick the top N in one shot. This PR splits those into two cached stages: | Command | Stages run | | --------------- | -------------------------------------------------------------- | | `generate-apcs` | Profile + build & rank APC candidates | | `select-apcs` | … + trim the ranking to `--autoprecompiles` (after `--skip`) | | `setup` | … + assemble the program with selected APCs | | `execute` | … + run the guest in interpreted mode | | `prove` | … + STARK proof, optionally with `--recursion` (compression) | `--pgo` and `--max-columns` shape the ranking, so they live on the generate stage; `--autoprecompiles` and `--skip` are the trim, so they live on select. A new `--apc-candidates <N>` caps how many candidates `generate-apcs` builds. **`--apc-candidates` resolution:** - `--pgo cell` always builds **every** eligible candidate (its density ranking needs each candidate's post-opt cost), so a user-set `--apc-candidates` is warned-about and ignored. Crucially, the cap is *not* a function of `--autoprecompiles`, so the generate hash is identical across a sweep. - `--pgo instruction|none`: standalone `generate-apcs` builds all by default; in a fused pipeline (`select-apcs` and beyond) the cap defaults to `--autoprecompiles + --skip`. Pin it explicitly to over-build once and sweep selection cheaply. **Cache hierarchy** (`generate → select → setup`): each stage hashes only its own args plus a fingerprint of the transpiled guest `VmExe`. A later-stage flag (`--input`, `--mock`, `--recursion`, `--metrics`) never invalidates an earlier-stage cache; editing the guest invalidates everything downstream of it. The cache is opt-in via the global `--artifacts-dir <DIR>`; with no dir, every stage runs inline. See [`cli-openvm-riscv/README.md`](cli-openvm-riscv/README.md) for the full surface and `--profile-input` vs `--input` semantics. --- ## Part 2 — the config split `PowdrConfig` bundled fields from every pipeline stage into one struct. That forced generate-stage code to share a hash space with select-stage flags — making it impossible for the library to safely hash the config as a cache key, which in turn forced `StagedPipeline` to expose a stringly-typed `cache_key: &impl Hash` argument that each external caller (CLI, openvm-eth, sp1) fingerprinted on its own. Three callers, three conventions — easy to drop a field and silently serve stale cache hits. This PR splits `PowdrConfig` along the natural pipeline boundary: - **`GenerateConfig`** (`autoprecompiles/src/lib.rs`) holds only the fields the generate stage reads (`apc_candidates`, `superblock_max_bb_count`, `apc_max_instructions`, `apc_exec_count_cutoff`, `apc_candidates_dir_path`, `should_use_optimistic_precompiles`, `degree_bound`). Derives `Hash`. - **`SelectConfig`** holds `autoprecompiles` + `skip`. Derives `Hash`. - **`PgoConfig`** holds `pgo_type`, `max_columns`, and the opaque `inputs` byte string (serialized stdin). Derives `Hash`, so it participates in both the generate and select/setup hashes. - **`degree_bound`** keeps being passed plain into `setup` (no new wrapper). - **`GenerateConfig::with_select_defaults(pgo, select)`** is the single policy definition for the `apc_candidates` cap default. Callers chain it in one line when they know both halves — `compile_exe`, the CLI shim, openvm-eth, sp1 all share it. - **`InstructionPgo`** / **`NonePgo`** short-circuit on `apc_candidates == Some(0)` (mirroring Cell), set by `with_select_defaults` when `select.autoprecompiles == 0`. The stale `autoprecompiles == 0` field check is gone. `StagedPipeline`'s API drops `cache_key`; callers supply only an `input_fp: &impl Hash` covering anything hidden behind the closures (PGO stdin contents, RPC chain id, …). The library hashes `(guest_hash, generate, pgo)` for generate and `(guest_hash, generate, pgo, select)` for select/setup. Generate's hash omits `select` by construction, so a Cell-PGO `--autoprecompiles N` sweep automatically reuses the candidate ranking — Cell's `with_select_defaults` returns `apc_candidates = None` regardless of selection size, so the generate hash is identical across the sweep. `compile_exe` becomes a thin wrapper over the staged generate/select/setup functions (no caching). `PowdrConfig` and `default_powdr_openvm_config` are deleted. --- ## Verifying the cache All commands below use the `setup` subcommand (runs `generate → select → setup` with no proving, so they're fast) and `RUST_LOG=info` to surface the cache log lines. `cached()` logs `cache hit: <stage>/<hash>` on a hit and runs the stage inline on a miss. The hashes are toolchain/guest-specific — what matters is *which* stage logs a hit. Substitute any guest (e.g. `guest-keccak`) for heavier, more realistic candidate counts. The `cargo run` invocation is spelled out in full on each line (rather than stashed in a `$BIN` variable) so the blocks paste cleanly into both bash and zsh. Because the cache is hierarchical and short-circuits at the **outermost** hit, an identical rerun logs only `cache hit: setup/<h>` (select and generate are never consulted), whereas a run that only changes selection logs `cache hit: generate/<h>` and re-runs select + setup. ### 1. Cell PGO: `--autoprecompiles` sweep reuses one generate entry ```sh rm -rf /tmp/powdr-cache # Cold: builds & ranks every eligible candidate. RUST_LOG=info cargo run -r --bin powdr_openvm_riscv -- --artifacts-dir /tmp/powdr-cache \ setup guest --profile-input 10 --autoprecompiles 2 --pgo cell # Different --autoprecompiles: generate is reused, only select + setup re-run. RUST_LOG=info cargo run -r --bin powdr_openvm_riscv -- --artifacts-dir /tmp/powdr-cache \ setup guest --profile-input 10 --autoprecompiles 10 --pgo cell ``` Run 1 logs `Generating autoprecompiles for all N blocks in parallel` (cold build). Run 2 logs: ``` INFO cache hit: generate/<h> <- ranking reused across the sweep INFO Adjust the program ... <- select + setup re-run (cheap) INFO Setup completed. ``` A third identical rerun of run 2 instead logs a single `cache hit: setup/<h>` — the whole pipeline short-circuits. ### 2. Instruction PGO: sweeping rebuilds unless `--apc-candidates` is pinned Under `--pgo instruction|none` the build cap defaults to `--autoprecompiles + --skip`, so it lives in the generate hash and a naive sweep rebuilds every iteration: ```sh rm -rf /tmp/powdr-cache-instr # ap=2 then ap=5, no --apc-candidates → generate hash differs each time. RUST_LOG=info cargo run -r --bin powdr_openvm_riscv -- --artifacts-dir /tmp/powdr-cache-instr \ setup guest --profile-input 10 --autoprecompiles 2 --pgo instruction RUST_LOG=info cargo run -r --bin powdr_openvm_riscv -- --artifacts-dir /tmp/powdr-cache-instr \ setup guest --profile-input 10 --autoprecompiles 5 --pgo instruction ``` Both runs log `Generating up to N autoprecompiles in parallel` — **no** `cache hit: generate`. Pin `--apc-candidates` to the top of the sweep to share one generate entry: ```sh rm -rf /tmp/powdr-cache-instr2 RUST_LOG=info cargo run -r --bin powdr_openvm_riscv -- --artifacts-dir /tmp/powdr-cache-instr2 \ setup guest --profile-input 10 --autoprecompiles 2 --apc-candidates 5 --pgo instruction RUST_LOG=info cargo run -r --bin powdr_openvm_riscv -- --artifacts-dir /tmp/powdr-cache-instr2 \ setup guest --profile-input 10 --autoprecompiles 5 --apc-candidates 5 --pgo instruction ``` Now the second run logs `cache hit: generate/<h>`. ### 3. Later-stage flags don't invalidate earlier stages ```sh # After any cached setup run above, re-proving with a new --input / --mock # hits the setup cache rather than rebuilding the program: RUST_LOG=info cargo run -r --bin powdr_openvm_riscv -- --artifacts-dir /tmp/powdr-cache \ prove guest --profile-input 10 --autoprecompiles 10 --pgo cell --input 123 --mock ``` `--input` / `--mock` are not part of any stage hash, so this logs `cache hit: setup/<h>` and proceeds straight to the (mock) prover. > All three scenarios above were run locally on this branch (release build, `guest` crate) and produced exactly the logged behavior described. --- ## Nightly run Manually-triggered nightly run [completed successfully](https://github.com/powdr-labs/powdr/actions/runs/26893147698) with the following bench results: - [`2026-06-03-2010`](https://github.com/powdr-labs/bench-results/tree/gh-pages/results/2026-06-03-2010) - [`2026-06-03-1723-gpu`](https://github.com/powdr-labs/bench-results/tree/gh-pages/results/2026-06-03-1723-gpu) Compared against the latest scheduled nightly on `main` ([run](https://github.com/powdr-labs/powdr/actions/runs/26919982316) → [`2026-06-04-0509`](https://github.com/powdr-labs/bench-results/tree/gh-pages/results/2026-06-04-0509), [`2026-06-04-0219-gpu`](https://github.com/powdr-labs/bench-results/tree/gh-pages/results/2026-06-04-0219-gpu)) via `scripts/analyze_nightly.py` (GPU run only benches reth, so `--benchmarks reth_gpu`): - **Behavior-preserving.** For every reth / reth_gpu config, `powdr_rows` and `app_proof_cells` match `main` *exactly* — same APCs selected, same circuit proven. Total-proof-time deltas are ≤1.5% in both directions (machine noise); the two configs `analyze_nightly.py` flags (reth `apc100` +1.5%, reth_gpu `apc030` +1.3%) have byte-identical workloads. Guest benches: −0.3% … −1.4%. - **No extra work — less, in fact.** In the reth compile sweep `apc ∈ {0,3,10,30,100}` (cell PGO), `main` rebuilds all 11110 candidates **4×** (~28 min, 0 cache hits); this branch builds them **once**, then logs `cache hit: generate/<h>` for apc 10/30/100 and `cache hit: setup/<h>` on every `prove-stark`. ~20 min of redundant rebuilding eliminated, as intended. - The `cargo bench` `optimize-keccak` figure (24.1s vs 20.1s) is noise: it optimizes a fixed pre-built APC untouched by this PR, and `main` itself ranges 17.7–24.1s across runs. --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

PR #3763 (split GenerateConfig/SelectConfig + refactored staged caching) landed on powdr main as 77712e655. Point the powdr git deps at that commit and refresh Cargo.lock to match. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

georgwiese · 2026-06-09T14:59:47Z

I updated the powdr hash to point to the latest main commit, which contains powdr-labs/powdr#3763.

georgwiese mentioned this pull request Jun 3, 2026

Split "Generate APCs" + "Select APCs" stage + refactored caching powdr-labs/powdr#3763

Merged

georgwiese force-pushed the powdr-labs/split-powdr-config branch from 987b65d to 676a63c Compare June 3, 2026 15:07

georgwiese changed the title ~~Adopt powdr's split GenerateConfig + SelectConfig; opt into staged cache~~ Adopt powdr's split GenerateConfig + SelectConfig Jun 3, 2026

georgwiese changed the title ~~Adopt powdr's split GenerateConfig + SelectConfig~~ chore: adopt powdr's split GenerateConfig + SelectConfig Jun 3, 2026

georgwiese changed the base branch from powdr-labs/generate-apcs-api to powdr-labs/v6.0.2-apc-base June 4, 2026 09:43

georgwiese force-pushed the powdr-labs/split-powdr-config branch from 676a63c to 5448340 Compare June 4, 2026 09:51

georgwiese changed the title ~~chore: adopt powdr's split GenerateConfig + SelectConfig~~ chore: adopt powdr's staged APC build + split GenerateConfig/SelectConfig Jun 4, 2026

georgwiese mentioned this pull request Jun 5, 2026

Track upstream PgoAdapter API split #2805

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chore: adopt powdr's staged APC build + split GenerateConfig/SelectConfig#2830

chore: adopt powdr's staged APC build + split GenerateConfig/SelectConfig#2830
georgwiese wants to merge 3 commits into
powdr-labs/v6.0.2-apc-basefrom
powdr-labs/split-powdr-config

georgwiese commented Jun 2, 2026 •

edited

Loading

Uh oh!

georgwiese commented Jun 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

georgwiese commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Uh oh!

georgwiese commented Jun 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

georgwiese commented Jun 2, 2026 •

edited

Loading