Skip to content

chore: adopt powdr's staged APC build + split GenerateConfig/SelectConfig#2830

Open
georgwiese wants to merge 3 commits into
powdr-labs/v6.0.2-apc-basefrom
powdr-labs/split-powdr-config
Open

chore: adopt powdr's staged APC build + split GenerateConfig/SelectConfig#2830
georgwiese wants to merge 3 commits into
powdr-labs/v6.0.2-apc-basefrom
powdr-labs/split-powdr-config

Conversation

@georgwiese

@georgwiese georgwiese commented Jun 2, 2026

Copy link
Copy Markdown
Collaborator

Summary

Adapts to powdr-labs/powdr#3763

powdr-labs/powdr#3752 removes `PgoAdapter::filter_blocks_and_create_apcs_with_pgo`
and exposes `detect_blocks` + `generate_apcs` + `select_apcs` instead, so the
build cache survives changes to `autoprecompiles` / `skip`. Inline the three
steps here.

Behavior preserved by setting `apc_candidates = autoprecompiles + skip` when
unset — same final selection as the old fused call.

Requires bumping the powdr pin in Cargo.toml once #3752 lands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
georgwiese added a commit to powdr-labs/powdr that referenced this pull request Jun 2, 2026
PR #3763 changes the powdr autoprecompile API. The pre-existing pins
target sp1 and openvm-eth commits that use the old API, so CI fails
even though my refactor itself is correct.

- sp1-benchmarks/Cargo.toml: bump sp1 dep branch from
  powdr-labs/generate-apcs-api → powdr-labs/split-powdr-config
  (succinctlabs/sp1#2830 — adopts the new powdr API).
- .github/actions/patch-openvm-eth: bump ref from
  05ed9cd3 (main) → e716b51a (powdr-labs/openvm-eth#8 tip, adopts
  the new powdr API).

The "Verify openvm-eth ref is on main" check will (intentionally) fail
until openvm-eth#8 merges — the comment in pr-tests-with-secrets.yml
already accounts for this case, and the actual reth test runs against
the compatible code.
@georgwiese georgwiese force-pushed the powdr-labs/split-powdr-config branch from 987b65d to 676a63c Compare June 3, 2026 15:07
@georgwiese georgwiese changed the title Adopt powdr's split GenerateConfig + SelectConfig; opt into staged cache Adopt powdr's split GenerateConfig + SelectConfig Jun 3, 2026
georgwiese added a commit to powdr-labs/powdr that referenced this pull request Jun 3, 2026
The sp1 adoption (succinctlabs/sp1#2830, branch powdr-labs/split-powdr-config)
was slimmed to the irreducible GenerateConfig/SelectConfig migration — the
staged artifact cache was dropped (no sp1 consumer used it). This empty commit
re-runs CI so sp1-benchmarks builds against the new sp1 tip 676a63c8.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@georgwiese georgwiese changed the title Adopt powdr's split GenerateConfig + SelectConfig chore: adopt powdr's split GenerateConfig + SelectConfig Jun 3, 2026
@georgwiese georgwiese changed the base branch from powdr-labs/generate-apcs-api to powdr-labs/v6.0.2-apc-base June 4, 2026 09:43
powdr#3763 replaces `PowdrConfig` with per-stage `GenerateConfig` +
`SelectConfig` and renames the `PgoConfig` enum to `PgoData`. Adopt it:

- `sp1_configs(apc, skip, pgo) -> (GenerateConfig, SelectConfig)` builds the
  pair and resolves the `apc_candidates` cap up front via
  `GenerateConfig::with_select_defaults` (which needs the PGO mode). The
  returned configs are complete, so there's a single, obvious owner of the
  cap-default policy and no separate step for callers to remember.
- `CompiledProgram::new(elf, generate, select, pgo_data)` takes the halves
  separately and no longer applies any defaulting.
- The free `select_apcs` collapses to `select_apcs(ranked, select)`.
- Examples, the sdk test, the gpu perf bin, and the autoprecompiles CLI all
  get the matching shape.
- Bump the powdr pin to `be183c60e`.

Deliberately does NOT adopt powdr's staged artifact cache: every sp1 consumer
builds APCs once per config via the uncached `CompiledProgram::new`, so only
the dev CLI would have benefited.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@georgwiese georgwiese force-pushed the powdr-labs/split-powdr-config branch from 676a63c to 5448340 Compare June 4, 2026 09:51
@georgwiese georgwiese changed the title chore: adopt powdr's split GenerateConfig + SelectConfig chore: adopt powdr's staged APC build + split GenerateConfig/SelectConfig Jun 4, 2026
github-merge-queue Bot pushed a commit to powdr-labs/powdr that referenced this pull request Jun 9, 2026
## Summary

Building on #3750, this PR splits the `select-apcs` command into:
- `generate-apcs`: Builds a ranked list of APCs
- `select-apcs`: Select the exact APCs to be added to the VM. This step
is very simple
(`ranked.into_iter().skip(skip).take(num_apcs).collect()`)

The main benefit of this is that the expensive `generate-apcs` artifact
can be cached, meaning it only has to be computed once when sweeping
over different number of APCs. In particular, for `--pgo cell` (the
default!), we build all APCs anyway. With this PR, they are cached!

This PR also refactors the caching a bit, moving it from `main.rs` into
`powdr_openvm`, which means that `openvm-eth` (and I think also
`powdr-wasm`) can benefit from it. Some parts of the
`powdr_autoprecompiles` are also refactored; in particular, the
monolithic `PowdrConfig` is split up into a config type per pipeline
stage,, improving modularity.

This PR requires two companion PRs which should be reviewed as part of
this PR:
- powdr-labs/openvm-eth#8
- succinctlabs/sp1#2830

---

## Part 1 — the staged pipeline

`select-apcs` used to build APC candidates *and* pick the top N in one
shot. This PR splits those into two cached stages:

| Command | Stages run |
| --------------- |
-------------------------------------------------------------- |
| `generate-apcs` | Profile + build & rank APC candidates |
| `select-apcs` | … + trim the ranking to `--autoprecompiles` (after
`--skip`) |
| `setup` | … + assemble the program with selected APCs |
| `execute` | … + run the guest in interpreted mode |
| `prove` | … + STARK proof, optionally with `--recursion` (compression)
|

`--pgo` and `--max-columns` shape the ranking, so they live on the
generate stage; `--autoprecompiles` and `--skip` are the trim, so they
live on select. A new `--apc-candidates <N>` caps how many candidates
`generate-apcs` builds.

**`--apc-candidates` resolution:**

- `--pgo cell` always builds **every** eligible candidate (its density
ranking needs each candidate's post-opt cost), so a user-set
`--apc-candidates` is warned-about and ignored. Crucially, the cap is
*not* a function of `--autoprecompiles`, so the generate hash is
identical across a sweep.
- `--pgo instruction|none`: standalone `generate-apcs` builds all by
default; in a fused pipeline (`select-apcs` and beyond) the cap defaults
to `--autoprecompiles + --skip`. Pin it explicitly to over-build once
and sweep selection cheaply.

**Cache hierarchy** (`generate → select → setup`): each stage hashes
only its own args plus a fingerprint of the transpiled guest `VmExe`. A
later-stage flag (`--input`, `--mock`, `--recursion`, `--metrics`) never
invalidates an earlier-stage cache; editing the guest invalidates
everything downstream of it. The cache is opt-in via the global
`--artifacts-dir <DIR>`; with no dir, every stage runs inline.

See [`cli-openvm-riscv/README.md`](cli-openvm-riscv/README.md) for the
full surface and `--profile-input` vs `--input` semantics.

---

## Part 2 — the config split

`PowdrConfig` bundled fields from every pipeline stage into one struct.
That forced generate-stage code to share a hash space with select-stage
flags — making it impossible for the library to safely hash the config
as a cache key, which in turn forced `StagedPipeline` to expose a
stringly-typed `cache_key: &impl Hash` argument that each external
caller (CLI, openvm-eth, sp1) fingerprinted on its own. Three callers,
three conventions — easy to drop a field and silently serve stale cache
hits.

This PR splits `PowdrConfig` along the natural pipeline boundary:

- **`GenerateConfig`** (`autoprecompiles/src/lib.rs`) holds only the
fields the generate stage reads (`apc_candidates`,
`superblock_max_bb_count`, `apc_max_instructions`,
`apc_exec_count_cutoff`, `apc_candidates_dir_path`,
`should_use_optimistic_precompiles`, `degree_bound`). Derives `Hash`.
- **`SelectConfig`** holds `autoprecompiles` + `skip`. Derives `Hash`.
- **`PgoConfig`** holds `pgo_type`, `max_columns`, and the opaque
`inputs` byte string (serialized stdin). Derives `Hash`, so it
participates in both the generate and select/setup hashes.
- **`degree_bound`** keeps being passed plain into `setup` (no new
wrapper).
- **`GenerateConfig::with_select_defaults(pgo, select)`** is the single
policy definition for the `apc_candidates` cap default. Callers chain it
in one line when they know both halves — `compile_exe`, the CLI shim,
openvm-eth, sp1 all share it.
- **`InstructionPgo`** / **`NonePgo`** short-circuit on `apc_candidates
== Some(0)` (mirroring Cell), set by `with_select_defaults` when
`select.autoprecompiles == 0`. The stale `autoprecompiles == 0` field
check is gone.

`StagedPipeline`'s API drops `cache_key`; callers supply only an
`input_fp: &impl Hash` covering anything hidden behind the closures (PGO
stdin contents, RPC chain id, …). The library hashes `(guest_hash,
generate, pgo)` for generate and `(guest_hash, generate, pgo, select)`
for select/setup. Generate's hash omits `select` by construction, so a
Cell-PGO `--autoprecompiles N` sweep automatically reuses the candidate
ranking — Cell's `with_select_defaults` returns `apc_candidates = None`
regardless of selection size, so the generate hash is identical across
the sweep.

`compile_exe` becomes a thin wrapper over the staged
generate/select/setup functions (no caching). `PowdrConfig` and
`default_powdr_openvm_config` are deleted.

---

## Verifying the cache

All commands below use the `setup` subcommand (runs `generate → select →
setup` with no proving, so they're fast) and `RUST_LOG=info` to surface
the cache log lines. `cached()` logs `cache hit: <stage>/<hash>` on a
hit and runs the stage inline on a miss. The hashes are
toolchain/guest-specific — what matters is *which* stage logs a hit.
Substitute any guest (e.g. `guest-keccak`) for heavier, more realistic
candidate counts. The `cargo run` invocation is spelled out in full on
each line (rather than stashed in a `$BIN` variable) so the blocks paste
cleanly into both bash and zsh.

Because the cache is hierarchical and short-circuits at the
**outermost** hit, an identical rerun logs only `cache hit: setup/<h>`
(select and generate are never consulted), whereas a run that only
changes selection logs `cache hit: generate/<h>` and re-runs select +
setup.

### 1. Cell PGO: `--autoprecompiles` sweep reuses one generate entry

```sh
rm -rf /tmp/powdr-cache

# Cold: builds & ranks every eligible candidate.
RUST_LOG=info cargo run -r --bin powdr_openvm_riscv -- --artifacts-dir /tmp/powdr-cache \
    setup guest --profile-input 10 --autoprecompiles 2 --pgo cell

# Different --autoprecompiles: generate is reused, only select + setup re-run.
RUST_LOG=info cargo run -r --bin powdr_openvm_riscv -- --artifacts-dir /tmp/powdr-cache \
    setup guest --profile-input 10 --autoprecompiles 10 --pgo cell
```

Run 1 logs `Generating autoprecompiles for all N blocks in parallel`
(cold build). Run 2 logs:

```
INFO cache hit: generate/<h>      <- ranking reused across the sweep
INFO Adjust the program ...       <- select + setup re-run (cheap)
INFO Setup completed.
```

A third identical rerun of run 2 instead logs a single `cache hit:
setup/<h>` — the whole pipeline short-circuits.

### 2. Instruction PGO: sweeping rebuilds unless `--apc-candidates` is
pinned

Under `--pgo instruction|none` the build cap defaults to
`--autoprecompiles + --skip`, so it lives in the generate hash and a
naive sweep rebuilds every iteration:

```sh
rm -rf /tmp/powdr-cache-instr
# ap=2 then ap=5, no --apc-candidates → generate hash differs each time.
RUST_LOG=info cargo run -r --bin powdr_openvm_riscv -- --artifacts-dir /tmp/powdr-cache-instr \
    setup guest --profile-input 10 --autoprecompiles 2 --pgo instruction
RUST_LOG=info cargo run -r --bin powdr_openvm_riscv -- --artifacts-dir /tmp/powdr-cache-instr \
    setup guest --profile-input 10 --autoprecompiles 5 --pgo instruction
```

Both runs log `Generating up to N autoprecompiles in parallel` — **no**
`cache hit: generate`. Pin `--apc-candidates` to the top of the sweep to
share one generate entry:

```sh
rm -rf /tmp/powdr-cache-instr2
RUST_LOG=info cargo run -r --bin powdr_openvm_riscv -- --artifacts-dir /tmp/powdr-cache-instr2 \
    setup guest --profile-input 10 --autoprecompiles 2 --apc-candidates 5 --pgo instruction
RUST_LOG=info cargo run -r --bin powdr_openvm_riscv -- --artifacts-dir /tmp/powdr-cache-instr2 \
    setup guest --profile-input 10 --autoprecompiles 5 --apc-candidates 5 --pgo instruction
```

Now the second run logs `cache hit: generate/<h>`.

### 3. Later-stage flags don't invalidate earlier stages

```sh
# After any cached setup run above, re-proving with a new --input / --mock
# hits the setup cache rather than rebuilding the program:
RUST_LOG=info cargo run -r --bin powdr_openvm_riscv -- --artifacts-dir /tmp/powdr-cache \
    prove guest --profile-input 10 --autoprecompiles 10 --pgo cell --input 123 --mock
```

`--input` / `--mock` are not part of any stage hash, so this logs `cache
hit: setup/<h>` and proceeds straight to the (mock) prover.

> All three scenarios above were run locally on this branch (release
build, `guest` crate) and produced exactly the logged behavior
described.

---

## Nightly run

Manually-triggered nightly run [completed
successfully](https://github.com/powdr-labs/powdr/actions/runs/26893147698)
with the following bench results:
-
[`2026-06-03-2010`](https://github.com/powdr-labs/bench-results/tree/gh-pages/results/2026-06-03-2010)
-
[`2026-06-03-1723-gpu`](https://github.com/powdr-labs/bench-results/tree/gh-pages/results/2026-06-03-1723-gpu)

Compared against the latest scheduled nightly on `main`
([run](https://github.com/powdr-labs/powdr/actions/runs/26919982316) →
[`2026-06-04-0509`](https://github.com/powdr-labs/bench-results/tree/gh-pages/results/2026-06-04-0509),
[`2026-06-04-0219-gpu`](https://github.com/powdr-labs/bench-results/tree/gh-pages/results/2026-06-04-0219-gpu))
via `scripts/analyze_nightly.py` (GPU run only benches reth, so
`--benchmarks reth_gpu`):

- **Behavior-preserving.** For every reth / reth_gpu config,
`powdr_rows` and `app_proof_cells` match `main` *exactly* — same APCs
selected, same circuit proven. Total-proof-time deltas are ≤1.5% in both
directions (machine noise); the two configs `analyze_nightly.py` flags
(reth `apc100` +1.5%, reth_gpu `apc030` +1.3%) have byte-identical
workloads. Guest benches: −0.3% … −1.4%.
- **No extra work — less, in fact.** In the reth compile sweep `apc ∈
{0,3,10,30,100}` (cell PGO), `main` rebuilds all 11110 candidates **4×**
(~28 min, 0 cache hits); this branch builds them **once**, then logs
`cache hit: generate/<h>` for apc 10/30/100 and `cache hit: setup/<h>`
on every `prove-stark`. ~20 min of redundant rebuilding eliminated, as
intended.
- The `cargo bench` `optimize-keccak` figure (24.1s vs 20.1s) is noise:
it optimizes a fixed pre-built APC untouched by this PR, and `main`
itself ranges 17.7–24.1s across runs.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR #3763 (split GenerateConfig/SelectConfig + refactored staged caching)
landed on powdr main as 77712e655. Point the powdr git deps at that
commit and refresh Cargo.lock to match.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@georgwiese

Copy link
Copy Markdown
Collaborator Author

I updated the powdr hash to point to the latest main commit, which contains powdr-labs/powdr#3763.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant