Proper elastic scaling pipeline for v3 candidates #12063

## Problem

With v3, the block building pipeline is finally sound for normal async and sync backing. Elastic scaling is not there yet: collators today can land a candidate in an earlier relay chain (RC) block than the slot budget intends.

The pipeline assumes `2s build + 2s validation + 2s network propagation & statement distribution = 6s total` per candidate. If a candidate built mid-slot is squeezed into the previous slot's RC block, that budget is violated: validators outside the EU cluster don't have time to receive it and statement-distribute before backing closes. This is the symptom in #12028 / #10921 — non-EU validators see fewer backable candidates than EU authors put in blocks.

The relay chain runtime doesn't enforce this today. `parse_ump_signals` / `check_core_index` only check that the para has a core assigned at the candidate's `cq_offset`, not that the candidate is being backed no earlier than intended.

## Solution (v3 candidates)

Enforce a **minimum** claim-queue position on the relay chain (runtime + provisioner): a candidate may be backed at its declared position or later, never earlier.
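
A minimal sketch of what the check could look like, assuming `cq_offset = k` means "backed in the block `k + 1` levels below the scheduling parent" (so offset 1 is the grandchild). `earliest_backing_block`, `check_minimum_offset` and `OffsetError` are hypothetical names for illustration, not existing runtime items:

```rust
/// Hypothetical error type for this sketch; not an existing runtime type.
#[derive(Debug, PartialEq, Eq)]
pub enum OffsetError {
    BackedTooEarly { earliest_allowed: u32, backed_in: u32 },
}

/// Earliest relay chain block number in which the candidate may be backed.
/// Assumption: offset 0 is the child of the scheduling parent, offset 1 the
/// grandchild, and so on.
pub fn earliest_backing_block(scheduling_parent_number: u32, cq_offset: u8) -> u32 {
    scheduling_parent_number
        .saturating_add(u32::from(cq_offset))
        .saturating_add(1)
}

/// Accept the candidate at its declared position or later, never earlier.
pub fn check_minimum_offset(
    scheduling_parent_number: u32,
    cq_offset: u8,
    backing_block_number: u32,
) -> Result<(), OffsetError> {
    let earliest_allowed = earliest_backing_block(scheduling_parent_number, cq_offset);
    if backing_block_number < earliest_allowed {
        Err(OffsetError::BackedTooEarly {
            earliest_allowed,
            backed_in: backing_block_number,
        })
    } else {
        Ok(())
    }
}
```

With a scheduling parent at block 100 and `cq_offset = 2`, this sketch rejects backing in block 102 and accepts it in block 103 or any later block. The same predicate could serve the provisioner as a filter.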

Collator side:

- First 2s of the slot: `cq_offset = 1` (grandchild of scheduling parent).
- Rest of the slot: `cq_offset = 2` (great-grandchild).
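
The collator-side rule could be sketched roughly as below. `pick_cq_offset` and `FIRST_WINDOW` are hypothetical names, and treating exactly 2s as already past the first window is an assumption of this sketch, not something decided in this issue:

```rust
use std::time::Duration;

/// Length of the window at the start of the slot in which offset 1 is
/// still achievable (the 2s build budget).
pub const FIRST_WINDOW: Duration = Duration::from_secs(2);

/// Pick the claim-queue offset for a core based on how far into the slot
/// the build of its candidate starts.
pub fn pick_cq_offset(elapsed_into_slot: Duration) -> u8 {
    if elapsed_into_slot < FIRST_WINDOW {
        1 // grandchild of the scheduling parent
    } else {
        2 // one relay chain block later
    }
}
```

The point of the shape is that the offset is computed per core from elapsed time, rather than once per slot iteration.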


### Picture (3 cores/slot)

<img width="2500" height="1458" alt="Image" src="https://github.com/user-attachments/assets/249c2698-4e2c-42a3-bc69-1f71962cd530" />

A candidate lands in the RC block of slot X iff its 2s build plus 4s tail (2s validation + 2s propagation and statement distribution) has ended by the time slot X starts. With one block built every 2s, only the first-of-slot candidate qualifies for offset 1; the other two roll into offset 2. Steady state: 3 candidates per RC block, so full ES throughput is preserved and fairness is restored for distant validators.
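
The steady-state claim can be checked with a small simulation. This is a sketch under stated assumptions: 6s slots, one build started every 2s on 3 cores, and a tail that ends exactly on a slot boundary counts as on time. All names are made up for illustration:

```rust
pub const SLOT_MS: u64 = 6_000;
pub const BUILD_MS: u64 = 2_000;
pub const TAIL_MS: u64 = 4_000; // 2s validation + 2s propagation
pub const CORES_PER_SLOT: u64 = 3;

/// Index of the first slot that starts at or after the end of build + tail.
/// `build_start_ms` is measured from the start of slot 0.
pub fn landing_slot(build_start_ms: u64) -> u64 {
    let ready_ms = build_start_ms + BUILD_MS + TAIL_MS;
    // Ceiling division: round up to the next slot boundary.
    (ready_ms + SLOT_MS - 1) / SLOT_MS
}

/// Number of candidates landing in the RC block of each slot when
/// candidates are built back to back during slots `0..num_slots`.
pub fn candidates_per_rc_block(num_slots: u64) -> Vec<usize> {
    // The last candidate built lands at most in slot `num_slots + 1`.
    let mut counts = vec![0usize; (num_slots + 2) as usize];
    for slot in 0..num_slots {
        for core in 0..CORES_PER_SLOT {
            let build_start_ms = slot * SLOT_MS + core * BUILD_MS;
            counts[landing_slot(build_start_ms) as usize] += 1;
        }
    }
    counts
}
```

For 4 slots of building this gives `[0, 1, 3, 3, 3, 2]`. After a one-block ramp-up, every RC block carries 3 candidates: the two late ones from the slot before last plus the first one from the previous slot. The trailing 2 is just the simulation ending.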

### Cooperative for now

The minimum is enforced on chain, but the collator-side rule ("offset 1 for first 2s, offset 2 after") is cooperative. We do **not** try to enforce intra-slot timing on the relay chain — for now.

Reason: we plan to allocate **more cores to a para than it actually needs** and leave them idle most of the time, keeping them as spare capacity. After a resubmission the para can burn through the spares to catch back up, instead of pausing block production while the unincluded segment buffer drains. Hard timing enforcement on the relay chain would interact badly with that: a candidate using a "spare" core legitimately needs to look like it's coming in early, and a strict time check would reject it.

Misbehavior also has little upside: a candidate that lies and sets `cq_offset = 1` when it isn't actually ready will just miss the seal and be re-tried as offset 2 later.

We can revisit strict enforcement later if it ever becomes worthwhile.

## Impact on resubmissions

Open: does the resubmission reasoning in #11903 still hold under the new timings? Specifically:

- The naive-resubmission illustrations in https://github.com/paritytech/polkadot-sdk/issues/11903#issuecomment-4342730609
- The naive vs smart comment in https://github.com/paritytech/polkadot-sdk/issues/11903#issuecomment-4335756144

Those drawings assumed candidates could be backed early. With minimum-offset enforcement the situation should improve — the naive path may no longer waste a core in the same way. Needs the diagrams redrawn against `2s build + 4s tail` + the minimum-offset rule, then re-evaluate whether the smart-resubmission variant is still worth the extra complexity.

## Scope

- [ ] Runtime: enforce minimum claim-queue position in `check_descriptor_version_and_signals` (paras_inherent)
- [ ] Provisioner: filter candidates that would violate the minimum
- [ ] Collator (cumulus, `slot_based/block_builder_task.rs`): pick `cq_offset` per-core based on elapsed-into-slot time, instead of a single value per slot iteration
- [ ] Tests: unit + zombienet for the per-core offset choice and the runtime reject path
- [ ] Redraw #11903 illustrations with the new timing; re-evaluate naive vs smart resubmission
- [ ] PRDoc

## Notes

- v3 only. v1/v2 keep their existing semantics: they don't have a proper time model, and enforcing one retroactively would be a breaking change.
- Related: #8893 (claim queue offset jumps for on-demand / interlaced paras) 
- Related: #12028 (statement-distribution latency). Apart from improving distribution latency itself, we should also give distribution enough time, which the 6s budget already has room for.
