Skip to content

Fix stack overflow in initialize_dpe by using execute_serialized#3622

Draft
ArthurHeymans wants to merge 15 commits intochipsalliance:caliptra-2.0from
ArthurHeymans:fix-dpe-stack-overflow
Draft

Fix stack overflow in initialize_dpe by using execute_serialized#3622
ArthurHeymans wants to merge 15 commits intochipsalliance:caliptra-2.0from
ArthurHeymans:fix-dpe-stack-overflow

Conversation

@ArthurHeymans
Copy link
Copy Markdown
Contributor

Summary

Root Cause

The new DPE crate (rev a26db5b, hybrid feature) changed CommandExecution::execute() from a required trait method to a default method that always allocates [0u8; size_of::<Response>()] on the stack. With ml-dsa enabled, Response is ~25KB (driven by CertifyKeyMldsa87Resp containing a 22KB cert buffer + 2.5KB ML-DSA public key). Previously each command implemented execute() directly and only constructed its specific small response variant.

initialize_dpe only runs DeriveContextCmd (producing a ~44 byte DeriveContextResp), but was paying the full 25KB stack cost per call. Combined with:

  • STACK_SIZE reduced from 140KB to 136KB
  • State struct growth from MAX_HANDLES doubling (32 → 64)

This caused the stack to overflow, manifesting as RUNTIME_GLOBAL_EXCEPTION (0xE000B).

Testing

Both previously-failing CI tests now pass locally:

  • test_rtalias::test_fht_info (was MailboxCmdFailed(RUNTIME_GLOBAL_EXCEPTION))
  • test_rtalias::test_pcr_log (was failing to build runtime)

Follows the same pattern already used in stash_measurement.rs in PR #3611.

parvathib and others added 12 commits April 13, 2026 06:52
…ommands

Squashed backport of:
- [dpe] Enable hybrid DPE feature (chipsalliance#3243)
- [dpe] Add ML-DSA DPE Crypto trait (chipsalliance#3315)
- [dpe] Add support for 64 DPE contexts (chipsalliance#3246)
- [dpe] Add ML-DSA DPE command (chipsalliance#3326)
- [dpe] Test ML-DSA CertifyKey (chipsalliance#3357)
- [dpe] Test using external mu with ML-DSA profile (chipsalliance#3371)
- [dpe] reduce nesting of InvokeDpeCmd::execute (chipsalliance#3386)
- [dpe] Add optional ML-DSA response over DMA (chipsalliance#3391)
- [dpe] Add HW model support for large DPE responses (chipsalliance#3403)
- [dpe] Support both DPE profiles in more tests (chipsalliance#3407)
- [dpe] Add a test for worst case scenario certs and CSRs (chipsalliance#3417)
- [dpe] Fail early for response sizing (chipsalliance#3415)
- [dpe] Add ML-DSA support to CertifyKeyExtended (chipsalliance#3426)
- [dpe] Test ML-DSA profile with golang verification (chipsalliance#3454)
- [dpe] Add comment that initialization is profile agnostic (chipsalliance#3460)

DPE rev: a26db5b — last rev bumped by PRs in this list (chipsalliance#3371).
Not updated to caliptra-dpe main (cfc9a71) because DPE commit
337f7e4 (Update CFI chipsalliance#531) renamed caliptra-cfi packages from
caliptra-cfi-*-git to caliptra-cfi-*, which conflicts with
caliptra-2.0 CFI. Updating 2.0 CFI to match would be a ROM change.

caliptra-2.0 adaptations:
- DpeMldsaCrypto uses MldsaReg (2.0 has dedicated ML-DSA HW register;
  main uses AbrReg shared Adams Bridge register for ML-DSA and ML-KEM)
- SignData::Mu returns CryptoError::NotImplemented (no external-mu on 2.0)
- Removed ML-KEM commands (no ML-KEM HW on 2.0)
- Removed main-only commands (ocp_lock, shake256)
- Flat PersistentData paths (2.0 uses single struct, not rom/fw split)
- Kept 2.0 caliptra-cfi-*-git package names to avoid ROM recompilation
- STACK_SIZE=136KB (main uses 154KB; 2.0 has ~30KB more PersistentData)
- RUNTIME_SIZE=176KB (main uses 210KB; 2.0 has fewer runtime features)

Known issue: firmware stack overflow (~16KB over 136KB budget) during
initialize_dpe. The hybrid DPE Response enum is ~25KB (MAX_CERT_SIZE
= 22KB) and is allocated on the stack by CommandExecution::execute().
Needs stack space or DCCM space resolution.
The DPE CommandExecution::execute() default implementation allocates
[0u8; size_of::<Response>()] on the stack. With the ml-dsa feature
enabled, size_of::<Response>() is ~25KB (driven by CertifyKeyMldsa87Resp
containing a 22KB cert buffer and 2.5KB public key).

initialize_dpe only runs DeriveContextCmd which produces a ~44 byte
DeriveContextResp, but the default execute() always allocates the full
25KB buffer regardless of command type. Combined with the reduced
STACK_SIZE (136KB vs 140KB) and larger State on stack (MAX_HANDLES
doubled from 32 to 64), this caused a stack overflow (~16KB over budget)
manifesting as RUNTIME_GLOBAL_EXCEPTION.

Switch to execute_serialized() with a DeriveContextResp-sized buffer,
matching the pattern already used in stash_measurement.rs.
Three unused imports introduced in the parent commit caused CI compile
failures with -Dwarnings:
- runtime/src/sign_with_exported_ecdsa.rs: cfi_assert, cfi_launder, cfi_check
- runtime/src/lib.rs: MailboxReqHeader

Verified locally with the same cargo --config that CI uses:
  cargo --config 'target."cfg(all())".rustflags = ["-Dwarnings"]' \
    build ... -p caliptra-runtime --bin caliptra-runtime
  cargo --config 'target."cfg(all())".rustflags = ["-Dwarnings"]' \
    clippy --locked --all-targets -- -D warnings
The DPE crate writes a full `dpe::Response` typed view (~25KB driven by
`CertifyKeyMldsa87Resp`) into the staging buffer regardless of the
actual cert size. Previously, `execute_command` allocated a
`MAX_RESP_SIZE`-sized (\~25KB) buffer once at the dispatcher level for
all commands. Combined with deep call chains (e.g. `AttestedCsrCmd`
already burns ~16KB in its own handler for `scratch` + `env_csr_eat`),
this overflowed the runtime's tight 136KB stack budget on
`GET_ATTESTED_*_CSR`.

Route the ~25KB-needing commands (`CERTIFY_KEY_EXTENDED_*` and
`INVOKE_DPE_*`) through dedicated functions so that buffer only exists
on the stack for those call paths. Other commands now use a much
smaller staging buffer sized for `AttestedCsrResp` (~12.8KB), which is
the largest non-DPE response.

A const assertion enumerates all variants routed through the small
buffer so adding a new larger response forces an explicit decision.
CI runs with -Dwarnings, so the leftover MAX_RESP_SIZE import and the
redundant caliptra_common::mailbox_api::* import in the COMMON_RESP_BUF_SIZE
sanity-check block both cause build failures. Drop them.
…dispatch

`execute_command` previously allocated the ~12.8KB staging response
buffer up front for every mailbox command. Because it was inlined into
`handle_mailbox_commands`, that buffer stayed live on the stack for
commands that do not need it, including `GET_ATTESTED_MLDSA87_CSR`
whose MLDSA keypair/PCT/sign call chain was overflowing the 136KB
runtime stack.

To keep the staging buffer off the stack during heavy work:

* Split `execute_command` into a small dispatcher plus
  `execute_command_with_common_resp`, marked `#[inline(never)]`, so the
  common buffer only exists on the stack for commands that actually use
  it. `handle_command` and `execute_command` are pinned out-of-line to
  prevent re-inlining.

* Route `GET_ATTESTED_{ECC384,MLDSA87}_CSR` through dedicated dispatch
  paths, alongside the existing `CERTIFY_KEY_EXTENDED_*` and
  `INVOKE_DPE_*` ones. Because these commands no longer flow through
  the common buffer, `AttestedCsrResp` can be dropped from
  `COMMON_RESP_BUF_SIZE`, shrinking it from ~12.8KB to ~9.2KB
  (`VarSizeDataResp`).

* Split `AttestedEccCsrCmd::execute` and `AttestedMldsaCsrCmd::execute`
  into a `prepare` phase (CSR EAT claims + RT alias public key +
  subject key identifier, including the expensive MLDSA keypair/PCT)
  and a `sign_and_finalize` phase (allocate response buffer, compute
  COSE Sign1 signature, write to mailbox). The mailbox response buffer
  is now only allocated after the heavy cryptographic work completes,
  so the 9.2KB buffer and the MLDSA `sign_var` frame never coexist.

* `generate_attested_{ecc,mldsa}_csr` now take the pre-computed RT
  alias public key as a parameter, eliminating a redundant second
  MLDSA keypair generation that used to happen during the signing
  phase with the response buffer live.

The previously failing `test_get_attested_rt_alias_mldsa_csr` now
passes, and the other `test_attested_csr` tests continue to pass.
Backport test updates from main (commit c3fced9) that were missed when
the DPE context count was increased from 32 to 64 on caliptra-2.0.

These tests were previously masked by earlier test failures (e.g. stack
overflow in test_get_attested_rt_alias_mldsa_csr) that prevented the
reallocate tests from running to their assertions.
The CertifyKeyExtendedResp type has a huge fixed-size payload buffer
(~25KB) that the mailbox truncates to the actual response size. The
tests were calling read_from_bytes which requires an exact size match
and failed with SizeError against the truncated slice.

Copy the truncated mailbox data into a zeroed fixed-size response
buffer, then use try_read_from_prefix on the inner CertifyKeyP384Resp
which has its own fixed-size cert buffer larger than any real DPE
response.

Also update test_invoke_dpe_get_profile_cmd to expect max_tci_nodes=64
after the DPE 64-context change (backport of main commit c3fced9).
DPE response types (CertifyKeyP384Resp, DeriveContextResp,
GetCertificateChainResp, SignP384Resp, etc.) contain large fixed-size
payload buffers (cert[MAX_CERT_SIZE], new_certificate[MAX_CERT_SIZE],
certificate_chain[MAX_CHUNK_SIZE]). The runtime serialises these with
as_bytes_partial, returning a buffer that is shorter than the full
struct size.

The previous parse_dpe_response used try_read_from_bytes which requires
an exact size match and failed with SizeError. Copy the truncated
payload into a zeroed fixed-size buffer before decoding, matching the
pattern used by mbx_send_and_check_resp_hdr.
The Caliptra runtime serialises DPE responses via as_bytes_partial,
returning only the actual used bytes of variable-size payload fields
(X.509/CSR certificates, certificate chains, signatures). The Go
verification client decodes into fixed-size structs whose largest
variant (ML-DSA CertifyKey) expects a 17408-byte certificate trailer.

Without padding, binary.Read into the fixed-size response struct fails
with 'unexpected EOF' when the actual cert is smaller than the
Certificate array in the Go struct. Pad the response in SendCmd with
zeros up to 32 KiB so binary.Read can populate the fixed fields while
CertificateSize continues to bound the real cert payload.

Also bump GetMaxTciNodes from 32 to 64 to match the DPE context count
backport (main commit c3fced9).
Match main branch timeout with 64 contexts.
Signed-off-by: Arthur Heymans <arthur.heymans@9elements.com>
Update hardcoded MAX_HANDLES=64 references to use dpe::MAX_HANDLES
symbolically (Rust) or the new value 32 (Go) after enabling the
arbitrary_max_handles feature with ARBITRARY_MAX_HANDLES=32.

- test_invoke_dpe: assert max_tci_nodes against dpe::MAX_HANDLES
- test_reallocate_dpe_context_limits: compute PL1 limit from
  dpe::MAX_HANDLES instead of hardcoded 64
- dpe_verification/transport.go: GetMaxTciNodes returns 32

Signed-off-by: Arthur Heymans <arthur.heymans@9elements.com>
PL0_DPE_ACTIVE_CONTEXT_DEFAULT_THRESHOLD and
PL1_DPE_ACTIVE_CONTEXT_DEFAULT_THRESHOLD were hardcoded to 32 each
(total 64), matching the old MAX_HANDLES=64. With arbitrary_max_handles
set to 32, the thresholds exceeded the actual handle count, causing
test failures:

- test_pl1_derive_context_dpe_context_thresholds
- test_pl1_init_ctx_dpe_context_thresholds
- test_pl0_pl1_reallocation_range
- test_dpe_validation_used_context_threshold_exceeded

Compute both thresholds as MAX_HANDLES / 2 so they stay consistent
with whatever value arbitrary_max_handles provides.

Signed-off-by: Arthur Heymans <arthur.heymans@9elements.com>
The test hardcoded pl0_limit=56 to trigger the PL1-less-than-used
error. With MAX_HANDLES=32 (total=32), 56 exceeds the total and hits
RUNTIME_REALLOCATE_DPE_CONTEXTS_PL0_GREATER_THAN_MAX instead.

Use MAX_HANDLES - 4 as the pl0_limit so PL1's share (4) is less than
the 12 PL1 contexts created by the test, correctly triggering
RUNTIME_REALLOCATE_DPE_CONTEXTS_PL1_LESS_THAN_USED.

Signed-off-by: Arthur Heymans <arthur.heymans@9elements.com>
Comment thread runtime/src/invoke_dpe.rs
return Ok(len);
}

// Populate the checksum so the full response can be checked at the destination
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should remove external response support for 2.0. The mailboxes are always big enough to support the full response and this just adds unnecessary complexity.

return Ok(len);
}

// Populate the checksum so the full response can be checked at the destination
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we remove it from invoke_dpe.rs we should probably remove it from here as well

@ArthurHeymans ArthurHeymans marked this pull request as draft April 23, 2026 16:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants