Fix stack overflow in initialize_dpe by using execute_serialized#3622
Draft
ArthurHeymans wants to merge 15 commits intochipsalliance:caliptra-2.0from
Draft
Fix stack overflow in initialize_dpe by using execute_serialized#3622ArthurHeymans wants to merge 15 commits intochipsalliance:caliptra-2.0from
ArthurHeymans wants to merge 15 commits intochipsalliance:caliptra-2.0from
Conversation
…ommands Squashed backport of: - [dpe] Enable hybrid DPE feature (chipsalliance#3243) - [dpe] Add ML-DSA DPE Crypto trait (chipsalliance#3315) - [dpe] Add support for 64 DPE contexts (chipsalliance#3246) - [dpe] Add ML-DSA DPE command (chipsalliance#3326) - [dpe] Test ML-DSA CertifyKey (chipsalliance#3357) - [dpe] Test using external mu with ML-DSA profile (chipsalliance#3371) - [dpe] reduce nesting of InvokeDpeCmd::execute (chipsalliance#3386) - [dpe] Add optional ML-DSA response over DMA (chipsalliance#3391) - [dpe] Add HW model support for large DPE responses (chipsalliance#3403) - [dpe] Support both DPE profiles in more tests (chipsalliance#3407) - [dpe] Add a test for worst case scenario certs and CSRs (chipsalliance#3417) - [dpe] Fail early for response sizing (chipsalliance#3415) - [dpe] Add ML-DSA support to CertifyKeyExtended (chipsalliance#3426) - [dpe] Test ML-DSA profile with golang verification (chipsalliance#3454) - [dpe] Add comment that initialization is profile agnostic (chipsalliance#3460) DPE rev: a26db5b — last rev bumped by PRs in this list (chipsalliance#3371). Not updated to caliptra-dpe main (cfc9a71) because DPE commit 337f7e4 (Update CFI chipsalliance#531) renamed caliptra-cfi packages from caliptra-cfi-*-git to caliptra-cfi-*, which conflicts with caliptra-2.0 CFI. Updating 2.0 CFI to match would be a ROM change. caliptra-2.0 adaptations: - DpeMldsaCrypto uses MldsaReg (2.0 has dedicated ML-DSA HW register; main uses AbrReg shared Adams Bridge register for ML-DSA and ML-KEM) - SignData::Mu returns CryptoError::NotImplemented (no external-mu on 2.0) - Removed ML-KEM commands (no ML-KEM HW on 2.0) - Removed main-only commands (ocp_lock, shake256) - Flat PersistentData paths (2.0 uses single struct, not rom/fw split) - Kept 2.0 caliptra-cfi-*-git package names to avoid ROM recompilation - STACK_SIZE=136KB (main uses 154KB; 2.0 has ~30KB more PersistentData) - RUNTIME_SIZE=176KB (main uses 210KB; 2.0 has fewer runtime features) Known issue: firmware stack overflow (~16KB over 136KB budget) during initialize_dpe. The hybrid DPE Response enum is ~25KB (MAX_CERT_SIZE = 22KB) and is allocated on the stack by CommandExecution::execute(). Needs stack space or DCCM space resolution.
The DPE CommandExecution::execute() default implementation allocates [0u8; size_of::<Response>()] on the stack. With the ml-dsa feature enabled, size_of::<Response>() is ~25KB (driven by CertifyKeyMldsa87Resp containing a 22KB cert buffer and 2.5KB public key). initialize_dpe only runs DeriveContextCmd which produces a ~44 byte DeriveContextResp, but the default execute() always allocates the full 25KB buffer regardless of command type. Combined with the reduced STACK_SIZE (136KB vs 140KB) and larger State on stack (MAX_HANDLES doubled from 32 to 64), this caused a stack overflow (~16KB over budget) manifesting as RUNTIME_GLOBAL_EXCEPTION. Switch to execute_serialized() with a DeriveContextResp-sized buffer, matching the pattern already used in stash_measurement.rs.
Three unused imports introduced in the parent commit caused CI compile
failures with -Dwarnings:
- runtime/src/sign_with_exported_ecdsa.rs: cfi_assert, cfi_launder, cfi_check
- runtime/src/lib.rs: MailboxReqHeader
Verified locally with the same cargo --config that CI uses:
cargo --config 'target."cfg(all())".rustflags = ["-Dwarnings"]' \
build ... -p caliptra-runtime --bin caliptra-runtime
cargo --config 'target."cfg(all())".rustflags = ["-Dwarnings"]' \
clippy --locked --all-targets -- -D warnings
The DPE crate writes a full `dpe::Response` typed view (~25KB driven by `CertifyKeyMldsa87Resp`) into the staging buffer regardless of the actual cert size. Previously, `execute_command` allocated a `MAX_RESP_SIZE`-sized (\~25KB) buffer once at the dispatcher level for all commands. Combined with deep call chains (e.g. `AttestedCsrCmd` already burns ~16KB in its own handler for `scratch` + `env_csr_eat`), this overflowed the runtime's tight 136KB stack budget on `GET_ATTESTED_*_CSR`. Route the ~25KB-needing commands (`CERTIFY_KEY_EXTENDED_*` and `INVOKE_DPE_*`) through dedicated functions so that buffer only exists on the stack for those call paths. Other commands now use a much smaller staging buffer sized for `AttestedCsrResp` (~12.8KB), which is the largest non-DPE response. A const assertion enumerates all variants routed through the small buffer so adding a new larger response forces an explicit decision.
CI runs with -Dwarnings, so the leftover MAX_RESP_SIZE import and the redundant caliptra_common::mailbox_api::* import in the COMMON_RESP_BUF_SIZE sanity-check block both cause build failures. Drop them.
…dispatch
`execute_command` previously allocated the ~12.8KB staging response
buffer up front for every mailbox command. Because it was inlined into
`handle_mailbox_commands`, that buffer stayed live on the stack for
commands that do not need it, including `GET_ATTESTED_MLDSA87_CSR`
whose MLDSA keypair/PCT/sign call chain was overflowing the 136KB
runtime stack.
To keep the staging buffer off the stack during heavy work:
* Split `execute_command` into a small dispatcher plus
`execute_command_with_common_resp`, marked `#[inline(never)]`, so the
common buffer only exists on the stack for commands that actually use
it. `handle_command` and `execute_command` are pinned out-of-line to
prevent re-inlining.
* Route `GET_ATTESTED_{ECC384,MLDSA87}_CSR` through dedicated dispatch
paths, alongside the existing `CERTIFY_KEY_EXTENDED_*` and
`INVOKE_DPE_*` ones. Because these commands no longer flow through
the common buffer, `AttestedCsrResp` can be dropped from
`COMMON_RESP_BUF_SIZE`, shrinking it from ~12.8KB to ~9.2KB
(`VarSizeDataResp`).
* Split `AttestedEccCsrCmd::execute` and `AttestedMldsaCsrCmd::execute`
into a `prepare` phase (CSR EAT claims + RT alias public key +
subject key identifier, including the expensive MLDSA keypair/PCT)
and a `sign_and_finalize` phase (allocate response buffer, compute
COSE Sign1 signature, write to mailbox). The mailbox response buffer
is now only allocated after the heavy cryptographic work completes,
so the 9.2KB buffer and the MLDSA `sign_var` frame never coexist.
* `generate_attested_{ecc,mldsa}_csr` now take the pre-computed RT
alias public key as a parameter, eliminating a redundant second
MLDSA keypair generation that used to happen during the signing
phase with the response buffer live.
The previously failing `test_get_attested_rt_alias_mldsa_csr` now
passes, and the other `test_attested_csr` tests continue to pass.
Backport test updates from main (commit c3fced9) that were missed when the DPE context count was increased from 32 to 64 on caliptra-2.0. These tests were previously masked by earlier test failures (e.g. stack overflow in test_get_attested_rt_alias_mldsa_csr) that prevented the reallocate tests from running to their assertions.
The CertifyKeyExtendedResp type has a huge fixed-size payload buffer (~25KB) that the mailbox truncates to the actual response size. The tests were calling read_from_bytes which requires an exact size match and failed with SizeError against the truncated slice. Copy the truncated mailbox data into a zeroed fixed-size response buffer, then use try_read_from_prefix on the inner CertifyKeyP384Resp which has its own fixed-size cert buffer larger than any real DPE response. Also update test_invoke_dpe_get_profile_cmd to expect max_tci_nodes=64 after the DPE 64-context change (backport of main commit c3fced9).
DPE response types (CertifyKeyP384Resp, DeriveContextResp, GetCertificateChainResp, SignP384Resp, etc.) contain large fixed-size payload buffers (cert[MAX_CERT_SIZE], new_certificate[MAX_CERT_SIZE], certificate_chain[MAX_CHUNK_SIZE]). The runtime serialises these with as_bytes_partial, returning a buffer that is shorter than the full struct size. The previous parse_dpe_response used try_read_from_bytes which requires an exact size match and failed with SizeError. Copy the truncated payload into a zeroed fixed-size buffer before decoding, matching the pattern used by mbx_send_and_check_resp_hdr.
The Caliptra runtime serialises DPE responses via as_bytes_partial, returning only the actual used bytes of variable-size payload fields (X.509/CSR certificates, certificate chains, signatures). The Go verification client decodes into fixed-size structs whose largest variant (ML-DSA CertifyKey) expects a 17408-byte certificate trailer. Without padding, binary.Read into the fixed-size response struct fails with 'unexpected EOF' when the actual cert is smaller than the Certificate array in the Go struct. Pad the response in SendCmd with zeros up to 32 KiB so binary.Read can populate the fixed fields while CertificateSize continues to bound the real cert payload. Also bump GetMaxTciNodes from 32 to 64 to match the DPE context count backport (main commit c3fced9).
Match main branch timeout with 64 contexts.
Signed-off-by: Arthur Heymans <arthur.heymans@9elements.com>
Update hardcoded MAX_HANDLES=64 references to use dpe::MAX_HANDLES symbolically (Rust) or the new value 32 (Go) after enabling the arbitrary_max_handles feature with ARBITRARY_MAX_HANDLES=32. - test_invoke_dpe: assert max_tci_nodes against dpe::MAX_HANDLES - test_reallocate_dpe_context_limits: compute PL1 limit from dpe::MAX_HANDLES instead of hardcoded 64 - dpe_verification/transport.go: GetMaxTciNodes returns 32 Signed-off-by: Arthur Heymans <arthur.heymans@9elements.com>
PL0_DPE_ACTIVE_CONTEXT_DEFAULT_THRESHOLD and PL1_DPE_ACTIVE_CONTEXT_DEFAULT_THRESHOLD were hardcoded to 32 each (total 64), matching the old MAX_HANDLES=64. With arbitrary_max_handles set to 32, the thresholds exceeded the actual handle count, causing test failures: - test_pl1_derive_context_dpe_context_thresholds - test_pl1_init_ctx_dpe_context_thresholds - test_pl0_pl1_reallocation_range - test_dpe_validation_used_context_threshold_exceeded Compute both thresholds as MAX_HANDLES / 2 so they stay consistent with whatever value arbitrary_max_handles provides. Signed-off-by: Arthur Heymans <arthur.heymans@9elements.com>
The test hardcoded pl0_limit=56 to trigger the PL1-less-than-used error. With MAX_HANDLES=32 (total=32), 56 exceeds the total and hits RUNTIME_REALLOCATE_DPE_CONTEXTS_PL0_GREATER_THAN_MAX instead. Use MAX_HANDLES - 4 as the pl0_limit so PL1's share (4) is less than the 12 PL1 contexts created by the test, correctly triggering RUNTIME_REALLOCATE_DPE_CONTEXTS_PL1_LESS_THAN_USED. Signed-off-by: Arthur Heymans <arthur.heymans@9elements.com>
zhalvorsen
approved these changes
Apr 23, 2026
| return Ok(len); | ||
| } | ||
|
|
||
| // Populate the checksum so the full response can be checked at the destination |
Collaborator
There was a problem hiding this comment.
I think we should remove external response support for 2.0. The mailboxes are always big enough to support the full response and this just adds unnecessary complexity.
| return Ok(len); | ||
| } | ||
|
|
||
| // Populate the checksum so the full response can be checked at the destination |
Collaborator
There was a problem hiding this comment.
If we remove it from invoke_dpe.rs we should probably remove it from here as well
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
initialize_dpeintroduced by PR DPE ML-DSA: hybrid feature, crypto trait, 64 contexts, dual profile cmds #3611DeriveContextCmd.execute()call sites toexecute_serialized()with a smallDeriveContextResp-sized buffer (~44 bytes) instead of the defaultexecute()which allocatessize_of::<Response>()(~25KB) on the stackRoot Cause
The new DPE crate (rev
a26db5b,hybridfeature) changedCommandExecution::execute()from a required trait method to a default method that always allocates[0u8; size_of::<Response>()]on the stack. Withml-dsaenabled,Responseis ~25KB (driven byCertifyKeyMldsa87Respcontaining a 22KB cert buffer + 2.5KB ML-DSA public key). Previously each command implementedexecute()directly and only constructed its specific small response variant.initialize_dpeonly runsDeriveContextCmd(producing a ~44 byteDeriveContextResp), but was paying the full 25KB stack cost per call. Combined with:STACK_SIZEreduced from 140KB to 136KBStatestruct growth fromMAX_HANDLESdoubling (32 → 64)This caused the stack to overflow, manifesting as
RUNTIME_GLOBAL_EXCEPTION(0xE000B).Testing
Both previously-failing CI tests now pass locally:
test_rtalias::test_fht_info(wasMailboxCmdFailed(RUNTIME_GLOBAL_EXCEPTION))test_rtalias::test_pcr_log(was failing to build runtime)Follows the same pattern already used in
stash_measurement.rsin PR #3611.