Fix stack overflow in initialize_dpe by using execute_serialized by ArthurHeymans · Pull Request #3622 · chipsalliance/caliptra-sw

ArthurHeymans · 2026-04-15T16:15:57Z

Summary

Fix stack overflow (~16KB over 136KB budget) during initialize_dpe introduced by PR DPE ML-DSA: hybrid feature, crypto trait, 64 contexts, dual profile cmds #3611
Switch two DeriveContextCmd.execute() call sites to execute_serialized() with a small DeriveContextResp-sized buffer (~44 bytes) instead of the default execute() which allocates size_of::<Response>() (~25KB) on the stack

Root Cause

The new DPE crate (rev a26db5b, hybrid feature) changed CommandExecution::execute() from a required trait method to a default method that always allocates [0u8; size_of::<Response>()] on the stack. With ml-dsa enabled, Response is ~25KB (driven by CertifyKeyMldsa87Resp containing a 22KB cert buffer + 2.5KB ML-DSA public key). Previously each command implemented execute() directly and only constructed its specific small response variant.

initialize_dpe only runs DeriveContextCmd (producing a ~44 byte DeriveContextResp), but was paying the full 25KB stack cost per call. Combined with:

STACK_SIZE reduced from 140KB to 136KB
State struct growth from MAX_HANDLES doubling (32 → 64)

This caused the stack to overflow, manifesting as RUNTIME_GLOBAL_EXCEPTION (0xE000B).

Testing

Both previously-failing CI tests now pass locally:

test_rtalias::test_fht_info (was MailboxCmdFailed(RUNTIME_GLOBAL_EXCEPTION))
test_rtalias::test_pcr_log (was failing to build runtime)

Follows the same pattern already used in stash_measurement.rs in PR #3611.

…ommands Squashed backport of: - [dpe] Enable hybrid DPE feature (chipsalliance#3243) - [dpe] Add ML-DSA DPE Crypto trait (chipsalliance#3315) - [dpe] Add support for 64 DPE contexts (chipsalliance#3246) - [dpe] Add ML-DSA DPE command (chipsalliance#3326) - [dpe] Test ML-DSA CertifyKey (chipsalliance#3357) - [dpe] Test using external mu with ML-DSA profile (chipsalliance#3371) - [dpe] reduce nesting of InvokeDpeCmd::execute (chipsalliance#3386) - [dpe] Add optional ML-DSA response over DMA (chipsalliance#3391) - [dpe] Add HW model support for large DPE responses (chipsalliance#3403) - [dpe] Support both DPE profiles in more tests (chipsalliance#3407) - [dpe] Add a test for worst case scenario certs and CSRs (chipsalliance#3417) - [dpe] Fail early for response sizing (chipsalliance#3415) - [dpe] Add ML-DSA support to CertifyKeyExtended (chipsalliance#3426) - [dpe] Test ML-DSA profile with golang verification (chipsalliance#3454) - [dpe] Add comment that initialization is profile agnostic (chipsalliance#3460) DPE rev: a26db5b — last rev bumped by PRs in this list (chipsalliance#3371). Not updated to caliptra-dpe main (cfc9a71) because DPE commit 337f7e4 (Update CFI chipsalliance#531) renamed caliptra-cfi packages from caliptra-cfi-*-git to caliptra-cfi-*, which conflicts with caliptra-2.0 CFI. Updating 2.0 CFI to match would be a ROM change. caliptra-2.0 adaptations: - DpeMldsaCrypto uses MldsaReg (2.0 has dedicated ML-DSA HW register; main uses AbrReg shared Adams Bridge register for ML-DSA and ML-KEM) - SignData::Mu returns CryptoError::NotImplemented (no external-mu on 2.0) - Removed ML-KEM commands (no ML-KEM HW on 2.0) - Removed main-only commands (ocp_lock, shake256) - Flat PersistentData paths (2.0 uses single struct, not rom/fw split) - Kept 2.0 caliptra-cfi-*-git package names to avoid ROM recompilation - STACK_SIZE=136KB (main uses 154KB; 2.0 has ~30KB more PersistentData) - RUNTIME_SIZE=176KB (main uses 210KB; 2.0 has fewer runtime features) Known issue: firmware stack overflow (~16KB over 136KB budget) during initialize_dpe. The hybrid DPE Response enum is ~25KB (MAX_CERT_SIZE = 22KB) and is allocated on the stack by CommandExecution::execute(). Needs stack space or DCCM space resolution.

The DPE CommandExecution::execute() default implementation allocates [0u8; size_of::<Response>()] on the stack. With the ml-dsa feature enabled, size_of::<Response>() is ~25KB (driven by CertifyKeyMldsa87Resp containing a 22KB cert buffer and 2.5KB public key). initialize_dpe only runs DeriveContextCmd which produces a ~44 byte DeriveContextResp, but the default execute() always allocates the full 25KB buffer regardless of command type. Combined with the reduced STACK_SIZE (136KB vs 140KB) and larger State on stack (MAX_HANDLES doubled from 32 to 64), this caused a stack overflow (~16KB over budget) manifesting as RUNTIME_GLOBAL_EXCEPTION. Switch to execute_serialized() with a DeriveContextResp-sized buffer, matching the pattern already used in stash_measurement.rs.

Three unused imports introduced in the parent commit caused CI compile failures with -Dwarnings: - runtime/src/sign_with_exported_ecdsa.rs: cfi_assert, cfi_launder, cfi_check - runtime/src/lib.rs: MailboxReqHeader Verified locally with the same cargo --config that CI uses: cargo --config 'target."cfg(all())".rustflags = ["-Dwarnings"]' \ build ... -p caliptra-runtime --bin caliptra-runtime cargo --config 'target."cfg(all())".rustflags = ["-Dwarnings"]' \ clippy --locked --all-targets -- -D warnings

The DPE crate writes a full `dpe::Response` typed view (~25KB driven by `CertifyKeyMldsa87Resp`) into the staging buffer regardless of the actual cert size. Previously, `execute_command` allocated a `MAX_RESP_SIZE`-sized (\~25KB) buffer once at the dispatcher level for all commands. Combined with deep call chains (e.g. `AttestedCsrCmd` already burns ~16KB in its own handler for `scratch` + `env_csr_eat`), this overflowed the runtime's tight 136KB stack budget on `GET_ATTESTED_*_CSR`. Route the ~25KB-needing commands (`CERTIFY_KEY_EXTENDED_*` and `INVOKE_DPE_*`) through dedicated functions so that buffer only exists on the stack for those call paths. Other commands now use a much smaller staging buffer sized for `AttestedCsrResp` (~12.8KB), which is the largest non-DPE response. A const assertion enumerates all variants routed through the small buffer so adding a new larger response forces an explicit decision.

CI runs with -Dwarnings, so the leftover MAX_RESP_SIZE import and the redundant caliptra_common::mailbox_api::* import in the COMMON_RESP_BUF_SIZE sanity-check block both cause build failures. Drop them.

…dispatch `execute_command` previously allocated the ~12.8KB staging response buffer up front for every mailbox command. Because it was inlined into `handle_mailbox_commands`, that buffer stayed live on the stack for commands that do not need it, including `GET_ATTESTED_MLDSA87_CSR` whose MLDSA keypair/PCT/sign call chain was overflowing the 136KB runtime stack. To keep the staging buffer off the stack during heavy work: * Split `execute_command` into a small dispatcher plus `execute_command_with_common_resp`, marked `#[inline(never)]`, so the common buffer only exists on the stack for commands that actually use it. `handle_command` and `execute_command` are pinned out-of-line to prevent re-inlining. * Route `GET_ATTESTED_{ECC384,MLDSA87}_CSR` through dedicated dispatch paths, alongside the existing `CERTIFY_KEY_EXTENDED_*` and `INVOKE_DPE_*` ones. Because these commands no longer flow through the common buffer, `AttestedCsrResp` can be dropped from `COMMON_RESP_BUF_SIZE`, shrinking it from ~12.8KB to ~9.2KB (`VarSizeDataResp`). * Split `AttestedEccCsrCmd::execute` and `AttestedMldsaCsrCmd::execute` into a `prepare` phase (CSR EAT claims + RT alias public key + subject key identifier, including the expensive MLDSA keypair/PCT) and a `sign_and_finalize` phase (allocate response buffer, compute COSE Sign1 signature, write to mailbox). The mailbox response buffer is now only allocated after the heavy cryptographic work completes, so the 9.2KB buffer and the MLDSA `sign_var` frame never coexist. * `generate_attested_{ecc,mldsa}_csr` now take the pre-computed RT alias public key as a parameter, eliminating a redundant second MLDSA keypair generation that used to happen during the signing phase with the response buffer live. The previously failing `test_get_attested_rt_alias_mldsa_csr` now passes, and the other `test_attested_csr` tests continue to pass.

Backport test updates from main (commit c3fced9) that were missed when the DPE context count was increased from 32 to 64 on caliptra-2.0. These tests were previously masked by earlier test failures (e.g. stack overflow in test_get_attested_rt_alias_mldsa_csr) that prevented the reallocate tests from running to their assertions.

The CertifyKeyExtendedResp type has a huge fixed-size payload buffer (~25KB) that the mailbox truncates to the actual response size. The tests were calling read_from_bytes which requires an exact size match and failed with SizeError against the truncated slice. Copy the truncated mailbox data into a zeroed fixed-size response buffer, then use try_read_from_prefix on the inner CertifyKeyP384Resp which has its own fixed-size cert buffer larger than any real DPE response. Also update test_invoke_dpe_get_profile_cmd to expect max_tci_nodes=64 after the DPE 64-context change (backport of main commit c3fced9).

DPE response types (CertifyKeyP384Resp, DeriveContextResp, GetCertificateChainResp, SignP384Resp, etc.) contain large fixed-size payload buffers (cert[MAX_CERT_SIZE], new_certificate[MAX_CERT_SIZE], certificate_chain[MAX_CHUNK_SIZE]). The runtime serialises these with as_bytes_partial, returning a buffer that is shorter than the full struct size. The previous parse_dpe_response used try_read_from_bytes which requires an exact size match and failed with SizeError. Copy the truncated payload into a zeroed fixed-size buffer before decoding, matching the pattern used by mbx_send_and_check_resp_hdr.

The Caliptra runtime serialises DPE responses via as_bytes_partial, returning only the actual used bytes of variable-size payload fields (X.509/CSR certificates, certificate chains, signatures). The Go verification client decodes into fixed-size structs whose largest variant (ML-DSA CertifyKey) expects a 17408-byte certificate trailer. Without padding, binary.Read into the fixed-size response struct fails with 'unexpected EOF' when the actual cert is smaller than the Certificate array in the Go struct. Pad the response in SendCmd with zeros up to 32 KiB so binary.Read can populate the fixed fields while CertificateSize continues to bound the real cert payload. Also bump GetMaxTciNodes from 32 to 64 to match the DPE context count backport (main commit c3fced9).

Match main branch timeout with 64 contexts.

Signed-off-by: Arthur Heymans <arthur.heymans@9elements.com>

Update hardcoded MAX_HANDLES=64 references to use dpe::MAX_HANDLES symbolically (Rust) or the new value 32 (Go) after enabling the arbitrary_max_handles feature with ARBITRARY_MAX_HANDLES=32. - test_invoke_dpe: assert max_tci_nodes against dpe::MAX_HANDLES - test_reallocate_dpe_context_limits: compute PL1 limit from dpe::MAX_HANDLES instead of hardcoded 64 - dpe_verification/transport.go: GetMaxTciNodes returns 32 Signed-off-by: Arthur Heymans <arthur.heymans@9elements.com>

PL0_DPE_ACTIVE_CONTEXT_DEFAULT_THRESHOLD and PL1_DPE_ACTIVE_CONTEXT_DEFAULT_THRESHOLD were hardcoded to 32 each (total 64), matching the old MAX_HANDLES=64. With arbitrary_max_handles set to 32, the thresholds exceeded the actual handle count, causing test failures: - test_pl1_derive_context_dpe_context_thresholds - test_pl1_init_ctx_dpe_context_thresholds - test_pl0_pl1_reallocation_range - test_dpe_validation_used_context_threshold_exceeded Compute both thresholds as MAX_HANDLES / 2 so they stay consistent with whatever value arbitrary_max_handles provides. Signed-off-by: Arthur Heymans <arthur.heymans@9elements.com>

The test hardcoded pl0_limit=56 to trigger the PL1-less-than-used error. With MAX_HANDLES=32 (total=32), 56 exceeds the total and hits RUNTIME_REALLOCATE_DPE_CONTEXTS_PL0_GREATER_THAN_MAX instead. Use MAX_HANDLES - 4 as the pl0_limit so PL1's share (4) is less than the 12 PL1 contexts created by the test, correctly triggering RUNTIME_REALLOCATE_DPE_CONTEXTS_PL1_LESS_THAN_USED. Signed-off-by: Arthur Heymans <arthur.heymans@9elements.com>

zhalvorsen · 2026-04-23T15:14:34Z

+            return Ok(len);
+        }
+
+        // Populate the checksum so the full response can be checked at the destination


I think we should remove external response support for 2.0. The mailboxes are always big enough to support the full response and this just adds unnecessary complexity.

zhalvorsen · 2026-04-23T15:18:20Z

+            return Ok(len);
+        }
+
+        // Populate the checksum so the full response can be checked at the destination


If we remove it from invoke_dpe.rs we should probably remove it from here as well

parvathib and others added 12 commits April 13, 2026 06:52

Fix unused imports in runtime lib after dispatch split

8562d77

CI runs with -Dwarnings, so the leftover MAX_RESP_SIZE import and the redundant caliptra_common::mailbox_api::* import in the COMMON_RESP_BUF_SIZE sanity-check block both cause build failures. Drop them.

Increase test_pl0_pl1_reallocation_range timeout

6d7733c

Match main branch timeout with 64 contexts.

Reduce number of handles to keep ROM layout

2c5a80e

Signed-off-by: Arthur Heymans <arthur.heymans@9elements.com>

ArthurHeymans marked this pull request as ready for review April 22, 2026 14:33

ArthurHeymans requested review from JohnTraverAmd, ajisaxena, bluegate010, fdamato, jhand2, korran, mhatrevi, rusty1968, swenson, timothytrippel, vsonims and zhalvorsen as code owners April 22, 2026 14:33

ArthurHeymans added 3 commits April 22, 2026 17:09

zhalvorsen approved these changes Apr 23, 2026

View reviewed changes

ArthurHeymans marked this pull request as draft April 23, 2026 16:01

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix stack overflow in initialize_dpe by using execute_serialized#3622

Fix stack overflow in initialize_dpe by using execute_serialized#3622
ArthurHeymans wants to merge 15 commits intochipsalliance:caliptra-2.0from
ArthurHeymans:fix-dpe-stack-overflow

ArthurHeymans commented Apr 15, 2026

Uh oh!

zhalvorsen Apr 23, 2026

Uh oh!

zhalvorsen Apr 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

ArthurHeymans commented Apr 15, 2026

Summary

Root Cause

Testing

Uh oh!

zhalvorsen Apr 23, 2026

Choose a reason for hiding this comment

Uh oh!

zhalvorsen Apr 23, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants