proposal(profiling-ffi): catch_unwind via global panic callback #1974

Draft
r1viollet wants to merge 1 commit into main from r1viollet/profiling-ffi-panic-callback

Conversation

@r1viollet
Contributor

Summary

Exploration PR: one of three sibling proposals for adding catch_unwind at the profiling FFI boundary. Not for merge as-is; it is intended to make the API shape concrete enough to choose between the three.

This variant catches panics inside the FFI body and routes the panic message through a globally registered C callback. The returned ProfileStatus carries a static sentinel c"libdatadog panicked", so the return path performs no allocation on the panic side.

  • New panic_handler module: ddog_prof_set_panic_handler(cb, userdata).
  • New wrap_with_profile_status! macro: catch_unwind around the body, fires the registered handler on caught payloads.
  • Nested catch_unwind inside fire_panic_handler so an OOM during message formatting does not itself unwind.
  • Example migration: ddog_prof_ProfilesDictionary_insert_function. Other ProfileStatus-returning functions are untouched in this PR.
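
To make the shape of this variant easier to picture, here is a minimal sketch of the pattern described in the bullets above. It is not the PR's code: `Status`, `set_panic_handler`, `wrap_with_status`, `count_panics`, and the callback signature are hypothetical stand-ins for `ProfileStatus`, `ddog_prof_set_panic_handler`, and `wrap_with_profile_status!`. It assumes a `Mutex`-guarded global slot and a plain function instead of a macro, and it uses a byte-string sentinel where the PR uses a `c"..."` literal.

```rust
use std::any::Any;
use std::ffi::{c_char, c_void, CStr, CString};
use std::panic::{catch_unwind, AssertUnwindSafe};
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

/// Hypothetical callback shape: (function name, panic message, userdata).
pub type PanicHandler = extern "C" fn(*const c_char, *const c_char, *mut c_void);

/// Registered handler and its userdata (stored as usize so the static is Sync).
/// Assumption: a Mutex keeps the sketch simple; the PR may use atomics instead.
static HANDLER: Mutex<Option<(PanicHandler, usize)>> = Mutex::new(None);

/// Static NUL-terminated sentinel, so the panic return path does not allocate.
static SENTINEL: &[u8] = b"libdatadog panicked\0";

/// Stand-in for ProfileStatus: a null `err` means success.
#[repr(C)]
pub struct Status {
    pub err: *const c_char,
}

/// Stand-in for ddog_prof_set_panic_handler; passing `None` clears the handler.
pub extern "C" fn set_panic_handler(cb: Option<PanicHandler>, userdata: *mut c_void) {
    let mut slot = HANDLER.lock().unwrap_or_else(|e| e.into_inner());
    *slot = cb.map(|f| (f, userdata as usize));
}

/// Formats the payload and calls the registered handler, if there is one.
fn fire_panic_handler(func: &'static CStr, payload: &(dyn Any + Send)) {
    // Nested catch_unwind: a panic while formatting must not unwind further.
    let _ = catch_unwind(AssertUnwindSafe(|| {
        // Copy the registration out so the lock is released before the call.
        let handler = *HANDLER.lock().unwrap_or_else(|e| e.into_inner());
        if let Some((cb, userdata)) = handler {
            let msg: &str = if let Some(s) = payload.downcast_ref::<&str>() {
                *s
            } else if let Some(s) = payload.downcast_ref::<String>() {
                s.as_str()
            } else {
                "unknown panic payload"
            };
            // Messages with interior NUL bytes are skipped in this sketch.
            if let Ok(c_msg) = CString::new(msg) {
                cb(func.as_ptr(), c_msg.as_ptr(), userdata as *mut c_void);
            }
        }
    }));
}

/// Function stand-in for wrap_with_profile_status!: runs `body`, and on a
/// caught panic fires the handler and returns the static sentinel.
pub fn wrap_with_status<F>(func: &'static CStr, body: F) -> Status
where
    F: FnOnce() -> Status,
{
    match catch_unwind(AssertUnwindSafe(body)) {
        Ok(status) => status,
        Err(payload) => {
            fire_panic_handler(func, &*payload);
            Status { err: SENTINEL.as_ptr() as *const c_char }
        }
    }
}

/// Example handler: counts invocations through an AtomicUsize passed as userdata.
pub extern "C" fn count_panics(_func: *const c_char, _msg: *const c_char, userdata: *mut c_void) {
    if let Some(counter) = unsafe { (userdata as *const AtomicUsize).as_ref() } {
        counter.fetch_add(1, Ordering::SeqCst);
    }
}
```

A caller would register `count_panics` with a pointer to a counter, then wrap each FFI body in `wrap_with_status`. With no handler registered, a panicking body still yields the sentinel status, and the only work done on that path is returning a pointer to static data.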

Full design doc: docs/proposals/profiling-ffi-catch-unwind-callback.md.

Sibling proposals

  • r1viollet/profiling-ffi-panic-bit — bit-flag on ProfileStatus, no global state.
  • r1viollet/profiling-ffi-panic-bit-and-callback — both, combined.

Pick one (or none) of the three to take forward to a real RFC.

Test plan

  • `cargo check -p libdd-profiling-ffi` (done locally)
  • `cargo test -p libdd-profiling-ffi` covering handler registration / clearing / re-registration
  • Unit test: panic inside `wrap_with_profile_status!` with no registered handler → static sentinel, no allocation
  • Unit test: panic with registered handler → callback fires with correct function name and message
  • Stress test: concurrent set + fire (atomic ordering)
  • Document the no-reentry contract for the handler
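
As a rough illustration of what the concurrent set + fire stress test could check, the sketch below publishes a handler and its userdata as one heap record behind an `AtomicPtr`, so a reader can never observe a handler paired with another registration's userdata. Everything here is an assumption made for illustration: `Registration`, `SLOT`, `set_handler`, `fire`, `stress`, and `MASK` are hypothetical names, `handler_id` stands in for a real function pointer, and old records are leaked on purpose rather than reclaimed.

```rust
use std::sync::atomic::{AtomicPtr, AtomicUsize, Ordering};
use std::thread;

/// Hypothetical registration record; `handler_id` stands in for the callback pointer.
struct Registration {
    handler_id: usize,
    userdata: usize,
}

/// Arbitrary constant tying each userdata to its handler_id in this test.
const MASK: usize = 0x5a5a;

static SLOT: AtomicPtr<Registration> = AtomicPtr::new(std::ptr::null_mut());

fn set_handler(handler_id: usize, userdata: usize) {
    let fresh = Box::into_raw(Box::new(Registration { handler_id, userdata }));
    // Release publishes the initialised record. The previous record is leaked
    // deliberately so a concurrent reader never dereferences freed memory.
    SLOT.store(fresh, Ordering::Release);
}

fn fire() -> Option<(usize, usize)> {
    // Acquire pairs with the Release store in set_handler.
    let p = SLOT.load(Ordering::Acquire);
    unsafe { p.as_ref() }.map(|r| (r.handler_id, r.userdata))
}

/// Runs `writers` registering threads against one firing thread and returns
/// how many mismatched (handler, userdata) pairs the firing thread observed.
pub fn stress(writers: usize, iterations: usize) -> usize {
    let torn = AtomicUsize::new(0);
    thread::scope(|s| {
        for w in 0..writers {
            s.spawn(move || {
                for i in 0..iterations {
                    let id = w * iterations + i + 1;
                    set_handler(id, id ^ MASK);
                }
            });
        }
        s.spawn(|| {
            for _ in 0..iterations {
                if let Some((id, ud)) = fire() {
                    if ud != (id ^ MASK) {
                        torn.fetch_add(1, Ordering::Relaxed);
                    }
                }
            }
        });
    });
    torn.load(Ordering::Relaxed)
}
```

Storing the callback and userdata in two separate atomics would let a reader see a new callback with stale userdata. The single-pointer record avoids that at the cost of one allocation per registration, which happens outside the panic return path.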

🤖 Generated with Claude Code

Exploration PR — one of three sibling proposals for adding catch_unwind at
the profiling FFI boundary. Routes caught panics through a globally registered
C callback; the returned ProfileStatus carries a static sentinel so the return
path performs no allocation on the panic side.

- New panic_handler module: ddog_prof_set_panic_handler + fire_panic_handler.
- New wrap_with_profile_status! macro: catch_unwind around the body, fires
  the registered handler (if any) on caught payloads, returns
  ProfileStatus::from(c"libdatadog panicked").
- Nested catch_unwind inside fire_panic_handler so OOM during message
  formatting cannot itself unwind.
- One example migration: ddog_prof_ProfilesDictionary_insert_function.

See docs/proposals/profiling-ffi-catch-unwind-callback.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Contributor

📚 Documentation Check Results

⚠️ 769 documentation warning(s) found

📦 libdd-profiling-ffi - 769 warning(s)


Updated: 2026-05-11 14:00:27 UTC | Commit: 507aa8e | missing-docs job results

@github-actions
Contributor

Clippy Allow Annotation Report

Comparing clippy allow annotations between branches:

  • Base Branch: origin/main
  • PR Branch: origin/r1viollet/profiling-ffi-panic-callback

Summary by Rule

| Rule | Base Branch | PR Branch | Change |
| --- | --- | --- | --- |

Annotation Counts by File

| File | Base Branch | PR Branch | Change |
| --- | --- | --- | --- |

Annotation Stats by Crate

| Crate | Base Branch | PR Branch | Change |
| --- | --- | --- | --- |
| clippy-annotation-reporter | 5 | 5 | No change (0%) |
| datadog-ffe-ffi | 1 | 1 | No change (0%) |
| datadog-ipc | 21 | 21 | No change (0%) |
| datadog-live-debugger | 6 | 6 | No change (0%) |
| datadog-live-debugger-ffi | 10 | 10 | No change (0%) |
| datadog-profiling-replayer | 4 | 4 | No change (0%) |
| datadog-remote-config | 3 | 3 | No change (0%) |
| datadog-sidecar | 57 | 57 | No change (0%) |
| libdd-common | 13 | 13 | No change (0%) |
| libdd-common-ffi | 12 | 12 | No change (0%) |
| libdd-data-pipeline | 5 | 5 | No change (0%) |
| libdd-ddsketch | 2 | 2 | No change (0%) |
| libdd-dogstatsd-client | 1 | 1 | No change (0%) |
| libdd-profiling | 13 | 13 | No change (0%) |
| libdd-telemetry | 20 | 20 | No change (0%) |
| libdd-tinybytes | 4 | 4 | No change (0%) |
| libdd-trace-normalization | 2 | 2 | No change (0%) |
| libdd-trace-obfuscation | 8 | 8 | No change (0%) |
| libdd-trace-stats | 1 | 1 | No change (0%) |
| libdd-trace-utils | 15 | 15 | No change (0%) |
| Total | 203 | 203 | No change (0%) |

About This Report

This report tracks Clippy allow annotations for specific rules, showing how they've changed in this PR. Decreasing the number of these annotations generally improves code quality.

@github-actions
Contributor

🔒 Cargo Deny Results

⚠️ 6 issue(s) found, showing only errors (advisories, bans, sources)

📦 libdd-profiling-ffi - 6 error(s)

error[vulnerability]: NSEC3 closest-encloser proof validation enters unbounded loop on cross-zone responses
   ┌─ /home/runner/work/libdatadog/libdatadog/Cargo.lock:90:1
   │
90 │ hickory-proto 0.25.2 registry+https://github.com/rust-lang/crates.io-index
   │ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ security vulnerability detected
   │
   ├ ID: RUSTSEC-2026-0118
   ├ Advisory: https://rustsec.org/advisories/RUSTSEC-2026-0118
   ├ The NSEC3 closest-encloser proof validation in `hickory-proto`'s
     `DnssecDnsHandle` walks from the QNAME up to the SOA owner name, building a
     list of candidate encloser names. The iterator used assumes the
     QNAME is a descendant of the SOA owner, terminating only when the current
     candidate equals the SOA name. When the SOA in a response's authority section
     is not an ancestor of the QNAME, the loop stalls at the DNS root and never
     terminates, repeatedly calling `Name::base_name()` and pushing newly allocated
     `Name` and hashed-name entries into the candidate `Vec`.
     
     The bug is reachable by any caller of `DnssecDnsHandle` — including the
     resolver, recursor, and client — when built with the `dnssec-ring` or
     `dnssec-aws-lc-rs` feature and configured to perform DNSSEC validation. It is
     triggered while validating a NoData or NXDomain response whose authority
     section contains an SOA record from a zone other than an ancestor of the
     QNAME, on a code path that requires NSEC3 closest-encloser proof. In practice
     this can be reached through an insecure CNAME chain that crosses zone
     boundaries into a DNSSEC-signed zone returning NoData, but the minimum
     condition is just a mismatched SOA owner on a response requiring NSEC3
     validation.
     
     A `debug_assert_ne!(name, Name::root())` guards the loop body, so debug builds
     abort with a panic on the first iteration past the root. Release builds
     compile the assertion out and run the loop unbounded, allocating until the
     process exhausts available memory (OOM). A reachable upstream attacker who
     can return such a response can therefore crash a debug-built validator or
     exhaust memory on a release-built one.
     
     The affected code was migrated from `hickory-proto` to `hickory-net` as part of
     the 0.26.0 release. The `hickory-proto` 0.26.x release no longer offers
     `DnssecDnsHandle` and so we recommend all affected users update to `hickory-net`
     0.26.1 when the implementation of that type is required.
   ├ Announcement: https://github.com/hickory-dns/hickory-dns/security/advisories/GHSA-3v94-mw7p-v465
   ├ Solution: No safe upgrade is available!
   ├ hickory-proto v0.25.2
     └── hickory-resolver v0.25.2
         └── reqwest v0.13.2
             ├── libdd-common v4.0.0
             │   ├── libdd-common-ffi v33.0.0
             │   │   └── libdd-profiling-ffi v1.0.0
             │   ├── libdd-profiling v1.0.0
             │   │   ├── (dev) libdd-profiling v1.0.0 (*)
             │   │   └── libdd-profiling-ffi v1.0.0 (*)
             │   └── libdd-profiling-ffi v1.0.0 (*)
             └── libdd-profiling v1.0.0 (*)

error[vulnerability]: CPU exhaustion during message encoding due to O(n²) name compression
   ┌─ /home/runner/work/libdatadog/libdatadog/Cargo.lock:90:1
   │
90 │ hickory-proto 0.25.2 registry+https://github.com/rust-lang/crates.io-index
   │ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ security vulnerability detected
   │
   ├ ID: RUSTSEC-2026-0119
   ├ Advisory: https://rustsec.org/advisories/RUSTSEC-2026-0119
   ├ During message encoding, `hickory-proto`'s `BinEncoder` stores pointers to
     labels that are candidates for name compression in a `Vec<(usize, Vec<u8>)>`.
     The name compression logic then searches for matches with a linear scan.
     
     A malicious message with many records can both introduce many candidate labels,
     and invoke this linear scan many times. This can amplify CPU exhaustion in DoS
     attacks.
     
     This is similar to
     [CVE-2024-8508](https://www.nlnetlabs.nl/downloads/unbound/CVE-2024-8508.txt).
     
     We recommend all affected users update to `hickory-proto` 0.26.1 for the fix.
   ├ Announcement: https://github.com/hickory-dns/hickory-dns/security/advisories/GHSA-q2qq-hmj6-3wpp
   ├ Solution: Upgrade to >=0.26.1 (try `cargo update -p hickory-proto`)
   ├ hickory-proto v0.25.2
     └── hickory-resolver v0.25.2
         └── reqwest v0.13.2
             ├── libdd-common v4.0.0
             │   ├── libdd-common-ffi v33.0.0
             │   │   └── libdd-profiling-ffi v1.0.0
             │   ├── libdd-profiling v1.0.0
             │   │   ├── (dev) libdd-profiling v1.0.0 (*)
             │   │   └── libdd-profiling-ffi v1.0.0 (*)
             │   └── libdd-profiling-ffi v1.0.0 (*)
             └── libdd-profiling v1.0.0 (*)

error[unsound]: Rand is unsound with a custom logger using `rand::rng()`
    ┌─ /home/runner/work/libdatadog/libdatadog/Cargo.lock:171:1
    │
171 │ rand 0.8.5 registry+https://github.com/rust-lang/crates.io-index
    │ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ unsound advisory detected
    │
    ├ ID: RUSTSEC-2026-0097
    ├ Advisory: https://rustsec.org/advisories/RUSTSEC-2026-0097
    ├ It has been reported (by @lopopolo) that the `rand` library is [unsound](https://rust-lang.github.io/unsafe-code-guidelines/glossary.html#soundness-of-code--of-a-library) (i.e. that safe code using the public API can cause Undefined Behaviour) when all the following conditions are met:
      
      - The `log` and `thread_rng` features are enabled
      - A [custom logger](https://docs.rs/log/latest/log/#implementing-a-logger) is defined
      - The custom logger accesses `rand::rng()` (previously `rand::thread_rng()`) and calls any `TryRng` (previously `RngCore`) methods on `ThreadRng`
      - The `ThreadRng` (attempts to) reseed while called from the custom logger (this happens every 64 kB of generated data)
      - Trace-level logging is enabled or warn-level logging is enabled and the random source (the `getrandom` crate) is unable to provide a new seed
      
      `TryRng` (previously `RngCore`) methods for `ThreadRng` use `unsafe` code to cast `*mut BlockRng<ReseedingCore>` to `&mut BlockRng<ReseedingCore>`. When all the above conditions are met this results in an aliased mutable reference, violating the Stacked Borrows rules. Miri is able to detect this violation in sample code. Since construction of [aliased mutable references is Undefined Behaviour](https://doc.rust-lang.org/stable/nomicon/references.html), the behaviour of optimized builds is hard to predict.
    ├ Announcement: https://github.com/rust-random/rand/pull/1763
    ├ Solution: Upgrade to >=0.10.1 OR <0.10.0, >=0.9.3 OR <0.9.0, >=0.8.6 (try `cargo update -p rand`)
    ├ rand v0.8.5
      ├── libdd-common v4.0.0
      │   ├── libdd-common-ffi v33.0.0
      │   │   └── libdd-profiling-ffi v1.0.0
      │   ├── libdd-profiling v1.0.0
      │   │   ├── (dev) libdd-profiling v1.0.0 (*)
      │   │   └── libdd-profiling-ffi v1.0.0 (*)
      │   └── libdd-profiling-ffi v1.0.0 (*)
      ├── libdd-profiling v1.0.0 (*)
      └── proptest v1.5.0
          └── (dev) libdd-profiling v1.0.0 (*)

error[vulnerability]: Name constraints for URI names were incorrectly accepted
    ┌─ /home/runner/work/libdatadog/libdatadog/Cargo.lock:195:1
    │
195 │ rustls-webpki 0.103.10 registry+https://github.com/rust-lang/crates.io-index
    │ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ security vulnerability detected
    │
    ├ ID: RUSTSEC-2026-0098
    ├ Advisory: https://rustsec.org/advisories/RUSTSEC-2026-0098
    ├ Name constraints for URI names were ignored and therefore accepted.
      
      Note this library does not provide an API for asserting URI names, and URI name constraints are otherwise not implemented.  URI name constraints are now rejected unconditionally.
      
      Since name constraints are restrictions on otherwise properly-issued certificates, this bug is reachable only after signature verification and requires misissuance to exploit.
      
      This vulnerability is identified as [GHSA-965h-392x-2mh5](https://github.com/rustls/webpki/security/advisories/GHSA-965h-392x-2mh5). Thank you to @1seal for the report.
    ├ Solution: Upgrade to >=0.103.12, <0.104.0-alpha.1 OR >=0.104.0-alpha.6 (try `cargo update -p rustls-webpki`)
    ├ rustls-webpki v0.103.10
      ├── rustls v0.23.37
      │   ├── hyper-rustls v0.27.7
      │   │   ├── libdd-common v4.0.0
      │   │   │   ├── libdd-common-ffi v33.0.0
      │   │   │   │   └── libdd-profiling-ffi v1.0.0
      │   │   │   ├── libdd-profiling v1.0.0
      │   │   │   │   ├── (dev) libdd-profiling v1.0.0 (*)
      │   │   │   │   └── libdd-profiling-ffi v1.0.0 (*)
      │   │   │   └── libdd-profiling-ffi v1.0.0 (*)
      │   │   └── reqwest v0.13.2
      │   │       ├── libdd-common v4.0.0 (*)
      │   │       └── libdd-profiling v1.0.0 (*)
      │   ├── libdd-common v4.0.0 (*)
      │   ├── libdd-profiling v1.0.0 (*)
      │   ├── reqwest v0.13.2 (*)
      │   ├── rustls-platform-verifier v0.6.2
      │   │   ├── libdd-profiling v1.0.0 (*)
      │   │   └── reqwest v0.13.2 (*)
      │   └── tokio-rustls v0.26.0
      │       ├── hyper-rustls v0.27.7 (*)
      │       ├── libdd-common v4.0.0 (*)
      │       └── reqwest v0.13.2 (*)
      └── rustls-platform-verifier v0.6.2 (*)

error[vulnerability]: Name constraints were accepted for certificates asserting a wildcard name
    ┌─ /home/runner/work/libdatadog/libdatadog/Cargo.lock:195:1
    │
195 │ rustls-webpki 0.103.10 registry+https://github.com/rust-lang/crates.io-index
    │ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ security vulnerability detected
    │
    ├ ID: RUSTSEC-2026-0099
    ├ Advisory: https://rustsec.org/advisories/RUSTSEC-2026-0099
    ├ Permitted subtree name constraints for DNS names were accepted for certificates asserting a wildcard name.
      
      This was incorrect because, given a name constraint of `accept.example.com`, `*.example.com` could feasibly allow a name of `reject.example.com` which is outside the constraint.
      This is very similar to [CVE-2025-61727](https://go.dev/issue/76442).
      
      Since name constraints are restrictions on otherwise properly-issued certificates, this bug is reachable only after signature verification and requires misissuance to exploit.
      
      This vulnerability is identified as [GHSA-xgp8-3hg3-c2mh](https://github.com/rustls/webpki/security/advisories/GHSA-xgp8-3hg3-c2mh). Thank you to @1seal for the report.
    ├ Solution: Upgrade to >=0.103.12, <0.104.0-alpha.1 OR >=0.104.0-alpha.6 (try `cargo update -p rustls-webpki`)
    ├ rustls-webpki v0.103.10
      ├── rustls v0.23.37
      │   ├── hyper-rustls v0.27.7
      │   │   ├── libdd-common v4.0.0
      │   │   │   ├── libdd-common-ffi v33.0.0
      │   │   │   │   └── libdd-profiling-ffi v1.0.0
      │   │   │   ├── libdd-profiling v1.0.0
      │   │   │   │   ├── (dev) libdd-profiling v1.0.0 (*)
      │   │   │   │   └── libdd-profiling-ffi v1.0.0 (*)
      │   │   │   └── libdd-profiling-ffi v1.0.0 (*)
      │   │   └── reqwest v0.13.2
      │   │       ├── libdd-common v4.0.0 (*)
      │   │       └── libdd-profiling v1.0.0 (*)
      │   ├── libdd-common v4.0.0 (*)
      │   ├── libdd-profiling v1.0.0 (*)
      │   ├── reqwest v0.13.2 (*)
      │   ├── rustls-platform-verifier v0.6.2
      │   │   ├── libdd-profiling v1.0.0 (*)
      │   │   └── reqwest v0.13.2 (*)
      │   └── tokio-rustls v0.26.0
      │       ├── hyper-rustls v0.27.7 (*)
      │       ├── libdd-common v4.0.0 (*)
      │       └── reqwest v0.13.2 (*)
      └── rustls-platform-verifier v0.6.2 (*)

error[vulnerability]: Reachable panic in certificate revocation list parsing
    ┌─ /home/runner/work/libdatadog/libdatadog/Cargo.lock:195:1
    │
195 │ rustls-webpki 0.103.10 registry+https://github.com/rust-lang/crates.io-index
    │ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ security vulnerability detected
    │
    ├ ID: RUSTSEC-2026-0104
    ├ Advisory: https://rustsec.org/advisories/RUSTSEC-2026-0104
    ├ A panic was reachable when parsing certificate revocation lists via [`BorrowedCertRevocationList::from_der`]
      or [`OwnedCertRevocationList::from_der`].  This was the result of mishandling a syntactically valid empty
      `BIT STRING` appearing in the `onlySomeReasons` element of a `IssuingDistributionPoint` CRL extension.
      
      This panic is reachable prior to a CRL's signature being verified.
      
      Applications that do not use CRLs are not affected.
      
      Thank you to @tynus3 for the report.
    ├ Solution: Upgrade to >=0.103.13, <0.104.0-alpha.1 OR >=0.104.0-alpha.7 (try `cargo update -p rustls-webpki`)
    ├ rustls-webpki v0.103.10
      ├── rustls v0.23.37
      │   ├── hyper-rustls v0.27.7
      │   │   ├── libdd-common v4.0.0
      │   │   │   ├── libdd-common-ffi v33.0.0
      │   │   │   │   └── libdd-profiling-ffi v1.0.0
      │   │   │   ├── libdd-profiling v1.0.0
      │   │   │   │   ├── (dev) libdd-profiling v1.0.0 (*)
      │   │   │   │   └── libdd-profiling-ffi v1.0.0 (*)
      │   │   │   └── libdd-profiling-ffi v1.0.0 (*)
      │   │   └── reqwest v0.13.2
      │   │       ├── libdd-common v4.0.0 (*)
      │   │       └── libdd-profiling v1.0.0 (*)
      │   ├── libdd-common v4.0.0 (*)
      │   ├── libdd-profiling v1.0.0 (*)
      │   ├── reqwest v0.13.2 (*)
      │   ├── rustls-platform-verifier v0.6.2
      │   │   ├── libdd-profiling v1.0.0 (*)
      │   │   └── reqwest v0.13.2 (*)
      │   └── tokio-rustls v0.26.0
      │       ├── hyper-rustls v0.27.7 (*)
      │       ├── libdd-common v4.0.0 (*)
      │       └── reqwest v0.13.2 (*)
      └── rustls-platform-verifier v0.6.2 (*)

advisories FAILED, bans ok, sources ok

Updated: 2026-05-11 14:02:01 UTC | Commit: 507aa8e | dependency-check job results

@codecov-commenter

Codecov Report

❌ Patch coverage is 3.33333% with 29 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.52%. Comparing base (91fd13c) to head (758e03c).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1974      +/-   ##
==========================================
- Coverage   72.63%   72.52%   -0.11%     
==========================================
  Files         448      451       +3     
  Lines       73582    73756     +174     
==========================================
+ Hits        53444    53491      +47     
- Misses      20138    20265     +127     
| Components | Coverage Δ |
| --- | --- |
| libdd-crashtracker | 65.25% <ø> (+0.17%) ⬆️ |
| libdd-crashtracker-ffi | 37.68% <ø> (+0.85%) ⬆️ |
| libdd-alloc | 98.77% <ø> (ø) |
| libdd-data-pipeline | 86.34% <ø> (-0.24%) ⬇️ |
| libdd-data-pipeline-ffi | 74.25% <ø> (-1.39%) ⬇️ |
| libdd-common | 79.81% <ø> (ø) |
| libdd-common-ffi | 74.41% <ø> (ø) |
| libdd-telemetry | 69.86% <ø> (+0.49%) ⬆️ |
| libdd-telemetry-ffi | 19.37% <ø> (ø) |
| libdd-dogstatsd-client | 82.64% <ø> (ø) |
| datadog-ipc | 74.75% <ø> (-1.47%) ⬇️ |
| libdd-profiling | 81.32% <3.33%> (-0.26%) ⬇️ |
| libdd-profiling-ffi | 63.40% <3.33%> (-1.12%) ⬇️ |
| libdd-sampling | 97.25% <ø> (ø) |
| datadog-sidecar | 29.23% <ø> (-0.61%) ⬇️ |
| datadog-sidecar-ffi | 10.33% <ø> (-2.89%) ⬇️ |
| spawn-worker | 54.69% <ø> (ø) |
| libdd-tinybytes | 93.16% <ø> (ø) |
| libdd-trace-normalization | 81.71% <ø> (ø) |
| libdd-trace-obfuscation | 87.26% <ø> (ø) |
| libdd-trace-protobuf | 68.25% <ø> (ø) |
| libdd-trace-utils | 89.31% <ø> (+0.04%) ⬆️ |
| libdd-tracer-flare | 86.88% <ø> (ø) |
| libdd-log | 74.83% <ø> (ø) |

@datadog-prod-us1-3

Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

🎯 Code Coverage (details)
Patch Coverage: 3.33%
Overall Coverage: 72.52% (-0.02%)

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 758e03c
