Bug Report
Short-lived JVMs configured to reach the Agent over a UDS intermittently die with a JVM-level SIGBUS (si_code 2 BUS_ADRERR) — a native crash, not an application error — while the tracer's metrics subsystem brings up its DogStatsD connection.
The root cause is upstream in jnr/jffi (filed as jnr/jffi#194): when DogStatsD is sent over a Unix socket, the bundled com.datadoghq:java-dogstatsd-client constructs a jnr.unixsocket.UnixSocketAddress, which initializes jnr-ffi → jffi. jffi's StubLoader.unpackLibrary extracts its native stub with an InputStream.available()-guarded copy loop that silently truncates the .so (it does no length/digest check on a fresh extraction). System.load() of the short stub then faults in ld.so past EOF → SIGBUS/BUS_ADRERR.
The extracted …/jffi<rand>.so was truncated at a 4 KiB boundary; the dynamic linker's relocation write into the missing final page hit EOF → BUS_ADRERR.
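The failure mode described above can be illustrated in isolation. The following is a minimal sketch, not jffi's actual code: `StallingStream`, `copyWhileAvailable`, and `copyUntilEof` are hypothetical names, and the 4096-byte stall point is chosen only to mirror the 4 KiB truncation reported here. It relies on the documented `InputStream` contract, under which `available()` returning 0 means only that a read might block, not that the stream is exhausted.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class CopyLoopSketch {
    /** Hypothetical stream that reports available() == 0 at stallAt although more data remains. */
    static final class StallingStream extends InputStream {
        private final byte[] data;
        private final int stallAt;
        private int pos;

        StallingStream(byte[] data, int stallAt) {
            this.data = data;
            this.stallAt = stallAt;
        }

        @Override public int read() {
            return pos < data.length ? (data[pos++] & 0xff) : -1;
        }

        @Override public int read(byte[] b, int off, int len) {
            if (len == 0) return 0;
            if (pos >= data.length) return -1;
            // Never serve bytes across the stall point in a single call.
            int limit = pos < stallAt ? stallAt : data.length;
            int n = Math.min(len, limit - pos);
            System.arraycopy(data, pos, b, off, n);
            pos += n;
            return n;
        }

        @Override public int available() {
            // Allowed by the contract: 0 is not an end-of-stream signal.
            return pos < stallAt ? stallAt - pos : 0;
        }
    }

    /** Unsafe pattern: stops as soon as available() reports 0, so the copy can end early. */
    static long copyWhileAvailable(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[1024];
        long total = 0;
        while (in.available() > 0) {
            int n = in.read(buf);
            if (n < 0) break;
            out.write(buf, 0, n);
            total += n;
        }
        return total;
    }

    /** Safe pattern: only read() returning -1 ends the copy. */
    static long copyUntilEof(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[1024];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[10_000];
        long unsafe = copyWhileAvailable(new StallingStream(data, 4096), new ByteArrayOutputStream());
        long safe = copyUntilEof(new StallingStream(data, 4096), new ByteArrayOutputStream());
        System.out.println("available()-guarded copy: " + unsafe + " of " + data.length + " bytes");
        System.out.println("read-until-EOF copy: " + safe + " of " + data.length + " bytes");
    }
}
```

Comparing the returned byte count against the expected resource length after the copy would be one way to turn a silent truncation into a catchable error before `System.load()` is attempted.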
Scope / notes:
Only the DogStatsD path is affected. The Agent's own trace/EVP transport over UDS uses the JDK-native socket (dd.jdk.socket.enabled, default true) and does not load jffi — consistent with jffi initializing ~20s in, inside the DogStatsD connect task, never at startup. The bundled java-dogstatsd-client is the sole remaining jnr/jffi consumer.
Tracer metrics are on by default, so this can fire in any UDS-configured JVM that lives long enough to run the periodic StatsD connect.
It is a timing race in jffi's copy loop — rare per-extraction, but frequent across a high-volume / CPU-saturated CI fleet (many thousands of short-lived JVMs). Long-lived production processes (one extraction at controlled startup) effectively never hit it.
The crash kills the JVM before data is flushed, so it is invisible in APM / CI Visibility and only recoverable from captured hs_err files.
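The point that a rare per-extraction race becomes frequent across a fleet can be made concrete with a small worked calculation. The numbers below are purely hypothetical (the report gives no measured rate), and the extractions are assumed to be independent. With a per-extraction truncation probability $p$ and $n$ JVM starts:

$$
P(\text{at least one crash}) = 1 - (1 - p)^{n}
$$

For an illustrative $p = 10^{-4}$ and $n = 10^{4}$ short-lived JVMs, this gives $1 - (1 - 10^{-4})^{10^{4}} \approx 1 - e^{-1} \approx 0.63$, while a single long-lived process ($n = 1$) sees only $P = p = 10^{-4}$.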
Expected Behavior
Configuring the Agent connection as a UDS (and the tracer emitting its own metrics) must not be able to crash the host JVM. DogStatsD over UDS should not pull in a native FFI stub whose extraction can fail unsafely.
Reproduction Code
No deterministic repro — it is a timing race in jffi's stub extraction (see jnr/jffi#194). It reproduces statistically under load with:
dd-java-agent 1.62.0 attached, JDK 21
DD_TRACE_AGENT_URL=unix:///var/run/datadog/apm.socket (so DogStatsD also resolves to a UDS)
default tracer metrics (health metrics on)
many short-lived JVMs on CPU-saturated hosts
Diagnosed from captured -XX:ErrorFile hs_err logs showing the stack above and a truncated jffi*.so.
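When triaging a captured jffi*.so, one way to confirm truncation without a reference copy is to compare the file size with what the ELF header itself implies. The sketch below is an assumption-laden illustration, not part of the tracer or jffi: `ElfTruncationCheck` and its methods are hypothetical names, it handles ELF64 only, and it uses the end of the section header table as a lower bound on file size. That bound is not meaningful for a library whose section headers were stripped (`e_shoff == 0`).

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class ElfTruncationCheck {
    /** Lower bound on file size from an ELF64 header: e_shoff + e_shentsize * e_shnum. */
    static long minimumSizeFromHeader(ByteBuffer header) {
        ByteBuffer h = header.slice(); // index 0 is the start of the header
        if (h.remaining() < 64) {
            throw new IllegalArgumentException("need the full 64-byte ELF64 header");
        }
        if (h.get(0) != 0x7f || h.get(1) != 'E' || h.get(2) != 'L' || h.get(3) != 'F') {
            throw new IllegalArgumentException("not an ELF file");
        }
        if (h.get(4) != 2) { // EI_CLASS: 2 = ELFCLASS64
            throw new IllegalArgumentException("only ELF64 is handled in this sketch");
        }
        // EI_DATA: 1 = little-endian, 2 = big-endian
        h.order(h.get(5) == 2 ? ByteOrder.BIG_ENDIAN : ByteOrder.LITTLE_ENDIAN);
        long shoff = h.getLong(0x28);               // e_shoff
        int shentsize = h.getShort(0x3A) & 0xffff;  // e_shentsize
        int shnum = h.getShort(0x3C) & 0xffff;      // e_shnum
        return shoff + (long) shentsize * shnum;
    }

    /** True if the file is shorter than its own header says it must be. */
    static boolean looksTruncated(Path sharedObject) throws IOException {
        try (FileChannel ch = FileChannel.open(sharedObject, StandardOpenOption.READ)) {
            ByteBuffer header = ByteBuffer.allocate(64);
            while (header.hasRemaining() && ch.read(header) > 0) {
                // keep reading until the header is full or EOF is reached
            }
            if (header.hasRemaining()) {
                return true; // shorter than an ELF64 header
            }
            header.flip();
            return ch.size() < minimumSizeFromHeader(header);
        }
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            System.out.println(arg + " truncated? " + looksTruncated(Path.of(arg)));
        }
    }
}
```

A more thorough check could also walk the program headers and compare each segment's file offset plus file size against the file length, since those are the ranges the dynamic linker actually maps.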
Tracer Version(s)
1.62.0
Java Version(s)
21.0.6 (Azul Zulu 21.40+17-CA)
JVM Vendor
Azul Systems (Zulu OpenJDK)
Suggested fixes
Fix unpackLibrary upstream (jnr/jffi#194), then pick up the fix / bump jffi once available.
Have java-dogstatsd-client use the JDK-native java.net.UnixDomainSocketAddress (JDK 16+) for UDS, as the Agent transport already does via dd.jdk.socket.enabled. This removes jffi from the metrics path entirely (cf. DataDog/java-dogstatsd-client#68, "Dependency on JFFI when sending metrics to Unix socket", and #85).
Point jffi at a pre-extracted native stub (-Djffi.boot.library.path=…) in the agent so the buggy unpackLibrary copy is never exercised.
Related: jnr/jffi#194, jnr/jffi#46, jnr/jffi#158, DataDog/java-dogstatsd-client#68 / #85 / #258, #7643, #7165.
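For the suggestion to move DogStatsD onto the JDK-native UDS support, a minimal sketch of what a jffi-free send path might look like follows. It is not the client's implementation: `JdkUdsStatsdSketch`, `frame`, and `send` are hypothetical names. One constraint is worth stating plainly: the JDK's `UnixDomainSocketAddress` works with `SocketChannel` and `ServerSocketChannel` only, so this covers stream-mode sockets, not datagram ones. The 4-byte little-endian length prefix per message is an assumption about the receiver's stream framing and should be checked against the Agent documentation.

```java
import java.io.IOException;
import java.net.StandardProtocolFamily;
import java.net.UnixDomainSocketAddress;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

final class JdkUdsStatsdSketch {
    /** Assumed stream framing: 4-byte little-endian payload length, then the payload. */
    static ByteBuffer frame(String metric) {
        byte[] payload = metric.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + payload.length).order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(payload.length);
        buf.put(payload);
        buf.flip();
        return buf;
    }

    /** Sends one framed metric over a stream-mode Unix socket using only JDK 16+ classes. */
    static void send(Path socketPath, String metric) throws IOException {
        UnixDomainSocketAddress address = UnixDomainSocketAddress.of(socketPath);
        try (SocketChannel channel = SocketChannel.open(StandardProtocolFamily.UNIX)) {
            channel.connect(address);
            ByteBuffer buf = frame(metric);
            while (buf.hasRemaining()) {
                channel.write(buf);
            }
        }
    }

    public static void main(String[] args) {
        // Framing only; no socket is needed to run this.
        ByteBuffer framed = frame("example.counter:1|c");
        System.out.println("framed bytes: " + framed.remaining());
    }
}
```

Because nothing here touches jnr-unixsocket, no native stub is extracted or loaded on this path; the trade-off is the stream-only restriction noted above.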