apollo_router.client spans are not the parent of downstream HTTP-level spans (span_id not injected as outbound parent-id) #9530

@jfairchild

Describe the bug

The apollo_router.client span (OTel http_request with span.kind=client) created in apollo-router/src/services/external.rs:333 is recorded with the correct trace ID and parent (the surrounding external_plugin / fetch span), but its own span_id is never injected as the outbound parent-id (W3C traceparent, Datadog x-datadog-parent-id, B3 X-B3-ParentSpanId). Every downstream span produced on the wire — service-mesh sidecars, the target service's server span, etc. — parents back to the surrounding span, not to the apollo_router.client span. As a result, apollo_router.client spans appear as leaves in the flamegraph even though they wrap a real network call with real children.

Likely root cause

Order of operations inside Externalizable::call:

// apollo-router/src/services/external.rs (v2.13.1)
let http_req_span = tracing::info_span!(HTTP_REQUEST_SPAN_NAME,             // line 333: span CREATED, not entered
    "otel.kind" = "CLIENT",
    "http.request.method" = "POST",
    /* ... */
    "otel.original_name" = "http_request",
);

get_text_map_propagator(|propagator| {
    propagator.inject_context(                                              // lines 343-347: headers INJECTED
        &prepare_context(http_req_span.context()),                          //   before span is entered
        &mut crate::otel_compat::HeaderInjector(http_request.headers_mut()),
    );
});

// ... later ...

let response = client.call(request).instrument(http_req_span).await?;       // line 355: span ENTERED

tracing::info_span! creates a span but does not make it the current span; that requires .in_scope(), .entered(), or .instrument(). When propagator.inject_context(...) runs at lines 343-347, http_req_span has not yet been entered, so the OTel context tied to it still has the parent's span as the active span. The propagator therefore writes the parent's span_id (e.g. that of the surrounding external_plugin / subgraph_request span) into the outbound parent-id headers. The new client span is registered locally under the right parent for in-process tracing, but its identity never leaves the process.
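To make the suspected ordering problem concrete, here is a std-only toy model. It is a sketch of the hypothesis above, not the Router's code and not the tracing or opentelemetry API: SpanId, ENTERED, in_scope, and inject_parent_id are all hypothetical stand-ins, and the model assumes the propagator reads whichever span is current at injection time.

```rust
use std::cell::RefCell;

// Toy stand-in for a span identifier; NOT a `tracing` or `opentelemetry` type.
#[derive(Clone, Copy, Debug, PartialEq)]
struct SpanId(u64);

thread_local! {
    // Stack of entered spans; the top of the stack is the "current" span.
    static ENTERED: RefCell<Vec<SpanId>> = RefCell::new(Vec::new());
}

// The span that is current on this thread, if any.
fn current_span() -> Option<SpanId> {
    ENTERED.with(|s| s.borrow().last().copied())
}

// Run `f` with `span` as the current span (the idea behind `Span::in_scope`).
fn in_scope<T>(span: SpanId, f: impl FnOnce() -> T) -> T {
    ENTERED.with(|s| s.borrow_mut().push(span));
    let out = f();
    ENTERED.with(|s| {
        s.borrow_mut().pop();
    });
    out
}

// Hypothetical propagator: emits the current span's id as the outbound parent-id.
fn inject_parent_id() -> Option<SpanId> {
    current_span()
}

// Suspected buggy order: the client span exists but is never entered before injection.
fn inject_before_entering(parent: SpanId, _client: SpanId) -> Option<SpanId> {
    in_scope(parent, || inject_parent_id())
}

// Proposed order: enter the client span first, then inject.
fn inject_after_entering(parent: SpanId, client: SpanId) -> Option<SpanId> {
    in_scope(parent, || in_scope(client, inject_parent_id))
}
```

In this model, inject_before_entering yields the parent's id and inject_after_entering yields the client's id, which matches the symptom described in this issue. Whether prepare_context and http_req_span.context() really behave this way in the Router is exactly what needs confirming.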

To Reproduce

  1. Run Apollo Router with a coprocessor configured (any stage will do — RouterRequest, SubgraphRequest, etc.).
  2. Run an OTel-compatible target receiving the coprocessor traffic that emits its own server span (anything that reads the incoming traceparent and creates a child span — a stock OTel-instrumented HTTP server is sufficient).
  3. Send a single request through the router.
  4. In the resulting trace, look at the apollo_router.client span:
    • Filter / search for spans whose parent_id equals the apollo_router.client span's span_id: the result is 0 spans.
    • The target service's server span will be parented to the outer fetch span (e.g. external_plugin) instead, with apollo_router.client appearing as a sibling leaf.
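If the trace can be exported as a flat list of spans, the check in step 4 can be scripted. The following is a minimal sketch under that assumption; SpanRecord and children_of are hypothetical names, and real exporters will have their own field names and id encodings.

```rust
// Hypothetical flat record for one exported span.
#[derive(Debug, Clone, PartialEq)]
struct SpanRecord {
    name: String,
    span_id: String,
    parent_id: Option<String>,
}

// Return every span whose parent_id equals `span_id`.
fn children_of<'a>(spans: &'a [SpanRecord], span_id: &str) -> Vec<&'a SpanRecord> {
    spans
        .iter()
        .filter(|s| s.parent_id.as_deref() == Some(span_id))
        .collect()
}
```

With the behavior reported here, calling children_of with the apollo_router.client span's span_id returns an empty list, while calling it with the surrounding external_plugin span's span_id returns both the client span and the target service's server span.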

The same shape will appear with subgraph HTTP fetches if the codepath there does the equivalent (this issue is specifically about external.rs, but worth checking).

Expected behavior

apollo_router.client should be the parent of the downstream HTTP-level spans. Concretely: the parent-id written into the outbound request headers should equal the apollo_router.client span's own span_id, so that the receiving service's server span is a child of apollo_router.client, which in turn is a child of the surrounding fetch span. Today the chain skips the client span entirely.
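For the W3C case, the expectation can be checked directly on the receiving side by comparing the parent-id field of the incoming traceparent header with the client span's span_id. A minimal sketch, assuming version 00 headers; traceparent_parent_id and is_parented_to are hypothetical helper names, and the parser does not reject all-zero ids or the reserved version ff.

```rust
// Extract the parent-id field from a W3C `traceparent` header value.
// Version 00 layout: "{version}-{trace-id}-{parent-id}-{trace-flags}",
// with 2, 32, 16 and 2 lowercase hex characters respectively.
fn traceparent_parent_id(header: &str) -> Option<&str> {
    let parts: Vec<&str> = header.trim().split('-').collect();
    if parts.len() != 4 {
        return None;
    }
    let expected_lens = [2usize, 32, 16, 2];
    let well_formed = parts.iter().zip(expected_lens).all(|(p, len)| {
        p.len() == len && p.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'))
    });
    if !well_formed {
        return None;
    }
    Some(parts[2])
}

// True if the header names `client_span_id` as the parent of the next span.
fn is_parented_to(header: &str, client_span_id: &str) -> bool {
    traceparent_parent_id(header) == Some(client_span_id)
}
```

With the fix in place, is_parented_to(header, client_span_id) should hold for the header the target service receives, where client_span_id is the hex span_id of the apollo_router.client span. Today it would instead match the span_id of the surrounding span.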

Suggested fix

Either:

  1. Enter the span before injecting headers, e.g. wrap the propagator call in http_req_span.in_scope(|| { ... }). This makes http_req_span the active span at injection time, so the propagator emits its span_id as parent-id.

  2. Or, extract the new span's OTel context explicitly using OpenTelemetrySpanExt::context() semantics that yield the new span as the active span (rather than relying on it being the current span). The prepare_context helper appears intended to handle this — worth verifying whether it's correctly preferring the wrapped span over the prior current span.

Either fix should be a small change. Happy to send a PR if a maintainer confirms the intended behavior here.
