
Cloudflare: Support Custom Server-to-Server Trace IDs #15296


Closed

parisholley opened this issue Feb 5, 2025 · 10 comments

Comments

@parisholley

Problem Statement

I use Inngest to orchestrate all of my event processing tasks. A feature of their platform is being able to break a single workflow up into multiple "steps", all of which are tagged with a single "run id". For tracing, I would like to keep each "sub step" together inside of a single trace but the library only supports reading trace ids from the header (sentry-trace/baggage).

Solution Brainstorm

Allow overriding the trace id instead of relying on Sentry.continueTrace(). My current workaround is running this code within my Inngest middleware:

// v5 comes from the 'uuid' package; it derives a deterministic 32-hex-char id from the run id
Sentry.getActiveSpan()['_traceId'] = v5(ctx.runId, '92f4ea30-d9e0-4750-9d47-69d8c729d79a').replaceAll('-', '');

The only way around this would be to NOT use the out-of-the-box wrappers and roll my own but that means going down an unsupported path.

@mydea
Member

mydea commented Feb 5, 2025

The way to do this is to use continueTrace. In what way does that not work for you? Can you share the actual code where you invoke/use Sentry?

@parisholley
Author

// this is essentially the body of Sentry.continueTrace: the propagation
// context is always derived from the sentry-trace/baggage header values
return withScope(scope => {
  const propagationContext = propagationContextFromHeaders(sentryTrace, baggage);
  scope.setPropagationContext(propagationContext);
  return callback();
});

continueTrace still expects a "sentry-managed" value (the sentry-trace/baggage headers) rather than an arbitrary UUID that the user owns (e.g. many companies already have correlation id mechanisms built out).

continueTrace also only supports "wrapping" a method invocation; in my case, the run id isn't known until a middleware hook is fired within the Inngest framework (they provide their own request handler, similar to Remix and other Cloudflare-compatible libs).

@parisholley
Author

Here is an example of what my middleware looks like:

https://www.inngest.com/docs/features/middleware/create

  return new InngestMiddleware({
    name: 'Sentry',
    init() {
      return {
        onFunctionRun({ ctx, fn }) {
          Sentry.setTag('runId', ctx.runId);

          const span = Sentry.getActiveSpan();

          if (span) {
            Sentry.updateSpanName(span, fn.name);

            // hack to allow chaining inngest runs together by a single run id
            // @ts-ignore
            span['_traceId'] = v5(ctx.runId, '92f4ea30-d9e0-4750-9d47-69d8c729d79a').replaceAll('-', '');
          }

          return {
            transformInput() {
              return {
                ctx: {
                  span<T>(name: string, fn: () => Promise<T>) {
                    return Sentry.startSpan({ name, op: 'function' }, () => fn());
                  },
                },
              };
            },
            finished({ result }) {
              if (result.error) {
                Sentry.captureException(result.error);
              }
            },
          };
        },
      };
    },
  });

@s1gr1d
Member

s1gr1d commented Feb 6, 2025

Hello, adding an API for overriding the trace ID opens the door to various problems and is likely to break things, as spans may no longer be continued correctly.

What you could do is create your own handler middleware that starts/continues a span, just like the Sentry Cloudflare middleware does: https://github.com/getsentry/sentry-javascript/blob/develop/packages/cloudflare/src/handler.ts#L56-L94
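A rough sketch of what such a hand-rolled wrapper could look like, assuming a plain Workers fetch handler and the workers-types globals (wrapFetch and the span naming here are illustrative, not an existing API):

import * as Sentry from '@sentry/cloudflare';

type FetchHandler = (request: Request, env: unknown, ctx: ExecutionContext) => Promise<Response>;

// loosely mirrors the linked handler code: continue the incoming trace,
// then start an http.server span around the actual handler
export function wrapFetch(fetchHandler: FetchHandler): FetchHandler {
  return (request, env, ctx) =>
    Sentry.continueTrace(
      {
        sentryTrace: request.headers.get('sentry-trace') ?? undefined,
        baggage: request.headers.get('baggage') ?? undefined,
      },
      () =>
        Sentry.startSpan(
          { name: `${request.method} ${new URL(request.url).pathname}`, op: 'http.server' },
          () => fetchHandler(request, env, ctx),
        ),
    );
}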

Can you share what your Sentry setup looks like right now?

@parisholley
Author

Forcing the user to rely on Sentry-managed data generation (sentry-trace, baggage) is the core issue, and it isn't affected by how I have things set up. I've worked in numerous enterprise environments where trace/correlation id patterns are well established (e.g. tying backend requests to the frontend and joining them via logs/observability tooling), and it isn't feasible to expect a team to rebuild how they do distributed tracing and do it the "sentry way".

If I want to trace a message through a multi-step message queue, or due to vendor/technology restrictions have to rely on webhooks/requests (e.g. using a MessageSID from Twilio) where I cannot control headers (but can pull an id out of query parameters or payloads), there is no mechanism to tie these spans together.

I would imagine that using the out-of-the-box handlers supplied by the SDK gives me benefits (e.g. I noticed color-coded response status codes in the Sentry UI) that I don't want to have to maintain myself every time there is an update. There simply needs to be a way to leverage a user-owned value for tying requests together.

@mydea
Member

mydea commented Feb 6, 2025

Can you share how you set up Sentry? Which handler do you use, and how do you configure it? We strive to ship helpful helpers out of the box, but also provide primitives that allow you to do more custom things yourself if needed. We'll need to see a fuller example/reproduction of your setup to be able to see if there are things we can/want to improve there.

traceIds are immutable in any environment I have seen so far, so the fundamental issue is that no span can/should be started before the trace ID you want to use is defined. From my POV this is an ordering problem: you need to first infer the trace ID (this can come from anywhere, it does not need to be a header!), then call continueTrace with it, then add the Sentry handler. But it's hard to say without a more complete example of your setup code!
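A sketch of that ordering, assuming the id can already be derived before anything is wrapped (extractRunId and runIdToTraceId are hypothetical helpers; the resulting trace id must be a 32-character lowercase hex string):

import * as Sentry from '@sentry/cloudflare';

// hypothetical helpers: read the correlation id from wherever it lives
// (query parameter, payload, queue message) and map it to a 32-hex-char trace id
declare function extractRunId(request: Request): Promise<string>;
declare function runIdToTraceId(runId: string): string;

async function handleWithCustomTrace(request: Request, handler: () => Promise<Response>): Promise<Response> {
  // 1. infer the trace id first
  const traceId = runIdToTraceId(await extractRunId(request));
  const parentSpanId = crypto.randomUUID().replace(/-/g, '').slice(0, 16);

  // 2. continue the trace with it, 3. only then run the Sentry-wrapped handler inside
  return Sentry.continueTrace({ sentryTrace: `${traceId}-${parentSpanId}`, baggage: undefined }, handler);
}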

@parisholley
Author

I'm following the default Cloudflare instructions for Sentry. My flow is as follows:

  • I wrap my handler with Sentry.withSentry (a rough setup sketch follows this list)
  • My handler delegates to a framework (in this case Inngest)
  • At some point in the framework lifecycle, a "trace id" is provided and logic is executed
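For reference, the wrapping step from the first bullet, sketched roughly as in the Cloudflare SDK docs (Env, SENTRY_DSN, and inngestHandler are placeholders for the actual bindings and framework handler):

import * as Sentry from '@sentry/cloudflare';

interface Env {
  SENTRY_DSN: string;
}

// placeholder for the framework-provided request handler (Inngest here)
declare function inngestHandler(request: Request, env: Env, ctx: ExecutionContext): Promise<Response>;

export default Sentry.withSentry(
  (env: Env) => ({
    dsn: env.SENTRY_DSN,
    tracesSampleRate: 1.0,
  }),
  {
    async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
      // the framework takes over here; the run id only becomes known later,
      // inside its middleware lifecycle
      return inngestHandler(request, env, ctx);
    },
  },
);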

If this were a message queue environment (RabbitMQ, SQS, etc.), it would have the same issue: the frameworks will not expose the underlying correlation/trace id until well after the request has bootstrapped.

"traceIds are immutable in any environment I have seen so far": perhaps in a web environment, but that isn't true with asynchronous messaging (queues, etc.).

@parisholley
Author

To add more color to this: there is a world where you wouldn't wrap the handler at all, or would delay starting the span/transaction until further down the stack. The big drawback is that you throw away insight into the latency between initialization and execution. For example, in the case of a serverless web-based queuing system (like Inngest), I would lose insight into the cost of bootstrapping the worker (granted, it isn't perfect; it can't capture cold boot cost or module initialization) plus the framework, and would effectively only see the span in the "middle of the request". As part of a broader distributed trace, the gap would still be visible because of timestamps, but I'd have no visibility into whether that gap was the result of a queue delay (Inngest took a while to call the worker in the next step of a broader flow) vs. bootstrapping latency.

@mydea
Member

mydea commented Feb 7, 2025

This is problematic to solve because Inngest (as far as I see from the docs) does not provide a way for middlewares to wrap the handler callback, which is what we would need to solve this nicely. Without this capability, automatic isolation the way we need it is not really possible.

Also, a traceId must be a 32-hex-character lowercase string. Any other format is invalid and may (will) lead to problems. I am not quite sure what v5(ctx.runId, '92f4ea30-d9e0-4750-9d47-69d8c729d79a').replaceAll('-', '') actually does, but it may or may not create valid trace IDs.

Overall, I think the core question is: why should this even be the trace ID? I think the way to go is to store the run ID as an attribute (or tag) on the span and use it from there. Trace IDs are meant to be immutable, and we will not (cannot even, because the underlying system we use to handle this, OpenTelemetry, does not allow it) make it possible to change a trace ID after a span has started, because this can lead to loads of unexpected and unsupported problems.
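For the attribute/tag route, a minimal sketch (the attribute key inngest.run_id is just an example name):

const span = Sentry.getActiveSpan();
// queryable/searchable later, without touching the trace id
span?.setAttribute('inngest.run_id', ctx.runId);
Sentry.setTag('runId', ctx.runId);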

The only way to cleanly achieve what you want is to find a way to run code before the withSentry handler executes and add the trace ID to the request object, something like this:

request.headers.set('sentry-trace', `${traceId}-${randomSpanId}`);
// randomSpanId can be a random 16-character lowercase hex string

Then this will be picked up by the Sentry handler.
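On Workers that could look roughly like this, assuming the id can already be derived from the request itself (deriveTraceId is hypothetical; the incoming request's headers are immutable, so a mutable copy is made first):

// hypothetical: produce a 32-character lowercase hex trace id from something
// the request already carries (query parameter, payload, ...)
declare function deriveTraceId(request: Request): string;

function withCustomTraceHeader(request: Request): Request {
  const traceId = deriveTraceId(request);
  const randomSpanId = crypto.randomUUID().replace(/-/g, '').slice(0, 16);

  const patched = new Request(request); // mutable copy; the incoming headers are read-only
  patched.headers.set('sentry-trace', `${traceId}-${randomSpanId}`);
  return patched; // hand this to the withSentry-wrapped handler
}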

An alternative to this will be unlocked soon when we ship Linked Spans: #14991

Then, you can also let Sentry create a span with a random trace ID, and then inside of your middleware you create a new span with a new trace, which can be linked to the outside trace. Something like this:

return new InngestMiddleware({
    name: 'Sentry',
    init() {
      return {
        onFunctionRun({ ctx, fn }) {
          Sentry.setTag('runId', ctx.runId);

          const span = Sentry.getActiveSpan();
          const spanLinks = span ? [{ context: span.spanContext() }] : [];

          return {
            transformInput() {
              return {
                ctx: {
                  span<T>(name: string, fn: () => Promise<T>) {
                    // traceId here would be derived from ctx.runId (32-char lowercase hex)
                    return Sentry.continueTrace({ traceId, spanId: span?.spanContext().spanId || generateSpanId() }, () =>
                      Sentry.startSpan({ name, op: 'function', forceTransaction: true, links: spanLinks }, () => fn()));
                  },
                },
              };
            },
            finished({ result }) {
              if (result.error) {
                Sentry.captureException(result.error);
              }
            },
          };
        },
      };
    },
  });

This will then have an outer span, with an inner span that is related to the outer span, so you can still see related info in it.

@parisholley
Author

It sounds like span links are what I need then :) I'll close, thanks for brainstorming with me!
