Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

APM: Only do stats obfuscation when we know its safe to do so #3155

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

ajgajg1134
Copy link
Contributor

@ajgajg1134 ajgajg1134 commented Feb 5, 2025

What does this PR do?

This adds a fix where the tracer attempts to load the obfuscation version of the trace-agent before doing obfuscation on stats payloads. If the agent supports client stats being obfuscated and our version of obfuscation is at-least as recent then we can safely obfuscate. Full agent support is here: DataDog/datadog-agent#33765 and until that is released, systems talking to old agents and using client stats will not obfuscate so that the trace-agent can perform the obfuscation a single time.

Motivation

A few points here:

  • Obfuscating for stats payloads is valuable from a performance perspective as it reduces the cardinality of stats thereby using stats buckets more efficiently
  • Double obfuscating stats payloads (as occurs today) results in overly modified redis resources (and sql, but it's less susceptible). e.g. a redis resource like "GET k1 \n SET k2", should become GET SET but instead after double obfuscation becomes GET

Here's a more thorough explanation from a design doc:
The trace-agent today accepts client (aka tracer) calculated stats payloads via the endpoint /v0.6/stats. This endpoint performs obfuscation on the resource name of the incoming payloads, today this is for SQL and Redis spans which may hold sensitive data in their resource name. The current implementation of the Go tracer’s client stats also performs this obfuscation using the same imported logic from the agent. This results in obfuscation logic occurring twice on the resource name, leading to unexpected resource names that are less useful. For example a redis span with resource name GET k1\nDEL k2\nMS… after initial obfuscation becomes GET DEL … However after being ingested by the trace-agent a second round of obfuscation occurs and the resource name is changed again, this time to the far less descriptive GET.

We can resolve this problem in the Go tracer by introducing a new header Datadog-Obfuscation-Version, this header is a simple version integer that we increment over time as new obfuscation features are added

Reviewer's Checklist

  • Changed code has unit tests for its functionality at or near 100% coverage.
  • System-Tests covering this feature have been added and enabled with the va.b.c-dev version tag.
  • There is a benchmark for any new code, or changes to existing code.
  • If this interacts with the agent in a new way, a system test has been added.
  • Add an appropriate team label so this PR gets put in the right place for the release notes.
  • Non-trivial go.mod changes, e.g. adding new modules, are reviewed by @DataDog/dd-trace-go-guild.
  • For internal contributors, a matching PR should be created to the v2-dev branch and reviewed by @DataDog/apm-go.

Unsure? Have a question? Request a review!

@datadog-datadog-prod-us1
Copy link

datadog-datadog-prod-us1 bot commented Feb 5, 2025

Datadog Report

Branch report: andrew.glaude/avoidDblObfuscating
Commit report: 6ad44ed
Test service: dd-trace-go

✅ 0 Failed, 5227 Passed, 74 Skipped, 2m 11.46s Total Time

@ajgajg1134 ajgajg1134 marked this pull request as ready for review February 6, 2025 15:59
@ajgajg1134 ajgajg1134 requested a review from a team as a code owner February 6, 2025 15:59
@ajgajg1134 ajgajg1134 changed the title Only do stats obfuscation when we know its safe to do so APM: Only do stats obfuscation when we know its safe to do so Feb 6, 2025
@hannahkm
Copy link
Contributor

hannahkm commented Feb 6, 2025

Overall, LGTM! Just some questions for clarification:

Is a value of -1 mean anything in particular outside of testing? It seems that 0 means that the agent has not reported any version for obfuscation (in which case we trust that the agent will take care of all obfuscation?) and a version of 2 indicates it is newer than version 1. Anything similar for -1?

Doc you linked in Slack is very helpful! How much of it can we add to the PR notes/comments? At first, I was confused as to what constitutes a obfuscation "version", so I feel that adding more documentation for that might be helpful 👀

@ajgajg1134
Copy link
Contributor Author

Is a value of -1 mean anything in particular outside of testing? It seems that 0 means that the agent has not reported any version for obfuscation (in which case we trust that the agent will take care of all obfuscation?) and a version of 2 indicates it is newer than version 1. Anything similar for -1?

That's just a dummy / test value for the mock transport object to make it clear that no value has been set. This is useful since the default value of 0 has meaning here (it indicates no version was sent).

Doc you linked in Slack is very helpful! How much of it can we add to the PR notes/comments? At first, I was confused as to what constitutes a obfuscation "version", so I feel that adding more documentation for that might be helpful 👀

Good idea! I've updated the PR description here with some excerpts which hopefully make it a bit clearer? I'm happy to add some more code comments too if there's any particular areas that aren't clear!

Copy link
Contributor

@hannahkm hannahkm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clarification! Makes sense to me 👍🏼

defaultURL = "http://" + defaultAddress
defaultHTTPTimeout = 10 * time.Second // defines the current timeout before giving up with the send process
traceCountHeader = "X-Datadog-Trace-Count" // header containing the number of traces in the payload
obfuscationVersionHeader = "Datadog-Obfuscation-Version" // header containing the version of obfuscation used, if any
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

out of curiosity, why do some Datadog- headers have the X- prefix, but others don't? Should it be added here?

Copy link
Contributor Author

@ajgajg1134 ajgajg1134 Feb 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to look this up, apparently the old MIME headers RFC said this should be done for custom headers. However since at least 2011 that's apparently been deprecated and you can just name headers whatever you want :P https://datatracker.ietf.org/doc/html/draft-saintandre-xdash-00 I noticed some of our more recent headers dropped the X- so it seemed to make sense to do that here as well

Oh looks like that draft is now this RFC itself from 2012

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants