add span events as a top level field for v0.4 encoding #5229

Draft
wants to merge 12 commits into
base: master

wconti27
Contributor

@wconti27 wconti27 commented Feb 7, 2025

What does this PR do?

Formatting Changes in this PR:

  • Changes variable naming in format.js to be less confusing

Actual Changes:

  • Updates format.js to format span events as a top-level field when the v0.4 encoder is in use and the agent supports the field. To check for agent support, we check whether the /info endpoint returns a top-level span_events attribute with the value true. Otherwise, span events are serialized as before: as a JSON-formatted string within the span's tags.
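
The branching described above could be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function names, the `meta.events` tag key, the option names, and the event shape are all assumptions made for the example.

```javascript
// Hypothetical sketch: none of these names are taken from format.js.

// True only when the agent's /info response carries a top-level
// `span_events` attribute whose value is exactly true.
function agentSupportsSpanEvents (info) {
  return Boolean(info) && info.span_events === true
}

// Attaches span events to an already formatted span object.
function attachSpanEvents (formattedSpan, events, { protocolVersion, agentInfo }) {
  if (!events || events.length === 0) return formattedSpan

  if (protocolVersion === '0.4' && agentSupportsSpanEvents(agentInfo)) {
    // Agent supports the field: keep the events structured at the top level.
    formattedSpan.span_events = events
  } else {
    // Fallback: a JSON string inside the span's tags.
    // The `meta.events` key is an assumption for this sketch.
    formattedSpan.meta = formattedSpan.meta || {}
    formattedSpan.meta.events = JSON.stringify(events)
  }

  return formattedSpan
}

// Example event shape (assumed for illustration).
const sampleEvents = [
  { name: 'exception', time_unix_nano: 1, attributes: { 'exception.type': 'Error' } }
]

const withField = attachSpanEvents({ meta: {} }, sampleEvents, {
  protocolVersion: '0.4',
  agentInfo: { span_events: true }
})

const withTag = attachSpanEvents({ meta: {} }, sampleEvents, {
  protocolVersion: '0.4',
  agentInfo: {}
})
```

Here `withField` carries the events under `span_events`, while `withTag` carries them as a string under `meta.events`.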

Motivation

Matches the OTel implementation. This is being done in parallel across languages.

Plugin Checklist

Additional Notes

@wconti27 wconti27 requested a review from a team as a code owner February 7, 2025 18:54
@wconti27 wconti27 marked this pull request as draft February 7, 2025 18:54
@wconti27 wconti27 self-assigned this Feb 7, 2025

github-actions bot commented Feb 7, 2025

Overall package size

Self size: 8.63 MB
Deduped: 95.03 MB
No deduping: 95.55 MB

Dependency sizes

| name | version | self size | total size |
|------|---------|-----------|------------|
| @datadog/libdatadog | 0.4.0 | 29.44 MB | 29.44 MB |
| @datadog/native-appsec | 8.4.0 | 19.25 MB | 19.26 MB |
| @datadog/native-iast-taint-tracking | 3.2.0 | 13.9 MB | 13.91 MB |
| @datadog/pprof | 5.5.1 | 9.79 MB | 10.17 MB |
| protobufjs | 7.2.5 | 2.77 MB | 5.16 MB |
| @datadog/native-iast-rewriter | 2.8.0 | 2.6 MB | 2.74 MB |
| @opentelemetry/core | 1.14.0 | 872.87 kB | 1.47 MB |
| @datadog/native-metrics | 3.1.0 | 1.06 MB | 1.46 MB |
| @opentelemetry/api | 1.8.0 | 1.21 MB | 1.21 MB |
| import-in-the-middle | 1.11.2 | 112.74 kB | 826.22 kB |
| source-map | 0.7.4 | 226 kB | 226 kB |
| opentracing | 0.14.7 | 194.81 kB | 194.81 kB |
| lru-cache | 7.18.3 | 133.92 kB | 133.92 kB |
| pprof-format | 2.1.0 | 111.69 kB | 111.69 kB |
| @datadog/sketches-js | 2.1.0 | 109.9 kB | 109.9 kB |
| semver | 7.6.3 | 95.82 kB | 95.82 kB |
| lodash.sortby | 4.7.0 | 75.76 kB | 75.76 kB |
| ignore | 5.3.1 | 51.46 kB | 51.46 kB |
| shell-quote | 1.8.1 | 44.96 kB | 44.96 kB |
| istanbul-lib-coverage | 3.2.0 | 29.34 kB | 29.34 kB |
| rfdc | 1.3.1 | 25.21 kB | 25.21 kB |
| @isaacs/ttlcache | 1.4.1 | 25.2 kB | 25.2 kB |
| tlhunter-sorted-set | 0.1.0 | 24.94 kB | 24.94 kB |
| limiter | 1.1.5 | 23.17 kB | 23.17 kB |
| dc-polyfill | 0.1.4 | 23.1 kB | 23.1 kB |
| retry | 0.13.1 | 18.85 kB | 18.85 kB |
| jest-docblock | 29.7.0 | 8.99 kB | 12.76 kB |
| crypto-randomuuid | 1.0.0 | 11.18 kB | 11.18 kB |
| ttl-set | 1.0.0 | 4.61 kB | 9.69 kB |
| path-to-regexp | 0.1.12 | 6.6 kB | 6.6 kB |
| koalas | 1.0.2 | 6.47 kB | 6.47 kB |
| module-details-from-path | 1.0.3 | 4.47 kB | 4.47 kB |

🤖 This report was automatically generated by heaviest-objects-in-the-universe


codecov bot commented Feb 7, 2025

Codecov Report

Attention: Patch coverage is 98.43750% with 1 line in your changes missing coverage. Please review.

Project coverage is 81.21%. Comparing base (6f79a86) to head (136a7da).
Report is 30 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|--------------------------|---------|-------|
| packages/dd-trace/src/format.js | 96.66% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5229      +/-   ##
==========================================
+ Coverage   81.07%   81.21%   +0.13%     
==========================================
  Files         479      482       +3     
  Lines       21342    21551     +209     
==========================================
+ Hits        17303    17502     +199     
- Misses       4039     4049      +10     



datadog-datadog-prod-us1 bot commented Feb 7, 2025

Datadog Report

Branch report: conti/serialize_span_events_as_top_level_field
Commit report: ee521cc
Test service: dd-trace-js-integration-tests

✅ 0 Failed, 629 Passed, 0 Skipped, 15m 25.74s Total Time

Member

@rochdev rochdev left a comment

Checking the /info endpoint doesn't work because the check would need to be done on a per-connection basis; otherwise each request could end up on a different agent with a different version and different capabilities. By that point it's too late to make the decision in most cases, because the data has already been generated and formatted. Because of that, the only way this can be implemented for 0.4 is with a tag. If we want a more efficient data model, it needs to be on a newer protocol version that is configured manually by the user.
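
A toy simulation of that failure mode, assuming two made-up agents behind a single address that alternate between requests. The agent names, their capabilities, and the round-robin routing are purely illustrative and do not describe any real deployment:

```javascript
// Hypothetical agents reachable through the same address.
const agents = [
  { name: 'agent-a', info: { span_events: true } }, // supports the top-level field
  { name: 'agent-b', info: {} } // older agent without support
]

let next = 0

// Round-robin stand-in for a proxy, or for agents being swapped at runtime.
function pickAgent () {
  const agent = agents[next % agents.length]
  next++
  return agent
}

// Step 1: the capability check is answered by agent-a,
// so the span is formatted with the top-level field.
const probed = pickAgent()
const useTopLevel = probed.info.span_events === true
const payload = useTopLevel
  ? { span_events: [{ name: 'retry' }] }
  : { meta: { events: JSON.stringify([{ name: 'retry' }]) } }

// Step 2: the trace is sent on a separate request, which lands on agent-b.
const receiver = pickAgent()
const understood = !('span_events' in payload) || receiver.info.span_events === true

console.log(receiver.name, understood) // agent-b false
```

The payload was shaped for the agent that answered the check, not for the agent that received it, which is why a tag (understood by every agent) or a manually configured protocol version avoids the mismatch.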

@marcotc marcotc requested a review from tlhunter February 7, 2025 21:26
@tlhunter
Member

@rochdev do you mean that a customer might set up a round-robin proxy in front of multiple agents of different versions?

Do we officially support swapping out agent versions on the fly like this?


pr-commenter bot commented Feb 10, 2025

Benchmarks

Benchmark execution time: 2025-02-10 21:59:19

Comparing candidate commit 136a7da in PR branch conti/serialize_span_events_as_top_level_field with baseline commit 6f79a86 in branch master.

Found 0 performance improvements and 15 performance regressions! Performance is the same for 897 metrics, 21 unstable metrics.

scenario:log-skip-log-20

  • 🟥 cpu_user_time [+29.438ms; +34.756ms] or [+8.224%; +9.710%]
  • 🟥 execution_time [+32.292ms; +34.989ms] or [+8.018%; +8.688%]

scenario:log-with-debug-20

  • 🟥 cpu_user_time [+29.553ms; +34.782ms] or [+8.244%; +9.703%]
  • 🟥 execution_time [+29.950ms; +34.091ms] or [+7.426%; +8.453%]

scenario:log-with-error-20

  • 🟥 cpu_user_time [+29.652ms; +34.628ms] or [+8.322%; +9.719%]
  • 🟥 execution_time [+31.521ms; +33.810ms] or [+7.861%; +8.432%]

scenario:log-without-log-20

  • 🟥 cpu_user_time [+30.013ms; +35.023ms] or [+8.973%; +10.470%]
  • 🟥 execution_time [+31.798ms; +34.602ms] or [+8.376%; +9.115%]

scenario:log-without-log-22

  • 🟥 cpu_user_time [+16.600ms; +21.500ms] or [+5.718%; +7.405%]

scenario:startup-with-tracer-20

  • 🟥 cpu_user_time [+22.720ms; +31.790ms] or [+9.448%; +13.220%]
  • 🟥 execution_time [+25.253ms; +32.878ms] or [+8.881%; +11.562%]
  • 🟥 instructions [+38.7M instructions; +43.4M instructions] or [+6.243%; +7.004%]

scenario:startup-with-tracer-22

  • 🟥 cpu_user_time [+28.674ms; +37.081ms] or [+14.083%; +18.212%]
  • 🟥 execution_time [+35.147ms; +36.758ms] or [+14.197%; +14.848%]
  • 🟥 instructions [+35.9M instructions; +39.6M instructions] or [+5.461%; +6.033%]

@rochdev
Member

rochdev commented Feb 10, 2025

do you mean that a customer might set up a round-robin proxy in front of multiple agents of different versions?

That's one of the potential issues, but there are other problematic scenarios as well. It's not as rare as you'd think, and when we tried implementing the /info endpoint years ago we literally hit the issue in less than 24h.

Do we officially support swapping out agent versions on the fly like this?

We do, depending on your definition of "officially".

@tlhunter
Member

This seems like such a weird edge case that we shouldn't worry about it. It would be akin to a user leaving an application running that communicates with Postgres 14, then downgrading that database to Postgres 13, and hoping that the connection re-establishes and the app works without error. As a software developer I would never assume that scenario would work. If anything, it's fantastic that we allow the in-place upgrade.

@rochdev
Member

rochdev commented Feb 12, 2025

It's not a weird edge case, and as mentioned above, we hit an issue within 24h of trying to implement the /info endpoint. Doing this right requires a lot of complexity that doesn't even handle every use case properly, whereas alternatives are much simpler and will work 100% of the time, so I don't see why we should go with the subpar option.
