enhancement(core): Add efficient trace payload (aka v1) encoder#1794

Draft
ajgajg1134 wants to merge 8 commits into main from andrew.glaude/moreV1

Conversation

@ajgajg1134
Contributor

pls no look tobz

Summary

Change Type

  • Bug fix
  • New feature
  • Non-functional (chore, refactoring, docs)
  • Performance

How did you test this PR?

References

ajgajg1134 and others added 2 commits May 29, 2026 16:19
Replaces the `add_tracer_payloads` (protobuf field 42) call with
`add_idx_tracer_payloads` (protobuf field 90), which uses a string
interning table for deduplication. New fields now carried: 128-bit
trace ID on the chunk, sampling mechanism, origin, language version,
runtime ID, and per-span env/version/component/kind.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
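
As a rough illustration of the interning idea in the commit message above, the sketch below stores each distinct string once and has spans refer to it by `u32` index. It uses only the Rust standard library; `InternedPayload`, `SpanRefs`, and the three span fields are hypothetical names chosen for the example, not the generated protobuf types or the encoder's actual code.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an interned trace payload: every distinct string
/// is stored once, and spans refer to it by its position in `strings`.
#[derive(Debug, Default)]
struct InternedPayload {
    strings: Vec<String>,
    lookup: HashMap<String, u32>,
    spans: Vec<SpanRefs>,
}

/// Hypothetical span record holding table indices instead of owned strings.
#[derive(Debug, PartialEq)]
struct SpanRefs {
    service: u32,
    name: u32,
    env: u32,
}

impl InternedPayload {
    /// Returns the index of `s`, appending it to the table the first time it is seen.
    fn intern(&mut self, s: &str) -> u32 {
        if let Some(&idx) = self.lookup.get(s) {
            return idx;
        }
        let idx = self.strings.len() as u32;
        self.strings.push(s.to_owned());
        self.lookup.insert(s.to_owned(), idx);
        idx
    }

    /// Records a span by interning each of its string fields.
    fn add_span(&mut self, service: &str, name: &str, env: &str) {
        let span = SpanRefs {
            service: self.intern(service),
            name: self.intern(name),
            env: self.intern(env),
        };
        self.spans.push(span);
    }
}
```

With two spans that share `checkout` and `prod`, the table holds four strings rather than six, and the second span becomes the index triple (0, 3, 2).
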
@dd-octo-sts dd-octo-sts Bot added the area/io (General I/O and networking), area/components (Sources, transforms, and destinations), and encoder/datadog-traces (Datadog Traces encoder) labels Jun 1, 2026
@datadog-prod-us1-5

datadog-prod-us1-5 Bot commented Jun 1, 2026

Pipelines

⚠️ Warnings

🚦 5 Pipeline jobs failed

DataDog/saluki | check-clippy: Compilation error due to needless borrow in lib/saluki-components/src/encoders/datadog/traces/mod.rs on multiple lines.

DataDog/saluki | check-docs: 9 spelling errors detected related to unrecognized identifiers in several files.

DataDog/saluki | check-fmt: Diff in mod.rs:594 - Found formatting issue in Rust source file

Commit SHA: 8995438

@pr-commenter

pr-commenter Bot commented Jun 1, 2026

Regression Detector (Agent Data Plane)

Run ID: 620eec08-5735-4e8b-91e7-ff8cbd2f4b56
Baseline: 695ecfaa · Comparison: 89954387 · diff

Optimization Goals: ❌ 3 regressions detected

| experiment | goal | Δ mean % |
|---|---|---|
| otlp_ingest_metrics_5mb_memory | memory | 🔴 +7.95 |
| otlp_ingest_traces_5mb_cpu (erratic) | cpu | 🔴 +7.79 |
| otlp_ingest_traces_ottl_transform_5mb_cpu (erratic) | cpu | 🔴 +5.35 |
Fine details of change detection per experiment (32)

Experiments configured erratic: true are tagged (ignored) and skipped when determining which experiments regressed or improved. Experiments which are detected as erratic at runtime are tagged (erratic) to flag that the run's sample dispersion was high, but their regression / improvement signal still counts.

| experiment | goal | Δ mean % |
|---|---|---|
| otlp_ingest_traces_ottl_filtering_5mb_cpu (erratic) | cpu | ⚪ +0.65 |
| otlp_ingest_traces_ottl_filtering_5mb_memory | memory | ⚪ +0.52 |
| dsd_uds_10mb_3k_contexts_cpu (erratic) | cpu | ⚪ +0.45 |
| otlp_ingest_traces_ottl_transform_5mb_throughput | throughput | ⚪ -0.37 |
| otlp_ingest_traces_ottl_transform_5mb_memory | memory | ⚪ +0.34 |
| dsd_uds_100mb_3k_contexts_memory | memory | ⚪ +0.31 |
| quality_gates_rss_dsd_heavy | memory | ⚪ +0.22 |
| dsd_uds_1mb_3k_contexts_memory | memory | ⚪ +0.21 |
| dsd_uds_500mb_3k_contexts_memory | memory | ⚪ +0.18 |
| quality_gates_rss_dsd_medium | memory | ⚪ +0.17 |
| quality_gates_rss_idle | memory | ⚪ +0.16 |
| otlp_ingest_traces_ottl_filtering_5mb_throughput | throughput | ⚪ -0.15 |
| dsd_uds_10mb_3k_contexts_memory | memory | ⚪ +0.14 |
| otlp_ingest_traces_5mb_throughput | throughput | ⚪ -0.13 |
| otlp_ingest_traces_5mb_memory | memory | ⚪ +0.11 |
| dsd_uds_512kb_3k_contexts_memory | memory | ⚪ +0.04 |
| otlp_ingest_metrics_5mb_throughput | throughput | ⚪ -0.01 |
| dsd_uds_1mb_3k_contexts_throughput | throughput | ⚪ -0.00 |
| dsd_uds_100mb_3k_contexts_throughput | throughput | ⚪ +0.00 |
| otlp_ingest_logs_5mb_throughput (ignored) | throughput | ⚪ +0.00 |
| dsd_uds_512kb_3k_contexts_throughput | throughput | ⚪ +0.01 |
| quality_gates_rss_dsd_low | memory | ⚪ -0.01 |
| dsd_uds_10mb_3k_contexts_throughput | throughput | ⚪ +0.01 |
| quality_gates_rss_dsd_ultraheavy | memory | ⚪ -0.17 |
| otlp_ingest_logs_5mb_cpu (ignored) | cpu | ⚪ -0.51 |
| dsd_uds_1mb_3k_contexts_cpu (erratic) | cpu | ⚪ -0.66 |
| dsd_uds_100mb_3k_contexts_cpu (erratic) | cpu | ⚪ -0.96 |
| dsd_uds_500mb_3k_contexts_cpu (erratic) | cpu | ⚪ -1.86 |
| otlp_ingest_metrics_5mb_cpu (erratic) | cpu | ⚪ -2.01 |
| dsd_uds_512kb_3k_contexts_cpu (erratic) | cpu | ⚪ -2.37 |
| dsd_uds_500mb_3k_contexts_throughput | throughput | ⚪ +2.89 |
| otlp_ingest_logs_5mb_memory (ignored) | memory | ⚪ -12.26 |

Bounds Checks: ✅ Passed (5)

| experiment | check | replicates | observed |
|---|---|---|---|
| quality_gates_rss_dsd_heavy | memory_usage | 10/10 | ✅ 119 MiB ≤ 140 MiB |
| quality_gates_rss_dsd_low | memory_usage | 10/10 | ✅ 39.9 MiB ≤ 50 MiB |
| quality_gates_rss_dsd_medium | memory_usage | 10/10 | ✅ 62.5 MiB ≤ 75 MiB |
| quality_gates_rss_dsd_ultraheavy | memory_usage | 10/10 | ✅ 183 MiB ≤ 200 MiB |
| quality_gates_rss_idle | memory_usage | 10/10 | ✅ 26.6 MiB ≤ 40 MiB |

Explanation

A change is flagged as a regression when |Δ mean %| > 5.00% in the regressing direction for its optimization goal AND SMP marks the experiment as a regression (is_regression: true). Improvements use the matching criteria for the improving direction. Experiments configured erratic: true (tagged (ignored)) are skipped outright; experiments detected as erratic at runtime (tagged (erratic)) still count, since that flag describes sample dispersion rather than directional certainty. The Δ mean % cell is colored accordingly: 🟢 = improvement, 🔴 = regression, ⚪ = neutral. Reduction in CPU or memory is an improvement; reduction in ingress throughput is a regression.

Comment on lines +45 to +46
let mut new_spans = Span::get_spans_from_agent_payload(&payload);
new_spans.extend(Span::get_spans_from_idx_bytes(&payload, raw_body));
Member

Are these always mutually exclusive depending on payload version?

Contributor Author

structurally no, but functionally at the moment yes. The trace-agent will only ever emit one or the other in a single agent payload. I'd have to double check if trace intake can support receiving both at the same time though

pub use super::super::trace_piecemeal_include::datadog::trace::*;
}

/// String-indexed (`idx`) trace payload types used by the v0.2 trace intake format.
Member

v0.2? :suspect:

Contributor Author

haha yeah, versions are... confusing, this new payload format is v1 from tracer -> trace agent, but the endpoint the trace-agent uses at intake is called api/v0.2/traces.... which internally they actually call v1... but since this is a new field it's the same endpoint version on the backend...and the non indexed version of a tracer payload is v0.7 for tracers.... it's a mess

.customize_callback(SerdeCapableStructs)
.run_from_script();

// Separate invocation for idx proto types to avoid filename collision with
Member

Totally a take-it-or-leave-it thing, but it feels very opaque as an outsider to see "idx proto types" instead of just "v1"... like "v1" is immediately grokable/intuitive, but "idx" less so. Like we talk about it as the v1 protocol internally, not the "idx protocol."

Contributor Author

yeah this is kinda my bad. See my other comment on trace versioning. I've started to try and talk about the project as the "Efficient trace payload" aka ETP to avoid this "V1" naming collision that happens everywhere. I went with idx in the trace-agent code since the biggest difference is that strings are 'indexed'. But I'm fine with renaming this to whatever is clearest here :P

Member

I think ETP or v1 make the most sense since they give something more tangible / intuitive to use when discussing.

#[derive(Debug)]
struct StringTable {
strings: Vec<MetaString>,
indices: FastHashMap<MetaString, u32>,
Member

@tobz tobz Jun 1, 2026

If you switch to FastIndexMap, you can drop strings and just use self.indices.insert_full to get back a tuple of (index, <maybe previous value>), and then for lookups, you'd do self.indices.get_index_of to check if it existed and, if so, get the index for it in one fell swoop.

Although as I say that, I realize all we'd really need in that scenario is a set, not a full map.... so you would probably want to end up adding an equivalent alias for indexmap::IndexSet to lib/saluki-common/src/collections/mod.rs and then actually use FastIndexSet instead.

Contributor Author

ooo good call. TIL about IndexSet/IndexMap. The intern function is so much simpler now too

@dd-octo-sts dd-octo-sts Bot added the area/core Core functionality, event model, etc. label Jun 1, 2026
@ajgajg1134 ajgajg1134 changed the title from "Andrew.glaude/more v1" to "enhancement(core): Add efficient trace payload (aka v1) encoder" Jun 1, 2026
@pr-commenter

pr-commenter Bot commented Jun 1, 2026

Binary Size Analysis (Agent Data Plane)

Baseline: 695ecfa · Comparison: 8995438 · diff
Analysis Configuration: stripped binaries · Pass/Fail Threshold: +5%
Sizes: 37.88 MiB (baseline) vs 37.84 MiB (comparison)
Size Change: -43.80 KiB (-0.11%)

✅ Binary size difference within threshold

Changes by Module

| Module | File Size | Symbols |
|---|---|---|
| figment | -78.97 KiB | 75 |
| http_body_util | -16.28 KiB | 37 |
| tonic | +12.05 KiB | 19 |
| saluki_components::sources::otlp | +10.16 KiB | 13 |
| core | +9.98 KiB | 1055 |
| datadog_protos::trace_piecemeal_include::datadog | +8.33 KiB | 21 |
| serde_core | +8.05 KiB | 73 |
| saluki_components::common::datadog | +7.73 KiB | 45 |
| piecemeal | -5.10 KiB | 32 |
| saluki_config::GenericConfiguration::as_typed | -4.80 KiB | 15 |
| saluki_components::sources::dogstatsd | -4.80 KiB | 15 |
| anon.74ec343e7df4c5afe576d99fcc874370.67.llvm.13153431456681701250 | -3.91 KiB | 1 |
| anon.5505eb0c85c399b2b8d009c171c5330f.8.llvm.8575596082049325702 | +3.90 KiB | 1 |
| saluki_components::forwarders::otlp | +3.68 KiB | 2 |
| [sections] | -3.40 KiB | 7 |
| hashbrown | +3.29 KiB | 19 |
| anon.0aabd5de6de06276b581db469f8b07ea.361.llvm.2534536213424946774 | +2.91 KiB | 1 |
| saluki_components::transforms::trace_sampler | -2.89 KiB | 12 |
| saluki_components::transforms::apm_stats | -2.83 KiB | 6 |
| anon.36eead261da73468b10600f578de22ed.87.llvm.14046994766694184360 | -2.83 KiB | 1 |

Detailed Symbol Changes
    FILE SIZE        VM SIZE    
 --------------  -------------- 
  [NEW] +13.9Ki  [NEW] +13.7Ki    saluki_components::common::datadog::apm::_::_<impl serde_core::de::Deserialize for saluki_components::common::datadog::apm::ApmConfiguration>::deserialize::h9660ff35eea92aa4
  [NEW] +10.1Ki  [NEW] +9.90Ki    _<saluki_components::encoders::datadog::traces::TraceEndpointEncoder as saluki_components::common::datadog::request_builder::EndpointEncoder>::encode::h294b1e502de4868d
  [NEW] +8.66Ki  [NEW] +8.56Ki    piecemeal::io::scratch::ScratchWriter<B>::track_message::hc27b6007bbe6f73c
  [NEW] +8.03Ki  [NEW] +7.89Ki    _<tracing::instrument::Instrumented<T> as core::future::future::Future>::poll::h1c2812045c0fe465
   +16% +7.59Ki   +16% +7.59Ki    saluki_components::sources::otlp::metrics::translator::OtlpMetricsTranslator::translate_metrics::he0bca799aae91dd7
  [NEW] +7.37Ki  [NEW] +7.17Ki    _<saluki_components::encoders::datadog::traces::DatadogTrace as saluki_core::components::encoders::Encoder>::run::_{{closure}}::h3b98a9deef40ad9d
  [NEW] +5.89Ki  [NEW] +5.74Ki    saluki_components::common::datadog::request_builder::RequestBuilder<E>::flush::_{{closure}}::hd4e11543960fd6f5
  [NEW] +5.82Ki  [NEW] +5.74Ki    matchit::tree::Node<T>::insert::h6a0346f5d77d70d0
  [NEW] +5.51Ki  [NEW] +5.43Ki    prost::message::Message::encode::h358eac4c8468a0e8
  [DEL] -5.33Ki  [DEL] -5.17Ki    _<figment::value::magic::RelativePathBuf as figment::value::magic::Magic>::deserialize_from::ha3e379ed637e2cfa
  [DEL] -5.81Ki  [DEL] -5.66Ki    _<figment::value::magic::RelativePathBuf as figment::value::magic::Magic>::deserialize_from::h0ae9cd1d860bd4db
  [DEL] -5.86Ki  [DEL] -5.73Ki    figment::value::de::_<impl figment::value::value::Value>::deserialize_from::h9560f60e882663df
  [DEL] -5.89Ki  [DEL] -5.74Ki    saluki_components::common::datadog::request_builder::RequestBuilder<E>::flush::_{{closure}}::haf172f1d298ad478
  [DEL] -6.14Ki  [DEL] -6.05Ki    matchit::router::Router<T>::insert::h65258e54e686a8f4
  [DEL] -7.37Ki  [DEL] -7.17Ki    _<saluki_components::encoders::datadog::traces::DatadogTrace as saluki_core::components::encoders::Encoder>::run::_{{closure}}::ha47828b80b022b38
  [DEL] -8.03Ki  [DEL] -7.89Ki    _<tracing::instrument::Instrumented<T> as core::future::future::Future>::poll::h091e7554a2cb94d7
  [DEL] -9.80Ki  [DEL] -9.64Ki    _<figment::value::de::ConfiguredValueDe<I> as serde_core::de::Deserializer>::deserialize_struct::h9d925db3b6777b7f
  [DEL] -11.3Ki  [DEL] -11.1Ki    _<saluki_components::encoders::datadog::traces::TraceEndpointEncoder as saluki_components::common::datadog::request_builder::EndpointEncoder>::encode::h854bf035420c3910
  [DEL] -11.8Ki  [DEL] -11.7Ki    _<figment::value::magic::Tagged<T> as figment::value::magic::Magic>::deserialize_from::h77de1a22118a46b1
  [DEL] -12.4Ki  [DEL] -12.3Ki    _<figment::value::magic::RelativePathBuf as figment::value::magic::Magic>::deserialize_from::ha091f686e9e8b2f0
  -0.5% -26.9Ki  -0.5% -21.5Ki    [6298 Others]
  -0.1% -43.8Ki  -0.1% -37.9Ki    TOTAL

self.strings.push(ms.clone());
self.indices.insert(ms, idx);
idx
let (idx, _) = self.indices.insert_full(MetaString::from(s));
Member

This will reconstruct a MetaString for every call, even if it doesn't end up actually inserting it due to already being present, so we should prefer the split get_index_of/insert_full paradigm.
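
A minimal sketch of the difference being described, under loud assumptions: `String` stands in for `MetaString`, a plain `HashMap` stands in for the index set, and the `keys_built` counter exists only to make the cost visible. Neither method is the PR's actual `intern` function.

```rust
use std::collections::HashMap;

/// Stand-in interner: `String` plays the role of `MetaString` and `HashMap`
/// plays the role of the index set. Both substitutions are assumptions.
#[derive(Default)]
struct Interner {
    indices: HashMap<String, u32>,
    /// Counts how many owned keys were built, so the two variants can be compared.
    keys_built: usize,
}

impl Interner {
    /// Eager variant: builds an owned key on every call, whether it hits or misses.
    fn intern_eager(&mut self, s: &str) -> u32 {
        let key = s.to_owned();
        self.keys_built += 1;
        let next = self.indices.len() as u32;
        // On a hit the freshly built key is simply dropped.
        *self.indices.entry(key).or_insert(next)
    }

    /// Split variant: borrowed lookup first, owned key only on a miss.
    fn intern_split(&mut self, s: &str) -> u32 {
        if let Some(&idx) = self.indices.get(s) {
            return idx;
        }
        self.keys_built += 1;
        let idx = self.indices.len() as u32;
        self.indices.insert(s.to_owned(), idx);
        idx
    }
}
```

Interning `web`, `web`, `db`, `web` builds four owned keys with the eager variant and two with the split variant, while both return the same indices.
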

Member

Another thought, which I'll flesh out when I have a chance: making this generic over stringtheory::CheapMetaString to avoid reconstructing when we already are passing in something that is derived from MetaString vs something that only ever comes as &str by the time we want to intern it.
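
One possible reading of this idea, sketched with the standard library only. The `IntoShared` trait below is hypothetical and is not the `stringtheory::CheapMetaString` API; `Arc<str>` stands in for a cheaply cloneable `MetaString`. The point is only that a generic `intern` can reuse an already-shared buffer and fall back to copying for plain `&str` input.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical trait (not the `stringtheory` API): converts a value into a
/// shared string, cheaply when the value is already shared.
trait IntoShared: AsRef<str> {
    fn into_shared(self) -> Arc<str>;
}

impl<'a> IntoShared for &'a str {
    // Borrowed input has no shared buffer, so this allocates and copies.
    fn into_shared(self) -> Arc<str> {
        Arc::from(self)
    }
}

impl IntoShared for Arc<str> {
    // Already shared: hand the same buffer over without copying.
    fn into_shared(self) -> Arc<str> {
        self
    }
}

#[derive(Default)]
struct SharedInterner {
    indices: HashMap<Arc<str>, u32>,
}

impl SharedInterner {
    /// Looks up by borrowed `&str` first; converts to an owned key only on a miss.
    fn intern<S: IntoShared>(&mut self, s: S) -> u32 {
        let key: &str = s.as_ref();
        if let Some(&idx) = self.indices.get(key) {
            return idx;
        }
        let idx = self.indices.len() as u32;
        self.indices.insert(s.into_shared(), idx);
        idx
    }
}
```

When an `Arc<str>` is interned for the first time, the table ends up holding that same allocation, which can be observed through `Arc::strong_count`.
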

Comment thread lib/saluki-components/src/encoders/datadog/traces/mod.rs Outdated

Labels

area/components Sources, transforms, and destinations. area/core Core functionality, event model, etc. area/io General I/O and networking. encoder/datadog-traces Datadog Traces encoder.
