test(smp): add default-bucket dsd_uds variant alongside 1Hz cases by jszwedko · Pull Request #1594 · DataDog/saluki

jszwedko · 2026-05-06T00:10:44Z

Summary

Doubles the dsd_uds_* case set so a single SMP run reports both variants side-by-side: Agent vs Agent+ADP at 1Hz aggregation (aggregator_bucket_size_seconds: 1) and at the Agent's default 10s bucket. The diff between matched *_1hz_* and *_default_* rows in the report isolates the telemetry impact of the bucket knob from the Agent-vs-Agent+ADP delta.

Stacked on #1592 (will auto-retarget to `main` when #1592 merges).

Test plan

CI green; SMP report posted under header "Regression Detector (Agent vs Agent+ADP, 1Hz vs default bucket)".
Report shows 30 cases (15 `1hz` + 15 `default`).
Compare matched rows (e.g., `dsd_uds_10mb_3k_contexts_1hz_throughput` vs `dsd_uds_10mb_3k_contexts_default_throughput`):
- `aggregate_flushed_total` rate: the 1hz row should be ~10–15× higher than the default row.
- Series `interval` field on the wire: 1 vs 10.
- Comparison-side (Agent+ADP) CPU/RSS deltas should be measurably different between variants.
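
The matched-row comparison above can be sketched as a small pairing step. This is a minimal illustration, not part of the PR: `pair_variants` and the regular expression are hypothetical, and they assume experiment names always take the form `<prefix>_<1hz|default>_<cpu|memory|throughput>`, as the names in the report do.

```python
import re
from collections import defaultdict

# Assumed naming scheme: <prefix>_<variant>_<goal>, e.g.
# dsd_uds_10mb_3k_contexts_1hz_throughput
VARIANT_RE = re.compile(
    r"^(?P<prefix>.+)_(?P<variant>1hz|default)_(?P<goal>cpu|memory|throughput)$"
)


def pair_variants(names):
    """Group experiment names into (1hz, default) pairs keyed by (prefix, goal)."""
    groups = defaultdict(dict)
    for name in names:
        match = VARIANT_RE.match(name)
        if match is None:
            continue  # not a dsd_uds variant case; ignore it
        key = (match["prefix"], match["goal"])
        groups[key][match["variant"]] = name
    # Keep only keys for which both variants are present.
    return {
        key: (found["1hz"], found["default"])
        for key, found in groups.items()
        if "1hz" in found and "default" in found
    }


pairs = pair_variants(
    [
        "dsd_uds_10mb_3k_contexts_1hz_throughput",
        "dsd_uds_10mb_3k_contexts_default_throughput",
        "dsd_uds_512kb_3k_contexts_1hz_memory",
    ]
)
```

Here only the `10mb` throughput case has both variants, so it is the only pair returned; the unmatched `512kb` memory row is dropped rather than compared against nothing.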

🤖 Generated with Claude Code

Doubles the dsd_uds_* case set so a single SMP run reports both variants side-by-side: Agent vs Agent+ADP at 1Hz aggregation (aggregator_bucket_size_seconds: 1) and at the Agent's default 10s bucket. The diff between matched *_1hz_* and *_default_* rows in the report isolates the telemetry impact of the bucket knob from the Agent-vs-Agent+ADP delta.

- Renames shared/datadog.yaml -> shared/datadog-1hz.yaml.
- Adds shared/datadog-default-bucket.yaml.
- Splits dsd_base template into a shared parent (lading generator) plus two leaves that pick the datadog.yaml source. Each leaf is the parent of one variant's experiments.
- Bumps run-benchmarks-adp timeout 1h -> 2h for the doubled case set.
- Renames the SMP PR-comment header to reflect the dual-variant setup.

Stacked on jszwedko/agent-vs-adp-1hz-benchmark.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

pr-commenter · 2026-05-06T00:18:32Z

Binary Size Analysis (Agent Data Plane)

Target: dc8f4b9 (baseline) vs b44c5fe (comparison) diff
Analysis Type: Stripped binaries (debug symbols excluded)
Baseline Size: 37.11 MiB
Comparison Size: 37.00 MiB
Size Change: -111.49 KiB (-0.29%)
Pass/Fail Threshold: +5%
Result: PASSED ✅
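
The reported percentage can be reproduced from the figures above. The following is a rough sketch rather than the analysis tool's actual logic: the function names are hypothetical, and it assumes the +5% threshold is an upper bound on growth relative to the baseline size.

```python
KIB_PER_MIB = 1024


def size_change_percent(baseline_mib, delta_kib):
    """Size change relative to the baseline, as a percentage."""
    return delta_kib / (baseline_mib * KIB_PER_MIB) * 100


def passes_threshold(change_percent, threshold_percent=5.0):
    """Assumed rule: pass unless the binary grew by more than the threshold."""
    return change_percent <= threshold_percent


# Figures from the report: 37.11 MiB baseline, -111.49 KiB change.
pct = size_change_percent(37.11, -111.49)
print(f"{pct:.2f}%", passes_threshold(pct))  # -0.29% True
```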

Changes by Module

Module	File Size	Symbols
`figment`	-118.82 KiB	125
`otlp_protos::otlp_include::opentelemetry`	-41.79 KiB	103
`hyper`	+25.69 KiB	76
`prost`	+24.60 KiB	66
`hyper_util`	-14.05 KiB	13
`hashbrown`	+11.34 KiB	72
`h2`	-8.86 KiB	92
`[sections]`	-8.52 KiB	6
`tonic`	-7.37 KiB	33
`core`	-7.09 KiB	878
`serde_core`	+6.51 KiB	85
`serde`	+6.39 KiB	18
`tower`	+5.15 KiB	11
`async_compression`	+4.62 KiB	19
`tokio_util`	+3.94 KiB	14
`alloc`	+3.76 KiB	50
`saluki_components::sources::otlp`	+3.75 KiB	17
`tokio`	+3.67 KiB	124
`saluki_core::data_model::event`	-3.40 KiB	8
`futures_channel`	-3.35 KiB	7

Detailed Symbol Changes

    FILE SIZE        VM SIZE    
 --------------  -------------- 
  [NEW] +18.5Ki  [NEW] +18.3Ki    saluki_components::transforms::apm_stats::span_concentrator::SpanConcentrator::flush::h4cec187aab531472
  [NEW] +16.5Ki  [NEW] +16.4Ki    saluki_components::transforms::apm_stats::span_concentrator::SpanConcentrator::add_span::hf95b8e429cc5bbe0
  [NEW] +16.1Ki  [NEW] +16.0Ki    saluki_components::transforms::apm_stats::span_concentrator::SpanConcentrator::new_stat_span_from_span::h831751497d326aef
  +283% +15.9Ki  +288% +15.9Ki    h2::proto::connection::DynConnection<B>::recv_frame::h9d7adeb5727e1522
  [NEW] +12.3Ki  [NEW] +12.1Ki    _<core::marker::PhantomData<T> as serde_core::de::DeserializeSeed>::deserialize::hf9a832fc7767a946
  [NEW] +9.39Ki  [NEW] +9.23Ki    _<hyper::proto::h2::server::Server<T,S,B,E> as core::future::future::Future>::poll::h7fad89436d42473e
  +750% +6.74Ki  +828% +6.74Ki    prost::encoding::message::merge_repeated::hc52fda914c63fb75
  [NEW] +6.50Ki  [NEW] +6.28Ki    saluki_components::common::datadog::apm::_::_<impl serde_core::de::Deserialize for saluki_components::common::datadog::apm::ApmConfiguration>::deserialize::h2b55df90d15c8dc3
  +739% +6.39Ki  +819% +6.39Ki    prost::encoding::message::merge_repeated::h125609fe5afef278
  [DEL] -6.56Ki  [DEL] -6.41Ki    _<core::marker::PhantomData<T> as serde_core::de::DeserializeSeed>::deserialize::h14607bccbe25f0f5
 -24.0% -8.01Ki -24.1% -7.98Ki    _<saluki_components::transforms::apm_stats::ApmStats as saluki_core::components::transforms::Transform>::run::_{{closure}}::h63fe22416badd464
  [DEL] -8.13Ki  [DEL] -8.00Ki    figment::value::de::_<impl figment::value::value::Value>::deserialize_from::hc178e2144edf2db7
 -67.1% -9.24Ki -67.7% -9.24Ki    saluki_components::transforms::trace_obfuscation::sql::obfuscate_sql_string::hbc6c7c370aac7ff9
  [DEL] -9.49Ki  [DEL] -9.34Ki    _<figment::value::magic::Tagged<T> as figment::value::magic::Magic>::deserialize_from::h44f59c3078bb5bee
  [DEL] -9.74Ki  [DEL] -9.59Ki    _<figment::value::magic::RelativePathBuf as figment::value::magic::Magic>::deserialize_from::h795d206f112d7dfd
 -81.7% -10.1Ki -82.6% -10.1Ki    _<core::pin::Pin<P> as core::future::future::Future>::poll::h901e76ef802f2f4c
  [DEL] -11.6Ki  [DEL] -11.5Ki    _<figment::value::de::ConfiguredValueDe<I> as serde_core::de::Deserializer>::deserialize_struct::h262edabf8ad9b351
  [DEL] -15.4Ki  [DEL] -15.3Ki    _<figment::value::magic::Tagged<T> as figment::value::magic::Magic>::deserialize_from::hf4491bd4db3bec82
  [DEL] -15.9Ki  [DEL] -15.7Ki    _<figment::value::magic::RelativePathBuf as figment::value::magic::Magic>::deserialize_from::hf41c2cc956726b9d
  [DEL] -32.1Ki  [DEL] -32.0Ki    saluki_components::transforms::apm_stats::ApmStats::process_trace::h7d2f794f20a992a4
  -1.4% -83.5Ki  -1.4% -69.0Ki    [4675 Others]
  -0.3%  -111Ki  -0.3% -96.7Ki    TOTAL

pr-commenter · 2026-05-06T00:36:36Z

Regression Detector (Agent vs Agent+ADP, 1Hz vs default bucket)

Regression Detector Results

Run ID: d8df5951-67f8-429c-b9a8-2f631f21756f

Baseline: 15f1e04a
Comparison: b44c5fe
Diff

Optimization Goals: ❌ Regression(s) detected

perf	experiment	goal	Δ mean %	Δ mean % CI	trials	links
❌	dsd_uds_512kb_3k_contexts_1hz_memory	memory utilization	+5.68	[+5.46, +5.90]	1	bounds checks dashboard
❌	dsd_uds_512kb_3k_contexts_default_memory	memory utilization	+5.13	[+4.91, +5.35]	1	bounds checks dashboard
✅	dsd_uds_10mb_3k_contexts_default_memory	memory utilization	-15.40	[-15.58, -15.23]	1	bounds checks dashboard
✅	dsd_uds_10mb_3k_contexts_1hz_memory	memory utilization	-15.68	[-15.85, -15.51]	1	bounds checks dashboard
❌	dsd_uds_500mb_3k_contexts_1hz_throughput	ingress throughput	-18.17	[-18.32, -18.02]	1	bounds checks dashboard
❌	dsd_uds_500mb_3k_contexts_default_throughput	ingress throughput	-18.58	[-18.74, -18.42]	1	bounds checks dashboard
✅	dsd_uds_100mb_3k_contexts_1hz_cpu	% cpu utilization	-29.79	[-58.68, -0.91]	1	bounds checks dashboard
✅	dsd_uds_100mb_3k_contexts_default_cpu	% cpu utilization	-33.22	[-61.91, -4.52]	1	bounds checks dashboard
✅	dsd_uds_500mb_3k_contexts_1hz_cpu	% cpu utilization	-55.96	[-64.70, -47.22]	1	bounds checks dashboard
✅	dsd_uds_500mb_3k_contexts_default_cpu	% cpu utilization	-56.19	[-64.59, -47.79]	1	bounds checks dashboard
✅	dsd_uds_100mb_3k_contexts_1hz_memory	memory utilization	-56.91	[-57.05, -56.76]	1	bounds checks dashboard
✅	dsd_uds_100mb_3k_contexts_default_memory	memory utilization	-57.25	[-57.39, -57.11]	1	bounds checks dashboard
✅	dsd_uds_500mb_3k_contexts_1hz_memory	memory utilization	-76.06	[-76.16, -75.95]	1	bounds checks dashboard
✅	dsd_uds_500mb_3k_contexts_default_memory	memory utilization	-76.38	[-76.48, -76.28]	1	bounds checks dashboard

Fine details of change detection per experiment

perf	experiment	goal	Δ mean %	Δ mean % CI	trials	links
➖	dsd_uds_1mb_3k_contexts_1hz_cpu	% cpu utilization	+94.47	[-22.48, +211.42]	1	bounds checks dashboard
➖	dsd_uds_512kb_3k_contexts_default_cpu	% cpu utilization	+29.48	[-63.87, +122.84]	1	bounds checks dashboard
➖	dsd_uds_512kb_3k_contexts_1hz_cpu	% cpu utilization	+26.46	[-63.37, +116.29]	1	bounds checks dashboard
❌	dsd_uds_512kb_3k_contexts_1hz_memory	memory utilization	+5.68	[+5.46, +5.90]	1	bounds checks dashboard
❌	dsd_uds_512kb_3k_contexts_default_memory	memory utilization	+5.13	[+4.91, +5.35]	1	bounds checks dashboard
➖	dsd_uds_1mb_3k_contexts_default_cpu	% cpu utilization	+4.18	[-81.18, +89.54]	1	bounds checks dashboard
➖	dsd_uds_1mb_3k_contexts_1hz_memory	memory utilization	+4.13	[+3.91, +4.35]	1	bounds checks dashboard
➖	dsd_uds_1mb_3k_contexts_default_memory	memory utilization	+4.10	[+3.88, +4.31]	1	bounds checks dashboard
➖	dsd_uds_10mb_3k_contexts_1hz_throughput	ingress throughput	+0.02	[-0.05, +0.08]	1	bounds checks dashboard
➖	dsd_uds_512kb_3k_contexts_1hz_throughput	ingress throughput	+0.00	[-0.06, +0.06]	1	bounds checks dashboard
➖	dsd_uds_512kb_3k_contexts_default_throughput	ingress throughput	+0.00	[-0.06, +0.06]	1	bounds checks dashboard
➖	dsd_uds_10mb_3k_contexts_default_throughput	ingress throughput	+0.00	[-0.06, +0.06]	1	bounds checks dashboard
➖	dsd_uds_1mb_3k_contexts_1hz_throughput	ingress throughput	-0.00	[-0.06, +0.06]	1	bounds checks dashboard
➖	dsd_uds_1mb_3k_contexts_default_throughput	ingress throughput	-0.00	[-0.06, +0.06]	1	bounds checks dashboard
➖	dsd_uds_100mb_3k_contexts_1hz_throughput	ingress throughput	-0.04	[-0.21, +0.13]	1	bounds checks dashboard
➖	dsd_uds_100mb_3k_contexts_default_throughput	ingress throughput	-0.05	[-0.23, +0.13]	1	bounds checks dashboard
✅	dsd_uds_10mb_3k_contexts_default_memory	memory utilization	-15.40	[-15.58, -15.23]	1	bounds checks dashboard
✅	dsd_uds_10mb_3k_contexts_1hz_memory	memory utilization	-15.68	[-15.85, -15.51]	1	bounds checks dashboard
❌	dsd_uds_500mb_3k_contexts_1hz_throughput	ingress throughput	-18.17	[-18.32, -18.02]	1	bounds checks dashboard
❌	dsd_uds_500mb_3k_contexts_default_throughput	ingress throughput	-18.58	[-18.74, -18.42]	1	bounds checks dashboard
➖	dsd_uds_10mb_3k_contexts_default_cpu	% cpu utilization	-25.96	[-90.17, +38.24]	1	bounds checks dashboard
✅	dsd_uds_100mb_3k_contexts_1hz_cpu	% cpu utilization	-29.79	[-58.68, -0.91]	1	bounds checks dashboard
✅	dsd_uds_100mb_3k_contexts_default_cpu	% cpu utilization	-33.22	[-61.91, -4.52]	1	bounds checks dashboard
➖	dsd_uds_10mb_3k_contexts_1hz_cpu	% cpu utilization	-33.30	[-91.57, +24.96]	1	bounds checks dashboard
✅	dsd_uds_500mb_3k_contexts_1hz_cpu	% cpu utilization	-55.96	[-64.70, -47.22]	1	bounds checks dashboard
✅	dsd_uds_500mb_3k_contexts_default_cpu	% cpu utilization	-56.19	[-64.59, -47.79]	1	bounds checks dashboard
✅	dsd_uds_100mb_3k_contexts_1hz_memory	memory utilization	-56.91	[-57.05, -56.76]	1	bounds checks dashboard
✅	dsd_uds_100mb_3k_contexts_default_memory	memory utilization	-57.25	[-57.39, -57.11]	1	bounds checks dashboard
✅	dsd_uds_500mb_3k_contexts_1hz_memory	memory utilization	-76.06	[-76.16, -75.95]	1	bounds checks dashboard
✅	dsd_uds_500mb_3k_contexts_default_memory	memory utilization	-76.38	[-76.48, -76.28]	1	bounds checks dashboard

Explanation

Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%

Performance changes are noted in the perf column of each table:

✅ = significantly better comparison variant performance
❌ = significantly worse comparison variant performance
➖ = no significant change in performance

A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".

For each experiment, we classify a change in performance as a "regression" -- a change worth investigating further -- if all of the following criteria are true:

- Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
- Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
- Its configuration does not mark it "erratic".
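
The three criteria can be expressed as a short decision function. This is an illustrative sketch under stated assumptions, not the detector's implementation: `Result` and `classify` are hypothetical names, and the direction rule (lower is better for memory and CPU, higher is better for ingress throughput) is inferred from the markers in the tables above.

```python
from dataclasses import dataclass

# Assumption inferred from the report's markers: for these goals a
# positive delta is worse; for ingress throughput a negative delta is worse.
LOWER_IS_BETTER = {"memory utilization", "% cpu utilization"}


@dataclass(frozen=True)
class Result:
    goal: str
    delta_mean_pct: float
    ci_low: float
    ci_high: float
    erratic: bool = False


def classify(result, tolerance=5.0):
    """Apply the effect-size, confidence-interval, and erratic checks."""
    big_enough = abs(result.delta_mean_pct) >= tolerance
    excludes_zero = result.ci_low > 0 or result.ci_high < 0
    if result.erratic or not (big_enough and excludes_zero):
        return "no significant change"
    if result.goal in LOWER_IS_BETTER:
        worse = result.delta_mean_pct > 0
    else:
        worse = result.delta_mean_pct < 0
    return "regression" if worse else "improvement"


# Rows taken from the report above.
memory_512kb = Result("memory utilization", 5.68, 5.46, 5.90)
memory_1mb = Result("memory utilization", 4.13, 3.91, 4.35)
cpu_1mb = Result("% cpu utilization", 94.47, -22.48, 211.42)
throughput_500mb = Result("ingress throughput", -18.17, -18.32, -18.02)
```

With these inputs, `memory_512kb` and `throughput_500mb` classify as regressions, `memory_1mb` falls below the 5% tolerance, and `cpu_1mb` is rejected because its interval contains zero, matching the ❌ and ➖ markers in the tables.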

dd-octo-sts (bot) added labels on May 6, 2026:
- `area/ci`: CI/CD, automated testing, etc.
- `area/test`: All things testing: unit/integration, correctness, SMP regression, etc.

jszwedko wants to merge 1 commit into `jszwedko/agent-vs-adp-1hz-benchmark` from `jszwedko/agent-vs-adp-default-vs-1hz-benchmark`
