test(smp): add default-bucket dsd_uds variant alongside 1Hz cases #1594

Draft
jszwedko wants to merge 1 commit into `jszwedko/agent-vs-adp-1hz-benchmark` from `jszwedko/agent-vs-adp-default-vs-1hz-benchmark`

Collaborator

@jszwedko jszwedko commented May 6, 2026

Summary

Doubles the dsd_uds_* case set so a single SMP run reports both variants side-by-side: Agent vs Agent+ADP at 1Hz aggregation (aggregator_bucket_size_seconds: 1) and at the Agent's default 10s bucket. The diff between matched *_1hz_* and *_default_* rows in the report isolates the telemetry impact of the bucket knob from the Agent-vs-Agent+ADP delta.

Stacked on #1592 (will auto-retarget to `main` when #1592 merges).

Test plan

  • CI green; SMP report posted under header "Regression Detector (Agent vs Agent+ADP, 1Hz vs default bucket)".
  • Report shows 30 cases (15 `1hz` + 15 `default`).
  • Compare matched rows (e.g., `dsd_uds_10mb_3k_contexts_1hz_throughput` vs `dsd_uds_10mb_3k_contexts_default_throughput`):
    • `aggregate_flushed_total` rate: 1hz row should show ~10–15× higher than default row.
    • Series `interval` field on the wire: 1 vs 10.
    • Comparison-side (Agent+ADP) CPU/RSS deltas should be measurably different between variants.
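The matched-row comparison above can be sketched in a few lines of Python. This is only an illustration, not part of the SMP tooling: the helper names `pair_variants` and `variant_gap` are hypothetical, and the input is assumed to be a plain mapping from experiment name to its reported "Δ mean %".

```python
import re

# Matches names such as "dsd_uds_10mb_3k_contexts_1hz_throughput".
# The "<prefix>_<variant>_<goal>" layout is inferred from the case names in this PR.
_VARIANT = re.compile(r"^(?P<prefix>.+)_(?P<variant>1hz|default)_(?P<goal>[a-z]+)$")


def pair_variants(rows):
    """Group {experiment_name: delta_mean_pct} by (prefix, goal).

    Returns {(prefix, goal): {"1hz": delta, "default": delta}} and drops
    any case that does not have both variants present.
    """
    pairs = {}
    for name, delta in rows.items():
        m = _VARIANT.match(name)
        if m is None:
            continue  # not a dual-variant case name
        key = (m["prefix"], m["goal"])
        pairs.setdefault(key, {})[m["variant"]] = delta
    return {k: v for k, v in pairs.items() if {"1hz", "default"} <= v.keys()}


def variant_gap(pairs):
    """Difference (1hz minus default) in delta-mean percentage points."""
    return {k: round(v["1hz"] - v["default"], 2) for k, v in pairs.items()}


# Hypothetical input: experiment names with their "Δ mean %" values.
rows = {
    "dsd_uds_10mb_3k_contexts_1hz_memory": -15.68,
    "dsd_uds_10mb_3k_contexts_default_memory": -15.40,
    "dsd_uds_500mb_3k_contexts_1hz_throughput": -18.17,
    "dsd_uds_500mb_3k_contexts_default_throughput": -18.58,
}
gaps = variant_gap(pair_variants(rows))
```

Because both rows in a pair are measured against the same Agent baseline, the gap is a rough proxy for what the bucket setting alone contributes; it ignores the confidence intervals, so small gaps should not be over-read.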

🤖 Generated with Claude Code

Doubles the dsd_uds_* case set so a single SMP run reports both
variants side-by-side: Agent vs Agent+ADP at 1Hz aggregation
(aggregator_bucket_size_seconds: 1) and at the Agent's default 10s
bucket. The diff between matched *_1hz_* and *_default_* rows in the
report isolates the telemetry impact of the bucket knob from the
Agent-vs-Agent+ADP delta.

- Renames shared/datadog.yaml -> shared/datadog-1hz.yaml.
- Adds shared/datadog-default-bucket.yaml.
- Splits dsd_base template into a shared parent (lading generator) plus
  two leaves that pick the datadog.yaml source. Each leaf is the
  parent of one variant's experiments.
- Bumps run-benchmarks-adp timeout 1h -> 2h for the doubled case set.
- Renames the SMP PR-comment header to reflect the dual-variant setup.

Stacked on jszwedko/agent-vs-adp-1hz-benchmark.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@dd-octo-sts dd-octo-sts Bot added the `area/ci` and `area/test` labels May 6, 2026

pr-commenter Bot commented May 6, 2026

Binary Size Analysis (Agent Data Plane)

Target: dc8f4b9 (baseline) vs b44c5fe (comparison) diff
Analysis Type: Stripped binaries (debug symbols excluded)
Baseline Size: 37.11 MiB
Comparison Size: 37.00 MiB
Size Change: -111.49 KiB (-0.29%)
Pass/Fail Threshold: +5%
Result: PASSED ✅
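As a sanity check on the figures above, the percentage and the pass/fail decision can be reproduced with a short sketch. The function names are hypothetical, and the rule that only growth beyond the threshold fails is an assumption about how the +5% threshold is applied, not something the report states.

```python
KIB_PER_MIB = 1024


def size_change_pct(baseline_mib, change_kib):
    """Size change as a percentage of the baseline binary size."""
    return change_kib / (baseline_mib * KIB_PER_MIB) * 100.0


def passes_threshold(change_pct, threshold_pct=5.0):
    """Assumed rule: the check fails only when growth exceeds the threshold."""
    return change_pct <= threshold_pct


# Figures taken from the report: 37.11 MiB baseline, -111.49 KiB change.
pct = size_change_pct(37.11, -111.49)
```

Rounded to two decimals, `pct` agrees with the reported -0.29%, and a shrinking binary passes under the assumed rule.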

Changes by Module

| Module | File Size | Symbols |
|---|---|---|
| figment | -118.82 KiB | 125 |
| otlp_protos::otlp_include::opentelemetry | -41.79 KiB | 103 |
| hyper | +25.69 KiB | 76 |
| prost | +24.60 KiB | 66 |
| hyper_util | -14.05 KiB | 13 |
| hashbrown | +11.34 KiB | 72 |
| h2 | -8.86 KiB | 92 |
| [sections] | -8.52 KiB | 6 |
| tonic | -7.37 KiB | 33 |
| core | -7.09 KiB | 878 |
| serde_core | +6.51 KiB | 85 |
| serde | +6.39 KiB | 18 |
| tower | +5.15 KiB | 11 |
| async_compression | +4.62 KiB | 19 |
| tokio_util | +3.94 KiB | 14 |
| alloc | +3.76 KiB | 50 |
| saluki_components::sources::otlp | +3.75 KiB | 17 |
| tokio | +3.67 KiB | 124 |
| saluki_core::data_model::event | -3.40 KiB | 8 |
| futures_channel | -3.35 KiB | 7 |

Detailed Symbol Changes

    FILE SIZE        VM SIZE    
 --------------  -------------- 
  [NEW] +18.5Ki  [NEW] +18.3Ki    saluki_components::transforms::apm_stats::span_concentrator::SpanConcentrator::flush::h4cec187aab531472
  [NEW] +16.5Ki  [NEW] +16.4Ki    saluki_components::transforms::apm_stats::span_concentrator::SpanConcentrator::add_span::hf95b8e429cc5bbe0
  [NEW] +16.1Ki  [NEW] +16.0Ki    saluki_components::transforms::apm_stats::span_concentrator::SpanConcentrator::new_stat_span_from_span::h831751497d326aef
  +283% +15.9Ki  +288% +15.9Ki    h2::proto::connection::DynConnection<B>::recv_frame::h9d7adeb5727e1522
  [NEW] +12.3Ki  [NEW] +12.1Ki    _<core::marker::PhantomData<T> as serde_core::de::DeserializeSeed>::deserialize::hf9a832fc7767a946
  [NEW] +9.39Ki  [NEW] +9.23Ki    _<hyper::proto::h2::server::Server<T,S,B,E> as core::future::future::Future>::poll::h7fad89436d42473e
  +750% +6.74Ki  +828% +6.74Ki    prost::encoding::message::merge_repeated::hc52fda914c63fb75
  [NEW] +6.50Ki  [NEW] +6.28Ki    saluki_components::common::datadog::apm::_::_<impl serde_core::de::Deserialize for saluki_components::common::datadog::apm::ApmConfiguration>::deserialize::h2b55df90d15c8dc3
  +739% +6.39Ki  +819% +6.39Ki    prost::encoding::message::merge_repeated::h125609fe5afef278
  [DEL] -6.56Ki  [DEL] -6.41Ki    _<core::marker::PhantomData<T> as serde_core::de::DeserializeSeed>::deserialize::h14607bccbe25f0f5
 -24.0% -8.01Ki -24.1% -7.98Ki    _<saluki_components::transforms::apm_stats::ApmStats as saluki_core::components::transforms::Transform>::run::_{{closure}}::h63fe22416badd464
  [DEL] -8.13Ki  [DEL] -8.00Ki    figment::value::de::_<impl figment::value::value::Value>::deserialize_from::hc178e2144edf2db7
 -67.1% -9.24Ki -67.7% -9.24Ki    saluki_components::transforms::trace_obfuscation::sql::obfuscate_sql_string::hbc6c7c370aac7ff9
  [DEL] -9.49Ki  [DEL] -9.34Ki    _<figment::value::magic::Tagged<T> as figment::value::magic::Magic>::deserialize_from::h44f59c3078bb5bee
  [DEL] -9.74Ki  [DEL] -9.59Ki    _<figment::value::magic::RelativePathBuf as figment::value::magic::Magic>::deserialize_from::h795d206f112d7dfd
 -81.7% -10.1Ki -82.6% -10.1Ki    _<core::pin::Pin<P> as core::future::future::Future>::poll::h901e76ef802f2f4c
  [DEL] -11.6Ki  [DEL] -11.5Ki    _<figment::value::de::ConfiguredValueDe<I> as serde_core::de::Deserializer>::deserialize_struct::h262edabf8ad9b351
  [DEL] -15.4Ki  [DEL] -15.3Ki    _<figment::value::magic::Tagged<T> as figment::value::magic::Magic>::deserialize_from::hf4491bd4db3bec82
  [DEL] -15.9Ki  [DEL] -15.7Ki    _<figment::value::magic::RelativePathBuf as figment::value::magic::Magic>::deserialize_from::hf41c2cc956726b9d
  [DEL] -32.1Ki  [DEL] -32.0Ki    saluki_components::transforms::apm_stats::ApmStats::process_trace::h7d2f794f20a992a4
  -1.4% -83.5Ki  -1.4% -69.0Ki    [4675 Others]
  -0.3%  -111Ki  -0.3% -96.7Ki    TOTAL


pr-commenter Bot commented May 6, 2026

Regression Detector (Agent vs Agent+ADP, 1Hz vs default bucket)

Regression Detector Results

Run ID: d8df5951-67f8-429c-b9a8-2f631f21756f

Baseline: 15f1e04a
Comparison: b44c5fe
Diff

Optimization Goals: ❌ Regression(s) detected

| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| | dsd_uds_512kb_3k_contexts_1hz_memory | memory utilization | +5.68 | [+5.46, +5.90] | 1 | bounds checks dashboard |
| | dsd_uds_512kb_3k_contexts_default_memory | memory utilization | +5.13 | [+4.91, +5.35] | 1 | bounds checks dashboard |
| | dsd_uds_10mb_3k_contexts_default_memory | memory utilization | -15.40 | [-15.58, -15.23] | 1 | bounds checks dashboard |
| | dsd_uds_10mb_3k_contexts_1hz_memory | memory utilization | -15.68 | [-15.85, -15.51] | 1 | bounds checks dashboard |
| | dsd_uds_500mb_3k_contexts_1hz_throughput | ingress throughput | -18.17 | [-18.32, -18.02] | 1 | bounds checks dashboard |
| | dsd_uds_500mb_3k_contexts_default_throughput | ingress throughput | -18.58 | [-18.74, -18.42] | 1 | bounds checks dashboard |
| | dsd_uds_100mb_3k_contexts_1hz_cpu | % cpu utilization | -29.79 | [-58.68, -0.91] | 1 | bounds checks dashboard |
| | dsd_uds_100mb_3k_contexts_default_cpu | % cpu utilization | -33.22 | [-61.91, -4.52] | 1 | bounds checks dashboard |
| | dsd_uds_500mb_3k_contexts_1hz_cpu | % cpu utilization | -55.96 | [-64.70, -47.22] | 1 | bounds checks dashboard |
| | dsd_uds_500mb_3k_contexts_default_cpu | % cpu utilization | -56.19 | [-64.59, -47.79] | 1 | bounds checks dashboard |
| | dsd_uds_100mb_3k_contexts_1hz_memory | memory utilization | -56.91 | [-57.05, -56.76] | 1 | bounds checks dashboard |
| | dsd_uds_100mb_3k_contexts_default_memory | memory utilization | -57.25 | [-57.39, -57.11] | 1 | bounds checks dashboard |
| | dsd_uds_500mb_3k_contexts_1hz_memory | memory utilization | -76.06 | [-76.16, -75.95] | 1 | bounds checks dashboard |
| | dsd_uds_500mb_3k_contexts_default_memory | memory utilization | -76.38 | [-76.48, -76.28] | 1 | bounds checks dashboard |

Fine details of change detection per experiment

| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| | dsd_uds_1mb_3k_contexts_1hz_cpu | % cpu utilization | +94.47 | [-22.48, +211.42] | 1 | bounds checks dashboard |
| | dsd_uds_512kb_3k_contexts_default_cpu | % cpu utilization | +29.48 | [-63.87, +122.84] | 1 | bounds checks dashboard |
| | dsd_uds_512kb_3k_contexts_1hz_cpu | % cpu utilization | +26.46 | [-63.37, +116.29] | 1 | bounds checks dashboard |
| | dsd_uds_512kb_3k_contexts_1hz_memory | memory utilization | +5.68 | [+5.46, +5.90] | 1 | bounds checks dashboard |
| | dsd_uds_512kb_3k_contexts_default_memory | memory utilization | +5.13 | [+4.91, +5.35] | 1 | bounds checks dashboard |
| | dsd_uds_1mb_3k_contexts_default_cpu | % cpu utilization | +4.18 | [-81.18, +89.54] | 1 | bounds checks dashboard |
| | dsd_uds_1mb_3k_contexts_1hz_memory | memory utilization | +4.13 | [+3.91, +4.35] | 1 | bounds checks dashboard |
| | dsd_uds_1mb_3k_contexts_default_memory | memory utilization | +4.10 | [+3.88, +4.31] | 1 | bounds checks dashboard |
| | dsd_uds_10mb_3k_contexts_1hz_throughput | ingress throughput | +0.02 | [-0.05, +0.08] | 1 | bounds checks dashboard |
| | dsd_uds_512kb_3k_contexts_1hz_throughput | ingress throughput | +0.00 | [-0.06, +0.06] | 1 | bounds checks dashboard |
| | dsd_uds_512kb_3k_contexts_default_throughput | ingress throughput | +0.00 | [-0.06, +0.06] | 1 | bounds checks dashboard |
| | dsd_uds_10mb_3k_contexts_default_throughput | ingress throughput | +0.00 | [-0.06, +0.06] | 1 | bounds checks dashboard |
| | dsd_uds_1mb_3k_contexts_1hz_throughput | ingress throughput | -0.00 | [-0.06, +0.06] | 1 | bounds checks dashboard |
| | dsd_uds_1mb_3k_contexts_default_throughput | ingress throughput | -0.00 | [-0.06, +0.06] | 1 | bounds checks dashboard |
| | dsd_uds_100mb_3k_contexts_1hz_throughput | ingress throughput | -0.04 | [-0.21, +0.13] | 1 | bounds checks dashboard |
| | dsd_uds_100mb_3k_contexts_default_throughput | ingress throughput | -0.05 | [-0.23, +0.13] | 1 | bounds checks dashboard |
| | dsd_uds_10mb_3k_contexts_default_memory | memory utilization | -15.40 | [-15.58, -15.23] | 1 | bounds checks dashboard |
| | dsd_uds_10mb_3k_contexts_1hz_memory | memory utilization | -15.68 | [-15.85, -15.51] | 1 | bounds checks dashboard |
| | dsd_uds_500mb_3k_contexts_1hz_throughput | ingress throughput | -18.17 | [-18.32, -18.02] | 1 | bounds checks dashboard |
| | dsd_uds_500mb_3k_contexts_default_throughput | ingress throughput | -18.58 | [-18.74, -18.42] | 1 | bounds checks dashboard |
| | dsd_uds_10mb_3k_contexts_default_cpu | % cpu utilization | -25.96 | [-90.17, +38.24] | 1 | bounds checks dashboard |
| | dsd_uds_100mb_3k_contexts_1hz_cpu | % cpu utilization | -29.79 | [-58.68, -0.91] | 1 | bounds checks dashboard |
| | dsd_uds_100mb_3k_contexts_default_cpu | % cpu utilization | -33.22 | [-61.91, -4.52] | 1 | bounds checks dashboard |
| | dsd_uds_10mb_3k_contexts_1hz_cpu | % cpu utilization | -33.30 | [-91.57, +24.96] | 1 | bounds checks dashboard |
| | dsd_uds_500mb_3k_contexts_1hz_cpu | % cpu utilization | -55.96 | [-64.70, -47.22] | 1 | bounds checks dashboard |
| | dsd_uds_500mb_3k_contexts_default_cpu | % cpu utilization | -56.19 | [-64.59, -47.79] | 1 | bounds checks dashboard |
| | dsd_uds_100mb_3k_contexts_1hz_memory | memory utilization | -56.91 | [-57.05, -56.76] | 1 | bounds checks dashboard |
| | dsd_uds_100mb_3k_contexts_default_memory | memory utilization | -57.25 | [-57.39, -57.11] | 1 | bounds checks dashboard |
| | dsd_uds_500mb_3k_contexts_1hz_memory | memory utilization | -76.06 | [-76.16, -75.95] | 1 | bounds checks dashboard |
| | dsd_uds_500mb_3k_contexts_default_memory | memory utilization | -76.38 | [-76.48, -76.28] | 1 | bounds checks dashboard |

Explanation

Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%

Performance changes are noted in the perf column of each table:

  • ✅ = significantly better comparison variant performance
  • ❌ = significantly worse comparison variant performance
  • ➖ = no significant change in performance

A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".

For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:

  1. Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.

  2. Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.

  3. Its configuration does not mark it "erratic".
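
The three criteria above translate into a simple predicate. The following is a minimal sketch, not the detector's actual code: the function name `is_flagged` and its parameters are hypothetical, and the `erratic` flag is assumed to come from each experiment's configuration.

```python
def is_flagged(delta_mean_pct, ci_low, ci_high, erratic=False, tolerance_pct=5.0):
    """Return True when a change meets all three criteria listed above.

    delta_mean_pct: estimated "Δ mean %" for the experiment.
    ci_low, ci_high: bounds of the 90% confidence interval "Δ mean % CI".
    erratic: whether the experiment's configuration marks it erratic (assumed input).
    tolerance_pct: effect size tolerance, 5.00% in this report.
    """
    big_enough = abs(delta_mean_pct) >= tolerance_pct      # criterion 1
    ci_excludes_zero = not (ci_low <= 0.0 <= ci_high)      # criterion 2
    return big_enough and ci_excludes_zero and not erratic  # criterion 3


# Rows taken from the tables in this report.
flagged = is_flagged(-29.79, -58.68, -0.91)       # dsd_uds_100mb_3k_contexts_1hz_cpu
wide_ci = is_flagged(94.47, -22.48, 211.42)       # dsd_uds_1mb_3k_contexts_1hz_cpu
too_small = is_flagged(4.13, 3.91, 4.35)          # dsd_uds_1mb_3k_contexts_1hz_memory
```

Under these assumptions, the first row is flagged, while the second is not because its interval contains zero and the third is not because its effect size is below the 5.00% tolerance. This matches which of the three rows appear in the "Optimization Goals" table.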


Labels

area/ci CI/CD, automated testing, etc. area/test All things testing: unit/integration, correctness, SMP regression, etc.
