Fix flaky data_streams bucket aggregation test #5093

ericfirth · 2025-11-25T17:18:08Z

Fixes race condition in 'aggregates multiple checkpoints into DDSketch histograms' test.

Root Cause:
The test captured 'now = Time.now.to_f' at the start, but the checkpoints
created immediately after used their own Core::Utils::Time.now internally.
These timestamps could differ by milliseconds, causing checkpoints to be
bucketed into a different time window than the test expected.

When the test calculated the expected bucket:
bucket_time_ns = now_ns - (now_ns % processor.bucket_size_ns)

And looked it up:
bucket = processor.buckets[bucket_time_ns]

The bucket would be nil if the checkpoints used a slightly different
timestamp and landed in an adjacent bucket.
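To make the failure mode concrete, here is a small Ruby sketch of the same rounding arithmetic. The 10-second bucket size and both timestamps are made-up values chosen to straddle a bucket boundary. They are not taken from the real processor.

```ruby
# Hypothetical bucket size of 10 seconds, in nanoseconds; the real
# processor.bucket_size_ns value may differ.
BUCKET_SIZE_NS = 10 * 1_000_000_000

# Same arithmetic as the test: round a timestamp down to the start of its bucket.
def bucket_start_ns(time_ns, bucket_size_ns = BUCKET_SIZE_NS)
  time_ns - (time_ns % bucket_size_ns)
end

# The test reads the clock slightly before the checkpoint does.
test_now_ns       = 1_700_000_009_999 * 1_000_000 # 1 ms before a bucket boundary
checkpoint_now_ns = test_now_ns + 2 * 1_000_000   # 2 ms later, past the boundary

expected_key = bucket_start_ns(test_now_ns)
actual_key   = bucket_start_ns(checkpoint_now_ns)

puts "expected bucket: #{expected_key}"
puts "actual bucket:   #{actual_key}"
# A lookup at expected_key finds nothing when the keys differ.
puts "lookup would be nil: #{expected_key != actual_key}"
```

Only a few milliseconds of drift are needed, but only when the two reads happen to fall on opposite sides of a boundary, which is why the failure is intermittent.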

Solution:
Use Core::Utils::Time.now_provider to inject a fixed timestamp
(Time.utc(2000, 1, 1, 0, 0, 0)) for the entire test. This ensures
all checkpoints use the exact same deterministic timestamp,
eliminating the race condition completely.

The test logic remains unchanged - we're just controlling the time
source to make it deterministic rather than dependent on wall clock
timing.
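A minimal sketch of the injection idea follows. It assumes a clock that reads time through a replaceable provider. `TimeSource` and `Checkpoint` are hypothetical stand-ins for `Core::Utils::Time` and the real checkpoint code, and the actual `now_provider` API may differ in its details.

```ruby
# Hypothetical stand-in for Core::Utils::Time: the clock is read through a
# swappable provider instead of calling Time.now directly.
module TimeSource
  @now_provider = -> { Time.now }

  class << self
    attr_accessor :now_provider

    def now
      now_provider.call
    end
  end
end

# Hypothetical code under test: a checkpoint records when it was created.
Checkpoint = Struct.new(:name, :created_at) do
  def self.create(name)
    new(name, TimeSource.now)
  end
end

FROZEN_TIME = Time.utc(2000, 1, 1, 0, 0, 0)

original_provider = TimeSource.now_provider
TimeSource.now_provider = -> { FROZEN_TIME }
begin
  checkpoints = 3.times.map { |i| Checkpoint.create("cp#{i}") }
  # Every checkpoint sees exactly the same timestamp, so they all land in
  # the same bucket no matter how long the test takes to run.
  puts checkpoints.map(&:created_at).uniq.size # prints 1
ensure
  # Put the real clock back so later tests are unaffected.
  TimeSource.now_provider = original_provider
end
```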

Related to incident #46145
Seed that reproduced the flake: 61335

What does this PR do?

Motivation:

Change log entry

Additional Notes:

How to test the change?

github-actions · 2025-11-25T17:18:18Z

👋 Hey @DataDog/ruby-guild, please fill in the "Change log entry" section in the pull request description.

If changes need to be present in CHANGELOG.md you can state it this way

**Change log entry**

Yes. A brief summary to be placed into the CHANGELOG.md

(possible answers Yes/Yep/Yeah)

Or you can opt out like that

**Change log entry**

None.

(possible answers No/Nope/None)

Visited at: 2025-11-25 17:18:17 UTC

pr-commenter · 2025-11-25T17:49:42Z

Benchmarks

Benchmark execution time: 2025-11-26 18:46:04

Comparing candidate commit 77d3041 in PR branch eric.firth/fix-ddsketch-flake with baseline commit f045f9d in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 44 metrics, 2 unstable metrics.

p-datadog · 2025-11-26T00:15:49Z

👍 on the general approach but it looks like the tests in question are still intermittently failing with this PR.

Would be nice to have the time fixing code extracted into a helper but I can do that either in this PR or in a follow-up PR.

spec/datadog/data_streams/processor_spec.rb

datadog-datadog-prod-us1 · 2025-11-26T14:57:53Z

✅ Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

🎯 Code Coverage
• Patch Coverage: 88.89%
• Total Coverage: 95.16% (-0.00%)

This comment will be updated automatically if new data arrives.

🔗 Commit SHA: 77d3041

ericfirth · 2025-11-26T15:08:31Z

> 👍 on the general approach but it looks like the tests in question are still intermittently failing with this PR.
>
> Would be nice to have the time fixing code extracted into a helper but I can do that either in this PR or in a follow-up PR.

@p-datadog
Perhaps something like

with_frozen_time do
  # ... test stuff
end

# or with a specific time, in case that's important
with_frozen_time(Time.utc(2000, 1, 1, 0, 0, 0)) do
  # ... test stuff
end
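One possible shape for that helper is sketched below, assuming the clock exposes a settable provider. `Clock` and `with_frozen_time` are hypothetical names. The key design point is the `ensure`, which restores the previous provider even when the block raises, so a failing example cannot leak frozen time into later tests.

```ruby
# Hypothetical swappable clock, standing in for Core::Utils::Time.now_provider.
module Clock
  @now_provider = -> { Time.now }

  class << self
    attr_accessor :now_provider

    def now
      now_provider.call
    end
  end
end

# Hypothetical helper: freeze Clock.now for the duration of the block and
# always restore the previous provider, even if the block raises.
# With no argument, the current wall-clock time is captured once and frozen.
def with_frozen_time(time = Time.now)
  previous = Clock.now_provider
  Clock.now_provider = -> { time }
  yield time
ensure
  Clock.now_provider = previous
end

with_frozen_time(Time.utc(2000, 1, 1, 0, 0, 0)) do |frozen|
  puts Clock.now == frozen # prints true
end
```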

Fixes two race conditions in 'aggregates multiple checkpoints into DDSketch histograms' test.

Root Cause 1 - Timestamp Race:
The test captured 'now = Time.now.to_f' at the start, but the checkpoints created immediately after used their own Core::Utils::Time.now internally. These timestamps could differ by milliseconds, causing checkpoints to be bucketed into a different time window than the test expected, resulting in nil bucket lookups.

Root Cause 2 - Background Worker Race:
The processor's background worker thread periodically calls perform(), which drains the event buffer via process_events() and then clears buckets via flush_stats(). This could happen before the test checks buckets, resulting in empty buckets.

Solution:
1. Use Core::Utils::Time.now_provider to inject a fixed timestamp (Time.utc(2000, 1, 1, 0, 0, 0)) ensuring all checkpoints use the exact same deterministic timestamp.
2. Stop the background worker with processor.stop(true) before creating checkpoints, preventing it from interfering with the test's manual event processing and bucket inspection.
3. Add diagnostic messages showing expected vs actual bucket keys when assertions fail to aid future debugging.

Related to incident #46145
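The second race can be reproduced in miniature with a toy processor. `SketchProcessor` is hypothetical and only mirrors the shape described in the commit message: a worker thread calling `perform`, which runs `process_events` and then `flush_stats`. The real `stop(true)` signature and semantics are assumed here to mean "stop the worker and wait for its thread to exit".

```ruby
# Hypothetical, simplified processor: a background thread periodically calls
# perform, which drains queued events into buckets and then flushes (clears)
# them. This mirrors the shape described above, not the real class.
class SketchProcessor
  attr_reader :buckets

  def initialize(interval: 0.01)
    @interval = interval
    @events = Queue.new
    @buckets = Hash.new { |h, k| h[k] = [] }
    @mutex = Mutex.new
    @running = true
    @worker = Thread.new do
      while @running
        perform
        sleep @interval
      end
    end
  end

  def record(bucket_key, value)
    @events << [bucket_key, value]
  end

  # Move queued events into their buckets.
  def process_events
    @mutex.synchronize do
      until @events.empty?
        key, value = @events.pop
        @buckets[key] << value
      end
    end
  end

  # Emit and clear the aggregated buckets.
  def flush_stats
    @mutex.synchronize { @buckets.clear }
  end

  def perform
    process_events
    flush_stats
  end

  # Stop the worker loop; when join is true, wait for the thread to exit.
  def stop(join = false)
    @running = false
    @worker.join if join
  end
end

processor = SketchProcessor.new
processor.stop(true)     # stop the worker before creating any events
processor.record(0, 1.5)
processor.record(0, 2.5)
processor.process_events # drain manually, as the test does

puts processor.buckets.fetch(0, []).inspect # prints [1.5, 2.5]
```

Without the `stop(true)` call, the worker could run `flush_stats` between the manual `process_events` and the bucket assertion, which is the empty-bucket symptom described in Root Cause 2.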

spec/datadog/data_streams/processor_spec.rb

Simplifies approach per PR discussion:
- Remove with_frozen_time helper
- Use standard RSpec stub for time
- Keep aggregate_failures for better test output
- Explicit frozen_time variable for clarity
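For comparison with the helper, here is a rough plain-Ruby analogue of the stub. RSpec partial doubles temporarily replace the method on the receiver and restore it after the example. `stub_now` and `UtilsTime` are hypothetical names that only approximate that behavior, without any of RSpec's verification features.

```ruby
# Hypothetical module standing in for Datadog::Core::Utils::Time.
module UtilsTime
  def self.now
    Time.now
  end
end

# Temporarily replace target.now with a fixed value, then restore the
# original method. A rough, hand-rolled analogue of
# allow(UtilsTime).to receive(:now).and_return(frozen_time).
def stub_now(target, value)
  original = target.method(:now)
  target.define_singleton_method(:now) { value }
  yield
ensure
  target.define_singleton_method(:now, original) if original
end

frozen_time = Time.utc(2000, 1, 1, 0, 0, 0)
stub_now(UtilsTime, frozen_time) do
  puts UtilsTime.now == frozen_time # prints true
end
puts UtilsTime.now == frozen_time   # prints false
```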

y9v

thank you for fixing this!

p-datadog · 2025-11-27T17:03:20Z

spec/datadog/data_streams/processor_spec.rb

+        allow(Datadog::Core::Utils::Time).to receive(:now).and_return(frozen_time)
+        allow(Datadog::Tracing).to receive(:active_span).and_return(nil)
+
+        processor.stop(true)


I am confused by this line: is the processor leaked from another test? If it's created for this test and never used, is it possible to not create it at all?

ericfirth requested a review from a team as a code owner November 25, 2025 17:18

github-actions bot added the dev/testing Involves testing processes (e.g. RSpec) label Nov 25, 2025

Strech reviewed Nov 26, 2025

spec/datadog/data_streams/processor_spec.rb (outdated, resolved)

ericfirth force-pushed the eric.firth/fix-ddsketch-flake branch from 73a6044 to 1d45c21 Compare November 26, 2025 14:33

ericfirth force-pushed the eric.firth/fix-ddsketch-flake branch from 1d45c21 to 1502223 Compare November 26, 2025 15:23

ericfirth requested a review from p-datadog November 26, 2025 16:16

y9v reviewed Nov 26, 2025

spec/datadog/data_streams/processor_spec.rb (outdated, resolved)

spec/datadog/data_streams/processor_spec.rb (resolved)

Use stub instead of helper, keep aggregate_failures

77d3041

ericfirth requested review from Strech and y9v November 26, 2025 18:18

y9v approved these changes Nov 27, 2025

p-datadog reviewed Nov 27, 2025

5 participants
