
Conversation

@p-datadog
Member

@p-datadog p-datadog commented Nov 24, 2025

What does this PR do?
Implements chunking of dynamic instrumentation snapshot payloads to stay under the 10 MB intake limit.

Motivation:
Without chunking, if one or more probes produce many events, the combined payload exceeds the intake limit and all of the events are dropped.

Change log entry
Yes: dynamic instrumentation: fix sending large quantities of snapshots

Additional Notes:
There is also "snapshot pruning", which reduces the size of individual snapshot events below the 1 MB limit for a "log line". That is not in scope for this PR (or for Ruby Live Debugger GA).

This PR ensures that if many events are generated by Live Debugger, they all get sent to the backend, but does not help with huge individual snapshots getting dropped.

This PR does, however, add diagnostics when snapshots are dropped due to excessive size.
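The chunking described above can be sketched as follows. This is a hypothetical illustration only; the constant and method names are assumptions, not the PR's actual code. Serialized snapshots are accumulated into chunks whose total serialized length, including JSON array punctuation, stays under the intake limit.

```ruby
# Illustrative sketch of size-based chunking; not the PR's actual implementation.
# 10 MB intake limit, minus 2 bytes for the [ and ] of JSON array syntax.
MAX_CHUNK_SIZE = 10 * 1024 * 1024 - 2

# Groups already-serialized snapshot strings into chunks such that each
# chunk's combined length (plus comma separators) stays under +max+ bytes.
def chunk_payloads(serialized_snapshots, max: MAX_CHUNK_SIZE)
  chunks = []
  current = []
  size = 0
  serialized_snapshots.each do |snapshot|
    # The +1 accounts for the comma joining elements inside the JSON array.
    added = snapshot.length + (current.empty? ? 0 : 1)
    if !current.empty? && size + added > max
      # Current chunk is full; start a new one.
      chunks << current
      current = []
      size = 0
      added = snapshot.length
    end
    current << snapshot
    size += added
  end
  chunks << current unless current.empty?
  chunks
end
```

A snapshot that is individually larger than the limit still ends up alone in its own chunk here; handling of such oversized snapshots is discussed further down in the review.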

How to test the change?
Unit tests added

@github-actions

github-actions bot commented Nov 24, 2025

Thank you for updating Change log entry section 👏

Visited at: 2025-11-24 18:05:19 UTC

@pr-commenter

pr-commenter bot commented Nov 24, 2025

Benchmarks

Benchmark execution time: 2025-11-28 06:12:52

Comparing candidate commit 228785a in PR branch di-chunk with baseline commit 18e70e4 in branch master.

Found 1 performance improvements and 0 performance regressions! Performance is the same for 43 metrics, 2 unstable metrics.

scenario:profiling - Allocations ()

  • 🟩 throughput [+173458.450op/s; +182379.316op/s] or [+5.537%; +5.822%]

@p-datadog p-datadog changed the title DI: chunk snapshot payloads DEBUG-3558 DI: chunk snapshot payloads Nov 24, 2025
@github-actions

github-actions bot commented Nov 24, 2025

Typing analysis

Note: Ignored files are excluded from the next sections.

Untyped methods

This PR introduces 4 untyped methods, and clears 3 untyped methods. It decreases the percentage of typed methods from 54.94% to 54.93% (-0.01%).

Untyped methods (+4, -3)

Introduced:
sig/datadog/di/transport/input.rbs:37
└── def initialize: (untyped apis, untyped default_api, logger: untyped) -> void
sig/datadog/di/transport/input.rbs:39
└── def current_api: () -> untyped
sig/datadog/di/transport/input.rbs:41
└── def send_input: (untyped payload, untyped tags) -> untyped
sig/datadog/di/transport/input.rbs:43
└── def send_input_chunk: (untyped chunked_payload, untyped serialized_tags) -> untyped
Cleared:
sig/datadog/di/transport/input.rbs:34
└── def initialize: (untyped apis, untyped default_api, logger: untyped) -> void
sig/datadog/di/transport/input.rbs:36
└── def current_api: () -> untyped
sig/datadog/di/transport/input.rbs:38
└── def send_input: (untyped payload, untyped tags) -> untyped

If you believe a method or an attribute is rightfully untyped or partially typed, you can add # untyped:accept to the end of the line to remove it from the stats.

@datadog-official

datadog-official bot commented Nov 24, 2025

✅ Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

🎯 Code Coverage
Patch Coverage: 93.67%
Total Coverage: 95.16% (+0.00%)

🔗 Commit SHA: 228785a

@p-datadog p-datadog marked this pull request as ready for review November 24, 2025 18:31
@p-datadog p-datadog requested a review from a team as a code owner November 24, 2025 18:31
# The maximum chunk size that intake permits is 10 MB.
#
# Two bytes are for the [ and ] of JSON array syntax.
MAX_CHUNK_SIZE = 10 * 1024 * 1024 - 2
Member

Maybe we could add the size resolution here via MAX_CHUNK_SIZE_BYTES?

Comment on lines 65 to 68
if chunk.length == 1 && chunk.first.length > MAX_CHUNK_SIZE
# Drop the chunk.
# TODO report via telemetry metric?
logger.debug { "di: dropping too big snapshot" }
Member

Not sure, but could it be more than 1 chunk going beyond max size? Or what is this scenario we are special handling?

Member Author

@p-datadog p-datadog Nov 27, 2025


The behavior changed a bit after I adjusted the code for the correct limits.

There are two relevant limits: the size of any one snapshot (1 MB) and the size of the batch (5 MB).

Given these limits, the batch (with chunking) will never exceed its max size since it is always legal to send a batch of one snapshot and the snapshot is limited to 1 MB.

For individual snapshots, yes, multiple snapshots could each exceed their size limit, and each one will be logged. The logging, however, is at debug level, which means customers won't normally see any of it. If this logging becomes an issue in the future, I can add some sort of throttling, but for now I don't think excessive log output is worth worrying about.
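The reply above can be illustrated with a small sketch; the method name, constant, and log message here are assumptions for illustration, not the PR's actual code. Each serialized snapshot larger than the per-snapshot cap is dropped and logged at debug level, while the rest are kept for batching.

```ruby
require 'logger'

# Hypothetical per-snapshot cap matching the 1 MB limit mentioned above.
MAX_SNAPSHOT_SIZE = 1024 * 1024

# Keeps snapshots at or under +max+ bytes; logs each dropped one at debug level.
def filter_oversized(serialized_snapshots, logger: nil, max: MAX_SNAPSHOT_SIZE)
  serialized_snapshots.select do |snapshot|
    keep = snapshot.length <= max
    unless keep
      # Debug level: normally invisible to customers, as noted above.
      logger&.debug { "di: dropping too big snapshot (#{snapshot.length} bytes)" }
    end
    keep
  end
end
```

Because every kept snapshot is at most 1 MB, a batch holding even a single snapshot is always legal, which is why chunking alone cannot overflow the batch limit.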

@p-datadog p-datadog requested review from a team as code owners November 26, 2025 20:57
@p-datadog p-datadog requested a review from mabdinur November 26, 2025 20:57
@github-actions github-actions bot added core Involves Datadog core libraries tracing labels Nov 26, 2025
@p-datadog
Member Author

I got the limits wrong in the initial implementation of this PR; this has now been corrected.

p and others added 2 commits November 27, 2025 11:51
* master:
  Transports: DRY HTTP client code (#5095)
  DI: extract instance_double_agent_settings_with_stubs to DRY tests (#5087)
  [PROF-13115] Fix profiler ractor specs failing on Ruby 4
  [PROF-13115] Bootstrap installing dependencies on Ruby 4.0.0-preview2
  Clarify support for `rb_obj_info` and why it's OK to not have it
  Tweak pending to not apply to all Ruby preview versions
  Do not try to use `rb_obj_info` on Ruby 4.0
  Adjust stack collector spec to account for changed Ruby 4 behavior
  [PROF-13115] Disable heap profiling on Ruby 4 preview due to incompatibility
  Stub sampling in integration tests
  Rewrite security response tests
  Bump the gh-actions-packages group across 3 directories with 13 updates
Member

@Strech Strech left a comment


Have a tiny suggestion over performance of the filter_map

Comment on lines +16 to +18
array.map(&block).reject do |item|
item.nil?
end
Member

This could be done in a single iteration with each_with_object (or we can shadow the item and reduce variables)

Suggested change
array.map(&block).reject do |item|
item.nil?
end
array.each_with_object([]) do |item, memo|
new_item = block.call(item)
memo.push(new_item) unless new_item.nil?
end

Member

@Strech Strech Nov 28, 2025


@p-datadog I think you have reverted the suggestion, but if you want to cover lazy evaluation, you can add a simple check and keep the single-iteration/single-allocation branch:

elsif array.is_a?(Enumerator::Lazy)
  # your example for lazy enumerator
else
  # my suggestion each_with_object(...)
end
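The variants discussed in this thread can be compared side by side. This is an illustrative sketch, not the PR's final code: the two-pass map/reject version from the diff, the single-pass each_with_object suggestion, and a lazy pipeline. The lazy case matters because each_with_object would force the enumerator, which defeats the purpose with a large or infinite source.

```ruby
# Illustrative comparison of the approaches discussed above.
array = [1, 2, 3, 4]
block = ->(x) { x.even? ? x * 10 : nil }

# Two-pass version as it appears in the diff (builds an intermediate array):
two_pass = array.map(&block).reject { |item| item.nil? }

# Single-pass suggestion (one iteration, one result array):
single_pass = array.each_with_object([]) do |item, memo|
  new_item = block.call(item)
  memo.push(new_item) unless new_item.nil?
end

# A lazy enumerator keeps its laziness with a map/reject pipeline;
# here an infinite range terminates only because of first(2).
lazy_result = (1..Float::INFINITY).lazy.map(&block).reject(&:nil?).first(2)
```

All three produce the same result for the eager case; the each_with_object form trades the intermediate array for a single iteration, while the lazy branch preserves on-demand evaluation.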

Member

@Strech Strech left a comment


I've left the suggestion about the filter_map path when it's not lazy enumerator


Labels

core (Involves Datadog core libraries), tracing

4 participants