
test(profiling): unwind_greenlets RSS test on main without the fix [DO NOT MERGE]#18422

Draft
taegyunkim wants to merge 4 commits into
mainfrom
taegyunkim/prof-14423-test-regression-check

@taegyunkim
Contributor

Purpose — DO NOT MERGE

This is an experiment to validate the regression test added in the buffer-reuse work (branch taegyunkim/prof-14423-unwind-greenlets-buffer-reuse).

It applies only the new test test_gevent_unwind_greenlets_rss_stable on top of main, without the C++ unwind_greenlets buffer-reuse fix.

Question being answered

A code review flagged that the test asserts on RSS growth (< 20 MB over 10s), but the regression it guards is per-sample allocation churn / heap-live-size. A recycling allocator can absorb that churn without net RSS growth, so the test might pass even on unfixed code, i.e. a false negative.

  • Test FAILS here → it's a genuine regression guard.
  • Test PASSES here → it's a false negative and needs strengthening (e.g. assert on allocation count / heap-live-size, or a baseline-vs-fixed comparison).
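The false-negative concern can be illustrated with a toy model. This is a sketch, not the profiler's allocator: `RecyclingPool` and `simulate` are hypothetical, and the model assumes the unfixed path allocates and releases one buffer per sample while the fixed path allocates once and reuses it. Reserved bytes (a stand-in for RSS) are identical for both paths, while the allocation count differs, so only a count-based assertion can tell them apart.

```python
class RecyclingPool:
    """Hypothetical free-list allocator: freed buffers are reused, never returned to the OS."""

    def __init__(self) -> None:
        self._free: list[bytearray] = []
        self.reserved_bytes = 0  # stand-in for RSS: memory ever obtained from the "OS"
        self.alloc_calls = 0     # stand-in for an allocation counter

    def alloc(self, size: int) -> bytearray:
        self.alloc_calls += 1
        for i, buf in enumerate(self._free):
            if len(buf) >= size:
                return self._free.pop(i)  # recycle: no new memory reserved
        self.reserved_bytes += size
        return bytearray(size)

    def free(self, buf: bytearray) -> None:
        self._free.append(buf)


def simulate(samples: int, reuse_buffer: bool, size: int = 4096) -> RecyclingPool:
    pool = RecyclingPool()
    if reuse_buffer:
        pool.alloc(size)  # fixed path (assumed): allocate once, reuse for every sample
        return pool
    for _ in range(samples):
        buf = pool.alloc(size)  # unfixed path (assumed): fresh buffer per sample...
        pool.free(buf)          # ...released right away and recycled by the pool
    return pool


unfixed = simulate(1000, reuse_buffer=False)
fixed = simulate(1000, reuse_buffer=True)
print(unfixed.reserved_bytes, fixed.reserved_bytes)  # 4096 4096: RSS-style metric is blind
print(unfixed.alloc_calls, fixed.alloc_calls)        # 1000 1: the count separates them
```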

Scope

  • Single file changed: tests/profiling/collector/test_stack.py (+87 lines, the new test only).
  • No native/profiling source changes (verified identical to main).
  • The test is gated behind DD_PROFILE_TEST_GEVENT, so it runs only in the gevent profiling CI job.

🤖 Generated with Claude Code

… fix)

Experimental branch: contains ONLY the new gevent unwind_greenlets RSS
stability test from the PR for branch taegyunkim/prof-14423-unwind-greenlets-buffer-reuse,
applied on top of main WITHOUT the C++ buffer-reuse fix.

Purpose: verify whether the test actually fails on unfixed code, i.e.
whether it is a genuine regression guard or a false-negative (a code
review flagged that an RSS-delta assertion may pass even when the
per-sample allocation churn regression is present).

Not for merge.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@taegyunkim taegyunkim requested a review from a team as a code owner June 2, 2026 20:33
@taegyunkim taegyunkim added the changelog/no-changelog A changelog entry is not required for this PR. label Jun 2, 2026
@taegyunkim taegyunkim requested a review from gyuheon0h June 2, 2026 20:33
@taegyunkim taegyunkim marked this pull request as draft June 2, 2026 20:33
@taegyunkim taegyunkim removed the request for review from gyuheon0h June 2, 2026 20:33
@cit-pr-commenter-54b7da

cit-pr-commenter-54b7da Bot commented Jun 2, 2026

Codeowners resolved as

tests/profiling/collector/test_greenlet_buffer_reuse.py                 @DataDog/profiling-python


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: caf0b6218f


p.stop()

peak_rss = max(samples)
growth_bytes = peak_rss - rss_after_warmup

P2: Measure allocation churn instead of post-warmup RSS

Because the baseline is taken only after the 3s warmup, this test can pass on the unfixed path whenever the allocator has already grown or cached enough memory during warmup to satisfy the repeated StackInfo/deque allocations in the 10s measurement window. In that case the per-sample churn still exists, but peak_rss - rss_after_warmup stays under 20 MB, so the regression guard produces exactly the false negative it is meant to rule out. Assert on allocation or heap-live-size churn, or compare fixed-vs-baseline behavior, instead of post-warmup RSS growth.
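For reference, the two quoted lines reduce to the check below. This is a sketch, not the test's code: `read_rss_bytes` assumes Linux and reads `/proc/self/statm`, `rss_growth_ok` is a hypothetical name, and the 20 MB limit is assumed to mean 20 MiB.

```python
import os

LIMIT_BYTES = 20 * 1024 * 1024  # assumption: "20 MB" read as 20 MiB


def read_rss_bytes() -> int:
    """Current resident set size on Linux: 2nd field of /proc/self/statm, in pages."""
    with open("/proc/self/statm") as f:
        resident_pages = int(f.read().split()[1])
    return resident_pages * os.sysconf("SC_PAGE_SIZE")


def rss_growth_ok(rss_after_warmup: int, samples: list[int], limit: int = LIMIT_BYTES) -> bool:
    """Mirror of the reviewed lines: growth is measured from the post-warmup baseline."""
    peak_rss = max(samples)
    growth_bytes = peak_rss - rss_after_warmup
    return growth_bytes < limit
```

If warmup has already grown the process to its working size and the unfixed path keeps recycling that memory, every entry in `samples` stays close to `rss_after_warmup` and `rss_growth_ok` returns True even though the churn persists. That is the blind spot the review describes.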


@datadog-datadog-prod-us1
Contributor

datadog-datadog-prod-us1 Bot commented Jun 2, 2026


⚠️ Warnings

🚦 14 Pipeline jobs failed

DataDog/apm-reliability/dd-trace-py | profiling/profile 1/24

See error AttributeError: module 'ddtrace.internal.datadog.profiling.stack._stack' has no attribute '_greenlet_buffer_alloc_count' in test_greenlet_unwind_buffer_reuse.

🧪 1 Test failed

test_greenlet_unwind_buffer_reuse from test_greenlet_buffer_reuse.py
Expected status 0, got 1.
=== Captured STDOUT ===
=== End of captured STDOUT ===
=== Captured STDERR ===
Traceback (most recent call last):
  File "tests/profiling/collector/test_greenlet_buffer_reuse.py", line 76, in <module>
    c1 = stack._stack._greenlet_buffer_alloc_count()
AttributeError: module 'ddtrace.internal.datadog.profiling.stack._stack' has no attribute '_greenlet_buffer_alloc_count'
=== End of captured STDERR ===

DataDog/apm-reliability/dd-trace-py | profiling/profile 23/24

See error AttributeError: module 'ddtrace.internal.datadog.profiling.stack._stack' has no attribute '_greenlet_buffer_alloc_count' in tests/profiling/collector/test_greenlet_buffer_reuse.py.

🧪 1 Test failed

test_greenlet_unwind_buffer_reuse from test_greenlet_buffer_reuse.py
Expected status 0, got 1.
=== Captured STDOUT ===
=== End of captured STDOUT ===
=== Captured STDERR ===
Failed to register thread: 7220ccfa4fc0 (2784) MainThread
Traceback (most recent call last):
  File "tests/profiling/collector/test_greenlet_buffer_reuse.py", line 76, in <module>
    c1 = stack._stack._greenlet_buffer_alloc_count()
AttributeError: module 'ddtrace.internal.datadog.profiling.stack._stack' has no attribute '_greenlet_buffer_alloc_count'
=== End of captured STDERR ===

DataDog/apm-reliability/dd-trace-py | profiling/profile 3/24

See error 1 failed test: AttributeError: module 'ddtrace.internal.datadog.profiling.stack._stack' has no attribute '_greenlet_buffer_alloc_count' in tests/profiling/collector/test_greenlet_buffer_reuse.py.

🧪 1 Test failed

test_greenlet_unwind_buffer_reuse from test_greenlet_buffer_reuse.py
Expected status 0, got 1.
=== Captured STDOUT ===
=== End of captured STDOUT ===
=== Captured STDERR ===
Traceback (most recent call last):
  File "tests/profiling/collector/test_greenlet_buffer_reuse.py", line 76, in <module>
    c1 = stack._stack._greenlet_buffer_alloc_count()
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: module 'ddtrace.internal.datadog.profiling.stack._stack' has no attribute '_greenlet_buffer_alloc_count'
=== End of captured STDERR ===

View all 14 failed jobs.

🧪 1 Test failed in 1 job

DataDog/apm-reliability/dd-trace-py | profiling/profile 15/24

test_greenlet_unwind_buffer_reuse from test_greenlet_buffer_reuse.py
Expected status 0, got 1.
=== Captured STDOUT ===
=== End of captured STDOUT ===
=== Captured STDERR ===
Traceback (most recent call last):
  File "tests/profiling/collector/test_greenlet_buffer_reuse.py", line 76, in <module>
    c1 = stack._stack._greenlet_buffer_alloc_count()
AttributeError: module 'ddtrace.internal.datadog.profiling.stack._stack' has no attribute '_greenlet_buffer_alloc_count'
=== End of captured STDERR ===

ℹ️ Info

No other issues found

❄️ No new flaky tests detected

🔄 Datadog retried 1 test; 0 passed on retry.

🔗 Commit SHA: a73dcff
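Every failure above is the same `AttributeError` at line 76: on main the extension module does not export `_greenlet_buffer_alloc_count`, so the test aborts before it reaches any assertion. It still fails on unfixed code, but because of the missing symbol rather than measured churn. A hypothetical helper such as `missing_counter_reason` below would turn that into an explicit assertion message; the module object is passed in so the sketch runs without ddtrace, and `SimpleNamespace` stands in for `stack._stack`.

```python
import types
from typing import Optional

COUNTER_NAME = "_greenlet_buffer_alloc_count"


def missing_counter_reason(stack_ext: object) -> Optional[str]:
    """Return None if the native counter is exported, otherwise a readable reason."""
    if callable(getattr(stack_ext, COUNTER_NAME, None)):
        return None
    name = getattr(stack_ext, "__name__", type(stack_ext).__name__)
    return f"{name} does not export {COUNTER_NAME}; this build lacks the buffer-reuse fix"


# In the real test this might read:
#     reason = missing_counter_reason(stack._stack)
#     assert reason is None, reason
# Stand-in module objects so the sketch runs without ddtrace:
with_fix = types.SimpleNamespace(_greenlet_buffer_alloc_count=lambda: 3)
without_fix = types.SimpleNamespace()
print(missing_counter_reason(with_fix))  # None
print(missing_counter_reason(without_fix))
```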

LD_PRELOAD malloc-counting interposer; prints MALLOC_DELTA over a fixed
greenlet-sampling window. Experiment to find a signal that separates the
buffer-reuse fix from unfixed code (RSS does not). Always passes for now.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
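The interposer itself is native code and is not shown on this page. The Python side of such an experiment could look like the sketch below: run the workload in a child process with `LD_PRELOAD` pointing at the interposer, then pull the `MALLOC_DELTA` value out of its output. The shared-object path, the workload command, the helper names, and the `MALLOC_DELTA=<int>` line format are all assumptions, not taken from the commit.

```python
import os
import re
import subprocess
from typing import Optional, Sequence

# Assumed output format: a line such as "MALLOC_DELTA=12345" (or "MALLOC_DELTA: 12345").
_DELTA_RE = re.compile(r"^MALLOC_DELTA[=:][ \t]*(-?\d+)[ \t]*$", re.MULTILINE)


def parse_malloc_delta(output: str) -> Optional[int]:
    """Return the last MALLOC_DELTA value printed, or None if none was found."""
    matches = _DELTA_RE.findall(output)
    return int(matches[-1]) if matches else None


def run_with_interposer(cmd: Sequence[str], interposer_so: str) -> Optional[int]:
    """Run cmd with the (hypothetical) malloc-counting library preloaded."""
    env = dict(os.environ, LD_PRELOAD=interposer_so)
    proc = subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)
    # Search both streams, since the interposer might write to either one.
    return parse_malloc_delta(proc.stdout + "\n" + proc.stderr)


# Hypothetical usage:
#     delta = run_with_interposer([sys.executable, "workload.py"], "/path/to/libmalloccount.so")
```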
@taegyunkim taegyunkim force-pushed the taegyunkim/prof-14423-test-regression-check branch from 53910d7 to 41d2a4d June 2, 2026 22:35
taegyunkim added a commit that referenced this pull request Jun 2, 2026
Same LD_PRELOAD malloc-counting experiment as #18422, but on top of the
buffer-reuse C++ fix. Compare MALLOC_DELTA against the unfixed run.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
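Comparing the two MALLOC_DELTA values is only meaningful if both runs cover a comparable number of samples. A minimal way to express the comparison, assuming the sample count of each window is known and non-zero, is to normalize to allocations per sample and require the unfixed run to exceed the fixed run by some margin. `MallocRun`, `fix_is_visible`, and the 10x default factor are hypothetical placeholders, not values from either PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MallocRun:
    malloc_delta: int  # MALLOC_DELTA reported for the sampling window
    samples: int       # profiler samples taken in that window (assumed > 0)

    @property
    def per_sample(self) -> float:
        return self.malloc_delta / self.samples


def fix_is_visible(unfixed: MallocRun, fixed: MallocRun, min_factor: float = 10.0) -> bool:
    """True if the unfixed run allocates at least min_factor times more per sample."""
    if fixed.per_sample <= 0:
        # Fixed run did not allocate at all: any allocation on the unfixed run separates them.
        return unfixed.per_sample > 0
    return unfixed.per_sample / fixed.per_sample >= min_factor
```

Normalizing per sample keeps the comparison stable if the two CI runs happen to take a different number of samples in the same wall-clock window.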
taegyunkim and others added 2 commits June 3, 2026 17:40
Asserts stack._greenlet_buffer_alloc_count() plateaus after warmup. On main
(no buffer-reuse fix) the counter symbol does not exist, so this fails -- it is
the regression guard that accompanies the fix.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
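The shape of the plateau check described in this commit is sketched below. It is not the test's code: `counter_plateaus` is a hypothetical helper that takes the counter as a callable and a hypothetical `run_window` callback for the sampling period, since the real warmup and window lengths are not shown here, and the `tolerance` parameter is an assumption.

```python
from typing import Callable


def counter_plateaus(
    read_counter: Callable[[], int],
    run_window: Callable[[], None],
    tolerance: int = 0,
) -> bool:
    """Read the counter after warmup, run one sampling window, then read it again."""
    c1 = read_counter()  # after warmup: reusable buffers should already exist
    run_window()         # the profiler keeps sampling greenlets during this window
    c2 = read_counter()
    # With buffer reuse, sampling should not allocate new buffers.
    return c2 - c1 <= tolerance


# Stand-in counters: one that stays flat, one that grows every window.
state = {"allocs": 5}


def churn() -> None:
    state["allocs"] += 100  # unfixed behaviour: new buffers on every window


print(counter_plateaus(lambda: 5, lambda: None))        # True
print(counter_plateaus(lambda: state["allocs"], churn))  # False
```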
@taegyunkim taegyunkim force-pushed the taegyunkim/prof-14423-test-regression-check branch from 78b4e52 to a73dcff June 3, 2026 18:30

Labels

changelog/no-changelog A changelog entry is not required for this PR.
