test(profiling): unwind_greenlets RSS test on main without the fix [DO NOT MERGE]#18422
test(profiling): unwind_greenlets RSS test on main without the fix [DO NOT MERGE]#18422taegyunkim wants to merge 4 commits into
Conversation
… fix) Experimental branch: contains ONLY the new gevent unwind_greenlets RSS stability test from PR for taegyunkim/prof-14423-unwind-greenlets-buffer-reuse, applied on top of main WITHOUT the C++ buffer-reuse fix. Purpose: verify whether the test actually fails on unfixed code, i.e. whether it is a genuine regression guard or a false-negative (a code review flagged that an RSS-delta assertion may pass even when the per-sample allocation churn regression is present). Not for merge. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Codeowners resolved as |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: caf0b6218f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| p.stop() | ||
|
|
||
| peak_rss = max(samples) | ||
| growth_bytes = peak_rss - rss_after_warmup |
There was a problem hiding this comment.
Measure allocation churn instead of post-warmup RSS
Because the baseline is taken only after the 3s warmup, this test can pass on the unfixed path whenever the allocator has already grown/cached enough memory during warmup to satisfy the repeated StackInfo/deque allocations during the 10s measurement window. In that scenario the per-sample churn still exists, but peak_rss - rss_after_warmup stays under 20 MB, so the regression guard produces the false negative the test is meant to rule out; assert on allocation/heap-live-size churn or compare fixed-vs-baseline behavior instead of post-warmup RSS growth.
Useful? React with 👍 / 👎.
|
LD_PRELOAD malloc-counting interposer; prints MALLOC_DELTA over a fixed greenlet-sampling window. Experiment to find a signal that separates the buffer-reuse fix from unfixed code (RSS does not). Always passes for now. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
53910d7 to
41d2a4d
Compare
Same LD_PRELOAD malloc-counting experiment as #18422, but on top of the buffer-reuse C++ fix. Compare MALLOC_DELTA against the unfixed run. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Same LD_PRELOAD malloc-counting experiment as #18422, but on top of the buffer-reuse C++ fix. Compare MALLOC_DELTA against the unfixed run. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Asserts stack._greenlet_buffer_alloc_count() plateaus after warmup. On main (no buffer-reuse fix) the counter symbol does not exist, so this fails -- it is the regression guard that accompanies the fix. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
78b4e52 to
a73dcff
Compare
Purpose — DO NOT MERGE
This is an experiment to validate the regression test added in the buffer-reuse work (branch
taegyunkim/prof-14423-unwind-greenlets-buffer-reuse).It applies only the new test
test_gevent_unwind_greenlets_rss_stableon top ofmain, without the C++unwind_greenletsbuffer-reuse fix.Question being answered
A code review flagged that the test asserts on RSS growth (< 20 MB over 10s), but the regression it guards is per-sample allocation churn / heap-live-size. A recycling allocator can absorb that churn without net RSS growth, so the test might pass even on unfixed code — a false-negative.
Scope
tests/profiling/collector/test_stack.py(+87 lines, the new test only).main).DD_PROFILE_TEST_GEVENT, so it runs only in the gevent profiling CI job.🤖 Generated with Claude Code