ci: warn (don't fail) on Bencher main regression#3168
Conversation
Single-run alerts on GitHub-hosted runners are dominated by environmental noise: the set of (benchmark, measure) pairs flagged varies almost entirely across consecutive pushes on main (Jaccard ~0 between adjacent runs), while every non-docs commit fails the gate. Tracking issue #3116 has accumulated auto-regression comments with no actionable signal. Retain the Bencher dashboard upload, the auto-opened tracking issue, and the job-summary report (steps 7a/7b). Downgrade step 7c from `exit 1` to a `::warning::` annotation so main stops being permanently red. Re-enable a hard gate later when thresholds/sample-size tolerate the runner noise. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
WalkthroughThe benchmark GitHub Actions workflow now parses Bencher stderr to set and export Changes
Sequence Diagram(s)sequenceDiagram
participant Runner as GitHub Runner
participant Bencher as Bencher (tool)
participant Workflow as Workflow logic
participant Issue as Issue Creator
participant Summary as Job Summary
Runner->>Bencher: run benchmarks
Bencher-->>Runner: exit code + stderr
Runner->>Workflow: set BENCHER_EXIT_CODE, parse stderr -> BENCHER_HAS_ALERT
alt BENCHER_HAS_ALERT == '1'
Workflow->>Summary: emit warning (::warning::)
Workflow->>Issue: create regression issue
else BENCHER_EXIT_CODE != '0' and BENCHER_HAS_ALERT != '1'
Workflow->>Workflow: fail job (hard fail)
else BENCHER_EXIT_CODE == '0'
Workflow->>Summary: mark success
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Greptile SummaryThis PR converts the Confidence Score: 5/5Safe to merge — workflow-only change with correct logic and no P0/P1 issues. All findings are P2 or lower. The BENCHER_HAS_ALERT heuristic is well-commented and correctly scoped. Operational failures (step 7d) still hard-fail. The 404 baseline-missing case is handled before the new heuristic runs, avoiding interference. CLI pinning is intentional and explained. No files require special attention. Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[bencher run exits] --> B{BENCHER_EXIT_CODE == 0?}
B -- Yes --> Z[✅ No action needed]
B -- No --> C{404 baseline missing\nAND no alert indicators?}
C -- Yes --> D[Set BENCHER_EXIT_CODE=0\nBENCHER_HAS_ALERT=0]
C -- No --> E{stderr contains\nAlert? / threshold / boundary?}
E -- Yes --> F[BENCHER_HAS_ALERT=1]
E -- No --> G[BENCHER_HAS_ALERT=0]
D --> Z
F --> H{on main push?}
G --> I{on main push?}
H -- Yes --> J[Step 7b: Create/update\nregression GitHub issue]
J --> K[Step 7c: emit ::warning::\nannotation]
H -- No --> Z
I -- Yes --> L[Step 7d: exit with\noriginal error code\n⛔ hard fail]
I -- No --> Z
Reviews (3): Last reviewed commit: "ci: pin Bencher CLI, broaden alert match..." | Re-trigger Greptile |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9cac6cff12
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Review: ci: warn (don't fail) on Bencher main regressionOverall: Approve with one minor suggestion. The motivation is well-documented and the evidence (Jaccard similarity ~0–0.08 across consecutive alert sets) is compelling. The change is minimal, surgical, and correctly scoped to push-to-main only. What's good
One concernThe warning message hardcodes Minor notes
|
Distinguish regression alerts from operational failures so the warn-only policy on main only applies to genuine performance signals: - Set BENCHER_HAS_ALERT=1 when bencher stderr mentions alert/threshold/ boundary indicators; leave it 0 for auth/API/network/CLI failures. - Step 7b (auto-open issue) and step 7c (::warning::) now require BENCHER_HAS_ALERT == 1 so operational errors do not create bogus regression issues. - New step 7d fails the workflow on non-regression Bencher errors on main, preserving the original safety net for a broken benchmark pipeline (missing uploads, bad credentials, Bencher outage). - Step 7c warning no longer hardcodes a specific issue number (#3116 will churn as issues are closed/reopened) and now includes the workflow run URL for a direct clickable link. Addresses review feedback on #3168. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Review feedback addressed (commit
|
|
To use Codex here, create an environment for this repo. |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: Length of output: 34743
The overall logic is sound: noise-driven regression alerts produce a warning + auto-issue, operational failures still hard-fail, and the temp-file cleanup is handled both by the 🧠 Learnings usedIf you found this review helpful, would you consider giving us a shout-out on X? Thank you for using CodeRabbit! |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e39e00ba6e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Narrows BENCHER_HAS_ALERT from case-insensitive "alert" to word-bounded case-sensitive "Alert" / "Alerts" so that lowercase "alert" appearing in bencher URL paths (e.g., "/v0/.../alerts") or CLI text cannot misclassify an operational failure as a regression and let main go green. "threshold violation" and "boundary violation" remain case-insensitive since those specific phrases are unlikely to appear outside real alert output. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Follow-up review addressed (commit
|
|
To use Codex here, create an environment for this repo. |
| BENCHER_HAS_ALERT=0 | ||
| if [ $BENCHER_EXIT_CODE -ne 0 ] && \ | ||
| { grep -qE "\bAlerts?\b" "$BENCHER_STDERR" || \ | ||
| grep -qiE "threshold violation|boundary violation" "$BENCHER_STDERR"; }; then | ||
| BENCHER_HAS_ALERT=1 |
There was a problem hiding this comment.
Pattern inconsistency with the 404 check creates a misclassification gap.
The 404 guard above (line 645) uses case-insensitive alert to detect alerts:
! grep -qiE "alert|threshold violation|boundary violation" "$BENCHER_STDERR"But this block uses case-sensitive \bAlerts?\b. If Bencher ever emits lowercase "alert detected" or "1 alert" (e.g. after a CLI version bump), the sequence would be:
- 404 guard: no
404 Not Found→ skip →BENCHER_EXIT_CODEstays non-zero ✓ - This block:
\bAlerts?\bdoes not match lowercase →BENCHER_HAS_ALERTstays0 - Step 7d fires and fails the workflow with "operational failure" — a misclassified regression.
The case-sensitivity was chosen to avoid matching /v0/.../alerts URL paths, but the same goal is achievable with a word boundary and case-insensitive flag:
| BENCHER_HAS_ALERT=0 | |
| if [ $BENCHER_EXIT_CODE -ne 0 ] && \ | |
| { grep -qE "\bAlerts?\b" "$BENCHER_STDERR" || \ | |
| grep -qiE "threshold violation|boundary violation" "$BENCHER_STDERR"; }; then | |
| BENCHER_HAS_ALERT=1 | |
| BENCHER_HAS_ALERT=0 | |
| if [ $BENCHER_EXIT_CODE -ne 0 ] && \ | |
| { grep -qiE "\bAlerts?\b" "$BENCHER_STDERR" || \ | |
| grep -qiE "threshold violation|boundary violation" "$BENCHER_STDERR"; }; then | |
| BENCHER_HAS_ALERT=1 |
Word boundaries (\b) already prevent matching alerts inside URL path segments like /v0/orgs/foo/alerts, so the case-insensitive flag adds robustness without reintroducing the false-positive risk.
Code ReviewOverviewThis PR converts the benchmark gate on The design — distinguish "regression alert" from "operational failure" by parsing Bencher's stderr, then route each to a different downstream step — is sound. The concerns below are about correctness of the parsing logic and long-term brittleness. Issues1. Pattern inconsistency between the 404 guard and The 404 guard uses case-insensitive 2. Fragile dependency on Bencher's undocumented output format
3. The removed step used 4. Double
Positive aspects
SummaryPragmatic fix for a genuinely broken gate. Main concern is the robustness of the stderr-parsing logic — specifically the case-sensitivity inconsistency (inline comment) and the undocumented dependency on Bencher's output format. Recommend addressing the case-sensitivity fix before merge; the output-format concern is lower priority but worth a comment or version pin. |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
.github/workflows/benchmark.yml (1)
645-665:⚠️ Potential issue | 🟡 MinorMinor: alert-pattern asymmetry between 404 override and new alert detection.
The 404 override at line 645 uses case-insensitive
alert|threshold violation|boundary violationto decide whether to swallow a non-zero exit, while the new detection at lines 660-665 uses case-sensitive\bAlerts?\b(plus case-insensitivethreshold violation|boundary violation). When stderr contains404 Not Foundtogether with a lowercasealerttoken that is not a real alert heading (e.g., a URL path like/v0/.../alertsin an error trace), the two branches behave asymmetrically:
- Line 645 sees "alert" (case-insensitive) → override is not applied →
BENCHER_EXIT_CODEstays non-zero.- Line 662 case-sensitive
\bAlerts?\bdoes not match lowercase →BENCHER_HAS_ALERTstays0.- Step 7d then hard-fails main for what is actually a 404.
This is fail-closed (safe), but the two predicates should ideally use the same alert signal. Consider aligning the 404 override to the same tightened pattern introduced for
BENCHER_HAS_ALERTso a spurious lowercasealerttoken doesn't block the docs-only-baseline escape hatch.Proposed alignment
- if [ $BENCHER_EXIT_CODE -ne 0 ] && grep -q "404 Not Found" "$BENCHER_STDERR" && ! grep -qiE "alert|threshold violation|boundary violation" "$BENCHER_STDERR"; then + if [ $BENCHER_EXIT_CODE -ne 0 ] && grep -q "404 Not Found" "$BENCHER_STDERR" \ + && ! grep -qE "\bAlerts?\b" "$BENCHER_STDERR" \ + && ! grep -qiE "threshold violation|boundary violation" "$BENCHER_STDERR"; then🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/benchmark.yml around lines 645 - 665, The 404-override conditional should match alerts the same way as BENCHER_HAS_ALERT does; change the grep in the 404 override that currently uses case-insensitive "alert|threshold violation|boundary violation" to the same tightened check used later: use grep -qE "\bAlerts?\b" "$BENCHER_STDERR" || grep -qiE "threshold violation|boundary violation" "$BENCHER_STDERR" so that the 404 override and BENCHER_HAS_ALERT (and the symbols BENCHER_EXIT_CODE, BENCHER_STDERR, BENCHER_HAS_ALERT and the regex \bAlerts?\b) use consistent alert detection.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In @.github/workflows/benchmark.yml:
- Around line 645-665: The 404-override conditional should match alerts the same
way as BENCHER_HAS_ALERT does; change the grep in the 404 override that
currently uses case-insensitive "alert|threshold violation|boundary violation"
to the same tightened check used later: use grep -qE "\bAlerts?\b"
"$BENCHER_STDERR" || grep -qiE "threshold violation|boundary violation"
"$BENCHER_STDERR" so that the 404 override and BENCHER_HAS_ALERT (and the
symbols BENCHER_EXIT_CODE, BENCHER_STDERR, BENCHER_HAS_ALERT and the regex
\bAlerts?\b) use consistent alert detection.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: b061cbb5-8038-4066-80c4-367b8f811fa3
📒 Files selected for processing (1)
.github/workflows/benchmark.yml
Addresses follow-up review feedback on the Bencher main-regression gate:
- Pin bencherdev/bencher to v0.6.2 and document the expected stderr
phrases ("Alert[s]", "threshold violation", "boundary violation") so a
silent CLI upgrade can't flip the alert classification.
- Switch the BENCHER_HAS_ALERT grep from case-sensitive `\bAlerts?\b` to
case-insensitive `\bAlerts?\b` to stay consistent with the 404 guard
and avoid misclassifying a lowercase `alert` variant as operational.
Word boundaries still prevent matches against URL paths like
`/v0/.../alerts`, so there's no new false-positive risk.
- Propagate the original Bencher exit code from step 7d (`exit
"${BENCHER_EXIT_CODE:-1}"` instead of bare `exit 1`) to preserve
diagnostic context in CI logs.
- Drop the redundant `rm -f "$BENCHER_STDERR"` now that the
`trap 'rm -f' EXIT` registered above handles cleanup.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6f2bb80154
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| { grep -qiE "\bAlerts?\b" "$BENCHER_STDERR" || \ | ||
| grep -qiE "threshold violation|boundary violation" "$BENCHER_STDERR"; }; then |
There was a problem hiding this comment.
Avoid classifying '/alerts' paths as regressions
The new BENCHER_HAS_ALERT gate can still misclassify operational Bencher failures as regressions, because grep -qiE "\bAlerts?\b" matches lowercase URL segments like /v0/.../alerts (word boundaries exist at / and end-of-token). When that happens on main, step 7d is skipped and the workflow only emits a warning instead of failing for auth/API/network errors. Fresh evidence in this commit is the added case-insensitive \bAlerts?\b matcher here, which reintroduces the false-positive path-match behavior.
Useful? React with 👍 / 👎.
| # are not a documented API contract, so upgrading the CLI requires | ||
| # re-verifying that the expected phrases still appear in alert output. | ||
| # Expected stderr phrases at v0.6.2: "🚨 N Alerts", "Alert detected". | ||
| uses: bencherdev/bencher@v0.6.2 |
There was a problem hiding this comment.
The pin to a mutable tag (@v0.6.2) is a meaningful improvement over @main, but a tag can still be force-pushed by the upstream maintainer, silently swapping the binary that runs in your workflow.
For a stronger supply-chain guarantee, pin to the immutable commit SHA:
| uses: bencherdev/bencher@v0.6.2 | |
| uses: bencherdev/bencher@v0.6.2 # TODO: pin to SHA once verified, e.g. bencherdev/bencher@<sha> |
Run git ls-remote https://github.com/bencherdev/bencher refs/tags/v0.6.2 to get the SHA for the tag, then use bencherdev/bencher@<sha> (with # v0.6.2 as a comment for readability). This is a low-effort hardening for a step that runs with BENCHER_API_TOKEN.
| # Match Bencher's actual alert output (e.g., "🚨 2 Alerts", "Alert detected") via | ||
| # a word-bounded, case-insensitive "Alert[s]". The word boundary (\b) already | ||
| # prevents matching lowercase "alerts" inside URL paths (e.g., "/v0/.../alerts"), | ||
| # so the case-insensitive flag adds robustness against future CLI wording changes | ||
| # (e.g., lowercase "1 alert") without reintroducing the URL-path false positive. |
There was a problem hiding this comment.
The comment's claim about \b is incorrect. \b is a word-boundary assertion between \w and \W. Since / is \W and a is \w, there is a word boundary at the start of alerts in /v0/.../alerts — so \bAlerts?\b would match alerts inside a URL path.
In practice this may not matter (Bencher probably doesn't emit raw URL paths like /v0/alerts in stderr), but the stated justification is wrong and could mislead future editors into over-trusting the heuristic.
Suggested correction:
| # Match Bencher's actual alert output (e.g., "🚨 2 Alerts", "Alert detected") via | |
| # a word-bounded, case-insensitive "Alert[s]". The word boundary (\b) already | |
| # prevents matching lowercase "alerts" inside URL paths (e.g., "/v0/.../alerts"), | |
| # so the case-insensitive flag adds robustness against future CLI wording changes | |
| # (e.g., lowercase "1 alert") without reintroducing the URL-path false positive. | |
| # Match Bencher's actual alert output (e.g., "🚨 2 Alerts", "Alert detected") via | |
| # a word-bounded, case-insensitive "Alert[s]". Note: \b does NOT prevent matching | |
| # "alerts" inside URL paths like "/v0/alerts" (/ is \W so a word boundary exists | |
| # before "a"). False positives are unlikely in practice because Bencher does not | |
| # emit raw API endpoint paths in its stderr alert output. |
ReviewOverall: approve with minor notes. The motivation is solid — the Jaccard-similarity analysis in the PR description makes a compelling case that the hard gate was generating noise, not signal. The implementation is careful and well-commented. What's good
Issues
Minor
None of these block the PR — the core change (warn instead of hard-fail on regression alerts) is correct and the implementation is sound. |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/benchmark.yml:
- Around line 659-671: The pattern for detecting Bencher alerts is currently
using case-insensitive grep (grep -qiE "\bAlerts?\b") which matches lowercase
"/alerts" in URLs; change that grep to be case-sensitive (remove the -i flag) so
only "Alert" or "Alerts" word-bound matches trigger setting BENCHER_HAS_ALERT=1,
leaving the other grep for "threshold violation|boundary violation" unchanged;
update the combined condition that references BENCHER_EXIT_CODE and
BENCHER_STDERR accordingly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: e1665e80-7cb9-42ad-8abd-36d16e6a1eb1
📒 Files selected for processing (1)
.github/workflows/benchmark.yml
| # Distinguish regression alerts from operational failures (auth/API/network/CLI) | ||
| # so that main can warn-only on the former while still failing hard on the latter. | ||
| # Match Bencher's actual alert output (e.g., "🚨 2 Alerts", "Alert detected") via | ||
| # a word-bounded, case-insensitive "Alert[s]". The word boundary (\b) already | ||
| # prevents matching lowercase "alerts" inside URL paths (e.g., "/v0/.../alerts"), | ||
| # so the case-insensitive flag adds robustness against future CLI wording changes | ||
| # (e.g., lowercase "1 alert") without reintroducing the URL-path false positive. | ||
| BENCHER_HAS_ALERT=0 | ||
| if [ $BENCHER_EXIT_CODE -ne 0 ] && \ | ||
| { grep -qiE "\bAlerts?\b" "$BENCHER_STDERR" || \ | ||
| grep -qiE "threshold violation|boundary violation" "$BENCHER_STDERR"; }; then | ||
| BENCHER_HAS_ALERT=1 | ||
| fi |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
tmp="$(mktemp)"
trap 'rm -f "$tmp"' EXIT
printf '%s\n' 'GET https://api.bencher.dev/v0/projects/react-on-rails-t8a9ncxo/alerts failed: 500' > "$tmp"
echo "Current regex:"
if grep -qiE "\bAlerts?\b" "$tmp"; then
echo "FAIL: case-insensitive regex matched lowercase /alerts URL path"
else
echo "PASS: no URL-path match"
fi
echo "Case-sensitive regex:"
if grep -qE "\bAlerts?\b" "$tmp"; then
echo "FAIL: case-sensitive regex still matched lowercase /alerts URL path"
else
echo "PASS: lowercase /alerts URL path is not treated as an alert"
fi
printf '%s\n' '🚨 2 Alerts' 'Alert detected' > "$tmp"
grep -qE "\bAlerts?\b" "$tmp"
echo "PASS: pinned Bencher v0.6.2 alert phrases still match"Repository: shakacode/react_on_rails
Length of output: 282
Make the Alerts? grep case-sensitive to prevent false positives from URL paths.
Line 668 uses -i flag, causing \bAlerts?\b to match lowercase /alerts in URL paths within error messages. This incorrectly classifies auth/API/network failures as performance regressions, skipping the hard-fail step 7d and downgrading to warn-only.
🐛 Proposed fix
- # Match Bencher's actual alert output (e.g., "🚨 2 Alerts", "Alert detected") via
- # a word-bounded, case-insensitive "Alert[s]". The word boundary (\b) already
- # prevents matching lowercase "alerts" inside URL paths (e.g., "/v0/.../alerts"),
- # so the case-insensitive flag adds robustness against future CLI wording changes
- # (e.g., lowercase "1 alert") without reintroducing the URL-path false positive.
+ # Match Bencher's pinned v0.6.2 alert output (e.g., "🚨 2 Alerts", "Alert detected")
+ # with a case-sensitive "Alert[s]" so lowercase URL paths such as "/v0/.../alerts"
+ # in operational errors are not classified as performance regressions.
BENCHER_HAS_ALERT=0
if [ $BENCHER_EXIT_CODE -ne 0 ] && \
- { grep -qiE "\bAlerts?\b" "$BENCHER_STDERR" || \
+ { grep -qE "\bAlerts?\b" "$BENCHER_STDERR" || \
grep -qiE "threshold violation|boundary violation" "$BENCHER_STDERR"; }; then
BENCHER_HAS_ALERT=1
fi📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # Distinguish regression alerts from operational failures (auth/API/network/CLI) | |
| # so that main can warn-only on the former while still failing hard on the latter. | |
| # Match Bencher's actual alert output (e.g., "🚨 2 Alerts", "Alert detected") via | |
| # a word-bounded, case-insensitive "Alert[s]". The word boundary (\b) already | |
| # prevents matching lowercase "alerts" inside URL paths (e.g., "/v0/.../alerts"), | |
| # so the case-insensitive flag adds robustness against future CLI wording changes | |
| # (e.g., lowercase "1 alert") without reintroducing the URL-path false positive. | |
| BENCHER_HAS_ALERT=0 | |
| if [ $BENCHER_EXIT_CODE -ne 0 ] && \ | |
| { grep -qiE "\bAlerts?\b" "$BENCHER_STDERR" || \ | |
| grep -qiE "threshold violation|boundary violation" "$BENCHER_STDERR"; }; then | |
| BENCHER_HAS_ALERT=1 | |
| fi | |
| # Match Bencher's pinned v0.6.2 alert output (e.g., "🚨 2 Alerts", "Alert detected") | |
| # with a case-sensitive "Alert[s]" so lowercase URL paths such as "/v0/.../alerts" | |
| # in operational errors are not classified as performance regressions. | |
| BENCHER_HAS_ALERT=0 | |
| if [ $BENCHER_EXIT_CODE -ne 0 ] && \ | |
| { grep -qE "\bAlerts?\b" "$BENCHER_STDERR" || \ | |
| grep -qiE "threshold violation|boundary violation" "$BENCHER_STDERR"; }; then | |
| BENCHER_HAS_ALERT=1 | |
| fi |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/workflows/benchmark.yml around lines 659 - 671, The pattern for
detecting Bencher alerts is currently using case-insensitive grep (grep -qiE
"\bAlerts?\b") which matches lowercase "/alerts" in URLs; change that grep to be
case-sensitive (remove the -i flag) so only "Alert" or "Alerts" word-bound
matches trigger setting BENCHER_HAS_ALERT=1, leaving the other grep for
"threshold violation|boundary violation" unchanged; update the combined
condition that references BENCHER_EXIT_CODE and BENCHER_STDERR accordingly.
Follow-up review addressed (commit
|
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is ON, but it could not run because on-demand usage is turned off. To enable Bugbot Autofix, turn on on-demand usage and set a spend limit in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 6f2bb80. Configure here.
| { grep -qiE "\bAlerts?\b" "$BENCHER_STDERR" || \ | ||
| grep -qiE "threshold violation|boundary violation" "$BENCHER_STDERR"; }; then | ||
| BENCHER_HAS_ALERT=1 | ||
| fi |
There was a problem hiding this comment.
Word boundary doesn't prevent URL-path false positives
Medium Severity
The comment claims \b prevents matching "alerts" inside URL paths like /v0/.../alerts, but this is incorrect. \b matches at transitions between word and non-word characters — / is non-word and a is word, so \bAlerts?\b with -i does match "alerts" in URL paths. If Bencher's stderr during an operational failure includes a URL containing /alerts, BENCHER_HAS_ALERT is incorrectly set to 1, causing step 7d's hard-fail gate to be skipped and the error silently downgraded to a warning in step 7c.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit 6f2bb80. Configure here.
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: Length of output: 35877
Everything looks solid — no blocking issues. 🧠 Learnings used |


Summary
::warning::annotation so main stops being permanently red.Why
The gate has been failing nearly every non-docs push to
mainfor weeks. Investigation of the Bencher alert history makes clear it's runner noise, not a real regression:4eb83648b) fired 43 alerts across Core and Pro, spanning routes unrelated to the committed change (which only editspackages/react-on-rails/src/pageLifecycle.ts).(benchmark, measure)pairs. A real regression would produce persistent, overlapping alerts.paths-ignore. Every run that actually exercises the suite trips the t-test on at least one metric.Single-run, 95%-CI t-test alerts on GitHub-hosted runners + a 70-minute suite will always produce tail-latency noise. The right long-term fix is threshold/sample-size tuning or self-hosted runners, tracked in #3169.
Change
.github/workflows/benchmark.ymlstep 7c: replaceexit ${BENCHER_EXIT_CODE:-1}with a GitHub Actions warning annotation. Steps 7a (Bencher report) and 7b (regression issue) are untouched.Follow-ups (not in this PR)
Test plan
mainshows a Bencher warning annotation instead of a red check when alerts fire.Note
Medium Risk
Changes
mainbenchmark gating behavior by no longer failing the workflow on performance regressions, which could let real regressions merge if reviewers rely on CI status. Adds new stderr-based classification logic that could mis-detect alerts and alter when the workflow fails vs warns.Overview
Benchmark CI on
mainno longer hard-fails on Bencher regression alerts. The workflow now emits a::warning::for regressions, while continuing to open/update the regression tracking issue.Adds
BENCHER_HAS_ALERTdetection by parsing Bencher stderr and gates downstream steps on it: regression-related actions run only when an actual alert is detected, and a new step explicitly failsmainonly for non-regression Bencher failures (auth/API/network/CLI).Reviewed by Cursor Bugbot for commit 12fa8f2. Bugbot is set up for automated code reviews on this repo. Configure here.
Summary by CodeRabbit
Release Notes