Extend existing failure notifications for tracking flaky CI issues #2974
| types: [completed] | ||
|
|
||
| jobs: | ||
| track-failures: |
There was a problem hiding this comment.
The workflow triggers on any branch failure (workflow_run fires regardless of the branch the triggering run ran on). The PR description says "on the master branch" but the if: condition only checks conclusion == 'failure' — it does not filter on branch. This means flaky-test issues will be auto-created for failures on feature branches, release branches, forks, etc., generating noise.
Consider adding a branch filter:
| track-failures: | |
| if: github.event.workflow_run.conclusion == 'failure' && github.event.workflow_run.head_branch == 'master' |
| for job_id in $(echo "$failed_jobs_json" | jq -r '.[].id'); do | ||
| job_name=$(echo "$failed_jobs_json" | jq -r ".[] | select(.id == $job_id) | .name") | ||
| step_names=$(gh api "repos/$REPO/actions/jobs/$job_id" \ | ||
| --jq '.steps[] | select(.conclusion == "failure") | .name') |
There was a problem hiding this comment.
The log download silently succeeds even when it fails (|| true). If gh run view --log-failed fails (e.g. rate-limit, token permission issue), /tmp/failed_logs.txt may be empty or contain an error message, causing the grep patterns to produce zero results. The workflow would then fall through to the "unparseable failures" path and create a generic issue for every CI failure — this is the highest-noise fallback and should be avoided.
Consider logging a warning when the log download fails so at least the issue body contains diagnostic context.
| echo "$FAILING_TESTS" | while IFS= read -r test_name; do | ||
| [ -z "$test_name" ] && continue | ||
|
|
||
| echo "Processing test: $test_name" |
There was a problem hiding this comment.
Injection risk via ${{ steps.parse.outputs.tests }}.
FAILING_TESTS is populated from parsed log output (test names) and then passed through ${{ steps.parse.outputs.tests }}. GitHub Actions substitutes ${{ ... }} expressions before the runner executes the step, so an expression placed directly inside a run: script lets a crafted test name containing shell metacharacters alter step behaviour.
Assigning the expression to an env: value and reading $FAILING_TESTS inside the run: script, as is already done here, is the standard mitigation, provided the script keeps the variable quoted. To keep log-derived content out of ${{ }} expressions entirely, prefer writing the parsed names to a file (e.g. /tmp/failing_tests.txt) and reading that file in the subsequent step, rather than passing them through ${{ steps.parse.outputs.tests }} as an env value.
The same pattern applies to FAILED_STEPS at line 197.
| body="### Another failure observed" | ||
| body="$body"$'\n\n'"- **Run:** $RUN_URL" | ||
| body="$body"$'\n'"- **Commit:** \`${COMMIT_SHA:0:10}\`" | ||
| body="$body"$'\n'"- **Date:** $run_date" |
There was a problem hiding this comment.
The GitHub Search API (used by gh issue list --search) can return stale results with a lag of a few minutes. If two CI runs fail in quick succession for the same test, the deduplication search may find no existing issue for both runs, causing two issues to be created with the same title.
The code already searches by exact title with gh issue list --search "is:issue is:open label:flaky-test \"$issue_title\"" and then verifies the title with --jq, which is the right approach, but the underlying search index latency is unavoidable. This is worth noting in the workflow comments so maintainers understand the edge case of duplicate issues when failures occur close together.
| RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.event.workflow_run.id }} | ||
| REPO: ${{ github.repository }} | ||
| COMMIT_SHA: ${{ github.event.workflow_run.head_sha }} | ||
| run: | |
There was a problem hiding this comment.
The "unparseable failures" step creates a new issue per run with a unique title (CI failure (unparseable): run #$RUN_ID). If the log download frequently fails or the grep patterns never match (e.g. due to a log format change), this will create an unbounded number of issues — one for every CI failure.
Unlike the flaky-test and flaky-step steps, there is no deduplication here. Consider either deduplicating by searching for a generic title (e.g. CI failure (unparseable)) or enforcing a rate-limit / daily cap, or simply skipping issue creation and only writing to $GITHUB_STEP_SUMMARY for this fallback case.
| body="$body"$'\n\n'"---" | ||
| body="$body"$'\n'"*This issue was automatically created by the flaky test tracker workflow.*" | ||
| body="$body"$'\n'"*Add further failure occurrences as comments below.*" | ||
| gh issue create --repo "$REPO" \ |
There was a problem hiding this comment.
The issue body hard-codes "on the master branch" but the workflow is not restricted to master (see the branch-filter comment on line 9). This text will be misleading when issues are filed from non-master runs.
| gh issue create --repo "$REPO" \ | |
| body="$body"$'\n\n'"This test has been detected as failing in CI." |
| # Get failed jobs with their names and IDs | ||
| failed_jobs_json=$(gh api "repos/$REPO/actions/runs/$RUN_ID/jobs?filter=latest&per_page=100" \ | ||
| --jq '[.jobs[] | select(.conclusion == "failure") | {id, name}]') | ||
|
|
There was a problem hiding this comment.
The API call fetches up to 100 jobs (per_page=100) but does not handle pagination. If a workflow run has more than 100 jobs (unlikely but possible as the matrix grows), failures in jobs beyond position 100 will be silently missed. Consider adding --paginate if the gh api CLI supports it here, or at least document this limitation.
ArcticDB Code Review Summary
Changes in this PR: Extracts inline bash logic from failure_notification.yaml into standalone Python scripts (parse_ci_failures.py, track_ci_issues.py, send_slack_notification.py) with unit tests. Adds a
Previously Flagged: Status After Latest Push
Open Issues
Minor / Informational
Checklist Security:
Correctness:
Testing:
|
| jobs: | ||
| on-failure: | ||
| track-failures: | ||
| runs-on: ubuntu-latest |
There was a problem hiding this comment.
The track-failures job has no branch restriction — it fires on failures from any branch (feature branches, forks' PRs, etc.). This will create GitHub issues for every flaky test run on any contributor's branch, generating significant noise.
The original workflow explicitly filtered to master or '' (scheduled runs). That filter should be preserved here, or at minimum added as a comment explaining the intentional choice to track all branches.
| runs-on: ubuntu-latest | |
| if: >- | |
| (github.event.workflow_run.conclusion == 'failure' || | |
| github.event.workflow_run.conclusion == 'timed_out') && | |
| (github.event.workflow_run.head_branch == 'master' || | |
| github.event.workflow_run.head_branch == '') |
There was a problem hiding this comment.
I think we should make track-failures run on PR branches only after a human presses a button. Otherwise we will fill up GitHub with false positives (e.g. I pushed a small change to my PR but broke all my tests because of a dumb bug).
|
|
||
| echo "Fetching failed jobs for run $RUN_ID..." | ||
|
|
||
| failed_jobs_json=$(gh api "repos/$REPO/actions/runs/$RUN_ID/jobs?filter=latest&per_page=100" \ |
There was a problem hiding this comment.
The jobs API is called with per_page=100 but without pagination. A large matrix workflow can exceed 100 jobs (ArcticDB's build matrix is already sizeable). Any failures in jobs beyond position 100 will be silently missed, causing the issue tracker to never see those test failures.
Consider using gh api --paginate or at least adding a comment noting the limit. Note that --paginate with --jq requires using jq -s to merge pages, e.g.:
failed_jobs_json=$(gh api --paginate "repos/$REPO/actions/runs/$RUN_ID/jobs?filter=latest&per_page=100" \
--jq '[.jobs[] | select(.conclusion == "failure") | {id, name}]' | jq -s 'add // []')
There was a problem hiding this comment.
I agree with Claude's comment
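For the Python rewrite, the same page-merging idea can be sketched as below. This is only an illustration: it assumes every page of the jobs endpoint has already been fetched and decoded into a dict with a jobs list (the shape the jq filter above relies on), and the name collect_failed_jobs is hypothetical rather than something in the PR.

```python
from typing import Dict, Iterable, List


def collect_failed_jobs(pages: Iterable[Dict]) -> List[Dict]:
    """Merge paginated job listings and keep only the failed jobs.

    Each page is assumed to be a decoded API response with a "jobs" list.
    """
    failed = []
    for page in pages:
        for job in page.get("jobs", []):
            if job.get("conclusion") == "failure":
                failed.append({"id": job["id"], "name": job["name"]})
    return failed


# Two made-up pages standing in for a run with more than one page of jobs.
pages = [
    {"jobs": [{"id": 1, "name": "build", "conclusion": "success"},
              {"id": 2, "name": "unit", "conclusion": "failure"}]},
    {"jobs": [{"id": 101, "name": "stress", "conclusion": "failure"}]},
]
print(collect_failed_jobs(pages))
```

Failures on the second page are kept, which is exactly what a single per_page=100 request would miss.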
| echo "$failed_steps" | ||
|
|
||
| # --- Download logs and parse test failures --- | ||
| gh run view "$RUN_ID" --repo "$REPO" --log-failed > /tmp/failed_logs.txt 2>&1 || true |
There was a problem hiding this comment.
The || true here silently swallows failures from gh run view --log-failed. If this command fails — due to a rate limit, insufficient token permissions (actions: read is granted, but log access can be restricted on forks), or a transient API error — the file /tmp/failed_logs.txt will either be empty or contain an error message like HTTP 403. The subsequent grep calls will match nothing, both FAILING_TESTS and FAILED_STEPS will be empty, and the workflow will fall through to the unparseable fallback, creating a new ci-failure issue for every such run.
At minimum, log the exit code or file content so failures are visible in the step logs:
gh run view "$RUN_ID" --repo "$REPO" --log-failed > /tmp/failed_logs.txt 2>&1 || {
echo "WARNING: gh run view --log-failed failed (exit $?); test name parsing will be skipped"
}
| RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.event.workflow_run.id }} | ||
| REPO: ${{ github.repository }} | ||
| COMMIT_SHA: ${{ github.event.workflow_run.head_sha }} | ||
| FAILING_TESTS: ${{ steps.parse.outputs.tests }} |
There was a problem hiding this comment.
Security: step output injection via ${{ ... }} expression interpolation.
FAILING_TESTS and FAILED_STEPS are populated from parsed CI log content: test names and step names that ultimately come from source code, commit messages, and workflow definitions. GitHub Actions evaluates ${{ steps.parse.outputs.tests }} and substitutes the value before the runner executes the step.
If such an expression is used directly in a run: script, a test name or step name containing backticks, $(...) sequences, or newlines can alter step behaviour. Passing the value through an env: block, as done here, avoids shell evaluation as long as the script quotes the variable, and the workflow_run trigger further limits exposure (only code in this repo can influence test names). Even so, keeping log-derived content out of ${{ }} expressions is a reasonable extra precaution.
The safe pattern is to write the parsed outputs to a file in the parse step and read that file in the issues step:
# In parse step:
echo "$failing_tests" > /tmp/failing_tests.txt
echo "$infra_steps" > /tmp/failed_steps.txt
# In issues step (no ${{ }} interpolation of untrusted content):
FAILING_TESTS=$(cat /tmp/failing_tests.txt)
FAILED_STEPS=$(cat /tmp/failed_steps.txt)
This avoids the ${{ }} interpolation path entirely for untrusted content.
There was a problem hiding this comment.
This is probably fine because of the somewhat strict regex pattern matching, but it is still worth considering Claude's suggestion. I guess if we extract the above to a bash script, we can pass --output-dir to it, and it will put all the text files needed for the next steps there.
|
|
||
| # --- Fallback for unparseable failures --- | ||
| if [ -z "$FAILING_TESTS" ] && [ -z "$FAILED_STEPS" ]; then | ||
| issue_title="CI failure (unparseable): run #$RUN_ID" |
There was a problem hiding this comment.
The unparseable-failure fallback creates a new issue for every CI run using a unique title (CI failure (unparseable): run #$RUN_ID). There is no deduplication here.
The primary scenario that triggers this path is when the log download fails (see the || true concern on line 68). If that happens consistently — e.g. due to a log permission change or gh CLI upgrade breaking the --log-failed flag — every CI failure on master will produce a new issue indefinitely.
Consider one of:
- Writing to $GITHUB_STEP_SUMMARY only, instead of opening an issue, so humans can investigate without issue spam.
- Searching for a generic open issue titled CI failures (unparseable) and adding a comment to it rather than creating a new issue per run.
- Capping at a daily or weekly issue (use a date-stamped title like CI failure (unparseable): YYYY-MM-DD).
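The third option could look roughly like this in Python. It is a sketch under assumptions: the helper names are hypothetical, the list of open titles is assumed to come from whatever issue lookup the tracker already performs, and a daily cap is picked arbitrarily over a weekly one.

```python
from datetime import date
from typing import Iterable, Optional


def unparseable_title(day: date) -> str:
    """Build a title that is stable for a whole day, not unique per run."""
    return f"CI failure (unparseable): {day.isoformat()}"


def pick_existing(open_titles: Iterable[str], day: date) -> Optional[str]:
    """Return the matching open title for this day, or None if there is none."""
    wanted = unparseable_title(day)
    for title in open_titles:
        if title == wanted:
            return title
    return None


today = date(2025, 1, 2)
print(unparseable_title(today))
print(pick_existing(["CI failure (unparseable): 2025-01-02"], today))
```

With a title like this, a second unparseable failure on the same day finds the existing issue and can add a comment instead of opening a new one.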
| (github.event.workflow_run.head_branch == 'master' || | ||
| github.event.workflow_run.head_branch == '') && | ||
| always() && | ||
| needs.track-failures.result != 'skipped' |
There was a problem hiding this comment.
The notify-slack condition has a subtle logic gap. The combination of always() and needs.track-failures.result != 'skipped' means this job will also fire when track-failures itself fails (e.g. due to a permissions error, API rate limit, or script bug). In that case needs.track-failures.outputs.slack_summary will be empty, and the Slack message will be sent but with no failure details — potentially confusing on-call engineers.
Consider adding needs.track-failures.result != 'failure' to the condition, or at minimum handle the case where SUMMARY is empty by adding a fallback message indicating that issue tracking itself failed:
if: >-
(github.event.workflow_run.head_branch == 'master' ||
github.event.workflow_run.head_branch == '') &&
always() &&
needs.track-failures.result != 'skipped' &&
needs.track-failures.result != 'cancelled'
There was a problem hiding this comment.
I think we should indeed notify in case of track-failures failure (otherwise we'll be losing coverage of failures silently), but we should probably have a friendly error message in that case.
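A possible shape for that friendly message, as a sketch only: it assumes the Slack step receives the track-failures job result and the summary as plain strings, and the function name and wording are made up for illustration.

```python
def slack_summary(tracker_result: str, summary: str) -> str:
    """Return the text to post, falling back when tracking did not succeed."""
    if tracker_result == "success" and summary.strip():
        return summary
    # Hypothetical wording; the point is to say that tracking itself failed.
    return (
        ":warning: Failure tracking did not complete "
        f"(track-failures result: {tracker_result}). "
        "Check the workflow run for details."
    )


print(slack_summary("failure", ""))
```

This keeps the notification firing when track-failures fails, while making it clear to whoever reads the message why no per-test details are present.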
| "type": "section", | ||
| "text": { | ||
| "type": "mrkdwn", | ||
| "text": "$icon *$WORKFLOW_NAME* $CONCLUSION on \`$BRANCH\`" |
There was a problem hiding this comment.
Shell variables injected unescaped into a JSON string literal inside a heredoc.
$WORKFLOW_NAME, $CONCLUSION, and $BRANCH are substituted directly into the JSON payload string. If any of these contain double quotes, backslashes, or newlines (e.g. a workflow name like Build "with" quotes), the resulting JSON will be malformed and curl will send an invalid payload, silently failing with -f.
The escaped_summary is already being JSON-escaped via python3. Apply the same treatment to the other variables before inserting them:
esc_workflow=$(printf '%s' "$WORKFLOW_NAME" | python3 -c "import sys,json; print(json.dumps(sys.stdin.read())[1:-1])")
esc_conclusion=$(printf '%s' "$CONCLUSION" | python3 -c "import sys,json; print(json.dumps(sys.stdin.read())[1:-1])")
esc_branch=$(printf '%s' "$BRANCH" | python3 -c "import sys,json; print(json.dumps(sys.stdin.read())[1:-1])")
Then use $esc_workflow, $esc_conclusion, $esc_branch in the heredoc.
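An alternative to escaping each variable is to build the whole payload as a Python structure and serialise it once. The sketch below assumes a Block Kit payload with a top-level blocks list containing the section shown in the diff; build_slack_payload is a hypothetical helper name.

```python
import json


def build_slack_payload(icon: str, workflow_name: str, conclusion: str, branch: str) -> str:
    """Serialise the message with json.dumps so quotes and newlines are escaped."""
    text = f"{icon} *{workflow_name}* {conclusion} on `{branch}`"
    payload = {
        "blocks": [
            {"type": "section", "text": {"type": "mrkdwn", "text": text}}
        ]
    }
    return json.dumps(payload)


# A workflow name with double quotes still yields valid JSON.
body = build_slack_payload(":rotating_light:", 'Build "with" quotes', "failure", "master")
print(body)
```

Because the serialiser handles escaping, no value needs special treatment before it is placed in the message.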
| GH_TOKEN: ${{ secrets.GITHUB_TOKEN }} | ||
| REPO: ${{ github.repository }} | ||
| run: | | ||
| gh label create "flaky-test" --repo "$REPO" --color "FBCA04" --description "Flaky test auto-tracked by CI" 2>/dev/null || true |
There was a problem hiding this comment.
Should we use an autogenerated label for all of these? (So that we can filter and remove all in case of noise or bugs)
Or prefix all these labels with autogenerated e.g. autogenerated-flaky-test
There was a problem hiding this comment.
Nit: Maybe rename the workflow
| for job_id in $(echo "$failed_jobs_json" | jq -r '.[].id'); do | ||
| job_name=$(echo "$failed_jobs_json" | jq -r ".[] | select(.id == $job_id) | .name") | ||
| step_names=$(gh api "repos/$REPO/actions/jobs/$job_id" \ | ||
| --jq '.steps[] | select(.conclusion == "failure") | .name') |
There was a problem hiding this comment.
I don't think large bash scripts should live standalone in github workflows.
What do you think about extracting that to a separate file and calling into it with interface like:
track_failures --run-id something
and here we can just parse the stdout.
This will also make manual testing easier. E.g. you can run the track_failures script locally against a few example failures and see the output is correct. I think including such evidence in PR description would also be helpful.
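A minimal sketch of such an interface, using argparse. The flag names --run-id, --repo and --output-dir mirror the ones that appear in the workflow snippets in this thread; the default output directory and the parser layout are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Command-line interface for a standalone failure-parsing script."""
    parser = argparse.ArgumentParser(prog="parse_ci_failures")
    parser.add_argument("--run-id", required=True, help="Workflow run to analyse")
    parser.add_argument("--repo", required=True, help="Repository as owner/name")
    # Assumed default; the workflow can always pass the directory explicitly.
    parser.add_argument("--output-dir", default="/tmp/ci_failures",
                        help="Directory for the generated text files")
    return parser


# Parsing an explicit argument list makes the script easy to exercise locally.
args = build_parser().parse_args(["--run-id", "123", "--repo", "owner/repo"])
print(args.run_id, args.repo, args.output_dir)
```

Running the script by hand against a few known failed runs then only needs a run ID and a repository name.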
| body="$body"$'\n'"*This issue was automatically created by the flaky test tracker workflow.*" | ||
| body="$body"$'\n'"*Add further failure occurrences as comments below.*" | ||
|
|
||
| track_item "$test_name" "Flaky test" "flaky-test" "$body" |
There was a problem hiding this comment.
I think opening a separate issue for each test can be problematic.
Most probably if >10 tests fail together their failures are correlated. This can happen in cases:
- Someone accidentally breaks 1000 tests on their PR and fat fingers the button to track failure
- Some fixture or shared setup breaks and results in dozens of failed tests
I guess we can try to produce a single item if >10 failures.
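The grouping rule could be as simple as the sketch below. The threshold of 10 comes from this comment; plan_issues and the tuple layout are hypothetical and only show the decision, not how issues are created.

```python
from typing import List, Sequence, Tuple

GROUPING_THRESHOLD = 10  # value suggested in the comment above


def plan_issues(test_names: Sequence[str]) -> List[Tuple[str, List[str]]]:
    """Decide whether failures get one issue each or a single grouped issue."""
    if len(test_names) > GROUPING_THRESHOLD:
        # Many simultaneous failures are probably correlated: track them together.
        return [("grouped", list(test_names))]
    return [("single", [name]) for name in test_names]


print(plan_issues(["test_a", "test_b"]))
print(len(plan_issues([f"test_{i}" for i in range(11)])))
```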
| body="$body"$'\n'"*This issue was automatically created by the flaky test tracker workflow.*" | ||
| body="$body"$'\n'"*Add further failure occurrences as comments below.*" | ||
|
|
||
| track_item "$step_entry" "Flaky step" "flaky-step" "$body" |
There was a problem hiding this comment.
Some general gh action outage can cause a lot of steps to fail together. Grouping them if many makes sense.
| GH_TOKEN: ${{ secrets.GITHUB_TOKEN }} | ||
| run: > | ||
| python3 .github/scripts/parse_ci_failures.py | ||
| --run-id "${{ steps.meta.outputs.run_id }}" |
There was a problem hiding this comment.
The run_id from a workflow_dispatch is user-supplied and still reaches the shell via ${{ steps.meta.outputs.run_id }} expression interpolation (GitHub Actions evaluates ${{ }} before the runner executes the step). If a user inputs a value containing ", `, or $(), it may break the quoted argument or execute shell code.
The meta step already follows the safe pattern of routing context values through env: variables. Apply the same here:
| --run-id "${{ steps.meta.outputs.run_id }}" | |
| - name: Parse CI failures | |
| env: | |
| GH_TOKEN: ${{ secrets.GITHUB_TOKEN }} | |
| META_RUN_ID: ${{ steps.meta.outputs.run_id }} | |
| META_REPO: ${{ github.repository }} | |
| run: | | |
| python3 .github/scripts/parse_ci_failures.py \ | |
| --run-id "$META_RUN_ID" \ | |
| --repo "$META_REPO" \ | |
| --output-dir /tmp/ci_failures |
Same fix applies to the identical pattern in the "Create or update issues" step (line 85).
| "--json", "number,title", | ||
| ) | ||
| if not raw: | ||
| return None |
There was a problem hiding this comment.
GitHub Search treats [, ], :, /, and . as special syntax. Pytest parametrized test names like tests/test_foo.py::test_bar[param1-param2] contain all of these. The search query '"tests/test_foo.py::test_bar[param1-param2]" in:title' may return zero results even when an exact-title issue exists, causing find_existing_issue to return None and create a duplicate issue on every subsequent run.
The exact-title match on line 63 (if issue["title"] == title) is correct dedup logic, but it only runs if the search returns any results. Consider broadening the search (e.g. drop brackets/colons from the query term) and relying solely on the exact-match filter, or escaping search-special characters before passing the title to --search.
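One way to broaden the search while keeping the exact-title check is sketched here. It assumes that reducing the title to plain words is acceptable as a search term and that the search results are dicts with a title key, as in the --json output above; both helper names are hypothetical.

```python
import re
from typing import Dict, Iterable, Optional


def broad_search_term(title: str) -> str:
    """Replace every run of non-word characters with a single space."""
    words = re.sub(r"[^A-Za-z0-9_]+", " ", title).split()
    return " ".join(words)


def exact_match(issues: Iterable[Dict], title: str) -> Optional[Dict]:
    """Deduplicate on the exact title, whatever the search returned."""
    for issue in issues:
        if issue["title"] == title:
            return issue
    return None


title = "tests/test_foo.py::test_bar[param1-param2]"
print(broad_search_term(title))
print(exact_match([{"number": 7, "title": title}], title))
```

The broad term only decides which candidates come back; the exact comparison remains the single source of truth for deduplication.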
|
|
||
| if item_type == "test": | ||
| label = "flaky-test" | ||
| heading = f"Grouped CI test failures ({count} tests)" |
There was a problem hiding this comment.
The grouped-issue title embeds run #{self.run_id}, making it unique per run. This means grouped failures have no deduplication: if 15+ tests fail consistently across many CI runs, a new issue is created for every single run — the same unbounded issue growth the handle_unparseable fix resolved.
Consider using a stable title (e.g. f"Grouped CI {item_type} failures") and applying the same find-or-create pattern as track_item / handle_unparseable: search for an open issue with that title, add a comment if found, create only if not found.
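The find-or-create flow with a stable title can be illustrated with an in-memory stand-in for the issue tracker. Nothing here calls GitHub; InMemoryTracker and grouped_title are hypothetical names used only to show that a second run comments instead of creating.

```python
from typing import Dict, List


class InMemoryTracker:
    """Stand-in for the real issue tracker, used only to show the flow."""

    def __init__(self) -> None:
        self.issues: Dict[str, List[str]] = {}

    def report(self, title: str, body: str) -> str:
        if title in self.issues:
            # An open issue with this title exists: add a comment to it.
            self.issues[title].append(body)
            return "commented"
        self.issues[title] = [body]
        return "created"


def grouped_title(item_type: str) -> str:
    """Stable title: no run ID, so later runs map to the same issue."""
    return f"Grouped CI {item_type} failures"


tracker = InMemoryTracker()
first = tracker.report(grouped_title("test"), "run 1: 15 tests failed")
second = tracker.report(grouped_title("test"), "run 2: 15 tests failed")
print(first, second)
```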
| raw = run_gh( | ||
| "api", "--paginate", | ||
| f"repos/{repo}/actions/runs" | ||
| f"?branch={branch}&status=failure&per_page=5", |
There was a problem hiding this comment.
Two issues on this line:
1. status=failure misses timed_out runs
The GitHub Actions Runs API treats failure and timed_out as separate conclusion (and status) values. Querying status=failure will not return runs that concluded as timed_out. The code then checks run["conclusion"] in ("failure", "timed_out") on the results, but timed_out runs are never returned by this query, so they will always be missed.
Fix: query with status=completed and filter locally, or make a second query with status=timed_out:
for status in ("failure", "timed_out"):
raw = run_gh(
"api", "--paginate",
f"repos/{repo}/actions/runs?branch={branch_encoded}&status={status}&per_page=10",
...
)
2. Branch name is not URL-encoded
branch comes from .head.ref via get_pr_head(). A branch named feat/foo&status=success would inject extra query parameters into the URL, returning wrong results (or all runs regardless of status). Apply URL encoding:
from urllib.parse import quote
branch_encoded = quote(branch, safe='')
| if: >- | ||
| github.event_name == 'issue_comment' && | ||
| github.event.issue.pull_request && | ||
| contains(github.event.comment.body, '/analyse-failures') |
There was a problem hiding this comment.
The /analyse-failures command has no collaborator check. Any GitHub user can comment /analyse-failures on any PR, causing the workflow to consume actions: read quota, call gh issue list/create/comment (using issues: write), and post a PR comment (using pull-requests: write). On a public repo this is open to the world.
Consider gating on author_association:
| contains(github.event.comment.body, '/analyse-failures') | |
| contains(github.event.comment.body, '/analyse-failures') && | |
| (github.event.comment.author_association == 'MEMBER' || | |
| github.event.comment.author_association == 'OWNER' || | |
| github.event.comment.author_association == 'COLLABORATOR') |
This still allows all repo members/maintainers while blocking external actors.
|
|
||
| jobs: | ||
| on-failure: | ||
| resolve-pr-run: |
There was a problem hiding this comment.
I'm ok with triggering via PR comments but I think it has a couple of downsides:
- Pollutes the PR description page with things not related to the PR (e.g. flaky master tests)
- Can't easily be triggered for any failure other than the most recent one
I guess I was thinking of adding a button to the job summary of a failed "Build and Test" or "Build with conda" or other like:
failure-analysis-link:
needs: [cpp-test-linux, cpp-test-windows, cpp-test-macos, build-python-wheels-linux,
build-python-wheels-windows, build-python-wheels-macos,
persistent_storage_verify_linux, persistent_storage_verify_windows]
if: always() && failure()
runs-on: ubuntu-latest
steps:
- name: Write failure analysis link to summary
run: |
url="${GITHUB_SERVER_URL}/${GITHUB_REPOSITORY}/actions/workflows/failure_notification.yaml"
cat >> "$GITHUB_STEP_SUMMARY" <<'EOF'
## Failure Analysis
EOF
echo "[**Analyse failures →**]($url)" >> "$GITHUB_STEP_SUMMARY"
echo "" >> "$GITHUB_STEP_SUMMARY"
echo "Run ID: \`${GITHUB_RUN_ID}\`" >> "$GITHUB_STEP_SUMMARY"
I don't feel strongly about this, but using a link like the one above would allow us to delete resolve_pr_run.py and simplify the logic here.
I guess using both activation mechanisms is also an option
There was a problem hiding this comment.
I think that I prefer the implemented approach a bit more because:
- It doesn't require changes to "Build and Test", "Build with conda", etc
- Makes flaky tests a bit more clear in the PR itself, thus easier to see for reviewers
| notify-slack: | ||
| runs-on: ubuntu-latest | ||
| needs: [track-failures] | ||
| # Only send Slack notifications for automatic master triggers, not manual dispatch |
There was a problem hiding this comment.
It could be argued that failures found through manual dispatch on PRs still reflect flakiness present on master.
Still, it is probably better to keep the notifications just for master to avoid excessive noise.
| def download_failed_logs(repo: str, run_id: str, output_path: str) -> bool: | ||
| """Download logs for the failed run. Returns True on success.""" | ||
| result = subprocess.run( | ||
| ["gh", "run", "view", run_id, "--repo", repo, "--log-failed"], |
| ) | ||
| with open(output_path, "w") as f: | ||
| f.write(result.stdout) | ||
| if result.returncode != 0: |
There was a problem hiding this comment.
Maybe we can add some error handling in the general run_gh instead of just this one gh run view command
There was a problem hiding this comment.
Are these tests ever run in CI?
I think it makes sense to enforce they pass before merging a PR.
There was a problem hiding this comment.
added a flow to test them
| req = urllib.request.Request( | ||
| webhook_url, | ||
| data=data, | ||
| headers={"Content-Type": "application/json"}, | ||
| method="POST", | ||
| ) |
There was a problem hiding this comment.
How does authentication against the Slack webhook work? I see that in the gh workflow we have the URL itself as a secret.
Is there any chance a urllib exception will leak the URL in logs or something similar? It seems that in exception handling we at least don't log the URL, which is good.
There was a problem hiding this comment.
Added extra handling for the URLError.
But in any case, since SLACK_WEBHOOK_URL is a repository secret, GitHub Actions automatically masks its value in all log output. Any occurrence of the secret's value gets replaced with ***. So even if it did leak through a traceback, it would be masked.
| - name: Checkout repository | ||
| uses: actions/checkout@v4 | ||
|
|
||
| - name: Send Slack notification |
There was a problem hiding this comment.
We were previously using an existing action ravsamhq/notify-slack-action@be814b201e233b2dc673608aa46e5447c8ab13f2.
Any reason to write our own slack notification code instead of building the message and using the existing action?
There was a problem hiding this comment.
The ravsamhq/notify-slack-action doesn't support what we need. It uses Slack's legacy attachments format with a fixed schema: no Block Kit support and no way to pass an arbitrary payload. We'd lose the structured summary (per-test lines, grouped failure messages, timeout indicators) and the tracker-failure warning.
| GROUPING_THRESHOLD = 10 | ||
|
|
||
|
|
||
| def run_gh(*args: str, check: bool = True) -> str: |
There was a problem hiding this comment.
This method is defined in more than one place; maybe we can add a common one in utils.py?
| self.run_id = run_id | ||
| self.run_url = run_url | ||
| self.short_sha = commit_sha[:10] | ||
| self.run_date = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC") |
There was a problem hiding this comment.
The run might be e.g. a week old, so the time of report might be misleading. Would be useful to use the time of the failure so we can cross reference against e.g. github outages.
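A sketch of formatting the run's own timestamp instead of the current time. It assumes the ISO 8601 timestamp is taken from the run metadata returned by the API (for example the run's updated_at field; which field best represents the failure time is a judgment call), and format_run_date is a hypothetical helper.

```python
from datetime import datetime, timezone


def format_run_date(timestamp: str) -> str:
    """Format an ISO 8601 timestamp such as 2024-05-01T13:45:10Z in UTC."""
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    parsed = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
    return parsed.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")


print(format_run_date("2024-05-01T13:45:10Z"))  # 2024-05-01 13:45 UTC
```

The output format matches the one already used for run_date, so the issue body would not need to change.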
| """Download logs for the failed run. Returns True on success.""" | ||
| try: | ||
| output = run_gh("run", "view", run_id, "--repo", repo, "--log-failed") | ||
| except Exception: |
There was a problem hiding this comment.
Two issues here:
1. Overly broad exception type. run_gh raises subprocess.CalledProcessError on failure — catching Exception also swallows unexpected errors such as MemoryError or a programming bug inside run_gh itself. Prefer except subprocess.CalledProcessError: to be explicit about what you're tolerating.
2. Double error messages. When gh run view --log-failed fails, run_gh already prints "gh command failed (exit N): ..." to stderr before raising. Then this except block prints a second "WARNING: gh run view --log-failed failed; ..." message. Both appear in the same log for the same failure event.
To avoid the duplicate you could call run_gh with check=False and check whether the returned string is empty, but that conflates "command failed" with "no failed logs". The simplest targeted fix is:
| except Exception: | |
| except subprocess.CalledProcessError: |
and add import subprocess at the top of the file (or rely on it being transitively available via utils). This keeps the warning message but narrows the catch to the expected failure mode.
| body=$(printf '%s\n' "$SUMMARY" \ | ||
| | sed 's/:rotating_light:/🚨/g' \ | ||
| | sed 's/:warning:/⚠️/g' \ | ||
| | sed 's/:question:/❓/g' \ | ||
| | sed 's/:hourglass:/⏳/g' \ | ||
| | sed 's/\*\([^*]*\)\*/**\1**/g') |
There was a problem hiding this comment.
The sed pipeline converts Slack emoji codes and *bold* to GitHub Markdown equivalents, but leaves Slack's <url|text> link syntax unconverted. The track_ci_issues.py output produces lines like:
🚨 **New** — `TestSuite.TestName` (<https://github.com/.../issues/1|issue>)
The (<url|text>) part won't render as a hyperlink in GitHub Markdown — it will appear as literal angle-bracket text. Add a sed substitution to convert Slack links to Markdown:
| body=$(printf '%s\n' "$SUMMARY" \ | |
| | sed 's/:rotating_light:/🚨/g' \ | |
| | sed 's/:warning:/⚠️/g' \ | |
| | sed 's/:question:/❓/g' \ | |
| | sed 's/:hourglass:/⏳/g' \ | |
| | sed 's/\*\([^*]*\)\*/**\1**/g') | |
| body=$(printf '%s\n' "$SUMMARY" \ | |
| | sed 's/:rotating_light:/🚨/g' \ | |
| | sed 's/:warning:/⚠️/g' \ | |
| | sed 's/:question:/❓/g' \ | |
| | sed 's/:hourglass:/⏳/g' \ | |
| | sed 's/\*\([^*]*\)\*/**\1**/g' \ | |
| | sed 's/<\([^|>]*\)|\([^>]*\)>/[\2](\1)/g') |
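If this conversion ever moves into the Python scripts, the same substitutions can be expressed with re.sub. The sketch below mirrors the sed pipeline above, including the added link rule; the function name slack_to_markdown is hypothetical.

```python
import re

# Same emoji codes that the sed pipeline replaces.
EMOJI = {
    ":rotating_light:": "🚨",
    ":warning:": "⚠️",
    ":question:": "❓",
    ":hourglass:": "⏳",
}


def slack_to_markdown(text: str) -> str:
    """Convert Slack emoji codes, *bold* and <url|text> links to GitHub Markdown."""
    for code, emoji in EMOJI.items():
        text = text.replace(code, emoji)
    text = re.sub(r"\*([^*]*)\*", r"**\1**", text)            # *bold* to **bold**
    text = re.sub(r"<([^|>]*)\|([^>]*)>", r"[\2](\1)", text)  # <url|text> to [text](url)
    return text


print(slack_to_markdown(":rotating_light: *New* (<https://example.com/issues/1|issue>)"))
```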
| run_gh("api", "repos/owner/repo") | ||
| captured = capsys.readouterr() | ||
| assert "gh command failed" in captured.err | ||
| assert "Bad credentials" in captured.err |
There was a problem hiding this comment.
I feel like these tests are a bit too much for a simple utility to run a github command.
Replaces the inline bash in failure_notification.yaml with standalone Python
scripts that parse CI failures, track them as GitHub issues, and send Slack
notifications. The new implementation adds three capabilities that the bash
version lacked:
- [...] names are recognised as timeouts and tracked under a dedicated CI timeout issue instead of the generic "unparseable" bucket.
- [...] /analyse-failures on a PR to analyse its most recent failed CI run and get results posted back as a comment.
- [...] json.dumps instead of bash string interpolation, eliminating broken messages from special characters in test names.
How failures are classified
--conclusion timed_out from GitHub forces the timeout path regardless.
Real-world examples
Example 1: Test failure, run 24664309055
Context:
Building master on push, conclusion failure.
One failed job: 3.9 Linux / integration-DefaultCache, failed step: Run test.
Logs contain:
Pipeline output:
- failure_kind.txt → test_failure
- failing_tests.txt → tests/integration/toolbox/test_library_tool.py::test_overwrite_append_data
- failed_steps.txt → empty (Run test filtered as test-runner step)
- Flaky test: tests/integration/toolbox/test_library_tool.py::test_overwrite_append_data with flaky-test label
- :rotating_light: *New* — `tests/integration/toolbox/test_library_tool.py::test_overwrite_append_data`
Example 2: Infrastructure failure, run 24807645574
Context:
Building master on schedule, conclusion failure.
One failed job: 3.11 Windows / compile (...), failed step: Install Required MSVC.
No test logs (compile job, not a test job).
Pipeline output:
- failure_kind.txt → infra_failure
- failing_tests.txt → empty
- failed_steps.txt → Install Required MSVC
- Flaky step: Install Required MSVC with flaky-step label
- :rotating_light: *New* — `Install Required MSVC`
Example 3: Timeout, run 24841138872
Context:
Building master on workflow_dispatch, conclusion failure.
Two failed jobs:
- 3.11 Windows / unit-DefaultCache: step Run test timed out after 120 min
- 3.10 Windows / stress-DefaultCache: step Run test timed out
Logs show tests running (at 95% progress) but no FAILED/ERROR lines: the runner was killed before completion.
Pipeline output:
- failure_kind.txt → timeout (only test-runner steps, no parsed test names)
- failing_tests.txt → empty
- failed_steps.txt → empty (Run test filtered)
- CI timeout with ci-failure label
- :hourglass: *Timeout*
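Read together, the three examples suggest a classification roughly like the sketch below. This is an inference from the examples, not the code in parse_ci_failures.py: the function name, the test_runner_failed flag, the precedence between the branches, and the "unparseable" fallback value are all assumptions.

```python
from typing import Sequence


def classify(conclusion: str, failing_tests: Sequence[str],
             failed_steps: Sequence[str], test_runner_failed: bool) -> str:
    """Pick a failure kind from the parsed pieces of a failed run."""
    if conclusion == "timed_out":
        return "timeout"          # a timed_out conclusion forces the timeout path
    if failing_tests:
        return "test_failure"     # Example 1: test names were parsed from the logs
    if failed_steps:
        return "infra_failure"    # Example 2: a non-test step failed
    if test_runner_failed:
        return "timeout"          # Example 3: only test-runner steps, no test names
    return "unparseable"          # assumed fallback name


print(classify("failure", [], ["Install Required MSVC"], False))
```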