
fix(ci): resolve failures from telemetry regression, unhandled rejections, and flaky tests #8680

Draft
aseemxs wants to merge 10 commits into master from fix/ci-check

@aseemxs aseemxs commented Mar 20, 2026

Problem

Multiple CI checks are failing across macOS, Windows, and Linux (CodeBuild). All 5849+ unit tests pass; the failures come from test assertion regressions, unhandled promise rejections detected by the run_and_report script, and pre-existing flaky tests.

Root Causes & Fixes

1. Lambda URI Handler telemetry — 3 test failures (all platforms)

Introduced by: #8598 (fix(lambda): add confirmation prompt before initiating console login)

What broke: #8598 changed handleLambdaUriError from throwing a cancelled ToolkitError to calling telemetry.record() + return. This broke telemetry because run() overwrites the span result with Succeeded when the function completes normally — the record() call gets clobbered.

Fix: Revert the cancellation path to throw ToolkitError.chain(e, 'User cancelled operation', { cancelled: true }). This lets run() correctly record Cancelled as the result. Tests updated to assert the throw instead of checking telemetry directly.

Files: uriHandlers.ts (functional), uriHandlers.test.ts (test)
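
To illustrate why record() + return loses the result, here is a minimal sketch of a span wrapper. `Span`, `createSpan`, `runSpan`, and `cancelledError` are hypothetical stand-ins, not the toolkit's telemetry API; the only behaviour assumed is the one described above, that the wrapper sets the result to Succeeded when the function completes normally.

```typescript
// Hypothetical, simplified model of a telemetry span wrapper.
// None of these names are the toolkit's real API.
type Result = 'Succeeded' | 'Failed' | 'Cancelled'

interface CancellableError extends Error {
    cancelled?: boolean
}

// Builds an error flagged as a user cancellation.
function cancelledError(message: string): CancellableError {
    const err: CancellableError = new Error(message)
    err.cancelled = true
    return err
}

interface Span {
    result?: Result
    record(data: { result: Result }): void
}

function createSpan(): Span {
    const span: Span = {
        record(data) {
            span.result = data.result
        },
    }
    return span
}

// Runs `fn` and finalizes the span result based on how `fn` exits.
function runSpan(span: Span, fn: (span: Span) => void): void {
    try {
        fn(span)
        // Normal completion overwrites whatever `fn` recorded.
        span.result = 'Succeeded'
    } catch (e) {
        const cancelled = e instanceof Error && (e as CancellableError).cancelled === true
        span.result = cancelled ? 'Cancelled' : 'Failed'
    }
}

// Pattern from #8598: record, then return. The recorded value is clobbered.
const recordedSpan = createSpan()
runSpan(recordedSpan, (s) => {
    s.record({ result: 'Cancelled' })
})

// Pattern restored here: throw a cancelled error and let the wrapper classify it.
const thrownSpan = createSpan()
runSpan(thrownSpan, () => {
    throw cancelledError('User cancelled operation')
})
```

In this model `recordedSpan.result` ends up as Succeeded while `thrownSpan.result` ends up as Cancelled, which is the difference the updated tests assert on.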

2. AppBuilder Walkthrough test — 1 test failure (macOS minimum)

Introduced by: Interaction between the walkthrough test (#8236) and background security scan errors from startSecurityScan.ts. The test's onDidShowMessage handler asserted on every message, but an unrelated security scan error message fired during the test.

Fix: Filter the onDidShowMessage handler to only assert on the expected overwrite prompt message, ignoring unrelated ones. Added explicit assertion that the prompt was shown.

Files: walkthrough.test.ts (test only)
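
The filtering idea can be sketched as follows. `FakeTestWindow`, the message texts, and the `/already exists/` pattern are all hypothetical; the real test uses the toolkit's test window and whatever text the overwrite prompt actually has.

```typescript
// Hypothetical stand-in for a test window that notifies listeners of shown messages.
interface ShownMessage {
    readonly text: string
    selected?: string
}

type MessageListener = (message: ShownMessage) => void

class FakeTestWindow {
    private listeners: MessageListener[] = []

    onDidShowMessage(listener: MessageListener): void {
        this.listeners.push(listener)
    }

    show(text: string): ShownMessage {
        const message: ShownMessage = { text }
        this.listeners.forEach((listener) => listener(message))
        return message
    }
}

const fakeWindow = new FakeTestWindow()
const promptsSeen: string[] = []

fakeWindow.onDidShowMessage((message) => {
    // Ignore anything that is not the prompt under test (pattern is an assumption).
    if (!/already exists/.test(message.text)) {
        return
    }
    promptsSeen.push(message.text)
    message.selected = 'Yes'
})

// An unrelated background message fires first, then the expected prompt.
const unrelatedMessage = fakeWindow.show('Security scan failed')
const overwritePrompt = fakeWindow.show('template.yaml already exists. Overwrite?')
```

Asserting that `promptsSeen` has exactly one entry afterwards covers the case where the prompt never appears, which a filter-only handler would otherwise let pass silently.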

3. Unhandled promise rejections — Linux CodeBuild failures (all 3 variants)

Introduced by:

What broke: The run_and_report script in CodeBuild fails the build if it detects "rejected promise not handled" in stdout. All 5849 tests pass, but these 5 unhandled rejections triggered the check.

Fix:

  • SageMaker: console.error + return instead of throw for missing env var — avoids unhandled rejection while keeping extension host alive
  • CloudFormation CodeLens: isRunning() guard + try/catch before sendRequest, errors logged at debug level
  • CloudFormation deactivate: .catch() on client.stop()

Files: server.ts, stackActionCodeLensProvider.ts, extension.ts (all functional — adds resilience)
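
A rough sketch of the two CloudFormation guards, assuming a client that exposes `isRunning()`, `sendRequest()`, and `stop()`. `ClientLike`, `requestIfRunning`, `stopQuietly`, and the `example/request` method name are hypothetical and only illustrate the pattern, not the code in stackActionCodeLensProvider.ts or extension.ts.

```typescript
// Hypothetical minimal shape of a language client; only the three
// members used below are assumed.
interface ClientLike {
    isRunning(): boolean
    sendRequest(method: string): Promise<unknown>
    stop(): Promise<void>
}

type DebugLog = (message: string) => void

// Guarded request: skip when the client is down, log instead of rejecting.
async function requestIfRunning(client: ClientLike, method: string, debug: DebugLog): Promise<unknown> {
    if (!client.isRunning()) {
        return undefined
    }
    try {
        return await client.sendRequest(method)
    } catch (e) {
        debug(`request ${method} failed: ${String(e)}`)
        return undefined
    }
}

// Deactivation: attach .catch() so a failed stop() never goes unhandled.
function stopQuietly(client: ClientLike, debug: DebugLog): Promise<void> {
    return client.stop().catch((e: unknown) => {
        debug(`client stop failed: ${String(e)}`)
    })
}

// A fake client whose calls always reject, mimicking a server that failed to start.
const debugLines: string[] = []
const brokenClient: ClientLike = {
    isRunning: () => true,
    sendRequest: () => Promise.reject(new Error('connection got disposed')),
    stop: () => Promise.reject(new Error('Client is not running')),
}

const outcome = requestIfRunning(brokenClient, 'example/request', (m) => debugLines.push(m)).then(() =>
    stopQuietly(brokenClient, (m) => debugLines.push(m))
)
```

Both rejections end up as debug lines and `outcome` resolves, so nothing reaches the runtime's unhandled-rejection tracking.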

4. CloudFormation LSP client lifecycle rejections — Linux CodeBuild minimum

Introduced by: #8275 (feat(cloudformation): Merge CloudFormation LSP integration)

What broke: The vscode-languageclient LanguageClient creates unhandled promise rejections internally when the LSP server process fails to start during CI. These rejections (connection got disposed, Client is not running) happen inside the library's handleConnectionClosed/doInitialize and cannot be caught from our code.

Fix: Skip the CloudFormation LSP client activation when AWS_TOOLKIT_AUTOMATION === 'unit'. No unit tests depend on the LSP client — the CloudFormation unit tests cover template parsing, not the language server. E2E tests are unaffected (they use a different automation value).

Files: extension.ts (functional — conditional activation)
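
The conditional activation amounts to a check like the one below. `shouldStartLspClient` and `activateCloudFormation` are hypothetical names, and the environment is passed in as a plain object so the sketch stays self-contained; real code would presumably read `process.env`.

```typescript
type Env = { [name: string]: string | undefined }

// Only the 'unit' value is treated specially; any other value (or none) starts the client.
function shouldStartLspClient(env: Env): boolean {
    return env['AWS_TOOLKIT_AUTOMATION'] !== 'unit'
}

// Returns true when the client was started, false when it was skipped.
function activateCloudFormation(env: Env, startClient: () => void): boolean {
    if (!shouldStartLspClient(env)) {
        return false
    }
    startClient()
    return true
}

let clientStarts = 0
const startedInUnit = activateCloudFormation({ AWS_TOOLKIT_AUTOMATION: 'unit' }, () => {
    clientStarts += 1
})
const startedOtherwise = activateCloudFormation({}, () => {
    clientStarts += 1
})
```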

5. Pre-existing flaky test fixes

These are not caused by any recent PR but fail intermittently across CI:

SharedCredentialsProvider (handleInvalidConsoleCredentials — does not prompt reload for non-session errors):

  • Previous test's messages leak through getTestWindow().shownMessages — the array isn't cleared between tests
  • Fix: snapshot the count before the test and assert no new messages were added
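
A minimal sketch of the snapshot approach, using a plain array in place of getTestWindow().shownMessages. `showMessage` and `handleNonSessionError` are hypothetical placeholders for the code under test.

```typescript
// Shared across tests and never cleared, like the leaking array described above.
const shownMessages: string[] = []

function showMessage(text: string): void {
    shownMessages.push(text)
}

// Hypothetical code under test: a non-session error should not prompt a reload.
function handleNonSessionError(): void {
    // Intentionally shows nothing.
}

// An earlier test leaves a message behind.
showMessage('Reload required')

// Current test: snapshot the count first, then look only at what was added.
const countBefore = shownMessages.length
handleNonSessionError()
const newMessages = shownMessages.slice(countBefore)
```

An absolute check such as `shownMessages.length === 0` would fail here because of the leaked message, while `newMessages` is empty as expected.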

ToolkitLogger (logs to a file — logs warn):

  • Windows file I/O race: log file exists but write hasn't flushed yet, single read misses content
  • Fix: retry loop that keeps reading until expected content appears (within existing 10s timeout)
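
The retry loop could look roughly like this. `readUntil`, `sleep`, and the simulated `delayedRead` are hypothetical; the actual test reads the log file from disk, and the interval is an assumption.

```typescript
function sleep(ms: number): Promise<void> {
    return new Promise<void>((resolve) => setTimeout(resolve, ms))
}

// Keeps calling `read` until the content contains `expected` or the deadline passes.
async function readUntil(
    read: () => Promise<string>,
    expected: string,
    timeoutMs: number,
    intervalMs: number = 50
): Promise<string> {
    const deadline = Date.now() + timeoutMs
    while (true) {
        const content = await read()
        if (content.indexOf(expected) !== -1) {
            return content
        }
        if (Date.now() >= deadline) {
            throw new Error(`Timed out waiting for "${expected}"; last content: ${content}`)
        }
        await sleep(intervalMs)
    }
}

// Simulated file: the first two reads see an empty file, the third sees the flushed line.
let readAttempts = 0
const delayedRead = async (): Promise<string> => {
    readAttempts += 1
    return readAttempts < 3 ? '' : '[WARN] test warn message\n'
}

const logContent = readUntil(delayedRead, 'test warn message', 1000, 5)
```

Passing the reader as a function keeps the loop independent of how the file is read, and the timeout argument lets the caller stay within the test's existing 10s budget.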

editorContext + recommendationHandler (30s timeout on macOS insiders):

  • Slow VS Code startup + async telemetry/mock setup exceeds default 30s Mocha timeout
  • Fix: bump to 60s for these two tests

Files: sharedCredentialsProvider.test.ts, toolkitLogger.test.ts, editorContext.test.ts, recommendationHandler.test.ts (all test only)

Functional vs Test-Only Changes

| File | Type | Change |
| --- | --- | --- |
| uriHandlers.ts | Functional | Reverts cancellation to throw for correct telemetry |
| server.ts | Functional | console.error + return instead of throw for missing env var |
| stackActionCodeLensProvider.ts | Functional | isRunning() guard + debug logging on catch |
| extension.ts | Functional | .catch() on deactivate, skip LSP in unit tests |
| uriHandlers.test.ts | Test only | Updated assertions |
| walkthrough.test.ts | Test only | Filtered onDidShowMessage handler |
| sharedCredentialsProvider.test.ts | Test only | Snapshot message count instead of absolute check |
| toolkitLogger.test.ts | Test only | Retry loop for file content read |
| editorContext.test.ts | Test only | Timeout bump to 60s |
| recommendationHandler.test.ts | Test only | Timeout bump to 60s |

Testing

  • All GitHub Actions checks pass (macOS/Windows/Linux × stable/minimum/insiders × toolkit/amazonq)
  • All 3 Linux CodeBuild variants pass (stable/minimum/insiders)
  • CloudFormation LSP E2E tests pass (all 3 platforms)

@amazon-inspector-ohio

⏳ I'm reviewing this pull request for security vulnerabilities and code quality issues. I'll provide an update when I'm done

@amazon-inspector-ohio

✅ I finished the code review, and didn't find any security or code quality issues.

aseemxs added 5 commits March 19, 2026 19:11
The vscode-languageclient LanguageClient creates unhandled promise
rejections internally when the LSP server process fails to start
during CI. These rejections (connection disposed, client not running)
occur inside the library's handleConnectionClosed/doInitialize and
cannot be caught from our code.

Skip the LSP client entirely during unit tests since no tests depend
on it. Reverts the process.on('unhandledRejection') approach which
was too broad and wouldn't suppress VS Code's own rejection tracking.
@aseemxs aseemxs closed this Mar 20, 2026
@aseemxs aseemxs reopened this Mar 20, 2026

@aseemxs aseemxs changed the title from "ci: dummy PR to check CI status" to "fix: resolve CI failures from Lambda telemetry regression, unhandled promise rejections, and flaky test assertions" Mar 20, 2026
… timeout tests

- SharedCredentialsProvider: check no *new* messages instead of
  absolute count (previous test's messages leak via getTestWindow)
- ToolkitLogger: retry reading log file content instead of single
  read (Windows file I/O flush race condition)
- editorContext/recommendationHandler: bump timeout to 60s for slow
  macOS insiders CI environment
@github-actions

  • This pull request implements a feat or fix, so it must include a changelog entry (unless the fix is for an unreleased feature). Review the changelog guidelines.
    • Note: beta or "experiment" features that have active users should announce fixes in the changelog.
    • If this is not a feature or fix, use an appropriate type from the title guidelines. For example, telemetry-only changes should use the telemetry type.

… var

- CodeLens: log caught errors at debug level instead of silent swallow
- SageMaker server: process.exit(1) instead of return when env var
  missing — server can't function without it, shouldn't linger
@aseemxs aseemxs closed this Mar 20, 2026
@aseemxs aseemxs reopened this Mar 20, 2026

@aseemxs aseemxs closed this Mar 20, 2026
@aseemxs aseemxs reopened this Mar 20, 2026

@aseemxs aseemxs changed the title from "fix: resolve CI failures from Lambda telemetry regression, unhandled promise rejections, and flaky test assertions" to "fix: resolve CI failures from telemetry regression, unhandled rejections, and flaky tests" Mar 20, 2026
@aseemxs aseemxs closed this Mar 20, 2026
@aseemxs aseemxs reopened this Mar 20, 2026

@aseemxs aseemxs changed the title from "fix: resolve CI failures from telemetry regression, unhandled rejections, and flaky tests" to "fix(ci): resolve failures from telemetry regression, unhandled rejections, and flaky tests" Mar 20, 2026
@aseemxs aseemxs closed this Mar 20, 2026
@aseemxs aseemxs reopened this Mar 20, 2026

The SageMaker detached server runs inside the VS Code extension host
process. process.exit(1) terminates the entire test runner. Use
console.error + return instead.