CI Workflows and Smoke Tests Analysis

This document catalogs all CI workflows and smoke/build-test agentic workflows in gh-aw-firewall, describing what each tests, when it runs, what real-world scenario it validates, coverage gaps, and how it relates to the Node.js integration test suite.

Table of Contents

CI Workflow Overview
Core CI Workflows
Smoke Test Workflows (Agentic)
Build-Test Workflows (Agentic)
Security & Compliance Workflows
Infrastructure Workflows
Relationship Map: CI vs Integration Tests
Coverage Gap Analysis

CI Workflow Overview

The repo has four tiers of testing:

Tier	Type	Count	Purpose
Unit	Jest (src/*.test.ts)	19 files	Fast feedback on individual modules
Integration	Jest (tests/integration/*.test.ts)	26 files	End-to-end AWF container execution
Smoke/Build-Test	gh-aw compiled workflows (.lock.yml)	28 workflows	Real AI agent execution inside AWF sandbox
CI	Hand-written GitHub Actions (.yml)	15 workflows	Build, lint, type-check, security, coverage

Core CI Workflows

1. `test-integration.yml` — TypeScript Type Check

File: .github/workflows/test-integration.yml

Attribute	Value
What it tests	TypeScript type-checking via `npm run type-check`. Despite the filename, this does NOT run integration tests.
Triggers	Push to main, PR to main, manual dispatch
Timeout	5 minutes
Real-world mapping	Validates that TypeScript code compiles without type errors before merge
Gaps	Only checks types, not runtime behavior. Name is misleading (suggests integration tests).
Integration test relationship	Complementary — type checking catches compile-time errors; integration tests catch runtime issues.

2. `test-chroot.yml` — Chroot Integration Tests

File: .github/workflows/test-chroot.yml

Attribute	Value
What it tests	Runs the chroot integration test suite as 4 jobs: Language Support, Package Managers, /proc Filesystem, and Edge Cases. Three start in parallel; Package Managers waits for Language Support.
Triggers	Push to main, PR to main, manual dispatch
Timeout	30-45 minutes per job
Real-world mapping	Validates that the chroot-based filesystem isolation works correctly with multiple languages (Node, Python, Go, Java, .NET, Ruby, Rust) and package managers
Gaps	Sequential dependency: package-managers waits for languages job. No macOS testing.
Integration test relationship	Direct 1:1 mapping — this workflow runs `tests/integration/chroot-languages.test.ts`, `chroot-package-managers.test.ts`, `chroot-procfs.test.ts`, and `chroot-edge-cases.test.ts`

Jobs breakdown:

test-chroot-languages — Sets up Node.js, Python 3.12, Go 1.22, Java 21, .NET 8.0. Builds containers locally. Runs chroot-languages integration tests.
test-chroot-package-managers (needs: languages) — Adds Ruby 3.2, Rust stable. Runs chroot-package-managers integration tests. 45-minute timeout.
test-chroot-procfs (parallel) — Tests /proc filesystem access within chroot. Python, Java only.
test-chroot-edge-cases (parallel) — Tests edge cases. Node.js only.

Key detail: Containers are built locally (docker build), so source changes to entrypoint.sh and docker-manager.ts ARE reflected in tests.

3. `test-coverage.yml` — Test Coverage

File: .github/workflows/test-coverage.yml

Attribute	Value
What it tests	Runs unit tests with coverage collection. On PRs: compares coverage against base branch and fails on regression. On push: generates coverage summary.
Triggers	Push to main, PR to main (ignoring .md files)
Timeout	15 minutes
Real-world mapping	Ensures PRs don't reduce test coverage — guards against "add feature, skip tests" PRs
Gaps	Only covers unit tests (src/*.test.ts), not integration tests. Node.js 20 only (build.yml tests 20+22).
Integration test relationship	Only measures coverage of unit tests. Integration test coverage is not tracked.

Notable features:

Checks out base branch to compute coverage diff
Posts coverage comparison as PR comment
Uploads coverage artifacts (30-day retention)
Fails PR if coverage regresses
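
The regression check above can be sketched as a comparison of two coverage summaries. This is a minimal illustration, not the workflow's actual script: it assumes both branches produce an Istanbul-style `json-summary` report whose `total` object carries a `pct` field per metric, and the names `findRegressions` and `tolerance` are hypothetical.

```typescript
// One metric in an Istanbul-style coverage summary (assumed report shape).
interface Metric {
  pct: number;
}

// The `total` object of a coverage summary (assumed report shape).
interface CoverageTotals {
  lines: Metric;
  statements: Metric;
  functions: Metric;
  branches: Metric;
}

type MetricName = keyof CoverageTotals;

interface Regression {
  metric: MetricName;
  base: number;
  head: number;
}

// Hypothetical helper: returns every metric whose percentage on the PR head
// dropped by more than `tolerance` points relative to the base branch.
function findRegressions(
  base: CoverageTotals,
  head: CoverageTotals,
  tolerance = 0,
): Regression[] {
  const metrics: MetricName[] = ["lines", "statements", "functions", "branches"];
  return metrics
    .filter((m) => base[m].pct - head[m].pct > tolerance)
    .map((m) => ({ metric: m, base: base[m].pct, head: head[m].pct }));
}
```

A CI step built on this idea would fail the PR whenever the returned list is non-empty, and could reuse the same list to render the PR comment.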

4. `test-action.yml` — Setup Action Tests

File: .github/workflows/test-action.yml

Attribute	Value
What it tests	Tests the `action.yml` composite action that installs AWF from GitHub releases
Triggers	Push to main, PR to main (ignoring .md files), manual dispatch
Timeout	5-10 minutes per job
Real-world mapping	Validates that users can install AWF via `uses: github/gh-aw-firewall@v1` in their workflows
Gaps	Only tests installation, not actual firewall functionality. Tests version v0.7.0 specifically (may go stale).
Integration test relationship	No overlap — tests the GitHub Action packaging, not the firewall itself

Jobs:

test-action-latest — Install latest version, verify awf --version and awf --help work
test-action-specific-version — Install v0.7.0, verify exact version/image-tag outputs match
test-action-with-images — Install v0.7.0 with pull-images: true, verify Docker images are pulled
test-action-invalid-version — Install invalid-version, verify action fails gracefully

5. `test-examples.yml` — Examples Test

File: .github/workflows/test-examples.yml

Attribute	Value
What it tests	Runs example shell scripts from `examples/` directory as smoke tests
Triggers	Push to main, PR to main (ignoring .md files), manual dispatch
Timeout	15 minutes
Real-world mapping	Validates that documentation examples actually work — prevents stale README instructions
Gaps	Skips `github-copilot.sh` (requires GITHUB_TOKEN). Only 4 of 5 examples tested.
Integration test relationship	Complementary — examples test real AWF invocations from shell scripts, while integration tests use the Jest/TypeScript test runner

Examples tested:

basic-curl.sh — Basic domain allow/block with curl
using-domains-file.sh — Domain list from file
debugging.sh — Debug mode with --keep-containers
blocked-domains.sh — Verify blocked domains return errors

6. `build.yml` — Build Verification

File: .github/workflows/build.yml

Attribute	Value
What it tests	Builds TypeScript project and runs linter across Node.js 20 and 22 matrix
Triggers	Push to main, PR to main, manual dispatch
Timeout	10 minutes
Real-world mapping	Ensures the project builds successfully on supported Node.js versions
Gaps	No Node.js 18 testing (though `pkg` in release uses node18 targets). No test execution.
Integration test relationship	Prerequisite — if build fails, nothing else runs. No direct test overlap.

7. `lint.yml` — ESLint

File: .github/workflows/lint.yml

Attribute	Value
What it tests	Runs ESLint on TypeScript source
Triggers	Push to main, PR to main (ignoring .md files)
Timeout	5 minutes
Real-world mapping	Code quality enforcement
Gaps	Duplicates `build.yml`, which also runs `npm run lint`.
Integration test relationship	None — code quality only

Smoke Test Workflows (Agentic)

These are gh-aw agentic workflows compiled from .md source files into .lock.yml GitHub Actions workflows. They run actual AI agents (Claude, Copilot, Codex, Gemini) inside the AWF sandbox.

Post-processing: All .lock.yml files are post-processed by scripts/ci/postprocess-smoke-workflows.ts which replaces GHCR image references with local builds (--build-local), removes sparse-checkout, and installs AWF from source.
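
As an illustration of one of these steps, the sketch below strips a `sparse-checkout` key and its indented body from workflow YAML text. It is not the code in scripts/ci/postprocess-smoke-workflows.ts, whose implementation is not shown here; the function name `removeSparseCheckout` and the line-based approach are assumptions made for the example.

```typescript
// Hypothetical helper: removes a `sparse-checkout:` key and the block that
// follows it (blank lines or lines indented deeper than the key itself).
function removeSparseCheckout(yaml: string): string {
  const kept: string[] = [];
  let skipIndent: number | null = null; // indentation of the key being skipped

  for (const line of yaml.split("\n")) {
    const indent = line.search(/\S|$/); // index of first non-space character

    if (skipIndent !== null) {
      // Still inside the key's body: drop the line and keep skipping.
      if (line.trim() === "" || indent > skipIndent) continue;
      skipIndent = null; // the body ended; handle this line normally
    }

    if (/^\s*sparse-checkout:/.test(line)) {
      skipIndent = indent;
      continue;
    }

    kept.push(line);
  }
  return kept.join("\n");
}
```

Working on text lines keeps the rest of the compiled file byte-for-byte identical, which matters for generated .lock.yml files; a YAML parser round-trip would likely reformat them.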

8. `smoke-claude.lock.yml` — Smoke Claude

Source: smoke-claude.md

Attribute	Value
What it tests	Claude Code engine running inside AWF sandbox with GitHub API, Playwright, file I/O, and bash tools
Engine	`claude` (max 8 turns)
Triggers	Every 12h (schedule), PR (opened/synchronize/reopened), manual dispatch
Timeout	10 minutes
Network allowed	defaults, github, playwright
Tools	github (repos, pull_requests), playwright, bash
Safe outputs	add-comment (hide older), add-labels (smoke-claude)
Real-world mapping	Validates that Claude Code can operate within AWF's network sandbox: GitHub API access via MCP, browser automation via Playwright, local file operations — the core use case for agentic workflows
Gaps	Non-deterministic (AI agent may behave differently). No HTTPS blocking verification.
Integration test relationship	High-level end-to-end complement. Integration tests verify AWF mechanics (iptables, proxy); this verifies an actual AI agent works through the firewall.

Test requirements:

GitHub MCP: Review last 2 merged PRs
Playwright: Navigate to github.com, verify page title
File writing: Create test file, verify with cat
Bash: Execute commands to verify file creation
Post-step: Validate safe outputs were invoked (add_comment for PR triggers)

9. `smoke-copilot.lock.yml` — Smoke Copilot

Source: smoke-copilot.md

Attribute	Value
What it tests	Copilot engine running inside AWF sandbox with MCP, Playwright, web-fetch, and agentic-workflows tools
Engine	`copilot`
Triggers	Every 12h, PR, manual dispatch
Timeout	5 minutes
Network allowed	defaults, node, github, playwright
Tools	agentic-workflows, cache-memory, edit, bash, github, playwright, web-fetch
Real-world mapping	Validates Copilot CLI agent works through AWF with broader network access (node registries) and additional tools
Gaps	Shorter timeout (5min) may cause flaky failures. No blocked-domain verification.
Integration test relationship	Similar to smoke-claude but for Copilot engine. Tests a different engine implementation path.

10. `smoke-codex.lock.yml` — Smoke Codex

Source: smoke-codex.md

Attribute	Value
What it tests	Codex engine with extended tool suite: GH CLI safe inputs, Tavily web search, discussion interactions, and AWF project build
Engine	`codex`
Triggers	Every 12h, PR, manual dispatch
Timeout	15 minutes
Network allowed	defaults, github, playwright
Tools	cache-memory, github, playwright, edit, bash
Safe outputs	add-comment, create-issue, add-labels, hide-comment
Imports	shared/gh.md, shared/mcp/tavily.md, shared/reporting.md, shared/github-queries-safe-input.md
Real-world mapping	Most comprehensive smoke test — validates safe-inputs (gh CLI), Tavily MCP, discussion API, and build capability
Gaps	Complex prompt may cause non-deterministic failures. Build step (`npm ci && npm run build`) adds latency.
Integration test relationship	Tests discussion interactions and create-issue safe outputs that integration tests don't cover

Additional test requirements beyond Claude/Copilot:

Safe Inputs GH CLI: Query PRs via safeinputs-gh
Tavily web search: Search for "GitHub Agentic Workflows Firewall"
Discussion interaction: Comment on latest discussion
Build AWF: Run npm ci && npm run build inside sandbox

11. `smoke-gemini.lock.yml` — Smoke Gemini

Source: smoke-gemini.md

Attribute	Value
What it tests	Gemini engine with same extended tool suite as Codex smoke test
Engine	`gemini`
Triggers	Every 12h, PR, manual dispatch
Timeout	15 minutes
Real-world mapping	Validates Gemini (Google) engine works through AWF — important for multi-engine support
Gaps	Same as Codex. Identical test requirements — could share test definition via imports.
Integration test relationship	Same as Codex — tests a different engine path through the same infrastructure

12. `smoke-chroot.lock.yml` — Smoke Chroot

Source: smoke-chroot.md

Attribute	Value
What it tests	Chroot filesystem isolation by comparing host vs chroot runtime versions (Python, Node.js, Go)
Engine	`copilot`
Triggers	PR (with path filter: src/, containers/, package.json, smoke-chroot.md), manual dispatch
Timeout	20 minutes
Network allowed	defaults, github
Tools	github (repos, pull_requests), bash
Real-world mapping	Validates the core chroot feature: host binaries must be accessible inside the container with matching versions
Gaps	Only tests 3 runtimes (Python, Node, Go). No Java/.NET/Ruby/Rust version comparison. Path-filtered — won't run on non-code PRs.
Integration test relationship	Overlaps with `chroot-languages.test.ts` but takes a different approach: the smoke test runs `awf` and an agent compares the versions, while the integration test runs `awf` and Jest assertions compare the versions

Unique architecture:

Pre-steps capture host versions, run awf --skip-pull for each runtime, compare
Agent only reads result files and creates PR comment with comparison table
Uses --skip-pull (locally-built containers from pre-steps)
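
The host-versus-chroot comparison can be sketched as follows. This is an assumed shape for the comparison, not the workflow's actual pre-step: the `VersionCheck` type, both function names, and the idea of matching on the first dotted version number in each command's output are hypothetical.

```typescript
// Raw `--version` output captured on the host and inside the chroot.
interface VersionCheck {
  runtime: string;
  host: string;
  chroot: string;
}

// Extracts the first dotted version number (for example "3.12.1") from output.
function extractVersion(output: string): string | null {
  const match = /\d+(?:\.\d+)+/.exec(output);
  return match ? match[0] : null;
}

// Builds a Markdown comparison table with one row per runtime.
function compareRuntimes(checks: VersionCheck[]): string {
  const rows = checks.map((c) => {
    const host = extractVersion(c.host);
    const chroot = extractVersion(c.chroot);
    const ok = host !== null && host === chroot;
    return `| ${c.runtime} | ${host ?? "?"} | ${chroot ?? "?"} | ${ok ? "match" : "MISMATCH"} |`;
  });
  return ["| Runtime | Host | Chroot | Result |", "|---|---|---|---|", ...rows].join("\n");
}
```

Doing the comparison deterministically in pre-steps and leaving only the reporting to the agent limits how much of the result depends on model behavior.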

Build-Test Workflows (Agentic)

These are agentic workflows that clone external test repositories and run real build/test commands through the AWF sandbox. They validate that AWF's network filtering allows language-specific package managers to function correctly.

All build tests are combined into a single build-test.lock.yml workflow:

Engine: copilot
Triggers: PR (opened/synchronize/reopened), manual dispatch
Tools: bash, github (with GH_AW_GITHUB_MCP_SERVER_TOKEN)
MCP: ghcr.io/github/gh-aw-mcpg container
Safe outputs: add-comment (single combined table), add-labels (build-test)
Error handling: Per-ecosystem failure tracking, table-based reporting
Test repos: Mossaka/gh-aw-firewall-test-{language}
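
Per-ecosystem failure tracking with a single combined table could look roughly like the sketch below. The `ProjectResult` shape, the `summarize` function, and the column layout are assumptions for illustration; the real workflow leaves this aggregation to the agent prompt.

```typescript
// Outcome of building and testing one project from a test repository.
interface ProjectResult {
  ecosystem: string;
  project: string;
  passed: boolean;
}

interface Summary {
  table: string;
  failedEcosystems: string[];
}

// Groups results by ecosystem and renders one Markdown table row per ecosystem.
function summarize(results: ProjectResult[]): Summary {
  const byEcosystem: Record<string, ProjectResult[]> = {};
  for (const r of results) {
    const group = byEcosystem[r.ecosystem] || [];
    group.push(r);
    byEcosystem[r.ecosystem] = group;
  }

  const failedEcosystems: string[] = [];
  const rows = Object.keys(byEcosystem)
    .sort()
    .map((eco) => {
      const group = byEcosystem[eco];
      const failed = group.filter((r) => !r.passed).map((r) => r.project);
      if (failed.length > 0) failedEcosystems.push(eco);
      const status = failed.length === 0 ? "pass" : `fail (${failed.join(", ")})`;
      return `| ${eco} | ${group.length - failed.length}/${group.length} | ${status} |`;
    });

  const table = ["| Ecosystem | Passed | Status |", "|---|---|---|", ...rows].join("\n");
  return { table, failedEcosystems };
}
```

Tracking failures per ecosystem means one broken package registry does not hide the results for the others.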

13. `build-test.lock.yml` — Build Test Suite

Attribute	Value
What it tests	All 8 ecosystems in a single workflow: Bun (elysia, hono), C++ (fmt, json), Deno (oak, std), .NET (hello-world, json-parse), Go (color, env, uuid), Java (gson, caffeine), Node.js (clsx, execa, p-limit), Rust (fd, zoxide)
Runtimes	node 20, go 1.22, rust stable, java 21, dotnet 8.0
Network	defaults, github, node, go, rust, crates.io, java, dotnet, bun.sh, deno.land, jsr.io, dl.deno.land
Timeout	45 minutes
Real-world mapping	Validates AWF allows language-specific package managers, runtime installations, and build/test execution across all supported ecosystems
Gaps	No Python/Ruby build-tests. Maven proxy requires manual `settings.xml` workaround. No Gradle, yarn/pnpm, vcpkg/conan, or nightly Rust testing.
Integration test relationship	Complements `chroot-package-managers.test.ts` with real-world projects

Security & Compliance Workflows

21. `dependency-audit.yml` — Dependency Vulnerability Audit

Attribute	Value
What it tests	`npm audit` on main package and docs-site package. Uploads SARIF to GitHub Security tab.
Triggers	Push/PR to main (ignoring .md), weekly Monday schedule, manual dispatch
Timeout	5 minutes per job
Real-world mapping	Catches vulnerable npm dependencies before they ship
Gaps	Only npm, not container base image packages.
Integration test relationship	None

23. `codeql.yml` — CodeQL Analysis

Attribute	Value
What it tests	CodeQL static analysis for JavaScript/TypeScript and GitHub Actions code
Triggers	Push/PR to main, weekly Monday schedule, manual dispatch
Timeout	360 minutes (6 hours)
Real-world mapping	Catches security vulnerabilities (XSS, injection) and code quality issues
Gaps	No Shell/Bash analysis (container scripts, iptables rules not analyzed)
Integration test relationship	None — static analysis

24. `pr-title.yml` — PR Title Check

Attribute	Value
What it tests	Semantic PR title format (e.g., `feat:`, `fix:`, `docs:`) using `amannn/action-semantic-pull-request`
Triggers	PR to main (opened, edited, synchronize, reopened)
Real-world mapping	Enforces conventional commit format for automated changelog generation
Gaps	N/A
Integration test relationship	None
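
A rough approximation of the title check is sketched below. It does not reproduce the action's implementation: the regular expression and the `isSemanticTitle` helper are hypothetical, and the allowed type list is assumed to be the common conventional-commit set, which this repository's configuration may narrow or extend.

```typescript
// Assumed set of allowed types; the repository's configuration may differ.
const ALLOWED_TYPES = [
  "feat", "fix", "docs", "style", "refactor",
  "perf", "test", "build", "ci", "chore", "revert",
];

// Accepts titles of the form `type: subject`, `type(scope): subject`,
// or either form with a `!` before the colon to mark a breaking change.
function isSemanticTitle(title: string, types: string[] = ALLOWED_TYPES): boolean {
  const match = /^([a-z]+)(\([^()]+\))?(!)?: (\S.*)$/.exec(title);
  return match !== null && types.indexOf(match[1]) !== -1;
}
```

For example, `isSemanticTitle("fix(squid)!: tighten ACLs")` is accepted, while `isSemanticTitle("Update README")` is rejected because it has no type prefix.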

Infrastructure Workflows

25. `release.yml` — Release Pipeline

Attribute	Value
What it tests	End-to-end release: version bump, build 4 container images (squid, agent, api-proxy, agent-act), create binaries (linux-x64/arm64, darwin-x64/arm64), generate changelog, create GitHub release
Triggers	Manual dispatch only (patch/minor/major choice)
Real-world mapping	Production release pipeline
Gaps	Binary smoke test only for linux-x64 (arm64 and macOS verified as valid ELF/Mach-O but not executed)
Integration test relationship	None — infrastructure

Container images built:

squid:VERSION (linux/amd64 + arm64) — with cosign signing and SBOM
agent:VERSION (linux/amd64 + arm64) — with cosign signing and SBOM, no-cache
api-proxy:VERSION (linux/amd64 + arm64) — with cosign signing and SBOM
agent-act:VERSION (linux/amd64 only) — with cosign signing and SBOM (retry logic)

26. `deploy-docs.yml` — Deploy Documentation

Attribute	Value
What it tests	Builds and deploys docs-site to GitHub Pages
Triggers	Push to main (docs-site/** paths), manual dispatch
Real-world mapping	Documentation deployment
Integration test relationship	None

27. `copilot-setup-steps.yml` — Copilot Setup Steps

Attribute	Value
What it tests	Installs gh-aw extension for GitHub Copilot Agent
Triggers	Manual dispatch, push to workflow file
Real-world mapping	Configures Copilot Agent environment with gh-aw
Integration test relationship	None

Relationship Map: CI vs Integration Tests

CI Workflow	Related Integration Tests	Overlap Level
`test-chroot.yml`	`chroot-languages.test.ts`, `chroot-package-managers.test.ts`, `chroot-procfs.test.ts`, `chroot-edge-cases.test.ts`	Direct — CI runs these exact test files
`test-examples.yml`	`blocked-domains.test.ts`, `wildcard-patterns.test.ts`	Indirect — examples test similar scenarios (domain allow/block)
`test-coverage.yml`	All `src/*.test.ts` unit tests	Direct — runs unit test suite with coverage
`smoke-chroot.lock.yml`	`chroot-languages.test.ts`	Overlapping — both test runtime version matching, different approaches
`build-test.lock.yml`	`chroot-package-managers.test.ts`	Complementary — build-test uses real projects; integration uses minimal packages
`smoke-{claude,copilot,codex,gemini}.lock.yml`	None	Unique — only place that tests actual AI agents through AWF
`test-action.yml`	None	Unique — only place that tests the setup action
`build.yml`	None	Prerequisite — validates build on Node 20+22

Coverage Gap Analysis

What's Well Covered

Chroot functionality — Tested at 4 levels: unit tests, integration tests, CI workflow, and smoke test
Domain filtering — Unit tests (domain-patterns), integration tests (blocked-domains, wildcard-patterns), examples
Multi-engine support — Smoke tests cover Claude, Copilot, Codex, Gemini
Multi-language support — Build-tests cover 8 ecosystems (Bun, C++, Deno, .NET, Go, Java, Node.js, Rust)
Container security — cosign signing, SBOM attestation

Gaps Identified

No non-chroot integration tests run in CI — Despite its name, test-integration.yml only runs a type check. The chroot suites run via test-chroot.yml, but the remaining integration tests (blocked-domains, dns-servers, environment-variables, exit-code-propagation, etc.) have no dedicated CI workflow.
No macOS CI testing — All CI runs on ubuntu-latest. AWF produces darwin binaries but never tests them in CI.
No arm64 CI testing — Containers are built for arm64 in release but never tested on arm64 runners.
Duplicate lint execution — Both build.yml and lint.yml run npm run lint on PRs.
Missing Python build-test — Python pip/conda package installation through AWF proxy has no build-test workflow (despite Python being tested in chroot-languages).
Missing Ruby build-test — Ruby gem installation through AWF proxy has no build-test workflow.
Maven proxy workaround not tested in integration — The ~/.m2/settings.xml workaround is only documented in build-test.md, not validated by integration tests.
No load/performance testing — No tests for concurrent connections, large file transfers, or many-domain allowlists.
Smoke test non-determinism — AI agent behavior varies between runs. A passing smoke test doesn't guarantee the next run passes.
No negative security testing in CI — The integration suite includes network-security.test.ts (iptables bypass attempts), but no CI workflow runs it.
Stale version in test-action.yml — Tests hardcode v0.7.0 which may diverge from current release.
No integration test coverage tracking — test-coverage.yml only tracks unit test coverage.
api-proxy container not scanned — container-scan.yml only scans agent and squid images, not the api-proxy image added later.
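
Several of these gaps come down to test files that no workflow references. A small audit along the following lines could surface them automatically. This is a sketch under assumptions: the `findUnreferencedTests` helper is hypothetical, and it presumes workflows select tests by naming the file stem somewhere in their text, which may not hold for every workflow.

```typescript
// Returns the test files whose stem (file name without `.test.ts`) does not
// appear in any of the given workflow file contents.
function findUnreferencedTests(testFiles: string[], workflowContents: string[]): string[] {
  return testFiles
    .filter((file) => {
      const stem = file.replace(/\.test\.ts$/, "");
      return !workflowContents.some((text) => text.indexOf(stem) !== -1);
    })
    .sort();
}
```

Fed with the file names under tests/integration and the text of each workflow, a non-empty result would point at suites such as network-security.test.ts that exist but never run in CI.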
