This document catalogs all CI workflows and smoke/build-test agentic workflows in gh-aw-firewall, describing what each tests, when it runs, what real-world scenario it validates, coverage gaps, and how it relates to the Node.js integration test suite.
- CI Workflow Overview
- Core CI Workflows
- Smoke Test Workflows (Agentic)
- Build-Test Workflows (Agentic)
- Security & Compliance Workflows
- Infrastructure Workflows
- Relationship Map: CI vs Integration Tests
- Coverage Gap Analysis
The repo has three tiers of testing:
| Tier | Type | Count | Purpose |
|---|---|---|---|
| Unit | Jest (src/*.test.ts) | 19 files | Fast feedback on individual modules |
| Integration | Jest (tests/integration/*.test.ts) | 26 files | End-to-end AWF container execution |
| Smoke/Build-Test | gh-aw compiled workflows (.lock.yml) | 28 workflows | Real AI agent execution inside AWF sandbox |
| CI | Hand-written GitHub Actions (.yml) | 15 workflows | Build, lint, type-check, security, coverage |
File: .github/workflows/test-integration.yml
| Attribute | Value |
|---|---|
| What it tests | TypeScript type-checking via npm run type-check. Despite the filename, this does NOT run integration tests. |
| Triggers | Push to main, PR to main, manual dispatch |
| Timeout | 5 minutes |
| Real-world mapping | Validates that TypeScript code compiles without type errors before merge |
| Gaps | Only checks types, not runtime behavior. Name is misleading (suggests integration tests). |
| Integration test relationship | Complementary — type checking catches compile-time errors; integration tests catch runtime issues. |
File: .github/workflows/test-chroot.yml
| Attribute | Value |
|---|---|
| What it tests | Runs the chroot integration test suite across 4 parallel jobs: Language Support, Package Managers, /proc Filesystem, Edge Cases |
| Triggers | Push to main, PR to main, manual dispatch |
| Timeout | 30-45 minutes per job |
| Real-world mapping | Validates that the chroot-based filesystem isolation works correctly with multiple languages (Node, Python, Go, Java, .NET, Ruby, Rust) and package managers |
| Gaps | Sequential dependency: package-managers waits for languages job. No macOS testing. |
| Integration test relationship | Direct 1:1 mapping — this workflow runs tests/integration/chroot-languages.test.ts, chroot-package-managers.test.ts, chroot-procfs.test.ts, and chroot-edge-cases.test.ts |
Jobs breakdown:
- test-chroot-languages — Sets up Node.js, Python 3.12, Go 1.22, Java 21, .NET 8.0. Builds containers locally. Runs
chroot-languagesintegration tests. - test-chroot-package-managers (needs: languages) — Adds Ruby 3.2, Rust stable. Runs
chroot-package-managersintegration tests. 45-minute timeout. - test-chroot-procfs (parallel) — Tests /proc filesystem access within chroot. Python, Java only.
- test-chroot-edge-cases (parallel) — Tests edge cases. Node.js only.
Key detail: Containers are built locally (docker build), so source changes to entrypoint.sh and docker-manager.ts ARE reflected in tests.
File: .github/workflows/test-coverage.yml
| Attribute | Value |
|---|---|
| What it tests | Runs unit tests with coverage collection. On PRs: compares coverage against base branch and fails on regression. On push: generates coverage summary. |
| Triggers | Push to main, PR to main (ignoring .md files) |
| Timeout | 15 minutes |
| Real-world mapping | Ensures PRs don't reduce test coverage — guards against "add feature, skip tests" PRs |
| Gaps | Only covers unit tests (src/*.test.ts), not integration tests. Node.js 20 only (build.yml tests 20+22). |
| Integration test relationship | Only measures coverage of unit tests. Integration test coverage is not tracked. |
Notable features:
- Checks out base branch to compute coverage diff
- Posts coverage comparison as PR comment
- Uploads coverage artifacts (30-day retention)
- Fails PR if coverage regresses
File: .github/workflows/test-action.yml
| Attribute | Value |
|---|---|
| What it tests | Tests the action.yml composite action that installs AWF from GitHub releases |
| Triggers | Push to main, PR to main (ignoring .md files), manual dispatch |
| Timeout | 5-10 minutes per job |
| Real-world mapping | Validates that users can install AWF via uses: github/gh-aw-firewall@v1 in their workflows |
| Gaps | Only tests installation, not actual firewall functionality. Tests version v0.7.0 specifically (may go stale). |
| Integration test relationship | No overlap — tests the GitHub Action packaging, not the firewall itself |
Jobs:
- test-action-latest — Install latest version, verify
awf --versionandawf --helpwork - test-action-specific-version — Install v0.7.0, verify exact version/image-tag outputs match
- test-action-with-images — Install v0.7.0 with
pull-images: true, verify Docker images are pulled - test-action-invalid-version — Install
invalid-version, verify action fails gracefully
File: .github/workflows/test-examples.yml
| Attribute | Value |
|---|---|
| What it tests | Runs example shell scripts from examples/ directory as smoke tests |
| Triggers | Push to main, PR to main (ignoring .md files), manual dispatch |
| Timeout | 15 minutes |
| Real-world mapping | Validates that documentation examples actually work — prevents stale README instructions |
| Gaps | Skips github-copilot.sh (requires GITHUB_TOKEN). Only 4 of 5 examples tested. |
| Integration test relationship | Complementary — examples test real AWF invocations from shell scripts, while integration tests use the Jest/TypeScript test runner |
Examples tested:
basic-curl.sh— Basic domain allow/block with curlusing-domains-file.sh— Domain list from filedebugging.sh— Debug mode with--keep-containersblocked-domains.sh— Verify blocked domains return errors
File: .github/workflows/build.yml
| Attribute | Value |
|---|---|
| What it tests | Builds TypeScript project and runs linter across Node.js 20 and 22 matrix |
| Triggers | Push to main, PR to main, manual dispatch |
| Timeout | 10 minutes |
| Real-world mapping | Ensures the project builds successfully on supported Node.js versions |
| Gaps | No Node.js 18 testing (though pkg in release uses node18 targets). No test execution. |
| Integration test relationship | Prerequisite — if build fails, nothing else runs. No direct test overlap. |
File: .github/workflows/lint.yml
| Attribute | Value |
|---|---|
| What it tests | Runs ESLint on TypeScript source |
| Triggers | Push to main, PR to main (ignoring .md files) |
| Timeout | 5 minutes |
| Real-world mapping | Code quality enforcement |
| Gaps | Duplicated with build.yml which also runs npm run lint. |
| Integration test relationship | None — code quality only |
These are gh-aw agentic workflows compiled from .md source files into .lock.yml GitHub Actions workflows. They run actual AI agents (Claude, Copilot, Codex, Gemini) inside the AWF sandbox.
Post-processing: All .lock.yml files are post-processed by scripts/ci/postprocess-smoke-workflows.ts which replaces GHCR image references with local builds (--build-local), removes sparse-checkout, and installs AWF from source.
Source: smoke-claude.md
| Attribute | Value |
|---|---|
| What it tests | Claude Code engine running inside AWF sandbox with GitHub API, Playwright, file I/O, and bash tools |
| Engine | claude (max 8 turns) |
| Triggers | Every 12h (schedule), PR (opened/synchronize/reopened), manual dispatch |
| Timeout | 10 minutes |
| Network allowed | defaults, github, playwright |
| Tools | github (repos, pull_requests), playwright, bash |
| Safe outputs | add-comment (hide older), add-labels (smoke-claude) |
| Real-world mapping | Validates that Claude Code can operate within AWF's network sandbox: GitHub API access via MCP, browser automation via Playwright, local file operations — the core use case for agentic workflows |
| Gaps | Non-deterministic (AI agent may behave differently). No HTTPS blocking verification. |
| Integration test relationship | High-level end-to-end complement. Integration tests verify AWF mechanics (iptables, proxy); this verifies an actual AI agent works through the firewall. |
Test requirements:
- GitHub MCP: Review last 2 merged PRs
- Playwright: Navigate to github.com, verify page title
- File writing: Create test file, verify with cat
- Bash: Execute commands to verify file creation
- Post-step: Validate safe outputs were invoked (add_comment for PR triggers)
Source: smoke-copilot.md
| Attribute | Value |
|---|---|
| What it tests | Copilot engine running inside AWF sandbox with MCP, Playwright, web-fetch, and agentic-workflows tools |
| Engine | copilot |
| Triggers | Every 12h, PR, manual dispatch |
| Timeout | 5 minutes |
| Network allowed | defaults, node, github, playwright |
| Tools | agentic-workflows, cache-memory, edit, bash, github, playwright, web-fetch |
| Real-world mapping | Validates Copilot CLI agent works through AWF with broader network access (node registries) and additional tools |
| Gaps | Shorter timeout (5min) may cause flaky failures. No blocked-domain verification. |
| Integration test relationship | Similar to smoke-claude but for Copilot engine. Tests a different engine implementation path. |
Source: smoke-codex.md
| Attribute | Value |
|---|---|
| What it tests | Codex engine with extended tool suite: GH CLI safe inputs, Tavily web search, discussion interactions, and AWF project build |
| Engine | codex |
| Triggers | Every 12h, PR, manual dispatch |
| Timeout | 15 minutes |
| Network allowed | defaults, github, playwright |
| Tools | cache-memory, github, playwright, edit, bash |
| Safe outputs | add-comment, create-issue, add-labels, hide-comment |
| Imports | shared/gh.md, shared/mcp/tavily.md, shared/reporting.md, shared/github-queries-safe-input.md |
| Real-world mapping | Most comprehensive smoke test — validates safe-inputs (gh CLI), Tavily MCP, discussion API, and build capability |
| Gaps | Complex prompt may cause non-deterministic failures. Build step (npm ci && npm run build) adds latency. |
| Integration test relationship | Tests discussion interactions and create-issue safe outputs that integration tests don't cover |
Additional test requirements beyond Claude/Copilot:
- Safe Inputs GH CLI: Query PRs via
safeinputs-gh - Tavily web search: Search for "GitHub Agentic Workflows Firewall"
- Discussion interaction: Comment on latest discussion
- Build AWF: Run
npm ci && npm run buildinside sandbox
Source: smoke-gemini.md
| Attribute | Value |
|---|---|
| What it tests | Gemini engine with same extended tool suite as Codex smoke test |
| Engine | gemini |
| Triggers | Every 12h, PR, manual dispatch |
| Timeout | 15 minutes |
| Real-world mapping | Validates Gemini (Google) engine works through AWF — important for multi-engine support |
| Gaps | Same as Codex. Identical test requirements — could share test definition via imports. |
| Integration test relationship | Same as Codex — tests a different engine path through the same infrastructure |
Source: smoke-chroot.md
| Attribute | Value |
|---|---|
| What it tests | Chroot filesystem isolation by comparing host vs chroot runtime versions (Python, Node.js, Go) |
| Engine | copilot |
| Triggers | PR (with path filter: src/, containers/, package.json, smoke-chroot.md), manual dispatch |
| Timeout | 20 minutes |
| Network allowed | defaults, github |
| Tools | github (repos, pull_requests), bash |
| Real-world mapping | Validates the core chroot feature: host binaries must be accessible inside the container with matching versions |
| Gaps | Only tests 3 runtimes (Python, Node, Go). No Java/.NET/Ruby/Rust version comparison. Path-filtered — won't run on non-code PRs. |
| Integration test relationship | Overlaps with chroot-languages.test.ts but approaches differently: smoke test runs awf → agent compares versions; integration test runs awf → Jest assertions compare versions |
Unique architecture:
- Pre-steps capture host versions, run
awf --skip-pullfor each runtime, compare - Agent only reads result files and creates PR comment with comparison table
- Uses
--skip-pull(locally-built containers from pre-steps)
These are agentic workflows that clone external test repositories and run real build/test commands through the AWF sandbox. They validate that AWF's network filtering allows language-specific package managers to function correctly.
All build tests are combined into a single build-test.lock.yml workflow:
- Engine:
copilot - Triggers: PR (opened/synchronize/reopened), manual dispatch
- Tools: bash, github (with GH_AW_GITHUB_MCP_SERVER_TOKEN)
- MCP: ghcr.io/github/gh-aw-mcpg container
- Safe outputs: add-comment (single combined table), add-labels (
build-test) - Error handling: Per-ecosystem failure tracking, table-based reporting
- Test repos:
Mossaka/gh-aw-firewall-test-{language}
| Attribute | Value |
|---|---|
| What it tests | All 8 ecosystems in a single workflow: Bun (elysia, hono), C++ (fmt, json), Deno (oak, std), .NET (hello-world, json-parse), Go (color, env, uuid), Java (gson, caffeine), Node.js (clsx, execa, p-limit), Rust (fd, zoxide) |
| Runtimes | node 20, go 1.22, rust stable, java 21, dotnet 8.0 |
| Network | defaults, github, node, go, rust, crates.io, java, dotnet, bun.sh, deno.land, jsr.io, dl.deno.land |
| Timeout | 45 minutes |
| Real-world mapping | Validates AWF allows language-specific package managers, runtime installations, and build/test execution across all supported ecosystems |
| Gaps | No Python/Ruby build-tests. Maven proxy requires manual settings.xml workaround. No Gradle, yarn/pnpm, vcpkg/conan, or nightly Rust testing. |
| Integration test relationship | Complements chroot-package-managers.test.ts with real-world projects |
| Attribute | Value |
|---|---|
| What it tests | npm audit on main package and docs-site package. Uploads SARIF to GitHub Security tab. |
| Triggers | Push/PR to main (ignoring .md), weekly Monday schedule, manual dispatch |
| Timeout | 5 minutes per job |
| Real-world mapping | Catches vulnerable npm dependencies before they ship |
| Gaps | Only npm, not container base image packages. |
| Integration test relationship | None |
| Attribute | Value |
|---|---|
| What it tests | CodeQL static analysis for JavaScript/TypeScript and GitHub Actions code |
| Triggers | Push/PR to main, weekly Monday schedule, manual dispatch |
| Timeout | 360 minutes (6 hours) |
| Real-world mapping | Catches security vulnerabilities (XSS, injection) and code quality issues |
| Gaps | No Shell/Bash analysis (container scripts, iptables rules not analyzed) |
| Integration test relationship | None — static analysis |
| Attribute | Value |
|---|---|
| What it tests | Semantic PR title format (e.g., feat:, fix:, docs:) using amannn/action-semantic-pull-request |
| Triggers | PR to main (opened, edited, synchronize, reopened) |
| Real-world mapping | Enforces conventional commit format for automated changelog generation |
| Gaps | N/A |
| Integration test relationship | None |
| Attribute | Value |
|---|---|
| What it tests | End-to-end release: version bump, build 4 container images (squid, agent, api-proxy, agent-act), create binaries (linux-x64/arm64, darwin-x64/arm64), generate changelog, create GitHub release |
| Triggers | Manual dispatch only (patch/minor/major choice) |
| Real-world mapping | Production release pipeline |
| Gaps | Binary smoke test only for linux-x64 (arm64 and macOS verified as valid ELF/Mach-O but not executed) |
| Integration test relationship | None — infrastructure |
Container images built:
squid:VERSION(linux/amd64 + arm64) — with cosign signing and SBOMagent:VERSION(linux/amd64 + arm64) — with cosign signing and SBOM, no-cacheapi-proxy:VERSION(linux/amd64 + arm64) — with cosign signing and SBOMagent-act:VERSION(linux/amd64 only) — with cosign signing and SBOM (retry logic)
| Attribute | Value |
|---|---|
| What it tests | Builds and deploys docs-site to GitHub Pages |
| Triggers | Push to main (docs-site/** paths), manual dispatch |
| Real-world mapping | Documentation deployment |
| Integration test relationship | None |
| Attribute | Value |
|---|---|
| What it tests | Installs gh-aw extension for GitHub Copilot Agent |
| Triggers | Manual dispatch, push to workflow file |
| Real-world mapping | Configures Copilot Agent environment with gh-aw |
| Integration test relationship | None |
| CI Workflow | Related Integration Tests | Overlap Level |
|---|---|---|
test-chroot.yml |
chroot-languages.test.ts, chroot-package-managers.test.ts, chroot-procfs.test.ts, chroot-edge-cases.test.ts |
Direct — CI runs these exact test files |
test-examples.yml |
blocked-domains.test.ts, wildcard-patterns.test.ts |
Indirect — examples test similar scenarios (domain allow/block) |
test-coverage.yml |
All src/*.test.ts unit tests |
Direct — runs unit test suite with coverage |
smoke-chroot.lock.yml |
chroot-languages.test.ts |
Overlapping — both test runtime version matching, different approaches |
build-test.lock.yml |
chroot-package-managers.test.ts |
Complementary — build-test uses real projects; integration uses minimal packages |
smoke-{claude,copilot,codex,gemini}.lock.yml |
None | Unique — only place that tests actual AI agents through AWF |
test-action.yml |
None | Unique — only place that tests the setup action |
build.yml |
None | Prerequisite — validates build on Node 20+22 |
- Chroot functionality — Tested at 3 levels: unit tests, integration tests, CI workflow, and smoke test
- Domain filtering — Unit tests (domain-patterns), integration tests (blocked-domains, wildcard-patterns), examples
- Multi-engine support — Smoke tests cover Claude, Copilot, Codex, Gemini
- Multi-language support — Build-tests cover 8 languages (Bun, C++, Deno, .NET, Go, Java, Node, Rust)
- Container security — cosign signing, SBOM attestation
-
No integration tests run in CI — The
test-integration.ymlis actually just a type-check. The non-chroot integration tests (blocked-domains, dns-servers, environment-variables, exit-code-propagation, etc.) have no dedicated CI workflow. -
No macOS CI testing — All CI runs on
ubuntu-latest. AWF produces darwin binaries but never tests them in CI. -
No arm64 CI testing — Containers are built for arm64 in release but never tested on arm64 runners.
-
Duplicate lint execution — Both
build.ymlandlint.ymlrunnpm run linton PRs. -
Missing Python build-test — Python pip/conda package installation through AWF proxy has no build-test workflow (despite Python being tested in chroot-languages).
-
Missing Ruby build-test — Ruby gem installation through AWF proxy has no build-test workflow.
-
Maven proxy workaround not tested in integration — The
~/.m2/settings.xmlworkaround is only documented in build-test.md, not validated by integration tests. -
No load/performance testing — No tests for concurrent connections, large file transfers, or many-domain allowlists.
-
Smoke test non-determinism — AI agent behavior varies between runs. A passing smoke test doesn't guarantee the next run passes.
-
No negative security testing in CI — Integration tests cover
network-security.test.ts(iptables bypass attempts), but this isn't run by any CI workflow. -
Stale version in test-action.yml — Tests hardcode
v0.7.0which may diverge from current release. -
No integration test coverage tracking —
test-coverage.ymlonly tracks unit test coverage. -
api-proxy container not scanned —
container-scan.ymlonly scans agent and squid images, not the api-proxy image added later.