Merged
10 changes: 10 additions & 0 deletions .claude/learnings.md
@@ -63,3 +63,13 @@ The /cost + /evals collectors were first built against assumed endpoints that DO
- Multi-`task_ids` queries work and accumulate correctly; batch the ids ~50/request to keep the GET URL well under server limits (`ArthurClient.listTraces`/`countTraces` chunk + paginate + merge).
- **Evals: there is no success-rate field.** Compute pass-rate from `countTraces(..., { continuous_eval_run_status })` (enum: `pending|passed|running|failed|skipped|error`): `score = passed / (passed+failed)`. On this instance continuous evals are NOT configured (`GET /api/v2/tasks/{id}/metrics` → 404, trace `metrics` carry only token/cost, `annotations: null`, spans have no `metric_results`), so passed=failed=0 and /evals correctly degrades to `available:false`. The logic lights up once evals are enabled.
- Verified live (MTD): /cost = $26.33 over 94 traces / 23M tokens, real per-workflow + daily breakdown. The dashboard reads the **deployed** worker — these fixes need a worker redeploy to show.
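
A minimal TypeScript sketch of the two mechanics in the bullets above: batching task ids and deriving the pass rate. The names `chunkIds`, `passRate`, `EvalCounts`, and `EvalScore` are hypothetical and not taken from `ArthurClient`; only the ~50-id batch size, the `passed / (passed+failed)` formula, and the `available:false` fallback come from these notes.

```typescript
// Hypothetical types and helper names, for illustration only.
type EvalCounts = { passed: number; failed: number };
type EvalScore = { available: true; score: number } | { available: false };

// Split task ids into batches so each GET URL stays well under server limits.
function chunkIds(ids: string[], size = 50): string[][] {
  const batches: string[][] = [];
  for (let i = 0; i < ids.length; i += size) {
    batches.push(ids.slice(i, i + size));
  }
  return batches;
}

// Pass rate over decided runs only. Runs in pending, running, skipped, or
// error status do not enter the denominator.
function passRate(counts: EvalCounts): EvalScore {
  const decided = counts.passed + counts.failed;
  if (decided === 0) return { available: false }; // evals not configured
  return { available: true, score: counts.passed / decided };
}
```

With `passed = failed = 0`, `passRate` returns `{ available: false }` instead of dividing by zero, which matches the degraded `/evals` behaviour described above.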

## Pino error logging convention (2026-06-10)
- Pino's error serializer only applies to the `err` key. `logger.warn({ error: err }, ...)` JSON-stringifies the Error to `{}` (message/stack are non-enumerable).
- Codebase convention: `logger.warn({ err: (err as Error).message }, "msg")` — see src/lib/reconcile.ts:145, src/routes/webhooks/github.post.ts:146.
- Known pre-existing violation (out of scope, left as-is): `poll_dispatch_failed` in src/routes/cron/poll.get.ts (~line 94) uses `{ error: err }`.
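
The `{}` result can be reproduced without Pino, because it comes from plain JSON serialization of an `Error`. The sketch below shows that, plus a hypothetical helper `toLogFields` (not in the codebase) that follows the `err`-key convention above.

```typescript
// An Error's `message` and `stack` are non-enumerable own properties,
// so JSON.stringify skips them and the object serializes as empty.
const failure = new Error("dispatch failed");
const naive = JSON.stringify({ error: failure });
// naive === '{"error":{}}'

// Hypothetical helper mirroring the codebase convention: put the message
// string under the `err` key so it always reaches the log line.
function toLogFields(err: unknown): { err: string } {
  return { err: err instanceof Error ? err.message : String(err) };
}

const conventional = JSON.stringify(toLogFields(failure));
// conventional === '{"err":"dispatch failed"}'
```

Usage would then look like `logger.warn(toLogFields(err), "msg")`, assuming `logger` is the project's Pino instance.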

## 2026-06-10 — Redis→Neon migration implementation learnings
- **Stale `apps/worker/dist/` breaks `pnpm build` after removing a dependency.** The nitro/Workflow-DevKit build scans compiled JS under the gitignored `dist/` tree; a pre-migration `dist/src/adapters/run-registry/upstash.js` still importing `@upstash/redis` failed esbuild resolution after `pnpm remove @upstash/redis`. Fresh clones (Vercel CI) are unaffected. Fix: `rm -rf apps/worker/dist` locally. Also: piping build output through `tail` masks the exit code — check `PIPESTATUS`/run unpiped.
- **`pnpm exec tsx --env-file=X` exits 9 under standalone-pnpm installs** (pnpm's embedded Node makes tsx re-spawn `process.execPath` = the pnpm wrapper, which mishandles the forwarded flag). Root-cause fix used in `scripts/db-migrate.ts` + `scripts/clear-run-registry.ts`: `config({ path: [".env.local", ".env"], quiet: true })` from dotenv 17.x — first file wins, never overrides pre-set process env, so `vercel env pull .env.local && pnpm db:migrate` just works. Note `init-env/SKILL.md:193` still carries the fragile `pnpm tsx --env-file` pattern (pre-existing).
- **`pnpm dedupe` does NOT purge stale optional-peer suffixes** (e.g. `@upstash/redis` baked into `drizzle-orm(...)` snapshot keys when both coexisted) — it churned ~950 lockfile lines without removing the package. Reverted; the stale entry is peer-only, nothing imports it, harmless.
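
The precedence that makes `vercel env pull .env.local && pnpm db:migrate` work can be modelled in a few lines. This is a sketch of the documented behaviour (first file wins, pre-set process env is never overridden), not dotenv's implementation; `resolveEnv` and `EnvMap` are hypothetical names.

```typescript
type EnvMap = Record<string, string>;

// `files` are already-parsed env files in priority order, e.g.
// [.env.local, .env]. Values already present in the process env win over
// every file, and an earlier file wins over a later one.
function resolveEnv(presetEnv: EnvMap, files: EnvMap[]): EnvMap {
  const resolved: EnvMap = { ...presetEnv };
  for (const file of files) {
    for (const key of Object.keys(file)) {
      if (!(key in resolved)) resolved[key] = file[key];
    }
  }
  return resolved;
}

const resolvedExample = resolveEnv(
  { DATABASE_URL: "from-shell" },
  [
    { DATABASE_URL: "from-env-local", CRON_SECRET: "from-env-local" },
    { CRON_SECRET: "from-env", LOG_LEVEL: "from-env" },
  ],
);
// DATABASE_URL stays "from-shell", CRON_SECRET comes from .env.local,
// LOG_LEVEL falls through to .env.
```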
2 changes: 1 addition & 1 deletion .claude/skills/init-agent/SKILL.md
@@ -7,7 +7,7 @@ description: Configure or rotate the agent runtime (Claude or Codex) for the Bla

Branch-on-choice skill. Asks **Claude or Codex**, then emits a single paste-template for the chosen runtime. Cross-field rule in `env.ts` (`AGENT_KIND=claude` requires `ANTHROPIC_API_KEY`; `AGENT_KIND=codex` requires `CODEX_API_KEY` or `CODEX_CHATGPT_OAUTH_TOKEN`) is enforced by construction.
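
A minimal sketch of the cross-field rule, assuming it can be expressed as a plain check over env values. `checkAgentEnv` is a hypothetical name and this is not the actual code in `env.ts`; rejecting any other `AGENT_KIND` value is also an assumption.

```typescript
type AgentEnv = Record<string, string | undefined>;

// Returns a list of problems; an empty list means the rule holds.
function checkAgentEnv(env: AgentEnv): string[] {
  const problems: string[] = [];
  if (env.AGENT_KIND === "claude") {
    if (!env.ANTHROPIC_API_KEY) {
      problems.push("AGENT_KIND=claude requires ANTHROPIC_API_KEY");
    }
  } else if (env.AGENT_KIND === "codex") {
    // Either credential satisfies the codex branch.
    if (!env.CODEX_API_KEY && !env.CODEX_CHATGPT_OAUTH_TOKEN) {
      problems.push("AGENT_KIND=codex requires CODEX_API_KEY or CODEX_CHATGPT_OAUTH_TOKEN");
    }
  } else {
    // Assumption: any other value is invalid.
    problems.push("AGENT_KIND must be claude or codex");
  }
  return problems;
}
```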

-> If you want full project setup (Jira + VCS + Agent + Slack + Upstash + deploy), invoke `init-env` instead. This skill only handles the agent runtime.
+> If you want full project setup (Jira + VCS + Agent + Slack + Neon + deploy), invoke `init-env` instead. This skill only handles the agent runtime.

## Precondition

16 changes: 8 additions & 8 deletions .claude/skills/init-env/SKILL.md
@@ -1,6 +1,6 @@
---
name: init-env
-description: First-time setup orchestrator for the Blazebot ai-workflow repo. Mirrors SETUP.md as an agent-driven flow — project linking, env vars across Jira / VCS / Agent / Slack / Upstash, production deploy, post-deploy registrations (Jira webhook + Slack /ai-workflow slash command), and smoke checks. Use when starting fresh on this repo for the first time — "init project", "first-time setup", "bootstrap this repo", "onboard me", "set up env from scratch".
+description: First-time setup orchestrator for the Blazebot ai-workflow repo. Mirrors SETUP.md as an agent-driven flow — project linking, env vars across Jira / VCS / Agent / Slack / Neon, production deploy, post-deploy registrations (Jira webhook + Slack /ai-workflow slash command), and smoke checks. Use when starting fresh on this repo for the first time — "init project", "first-time setup", "bootstrap this repo", "onboard me", "set up env from scratch".
---

# Initialize Project Environment (Cold Start)
@@ -36,7 +36,7 @@ If the user replies with anything other than a clear go-signal, do not advance
3. init-vcs → branch on github | gitlab
4. init-agent → branch on claude | codex
5. init-slack → bot token, channel, signing secret
-6. init-upstash → Marketplace install runbook
+6. init-neon → Marketplace install runbook
7. Inline: CRON_SECRET → auto-generate, paste-template
8. vercel env pull + validate → .env.local + pnpm tsx env.ts
9. vercel --prod → single production deploy
@@ -153,15 +153,15 @@ Invoke `init-agent`. It asks **claude or codex** and emits a single paste-templa

Invoke `init-slack`. It walks the user through creating the Slack app (or finding an existing bot token), the bot's `chat:write` scope, and the channel ID format.

-→ **Stop. Ask:** *"Slack configured. Ready for Step 6: Upstash Redis?"*
+→ **Stop. Ask:** *"Slack configured. Ready for Step 6: Neon Postgres?"*

---

-## Step 6 — Invoke `init-upstash`
+## Step 6 — Invoke `init-neon`

-Invoke `init-upstash`. It walks the user through the Vercel Marketplace install of Upstash for Redis, with the env-var prefix set to `AI_WORKFLOW_KV` so Vercel auto-injects the two keys `env.ts` expects.
+Invoke `init-neon`. It walks the user through the Vercel Marketplace install of Neon Postgres with branch-per-environment enabled so Vercel auto-injects `DATABASE_URL` for each environment that `env.ts` expects.

-→ **Stop. Ask:** *"Upstash installed. Ready for Step 7: cron secret?"*
+→ **Stop. Ask:** *"Neon installed. Ready for Step 7: cron secret?"*

---

@@ -351,7 +351,7 @@ Configured:
VCS <github|gitlab> <owner>/<repo>
Agent <claude|codex> model <model>
Slack channel <id> bot @<bot_name>, slash <registered | deferred>
-Upstash AI_WORKFLOW_KV prefix via Marketplace
+Neon DATABASE_URL per env via Marketplace
Cron CRON_SECRET set schedule * * * * *

Skipped (see SETUP.md for the full how-to):
@@ -377,7 +377,7 @@ Smoke checks:

Maintenance:
Rotate one integration later by invoking that subskill standalone:
-init-jira | init-vcs | init-agent | init-slack | init-upstash
+init-jira | init-vcs | init-agent | init-slack | init-neon

Inspect the deployment:
vercel logs --prod
2 changes: 1 addition & 1 deletion .claude/skills/init-jira/SKILL.md
@@ -10,7 +10,7 @@ State-aware skill for the Jira side of Blazebot. Two phases triggered by detecte
- **Phase 1 — Credentials, columns, secret pre-gen.** Runs when `JIRA_BASE_URL` is not yet in Vercel env.
- **Phase 2 — Webhook registration.** Runs when phase 1 is done and a production deploy exists.

-> If you want full project setup (Jira + VCS + Agent + Slack + Upstash + deploy), invoke `init-env` instead. This skill only handles Jira.
+> If you want full project setup (Jira + VCS + Agent + Slack + Neon + deploy), invoke `init-env` instead. This skill only handles Jira.

## Precondition

80 changes: 80 additions & 0 deletions .claude/skills/init-neon/SKILL.md
@@ -0,0 +1,80 @@
---
name: init-neon
description: Configure the Neon Postgres database for Blazebot (run registry + post-PR gate store) via the Vercel Marketplace. Verifies DATABASE_URL is injected per environment, that environments do NOT share a branch, and that migrations apply. Use for "set up neon", "set up postgres", "configure database", "fix run registry", "env_marker error".
---

# Initialize Neon Postgres

Walks the user through installing **Neon Postgres** from the Vercel Marketplace with branch-per-environment enabled so Vercel auto-injects a separate `DATABASE_URL` per environment that `env.ts` expects.

Blazebot uses Postgres as its run registry and post-PR-gate store — tracking active workflow runs per ticket, deduplicating dispatch, and locking concurrent cron cycles. Tables are created automatically; migrations run during every deploy's build step (`apps/worker/scripts/db-migrate.ts`).

> If you want full project setup (Jira + VCS + Agent + Slack + Neon + deploy), invoke `init-env` instead. This skill only handles Neon.

## Precondition

`.vercel/project.json` must exist. If missing:

```
ERROR: no Vercel project linked. Run `vercel link` first, or invoke `init-env`
for the full first-time setup.
```

Halt.

## State detection

1. `vercel env ls | grep DATABASE_URL` — if present for all three environments, skip install and go to verification.
2. If missing: walk the user through the Marketplace install below.

## Step 1 — Marketplace install

Walk the user through these steps (Vercel dashboard install is faster than CLI):

1. Open https://vercel.com/marketplace/neon and click **Install**.
2. Select the team and connect it to the ai-workflow Vercel project.
3. **Critical:** enable **branch per environment** (development / preview / production) when configuring the integration. Each environment's `DATABASE_URL` must point at its own Neon branch. The build fails with an `env_marker` error if two environments share one branch — that guard protects the production run registry from preview deployments.
4. Confirm the install. Vercel auto-injects `DATABASE_URL` for all three environments.

CLI alternative: `vercel integration add neon`

## Step 2 — Confirm the key landed

Tell the user to confirm in Vercel → Project Settings → Environment Variables that they see `DATABASE_URL` scoped to all three environments (Production, Preview, Development).

CLI alternative (faster from a terminal):

```bash
vercel env ls | grep DATABASE_URL
```

Success: `DATABASE_URL` appears for each of the three environments, with different values (distinct `ep-…` endpoint hosts confirm branch isolation; ignore any `-pooler` suffix when comparing hosts — pooled vs direct URLs of the same branch differ textually).

If `DATABASE_URL` is missing or the same value appears across environments, the branch-per-environment option wasn't enabled during install. Recovery paths:

- **Easier:** disconnect the Neon integration (Project → Storage → Neon → Disconnect), reinstall with branch-per-environment enabled.
- **Manual fix:** in the Neon console, create separate branches per environment and update each environment's `DATABASE_URL` in Vercel manually. Works but the integration won't keep them in sync automatically.

## Verification (all must pass)

1. `vercel env ls` shows `DATABASE_URL` for development, preview, and production.
2. Branch isolation: pull each environment's value and confirm the hosts differ (`vercel env pull --environment=production .env.prod` etc., compare the `ep-…` endpoint hosts; ignore any `-pooler` suffix when comparing hosts — pooled vs direct URLs of the same branch differ textually). Identical hosts across environments = the build's `env_marker` guard will fail — fix the integration's branch settings.
3. Migrations: `cd apps/worker && vercel env pull .env.local && pnpm db:migrate` against the development branch — expect "[db-migrate] OK — branch claimed by 'development'." (The script loads `.env.local` then `.env` via dotenv; vars already set in the shell env are never overridden.)
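
The branch-isolation comparison in step 2 can be scripted. The sketch below assumes the `-pooler` suffix sits at the end of the first host label (the `ep-…` endpoint id); the function names and example hosts are hypothetical.

```typescript
// Host of a connection string with any `-pooler` suffix removed from the
// first label, so pooled and direct URLs of one branch compare equal.
function normalizedHost(databaseUrl: string): string {
  const labels = new URL(databaseUrl).hostname.split(".");
  labels[0] = labels[0].replace(/-pooler$/, "");
  return labels.join(".");
}

// Returns "envA and envB" for every pair of environments that resolve to
// the same Neon endpoint host. An empty list means isolation holds.
function findSharedHosts(urls: Record<string, string>): string[] {
  const firstSeen: Record<string, string> = {};
  const clashes: string[] = [];
  for (const env of Object.keys(urls)) {
    const host = normalizedHost(urls[env]);
    if (firstSeen[host] !== undefined) {
      clashes.push(`${firstSeen[host]} and ${env}`);
    } else {
      firstSeen[host] = env;
    }
  }
  return clashes;
}

// Made-up example values: preview points at production's endpoint through
// the pooler, so it is reported as a clash.
const clashExample = findSharedHosts({
  production: "postgres://u:p@ep-example-one-123456.us-east-2.aws.neon.tech/db",
  preview: "postgres://u:p@ep-example-one-123456-pooler.us-east-2.aws.neon.tech/db",
  development: "postgres://u:p@ep-example-two-654321.us-east-2.aws.neon.tech/db",
});
// clashExample is ["production and preview"]
```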

## Step 3 — Done

No paste-template needed — `DATABASE_URL` is auto-injected by Vercel. The end-of-flow validator (in `init-env`) confirms it made it.

If invoked from `init-env`, return control. If standalone, end.

## Troubleshooting

- Build fails with `[db-migrate] FATAL: this Neon branch is already claimed by VERCEL_ENV='production', but this build is VERCEL_ENV='…'`: two environments share one Neon branch (the `env_marker` guard). Reconfigure the integration for branch-per-environment, redeploy.
- `DATABASE_URL undefined` at build: integration not connected to this project, or env var scoped to the wrong environments.
- Stale run registry (e.g. after a bad deploy or smoke test): run `pnpm exec tsx scripts/clear-run-registry.ts <ticket>` from `apps/worker` (after `vercel env pull .env.local`) to dump and clear `active_runs` / `failed_tickets` / `thread_parents`.
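
For orientation, the `env_marker` guard behind the first bullet can be pictured as a claim check. This is a sketch under assumptions, not the code in `db-migrate.ts`: how the marker is stored is not described here, so `claimedBy` is passed in directly and `checkEnvMarker` is a hypothetical name. The error text follows the message quoted above.

```typescript
type ClaimResult = { ok: true } | { ok: false; message: string };

// `claimedBy` is the environment name already recorded on the branch, or
// null when the branch has not been claimed yet (assumed representation).
function checkEnvMarker(claimedBy: string | null, vercelEnv: string): ClaimResult {
  if (claimedBy === null || claimedBy === vercelEnv) {
    return { ok: true }; // first claim, or the same environment again
  }
  return {
    ok: false,
    message:
      `[db-migrate] FATAL: this Neon branch is already claimed by ` +
      `VERCEL_ENV='${claimedBy}', but this build is VERCEL_ENV='${vercelEnv}'`,
  };
}
```

A preview build against a branch claimed by production would get `ok: false`, which is the failure the reconfigure-and-redeploy fix addresses.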

## Don'ts

- **Don't manually create a Neon database outside the Marketplace.** You'd lose the auto-injection benefit and have to manage `DATABASE_URL` by hand. The Marketplace integration is the preferred path.
- **Don't share one Neon branch across environments.** The `env_marker` build guard will fail — it's there to protect the production run registry from preview deployments polluting it.
- **Don't skip branch isolation.** A preview deploy writing to the production Neon branch corrupts the run registry and can orphan live sandboxes.
2 changes: 1 addition & 1 deletion .claude/skills/init-slack/SKILL.md
@@ -7,7 +7,7 @@ description: Configure or rotate the Slack bot integration for Blazebot notifica

Configures the Slack bot Blazebot uses to post status updates (run started, PR opened, run failed, etc.) to a single channel.

-> If you want full project setup (Jira + VCS + Agent + Slack + Upstash + deploy), invoke `init-env` instead. This skill only handles Slack.
+> If you want full project setup (Jira + VCS + Agent + Slack + Neon + deploy), invoke `init-env` instead. This skill only handles Slack.

## Precondition

2 changes: 1 addition & 1 deletion .claude/skills/init-slack/references/slash-commands.md
@@ -57,7 +57,7 @@ Expect:
1. An ephemeral "Working on `/ai-workflow list`…" message (within ~1s).
2. A second message visible in the channel with either the list of active runs or "No active workflows."

-If you instead see Slack's "operation_timeout" error, the function probably can't reach Upstash — check Vercel runtime logs for the `slack_command_dispatching` log line.
+If you instead see Slack's "operation_timeout" error, the function probably can't reach the Postgres run registry — check Vercel runtime logs for the `slack_command_dispatching` log line.

## Troubleshooting

70 changes: 0 additions & 70 deletions .claude/skills/init-upstash/SKILL.md

This file was deleted.

2 changes: 1 addition & 1 deletion .claude/skills/init-vcs/SKILL.md
@@ -7,7 +7,7 @@ description: Configure or rotate the VCS provider (GitHub or GitLab) for the Bla

Branch-on-choice skill. Asks **GitHub or GitLab**, then emits a single paste-template for the chosen provider. The cross-field rule in `env.ts` (`VCS_KIND=github` requires `GITHUB_TOKEN` + `GITHUB_OWNER` + `GITHUB_REPO`; `VCS_KIND=gitlab` requires `GITLAB_TOKEN` + `GITLAB_PROJECT_ID`) is enforced by construction — only the chosen branch's keys are emitted.
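
One way to read "enforced by construction" is as a discriminated union: a choice value can only carry its own branch's fields, so the emitted template cannot mix providers. The sketch below assumes a `KEY=value` template format; `VcsChoice` and `vcsPasteTemplate` are hypothetical names, not the skill's actual output.

```typescript
// Each variant carries only the fields its provider needs.
type VcsChoice =
  | { kind: "github"; token: string; owner: string; repo: string }
  | { kind: "gitlab"; token: string; projectId: string };

// Emit only the chosen branch's keys (template format is an assumption).
function vcsPasteTemplate(choice: VcsChoice): string {
  const lines =
    choice.kind === "github"
      ? [
          "VCS_KIND=github",
          `GITHUB_TOKEN=${choice.token}`,
          `GITHUB_OWNER=${choice.owner}`,
          `GITHUB_REPO=${choice.repo}`,
        ]
      : [
          "VCS_KIND=gitlab",
          `GITLAB_TOKEN=${choice.token}`,
          `GITLAB_PROJECT_ID=${choice.projectId}`,
        ];
  return lines.join("\n");
}
```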

-> If you want full project setup (Jira + VCS + Agent + Slack + Upstash + deploy), invoke `init-env` instead. This skill only handles VCS.
+> If you want full project setup (Jira + VCS + Agent + Slack + Neon + deploy), invoke `init-env` instead. This skill only handles VCS.

## Precondition
