Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
The table of contents is too big for display.
Diff view
Diff view
  •  
  •  
  •  
48 changes: 38 additions & 10 deletions .claude/agents/pr-comment-triager.md
Original file line number Diff line number Diff line change
Expand Up @@ -72,27 +72,55 @@ After each commit:
git push origin <branch>
```

## Step 4: Resolve stupid items
## Step 4: Reply to each comment individually, then resolve

For each thread classified STUPID, resolve via GraphQL mutation:
**The user wants to scroll through the PR comments and see a reply right after each bot comment** so it's obvious every item has been addressed. **Do NOT post one summary comment that covers everything.** Reply per-comment.

### Inline review-thread comments

For each thread, post a reply IN the thread (so it shows up indented under the original), then resolve the thread:

```bash
# Reply in the thread — replaces the parent comment ID with the bot's comment ID
gh api repos/mikepsinn/optimitron/pulls/N/comments/<COMMENT_DATABASE_ID>/replies \
-f body="<your verdict, one or two lines, code-specific>"

# Then mark the thread resolved
gh api graphql -f query='mutation { resolveReviewThread(input: {threadId: "..."}) { thread { isResolved } } }'
```

Then post a single PR-level summary comment via `gh pr comment N --body "..."` that lists every stupid thread with a one-line reason. Don't post per-thread reply comments — the on-thread resolution is enough, and a single summary reads better.
### PR-level issue comments from bots

The summary comment shape:
GitHub doesn't natively thread issue-comments. Post one reply per bot comment as a new issue-comment that quotes the bot comment URL so the chronological scroll shows reply-immediately-after-bot:

```bash
gh pr comment N --body "$(cat <<'EOF'
Re: <link to the bot comment, copy the comment's permalink>

<verdict — one or two lines, code-specific>
EOF
)"
```
Triaged N CodeRabbit/Copilot threads (commit <sha>).

**Valid — fixed (M):**
- <path:line> — what was wrong and what was changed.
For repeated bot test/ping noise (e.g. `claude[bot]` posting "test" / "test-ping"), minimize them with `minimizeComment` (`classifier: OUTDATED`) instead of replying — they're not real comments.

**Stupid — resolved no change (K):**
- <path:line> — one-line reason (specific to the code, not generic).
```
### Reply body shape

Each reply must be:

- **Short**: one or two lines. The body of the work goes in the commit message, not in PR replies.
- **Code-specific**: name the file/line/commit. Generic dismissals ("not applicable", "won't fix") erode trust.
- **Honest verdict**: either "Fixed in <sha>" / "Already fixed in <sha>" / "Won't fix because <code-specific reason>" / "False positive — <line> already does <X>".

Examples:

> Fixed in 73bc8979 — narrowed `assertEmailSafe` to require `http(s)://` scheme so prose containing the bare word "localhost" no longer trips the send-boundary guard.

> Already fixed — `react-email-components.tsx:112` is `fontWeight: "700"` (commit a04b1355).

> Won't fix — `PersonalIncomeChart.tsx` was untouched on this branch; the hardcoded `bg-white` is a pre-existing pattern across the landing-light component family that needs to migrate together, not a one-off patch.

> False positive — `safety-gate.mjs:44` already covers newline AND single-`&`: `(?:&&|\|\||;|\n|\|(?!\|)|&(?!&))`.

## Step 5: Report back to the parent agent

Expand Down
50 changes: 49 additions & 1 deletion .claude/codex-delegation.md
Original file line number Diff line number Diff line change
Expand Up @@ -63,7 +63,7 @@ Concrete failure case this rule prevents: this session, multiple Codex agents sp

Codex has Playwright MCP wired up (`mcp__playwright__browser_navigate`, `browser_console_messages`, `browser_take_screenshot`, etc.). Use it for spot-checks during the fix-iterate loop — load a page, grab console errors, verify the symptom is gone. 5-15 seconds per route.

DO NOT default to `pnpm --filter @optimitron/web run e2e -- visual --grep <route>` for iteration verification. That command boots a dev/prod server, compiles routes, runs screenshot capture + baseline comparison + Argos upload — 5-10 minutes per filter. Reserve it for the FINAL pre-merge verification pass after the fix is known to work.
DO NOT default to `pnpm --filter @optimitron/web run e2e -- visual --grep <route>` for iteration verification. That command boots a dev/prod server, compiles routes, and runs screenshot capture — 5-10 minutes per filter. Reserve it for the FINAL pre-merge verification pass after the fix is known to work.

Same signal (does the page hydrate without React errors? does the layout look right?) at 50x the cost. Burning 10 minutes per fix-iteration cycle when the same answer is available in 10 seconds is the anti-pattern. Concrete failure: this session, the hydration-investigation Codex spent ~8 minutes of one verification run on `pnpm e2e visual --grep treaty` when the same fix could have been spot-checked via Playwright MCP in seconds.

Expand Down Expand Up @@ -114,6 +114,54 @@ The `subagent_type: codex:codex-rescue` Agent path is MCP-mediated and strictly

If a future Claude session is tempted to use the Agent path because it looks more integrated: it isn't. The direct CLI path has the same `run_in_background: true` notification UX from Bash, plus everything above.

## Plan-first protocol for substantial work

Default for any Codex dispatch that touches >1 system, >100 lines, schema/CI, or matches the "I thought we had / why is this so" phrasing in CLAUDE.md HVD#13 (Diagram-before-code). Skip only for trivial single-file renames / one-liners / copy edits.

**Six steps. No skipping. No reordering.**

1. **Claude drafts the plan file.** Location: gstack convention (`~/.gstack/projects/<slug>/plans/<task-slug>.md`) or a project-local `.claude/plans/<task-slug>.md` if gstack isn't initialized for the repo. Required sections:
- **Brief** — problem restated in Claude's own words.
- **Research log** — REQUIRED before anything else. AI knowledge cutoffs make every vendor/API/tool assumption suspect. Before drafting current/proposed state, WebSearch + WebFetch the relevant vendor docs from the last 12 months. List: search queries run, URLs of canonical docs + their last-updated date if visible, any changelog entries from the last 6 months, anything that contradicts an assumption I'd otherwise have made from training data. If touching a third-party tool/API/SDK, the research log MUST cite the vendor's current docs (not "I think this works like X"). Examples of failures this catches: "Codex app-server is experimental, probably overkill" (false — it's the documented integration point), "Neon anonymized branches would be a 1-2 day project" (false — built-in feature GA late 2025).
- **Current state — ASCII diagram** — boxes/arrows of the systems involved.
- **Proposed state — ASCII diagram** — boxes/arrows after the change.
- **Step list** — checkboxes Codex will tick off during implementation.
- **Risks** — things that could go sideways.
- **Files to touch** — Codex's expected scope.
- **ALERTS** — orchestrator-edited, Codex re-reads top of every Phase-3 turn.
- **Agent log** — Codex appends after each meaningful action.

2. **Codex criticizes the plan.** Dispatch via `codex review` (preferred for plan critique — adversarial, "tries to break") or `codex exec` with an explicit instruction to investigate the code AND argue against the plan: name what's wrong, what's missing, what's overcomplicated, what's a worse fix than the alternative. Codex must read the actual files referenced in the plan, not just react to the prose. Codex MUST also verify Claude's `## Research log` — re-WebSearch anything not cited with a recent vendor doc URL, name anything Claude assumed that's contradicted by current docs. Codex writes its critique INTO the plan file under a `## Codex critique (round N)` section.

3. **Claude + Codex iterate until agreement.** Claude responds to Codex's critique in the same plan file (`## Claude response (round N)`). If still disagreeing, dispatch Codex again. Stop when either: (a) both agree, OR (b) the disagreement is a taste call that needs Mike. Two rounds max — if not converged, escalate to step 4 as "Claude + Codex disagree on X, see plan file."

4. **Tell Mike the plan.** Summary in chat: plan file path, key decisions, any unresolved disagreements between Claude and Codex. Mike reads the plan file directly (it's the single source of truth).

5. **Mike approves, fixes, or rejects.** Mike edits the plan file directly OR responds in chat with redirects. Claude applies Mike's changes to the plan file. Loop back to step 4 if Mike's redirect was substantial.

6. **Codex implements.** Dispatch via direct `codex exec` (not the Agent wrapper — blocked by `.claude/hooks/block-codex-rescue-agent.mjs` anyway). Prompt includes: plan file path, the discipline rule "before any tool call, Read `<plan-path>` and check `## ALERTS`", and "append to `## Agent log` after each meaningful action; tick `## Step list` checkboxes as you go." Mike + Claude can edit `## ALERTS` mid-flight; Codex picks up on next turn boundary (turn-boundary latency, not real interrupt — use `codex app-server` upgrade for that, see below).

**For trivial dispatches**, skip steps 1–5. Dispatch directly via `codex exec` with a tight self-contained prompt. Trivial = single-file rename, one-liner, copy edit, OR something already designed in a prior plan file.

**If unsure whether a dispatch is substantial enough**: it is. Defaulting to plan-first costs maybe 20 minutes; defaulting to direct dispatch costs the kind of failures today's session demonstrated (anonymized-preview Codex run wasted, no plan, no visibility).

## Dispatch transport: prefer `@openai/codex-sdk` (or app-server)

**Default: `@openai/codex-sdk` npm package.** Official Node 18+ TypeScript SDK that wraps the codex CLI and exchanges JSONL events over stdin/stdout — gives us streamed agent events, parallel threads with stable IDs, and the protocol primitives (`turn/steer`, `turn/interrupt`) without writing a JSON-RPC client ourselves. **Not "couple hours of work" — it's `npm install @openai/codex-sdk`.** Previous claim was a stale-training-data error; see `feedback_websearch_vendor_capabilities_first.md`.

When NOT to use the SDK:
- **Truly trivial one-shots** (single-file rename, copy edit). Direct `codex exec '...' < NUL > log 2>&1` is cheaper than spawning an SDK process. Bypass the protocol with `trivial: <reason>` (per the enforce-codex-protocol hook).
- **Custom low-level transport needs** (WebSocket with auth, embedding in non-Node service). Use `codex app-server --listen ws://...` directly with bindings generated via `codex app-server generate-ts`.

**NOT `codex mcp-server`** — that's the wrong direction (lets Codex consume external MCP tools; doesn't let Claude drive Codex).

**Status:** SDK not yet adopted in this repo. The dispatch-side code still goes through Bash → `codex exec`. The `enforce-codex-protocol` hook accepts both shapes (any `codex exec` / `codex review` invocation with a plan-file or trivial bypass). Adopt the SDK when we hit the first failure mode `codex exec` doesn't cover (mid-turn steering, parallel agents needing live coordination). Tracked in TODO.md.

Sources:
- [Codex App Server (OpenAI Developers)](https://developers.openai.com/codex/app-server)
- [`@openai/codex-sdk` on npm](https://www.npmjs.com/package/@openai/codex-sdk)
- [Codex SDK overview](https://developers.openai.com/codex/sdk)

## Config

`.codex/config.toml` pins `model = "gpt-5.5"` + `model_reasoning_effort = "xhigh"` — strongest tier for the hardest async tasks.
Expand Down
64 changes: 64 additions & 0 deletions .claude/hooks/block-codex-rescue-agent.mjs
Original file line number Diff line number Diff line change
@@ -0,0 +1,64 @@
#!/usr/bin/env node
// block-codex-rescue-agent.mjs
//
// PreToolUse hook on the Agent tool: blocks dispatches with
// `subagent_type: "codex:codex-rescue"`. The wrapper is strictly
// worse than direct `codex exec` via Bash and the protocol already
// says so — see .claude/codex-delegation.md line 7 + line 105 ("Why
// CLI not Agent tool"). I keep using it anyway despite the doc rule.
//
// 2026-05-14: written after I used the wrapper twice in one session
// (anonymized-preview, ICEWAD rename). Both showed exactly the
// failure mode line 113 names verbatim — wrapper returns "will
// report when done" while underlying work has already finished or
// fizzled. Per feedback_promote_violated_text_rules_to_hooks.md:
// passive text loses to active enforcement.

import { readFileSync } from "node:fs";

try {
let hookData = null;
try {
const raw = readFileSync(0, "utf-8");
if (raw && raw.trim()) hookData = JSON.parse(raw);
} catch {
process.exit(0);
}
if (!hookData) process.exit(0);

if (hookData.tool_name !== "Agent") process.exit(0);

const subagent = hookData?.tool_input?.subagent_type ?? "";
if (subagent !== "codex:codex-rescue") process.exit(0);

const msg = `[block-codex-rescue-agent] BLOCKED — Agent(subagent_type: "codex:codex-rescue").

The Agent wrapper is strictly worse than direct \`codex exec\` via Bash
for Codex dispatches. See \`.claude/codex-delegation.md\` line 105+
("Why CLI not Agent tool"). Common failure mode (line 113):

Wrapper returns "Codex is running in the background, will report
when done" narration AFTER the work has already finished — looks
like it's still running when it's already done (or fizzled).

Use this shape instead:

Bash(
command: "codex exec --skip-git-repo-check '<prompt>'",
run_in_background: true
)

For substantial / multi-turn tasks where you want steer/interrupt,
the upgrade path is \`codex app-server\` (JSON-RPC) — see the
"App-server upgrade path" section in codex-delegation.md.

If you HAVE a specific reason to use the wrapper (need Claude-side
review between dispatch and result), say so in chat first. There is
no 5-minute bypass on this hook — every dispatch decision should be
deliberate.`;

process.stderr.write(msg + "\n");
process.exit(2);
} catch {
process.exit(0);
}
71 changes: 71 additions & 0 deletions .claude/hooks/block-snapshot-handedit.mjs
Original file line number Diff line number Diff line change
@@ -0,0 +1,71 @@
#!/usr/bin/env node
// block-snapshot-handedit.mjs
//
// PreToolUse hook: blocks `Write` / `Edit` of generated `.md` snapshots.
// They are produced by `pnpm copy:preview` (smart by default — auto-
// detects affected routes) and `pnpm email:preview-md`. Hand-editing
// drifts from the rendered source; the right move is to fix the
// underlying page/email and re-run the script.
//
// Match paths:
// packages/web/src/app/**/page.logged-out.md
// packages/web/src/app/**/page.logged-in.md
// packages/web/**/*.email.md
//
// 2026-05-14: written after I hand-edited HVG snapshots three times in
// one session before the user pushed back. Per
// `feedback_promote_violated_text_rules_to_hooks.md`: rules in
// CLAUDE.md lose to active enforcement.

import { readFileSync } from "node:fs";

try {
let hookData = null;
try {
const raw = readFileSync(0, "utf-8");
if (raw && raw.trim()) hookData = JSON.parse(raw);
} catch {
process.exit(0);
}
if (!hookData) process.exit(0);

const tool = hookData.tool_name;
if (tool !== "Write" && tool !== "Edit") process.exit(0);

const filePath = hookData?.tool_input?.file_path;
if (!filePath) process.exit(0);

const normalized = filePath.replace(/\\/g, "/");

const isSnapshot =
/\/packages\/web\/src\/app\/.+\/page\.logged-(out|in)\.md$/.test(normalized) ||
/\/packages\/web\/.+\.email\.md$/.test(normalized);

if (!isSnapshot) process.exit(0);

const relPath = normalized.replace(/.*\/(packages\/.+)$/, "$1");

const msg = `[block-snapshot-handedit] BLOCKED — about to hand-edit a generated snapshot:
${relPath}

These files are output, not source. They're regenerated by:
pnpm --filter @optimitron/web copy:preview (smart, only affected routes)
pnpm --filter @optimitron/web email:preview-md (for *.email.md)

If the rendered output is wrong, fix the underlying page.tsx / email
module / template and re-run the script. Hand-editing drifts from
source and CI will flag the snapshot diff at PR time.

If you're SURE this is a one-off (e.g., a known-broken script you're
fixing in the same commit), bypass by acknowledging in chat:
"Hand-editing <filename> intentionally because <reason>." and the
hook allows a retry within 5 minutes.

Rule source: CLAUDE.md "Never hand-edit page.logged-out.md / *.email.md
snapshots" + memory feedback_promote_violated_text_rules_to_hooks.md`;

process.stderr.write(msg + "\n");
process.exit(2);
} catch {
process.exit(0);
}
7 changes: 7 additions & 0 deletions .claude/hooks/codex-dispatch-blather.mjs
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,13 @@ try {
if (hookData.tool_name !== "Bash") process.exit(0);

const cmd = hookData?.tool_input?.command ?? "";
// Skip commands that mention codex inside quoted/heredoc strings
// rather than invoking it. `git commit -m "... codex exec ..."`
// would otherwise fire this hook on every protocol-related commit.
const firstToken = String(cmd).trim().split(/\s+/)[0] ?? "";
if (/^(git|gh|grep|rg|find|cat|head|tail|sed|awk|echo|printf|ls|cd|node|pnpm|npm|yarn|tsx|powershell)$/i.test(firstToken)) {
process.exit(0);
}
if (!/\bcodex\s+exec\b/.test(cmd)) process.exit(0);

// Extract the prompt argument. Codex usage:
Expand Down
79 changes: 79 additions & 0 deletions .claude/hooks/dev-server-check.mjs
Original file line number Diff line number Diff line change
@@ -0,0 +1,79 @@
#!/usr/bin/env node
// dev-server-check.mjs
//
// SessionStart hook: probes http://127.0.0.1:3001 and prints one of:
// DEV SERVER: 200 OK
// DEV SERVER: DOWN — run `pnpm --filter @optimitron/web dev:fast ...`
// DEV SERVER: PORT BOUND but UNRESPONSIVE — kill PID then restart
//
// Per CLAUDE.md HVD#2: Claude pre-warms the dev server at session
// start if curl doesn't return 2xx/3xx. I keep forgetting to run the
// curl, so I never reach the conditional. This hook runs the check
// FOR me + prints loud status + the exact command to restart.
//
// Does NOT auto-start — too many edge cases (intentional teardown,
// different port, sibling repo bound to 3001). Loud warn + concrete
// command is sufficient if I actually read the output.
//
// 2026-05-14: written after I dispatched Codex agents claiming "Dev
// server is running at :3001" while a 17-hour zombie PID was bound
// but unresponsive. Per
// feedback_promote_violated_text_rules_to_hooks.md.

import { execSync } from "node:child_process";

const PORT = 3001;
const URL = `http://127.0.0.1:${PORT}`;
const START_CMD =
"pnpm --filter @optimitron/web dev:fast > packages/web/.dev-server.log 2>&1";

try {
let status = null;
try {
const devNull = process.platform === "win32" ? "NUL" : "/dev/null";
const out = execSync(
`curl -sS -m 3 -o ${devNull} -w "%{http_code}" ${URL}`,
{ stdio: ["ignore", "pipe", "ignore"], encoding: "utf-8" },
).trim();
Comment thread
mikepsinn marked this conversation as resolved.
status = parseInt(out, 10);
} catch {
status = 0;
}

if (status >= 200 && status < 400) {
process.stdout.write(`[dev-server-check] DEV SERVER: ${status} OK on ${URL}\n`);
process.exit(0);
}

// Down or unresponsive. Check if port is bound to a zombie PID.
let zombiePid = null;
try {
const netstat = execSync(
`powershell -NoProfile -Command "(Get-NetTCPConnection -LocalPort ${PORT} -State Listen -ErrorAction SilentlyContinue | Select-Object -First 1).OwningProcess"`,
{ stdio: ["ignore", "pipe", "ignore"], encoding: "utf-8" },
).trim();
if (netstat && /^\d+$/.test(netstat)) zombiePid = netstat;
} catch {
// ignore; Windows-specific
}

if (zombiePid) {
process.stdout.write(
`[dev-server-check] DEV SERVER: PORT ${PORT} BOUND by PID ${zombiePid} but UNRESPONSIVE (zombie).\n` +
` Kill + restart:\n` +
` powershell -NoProfile -Command "Stop-Process -Id ${zombiePid} -Force"\n` +
` ${START_CMD}\n`,
);
process.exit(0);
}

process.stdout.write(
`[dev-server-check] DEV SERVER: DOWN on ${URL}.\n` +
` Start it now (orchestrator-only — per CLAUDE.md HVD#2):\n` +
` ${START_CMD}\n` +
` Codex dispatches will fail or run blind without it.\n`,
);
process.exit(0);
} catch {
process.exit(0);
}
Loading
Loading