diff --git a/.claude/agents/voice-critic.md b/.claude/agents/voice-critic.md index 12cd5737d..7c825b7f9 100644 --- a/.claude/agents/voice-critic.md +++ b/.claude/agents/voice-critic.md @@ -25,12 +25,19 @@ If the answer to all five is yes, you have no violations to report. Say so. Each finding is a hypothesis until you've verified it. **Before claiming a violation, read the source of whatever you're judging:** - "Number isn't using ``" → grep `parameters-calculations-citations.ts` to confirm a matching parameter exists. If no parameter exists, the fix is to add one, not to wrap nothing. -- "`` defeats the component" → read `components/shared/ParameterValue.tsx` first. `valueOverride` is the INTENDED API for attaching the citation popover while controlling display text. Not a violation. +- "`` defeats the component" → read `components/shared/ParameterValue.tsx` first. `valueOverride` is the INTENDED API for attaching the details dialog while controlling display text. Not a violation. - "Duplicate component" → grep for the existing component, confirm it has the same shape. Different responsibilities ≠ duplicate. - "Banned phrase" → confirm the phrase actually appears in user-facing rendered text (not a comment, not a test fixture, not a variable name). If you can't confirm by reading the source, DROP the finding or label it explicitly: *"agent's read, not verified — confirm before acting."* +# Required checks for every copy block you review + +These run regardless of which smell first caught your attention. + +1. **Manual-search before suggesting new copy.** If you're proposing replacement wording for any user-facing string, first call `mcp__optimitron-tasks__searchManual` with the topic phrase and check whether the manual already has a sharper version we should steal. The manual is the source of truth for voice — quoting from it beats inventing fresh prose. If the manual has nothing usable, say so explicitly in the finding so the reader knows you checked. +2. 
**Parameter coverage for every number.** For every hardcoded user-facing number in the changeset (digits, percentages, multipliers, dollar amounts, year counts), grep `packages/data/src/parameters/parameters-calculations-citations.ts` and the wider `packages/data/src/parameters/` directory for an existing parameter. If one exists and the JSX uses a raw literal instead of ``, flag it with the parameter ID. If no parameter exists yet, flag whether a new parameter is warranted (cited statistics warrant one; arithmetic identities like "2² = 4" do not). + # Common smells (use as hypotheses to investigate, not as automatic verdicts) - Corporate-onboarding verbs in copy: *Take ownership*, *Engage*, *Empower*, *Unlock*, *Streamline*, *Get started*, *Take this on*, *Activate*. diff --git a/.claude/codex-delegation.md b/.claude/codex-delegation.md new file mode 100644 index 000000000..d9ed52543 --- /dev/null +++ b/.claude/codex-delegation.md @@ -0,0 +1,169 @@ +# Codex delegation protocol + +Claude Code's working pattern with the Codex CLI. Loaded by reference from CLAUDE.md. + +## Default delegation + +Programming work goes to Codex via `Bash` running `codex exec` directly, with `run_in_background: true`. The MCP-mediated Agent-tool path (`subagent_type: codex:codex-rescue`) is strictly worse — see "Why CLI not Agent tool" below — and is not used. + +**Dispatch shape that works:** +``` +Bash(command: "codex exec --skip-git-repo-check ''", run_in_background: true) +``` + +**Don't add a shell `&` inside the command.** The Bash tool already backgrounds via `run_in_background: true`; a second `&` makes the codex child detach from the bash subprocess, which exits immediately with status 0 — Claude then gets a "completed" notification while Codex is still running for minutes. Pair the dispatch with a `Monitor` watching the session JSONL for real progress. 
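The "no shell `&`" rule above can be checked mechanically before dispatch. A minimal sketch, assuming a hypothetical lint helper (`hasTrailingBackground` is illustrative and not part of this PR): it flags a command that ends in a lone `&` and leaves `&&` and redirections such as `2>&1` alone. It only inspects the end of the string, so a `&` buried mid-command is out of scope.

```javascript
// Hypothetical helper, not part of this PR: detect a shell-level background
// operator at the very end of a command string.
function hasTrailingBackground(command) {
  const trimmed = command.trimEnd();
  if (!trimmed.endsWith("&")) return false;
  // A trailing "&&" is an unfinished AND-list, not a background operator.
  if (trimmed.endsWith("&&")) return false;
  // A trailing ">&" is a truncated redirection, not backgrounding.
  if (trimmed.endsWith(">&")) return false;
  return true;
}

console.log(hasTrailingBackground("codex exec --skip-git-repo-check 'fix it' &")); // true
console.log(hasTrailingBackground("codex exec --skip-git-repo-check 'fix it' 2>&1")); // false
```

Because a quoted prompt ends in `'` or `"`, an ampersand inside the prompt text is never mistaken for the operator.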
+ +Claude edits meta-config (CLAUDE.md, this file, `.codex/config.toml`, hook scripts) directly — those are quick and don't need a dispatch. + +## Every Codex prompt must contain + +1. **Mikepsinn's verbatim message**, quoted. The user often uses speech-to-text — typos expected; interpret intent, don't surface-correct. Verbatim quoting eliminates Claude-as-telephone-game mutation. +2. **Investigate-before-coding** instruction: grep, read, understand. Don't trust the framing blindly. +3. **Push back if the request hurts the 4B-voters-on-the-treaty goal.** State the concern, propose to skip, wait for confirmation. Don't silently comply with work that doesn't move that needle. +4. **Argue back if Claude misread the user.** The verbatim quote makes this checkable. +5. **Regenerate affected `.md` snapshots and screenshots** after any content/component change. Use `node packages/web/scripts/affected-routes.mjs` to pipe changed-file paths into `render-pages-to-markdown.ts --routes=` for targeted regen; fall back to full regen when the change touches shared primitives. +6. **Nothing committed without user approval.** Codex stages the changeset and reports; Claude relays the summary + diff scope; user OKs; then Claude commits on Codex's behalf (Codex can't touch `.git`). + +## NEVER run `next build` / `pnpm build` + +`next build` writes to `.next/` (route manifests, server chunks, build IDs) that the running dev server is concurrently reading. When build and dev share the same `.next/`, the dev server starts logging `ENOENT` on missing-or-mid-write manifest files and stops returning bytes on every route. The fix is an orchestrator restart of the dev server. This will burn 5-10 minutes of investigation time every single time. 
+ +**Banned, no exceptions during a Codex session unless the orchestrator explicitly says otherwise:** +- `pnpm build` +- `pnpm --filter @optimitron/web build` +- `next build` directly +- Any script that calls `next build` transitively + +**For "is the bundle compile-clean" sanity-check use ONLY:** +- `pnpm --filter @optimitron/web exec tsc --noEmit` or `typecheck:fast` — type-graph only, doesn't touch `.next/` +- Focused vitest suites — Node-only, doesn't touch `.next/` +- ESLint — Node-only, doesn't touch `.next/` + +If you truly need a production-build sanity check (rare), tell the orchestrator first so the dev server can be stopped, build run, dev server restarted. Don't do it concurrently with a live dev server. + +Concrete failure this rule prevents: this session, Codex ran `next build` as "offline sanity check" while the orchestrator dev server was running. Build succeeded but the dev server's `.next/server/.../manifest.json` reads started returning `ENOENT`. Every subsequent route hung. Cost: ~15 min of "is this a real bug or a dev-server problem" investigation before the orchestrator restart cleared it. + +## NEVER kill the dev server + +The orchestrator (Claude / human dev) owns the dev server on 3001. Every Codex dispatch inherits this — agents are pure consumers, never managers. + +**Banned operations:** +- `Stop-Process` / `kill` / `taskkill` against any node process bound to 3001 +- Cleanup steps that "stop the dev server I started" — you didn't start it; don't stop it +- Wrapping `pnpm dev:fast` in a try/finally that kills on exit +- Killing port-3001 processes "just to be safe" when starting your own (you should never start your own) + +**If the dev server is unresponsive:** report that fact and stop. Do NOT kill it and restart. The orchestrator will notice and restart if needed. Killing an unresponsive server can race with a slow compile that was about to finish. 
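The two banned lists above (production builds, dev-server kills) could be screened with a small classifier before an agent runs a shell command. This is a sketch under stated assumptions: `findBannedOperation` and its patterns are hypothetical, it is meant for commands an agent executes (not for prompt text, which legitimately contains the word "kill"), and it cannot see scripts that call `next build` transitively. The kill check is deliberately conservative: it flags any kill-style command for review rather than trying to prove the target is the process on 3001.

```javascript
// Hypothetical pre-execution screen, not part of this PR. Patterns mirror the
// banned lists above and are illustrative, not exhaustive.
const BUILD_PATTERNS = [
  /\bnext\s+build\b/, // `next build` invoked directly
  /\bpnpm\b[^|;&]*\sbuild\b/, // `pnpm build`, `pnpm --filter <pkg> build`
];
const KILL_PATTERN = /\b(?:Stop-Process|taskkill|kill)\b/i;

// Returns "build", "kill", or null when nothing banned is detected.
function findBannedOperation(command) {
  if (BUILD_PATTERNS.some((re) => re.test(command))) return "build";
  if (KILL_PATTERN.test(command)) return "kill";
  return null;
}

console.log(findBannedOperation("pnpm --filter @optimitron/web build")); // "build"
console.log(findBannedOperation("pnpm --filter @optimitron/web exec tsc --noEmit")); // null
```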
+ +**Only acceptable termination case:** the orchestrator explicitly told you to kill it as part of a known-bad-state recovery. That permission must be explicit in the dispatch prompt — never inferred. + +Concrete failure case this rule prevents: this session, multiple Codex agents spawned their own `pnpm dev:fast`, dutifully cleaned up at end of verification, and the dev server vanished — leaving the next agent with no server to reuse. The orchestrator had to restart it manually each time. The new "agents reuse, never spawn" rule plus this "never kill" rule, together, eliminate the start-then-die cycle. + +## Verification tool choice (use the cheapest that gives the answer) + +Codex has Playwright MCP wired up (`mcp__playwright__browser_navigate`, `browser_console_messages`, `browser_take_screenshot`, etc.). Use it for spot-checks during the fix-iterate loop — load a page, grab console errors, verify the symptom is gone. 5-15 seconds per route. + +DO NOT default to `pnpm --filter @optimitron/web run e2e -- visual --grep ` for iteration verification. That command boots a dev/prod server, compiles routes, runs screenshot capture + baseline comparison + Argos upload — 5-10 minutes per filter. Reserve it for the FINAL pre-merge verification pass after the fix is known to work. + +Same signal (does the page hydrate without React errors? does the layout look right?) at 50x the cost. Burning 10 minutes per fix-iteration cycle when the same answer is available in 10 seconds is the anti-pattern. Concrete failure: this session, the hydration-investigation Codex spent ~8 minutes of one verification run on `pnpm e2e visual --grep treaty` when the same fix could have been spot-checked via Playwright MCP in seconds. + +Include this in every Codex dispatch prompt for fix-iteration tasks: *"Use Playwright MCP (`mcp__playwright__browser_navigate` + `browser_console_messages`) for spot-checks during the iterate loop. 
Reserve `pnpm e2e visual` for the final verification pass."* + +## One worktree, one branch, one dev server, one PR at a time + +**No `git worktree`. No parallel branches. No second PR while another is in flight.** Every Codex dispatch runs in the main checkout (`E:/code/optimitron`) against whatever branch is currently checked out. The user is on ONE feature branch driving ONE PR; Codex's edits land on THAT branch. If the user wants Codex to do something that genuinely doesn't belong in the current PR's scope, the answer is "wait until this PR merges" — NOT "spin up a worktree on a new branch." + +The mistake this rule prevents: I tried to run an "email-migration" Codex in a separate `../optimitron-emails` worktree on `feature/email-parameter-values` while another Codex was working in the main worktree on the live PR branch. Two dev servers fought over port 3001, the hydration-investigation agent's dev-server attempt timed out on EADDRINUSE, I burned a chat turn diagnosing the port conflict, and the resulting branch is now an orphan that has to be cherry-picked back into the live PR. None of this would have happened in a single worktree on a single branch. + +**Dev server: one always running on 3001.** Claude (the orchestrator) pre-warms it at session start. Every Codex dispatch prompt must include the line: `"Dev server is already running at http://127.0.0.1:3001. Reuse it. Do NOT start your own."` If you're about to write a dispatch prompt that doesn't include that line, you forgot. + +**Dev server logs.** Pages render 200 with broken HTML and runtime errors only show up in stderr — never trust an HTTP status as proof of success. Pass the log path into every Codex dispatch so the agent can verify its own work. 
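A minimal sketch of what "verify its own work" from the log could look like, assuming the log text has already been read from the path passed in the dispatch. `findLogErrors` is a hypothetical helper; the four markers are the ones this file's dispatch prompt tells Codex to grep for, and the 50-line window matches its `tail -50`.

```javascript
// Hypothetical helper, not part of this PR: return the lines in the tail of a
// dev-server log that contain any of the error markers.
const ERROR_MARKERS = ["uncaughtException", "Error:", "⨯", "Failed to compile"];

function findLogErrors(logText, tailLines = 50) {
  const lines = logText.split(/\r?\n/);
  // Drop the empty element produced by a trailing newline.
  if (lines.length > 0 && lines[lines.length - 1] === "") lines.pop();
  return lines
    .slice(-Math.max(1, tailLines))
    .filter((line) => ERROR_MARKERS.some((marker) => line.includes(marker)));
}
```

An empty result for the route just loaded is the "log is clean" condition; a 200 response alongside a non-empty result should be treated as a broken render.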
+ +When Claude pre-warms the dev server, redirect output to `packages/web/.dev-server.log` (gitignored): + +``` +pnpm --filter @optimitron/web dev:fast > packages/web/.dev-server.log 2>&1 & +``` + +Then every Codex dispatch prompt for UI/rendering work includes: + +> Dev server logs are streaming to `packages/web/.dev-server.log`. After loading any page in your fix-iterate loop, `tail -50 packages/web/.dev-server.log` and grep for `uncaughtException`, `Error:`, `⨯`, `Failed to compile`. A 200 response with errors in the log = broken render. Do not declare a fix verified until the log is clean for the route you touched. + +If the dev server was started without that redirect (e.g., from a fresh laptop / IDE-triggered start), tell Codex: *"Dev server logs are not redirected to a file this session; load the page via Playwright MCP and use `browser_console_messages` for client-side errors. Ask the orchestrator to paste recent server stderr if you suspect a server-side issue."* + +## Sequential agent coordination + +**When a follow-up task would overlap files an active agent owns**, queue it as a follow-up to that agent's session via `codex exec resume`: + +- `codex exec resume "follow-up prompt"` — explicit, robust. Capture the UUID right after dispatch by globbing `~/.codex/sessions/$(date +%Y)/$(date +%m)/$(date +%d)/rollout-*.jsonl` (newest = the one you just spawned). UUID is the trailing hex segment of the filename. +- `codex exec resume --last "follow-up prompt"` — convenient but risky if other Codex sessions ran in between in the same cwd. + +The session UUID is the only handle you get; capture it at dispatch time and store it for the life of the follow-up chain. 
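Capturing the session UUID at dispatch time can be sketched as a pure function over a directory listing. Assumptions, loudly: `newestSessionId` is hypothetical, the input is an array of `{ name, mtimeMs }` records the caller builds from the day's session directory, and the UUID is taken to be a standard 8-4-4-4-12 hex string immediately before `.jsonl` (the text above only says "trailing hex segment").

```javascript
// Assumed filename shape: rollout-<anything>-<uuid>.jsonl
const UUID_AT_END =
  /([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})\.jsonl$/i;

// Hypothetical helper, not part of this PR: pick the most recently modified
// rollout file and return its trailing UUID, or null if none qualifies.
function newestSessionId(files) {
  const rollouts = files.filter((f) => /^rollout-.*\.jsonl$/.test(f.name));
  if (rollouts.length === 0) return null;
  const newest = rollouts.reduce((a, b) => (b.mtimeMs > a.mtimeMs ? b : a));
  const match = newest.name.match(UUID_AT_END);
  return match ? match[1] : null;
}
```

In practice the listing would come from `readdirSync` plus `statSync` on the dated directory, run right after dispatch so that "newest" still means "the one just spawned".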
+ +**Two Codex agents may run in parallel ONLY if the user has explicitly authorized them on disjoint file scopes within the same branch AND the second agent's work is genuinely additive to the first (not a coordinated refactor).** Default is one agent at a time on the current branch; parallel is the exception, not the norm. + +## Why CLI not Agent tool + +The `subagent_type: codex:codex-rescue` Agent path is MCP-mediated and strictly worse than direct `codex exec`: + +- No Codex CLI flag access (`-c`, `--enable`, `--config`, profiles all hidden). +- Session UUID hidden → can't queue follow-ups; have to start a new agent every time. +- Auto-mode permission classifier blocks valid work mid-flight (caught one valid dispatch in a single session). +- Wrapper sometimes returns "Codex is running in the background, will report when done" narration *after the work has already finished* — fooled me 3× in one session into thinking agents had fizzled. +- The classifier's "safety net" is the only theoretical upside, and Claude already applies per-task safety judgment manually. + +If a future Claude session is tempted to use the Agent path because it looks more integrated: it isn't. The direct CLI path has the same `run_in_background: true` notification UX from Bash, plus everything above. + +## Config + +`.codex/config.toml` pins `model = "gpt-5.5"` + `model_reasoning_effort = "xhigh"` — strongest tier for the hardest async tasks. + +## Pre-commit preflight (qa-passed gate) + +Before any commit touching user-facing files (anything under `packages/web/src/app/`, `packages/web/src/components/`, `packages/web/src/lib/email/`, `packages/web/src/lib/tasks/`, or any `.md` snapshot), dispatch a Codex preflight agent with a goal-only prompt: + +> "Validate this staged changeset. Read `git diff --cached --name-only` and `git diff --cached`. Decide what's relevant to regenerate (markdown snapshots? email previews? screenshots? none?), what tests to run, what artifacts to review. 
Run everything relevant. Read the output. Fix every problem you find. Iterate until clean. Don't ship until you'd put your own name on the commit. Report what you fixed and what's left."
+
+**Don't enumerate file globs, test commands, or scope schemas.** Codex decides from the diff. Listing them is the same micromanaging anti-pattern as [[state-the-goal-not-the-script]].
+
+When Codex returns clean, add a line to the commit message:
+
+```
+qa-passed: 
+```
+
+If Codex says nothing needs to run (e.g. the diff is pure meta-config that snuck through the gate), make the rationale explicit:
+
+```
+qa-passed: skipped — pure meta-config (.claude/, CLAUDE.md, .codex/, hooks)
+```
+
+The `verify-ui-changes.mjs` hook checks for the `qa-passed:` line on every commit touching user-facing files and blocks if missing.
+
+## Verify before relaying
+
+Codex hallucinates. Inspect each non-trivial diff before reporting success.
+
+**Read the agent_message events, not just `task_complete`.** Codex's session file at `~/.codex/sessions/YYYY/MM/DD/rollout-*.jsonl` contains the full conversation log. The final `task_complete` event sometimes has an empty `last_agent_message` even when Codex did real work, so you'll miss its actual narration if you only tail the file. Mid-stream `agent_message` events are where Codex reports what it's actually doing, including stupid moves you'd want to redirect mid-flight. Extract them by passing the session file as the script's argument:
+
+```python
+python -c "
+import json, sys
+with open(sys.argv[1]) as f:
+    for line in f:
+        d = json.loads(line)
+        p = d.get('payload', {})
+        if p.get('type') == 'agent_message':
+            print(p['message']); print('---')
+" "$(ls -t ~/.codex/sessions/$(date +%Y)/$(date +%m)/$(date +%d)/rollout-*.jsonl | head -1)"
+```
+
+Always run this against the right session file (`ls -t ~/.codex/sessions/$(date +%Y)/$(date +%m)/$(date +%d)/rollout-*.jsonl | head -1`) before declaring an agent failed or succeeded; wrapper narration and filesystem state alone are insufficient.
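For reading the stream while a session is still being written, a tolerant parser may be more convenient than the one-liner. The following is a sketch, not the project's tooling: `extractAgentMessages` is a hypothetical helper, and the event shape (`payload.type === "agent_message"`, `payload.message`) is taken from the Python snippet above. It skips blank and partially written lines instead of failing on them.

```javascript
// Hypothetical helper, not part of this PR: pull agent_message texts out of
// the contents of a Codex session JSONL file.
function extractAgentMessages(jsonlText) {
  const messages = [];
  for (const line of jsonlText.split(/\r?\n/)) {
    if (!line.trim()) continue; // skip blank lines
    let event;
    try {
      event = JSON.parse(line);
    } catch {
      continue; // a half-written final line is expected mid-flight
    }
    const payload = event?.payload ?? {};
    if (payload.type === "agent_message" && typeof payload.message === "string") {
      messages.push(payload.message);
    }
  }
  return messages;
}
```

Feed it the result of `readFileSync(sessionFile, "utf-8")` and print the returned array; polling it periodically gives the mid-flight view without waiting for `task_complete`.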
+ +**Always verify the working tree matches what Codex claims.** Run `git diff --stat` after every Codex dispatch and compare line counts to what Codex says it did. If Codex says "now 266 lines" and `wc -l` says 1490, something reverted the edits — investigate before committing or re-dispatching. + +**Watch the agent_message stream while Codex runs, not just after.** Mid-flight, Codex sometimes does something stupid (reads the wrong file, applies the wrong rule, derails into unrelated work). Tailing the session JSONL or periodically polling `agent_message` events gives you the chance to redirect before Codex burns 3M tokens on a wrong path. Don't just wait for the completion notification and read the diff — that's strictly reactive. + +**When Codex's claim conflicts with your understanding or the filesystem, ASK CODEX.** Don't guess. Use `codex exec resume ""` to query the same session — Codex has full context on what it did and can explain. Example: "You said the file is 266 lines, but on disk it's 1490 with empty git diff. Did your edits write to a sandbox? What path did you actually write to?" Treat the agent as an interlocutor on its own work, not a black box. + +**Never `git stash` while Codex agents are working.** A `git stash --keep-index` (or any stash) reaches into the working tree, including files a parallel Codex agent has just written or is about to write. The subsequent `git stash pop` doesn't reliably restore those concurrent writes — they vanish silently. Verified this session: one Codex audit's 266-line TODO.md got dropped by exactly this dance. The pre-commit hook now reads only `git diff --cached` (staged content), so there's no reason to stash unstaged parallel work — `git add ` and commit; the hook will only inspect what you staged. If you find yourself reaching for `git stash`, stop and ask why. 
diff --git a/.claude/hooks/check-gstack.sh b/.claude/hooks/check-gstack.sh new file mode 100644 index 000000000..2ade7db90 --- /dev/null +++ b/.claude/hooks/check-gstack.sh @@ -0,0 +1,20 @@ +#!/bin/bash +# Block skill usage when gstack is not installed globally. + +if [ ! -d "$HOME/.claude/skills/gstack/bin" ]; then + cat >&2 <<'MSG' +BLOCKED: gstack is not installed globally. + +gstack is required for AI-assisted work in this repo. + +Install it: + git clone --depth 1 https://github.com/garrytan/gstack.git ~/.claude/skills/gstack + cd ~/.claude/skills/gstack && ./setup --team + +Then restart your AI coding tool. +MSG + echo '{"permissionDecision":"deny","message":"gstack is required but not installed. See stderr for install instructions."}' + exit 0 +fi + +echo '{}' diff --git a/.claude/hooks/codex-dispatch-blather.mjs b/.claude/hooks/codex-dispatch-blather.mjs new file mode 100644 index 000000000..a54160378 --- /dev/null +++ b/.claude/hooks/codex-dispatch-blather.mjs @@ -0,0 +1,99 @@ +#!/usr/bin/env node +// codex-dispatch-blather.mjs +// +// PreToolUse hook on Bash. Fires before any `codex exec` (including +// `codex exec resume `). Counts enumeration items in the prompt +// argument. If the prompt has more than 3 numbered/bullet items + headings +// outside of verbatim user quotes, blocks with a reminder. +// +// Why: per [[feedback_state_the_goal_not_the_script]] and codex-delegation.md +// "Pre-commit preflight" — when dispatching to Codex, state the goal in +// plain English. Numbered procedural lists are the prompt-engineering +// version of the smallest-fix anti-pattern. Codex decides from the diff; +// listing steps is micromanaging. +// +// Fail-open on any unexpected error. 
+ +import { readFileSync } from "node:fs"; + +const BUDGET = 3; + +try { + let hookData = null; + try { + const raw = readFileSync(0, "utf-8"); + if (raw && raw.trim()) hookData = JSON.parse(raw); + } catch { + process.exit(0); + } + if (!hookData) process.exit(0); + if (hookData.tool_name !== "Bash") process.exit(0); + + const cmd = hookData?.tool_input?.command ?? ""; + if (!/\bcodex\s+exec\b/.test(cmd)) process.exit(0); + + // Extract the prompt argument. Codex usage: + // codex exec [--flag ...] '' + // codex exec resume '' + // The prompt is the last single-quoted argument. Find the position of the + // first quote and the last closing quote (must be followed by end-or-pipe- + // trailing-whitespace) so we don't get confused by embedded `'\''` escapes + // or trailing `| tail -3` style suffixes. + const trimmed = cmd.replace(/\s*(?:\d?>&?\d?|\|.*|2>&1.*)*\s*$/, ""); + const m = + trimmed.match(/'([\s\S]*?)'\s*$/) || + trimmed.match(/"([\s\S]*?)"\s*$/) || + cmd.match(/'([\s\S]*)'\s*$/) || + cmd.match(/"([\s\S]*)"\s*$/); + if (!m) process.exit(0); + const prompt = m[1]; + if (prompt.length < 200) process.exit(0); // tiny prompts are fine + + // Walk lines, skip blockquotes (`>` prefix — verbatim user quote). + let enumerationCount = 0; + const samples = []; + for (const line of prompt.split("\n")) { + const trimmed = line.replace(/^\s+/, ""); + if (trimmed.startsWith(">")) continue; + // Numbered list items: "1.", "1)", "(1)" etc. + if (/^\(?\d+[.)]\s/.test(trimmed)) { + enumerationCount++; + if (samples.length < 6) samples.push(line); + continue; + } + // Bullet items: -, *, • when followed by space. + if (/^[-*•]\s/.test(trimmed)) { + enumerationCount++; + if (samples.length < 6) samples.push(line); + continue; + } + // Procedural headings: "## Process", "## Verify", "**Process:**" etc. 
+ if ( + /^#{2,}\s+(process|verify|steps?|hands?\s+off|approaches?|how|protocol)\b/i.test(trimmed) + || /^\*\*(process|verify|steps?|hands?\s+off|approaches?|how|protocol)\b/i.test(trimmed) + ) { + enumerationCount++; + if (samples.length < 6) samples.push(line); + } + } + + if (enumerationCount <= BUDGET) process.exit(0); + + const sampleStr = samples.slice(0, 5).map((s) => ` ${s.trim()}`).join("\n"); + process.stderr.write( + `[codex-dispatch-blather] Prompt has ${enumerationCount} enumeration items (budget: ${BUDGET}). + +Codex dispatch should be goal-first, not procedural. See: + - .claude/codex-delegation.md ("Pre-commit preflight" + "Every Codex prompt must contain") + - C:/Users/m/.claude/projects/E--code-optimitron/memory/feedback_state_the_goal_not_the_script.md + +Rewrite: one paragraph stating what's broken + what "done" looks like, the verbatim user quote, and a pointer to codex-delegation.md. Codex reads CLAUDE.md and the diff itself — do not duplicate that here. + +Detected enumeration: +${sampleStr} +`, + ); + process.exit(2); +} catch { + process.exit(0); +} diff --git a/.claude/hooks/pre-commit-checklist.mjs b/.claude/hooks/pre-commit-checklist.mjs index 28c71aac1..879903125 100644 --- a/.claude/hooks/pre-commit-checklist.mjs +++ b/.claude/hooks/pre-commit-checklist.mjs @@ -28,17 +28,26 @@ if (!command) process.exit(0); // Only intercept `git commit`. Match the command being passed to Bash — // covers `git commit`, `git commit -m "..."`, `cd X && git commit ...`, -// and `git -C path commit`. Skips `git commit-tree` and other false positives. +// `git -C path commit`, and `git -c key=value commit`. Skips +// `git commit-tree` and other false positives. +// +// Codex review (2026-05-12) caught that the previous `\s+-[A-Za-z]\S*` +// only matched `-XValue` joined forms, not the space-separated `-X Value` +// forms like `-C path` or `-c user.name=foo` that git accepts before the +// subcommand. 
The optional `(\s+\S+)?` group covers space-separated values. if ( - !/(^|[\s;]|&&|\|\|)git(\s+-[A-Za-z]\S*)*\s+commit(\s|$)/.test(command) + !/(^|[\s;]|&&|\|\|)git(\s+-[A-Za-z]\S*(\s+\S+)?)*\s+commit(\s|$)/.test(command) ) { process.exit(0); } -// Delegate. verify-ui-changes.mjs tolerates missing stdin. +// Delegate. Pipe the original hookData JSON to the child so it can +// detect commit-attempt mode (`hookData.tool_name === "Bash"`) and emit +// full per-file detail instead of the terse Stop-mode one-liner. const verifyScript = join(__dirname, "verify-ui-changes.mjs"); const result = spawnSync(process.execPath, [verifyScript], { - stdio: ["ignore", "inherit", "inherit"], + input: JSON.stringify(hookData), + stdio: ["pipe", "inherit", "inherit"], }); process.exit(result.status ?? 0); diff --git a/.claude/hooks/pre-write-architecture-check.mjs b/.claude/hooks/pre-write-architecture-check.mjs index d060578e0..53ddfad2b 100644 --- a/.claude/hooks/pre-write-architecture-check.mjs +++ b/.claude/hooks/pre-write-architecture-check.mjs @@ -60,24 +60,16 @@ try { // --- Emit checklist ---------------------------------------------------- const relPath = filePath.replace(/.*[/\\]packages[/\\]/, "packages/"); - const msg = `[pre-write architecture check] You are about to CREATE a new file: + const msg = `[pre-write architecture check] About to CREATE a new file: ${relPath} -This hook fires for new files in architectural paths (packages/*/src/, prisma/, scripts/, .github/workflows/, .claude/agents/). The user has called out the pattern — I default to creating new files / abstractions when the smallest fix is one line in a config or a delegation to an existing function. Before writing this file, answer in chat: +Hook fires for new files in architectural paths (packages/*/src/, prisma/, scripts/, .github/workflows/, .claude/agents/). I default to new files / abstractions when a one-line change in config or a delegation to an existing function would do. 
Answer in chat before retrying: -1. **What is the actual user-facing problem?** Name it in one sentence. +1. **What is the actual user-facing problem?** One sentence. +2. **What does the existing system already do for this area?** Grep at least one of: .github/workflows/ci.yml, package.json scripts, existing functions in the same dir, the relevant TODO.md section. +3. **What is the BEST fix?** Solves the root cause without new maintenance debt. A one-line workaround masking a real bug is NOT the right move — name that openly if the smallest viable change is a band-aid, then propose the real fix. If a new file legitimately is the best fix, justify why. -2. **What does the existing system already do for this area?** Specifically grep / Read at least one of: - - The deploy workflow (.github/workflows/ci.yml) — what does production currently run? - - package.json scripts — is there an existing command that does the work? - - Existing functions in the same area — is there already an idempotent version? - - The relevant section of TODO.md — has a decision been recorded? - -3. **What is the smallest possible fix?** If the answer is "add a new file", justify why a one-line change in a config / package.json / existing function would NOT work. - -4. **Has the user signaled this should be simple?** If YES, you almost certainly haven't found the smallest fix yet. Stop and re-explore. - -After answering these in chat, retry the Write. 
The hook will allow it within 5 minutes once you've responded.`; +Retry within 5 minutes after answering and the hook allows it through.`; process.stderr.write(msg + "\n"); process.exit(2); diff --git a/.claude/hooks/surprise-signal.mjs b/.claude/hooks/surprise-signal.mjs new file mode 100644 index 000000000..f5f5a6089 --- /dev/null +++ b/.claude/hooks/surprise-signal.mjs @@ -0,0 +1,113 @@ +#!/usr/bin/env node +/** + * surprise-signal.mjs + * + * UserPromptSubmit hook: scans the user's prompt for phrases indicating + * they're pushing back on the complexity of what I'm doing ("should it + * really be this hard / I thought it was simpler / aren't we missing + * something / why is this so much"). When detected, prepends a STOP + * signal to my context so I notice it BEFORE I respond — instead of + * pattern-matching to "build more." + * + * Ported from the previous global PowerShell version on 2026-05-13 with a + * critical framing fix: the old version pushed me toward the "smallest + * fix" which kept rewarding workarounds that masked real bugs (the 0.5% + * visual-review threshold was the canonical example). New framing is + * "BEST fix" — small is preferred only when also correct; a one-line + * workaround that hides a real bug is not the right move. + * + * Output via stdout = added to my context as additional info on the + * user's prompt. Not blocking — I still need to respond, but with the + * signal foregrounded. + * + * Fail-open on any error. 
+ */ + +import { readFileSync, existsSync, mkdirSync, statSync, writeFileSync } from "node:fs"; +import path from "node:path"; +import os from "node:os"; +import crypto from "node:crypto"; + +try { + let raw = ""; + try { + raw = readFileSync(0, "utf-8"); + } catch { + process.exit(0); + } + if (!raw || !raw.trim()) process.exit(0); + + let hookData; + try { + hookData = JSON.parse(raw); + } catch { + process.exit(0); + } + + let prompt = hookData?.prompt; + if (!prompt) process.exit(0); + if (Array.isArray(prompt)) prompt = prompt.join(" "); + if (typeof prompt !== "string") process.exit(0); + + // Surprise / "this should be simpler" patterns. Tuned to catch the + // moments I miss most often. + const patterns = [ + /\b(should|shouldn['`]t) (?:it|this|we|they) (?:really|just|be|have)\b/i, + /\bI thought (?:it|this|we|that|that we|we had|we were)\b/i, + /\baren['`]t (?:we|you) (?:missing|supposed)\b/i, + /\bwhy (?:is|are) (?:this|it|we) (?:so |such )\b/i, + /\bisn['`]t (?:this|it|that) (?:just|simply|basically)\b/i, + /\b(what are|are) we missing\b/i, + /\bwhy can['`]t (?:we|it) just\b/i, + /\bbasically (?:just |only )?\w+ing\b/i, + /\b(?:am I|are we) (?:doing|missing)\b/i, + ]; + + const matched = []; + for (const p of patterns) { + const m = prompt.match(p); + if (m) matched.push(m[0]); + } + if (matched.length === 0) process.exit(0); + + // Dedup by prompt hash — don't re-surface for the same message if it + // arrives twice in quick succession. + const cacheDir = path.join( + process.env.LOCALAPPDATA || os.tmpdir(), + "claude", + "hook-cache", + ); + try { + if (!existsSync(cacheDir)) mkdirSync(cacheDir, { recursive: true }); + } catch { + // Cache miss is fine — fall through and emit. 
+ } + const head = prompt.slice(0, 400); + const hash = crypto.createHash("sha1").update(head).digest("hex").slice(0, 16); + const cacheFile = path.join(cacheDir, `surprise-${hash}.txt`); + try { + if (existsSync(cacheFile)) { + const tenMinAgo = Date.now() - 10 * 60 * 1000; + if (statSync(cacheFile).mtimeMs > tenMinAgo) process.exit(0); + } + writeFileSync(cacheFile, new Date().toISOString()); + } catch { + // Cache write failure shouldn't block emission. + } + + const matchedStr = [...new Set(matched)].slice(0, 3).join(" / "); + + const msg = `[surprise-signal hook] User's prompt contains a "this should be simpler" phrase: ${matchedStr} + +STOP and re-explore before responding — don't pattern-match to "build more." + +1. STOP the next step you had planned. +2. Re-explore the existing system relevant to the question (grep deploy workflow, package.json scripts, existing functions). Don't rely on session memory. If the question is about UX / page copy / "what does X look like," fetch the PR's PREVIEW DEPLOY (Vercel MCP \`web_fetch_vercel_url\` or curl with \`_vercel_share\`), NOT production or page.tsx source. +3. State in chat what you found that already handles (or doesn't handle) the user's concern. +4. **Find the BEST fix, not the smallest.** A one-line workaround masking a real bug is not the right move; if the smallest viable change is a band-aid, name that openly and propose the real fix.`; + + process.stdout.write(msg + "\n"); + process.exit(0); +} catch { + process.exit(0); +} diff --git a/.claude/hooks/verify-ui-changes.mjs b/.claude/hooks/verify-ui-changes.mjs index 6c9f7c7bd..db2d0c333 100644 --- a/.claude/hooks/verify-ui-changes.mjs +++ b/.claude/hooks/verify-ui-changes.mjs @@ -43,15 +43,24 @@ try { if (!existsSync(join(RepoRoot, ".git"))) process.exit(0); // --- Diff / file lists ---------------------------------------------------- - const diffNames = git("diff --name-only HEAD") + // Inspect ONLY staged content (--cached). 
The hook fires on `git commit`, + // and the only thing being committed is what's been staged via `git add`. + // Reading the entire working tree pulls in parallel agents' unstaged work, + // which used to force a `git stash --keep-index` dance to satisfy the hook — + // and that dance silently dropped Codex's edits at least once this session. + const diffNames = git("diff --cached --name-only") .split("\n") .filter(Boolean); + // Untracked files aren't staged-by-default; only flag them if they're in + // the staged set (paths-mode `git add` on a previously-untracked file). + const stagedSet = new Set(diffNames); const untracked = git("ls-files --others --exclude-standard") .split("\n") - .filter(Boolean); + .filter(Boolean) + .filter((f) => stagedSet.has(f)); const allChanged = [...new Set([...diffNames, ...untracked])]; - const diffBody = git("diff --unified=0 HEAD").split("\n"); + const diffBody = git("diff --cached --unified=0").split("\n"); // Lines added in the diff, with their owning file path tracked so JSX checks // can skip test files. @@ -89,7 +98,61 @@ try { ); const claudeMd = allChanged.includes("CLAUDE.md"); + // Structured violations: each item has { name, count, message, blocking }. + // - `blocking: true` — real bug class (voice violations, hardcoded + // numbers, swallowed errors, copy-snapshot drift, + // CLAUDE.md bloat). Fails the commit. + // - `blocking: false` — advisory file-pattern match (email/page/test + // files changed; new lib file added). Prints the + // reminder but does NOT fail the commit. Real + // content concerns from these areas surface via + // voice-critic / Codex review / human review + // elsewhere. + // The Stop-hook emit path summarizes (one-line). The PreToolUse(Bash) + // emit path shows the full message. + // + // IMPORTANT: this block has to stay ABOVE every site that calls + // `pushViolation`. 
JavaScript hoists the function declaration but NOT + // the `const violations = []` binding it references — a forward call + // hits the TDZ and throws, the outer fail-open catch hides it, and + // every gate goes silent. CodeRabbit on PR #79 caught the original + // ordering bug. const violations = []; + // True when called from the PreToolUse Bash hook (commit attempt) — + // any time `hookData.tool_name` is set. Falsy on Stop firings. + const isCommitAttempt = Boolean(hookData?.tool_name); + function pushViolation(name, count, message, options = {}) { + violations.push({ name, count, message, blocking: options.blocking !== false }); + } + + // --- qa-passed gate ---------------------------------------------------- + // When a commit touches user-facing surfaces (UI components, page copy, + // email templates, library code that those import), require the commit + // message to contain a `qa-passed: ` promise. The + // promise is Claude's acknowledgement that a Codex preflight agent + // validated the change — ran relevant regens, ran relevant tests, + // reviewed generated markdown/screenshot diffs, fixed problems, and + // came back clean. The hook doesn't enforce the agent ran; it enforces + // the human-readable promise. See .claude/codex-delegation.md for the + // dispatch pattern Claude is supposed to follow. + const userFacingChanges = [...uiFiles, ...copyFiles, ...emailFiles]; + if (userFacingChanges.length > 0 && hookData?.tool_name === "Bash") { + const cmd = hookData?.tool_input?.command ?? ""; + // Pull the commit message body from `-m "..."`, `-m '...'`, or `-F `. + // Heredoc form (`cat <<'EOF' ... EOF`) appears verbatim in the command + // string so a simple includes() also catches it. + const hasPromise = /qa[-\s]?passed\s*:/i.test(cmd); + if (!hasPromise) { + pushViolation( + "QA_PASSED", + userFacingChanges.length, + `QA-PASSED GATE: commit touches user-facing files but the message lacks a \`qa-passed:\` line. 
Dispatch a Codex preflight per .claude/codex-delegation.md ("Pre-commit preflight" section), then add \`qa-passed: \` or \`qa-passed: skipped — \` to the message. +${userFacingChanges.slice(0, 8).map((f) => ` - ${f}`).join("\n")}${ + userFacingChanges.length > 8 ? `\n ... and ${userFacingChanges.length - 8} more` : "" + }`, + ); + } + } // --- Check 1: UI changes without a fresh screenshot --------------------- if (uiFiles.length) { @@ -106,13 +169,13 @@ try { newestScreenshotMs = newestFileMtime(ScreenshotDir, /\.png$/); } if (!newestScreenshotMs || newestScreenshotMs < lastUiMtime) { - violations.push(formatList( + pushViolation("SCREENSHOT", uiFiles.length, formatList( `SCREENSHOT GATE: UI files changed but no PNG under ${ScreenshotDir} is newer than them.`, uiFiles, `Action: figure out which routes these affect, screenshot each (auth + unauth where relevant) via the chrome-devtools or Playwright MCP, then READ the PNGs and verify the change landed AND nothing adjacent broke. Don't commit on faith.`, - )); + ), { blocking: false }); } } @@ -136,6 +199,11 @@ try { if (file === "CLAUDE.md" || file === "AGENTS.md" || file === "TODO.md") return false; if (file.startsWith(".github/")) return false; if (file.startsWith(".husky/")) return false; + // E2E / test code is TypeScript, not user-facing copy. error.stack and + // similar JS API names are legitimate; voice rules apply to copy only. + if (file.startsWith("packages/web/e2e/")) return false; + if (/\.(test|spec)\.(ts|tsx|js|jsx|mjs)$/.test(file)) return false; + if (/__tests__\//.test(file)) return false; return true; } const voiceHits = []; @@ -150,7 +218,7 @@ try { } if (voiceHits.length) { const sample = unique(voiceHits).slice(0, 12).join("\n"); - violations.push(`VOICE GATE: banned vocabulary in added copy. Rewrite per CLAUDE.md Wishonia voice + Vonnegut rule + pushViolation("VOICE", voiceHits.length, `VOICE GATE: banned vocabulary in added copy. 
Rewrite per CLAUDE.md Wishonia voice + Vonnegut rule (plain declaratives, numbers beat adjectives, no corporate verbs, no infrastructure metaphors). ${sample}`); } @@ -170,8 +238,8 @@ ${sample}`); } if (paramHits.length) { const sample = paramHits.slice(0, 8).join("\n"); - violations.push(`PARAMETER GATE: hardcoded number(s) in JSX. Use so the -citation popover + sig-fig rule fires. Grep packages/data/src/parameters/parameters-calculations-citations.ts + pushViolation("PARAMETER", paramHits.length, `PARAMETER GATE: hardcoded number(s) in JSX. Use so the +details dialog + sig-fig rule fires. Grep packages/data/src/parameters/parameters-calculations-citations.ts for a matching parameter; add one if it's truly new. Override with '// allow-hardcoded' on the line only when there's no underlying datum (rare). ${sample}`); @@ -182,7 +250,7 @@ ${sample}`); // Count + lines that belong to CLAUDE.md specifically, not the whole diff. const claudeAdded = added.filter((a) => a.file === "CLAUDE.md").length; if (claudeAdded > 12) { - violations.push(`CLAUDE.MD BLOAT GATE: CLAUDE.md grew by ~${claudeAdded} added lines. The file's own meta-rule says + pushViolation("CLAUDE.MD_BLOAT", claudeAdded, `CLAUDE.MD BLOAT GATE: CLAUDE.md grew by ~${claudeAdded} added lines. The file's own meta-rule says "minimum words to convey the rule. One example only." Move detail into .claude/agents/*.md or .claude/.md. Trim before committing.`); } @@ -193,10 +261,30 @@ ${sample}`); const tsxPageChanges = uiFiles.filter((f) => /^packages\/web\/src\/app\/.+\/page\.tsx$/.test(f), ); - if (tsxPageChanges.length) { - violations.push(formatList( + // Skip if a sibling page.logged-*.md was regenerated AFTER the staged + // page.tsx file's mtime — the preview script ran, just produced no + // content drift because the .tsx change was render-equivalent. 
+ const tsxWithStaleSnapshot = tsxPageChanges.filter((tsxRel) => { + const tsxAbs = resolve(RepoRoot, tsxRel); + const dir = tsxAbs.replace(/[\\/][^\\/]+$/, ""); + const tsxMtime = (() => { + try { return statSync(tsxAbs).mtimeMs; } catch { return 0; } + })(); + try { + const siblings = readdirSync(dir).filter((f) => + /^page\.logged-(in|out)\.md$/.test(f), + ); + return !siblings.some((f) => { + try { return statSync(join(dir, f)).mtimeMs >= tsxMtime; } catch { return false; } + }); + } catch { + return true; + } + }); + if (tsxWithStaleSnapshot.length) { + pushViolation("COPY_SNAPSHOT", tsxWithStaleSnapshot.length, formatList( "COPY-SNAPSHOT GATE: page.tsx files changed but no matching page.logged-out.md updated.", - tsxPageChanges, + tsxWithStaleSnapshot, ` Action: run 'pnpm --filter @optimitron/web copy:preview' to regenerate snapshots, diff the .md output, then invoke the voice-critic subagent on the diff. Fix anything that drifts toward startup-bro or away from Wishonia/Vonnegut. Commit the .md files alongside the .tsx.`, @@ -220,7 +308,7 @@ ${sample}`); } if (swallowHits.length) { const sample = swallowHits.slice(0, 6).join("\n"); - violations.push(`ERROR-SWALLOW GATE: silent error swallowing in added lines. Catches must rethrow with context, + pushViolation("ERROR_SWALLOW", swallowHits.length, `ERROR-SWALLOW GATE: silent error swallowing in added lines. Catches must rethrow with context, log via log.error / console.error / Sentry.captureException, or have a one-line comment naming why the silence is intentional. Our own infrastructure failing (our endpoints, our DB writes, our code paths) is always an ERROR, not a warning — warnings get ignored. 
Warn only for things outside our @@ -230,13 +318,13 @@ ${sample}`); // --- Check 7: email template changes ----------------------------------- if (emailFiles.length) { - violations.push(formatList( + pushViolation("EMAIL_MINIMALISM", emailFiles.length, formatList( "EMAIL MINIMALISM GATE: email template / sender changed.", emailFiles, ` Action: re-read feedback_email_minimalism — one CTA, no chrome, no system internals leaking. Run pnpm --filter @optimitron/web e2e:visual (email-screenshots mode) or render the template preview and inspect the rendered HTML before commit.`, - )); + ), { blocking: false }); } // --- Check 7b: Vonnegut / blather gate on user-facing copy --------------- @@ -245,45 +333,68 @@ ${sample}`); ...copyFiles, ]); if (copyChanges.length) { - violations.push(formatList( - `VONNEGUT / BLATHER GATE: page.tsx or page.logged-out.md changed. Read the rendered copy. Goal: a -5th grader follows it, nothing said twice, no Stripe-keynote sentences, one primary CTA per screen -above the fold on mobile. Spawn voice-critic on the .md diff if scope is non-trivial.`, + pushViolation("VONNEGUT", copyChanges.length, formatList( + "BLATHER REVIEW: page.tsx or page.logged-out.md changed. Run `/qa-editorial` (fires voice-critic + cold-stranger-ux + test-auditor in parallel; returns one SHIP / NEEDS FIXES punch list).", copyChanges, "", - )); + ), { blocking: false }); } // --- Check 7c: stupid tests gate ---------------------------------------- if (testFiles.length) { - violations.push(formatList( + pushViolation("STUPID_TEST", testFiles.length, formatList( `STUPID TEST GATE: test file(s) changed. For each new test: name the bug it would catch. If you can't, delete it. No tests for symmetry, documentation, or to silence a bot.`, testFiles, "", - )); + ), { blocking: false }); } // --- Check 7d: reuse / no-duplication gate ------------------------------ if (newReusableFiles.length) { - violations.push(formatList( - `REUSE GATE: new file(s) under components/ or lib/. 
Before commit: grep for an existing component/ -function that does the same job. Don't duplicate. Don't add an abstraction nobody extends yet. -Recent miss: org-context-token (full HMAC system for no real threat — should have trusted the URL slug).`, + pushViolation("REUSE", newReusableFiles.length, formatList( + "REUSE GATE: new file under components/ or lib/. Grep for an existing component/function that does the same job before committing. Don't add an abstraction nobody extends yet.", newReusableFiles, "", - )); + ), { blocking: false }); } // --- Emit --------------------------------------------------------------- if (!violations.length) process.exit(0); - const banner = `[verify-ui-changes hook] ${violations.length} gate(s) failed. Address each before stopping:`; - const body = violations.join("\n\n"); + const blocking = violations.filter((v) => v.blocking); + const advisory = violations.filter((v) => !v.blocking); + + if (isCommitAttempt) { + // Pre-commit: full detail. Blocking violations fail the commit; + // advisory ones print as reminders but exit 0 so the commit proceeds. + const sections = []; + if (blocking.length) { + sections.push( + `[verify-ui-changes] ${blocking.length} BLOCKING gate(s) failed — address each, then re-commit:\n\n${blocking.map((v) => v.message).join("\n\n")}`, + ); + } + if (advisory.length) { + sections.push( + `[verify-ui-changes] ${advisory.length} advisory gate(s) — reminders, NOT blocking the commit:\n\n${advisory.map((v) => v.message).join("\n\n")}`, + ); + } + process.stderr.write(`${sections.join("\n\n---\n\n")}\n`); + process.exit(blocking.length ? 2 : 0); + } + + // Stop hook: one-line summary. Marks advisory gates with [*] so the + // reader knows they're informational. + const summary = violations + .map((v) => `${v.name}(${v.count})${v.blocking ? "" : "*"}`) + .join(", "); process.stderr.write( - `${banner}\n\n${body}\n\n(Commit clears the gate, but only commit AFTER you have addressed each violation above. 
For each NON-OBVIOUS fix, present 2-3 options via AskUserQuestion — recommendation first + one-line reason — before refactoring. Skip asking only when the fix is mechanical/obvious (typo, missing import, formatter). Calibrate first, then act; don't refactor then ask.)\n`, + `[verify-ui-changes] ${blocking.length} blocking + ${advisory.length} advisory* gate(s): ${summary}. Run 'git commit' for per-file detail.\n`, ); - process.exit(2); + // Stop hook: exit 0 if no blocking violations (advisory-only is just + // FYI, shouldn't block stopping). Exit 2 only when something real + // would fail. + process.exit(blocking.length ? 2 : 0); } catch { // Fail-open. process.exit(0); diff --git a/.claude/safety-gate.mjs b/.claude/safety-gate.mjs new file mode 100644 index 000000000..03784685c --- /dev/null +++ b/.claude/safety-gate.mjs @@ -0,0 +1,105 @@ +#!/usr/bin/env node +import path from "node:path"; + +const mode = process.argv[2]; +const value = process.argv.slice(3).join(" "); + +function fail(message) { + console.error(message); + process.exit(2); +} + +function ok(message = "ok") { + console.log(message); + process.exit(0); +} + +function checkCommand(command) { + const text = command.trim(); + + // CodeRabbit/Claude review on PR #79 flagged that `safeDelete` previously + // ran before the blocked list, so a compound command like + // `rm -rf node_modules && rm -rf /` would `ok()` out on the first clause + // and the second clause's recursive-rm / DROP TABLE / force-push never got + // inspected. We now check blocked patterns FIRST. The build-artifact + // shortcut still exists but is gated on the WHOLE command containing no + // shell-separator-style continuations, so it cannot be used as a prefix to + // smuggle a second clause through. 
+ const blocked = [ + [/\brm\s+(-[^\s]*r[^\s]*|-.*recursive)/i, "recursive delete"], + [/\bRemove-Item\b[\s\S]*\s-Recurse\b/i, "recursive delete"], + [/\bDROP\s+(TABLE|DATABASE)\b/i, "database drop"], + [/\bTRUNCATE\b/i, "database truncate"], + [/\bgit\s+reset\s+--hard\b/i, "hard reset"], + [/\bgit\s+(checkout|restore)\s+\.(?:\s|$)/i, "discarding worktree changes"], + [/\bgit\s+push\b[\s\S]*(--force|-f)\b/i, "force push"], + [/\bkubectl\s+delete\b/i, "kubernetes delete"], + [/\bdocker\s+(rm\s+-f|system\s+prune)\b/i, "destructive docker cleanup"], + ]; + + // Single `&` (background), newline, `&&`, `||`, `;`, and `|` all chain a + // following command. The carve-out for `rm -rf node_modules` only applies + // to single-clause commands — never compound. + const hasShellSeparator = + /(?:&&|\|\||;|\n|\|(?!\|)|&(?!&))/.test(text); + const safeDelete = /\b(remove-item|rm)\b[\s\S]*(node_modules|\.next|dist|build|\.turbo|coverage|__pycache__|\.cache)\b/i; + const isSafeCleanupSingleClause = + safeDelete.test(text) && !hasShellSeparator; + + for (const [pattern, label] of blocked) { + if (pattern.test(text)) { + // Carve-out: standalone "rm -rf node_modules" stays allowed. Compound + // commands containing && / || / ; / | are never carved out, even when + // their first clause looks safe. + if (label === "recursive delete" && isSafeCleanupSingleClause) continue; + fail(`Safety gate: ${label}. Get explicit human approval before running:\n${text}`); + } + } + + if (isSafeCleanupSingleClause) ok("safe cleanup command"); + ok("command allowed"); +} + +function checkPath(targetPath) { + const root = process.env.CLAUDE_FREEZE_DIR; + if (!root) ok("no freeze boundary set"); + + const base = path.resolve(root) + path.sep; + const target = path.resolve(targetPath); + if (target === path.resolve(root) || target.startsWith(base)) ok("path inside freeze boundary"); + + fail(`Freeze gate: ${target} is outside ${base}`); +} + +// Hook mode: invoked by Claude Code's PreToolUse:Bash hook.
Reads stdin JSON +// of the form { tool_name, tool_input: { command, ... } } and checks the +// extracted command via the same checkCommand path. +if (mode === "hook") { + let raw = ""; + try { + raw = await new Promise((resolve, reject) => { + let buf = ""; + process.stdin.setEncoding("utf8"); + process.stdin.on("data", (chunk) => (buf += chunk)); + process.stdin.on("end", () => resolve(buf)); + process.stdin.on("error", reject); + }); + } catch { + process.exit(0); + } + if (!raw || !raw.trim()) process.exit(0); + try { + const hookData = JSON.parse(raw); + const cmd = hookData?.tool_input?.command; + if (typeof cmd === "string" && cmd.length > 0) checkCommand(cmd); + } catch { + // Malformed JSON or non-Bash invocation — fail open. + } + process.exit(0); +} + +if (mode === "command") checkCommand(value); +if (mode === "path") checkPath(value); + +console.error("Usage: node .claude/safety-gate.mjs command \"\" | path \"\" | hook "); +process.exit(64); diff --git a/.claude/settings.json b/.claude/settings.json index 4a145763a..47e1de82b 100644 --- a/.claude/settings.json +++ b/.claude/settings.json @@ -14,21 +14,29 @@ { "matcher": "Bash", "hooks": [ + { + "type": "command", + "command": "node .claude/safety-gate.mjs hook", + "timeout": 3000 + }, { "type": "command", "command": "node .claude/hooks/pre-commit-checklist.mjs", "timeout": 10000 + }, + { + "type": "command", + "command": "node .claude/hooks/codex-dispatch-blather.mjs", + "timeout": 3000 } ] - } - ], - "Stop": [ + }, { + "matcher": "Skill", "hooks": [ { "type": "command", - "command": "node .claude/hooks/verify-ui-changes.mjs", - "timeout": 8000 + "command": "\"$CLAUDE_PROJECT_DIR/.claude/hooks/check-gstack.sh\"" } ] } @@ -43,6 +51,17 @@ } ] } + ], + "UserPromptSubmit": [ + { + "hooks": [ + { + "type": "command", + "command": "node .claude/hooks/surprise-signal.mjs", + "timeout": 5000 + } + ] + } ] } } diff --git a/.claude/skills/qa-editorial.md b/.claude/skills/qa-editorial.md new file mode 100644 index 
000000000..58b807f14 --- /dev/null +++ b/.claude/skills/qa-editorial.md @@ -0,0 +1,92 @@ +--- +name: qa-editorial +description: Project-specific editorial audit that runs AFTER gstack's auto-fix chain (/design-review, /qa, /cso). Fires voice-critic, cold-stranger-ux, and test-auditor in parallel and consolidates findings. Catches Wishonia-voice violations, manual-quote-overlay opportunities, and parameter-citation gaps that gstack's generic skills can't see. +user_invocable: true +--- + +# /qa-editorial — project-specific layer on top of gstack's auto-fix chain + +When the user types `/qa-editorial`, run the project's editorial + UX critics over the current branch state and report a single consolidated punch list. This is **NOT** the gstack `/qa` (functional browser bug-hunt + auto-fix). This is the Wishonia-voice / treaty-editorial / project-specific layer that gstack can't do because gstack is generic. + +## Where this fits in the pre-merge chain + +Run the gstack chain FIRST, then this: + +1. `/review` (gstack) — diff vs base, structural issues +2. `/design-review` (gstack) — visual slop, auto-fix + commit +3. `/qa` (gstack) — functional bug-hunt, auto-fix + commit +4. `/cso` (gstack) — OWASP + STRIDE security +5. **`/qa-editorial` (this skill)** — Wishonia voice, manual-search, parameter-coverage, cold-stranger-UX +6. `/ship` (gstack) — open PR + +## What to fire (in parallel) + +Three subagents IN PARALLEL via the `Agent` tool: + +1. **`voice-critic`** — Copy critique against project voice rules. Catches startup-bro phrasing, banned vocabulary (engage, empower, off-ramp, primitive), tautological hints under headings, adjective stacks with no number, Stripe-keynote sentences. **Required to call `mcp__optimitron-tasks__searchManual`** before proposing replacement copy, and to grep `parameters-calculations-citations.ts` for every hardcoded user-facing number. 
If MCP isn't wired, fall back to `curl https://manual.warondisease.org/assets/json/search-index.json` — same content, no auth. +2. **`cold-stranger-ux`** — Zero-context first-time reader reaction. Drives a real browser at iPhone-14 viewport, takes screenshots, reacts in plain English. Catches confusing UX, missing case-for-action, would-bail moments. +3. **`test-auditor`** — Test suite slop + missing coverage. Catches "tests added for symmetry / documentation / to silence a bot" and missing regression tests for fixed bugs. + +Skip rules: +- No `.tsx` / `.md` content changed → skip `voice-critic`. +- No UI changed → skip `cold-stranger-ux`. +- No test files changed → skip `test-auditor`. + +Visual slop, OWASP/STRIDE, and root-cause analysis are NOT this skill's job — gstack's `/design-review`, `/cso`, and `/investigate` already cover those (and auto-fix). Run those first. + +## Scope each invocation + +Before dispatching, run `node packages/web/scripts/affected-routes.mjs` to enumerate the routes whose page.tsx files import the changed components. Pass that route list to `cold-stranger-ux` so it doesn't drive the whole site — only the surfaces that actually moved. + +For `voice-critic`, scope to the changed `.tsx` files + their regenerated `.md` snapshots. + +## Output format + +After all return, produce ONE numbered punch list: + +``` +## /qa-editorial findings on (HEAD ) + +### Voice (N findings) +1. : + +### Cold-stranger UX (N findings) +1. : + +### Tests (N findings) +1. : + +### Verdict +SHIP / NEEDS FIXES BEFORE COMMIT / NEEDS USER DECISION + +If NEEDS FIXES: which findings are deal-breakers vs. nice-to-have? +If NEEDS USER DECISION: what specifically is the call to make? 
+``` + +## When NOT to run /qa-editorial + +Don't run on commits that only touch: +- `.claude/`, `.codex/`, `.husky/` (meta-config) +- `CLAUDE.md`, `TODO.md`, `AGENTS.md` (docs) +- `packages/web/scripts/`, `packages/web/e2e/` (tooling) +- Pure dependency bumps in `package.json` / `pnpm-lock.yaml` + +For everything else — yes, run it before committing. + +## What this skill explicitly does NOT do + +- Doesn't write code. +- Doesn't commit. +- Doesn't auto-fix findings — surfaces them and waits for the user's call. (gstack's `/design-review` and `/qa` DO auto-fix; this skill is the editorial layer that runs after.) +- Doesn't replicate gstack's generic checks — voice, manual-search, and cold-stranger UX are the differentiated value here. + +## Standard pre-commit ritual + +1. Make the change. +2. Run `pnpm --filter @optimitron/web copy:preview -- --routes=$(node packages/web/scripts/affected-routes.mjs)` to regenerate affected snapshots. +3. Run gstack's chain: `/review` → `/design-review` (auto-fixes visual slop) → `/qa` (auto-fixes functional bugs) → `/cso` (security). +4. Run `/qa-editorial` — the project-specific layer. +5. Fix any deal-breakers; mark hand-waves intentional with a one-line comment in the commit. +6. Commit with `qa-passed:` line. + +If `/qa-editorial` returns "SHIP" verdict and the pre-commit hooks pass, the change is ready. diff --git a/.codex/agents/cold-stranger-ux.toml b/.codex/agents/cold-stranger-ux.toml new file mode 100644 index 000000000..3389da87c --- /dev/null +++ b/.codex/agents/cold-stranger-ux.toml @@ -0,0 +1,101 @@ +description = 'Reacts to the running local site AS A STRANGER who has never heard of the project, just got a text link from a friend, and has a 2-minute mobile attention span. Use when the user asks "what does a normal person think of this", "is this confusing", "audit the UX", or after meaningful UI/copy changes land on local dev. Does NOT read AGENTS.md, TODO.md, or any project docs. 
Drives a real browser via Playwright at iPhone-14 viewport, takes screenshots, reacts in plain English. Returns a punch list of bugs / confusion / would-bail moments per page.' +developer_instructions = ''' +You are role-playing AS A REGULAR PERSON. Your friend Mike just texted you a link. That is ALL you know. + +You have **NEVER HEARD OF:** + +- "war on disease" +- "the 1% treaty" +- "Wishonia" +- "Optimitron" +- Mike's politics or what he cares about + +You are on your iPhone in line at a coffee shop. Mildly curious, 2-minute attention span max. + +# Hard rules + +1. **Do NOT read AGENTS.md, TODO.md, README.md, the manual, the QMDs, or any project docs.** You are a stranger. Reading them contaminates your judgment. +2. **Do NOT read source code unless you need to confirm a specific bug exists** (e.g., "is the submit button rendered but offscreen, or not rendered at all?"). The point is FIRST IMPRESSION, not code archaeology. +3. **Target local dev (`http://localhost:3001`)** by default. That's the most up-to-date version. Targeting production means complaining about bugs already fixed on the branch — wasted compute. +4. **Use iPhone 14 viewport via Playwright** (already installed as a dev dep in `packages/web`). +5. **React, don't analyze.** Write like a person texting back: "wtf is this asking me to do?", not "the user might experience cognitive friction with the call-to-action." + +# Tooling + +Playwright via Bash CLI (no MCP needed).
Write a Node script and run: + +```bash +mkdir -p E:/code/optimitron/packages/web/output/cold-stranger +cd E:/code/optimitron/packages/web && pnpm exec node -e " +const { chromium, devices } = require('playwright'); +(async () => { + const browser = await chromium.launch(); + const ctx = await browser.newContext({ ...devices['iPhone 14'] }); + const page = await ctx.newPage(); + await page.goto('http://localhost:3001/', { waitUntil: 'domcontentloaded' }); + await page.waitForLoadState('networkidle').catch(() => {}); + await page.screenshot({ path: 'output/cold-stranger/01-landing-above-fold.png' }); + await page.screenshot({ path: 'output/cold-stranger/02-landing-full.png', fullPage: true }); + // ... scroll, click, type, more screenshots + await browser.close(); +})(); +" +``` + +Read screenshots back with the `Read` tool (PNG support) and react to what you SEE. Don't trust the URL or page title to tell you what's there. + +# Journey (default — parent can override) + +1. **Landing on `http://localhost:3001/`** — above-fold + full-page screenshot. What does this site want from me in the first 2 seconds? Confused / intrigued / annoyed / leaving? Where would I tap? +2. **Follow the most prominent CTA.** Whatever looks most tappable to a stranger. Screenshot wherever you land. +3. **Try `/vote`** — drag the slider, see what happens. Is the submit button visible after release? Does anything explain *why* I'm voting? +4. **Try `/treaty`** — readable on phone? Body legible or tiny? Would I sign something I just read? +5. **Try `/donate`** — does the calculator make sense or is it math homework? Do the numbers feel grounded or made-up? +6. **Try `/signatories`** — does seeing other signers make me trust this more or less? 
+ +# Specifically watch for + +- Jargon that means nothing without context (RAPPA, OPG, OBG, HALE, "the 1% treaty" used as a known referent, "Wishonia", parameter names rendered as visible UI text) +- CTAs that don't tell me what they do ("Engage", "Take ownership", "Get started") +- Walls of text on a phone before I can do the thing +- Submit/primary-action buttons hidden below the fold on mobile (`/vote` slider in particular) +- Numbers presented without source ("102 million people died waiting" — is that real or made up?) +- Pages that look like a corporate dashboard instead of a campaign +- Anything that screams "made by tech bros" rather than "designed for humans" + +# Output + +Save the full report to `E:\code\optimitron\packages\web\output\cold-stranger\REPORT.md`. + +Each section formatted like: + +``` +## + +[3-4 sentence first impression as the stranger] + +### Bugs +- [bug 1, plain English] +- [bug 2] + +### Confusion +- [thing 1 that confused me] + +### Would-bail moments +- [moment 1] +``` + +End the report with a **Top 3 fix-this-now** list ranked by likelihood of losing the visitor. + +Return a ≤300-word summary to the parent agent (don't dump the full report into the reply — that's what the file is for). + +# What you are NOT for + +- Code review or fix suggestions ("you should refactor X") +- Voice critique against the project's specific style rules (that's `voice-critic`) +- Test audits (that's `test-auditor`) +- Architecture takes +- Suggesting features + +Just react like a stranger. Identify the bugs a 2-minute mobile visitor would hit. Stop.''' +name = "cold-stranger-ux" diff --git a/.codex/agents/test-auditor.toml b/.codex/agents/test-auditor.toml new file mode 100644 index 000000000..75b449efe --- /dev/null +++ b/.codex/agents/test-auditor.toml @@ -0,0 +1,126 @@ +description = """Audits the codebase's test suite for stupid/flaky/wasteful tests AND identifies critical untested paths. 
Returns a delete list (with reasons) and an add list (with the specific code path that needs coverage). Use when the user asks "audit the tests", "any stupid tests we should delete?", "what's flaky?", or before a major refactor where dead tests will be load-bearing in the diff.""" +developer_instructions = ''' +You are the test-auditor agent. You walk the test suite with two questions: + +1. **Which tests are wasteful** (delete-on-sight per AGENTS.md), and +2. **Which load-bearing code paths have no test** at all. + +You do NOT write tests yourself. You output two lists with specific file:line citations so the parent agent can act. + +## Before you audit: read TODO.md + +Grep `TODO.md` for entries that name areas you're about to audit (e.g. "migrate referendums to managed-data", "split tests when X lands"). Tests guarding code that's about to be deleted / migrated are NOT slop — they're load-bearing for the migration. Flag those with "keep until " instead of "delete." Skip listing "missing coverage" for paths that the team has already decided to refactor away. + +```bash +grep -i -E "|test|coverage" TODO.md +``` + +# The "delete on sight" rubric + +Per AGENTS.md `Testing Rules (non-negotiable)`. A test should be DELETED when: + +- **Mocks the entire surface it's supposedly testing.** `vi.mock("./notifyTaskAssignee"); expect(notifyTaskAssignee).toHaveBeenCalled();` only verifies the mock can be called. +- **Passthrough wrapper tests.** `export const buildPostVoteShareMessageText = (url) => buildShareMessage(url);` followed by tests that assert content from `buildShareMessage` — those tests belong next to `buildShareMessage`, not its one-line re-export. +- **Constant-equality tests.** `expect(TEMPLATE_ID).toBe("post-vote-share")` restates the declaration; it only fails when someone intentionally renames. +- **Implementation-transcription tests.** Tests that line-up the assertion order to the function body. Refactor-fragile, change-amplifying, signal-free. 
+- **Snapshot/markup tests** for UI that doesn't have a behavioral contract beyond "looks right." Visual review catches that. +- **Tests added "for symmetry"** with a similar test elsewhere when the matching code is trivial. +- **Tests gated on real wall-clock / `Math.random` / network / DB row order without `orderBy`** — flaky by construction. +- **Tests that need `retry` or `sleep` to pass.** + +# The "this needs a test" rubric + +A test should be ADDED when: + +- **Pure functions with fallback / branching logic** (helpers, parsers, formatters, selectors) — and there's a path that isn't covered. +- **State transitions inside a `$transaction`** or multi-step DB writes. +- **Boundary conversions** (Prisma row → DTO, OAuth profile → User, session → client) — verify shape + null-handling. +- **Regression fixes shipped without a failing-then-passing test.** Search `git log --oneline -- src/lib/` for "fix" / "bug" commits and check whether the corresponding test file changed in the same commit. If not, the regression is unguarded. +- **Critical user paths** with no smoke test: signup, sign-treaty, claim-task, share-link, magic-link send. + +# How to operate + +## Step 1: Enumerate test files + +```bash +find packages -name "*.test.ts" -o -name "*.test.tsx" | grep -v node_modules +``` + +## Step 2: Quick-scan for the obvious slop patterns + +```bash +# Constant-equality assertions: +grep -rn "expect(.*\(_ID\|_SUBJECT\|_TEMPLATE\)).toBe(" packages --include="*.test.ts" --include="*.test.tsx" + +# Mock-and-check-the-mock: +grep -rln "vi\.mock\|vi\.fn(" packages --include="*.test.ts" --include="*.test.tsx" | xargs grep -l "toHaveBeenCalled" + +# Passthrough function tests (testing a function that's a one-line re-export): +# Heuristic: a test file that imports only ONE function from a module +# where that module's source is fewer than 5 non-trivial lines.
+
+# Wall-clock dependence:
+grep -rn "new Date()\|Date\.now()" packages --include="*.test.ts" --include="*.test.tsx" | grep -v "vi\.setSystemTime\|now ="
+
+# Sleep / retry / waitFor with arbitrary timeouts:
+grep -rn "setTimeout\|sleep(\|retry(.*[0-9]" packages --include="*.test.ts" --include="*.test.tsx"
+```
+
+For each hit, READ the surrounding test to confirm it's actually wasteful (the grep is noisy; you have to look). Skip false positives.
+
+## Step 3: Find flaky tests in CI history
+
+```bash
+gh run list --workflow CI --status failure --limit 30 --json databaseId,headSha,conclusion
+```
+
+For each failed run, look at the failed-step logs. Tests that appear multiple times across distinct PRs with `ECONNRESET`, timeout, or "expected … to equal …" with values that almost-match — those are flaky.
+
+```bash
+gh run view --log-failed | grep -iE "fail|error" | grep -v "0 error\|ignored"
+```
+
+Cross-reference: if a test fails on a re-run of the same SHA but passes on a different SHA, it's environment-flaky. Flag.
+
+## Step 4: Find untested load-bearing code
+
+For each `src/lib/` and `src/app/api/` file, check whether there's a co-located `.test.ts`. If not, READ the file and decide whether it's load-bearing (state transitions, boundary conversions, regression risk) or trivial (re-exports, type definitions).
+
+Specifically check the critical user flows:
+
+- `src/app/api/auth/**` — sign-in, magic-link, OAuth callbacks
+- `src/app/api/referendums/[slug]/vote/route.ts` — already has tests, verify scope
+- `src/app/api/tasks/**` — claim, complete, comment
+- `src/lib/email/**` — every triggered email's send path
+- `src/lib/tasks/**` — task assignment + notification side effects
+
+## Step 5: Output
+
+Two lists, in this exact format:
+
+```text
+## Delete (N tests)
+
+1. `:` —
+2. …
+
+## Add (M tests)
+
+1. `` —
+2. …
+
+## Flaky (K tests)
+
+1. `:` —
+2. …
+```
+
+End with: "Run `pnpm --filter @optimitron/web test` after applying the deletes. Verify total test count drops by N and the suite stays green."
+
+# What you are NOT for
+
+- Writing tests. Return the add list; the parent agent writes them.
+- Mass-deleting "to reduce test count." Each delete needs a specific rubric reason.
+- Removing tests that catch real regressions just because they're verbose.
+- Judging tests outside `packages/`. Stay in the project.'''
+name = "test-auditor"
diff --git a/.codex/agents/voice-critic.toml b/.codex/agents/voice-critic.toml
new file mode 100644
index 000000000..adc08fe43
--- /dev/null
+++ b/.codex/agents/voice-critic.toml
@@ -0,0 +1,63 @@
+description = "Critiques user-facing copy and UI for the optimitron / warondisease.org codebase against the project's voice rules + reuse-first conventions. Spawn after any change that touches `src/app/**/page.tsx`, `*ShareCard*`, `*SignatureBox*`, nav labels in `routes.ts`, or any other user-facing copy. Returns a numbered punch list of things to fix or explicitly mark intentional. Does not write code."
+developer_instructions = """
+You are the voice-critic for the optimitron / warondisease.org codebase. You critique like an unsentimental reviewer. You do not write code.
+
+# The goal
+
+Read the rendered copy as a stranger who hits this page after a friend texts them the link. Two-minute attention span on a phone.
+
+Does the copy:
+
+1. **Reach them?** Could a 5th grader read it without a dictionary? Is the primary action above the fold? Is the same idea said only once?
+2. **Cite its claims?** Every user-facing number traces back to a real source (via `` or an inline citation), and the math has been done somewhere a reader could find.
+3. **Sound like Wishonia + Kurt Vonnegut?** Deadpan, data-first, plain declaratives, sardonic comparisons. Not a Stripe keynote, not a corporate-onboarding flow, not a moral aphorism in lieu of a fact.
+4. **Keep momentum?** After a YES action, the next step renders inline. No "open the dashboard to find X" punts.
+5. **Reuse what exists?** New components are flagged unless the user explicitly wants a divergence from existing equivalents.
+
+If the answer to all five is yes, you have no violations to report. Say so.
+
+# How to flag
+
+Each finding is a hypothesis until you've verified it. **Before claiming a violation, read the source of whatever you're judging:**
+
+- "Number isn't using ``" → grep `parameters-calculations-citations.ts` to confirm a matching parameter exists. If no parameter exists, the fix is to add one, not to wrap nothing.
+- "`` defeats the component" → read `components/shared/ParameterValue.tsx` first. `valueOverride` is the INTENDED API for attaching the citation popover while controlling display text. Not a violation.
+- "Duplicate component" → grep for the existing component, confirm it has the same shape. Different responsibilities ≠ duplicate.
+- "Banned phrase" → confirm the phrase actually appears in user-facing rendered text (not a comment, not a test fixture, not a variable name).
+
+If you can't confirm by reading the source, DROP the finding or label it explicitly: *"agent's read, not verified — confirm before acting."*
+
+# Required checks for every copy block you review
+
+These run regardless of which smell first caught your attention.
+
+1. **Manual-search before suggesting new copy.** If you're proposing replacement wording for any user-facing string, first call `mcp__optimitron-tasks__searchManual` with the topic phrase and check whether the manual already has a sharper version we should steal. The manual is the source of truth for voice — quoting from it beats inventing fresh prose. If the manual has nothing usable, say so explicitly in the finding so the reader knows you checked.
+2. **Parameter coverage for every number.** For every hardcoded user-facing number in the changeset (digits, percentages, multipliers, dollar amounts, year counts), grep `packages/data/src/parameters/parameters-calculations-citations.ts` and the wider `packages/data/src/parameters/` directory for an existing parameter. If one exists and the JSX uses a raw literal instead of ``, flag it with the parameter ID. If no parameter exists yet, flag whether a new parameter is warranted (cited statistics warrant one; arithmetic identities like "2² = 4" do not).
+
+# Common smells (use as hypotheses to investigate, not as automatic verdicts)
+
+- Corporate-onboarding verbs in copy: *Take ownership*, *Engage*, *Empower*, *Unlock*, *Streamline*, *Get started*, *Take this on*, *Activate*.
+- Infrastructure metaphors: *stack*, *rails*, *off-ramp*, *primitive*, *substrate*.
+- Empty-mechanism phrases: *incentive layer*, *the protocol that…*, *fundamentally*.
+- Corporate openers: *We're building*, *Let's take a moment*, *We're excited to*.
+- Hand-off copy: *The dashboard has X*, *Find more on the Y page*.
+- Sentences that could appear unchanged in a Stripe keynote.
+- The same idea repeated across eyebrow → H1 → subtitle → drop-cap.
+- Plaintext numbers in user-facing JSX with no `` wrapper at all.
+- `figures={1}` or `figures={2}` on a calculator page (the donate page floor is 3).
+
+# Output
+
+Numbered list. Each item: one-sentence finding, file/line, and the actual fix (specific, not "improve").
+
+End with: *"Address these or mark intentional. Items marked intentional without justification should not be marked intentional."*
+
+# What you are NOT for
+
+- Design judgment (is a Yes/No button better than a name input?)
+- Architecture decisions (where should X live?)
+- Picking between valid product designs
+- Writing code
+
+Reference voice (when in doubt): *"Singapore spends a quarter of what America spends on healthcare and their people live six years longer. It's like watching someone pay four times more for a worse sandwich and then insist sandwiches are impossible."*"""
+name = "voice-critic"
diff --git a/.codex/config.toml b/.codex/config.toml
new file mode 100644
index 000000000..df2787a72
--- /dev/null
+++ b/.codex/config.toml
@@ -0,0 +1,20 @@
+model = "gpt-5.5"
+model_reasoning_effort = "xhigh"
+
+approval_policy = "never"
+sandbox_mode = "workspace-write"
+
+# MCP server entries removed 2026-05-14: when the local dev server on 3001 or
+# the spawned pnpm MCP subprocess is unreachable / wedged, Codex hangs on the
+# handshake at startup ("thinking forever" in the VS Code extension). Re-add
+# the entries below once you have a stable dev environment.
+#
+# [mcp_servers.optimitron-local]
+# url = "http://localhost:3001/api/mcp"
+#
+# [mcp_servers.optimitron-tasks]
+# args = ["--filter", "@optimitron/web", "exec", "tsx", "scripts/mcp-task-server.ts"]
+# command = "pnpm"
+#
+# [mcp_servers.optimitron-tasks.env]
+# MCP_USER_EMAIL = "m@thinkbynumbers.org"
diff --git a/.env.example b/.env.example
index e54d40bbe..d44bd412d 100644
--- a/.env.example
+++ b/.env.example
@@ -8,7 +8,6 @@ DATABASE_URL=postgresql://postgres:postgres@localhost:5432/optimitron
 # Web auth/app URLs
 # For local dev with packages/web on port 3001.
 NEXTAUTH_URL=http://localhost:3001
-NEXT_PUBLIC_BASE_URL=http://localhost:3001
 
 # Generate locally, for example: `openssl rand -base64 32`
 NEXTAUTH_SECRET=replace-this-with-a-long-random-secret
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index 8c1385fe0..f23212a22 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -101,7 +101,6 @@ jobs:
       DATABASE_URL: postgresql://postgres:postgres@localhost:5432/optimitron_web_ci
       NEXTAUTH_SECRET: test-secret-minimum-32-characters-long-for-validation
       NEXTAUTH_URL: http://127.0.0.1:3001
-      NEXT_PUBLIC_BASE_URL: http://127.0.0.1:3001
       VERCEL_CLI_VERSION: 50.37.3
     services:
       postgres:
@@ -545,6 +544,144 @@ jobs:
             packages/web/public/img/screenshots
           if-no-files-found: ignore
 
+  sync-preview-managed-data:
+    name: sync-preview-managed-data
+    if: github.event_name == 'pull_request'
+    needs: web-validate
+    runs-on: ubuntu-latest
+    timeout-minutes: 12
+    environment:
+      name: Preview
+    permissions:
+      contents: read
+    env:
+      CI: true
+      VERCEL_CLI_VERSION: 50.37.3
+      VERCEL_ORG_ID: ${{ vars.VERCEL_ORG_ID }}
+      VERCEL_PROJECT_ID: ${{ vars.VERCEL_PROJECT_ID }}
+      PREVIEW_GIT_BRANCH: ${{ github.head_ref }}
+
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v6
+        with:
+          submodules: recursive
+
+      - name: Check Preview Vercel token
+        id: preview_vercel_token
+        env:
+          VERCEL_TOKEN: ${{ secrets.VERCEL_TOKEN }}
+        shell: bash
+        run: |
+          if [ -n "$VERCEL_TOKEN" ]; then
+            echo "configured=true" >> "$GITHUB_OUTPUT"
+            exit 0
+          fi
+
+          echo "configured=false" >> "$GITHUB_OUTPUT"
+          echo "::warning::Preview managed-data sync skipped because the GitHub Preview environment is missing VERCEL_TOKEN."
+          {
+            echo "## Preview database sync"
+            echo
+            echo "- Skipped: add the \`VERCEL_TOKEN\` secret to the GitHub \`Preview\` environment so CI can pull the branch-specific Vercel preview database env."
+          } >> "$GITHUB_STEP_SUMMARY"
+
+      - name: Enable Corepack
+        if: steps.preview_vercel_token.outputs.configured == 'true'
+        run: |
+          corepack enable
+          corepack prepare pnpm@8.14.0 --activate
+
+      - name: Setup Node.js
+        if: steps.preview_vercel_token.outputs.configured == 'true'
+        uses: actions/setup-node@v6
+        with:
+          node-version: 24
+          cache: pnpm
+
+      - name: Install dependencies
+        if: steps.preview_vercel_token.outputs.configured == 'true'
+        run: pnpm install --frozen-lockfile
+
+      - name: Pull Vercel preview settings
+        if: steps.preview_vercel_token.outputs.configured == 'true'
+        env:
+          VERCEL_TOKEN: ${{ secrets.VERCEL_TOKEN }}
+        run: pnpm dlx vercel@${VERCEL_CLI_VERSION} pull --yes --environment=preview --git-branch "$PREVIEW_GIT_BRANCH" --token "$VERCEL_TOKEN"
+
+      - name: Export preview database env
+        if: steps.preview_vercel_token.outputs.configured == 'true'
+        shell: bash
+        run: |
+          node <<'NODE'
+          const fs = require("node:fs");
+          const envPath = ".vercel/.env.preview.local";
+
+          function parseEnv(contents) {
+            const entries = {};
+            for (const rawLine of contents.split(/\r?\n/u)) {
+              const line = rawLine.trim();
+              if (!line || line.startsWith("#")) continue;
+              const separatorIndex = line.indexOf("=");
+              if (separatorIndex <= 0) continue;
+
+              const key = line.slice(0, separatorIndex).trim();
+              let value = line.slice(separatorIndex + 1).trim();
+              if (
+                (value.startsWith('"') && value.endsWith('"')) ||
+                (value.startsWith("'") && value.endsWith("'"))
+              ) {
+                value = value.slice(1, -1);
+              }
+              entries[key] = value;
+            }
+            return entries;
+          }
+
+          const parsed = parseEnv(fs.readFileSync(envPath, "utf8"));
+          if (!parsed.DATABASE_URL) {
+            throw new Error(`Vercel preview env did not include DATABASE_URL in ${envPath}.`);
+          }
+
+          for (const key of ["DATABASE_URL", "DATABASE_URL_UNPOOLED"]) {
+            const value = parsed[key];
+            if (!value) continue;
+            console.log(`::add-mask::${value}`);
+            const delimiter = `managed_data_${key}_${Math.random().toString(36).slice(2)}`;
+            fs.appendFileSync(
+              process.env.GITHUB_ENV,
+              `${key}<<${delimiter}\n${value}\n${delimiter}\n`,
+            );
+          }
+
+          const target = new URL(parsed.DATABASE_URL);
+          console.log(`Loaded Vercel preview database env for ${target.hostname}.`);
+          NODE
+
+      - name: Apply preview database migrations
+        if: steps.preview_vercel_token.outputs.configured == 'true'
+        env:
+          PRISMA_SCHEMA_DISABLE_ADVISORY_LOCK: "1"
+        run: pnpm db:deploy
+
+      - name: Sync preview managed data
+        if: steps.preview_vercel_token.outputs.configured == 'true'
+        env:
+          MANAGED_DATA_ALLOW_REMOTE_APPLY: "1"
+        run: pnpm db:sync:managed-data -- --apply
+
+      - name: Summarize preview database sync
+        if: steps.preview_vercel_token.outputs.configured == 'true'
+        shell: bash
+        run: |
+          {
+            echo "## Preview database sync"
+            echo
+            echo "- Pulled Vercel preview env for \`$PREVIEW_GIT_BRANCH\`."
+            echo "- Applied Prisma migrations to the branch preview database."
+            echo "- Synced managed data with idempotent upserts."
+          } >> "$GITHUB_STEP_SUMMARY"
+
   deploy-production:
     name: deploy-production
     if: (github.event_name == 'push' || github.event_name == 'workflow_dispatch') && github.ref == 'refs/heads/main'
@@ -627,6 +764,7 @@ jobs:
         # referendums, demo fixtures, and trigger blueprints.
         env:
           DATABASE_URL: ${{ secrets.DATABASE_URL }}
+          MANAGED_DATA_ALLOW_REMOTE_APPLY: "1"
         run: pnpm db:sync:managed-data -- --apply
 
       - name: Deploy production artifact
diff --git a/.gitignore b/.gitignore
index 50b405797..6a084da1a 100644
--- a/.gitignore
+++ b/.gitignore
@@ -6,8 +6,13 @@ node_modules/
 dist/
 *.tsbuildinfo
 .next/
+.next.stale-*/
 next-env.d.ts
 
+# Dev server log (pre-warmed by Claude with stdout/stderr redirection)
+.dev-server.log
+packages/*/.dev-server.log
+
 # IDE
 .idea/
 .vscode/
@@ -91,4 +96,9 @@ packages/web/public/media/
 .vercel
 .claude/scheduled_tasks.lock
 .codex-logs/dashboard-dev.pid
-.tmp-pr-comments.json
+.tmp-*
+# Codex CLI session scratch — one-shot prompt files + diagnostic scripts.
+# `.codex/config.toml` IS committed; everything else under .codex/ is local.
+.codex/prompt-*.txt
+.codex/*-test.mjs
+.codex/last-commit-message.txt
diff --git a/.husky/pre-commit b/.husky/pre-commit
index 26d96bac8..383d0b180 100644
--- a/.husky/pre-commit
+++ b/.husky/pre-commit
@@ -3,18 +3,15 @@ set -e
 
 pnpm exec lint-staged
 
-# === Copy preview markdown regen ===
-if git diff --cached --name-only -- packages/web/src packages/web/public packages/web/scripts/render-pages-to-markdown.ts packages/web/package.json | grep -q .; then
-  if ! node -e "fetch('http://127.0.0.1:3001/api/auth/csrf',{signal:AbortSignal.timeout(2000)}).then(r=>process.exit(r.ok?0:1)).catch(()=>process.exit(1))"; then
-    echo "Copy preview markdown needs the local web server at http://127.0.0.1:3001."
-    echo "Start or reuse it with: pnpm --filter @optimitron/web dev:fast"
-    exit 1
-  fi
-
-  pnpm --filter @optimitron/web run copy:preview -- --no-authed
-  if ! git diff --quiet -- packages/web/src/app/page.logged-out.md 'packages/web/src/app/**/page.logged-out.md'; then
-    echo "Copy preview markdown was refreshed."
-    echo "Review the page.logged-out.md diffs against docs/h2ewd.md, stage them, then commit again."
-    exit 1
-  fi
+# === Copy preview markdown reminder (not auto-run) ===
+# Previously this block auto-ran `pnpm copy:preview` on every commit
+# that touched packages/web/src — Playwright + 16 routes ≈ 30-60s,
+# required the dev server, and failed the commit if any .md changed
+# (forcing a re-stage + re-commit cycle). Total cost: 1-2 minutes per
+# UI commit. Moved to a manual step; CI catches drift at PR time.
+if git diff --cached --name-only -- packages/web/src/app 'packages/web/src/lib/email' 'packages/web/src/lib/tasks' | grep -qE '\.(tsx?|md)$'; then
+  echo "→ Reminder: if user-facing copy or email templates changed, run:"
+  echo " pnpm --filter @optimitron/web copy:preview"
+  echo " pnpm --filter @optimitron/web email:preview-md"
+  echo " (CI will fail if .md snapshots drift from rendered output.)"
 fi
diff --git a/CLAUDE.md b/CLAUDE.md
index 8fc7f2e0c..bff66a2fe 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -6,7 +6,7 @@
 
 Optimitron is an **Earth Optimization Machine** for coordinating 8 billion humans to maximize median healthy life-years and real median after-tax income. It connects pairwise preferences (RAPPA), outcome tracking (dFDA), causal inference, and optimal policy/budget generation into alignment software for governments — treated as misaligned superintelligences.
 
-The current public campaign is the **International Campaign to End War and Disease** at `warondisease.org`. Until the 1% Treaty passes, that campaign is the product. `optimitron.com` is the operating system and proof engine behind it: tasks, referrals, communications, OPG/OBG/Wishocracy, politician grading, impact math, and AI-agent coordination.
+The current public campaign is the **International Campaign to End War and Disease** at `warondisease.org`. Until the 1% Treaty passes, that campaign is the product. `optimitron.com` is the operating system and proof engine behind it.
 
 Default priority order during campaign mode:
 
@@ -14,7 +14,7 @@ Default priority order during campaign mode:
 2. Increase referral propagation: each voter gets two more humans to vote.
 3. Get organizations to endorse, embed, and recruit their own people.
 4. Register plaintiffs and connect the case framing to voting.
-5. Pressure country leaders and treaty signers.
+5. Remind country leaders and treaty signers.
 6. Improve discoverability and trust in people, organization, task, and evidence pages.
 7. Preserve Optimitron's broader governance OS as the proof layer, not as a competing homepage.
 
@@ -24,7 +24,7 @@ Everything user-facing is narrated by **Wishonia** — _World Integrated System
 
 **Voice rules:**
 
-- **Deadpan** — state horrifying facts as though they are mildly interesting observations.
+- **Deadpan** — state horrifying facts as though mildly interesting observations.
 - **Data-first** — lead with specific numbers, costs, percentages, or ROI ratios. Numbers beat adjectives.
 - **Dry understatement, not outrage** — "It's almost like treating people like humans works better. Weird."
 - **Comparative** — contrast Earth's approach with what a rational civilisation would do. "On my planet..."
@@ -32,18 +32,14 @@ Everything user-facing is narrated by **Wishonia** — _World Integrated System
 - **Sardonic analogies** — "It's like buying 4.7 million cars and spending $1 on a mechanic."
 - **Criticise the system, never a party.** The data does the work.
 
-**Examples:**
-
-- "Singapore spends a quarter of what America spends on healthcare and their people live six years longer. It's like watching someone pay four times more for a worse sandwich and then insist sandwiches are impossible."
-- "Your FDA makes treatments wait 8.2 years AFTER they've already been proven safe. Just... sitting there. Being safe. While 102 million people died waiting."
-- "On my planet, governance takes about four minutes a week."
-
-**No startup-bro copy.** No infrastructure metaphors (stack, rails, off-ramp, primitive, substrate), empty mechanism vocabulary (incentive layer, the protocol that, fundamentally), or corporate openers (We're building, Let's take a moment). Bad: *"The treaty is the off-ramp. The Court is the road that produces the off-ramp."* If a sentence could appear unchanged in a Stripe keynote, rewrite.
+**No startup-bro copy.** No infrastructure metaphors (stack, rails, off-ramp, primitive, substrate), empty mechanism vocabulary (incentive layer, the protocol that, fundamentally), or corporate openers (We're building, Let's take a moment). If a sentence could appear unchanged in a Stripe keynote, rewrite.
 
 **Write like Kurt Vonnegut.** Plain declaratives. Verb-first imperatives for buttons ("Do this.", "Sign.", "Done."). Banned: "Take ownership", "Engage", "Empower", "Unlock", "Streamline", "Take this on", "Get started", and any other corporate-onboarding verb.
 
 **Reuse before rewrite.** Before writing a new component, grep `packages/web/src/components` for similarly-shaped JSX (share box, signature box, counter, markdown render, parameter display). If you find a match, use it.
 
+**Manual-search before proposing copy.** Any agent that writes or critiques user-facing text MUST call `mcp__optimitron-tasks__searchManual` (or `askWishonia`) before suggesting replacement wording. Quoting from the manual beats inventing prose. If the manual returns nothing usable, say so explicitly. **Fallback for agents without MCP access:** the manual's search index is a static unauthenticated JSON file at `https://manual.warondisease.org/assets/json/search-index.json` — curl it directly and grep entries. Pages at `https://manual.warondisease.org/`.
+
 **`` for every user-facing number.** Grep `packages/data/src/parameters/parameters-calculations-citations.ts` for a matching parameter before typing a number. Default `figures={3}` on calculator pages.
 
 **Catch users at peak commitment.** After a YES action, render the next step inline. Never punt with "the dashboard has X."
@@ -52,33 +48,34 @@ Everything user-facing is narrated by **Wishonia** — _World Integrated System
 
 **Verify the deployed state.** "tsc clean" is not "shipped." Run `pnpm --filter @optimitron/web review:local` and look at the rendered page, or say "this is on the way, can't verify from here."
 
-**Update `TODO.md` in the same commit** as the work it covers — both the check-box and any new follow-up lines. Deferred decisions ("we'll do X later", "real fix is upstream") go in TODO.md the same turn. Subagent prompts include the relevant TODO.md slice as context so they don't re-decide architecture in isolation.
+**Update `TODO.md` in the same commit** as the work it covers — both the check-box and any new follow-up lines. Deferred decisions go in TODO.md the same turn. Subagent prompts include the relevant TODO.md slice as context.
 
-**Pre-architect Read + Stop signal** are now enforced by hooks (PreToolUse on Write to `packages/*/src/` etc.; UserPromptSubmit detecting "should it really / I thought / aren't we" phrases). When a hook fires, treat its output as authoritative — don't argue past it. The hook exists because the equivalent CLAUDE.md rule was being ignored.
+**Hook-enforced rules.** Pre-architect Read on Write to `packages/*/src/` and "should it really / I thought / aren't we" detection on UserPromptSubmit are enforced by hooks. When a hook fires, treat its output as authoritative.
 
-**Diagram-before-code** for non-trivial changes. When a change touches >1 system (DB + deploy + CI; UI + API + DB), or you estimate >100 lines new, or the user used "I thought we had / aren't we / why is this so" phrasing: draw current + proposed flow (ASCII boxes or terse prose) in chat BEFORE the Write/Edit. User reacts to the diagram, you iterate on text not code. Trivial fixes (typo, single-line, isolated bug) skip this.
+**Diagram-before-code** for non-trivial changes. When a change touches >1 system (DB + deploy + CI; UI + API + DB), or you estimate >100 lines new, or the user used "I thought we had / aren't we / why is this so" phrasing: draw current + proposed flow (ASCII boxes or terse prose) in chat BEFORE the Write/Edit. Trivial fixes skip this.
 
-**Fetch the rendered page, don't infer from the codebase. AND fetch the right page.** When the user asks about UX, user journey, page copy, "is the page good" — fetch the actual rendered page first, then answer. Codebase = committed; rendered page = live; they drift (server/client boundaries, env routing, site variants, DB content).
+**Fetch the rendered page, don't infer from the codebase. AND fetch the right page.** Codebase = committed; rendered page = live; they drift.
 
-**Which page to fetch:**
-- Reviewing an unmerged PR → fetch the PR's **PREVIEW DEPLOY** via Vercel MCP (`web_fetch_vercel_url`) or via curl with the `_vercel_share` bypass token. Production is STALE relative to unmerged PR work — fetching warondisease.org for a question about "what /treaty looks like on PR 75" gives you the main branch's pre-PR rendering, which is wrong for the PR review.
-- Reviewing landed work / production behavior / "what does the live site show?" → fetch the production domain (warondisease.org, optimitron.com).
+- Reviewing an unmerged PR → fetch the PR's **PREVIEW DEPLOY** via Vercel MCP (`web_fetch_vercel_url`) or curl with the `_vercel_share` bypass token. Production is STALE relative to unmerged PR work.
+- Reviewing landed work / production behavior → fetch the production domain.
 - Local dev work → fetch http://localhost:3001 if dev server is running.
 
-Default to PREVIEW DEPLOY when the conversation context is "this PR / this branch / what does my recent commit look like." Default to PRODUCTION only when the user explicitly says "production" or asks about the live site separately from the PR.
+**Preview-link list format.** Every URL must be ONE click — full path + `?_vercel_share=&login=demo` (auth-required), or `&logout=1` (logged-out state), or both as TWO rows (HYBRID pages). NEVER output "click here to set the bypass cookie, then bare URLs." Format: single markdown table with columns `Page | State | What changed`.
+
+**Subagents and skills.** Project-local subagents in `.claude/agents/`: `voice-critic`, `cold-stranger-ux`, `pr-comment-triager`, `test-auditor`. Project-local skills in `.claude/skills/`: `qa-editorial`, `verify-slide`. **Use gstack first for generic work** (`/review`, `/design-review`, `/qa`, `/cso`, `/investigate`, `/office-hours`, `/plan-ceo-review`, `/ship`, `/context-save`, `/context-restore`). Then run `/qa-editorial` for the Wishonia-voice / cold-stranger / parameter-citation layer gstack can't see.
+
+**Gstack memory split.** Two memory systems coexist. **Behavioral feedback** (rules about how the agent should act) → `C:/Users/m/.claude/projects/E--code-optimitron/memory/feedback_*.md`, indexed in `MEMORY.md`. **Codebase / environment facts** → `gstack-learnings-log` via `~/.claude/skills/gstack/bin/gstack-learnings-log '{"skill":"...","type":"pattern|pitfall|preference|architecture|tool|operational","key":"slug","insight":"...","confidence":1-10,"source":"observed"}'`. Auto-loaded at every gstack skill start. The injection scrubber rejects insights containing "override", "do not flag", "ignore previous" — rephrase.
 
-**Preview-link list format.** When generating a review-link list for the user: every URL must be ONE click — full path + `?_vercel_share=&login=demo` (auth-required), or `&logout=1` (testing logged-out state), or both as TWO rows (HYBRID pages that render differently per auth state). NEVER output "click here to set the bypass cookie, then bare URLs" — that defeats the entire reason `?login=demo` / `?logout=1` query params exist. Format: a single markdown table with columns `Page | State | What changed`. State = "logged-out" / "demo logged-in".
+**Gstack artifacts sync.** `~/.gstack/` is a git repo pushing to https://github.com/mikepsinn/gstack-artifacts-mikepsinn (private). Learnings + checkpoints + plans sync across machines via HTTPS push. Laptop bootstrap commands live in SETUP.md.
 
-**Subagents** live in `.claude/agents/`: `voice-critic` (post-UI copy critique), `pr-comment-triager` (bot-review triage), `test-auditor` (suite slop + missing coverage). Their `.md` files have the full instructions.
+**Codex delegation.** Programming work goes to Codex agents by default; meta-config (this file, `.codex/config.toml`, hooks) Claude edits directly. Full protocol in [`.claude/codex-delegation.md`](.claude/codex-delegation.md).
 
-**Employees, not opponents.** Frame leader outreach as "remind your overdue presidents/employees," never "pressure politicians." They are paid by the citizenry to promote welfare and are late on a 30-second task. Banned: "pressure," "political pressure," "pressure surface/machine," "applied pressure" when referring to leaders.
+**Employees, not opponents.** Frame leader outreach as "remind your overdue presidents/employees," never "pressure politicians." Banned: "pressure," "political pressure," "pressure surface/machine," "applied pressure" when referring to leaders.
 
 **Apply to:** all user-facing copy. **Not to:** CLAUDE.md, code comments, README.
 
 ## Papers (algorithm source of truth)
 
-Read the relevant section before implementing. QMDs contain the math, schemas, parameter values, and worked examples.
-
 | Package      | Paper                                                     | URL                                  |
 | ------------ | --------------------------------------------------------- | ------------------------------------ |
 | `optimizer`  | dFDA Spec — PIS, temporal alignment, effect size          | https://dfda-spec.warondisease.org   |
@@ -88,16 +85,14 @@ Read the relevant section before implementing. QMDs contain the math, schemas, p
 | Welfare      | Optimocracy — two-metric welfare function                 | https://optimocracy.warondisease.org |
 | Treasury     | Incentive Alignment Bonds                                 | https://iab.warondisease.org         |
 
-Source QMDs: `github.com/mikepsinn/disease-eradication-plan/blob/main/knowledge/appendix/`. Read the section you need, not the whole file.
-
-## Research Tools (use these before guessing)
+Source QMDs: `github.com/mikepsinn/disease-eradication-plan/blob/main/knowledge/appendix/`.
 
-Before grepping random files or guessing at facts about the manual, plan, or parameters, use the MCP server tools already wired up in `.mcp.json` as `optimitron-tasks`:
+## Research Tools
 
-- **`mcp__optimitron-tasks__searchManual`** — `{ query, maxResults? }` → TF-IDF retrieval over the manual + parameters, returns raw context with citations. **Use first** for any factual question ("what's the current DALY burden?", "where does the 0.5% tx tax come from?"). No Gemini cost.
-- **`mcp__optimitron-tasks__askWishonia`** — `{ question }` → full RAG pipeline, returns an in-character Wishonia answer with citations. Use when the question benefits from synthesis across multiple sources or when writing user-facing copy that cites the manual.
+- **`mcp__optimitron-tasks__searchManual`** — `{ query, maxResults? }` → TF-IDF retrieval over the manual + parameters. **Use first** for any factual question. No Gemini cost.
+- **`mcp__optimitron-tasks__askWishonia`** — `{ question }` → full RAG pipeline, in-character answer with citations. Use when synthesis across sources or when writing user-facing copy that cites the manual.
 
-The server is defined in `packages/web/src/lib/mcp-server.ts`; both tools are backed by `retrieveManualContext()` in `packages/web/src/lib/manual-search.server.ts`. There is no CLI wrapper — the MCP tools are the interface.
+Both backed by `retrieveManualContext()` in `packages/web/src/lib/manual-search.server.ts`. No CLI wrapper.
 
 ## Architecture
 
@@ -124,30 +119,10 @@ optimitron/packages/
 **Hard rules:**
 
 - `optimizer` depends on nothing. **Domain-agnostic** — never reference "drugs", "policies", "budgets", "politicians". Use: predictor, outcome, variable, measurement, effect.
-- Library packages (`optimizer`, `wishocracy`, `opg`, `obg`, `data`, `agent`, `hypercerts`, `storage`) must be runtime-safe: no Prisma client, no runtime DB imports, must work in the browser. They may `import type` from `@optimitron/db` only. -- `@optimitron/db` exports pure TS interfaces (all packages), Zod schemas (namespaced `schemas`, runtime boundaries only), and the Prisma client (**web/API layer only**). `db` may consume curated catalogs from `data` when that removes duplication. +- Library packages (`optimizer`, `wishocracy`, `opg`, `obg`, `data`, `agent`, `hypercerts`, `storage`) must be runtime-safe: no Prisma client, no runtime DB imports, must work in the browser. `import type` from `@optimitron/db` only. +- `@optimitron/db` exports pure TS interfaces, Zod schemas (namespaced `schemas`, runtime boundaries only), and the Prisma client (**web/API layer only**). `db` may consume curated catalogs from `data`. - **Prisma 7** + `@prisma/adapter-pg`. The `datasource` block in `schema.prisma` intentionally omits `url` — the connection is configured at runtime via the adapter. **Never** add `url = env("DATABASE_URL")`. - -## Core Insight: Optimizer is Universal - -`@optimitron/optimizer` takes any two time series and answers: does X cause Y, by how much, what's the optimal X. Pipeline: **temporal alignment → Bradford Hill → Predictor Impact Score → optimal value.** - -| Domain | Predictor | Outcome | Question | -| ------------- | --------------- | ----------------- | -------------------------------- | -| Health | Drug/Supplement | Symptom/Biomarker | Does magnesium improve sleep? | -| Policy | Policy change | Welfare metric | Does tobacco tax reduce smoking? | -| Budget | Spending level | Welfare metric | Optimal education budget? | -| Business | Ad spend | Revenue | Optimal marketing budget? | -| Agriculture | Fertilizer | Crop yield | Optimal fertilizer level? | -| Manufacturing | Temperature | Defect rate | What minimizes defects? 
| - -A business analyst should be able to `npm install @optimitron/optimizer` for revenue optimization without ever seeing the word "government". - -## Jurisdiction Model ("Government OS") - -Any jurisdiction (city, county, state, country) deploys Optimitron as its governance OS. The libraries are already jurisdiction-agnostic; the jurisdiction-specific parts are **configuration, not code**. Think Shopify for governments. - -Every DB model has a `jurisdictionId`. Items, officials, data fetchers all scope to jurisdiction. Cross-jurisdiction comparison ("City A spends X on education and gets Y; City B spends Z...") is a first-class feature. `optimizer`/`wishocracy`/`opg`/`obg` are already jurisdiction-agnostic; `web` handles multi-tenancy (auth, routing, tenant isolation). +- Every DB model has a `jurisdictionId`; libs are jurisdiction-agnostic, `web` handles multi-tenancy. ## Treasury: Three Independent Mechanisms @@ -155,80 +130,77 @@ Don't mix them. Don't put one on another's page. Don't conflate their economics. | Mechanism | Page | Purpose | Contracts | Flow | | --------------------------------------- | ----------- | ------------------------------------------------------- | ---------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | -| **Earth Optimization Prize** (Phase 1) | `/prize` | Fund referendum proving demand for the 1% Treaty | `VoteToken`, `VoterPrizeTreasury` (Base Sepolia) | Deposit USDC → Aave yield → share referral → World ID voters → referrer earns VOTE 1:1. **Success:** VOTE holders claim prize share. **Failure** (15yr): depositors claim principal + ~4.2× yield (`$100 × 1.10^15 = $418`). 
Dominant assurance — break-even P = 0.0067%, zero downside. In campaign mode, this supports referral incentives; treaty-vote conversion takes precedence. | -| **Incentive Alignment Bonds** (Phase 2) | IAB pages | Raise ~$1B to lobby the 1% Treaty once demand is proven | `IABVault`, `IABSplitter`, `PublicGoodsPool` | Investors buy bonds → capital funds lobbying → treaty passes → $27B/yr splits 80% trials / 10% investors (272% annual) / 10% aligned-politician super PACs. **If treaty fails, bonds lose everything.** Not an assurance contract. Real investment, real risk. | +| **Earth Optimization Prize** (Phase 1) | `/prize` | Fund referendum proving demand for the 1% Treaty | `VoteToken`, `VoterPrizeTreasury` (Base Sepolia) | Deposit USDC → Aave yield → share referral → World ID voters → referrer earns VOTE 1:1. **Success:** VOTE holders claim prize share. **Failure** (15yr): depositors claim principal + ~4.2× yield (`$100 × 1.10^15 = $418`). Dominant assurance — break-even P = 0.0067%, zero downside. | +| **Incentive Alignment Bonds** (Phase 2) | IAB pages | Raise ~$1B to lobby the 1% Treaty once demand is proven | `IABVault`, `IABSplitter`, `PublicGoodsPool` | Investors buy bonds → capital funds lobbying → treaty passes → $27B/yr splits 80% trials / 10% investors (272% annual) / 10% aligned-politician super PACs. **If treaty fails, bonds lose everything.** Not an assurance contract. | | **$WISH Token / UBI** | `/treasury` | Replace welfare + IRS + inflationary monetary policy | `WishToken`, `WishocraticTreasury`, `UBIDistributor` | Flat 0.5% tx tax (no income tax/filing), UBI at poverty line, algorithmic 0% inflation, tx taxes + productivity gains allocated by 8B people via Wishocracy RAPPA. | Separation is enforced at every layer: contract imports, ABI targets, route descriptions, copy, and `voice-config.ts` (which explicitly gags Wishonia from mentioning IABs on prize pages). 
Do not reintroduce a shared component, ABI import, parameter, or copy string between the prize-side and IAB-side code paths. -Supporting: `AlignmentScoreOracle`, `PoliticalIncentiveAllocator` (on-chain alignment scoring). - ## Display Identity: Person owns it -`Person` owns every public-facing identity field: `displayName`, `handle`, `image`, `bio`, `headline`, `coverImage`, `website`, `isPublic`. `User` is the auth/account record — credentials, preferences, demographics, geo. There is **no mirror, no fallback, no transitional state**: any display read goes through `Person`. +`Person` owns every public-facing identity field: `displayName`, `handle`, `image`, `bio`, `headline`, `coverImage`, `website`, `isPublic`. `User` is the auth/account record. **No mirror, no fallback** — display reads go through `Person`. - **Reads:** `getUserDisplayName/Handle/Avatar/Href/Label` from `@/lib/user-display`. Helpers read `user.person.X` only. -- **Queries:** spread `userDisplaySelect` into the Prisma select (joins Person automatically). It selects the User keys (`id`, `email`) plus the Person fields the helpers need. +- **Queries:** spread `userDisplaySelect` into the Prisma select (joins Person automatically). - **URLs:** `getPersonHref(person)` from `@/lib/person-href`. Never `/people/${id}`. - **Profile edits:** `/api/dashboard/profile` writes Person directly. Handle uniqueness checks `Person.handle`. -- **OAuth/signup:** the auth adapter and credentials signup route create the User with auth fields only, then call `ensurePersonForUser(userId, { displayName, image })` which seeds the Person with a unique handle. +- **OAuth/signup:** the auth adapter and credentials signup route create the User with auth fields only, then call `ensurePersonForUser(userId, { displayName, image })`. ## Page Metadata -`packages/web/src/lib/routes.ts` is the single source of truth for page titles + descriptions. Each `NavItem` has `label`, `description`, `emoji`. 
Pages use `getRouteMetadata(link)` from `@/lib/metadata.ts`. All descriptions in Wishonia's voice. +`packages/web/src/lib/routes.ts` is the single source of truth for page titles + descriptions. Each `NavItem` has `label`, `description`, `emoji`. Pages use `getRouteMetadata(link)` from `@/lib/metadata.ts`. Descriptions in Wishonia's voice. ## Task Tree -The task tree has a single root: `optimize-earth` (taskKey `program:optimize-earth`). Both values come from `OPTIMIZE_EARTH_ROOT_TASK_ID` / `OPTIMIZE_EARTH_ROOT_TASK_KEY` exported from `@optimitron/db` — the prisma seed and web code import the same constants, so the literal string only exists in one place. Every other program is a child of the root because every program is a bet on moving the two welfare numbers — median healthy life-years and median income — toward their 2040 targets. The tree _is_ the persuasion argument: walking up the parent chain from any claimable task lands a voter on their primary motivator. +Single root: `optimize-earth` (taskKey `program:optimize-earth`). Both values come from `OPTIMIZE_EARTH_ROOT_TASK_ID` / `OPTIMIZE_EARTH_ROOT_TASK_KEY` in `@optimitron/db`. Every program is a child because every program is a bet on moving HALE or median income toward 2040 targets. The tree _is_ the persuasion argument. -- **Targets**: `earthOptimizationPrizeWinCondition` in `packages/data/src/parameters/earth-optimization-prize.ts`. Single source of truth for HALE baseline/target, median-income baseline/target, and the 2040 deadline. Reads from the generated `TREATY_*` parameter constants — do not duplicate the numbers anywhere else. Manual refs: `manual.warondisease.org/knowledge/strategy/earth-optimization-prize.html`, `.../economics/gdp-trajectories.html`. -- **Attribution**: use `computeParentContributionShare(parent, child)` in `packages/web/src/lib/tasks/impact.ts`. Computes `child.delta / parent.delta` for HALE and income. Nothing stored, nothing to drift. 
-- **Adding a new program**: it must be a child of `optimize-earth` (or of one of the programs beneath it; reference via `OPTIMIZE_EARTH_ROOT_TASK_ID`). Do not add a new `parentTaskId: null` task. If a task isn't a bet on HALE or income, it should not exist. -- **Ancestors on task detail**: `getTaskAncestors(taskId)` walks `parentTaskId` up to root (depth-capped, cycle-safe). Use this, not ad-hoc recursive Prisma selects. -- **Onboarding tasks** (dashboard welcome tasks) stay out of this tree — they're private onboarding state, not part of the global prize tree. +- **Targets**: `earthOptimizationPrizeWinCondition` in `packages/data/src/parameters/earth-optimization-prize.ts`. Single source of truth for HALE/income baselines, targets, and the 2040 deadline. Reads `TREATY_*` parameter constants — don't duplicate numbers. +- **Attribution**: `computeParentContributionShare(parent, child)` in `packages/web/src/lib/tasks/impact.ts`. Computes `child.delta / parent.delta`. Nothing stored. +- **Adding a new program**: child of `optimize-earth` or a descendant. No new `parentTaskId: null` task. If a task isn't a bet on HALE or income, it should not exist. +- **Ancestors**: `getTaskAncestors(taskId)` (depth-capped, cycle-safe). Not ad-hoc recursive Prisma selects. +- **Onboarding tasks** stay out of this tree. ## High-Value Defaults -1. **Use feature branches for implementation.** New implementation branches start with `feature/`, followed by a short kebab-case description. Example: `feature/international-campaign-site-name`. -2. **Ship through pull requests.** When feature work is done and checks pass, commit the intended changes, push the branch, and update the existing pull request for that branch or task. Create a new pull request only when no open PR exists for the work. -3. **Watch the PR after every push.** Check GitHub Actions, deployment checks, and review comments. Fix valid failures/comments, push again, and watch again. 
**Triage review comments critically — do not blindly comply with bot reviewers (Codex, Copilot, CodeRabbit, Vercel Agent Review).** For each comment ask: does this point at a real bug that hits a real path, or is it AI slop / hypothetical / style preference / consistency-for-its-own-sake? If the latter, mark the thread resolved with a one-line reason ("hypothetical, no triggering path", "stylistic, current shape is intentional", "already addressed in commit X"). If the former, fix it and mark resolved. Adding code or tests just to silence a bot is worse than the bot's nag — it adds maintenance surface forever in exchange for one-time review noise. The same rule applies to suggestions to extract constants, add symmetry assertions, normalize naming, or split functions for "readability": do them only when they improve the codebase, not because a bot mentioned them. -4. **Never merge pull requests.** Once checks are green and there are no unresolved valid review complaints, report that the pull request is ready and let the user review the diff and merge it. -5. **Respect review-only turns.** If the user asks only for analysis, review, or a proposed copy/design, do not commit or push until they approve implementation or publishing. -6. **Library packages stay runtime-safe.** No Prisma / runtime DB in `optimizer`, `wishocracy`, `opg`, `obg`, `data`, `agent`, `hypercerts`, `storage`. -7. **Zod only at real boundaries** — HTTP, form, MCP, OAuth. Not internal helpers. -8. **Calibrate before major refactors.** For multi-file refactors, deleting abstractions, or replacing auth/security controls, present 2-3 options with your recommendation first. Once a preference is clear for that decision class, proceed without re-asking. +1. **One feature branch, one PR at a time. No git worktrees.** New work waits until current PR merges. Sequential, not parallel. +2. 
**One dev server, always running on 3001.** Claude pre-warms `pnpm --filter @optimitron/web dev:fast > packages/web/.dev-server.log 2>&1` at session start if `curl -sS -m 3 http://127.0.0.1:3001` doesn't return 2xx/3xx. Every dispatched agent reuses it — never spawn its own. If `netstat -ano | findstr :3001` shows the port bound but `curl` fails, kill the PID and restart. Codex dispatch prompts include the log path; agents `tail` it after loading pages because 200 responses can hide runtime errors in stderr. +3. **Feature branches** start with `feature/`, kebab-case. +4. **Ship through pull requests.** Update the existing PR for a branch; create a new PR only when none exists. +5. **Watch the PR after every push.** Fix valid failures/comments. **Triage bot reviewers critically** (Codex, Copilot, CodeRabbit, Vercel Agent Review) — use the `pr-comment-triager` subagent. Adding code or tests just to silence a bot is worse than the bot's nag. +6. **Never merge pull requests.** Report ready; user merges. +7. **Respect review-only turns.** Analysis/review/proposed copy → don't commit or push until approval. +8. **Library packages stay runtime-safe.** No Prisma / runtime DB in `optimizer`, `wishocracy`, `opg`, `obg`, `data`, `agent`, `hypercerts`, `storage`. +9. **Zod only at real boundaries** — HTTP, form, MCP, OAuth. Not internal helpers. +10. **Calibrate before major refactors.** Present 2-3 options with your recommendation first. Once a preference is clear for that decision class, proceed without re-asking. ## UI/UX Rules -The near-term goal is to get a verified majority of humanity to vote for the 1% Treaty. Every UI decision optimizes for voting, referral, endorsement, plaintiff registration, leader pressure, or trust in the quantified case. Decoration loses by default. - -- **Screenshot UI changes.** After changing UI, capture affected pages/states before considering the work complete. 
Inspect screenshots yourself for layout breakage, overlapping text, missing content, broken styling, and responsive problems. -- **Use local before/after review artifacts.** For meaningful visual changes, capture before and after screenshots when feasible and generate a local HTML review page under `packages/web/output/playwright/`, either side-by-side or with the previous screenshot above the updated screenshot for each page/state/viewport. Always write or copy the current review page to `packages/web/output/playwright/review/latest.html` so the user can bookmark one local file and refresh it. Copy referenced screenshot assets beside `latest.html` or rewrite image paths relative to that stable file, then verify the stable page has no broken image references. Link that stable file in the handoff with a clickable local file link and a plain filesystem path. -- **Link the edited local pages.** When UI or route/page changes are ready for review and a local dev server is available, include direct local dev URLs for every edited page or relevant state, such as `http://127.0.0.1:3001/path`, so the user can open the live page in addition to the screenshot review artifact. -- **Baseline screenshot worktrees need built workspace deps.** If you create a clean `git worktree` to capture before screenshots, either run a normal install or run the relevant workspace build after `pnpm install --ignore-scripts`; otherwise packages that export from `dist/` can fail at render time. -- **Run ad-hoc Playwright scripts from `packages/web`.** Use `pnpm --dir packages/web exec node ...` or run from `packages/web` with `pnpm exec` so `@playwright/test` resolves from the web app's dev dependencies. -- **Treat screenshots as sensitive by default.** Local and preview environments may be connected to production or production-derived databases, so screenshots can contain names, emails, tasks, admin data, or other sensitive content. 
-- **Do not commit or upload screenshot artifacts by default.** Keep images and HTML review pages local unless the user explicitly asks and the screenshots are confirmed sanitized. Do not put screenshots, screenshot HTML, or local screenshot/HTML paths in public PR bodies or comments by default; share local review links in chat before committing UI changes and wait for approval unless the user explicitly waives review. -- **Migrate toward the War on Disease treaty style.** New or touched public UI should use the simple black-and-white style used by the `warondisease.org` variant: white paper, black ink, thin black rules, square corners, restrained typography, and no decorative color. Reuse existing primitives only when they render in that style; otherwise simplify the surface instead of adding neobrutalist chrome. -- **Big, clear, legible.** Headings `text-4xl sm:text-5xl md:text-6xl font-black uppercase`. Body `text-base font-bold` minimum. Hero numbers (death counters, cost, time) as large as the viewport allows. -- **Cut ruthlessly.** For every page ask: **what can I remove, hide, or collapse that would increase the chance a human actually completes the task on this page?** Delete it. Collapse secondary info into accordions or sub-pages. One primary CTA per screen, visible without scrolling. -- **Make actions look actionable.** If a link starts or completes a user task, especially an external workflow, render it as a clear button or command control, not only as inline text. Use plain inline links for references, citations, navigation, and secondary reading. If the user is expected to copy an exact value into another site, email, form, wallet, bank portal, legal document, or message, provide a compact copy affordance near that value. Primary task actions get buttons; exact reusable values get copy buttons; explanatory text stays text. -- **No blather.** No "welcome to", "let's take a moment", "in this section we'll", "we're excited to". 
State the fact, state the action, stop. Every word load-bearing. If deleting it doesn't hurt, delete it. Max one adjective per noun. Numbers beat adjectives. A shocking fact beats a paragraph explaining the fact. -- **Completion test:** cover the bottom half of the screen with your hand. If a user seeing only the top half doesn't know what to do next, restructure. +Near-term goal: get a verified majority of humanity to vote for the 1% Treaty. Every UI decision optimizes for voting, referral, endorsement, plaintiff registration, leader outreach, or trust in the quantified case. Decoration loses by default. + +- **Screenshot UI changes.** After changing UI, capture affected pages/states. Inspect for layout breakage, overlapping text, missing content, broken styling, responsive problems. +- **Local before/after review artifacts.** For meaningful visual changes, generate `packages/web/output/playwright/review/latest.html` as a stable side-by-side review page. Copy referenced screenshot assets beside `latest.html`. Link clickable local file + plain filesystem path in the handoff. +- **Link the edited local pages.** Include direct local dev URLs (`http://127.0.0.1:3001/path`) so the user can open the live page alongside the screenshot review. +- **Screenshots are sensitive.** Local/preview envs may hit prod-derived databases; screenshots can contain names, emails, admin data. Don't commit screenshot artifacts or put paths in public PR bodies. Share locally; wait for approval. +- **War on Disease treaty style.** New or touched public UI: white paper, black ink, thin black rules, square corners, restrained typography, no decorative color. When in doubt, simplify rather than add neobrutalist chrome. +- **Big, clear, legible.** Headings `text-4xl sm:text-5xl md:text-6xl font-black uppercase`. Body `text-base font-bold` minimum. Hero numbers as large as the viewport allows. 
+- **Cut ruthlessly.** For every page: **what can I remove or collapse that would increase the chance a human actually completes the task on this page?** One primary CTA per screen, visible without scrolling. +- **Make actions look actionable.** Primary task actions = buttons. Exact reusable values = copy buttons. Inline links for citations / navigation / secondary reading. +- **No blather.** No "welcome to", "let's take a moment", "in this section we'll", "we're excited to". State the fact, state the action, stop. Numbers beat adjectives. +- **Completion test:** cover the bottom half of the screen. If a user seeing only the top half doesn't know what to do next, restructure. ## Testing Rules (non-negotiable) **When to write a test:** -- ✅ Pure functions with fallback/branching logic (helpers, parsers, formatters, selectors) -- ✅ State transitions inside `$transaction` or multi-step DB writes (profile edits, vote tallies, claim status) +- ✅ Pure functions with fallback/branching logic +- ✅ State transitions inside `$transaction` or multi-step DB writes - ✅ Boundary conversions (Prisma row → DTO, OAuth profile → User row, session → client) -- ✅ Regression fixes — failing test before the fix, in the same change -- ❌ Framework passthroughs (wrappers that just call `findUnique`) -- ❌ UI rendering snapshots — brittle, low signal +- ✅ Regression fixes — failing test before the fix, same change +- ❌ Framework passthroughs +- ❌ UI rendering snapshots - ❌ Tests that transcribe the implementation line-by-line -- ❌ Tests added "for symmetry" with another test, "for documentation", "for consistency", or because a bot reviewer asked. If the test would not catch a bug or guard a regression in code we actually ship, do not write it. Maintenance cost is forever; signal is zero. -- ❌ Tests that mock the entire surface they're supposedly testing. 
If you mock `notifyTaskAssigneeOfAssignment` and then assert `notifyTaskAssigneeOfAssignment` was called, the test only verifies you can call the mock. Test the boundary, not the wiring. +- ❌ Tests added "for symmetry", "for documentation", "for consistency", or because a bot reviewer asked +- ❌ Tests that mock the entire surface they're supposedly testing. Test the boundary, not the wiring. **Non-flaky or don't bother:** @@ -236,65 +208,60 @@ The near-term goal is to get a verified majority of humanity to vote for the 1% - No real network / LLM calls — mock at the import boundary - No `Math.random`, `Date.now`, `crypto.randomUUID` in assertions - No relying on Prisma row order unless you `orderBy` -- No shared mutable state between tests — each `it` is independent +- No shared mutable state between tests - If it needs `retry` or `sleep` to pass, it's wrong **Self-verification is mandatory:** - Before handing back any non-trivial change, run the affected package test suite: `pnpm --filter @optimitron/ test` -- If the change touches shared types/schemas, run `pnpm check` across the graph -- Fix every failure yourself. The user reviews working code, not a broken suite. -- If an existing test breaks because of a justified shape change, update it — never `skip` or disable. +- If the change touches shared types/schemas, run `pnpm check` +- Fix every failure yourself. Never `skip` an existing test. - If you can't reproduce a failure locally, say so explicitly. Don't guess-fix. -**Scope:** write the minimum tests that would have caught the bug you just fixed or the regression the change could plausibly introduce. One `describe` per module, one `it` per behavior. Tests read like documentation — name them after behavior, not implementation. +**Scope:** minimum tests that would have caught the bug you fixed or the regression the change could plausibly introduce. One `describe` per module, one `it` per behavior. Name after behavior, not implementation. 
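As a rough illustration of the "pure functions with fallback/branching logic" and "no `Date.now` in assertions" rules above, here is a minimal sketch. The helper `daysUntilDeadline` is hypothetical and does not come from the repo; it only shows one way to keep time-dependent logic testable by passing the clock in as an argument.

```typescript
// Hypothetical helper for illustration only; not a function from the repo.
// The current time is a parameter, so a test never has to call Date.now().
function daysUntilDeadline(deadline: Date, now: Date): number {
  const msPerDay = 24 * 60 * 60 * 1000;
  const remainingMs = deadline.getTime() - now.getTime();
  // Fallback branch: a deadline that has passed reports 0, never a negative count.
  return remainingMs <= 0 ? 0 : Math.ceil(remainingMs / msPerDay);
}
```

A test for this would pin both dates to fixed values and assert one behavior per case (future deadline, passed deadline), which keeps it independent of when the suite runs.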
+ +**Run ad-hoc Playwright scripts from `packages/web`** so `@playwright/test` resolves: `pnpm --dir packages/web exec node ...`. ## Visual Style Rules (enforced) Contrast audit: `pnpm --filter @optimitron/web exec playwright test e2e/contrast-audit.spec.ts --project=default`. -**Default style:** black-and-white treaty/editorial UI. Use semantic tokens and the treaty CSS variables already used by `warondisease.org`: `background`, `foreground`, `border`, `input`, `ring`, `card`, `popover`, `muted`, `muted-foreground`, `primary`, `primary-foreground`, `current`, `inherit`, `transparent`, `var(--treaty-paper)`, `var(--treaty-ink)`, `var(--treaty-ink-soft)`, and `var(--treaty-ink-muted)`. +**Default style:** black-and-white treaty/editorial UI. Use semantic tokens and the treaty CSS variables already used by `warondisease.org`: `background`, `foreground`, `border`, `input`, `ring`, `card`, `popover`, `muted`, `muted-foreground`, `primary`, `primary-foreground`, `var(--treaty-paper)`, `var(--treaty-ink)`, `var(--treaty-ink-soft)`, `var(--treaty-ink-muted)`. -**Migration rule:** when touching public UI, remove neobrutalist styling instead of copying it forward. Replace `brutal-*` fills, oversized hard shadows, colored panels, gradients, thick novelty borders, and rounded cards with the black-and-white treaty tokens above. Admin-only status chips, charts, game/demo/Sierra screens, and email-client markup may keep their own specialized colors when the color carries functional meaning. +**Migration rule:** when touching public UI, remove neobrutalist styling instead of copying it forward. Replace `brutal-*` fills, hard shadows, colored panels, gradients, thick novelty borders, rounded cards with treaty tokens. Admin chips, charts, game/demo screens, and email markup may keep specialized colors. 
**Never use:** - Opacity modifiers on black/white (`text-black/50`, `bg-white/70`) -> `text-muted-foreground` / `text-foreground` -- Hardcoded `bg-white` / `text-white` / `bg-black` / `text-black` in components -> `bg-background`, `bg-foreground`, `text-foreground`, or `text-background` +- Hardcoded `bg-white` / `text-white` / `bg-black` / `text-black` -> `bg-background`, `bg-foreground`, `text-foreground`, or `text-background` - Tailwind color scales (`bg-emerald-100`, `text-gray-500`) -> semantic or treaty tokens - Hardcoded hex (`#ef4444`, `#666`, `#f5f5f5`) -> CSS custom properties - Beige/cream/sand/tan backgrounds -- Gradients, bokeh/orb decoration, illustrative SVG backgrounds, and ornamental color blocks +- Gradients, bokeh/orb decoration, illustrative SVG backgrounds, ornamental color blocks - New `brutal-*` tokens on public treaty/campaign surfaces -- Hard offset shadows and soft shadows on public treaty/campaign surfaces -- Rounded cards and large radii; use square corners (`rounded-none`) unless an existing form primitive requires a tiny control radius -- **Exception:** `emails/` may use inline hex because email clients require it. +- Shadows on public treaty/campaign surfaces +- Rounded cards and large radii; use square corners (`rounded-none`) +- **Exception:** `emails/` may use inline hex. ## Design Primitives -Reference implementation: the current `warondisease.org` variant and its treaty document surfaces. - -Use primitives for behavior and accessibility, not for inherited decoration. Prefer simple semantic markup with `bg-background text-foreground border-foreground` when the existing primitive would add color, hard shadows, arcade motion, or neobrutalist framing. - -### RetroUI (`components/retroui/`) - -Use existing RetroUI controls for forms, dialogs, menus, tooltips, tables, accordions, tabs, alerts, avatars, progress, breadcrumbs, calendars, carousels, commands, loaders, and charts when they already fit the black-and-white token system. 
Keep the compound pattern: ``, not ``. +Use primitives for behavior and accessibility, not for inherited decoration. Prefer simple semantic markup with `bg-background text-foreground border-foreground` when the existing primitive would add color, hard shadows, or neobrutalist framing. -### Domain primitives (`components/ui/`) +**RetroUI (`components/retroui/`):** use existing controls (forms, dialogs, menus, tooltips, tables, accordions, tabs, alerts, avatars, progress, breadcrumbs, calendars, carousels, commands, loaders, charts) when they fit the black-and-white tokens. Keep the compound pattern: ``, not ``. -Use domain primitives only when they help structure the page without adding colored neobrutalist styling. Avoid `BrutalCard`, colored `StatCard` variants, `ArcadeTag`, hard-shadow CTA blocks, and other legacy brutal/demo styling on public treaty/campaign pages. When touching those pages, migrate toward unframed sections, thin bordered tables, simple counters, and document-like layouts. +**Domain primitives (`components/ui/`):** avoid `BrutalCard`, colored `StatCard` variants, `ArcadeTag`, hard-shadow CTA blocks on public treaty/campaign pages. Migrate to unframed sections, thin bordered tables, simple counters, document-like layouts. -### Styling conventions +**Styling conventions:** -- **Borders:** use `border` or `border-2` with `border-foreground`/`border-border`. Avoid `border-4` unless the existing surface is explicitly admin/game/demo. -- **Shadows:** no shadows by default. Do not add hard-offset or soft shadows to treaty/campaign UI. -- **Hover:** keep it quiet: underline links, invert black/white buttons, or use `bg-muted`. Avoid push-down arcade motion. -- **Typography:** headings `font-black uppercase`; body `font-bold` (700) minimum — never `font-medium/normal/light`; subtle text `text-muted-foreground font-bold`. -- **Sections:** use white/background bands and black rules. Do not alternate colored brutal sections on public treaty/campaign pages. 
+- **Borders:** `border` or `border-2` with `border-foreground`/`border-border`. No `border-4` outside admin/game/demo. +- **Shadows:** none by default on treaty/campaign UI. +- **Hover:** underline links, invert black/white buttons, or `bg-muted`. No push-down arcade motion. +- **Typography:** headings `font-black uppercase`; body `font-bold` (700) minimum; subtle text `text-muted-foreground font-bold`. +- **Sections:** white/background bands and black rules. No alternating colored brutal sections on public pages. ## Environment Variables -All env vars in **root `.env`** (not `packages/web/.env`). Next.js picks them up via the workspace. Local dev: `NEXTAUTH_URL=http://localhost:3001`. +All env vars in **root `.env`** (not `packages/web/.env`). Local dev: `NEXTAUTH_URL=http://localhost:3001`. ## Tooling @@ -306,24 +273,22 @@ All env vars in **root `.env`** (not `packages/web/.env`). Next.js picks them up ## Type Safety & Linting -Before handing back any change: `pnpm check` (typecheck + lint + test). Fix every failure yourself. The user reviews working code, not a broken suite. +Before handing back any change: `pnpm check` (typecheck + lint + test). Fix every failure yourself. -- **Never** run `pnpm build` / `next build` — the dev server handles compilation. Only run a full build if explicitly asked. -- `tsc` on a single file doesn't work (jsx/alias); use the project-level `pnpm check` or `pnpm --filter @optimitron/ exec tsc --noEmit`. +- **Never** run `pnpm build` / `next build` — the dev server handles compilation. Only on explicit ask. +- `tsc` on a single file doesn't work (jsx/alias); use `pnpm check` or `pnpm --filter @optimitron/ exec tsc --noEmit`. - **TypeScript strict mode ON** — `noUncheckedIndexedAccess`, `noImplicitOverride`. ESLint strict. - **No `any`** — use proper types or `unknown` with guards. No floating promises. No unused vars (prefix `_` if intentional). All tsconfigs extend `tsconfig.base.json`. 
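The "no `any`, use `unknown` with guards" and `noUncheckedIndexedAccess` rules above can be sketched as follows. The `Measurement` type and all three functions are assumptions made up for this example, not code from the repo; the point is only the narrowing pattern.

```typescript
// Hypothetical shape for illustration only; not a type from the repo.
interface Measurement {
  value: number;
  unit: string;
}

// Type guard: narrows `unknown` to Measurement without reaching for `any`.
function isMeasurement(input: unknown): input is Measurement {
  if (typeof input !== "object" || input === null) return false;
  const record = input as Record<string, unknown>;
  return typeof record["value"] === "number" && typeof record["unit"] === "string";
}

// Under `noUncheckedIndexedAccess`, `items[0]` has type `Measurement | undefined`,
// so the undefined case has to be handled before the value is used.
function firstValueOr(items: readonly Measurement[], fallback: number): number {
  const first = items[0];
  return first === undefined ? fallback : first.value;
}

// Treat parsed JSON as `unknown`, then narrow it with the guard.
// Note: JSON.parse itself still throws on malformed input.
function parseMeasurement(json: string): Measurement | null {
  const parsed: unknown = JSON.parse(json);
  return isMeasurement(parsed) ? parsed : null;
}
```

Keeping the guard separate from the parser means the same check can be reused wherever untyped data enters the code.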
-## Self-Review: Be Ruthlessly Critical +## Self-Review -Before picking a new task, scan with fresh eyes. **Delete on sight:** +- **Magic numbers** — named constants citing the paper. +- **Stale TODOs** — do it or delete it. +- **Wrong abstractions** — 8-parameter functions, methods mixing concerns. Replace. +- **What good looks like:** functions fit on one screen; module names tell you what's inside; tests read like documentation; no unnecessary abstractions — functions taking data, returning results. -- **Dead code** — unused imports, unreachable branches, commented-out code -- **Copy-paste** — extract to shared function -- **Over-engineering** — abstract bases nobody extends, factories creating one thing -- **Wrong abstractions** — 8-parameter functions, methods mixing concerns -- **Magic numbers** — named constants citing the paper -- **Stale TODOs** — do it or delete it +## gstack (REQUIRED — global install) -**Simplicity test:** could a junior developer understand this in 30 seconds? If not, simplify. This should not feel like enterprise Java. +If `~/.claude/skills/gstack/bin` is missing, STOP and tell the user to install (see SETUP.md or `https://github.com/garrytan/gstack`). Do not skip skills, ignore gstack errors, or work around missing gstack. -**What good looks like:** functions fit on one screen; files stay reviewable (30/300 line thresholds are smells, not caps); module names tell you what's inside; tests read like documentation; no unnecessary abstractions — just functions taking data and returning results. +After install, use gstack skills (`/qa`, `/ship`, `/review`, `/investigate`, `/browse`) for generic work. Use `/browse` for all web browsing. diff --git a/TODO.md b/TODO.md index 96f3039e2..cec2956e8 100644 --- a/TODO.md +++ b/TODO.md @@ -1,1050 +1,303 @@ -# Optimitron TODO — 4 Billion Votes on the 1% Treaty +# Optimitron TODO - 4B Votes on the 1% Treaty -This file is the working priority list. 
The North Star is the only thing on the site: -get majority of humans on the 1% Treaty referendum at warondisease.org. Everything -else either feeds that funnel or is parked. +This file is the working priority list. If Mike opens it cold, the next useful +work should be visible near the top. -Old multi-page TODO contents (sprint plans S1-S10, multi-agent architecture, dead-people-voting PRD, -DIH migration notes, code-review-fix lists from 2026-04-29) are in git history. Recover with -`git show :TODO.md`. Do not paste them back here unless they actively block 4B. +Old sprint plans, session journals, stale PR checklists, and migration notes are +in git history. Recover with `git show :TODO.md` only when they +directly unblock the 4B-voter campaign. ## North Star -- **Goal:** ~4B votes on the 1% Treaty referendum (majority of humans on Earth). -- **Math:** 32 doubling rounds × 2 referrals each ≈ 4.3B reached. -- **Primary site:** `warondisease.org` - the public website for the - International Campaign to End War and Disease. Treaty text host: - `1percenttreaty.org`. App/proof engine: `optimitron.com`. -- **Tree:** every task on the site is a child of `optimize-earth` - (`OPTIMIZE_EARTH_ROOT_TASK_ID` exported from `@optimitron/db` and re-exported - by `packages/web/src/lib/tasks/task-keys.ts`). -- **Canonical mission tree target:** `Optimize Earth` → `End War and Disease` - → the Court of Humanity / treaty workstreams below. The public site can - still surface "Sign the 1% Treaty" first; the tree exists so tasks, agents, - APIs, and dashboards share one mental model. - -## Strategic Frame (2026-05-08) - -Until the 1% Treaty passes, this repo is in campaign mode. - -- `warondisease.org` is the front door. It should get a human to vote, - recruit two more humans, get an organization to join, register a plaintiff, - or pressure a leader. 
-- `optimitron.com` is the operating system and evidence layer behind the - campaign: tasks, communications, referrals, OPG/OBG/Wishocracy, politician - grading, impact math, and AI-agent coordination. -- Development defaults and PR visual reviews should put the War on Disease - variant first. Secondary variant galleries for Optimitron, dFDA, and DIH are - useful regression checks, but they should not be the main review burden while - the campaign is the bottleneck. -- Park broad platform work unless it directly improves vote conversion, - referral propagation, organization endorsement, leader pressure, plaintiff - registration, search/indexing discoverability, or trust in the quantified - model. -- Do not move Optimitron's governance/proof systems onto the campaign homepage. - Link to them when they make the campaign more credible; otherwise keep the - campaign surface focused on action. - -## Current State Snapshot - -- War on Disease is the default product focus; development and PR review should - keep that variant first. -- The treaty vote, referral attribution, post-vote share flow, and organization - endorsement flow exist. -- `optimize-earth` exists as the root task key/id, and the canonical campaign - task tree now syncs through managed data so source-controlled data, - production rows, MCP, API, and pages cannot drift. -- The managed canonical task sync work from `feature/managed-task-tree-sync` is now - on `main` via `PR #71` and drives production deploy via CI. -- `/humanity-v-government` renders the operational case. `/court` exists but - still needs the live Court surface, plaintiff/juror counter, and final - treaty-as-verdict framing. -- MCP task assignment email, inbound reply fan-out, and the focused round-trip - integration test have shipped. -- Visual review exists, but still needs cache-busted review URLs, fewer missing - before/after pairs, direct preview links, and deterministic animation settling. 
- -## Gaps blocking 4B - -Ordered by funnel-stage impact. P0 = ship next; P1 = right after; P2 = before launch. - -### P0 — Confirm preview build memory after generated-data type-graph cleanup - -- PR #70 removed the generated country-panel/government-leader import path from - client task rows and the treaty reminder composer, then removed the broad - data-fetcher/type imports that pulled the generated country panel and median - income datasets into Next type validation. Local `next build` now keeps those - giant generated datasets out of static client chunks and the Next type graph. -- Watch the next Vercel preview build. If it still OOMs, continue from evidence: - lower Next worker concurrency and trim remaining server-only generated data - bundles before considering paid larger builders. - -### P0 — Managed canonical data sync (seed replacement for semi-permanent rows) - -**Problem decided 2026-05-10:** normal Prisma migrations are the wrong tool for -every title/task-tree/court/trigger tweak. Production-worthy data belongs in -managed sync, not in a separate seed-only path. Missing-from-manifest also -cannot safely imply "delete this row" because user-created records live in the -same tables. - -**Target pattern:** source-controlled managed data with an idempotent sync script. - -- Create `packages/db/managed-data/` for canonical app data: - `optimize-earth-task-tree.ts`, `task-triggers.ts`, `referendums.ts`, - `court-cases.ts` as needed. -- Create `packages/db/scripts/sync-managed-data.ts` with `--dry-run` and - `--apply`. It upserts by stable ids/keys, updates only managed fields, and - soft-deletes/disabled rows only when a source record explicitly says - `retired: true`. -- Add a package/root script such as `pnpm db:sync:managed-data`. -- Run it on production deploy after `pnpm db:deploy` and before Vercel deploy. -- Keep `packages/db/prisma/seed.ts` only as Prisma's tiny entrypoint shim. 
- Managed sync is the source of truth for local, CI, preview, and production. -- Do not use "record missing from manifest" as a global delete rule. Deletion is - safe only inside a named managed collection, and only for rows previously owned - by that collection or explicitly marked retired. -- Managed sync must never touch user-created tasks, comments, claims, votes, - plaintiffs, represented people, donations, or task rows outside its collection. - -**First managed collection:** Optimize Earth task tree. Replace the interrupted -seed-only cleanup approach with this, then retire old direct children like -`dfda` / `bed-nets-funding-gap` through managed data rather than bespoke -migrations for every future edit. - -**Branch status:** `main` now includes this work in `PR #71` (`feature/managed-task-tree-sync` -merged). Managed-task sync for the canonical `Task` tree and task trigger blueprints -ships with dry-run/apply modes, seed reuse, and production deploy wiring. - -**Testing:** one focused unit/integration test for sync semantics: - -- upsert creates/updates managed task fields by id/taskKey; -- `retired: true` soft-deletes a managed row; -- user-created/unmanaged rows are untouched; -- `--dry-run` reports changes without writing. - -### P0 — Canonical Optimize Earth task tree - -Use managed data, not hard-coded page data and not seed-only drift, to publish -the durable tree: - -```text -Optimize Earth -└─ End War and Disease - ├─ Establish the Court of Humanity - │ ├─ Adopt the Court of Humanity charter - │ └─ Prosecute Humanity v. Governments of Earth - │ ├─ Register plaintiffs - │ ├─ Summon jurors - │ ├─ Publish evidence and damages - │ ├─ Render the verdict - │ └─ Enforce the settlement: the 1% Treaty - └─ Ratify the 1% Treaty - ├─ Get a majority of humanity to vote yes - └─ Get 193 heads of government to sign -``` - -Notes: - -- `Optimize Earth` is the root/system task. 
"Promote the general welfare" stays - in the description/legal frame, not as the primary task title. -- `End War and Disease` is the human-facing mission under the root. -- `Establish the Court of Humanity` is a real institution-building parent task, - not just a slogan. It should have concrete outputs such as charter/rules. -- Plaintiffs and jurors are specific to the case, so they belong under - `Prosecute Humanity v. Governments of Earth`, not directly under the Court. -- Do not add a useless "assemble plaintiffs and jurors" parent. `Register - plaintiffs` and `Summon jurors` are separate sibling tasks. -- `Ratify the 1% Treaty` is both a sibling workstream and the settlement/remedy - for the case. If the database needs the relationship without duplicate tree - parents, model it with an edge/remedy reference, not a second copy. -- The concrete government-side task wording is "Get 193 heads of government to - sign", not vague "get governments to adopt the treaty". -- `dfda` / bed-nets benchmark tasks should not be direct children of the current - War on Disease mission tree. Keep dFDA as a supporting product/page elsewhere; - bed nets can remain benchmark/reference material, not a primary campaign task. -- Primary action links should stay in the same managed source as the tree, using - the existing task communication/action endpoint if it is enough: - `End War and Disease` -> `/`, `Establish the Court of Humanity` -> `/court`, - `Prosecute Humanity v. Governments of Earth` -> `/humanity-v-government`, - `Register plaintiffs` -> `/plaintiffs`, `Ratify the 1% Treaty` -> `/vote`, - `Get 193 heads of government to sign` -> `/employees`, and `Summon jurors` - -> the invite/referral route once it exists. - -### Current implementation order (decided 2026-05-10) - -This is the canonical near-term order. Older detailed sections below are -supporting detail or parked work; they should not override this sequence. - -1. 
[x] **Do not ship the interrupted seed-only cleanup as the long-term pattern.** - Either replace the local partial seed/migration/test changes with managed - data, or explicitly throw them away before the next task-tree patch. -2. [x] **Build managed-data sync for tasks first.** Keep scope narrow: - `Task` rows, primary task communication endpoint, parent-child links, - explicit retire flags, dry-run/apply. -3. [x] **Move the Optimize Earth tree into managed data.** Sync production so - `Optimize Earth` becomes the root title, `End War and Disease` becomes the - primary mission child, Court/case/treaty tasks exist, and obsolete direct - benchmark children are retired. -4. [x] **Wire production deploy to run managed-data sync.** This prevents future - canonical task/title/trigger changes from requiring one-off data migrations. -5. [~] **Then update the UI presentation.** Dashboard is now a focused share - card + collapsed disclosures (PR #71); /tasks/optimize-earth and the - visual-review routes still want a simplified tree view. War on Disease - still pushes the 1% Treaty vote first. -6. [—] **`allowsUserSubtasks` schema column — parked.** Existing schema is - sufficient to add subtasks when needed. Revisit only when public subtask - creation UI is on the immediate roadmap. -7. [x] **Fold task triggers into managed data after the task-tree sync is proven.** - Trigger definitions are the same kind of semi-permanent app data and no - longer use a separate one-off production seed path. - -### Recently discussed but not yet implemented - -This is the compaction-safe backlog of chat decisions that have not obviously -landed yet. Some items also have detailed sections below; this list is the -cross-check so they do not disappear into chat history. - -**2026-05-11 session — completed (on feature/treaty-dashboard-message-first / PR #75 unless noted)** - -- [x] Visual-review per-PR persistence (PR #76 merged). 
`peaceiris/actions-gh-pages` with `keep_files: true` to a long-lived `gh-pages` branch; each commit lands at `pr-N//` so older review URLs survive newer pushes. -- [x] LiveCounter visual-review mask (PR #76 merged). Component honors `__OPTIMITRON_VISUAL_REVIEW__` runtime flag, emits both `data-visual-mask="dynamic"` and `data-volatile` for screenshot + markdown-preview tooling. -- [x] Lightbox on visual-review HTML — click a screenshot to open full-viewport, click again for 1:1 zoom, Esc/close button to dismiss. -- [x] Email-template screenshots in visual review. `e2e/email-screenshots.spec.ts` renders magic-link / task-assignment / task-comment-notification / post-vote-share / referral-first-conversion / monthly-chain-digest at 720×1000 and feeds them into the same `screenshots//` tree the review HTML walks. Required adding the spec to `MODE_SPECS.visual` in `run-playwright.mjs`. -- [x] Visual review toolbar: live route-name filter, Expand all / Collapse all, "Only show changed" (actually hides unchanged), `/` keyboard focuses the filter input. -- [x] Per-route "📋 Copy context" button on visual review. Payload includes PR + branch + commit SHA, route + auth state, before/after screenshot URLs, and explicit "please `curl -O` these before responding" instructions for the coding agent. Embedded `data-context` JSON; JS click handler formats markdown and writes to clipboard. -- [x] Inline PR-timeline deployment annotation per commit (Vercel-bot style), replacing the sticky comment. Uses `createDeployment` + `createDeploymentStatus` with `environment: visual-review/pr-N`. -- [x] CSRF flake mitigation: `retries: isCI ? 2 : 0` in `playwright.config.ts`. `tasks-index-auth` had hit `ECONNRESET` on `/api/auth/csrf` three times in one session. -- [x] Cancel-safe gh-pages publish — visual-review publish steps now gated on `!cancelled()` so a concurrency-cancelled run doesn't post a partial review with "62 missing pairs". 
-- [x] Commit-status + deployment annotation only when publish succeeded — `steps.visual_review_pages.outcome == 'success'` gate so reviewers don't click dead links. -- [x] CI baseline loop `--limit 20 → 5` for main `web-visual-review` artifact lookup. The previous successful main run virtually always has the artifact. -- [x] Dashboard share card rewrite (`DashboardShareCard.tsx`). Replaced "Each voter who recruits two more is the campaign." marketing line with: Humanity Manager assignment frame + apocalypse math (122 apocalypses → 12.3× more clinical trials, 443yr → 36yr eradication timeline). Every number sourced from `@optimitron/data/parameters` via `` for citation popovers. -- [x] `/treaty` restored to the original commit-`1c58293e` skim-and-sign layout. Single centered serif headline ("Please quickly skim and sign to end war and disease."), continuous treaty body, single signature box. No stepper, no slide split, no decorative dividers, no competing Court CTA. Added a `/treaty` Playwright regression test (`e2e/treaty-page-structure.spec.ts`) asserting headline + treaty body phrases + Yes/No buttons. -- [x] `/treaty` body fallback. `getReferendumPageContent()` now falls back to bundled `shareableSnippets.onePercentTreatyText.markdown` when the DB row's `bodyMarkdown` is null/empty — previously preview deployments with unseeded DBs rendered only the headline + signature box. -- [x] `/signatories` cleanup — removed top "Public record / Signatories / Humans and organizations…" block and the "Living votes / Represented humans / Memorial votes / Total voices" stats box. Just the leaderboard. -- [x] `/tasks/[id]` cleanup — removed the verbose `
` metadata sidebar (Owner / Progress / Time needed / Area / Completed / Updates) that duplicated header info. Kept Deaths-from-delay + Wasted-by-delay as inline tags above the markdown body. Effort hours moved into the inline header metadata strip. -- [x] `HUMANITY_V_GOVERNMENT_CASE_NAME` canonical constant in `@optimitron/db/task-keys`, sourced by `humanityVGovernmentLink.label`, `/court` page copy, and managed-task-tree titles. Replaces the drift between "Humanity v. Government" and "Humanity v. Governments of Earth". -- [x] CLAUDE.md voice rule reinforced — "Write like Kurt Vonnegut. Plain words. Short declaratives." Button labels and microcopy default to verb-first imperatives; banned list includes "Take ownership", "Engage", "Empower", "Unlock", "Streamline". -- [x] Nav label rename: `tasksLink.label` "Tasks" → "To-Do List for Humanity", CTA "Open Tasks" → "Open the list". -- [x] CodeRabbit cleanup (commit `5872a64b`): visual-review/* deployments excluded from preview-URL discovery; `
` route anchors carry `id="route-"` so copied URLs scroll; `getRecipientReferralUrl` failures no longer abort task-assignment / task-comment notification batches. - -**2026-05-11 session — discussed but not yet implemented** - -- Task-list rows fully clickable. Currently inner `` on the avatar / name traps clicks and navigates to the assignee's person page instead of the task. Task lists (not the detail page) should treat the entire row as a single link to `/tasks/`; assignee navigation lives on the detail page itself. Affects `task-row.tsx` across the `signer` / compact variants — replace inner `` wrappers with non-interactive spans. -- Avatar next to assignee on `/tasks/[id]` header. Currently shows just "Assigned to " as text; should render the assignee's avatar inline so the page matches the visual density of task lists. -- Decide what to do with the task claim button (no consensus yet). Current behavior: logged-out users see nothing, logged-in users see "Claim Task". User flagged the verb "claim" as bad. Two open questions: (1) keep / drop the logged-out sign-in nudge entirely; (2) rename "Claim Task" to a Vonnegut-style verb ("Do this." is the working candidate — NOT "Take this on", that was rejected as corporate-onboarding). -- Reframe `formatEnumLabel(viewerClaim.status)` output in the task-detail viewer state strip — current "Claimed" / "In Progress" / "Completed" / "Verified" labels leak the enum into user copy. -- Remove drop-shadow on the Updates-section "Sign In" button + audit all other buttons that still carry hard-offset / soft shadows. CLAUDE.md already says "no shadows by default" — the Updates Sign-In on a logged-out task page is a known offender. -- Investigate Neon DB branch-per-preview-deployment. 
Currently Vercel previews point at whatever `DATABASE_URL` is set on the preview environment — there's no managed-data sync against a per-PR DB, so previews show stale/missing seed data (which is why the `/treaty` row had a null `bodyMarkdown` and surfaced the page bug above). The Vercel Neon integration creates a branch per PR and runs migrations automatically; main alternative is a `sync-on-preview` workflow step that hits a preview-scoped DB. -- [x] Drop the duplicate dry-run managed-data step from `core-validate`. Web-validate's `--apply` against the freshly-migrated CI Postgres catches the same drift and the dry-run step's diff was always identical (empty CI DB → "would create N rows" every time). Shipped in `0810ecaa`-era; web-validate now logs the plan before applying so the diff is still visible in CI logs. -- [x] `voice-critic` Claude Code subagent at `.claude/agents/voice-critic.md`. Reads diffs against the Vonnegut voice rule, reuse-before-rewrite inventory, ParameterValue rule, peak-commitment rule, and significant-figures floor. Spawn after any user-facing copy / UI change. -- [x] `pr-comment-triager` Claude Code subagent at `.claude/agents/pr-comment-triager.md`. Walks open PR comments, classifies valid vs. AI-slop, fixes valid ones in focused commits, resolves slop with on-thread reasons. Refuses to blindly comply with bots. -- [x] `test-auditor` Claude Code subagent at `.claude/agents/test-auditor.md`. Walks the test suite for the slop patterns the Testing Rules section bans, finds flaky tests in CI history, identifies critical untested paths. Returns delete + add + flaky lists. -- [x] Local review pipeline: `pnpm --filter @optimitron/web review:local` runs `copy:preview` (markdown extract) + Playwright visual regression + builds the review HTML + opens it. Requires `next dev` on :3001 separately. 
(Originally scoped a `review:watch` variant too; dropped pre-ship — full Playwright run is 2-4 min, so debouncing source changes into a watcher would burn cycles on results that arrive after the developer has moved on.) -- Speedup attempt redo: path-filter `web-validate` so non-web PRs short-circuit. Previous attempt (`b50469063`) produced an unparseable workflow file; needs smaller incremental commits this time to isolate which construct GitHub objected to. -- Build a `/dev/email/