Skip to content

feat: migrate redis to postgres#92

Merged
kasin-it merged 3 commits into
devfrom
AIW-74-postgres-migration
Jun 15, 2026
Merged

feat: migrate redis to postgres#92
kasin-it merged 3 commits into
devfrom
AIW-74-postgres-migration

Conversation

@kasin-it

@kasin-it kasin-it commented Jun 10, 2026

Copy link
Copy Markdown
Collaborator

Summary by CodeRabbit

  • Documentation

    • Updated setup guides to use Neon PostgreSQL instead of Upstash Redis as the data store.
    • Added new Neon initialization guide with marketplace setup instructions.
    • Updated learnings documentation with Redis→PostgreSQL migration insights.
  • New Features

    • PostgreSQL-based run registry and post-PR gate store for improved data durability.
    • Enhanced workflow run telemetry with structured database persistence.
  • Chores

    • Updated environment configuration templates and examples for PostgreSQL.
    • Added database schema, migrations, and test infrastructure.

@chatgpt-codex-connector

Copy link
Copy Markdown

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

@coderabbitai

coderabbitai Bot commented Jun 10, 2026

Copy link
Copy Markdown

Review Change Stack

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 79e3a8f6-df50-45f4-958b-f966cac64b49

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Replaces Upstash Redis with Neon Postgres across the entire worker app. Introduces a Drizzle ORM schema with eight tables, SQL migrations, a Postgres-backed RunRegistry adapter and rewritten GateStore, a build-time migration runner with environment-isolation guard, workflow run telemetry persistence, ported E2E helpers, and updated documentation/operator skills.

Changes

Upstash Redis → Neon Postgres migration

Layer / File(s) Summary
Drizzle schema, migrations, DB client, and test harness
apps/worker/src/db/schema.ts, apps/worker/drizzle/0000_elite_paibok.sql, apps/worker/drizzle/0001_abnormal_dreaming_celestial.sql, apps/worker/drizzle/meta/*, apps/worker/drizzle.config.ts, apps/worker/src/db/client.ts, apps/worker/src/db/test-db.ts, apps/worker/package.json
Defines eight Drizzle ORM tables (active_runs, failed_tickets, thread_parents, gate_locks, gate_dedupe, gate_current, env_marker, workflow_runs), generates two SQL migrations with snapshots and journal, adds a lazy getDb() singleton wired to DATABASE_URL, creates a PGlite in-memory test harness applying all migration SQL files, and registers @neondatabase/serverless, @electric-sql/pglite, and drizzle-kit as dependencies with new db:generate/db:migrate scripts.
PostgresRunRegistry adapter (replaces UpstashRunRegistry)
apps/worker/src/adapters/run-registry/postgres.ts, apps/worker/src/adapters/run-registry/postgres.test.ts, apps/worker/src/adapters/run-registry/upstash.ts, apps/worker/src/adapters/run-registry/upstash.test.ts, apps/worker/src/adapters/run-registry/types.ts
Adds PostgresRunRegistry implementing RunRegistryAdapter and ThreadStore using INSERT … ON CONFLICT semantics for atomic claiming, idempotent registration with createdAt refresh, sandbox tracking, failed-ticket upserts, and thread-parent CRUD; removes UpstashRunRegistry and its 226-line test suite; adds 192-line Vitest coverage for all methods.
GateStore rewritten from Redis+Lua to Postgres
apps/worker/src/post-pr-gate/gate-store.ts, apps/worker/src/post-pr-gate/gate-store.test.ts
Replaces Redis Lua-script-based locking/dedupe with SQL upserts guarded by expiresAt < now() set-where predicates on gate_locks, gate_dedupe, and gate_current; adds token-guarded releaseLock, TTL-aware reads, bigint array concatenation for checkRunIds, conditional updateRunIdIfHeadSha, and purgeExpired physical cleanup; constructor changes from { url, token, envPrefix } to (db: Db); adds 200-line Vitest suite.
Build-time migration runner and env_marker isolation guard
apps/worker/scripts/db-migrate.ts, apps/worker/package.json
Adds db-migrate.ts that loads .env.local/.env, skips gracefully without DATABASE_URL, runs drizzle-kit migrate, upserts an env_marker row encoding VERCEL_ENV and normalized endpoint host, then fails the build (exitCode = 1) if a different VERCEL_ENV already owns the same Neon branch; the build script now calls pnpm db:migrate first.
Runtime adapter wiring and env schema
apps/worker/src/lib/adapters.ts, apps/worker/src/lib/step-adapters.ts, apps/worker/src/routes/webhooks/github.post.ts, apps/worker/src/workflows/post-pr-gate.ts, apps/worker/env.ts, apps/worker/env.test.ts, apps/worker/src/lib/dispatch.ts, apps/worker/src/lib/reconcile.ts, apps/worker/src/sandbox/stop-ticket-sandboxes.ts
Swaps UpstashRunRegistry for PostgresRunRegistry(getDb()) in createAdapters() and createStepAdapters(); replaces AI_WORKFLOW_KV_REST_API_URL/TOKEN with DATABASE_URL in the env schema; changes GateStore construction in the webhook handler and runGate workflow to new GateStore(getDb()); updates comments referencing Redis to Postgres.
Workflow run telemetry persistence
apps/worker/src/sandbox/usage.ts, apps/worker/src/lib/telemetry/run-telemetry.ts, apps/worker/src/lib/telemetry/run-telemetry.test.ts, apps/worker/src/lib/telemetry/collect-snapshots.ts, apps/worker/src/lib/telemetry/collect-snapshots.test.ts, apps/worker/src/lib/overview/collect-runs.ts, apps/worker/src/workflows/agent.ts, apps/worker/src/routes/cron/poll.get.ts
Adds phaseCostUsd/computeUsageTotals in usage.ts; introduces upsertRunSnapshots (bulk COALESCE upsert) and recordRunUsage (single-run usage+PR upsert) in run-telemetry.ts; adds collectSnapshots joining active_runs and gate_current to build RunSnapshot rows; hooks recordRunTelemetryStep into agentWorkflow's finally; adds best-effort purgeExpired and upsertRunSnapshots to the cron handler; exports STATUS_MAP and mapWorkflow for reuse.
E2E registry helper and tier-2 test rewiring
apps/worker/e2e/helpers/registry.ts, apps/worker/e2e/helpers/redis.ts, apps/worker/e2e/env.ts, apps/worker/e2e/tier2/us01-*.tsus15-*.ts, apps/worker/scripts/clear-run-registry.ts, apps/worker/.env.example, apps/worker/.env.e2e.example
Adds registry.ts e2e helper (Neon SQL client using e2eEnv.DATABASE_URL) with the same interface as the deleted redis.ts; adds DATABASE_URL to the e2e Zod schema; updates import lines in all 15 tier-2 tests to use registry.js; rewrites clear-run-registry.ts to delete from active_runs, failed_tickets, thread_parents; updates both .env.example files to replace Upstash placeholders with DATABASE_URL.
Documentation and operator skills
README.md, SETUP.md, .claude/skills/init-neon/SKILL.md, .claude/skills/init-env/SKILL.md, .claude/skills/init-upstash/SKILL.md, .claude/skills/init-*/SKILL.md, .claude/learnings.md, docs/superpowers/plans/2026-06-10-redis-to-neon-postgres.md
Rewrites README.md and SETUP.md to describe Neon Postgres as the run-registry backend; removes the init-upstash skill; adds a new init-neon skill covering Marketplace setup, branch-per-environment isolation, and migration verification; updates init-env/init-agent/init-jira/init-slack/init-vcs skills to reference Neon; appends Pino and migration learnings; commits the 1771-line implementation plan.

Sequence Diagram(s)

sequenceDiagram
  participant Build as Vercel Build
  participant dbMigrate as db-migrate.ts
  participant NeonDB as Neon Postgres
  participant agentWorkflow as agentWorkflow
  participant PostgresRunRegistry
  participant GateStore
  participant cronPoll as Cron Poll

  Build->>dbMigrate: pnpm db:migrate
  dbMigrate->>NeonDB: drizzle-kit migrate (CREATE TABLE …)
  dbMigrate->>NeonDB: INSERT env_marker ON CONFLICT DO NOTHING
  NeonDB-->>dbMigrate: stored env + endpoint_host
  dbMigrate-->>Build: success or exitCode=1 (branch conflict)

  agentWorkflow->>PostgresRunRegistry: claim(ticketKey, runId)
  PostgresRunRegistry->>NeonDB: INSERT active_runs ON CONFLICT DO NOTHING
  NeonDB-->>PostgresRunRegistry: claimed / already held
  agentWorkflow->>agentWorkflow: finally — computeUsageTotals
  agentWorkflow->>NeonDB: recordRunUsage → workflow_runs UPSERT

  Note over GateStore,NeonDB: GitHub webhook path
  GateStore->>NeonDB: INSERT gate_locks ON CONFLICT setWhere(expired) DO UPDATE
  GateStore->>NeonDB: INSERT gate_dedupe ON CONFLICT setWhere(expired) DO UPDATE
  GateStore->>NeonDB: UPDATE gate_current SET check_run_ids = check_run_ids || $ids

  cronPoll->>NeonDB: GateStore.purgeExpired()
  cronPoll->>NeonDB: upsertRunSnapshots → workflow_runs bulk UPSERT
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

  • Blazity/ai-workflow#61: This PR replaces the UpstashRunRegistry that backed ChatSDKAdapter's ThreadStore with PostgresRunRegistry using the thread_parents table, directly changing the storage mechanism introduced in PR #61.
  • Blazity/ai-workflow#86: Both PRs modify apps/worker/src/post-pr-gate/gate-store.ts and apps/worker/src/routes/webhooks/github.post.ts — this PR migrates GateStore from Redis to Postgres while PR #86 introduced the original GateStore and webhook behavior.
  • Blazity/ai-workflow#37: The failed-ticket safeguard APIs (markFailed, isTicketFailed, clearFailedMark) implemented here in PostgresRunRegistry and the failed_tickets table directly replace the Redis-backed storage introduced in PR #37 for dispatch/reconcile infinite-retry prevention.

Poem

🐇 Hopped from Redis hashes, swift and lean,
To Postgres tables, durable and clean!
ON CONFLICT DO NOTHING claims the run,
While Lua scripts vanish, their work is done.
With Drizzle migrations and env_marker true,
This bunny's registry is born anew! 🥕

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch AIW-74-postgres-migration

@vercel

vercel Bot commented Jun 10, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
ai-workflow-app-dashboard Ready Ready Preview, Comment Jun 15, 2026 11:31am

Request Review

@kasin-it

Copy link
Copy Markdown
Collaborator Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 15, 2026

Copy link
Copy Markdown
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🧹 Nitpick comments (2)
docs/superpowers/plans/2026-06-10-redis-to-neon-postgres.md (1)

39-39: 💤 Low value

Fix minor markdown linting issues.

Two low-priority formatting issues:

  • Line 39: Heading levels increment incorrectly (h2 to h3 without h2.1). Consider restructuring or using h2.
  • Line 1669: Fenced code block is missing a language identifier (e.g., ```bash).

Also applies to: 1669-1669

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/superpowers/plans/2026-06-10-redis-to-neon-postgres.md` at line 39, Fix
two markdown linting issues: First, the heading "Task 1: Dependencies, Drizzle
schema, generated migration" uses an h3 level (###) which breaks heading
hierarchy when appearing after an h2. Change this heading to h2 level (##) to
maintain proper heading structure. Second, locate the fenced code block that is
missing a language identifier and add an appropriate language tag (such as bash,
sql, etc.) after the opening triple backticks to properly format the code block.
apps/worker/e2e/tier2/us03-review-fix-cycle.test.ts (1)

20-20: ⚡ Quick win

Rename stale Redis cleanup aliases to registry terminology.

These imports now target ../helpers/registry.js, but keeping the redisCleanup alias preserves outdated storage naming and makes E2E failure output harder to interpret during Postgres migration debugging.

  • apps/worker/e2e/tier2/us03-review-fix-cycle.test.ts#L20-L20: rename cleanup as redisCleanup to cleanup as registryCleanup, and update local callsites / related wait descriptions accordingly.
  • apps/worker/e2e/tier2/us04-merge-conflict-rebase.test.ts#L19-L19: rename cleanup as redisCleanup to cleanup as registryCleanup, and update local callsites / related wait descriptions accordingly.
  • apps/worker/e2e/tier2/us05-unclear-ticket-clarification.test.ts#L10-L10: rename cleanup as redisCleanup to cleanup as registryCleanup, and update local callsites / related wait descriptions accordingly.
  • apps/worker/e2e/tier2/us06-clarification-answered.test.ts#L17-L17: rename cleanup as redisCleanup to cleanup as registryCleanup, and update local callsites / related wait descriptions accordingly.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/worker/e2e/tier2/us03-review-fix-cycle.test.ts` at line 20, Rename the
stale Redis cleanup alias to registry terminology across all four test files. In
apps/worker/e2e/tier2/us03-review-fix-cycle.test.ts (line 20), rename the import
alias from cleanup as redisCleanup to cleanup as registryCleanup, then update
all local callsites where redisCleanup is invoked to use registryCleanup
instead, and update any wait descriptions mentioning Redis cleanup. Apply the
identical changes to apps/worker/e2e/tier2/us04-merge-conflict-rebase.test.ts
(line 19), apps/worker/e2e/tier2/us05-unclear-ticket-clarification.test.ts (line
10), and apps/worker/e2e/tier2/us06-clarification-answered.test.ts (line
17)—each file requires the import alias rename and all local invocations of
redisCleanup updated to registryCleanup to align with the registry helper
terminology.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@apps/worker/scripts/clear-run-registry.ts`:
- Around line 30-31: Replace all SQL query interpolations with table names using
a label-to-static-query mapping pattern. At lines 30-31, 40-42, and 49-50, the
queries currently interpolate the table variable directly into SQL strings
(e.g., `SELECT * FROM ${table}`). Instead, create a switch statement or object
mapping keyed by the label variable that returns pre-written static SQL queries
for each table, eliminating the dynamic string interpolation and ensuring all
executed SQL strings are fixed literals that SAST tools will accept.

In `@apps/worker/scripts/db-migrate.ts`:
- Around line 47-48: Add a defensive check after the SELECT query in the
database migration script to verify that the marker row was successfully
retrieved. After the sql query executes and assigns to rows, check if rows is
empty or undefined. If the rows array is empty, throw a descriptive error
message indicating that the env_marker row with id 1 is missing, rather than
allowing the code to proceed to access rows[0] which would result in an unclear
TypeError. This guard should prevent the code from attempting to access
properties like endpoint_host on an undefined or missing row.

In `@apps/worker/src/adapters/run-registry/postgres.ts`:
- Around line 63-70: The registerSandbox method in the postgres.ts file silently
succeeds even when the UPDATE affects zero rows, which can leave orphan
sandboxes running. After executing the update operation on activeRuns with the
ticketKey filter, check the number of affected rows from the result. If the
update affected zero rows, throw an error to fail fast and alert that no active
run exists for the given ticketKey, rather than silently losing the sandbox
linkage that cleanup paths depend on.

In `@apps/worker/src/db/schema.ts`:
- Around line 150-153: The costUsd field in the schema currently uses the real
type for storing monetary values, which is floating-point and will accumulate
rounding errors during cost rollups and aggregations. Replace the real type with
numeric(p,s) (such as numeric(19,4) for currency values) to provide
fixed-precision decimal storage appropriate for monetary amounts. After updating
the schema definition for costUsd, regenerate the database migration to ensure
the schema changes are properly captured.

In `@apps/worker/src/workflows/agent.ts`:
- Around line 972-980: Before calling computeUsageTotals on the phaseUsages
object, you need to mark any phases that were launched but don't have parsed
usage data as unknown. Identify which phases were launched during the workflow
execution, then check if each launched phase has an entry in phaseUsages. For
any launched phase that is missing from phaseUsages (indicating a timeout or
error exit before usage could be parsed), add it to phaseUsages with a null
value to mark it as unknown. This ensures that computeUsageTotals will properly
recognize these incomplete phases and set costKnown to false rather than
incorrectly reporting costKnown as true with costUsd of 0.

In `@docs/superpowers/plans/2026-06-10-redis-to-neon-postgres.md`:
- Line 1234: The Pino logging pattern in the logger.warn catch block uses the
incorrect key `error` instead of `err`. Update the logging object from `{ error:
err }` to `{ err: (err as Error).message }` to match the correct Pino error
serialization pattern, which only applies to the `err` key. This ensures the
documentation guides the implementer toward the correct logging pattern
consistent with the learnings and actual implementation in poll.get.ts.

---

Nitpick comments:
In `@apps/worker/e2e/tier2/us03-review-fix-cycle.test.ts`:
- Line 20: Rename the stale Redis cleanup alias to registry terminology across
all four test files. In apps/worker/e2e/tier2/us03-review-fix-cycle.test.ts
(line 20), rename the import alias from cleanup as redisCleanup to cleanup as
registryCleanup, then update all local callsites where redisCleanup is invoked
to use registryCleanup instead, and update any wait descriptions mentioning
Redis cleanup. Apply the identical changes to
apps/worker/e2e/tier2/us04-merge-conflict-rebase.test.ts (line 19),
apps/worker/e2e/tier2/us05-unclear-ticket-clarification.test.ts (line 10), and
apps/worker/e2e/tier2/us06-clarification-answered.test.ts (line 17)—each file
requires the import alias rename and all local invocations of redisCleanup
updated to registryCleanup to align with the registry helper terminology.

In `@docs/superpowers/plans/2026-06-10-redis-to-neon-postgres.md`:
- Line 39: Fix two markdown linting issues: First, the heading "Task 1:
Dependencies, Drizzle schema, generated migration" uses an h3 level (###) which
breaks heading hierarchy when appearing after an h2. Change this heading to h2
level (##) to maintain proper heading structure. Second, locate the fenced code
block that is missing a language identifier and add an appropriate language tag
(such as bash, sql, etc.) after the opening triple backticks to properly format
the code block.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ad886923-96f4-4dd5-b8d2-5bffab51448b

📥 Commits

Reviewing files that changed from the base of the PR and between cb4234f and bf9b368.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (67)
  • .claude/learnings.md
  • .claude/skills/init-agent/SKILL.md
  • .claude/skills/init-env/SKILL.md
  • .claude/skills/init-jira/SKILL.md
  • .claude/skills/init-neon/SKILL.md
  • .claude/skills/init-slack/SKILL.md
  • .claude/skills/init-slack/references/slash-commands.md
  • .claude/skills/init-upstash/SKILL.md
  • .claude/skills/init-vcs/SKILL.md
  • README.md
  • SETUP.md
  • apps/worker/.env.e2e.example
  • apps/worker/.env.example
  • apps/worker/drizzle.config.ts
  • apps/worker/drizzle/0000_elite_paibok.sql
  • apps/worker/drizzle/0001_abnormal_dreaming_celestial.sql
  • apps/worker/drizzle/meta/0000_snapshot.json
  • apps/worker/drizzle/meta/0001_snapshot.json
  • apps/worker/drizzle/meta/_journal.json
  • apps/worker/e2e/env.ts
  • apps/worker/e2e/helpers/redis.ts
  • apps/worker/e2e/helpers/registry.ts
  • apps/worker/e2e/tier2/us01-clear-ticket-pr.test.ts
  • apps/worker/e2e/tier2/us03-review-fix-cycle.test.ts
  • apps/worker/e2e/tier2/us04-merge-conflict-rebase.test.ts
  • apps/worker/e2e/tier2/us05-unclear-ticket-clarification.test.ts
  • apps/worker/e2e/tier2/us06-clarification-answered.test.ts
  • apps/worker/e2e/tier2/us07-agent-failure-backlog.test.ts
  • apps/worker/e2e/tier2/us08-previously-failed-skip.test.ts
  • apps/worker/e2e/tier2/us09-failed-marker-cleared.test.ts
  • apps/worker/e2e/tier2/us10-duplicate-dispatch-prevented.test.ts
  • apps/worker/e2e/tier2/us11-capacity-limit-respected.test.ts
  • apps/worker/e2e/tier2/us12-ticket-moved-out-during-dispatch.test.ts
  • apps/worker/e2e/tier2/us13-webhook-immediate-dispatch.test.ts
  • apps/worker/e2e/tier2/us14-stale-claim-cleanup.test.ts
  • apps/worker/e2e/tier2/us15-orphaned-run-cancelled.test.ts
  • apps/worker/env.test.ts
  • apps/worker/env.ts
  • apps/worker/package.json
  • apps/worker/scripts/clear-run-registry.ts
  • apps/worker/scripts/db-migrate.ts
  • apps/worker/src/adapters/run-registry/postgres.test.ts
  • apps/worker/src/adapters/run-registry/postgres.ts
  • apps/worker/src/adapters/run-registry/types.ts
  • apps/worker/src/adapters/run-registry/upstash.test.ts
  • apps/worker/src/adapters/run-registry/upstash.ts
  • apps/worker/src/db/client.ts
  • apps/worker/src/db/schema.ts
  • apps/worker/src/db/test-db.ts
  • apps/worker/src/lib/adapters.ts
  • apps/worker/src/lib/dispatch.ts
  • apps/worker/src/lib/overview/collect-runs.ts
  • apps/worker/src/lib/reconcile.ts
  • apps/worker/src/lib/step-adapters.ts
  • apps/worker/src/lib/telemetry/collect-snapshots.test.ts
  • apps/worker/src/lib/telemetry/collect-snapshots.ts
  • apps/worker/src/lib/telemetry/run-telemetry.test.ts
  • apps/worker/src/lib/telemetry/run-telemetry.ts
  • apps/worker/src/post-pr-gate/gate-store.test.ts
  • apps/worker/src/post-pr-gate/gate-store.ts
  • apps/worker/src/routes/cron/poll.get.ts
  • apps/worker/src/routes/webhooks/github.post.ts
  • apps/worker/src/sandbox/stop-ticket-sandboxes.ts
  • apps/worker/src/sandbox/usage.ts
  • apps/worker/src/workflows/agent.ts
  • apps/worker/src/workflows/post-pr-gate.ts
  • docs/superpowers/plans/2026-06-10-redis-to-neon-postgres.md
💤 Files with no reviewable changes (4)
  • apps/worker/src/adapters/run-registry/upstash.test.ts
  • .claude/skills/init-upstash/SKILL.md
  • apps/worker/src/adapters/run-registry/upstash.ts
  • apps/worker/e2e/helpers/redis.ts

Comment on lines +30 to +31
for (const [label, table] of Object.entries(tables)) {
const rows = await sql.query(`SELECT * FROM ${table}`);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Replace interpolated table-name SQL with explicit static queries.

These queries interpolate ${table} into raw SQL. It’s currently constant-backed, but this pattern is still flagged by SAST and is easy to misuse later. Prefer a label→static-query switch so all executed SQL strings are fixed literals.

Suggested refactor
 async function dump() {
-  for (const [label, table] of Object.entries(tables)) {
-    const rows = await sql.query(`SELECT * FROM ${table}`);
+  for (const [label] of Object.entries(tables)) {
+    const rows =
+      label === "active"
+        ? await sql.query("SELECT * FROM active_runs")
+        : label === "failed"
+          ? await sql.query("SELECT * FROM failed_tickets")
+          : await sql.query("SELECT * FROM thread_parents");
     console.log(`\n[${label}] ${table}`);
     if (rows.length === 0) console.log("  (empty)");
     else for (const r of rows) console.log(`  ${JSON.stringify(r)}`);
   }
 }
 
 async function clearTicket(t: string) {
-  for (const [label, table] of Object.entries(tables)) {
-    const rows = await sql.query(
-      `DELETE FROM ${table} WHERE ticket_key = $1 RETURNING ticket_key`,
-      [t],
-    );
+  for (const [label] of Object.entries(tables)) {
+    const rows =
+      label === "active"
+        ? await sql.query(
+            "DELETE FROM active_runs WHERE ticket_key = $1 RETURNING ticket_key",
+            [t],
+          )
+        : label === "failed"
+          ? await sql.query(
+              "DELETE FROM failed_tickets WHERE ticket_key = $1 RETURNING ticket_key",
+              [t],
+            )
+          : await sql.query(
+              "DELETE FROM thread_parents WHERE ticket_key = $1 RETURNING ticket_key",
+              [t],
+            );
     console.log(`  delete ${label} ${t} -> ${rows.length}`);
   }
 }
 
 async function clearAll() {
-  for (const [label, table] of Object.entries(tables)) {
-    const rows = await sql.query(`DELETE FROM ${table} RETURNING ticket_key`);
+  for (const [label] of Object.entries(tables)) {
+    const rows =
+      label === "active"
+        ? await sql.query("DELETE FROM active_runs RETURNING ticket_key")
+        : label === "failed"
+          ? await sql.query("DELETE FROM failed_tickets RETURNING ticket_key")
+          : await sql.query("DELETE FROM thread_parents RETURNING ticket_key");
     console.log(`  delete all ${label} -> ${rows.length}`);
   }
 }

Also applies to: 40-42, 49-50

🧰 Tools
🪛 ast-grep (0.43.0)

[error] 30-30: Avoid SQL injection
Context: sql.query(SELECT * FROM ${table})
Note: [CWE-89].

(sql-injection-typescript)

🪛 OpenGrep (1.22.0)

[ERROR] 31-31: SQL query built via string concatenation or template literal passed to query()/execute(). Use parameterized queries instead.

(coderabbit.sql-injection.raw-query-concat-js)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/worker/scripts/clear-run-registry.ts` around lines 30 - 31, Replace all
SQL query interpolations with table names using a label-to-static-query mapping
pattern. At lines 30-31, 40-42, and 49-50, the queries currently interpolate the
table variable directly into SQL strings (e.g., `SELECT * FROM ${table}`).
Instead, create a switch statement or object mapping keyed by the label variable
that returns pre-written static SQL queries for each table, eliminating the
dynamic string interpolation and ensuring all executed SQL strings are fixed
literals that SAST tools will accept.

Source: Linters/SAST tools

Comment on lines +47 to +48
const rows = await sql`SELECT env, endpoint_host FROM env_marker WHERE id = 1`;
const marker = rows[0] as { env: string; endpoint_host: string };

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a defensive check for missing marker row.

If the row is unexpectedly absent (e.g., concurrent deletion between the INSERT and SELECT), accessing marker.endpoint_host will throw an unclear TypeError. A guard with a meaningful error would improve debuggability.

Suggested fix
 const rows = await sql`SELECT env, endpoint_host FROM env_marker WHERE id = 1`;
+if (!rows[0]) {
+  console.error("[db-migrate] FATAL: env_marker row missing after insert — possible concurrent modification.");
+  process.exit(1);
+}
 const marker = rows[0] as { env: string; endpoint_host: string };
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const rows = await sql`SELECT env, endpoint_host FROM env_marker WHERE id = 1`;
const marker = rows[0] as { env: string; endpoint_host: string };
const rows = await sql`SELECT env, endpoint_host FROM env_marker WHERE id = 1`;
if (!rows[0]) {
console.error("[db-migrate] FATAL: env_marker row missing after insert — possible concurrent modification.");
process.exit(1);
}
const marker = rows[0] as { env: string; endpoint_host: string };
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/worker/scripts/db-migrate.ts` around lines 47 - 48, Add a defensive
check after the SELECT query in the database migration script to verify that the
marker row was successfully retrieved. After the sql query executes and assigns
to rows, check if rows is empty or undefined. If the rows array is empty, throw
a descriptive error message indicating that the env_marker row with id 1 is
missing, rather than allowing the code to proceed to access rows[0] which would
result in an unclear TypeError. This guard should prevent the code from
attempting to access properties like endpoint_host on an undefined or missing
row.

Comment on lines +63 to +70
async registerSandbox(ticketKey: string, sandboxId: string): Promise<void> {
// Sandboxes are only registered after claim()/register(), so the row
// exists; a bare UPDATE keeps run_id NOT NULL without an upsert dance.
await this.db
.update(activeRuns)
.set({ sandboxId })
.where(eq(activeRuns.ticketKey, ticketKey));
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail fast when sandbox registration updates zero rows.

registerSandbox() currently no-ops when no active_runs row exists, which silently loses sandbox linkage. Cleanup paths depend on this mapping, so this can leave orphan sandboxes running.

Suggested fix
   async registerSandbox(ticketKey: string, sandboxId: string): Promise<void> {
     // Sandboxes are only registered after claim()/register(), so the row
     // exists; a bare UPDATE keeps run_id NOT NULL without an upsert dance.
-    await this.db
+    const updated = await this.db
       .update(activeRuns)
       .set({ sandboxId })
-      .where(eq(activeRuns.ticketKey, ticketKey));
+      .where(eq(activeRuns.ticketKey, ticketKey))
+      .returning({ ticketKey: activeRuns.ticketKey });
+
+    if (updated.length === 0) {
+      throw new Error(`Cannot register sandbox for missing active run: ${ticketKey}`);
+    }
   }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
async registerSandbox(ticketKey: string, sandboxId: string): Promise<void> {
// Sandboxes are only registered after claim()/register(), so the row
// exists; a bare UPDATE keeps run_id NOT NULL without an upsert dance.
await this.db
.update(activeRuns)
.set({ sandboxId })
.where(eq(activeRuns.ticketKey, ticketKey));
}
async registerSandbox(ticketKey: string, sandboxId: string): Promise<void> {
// Sandboxes are only registered after claim()/register(), so the row
// exists; a bare UPDATE keeps run_id NOT NULL without an upsert dance.
const updated = await this.db
.update(activeRuns)
.set({ sandboxId })
.where(eq(activeRuns.ticketKey, ticketKey))
.returning({ ticketKey: activeRuns.ticketKey });
if (updated.length === 0) {
throw new Error(`Cannot register sandbox for missing active run: ${ticketKey}`);
}
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/worker/src/adapters/run-registry/postgres.ts` around lines 63 - 70, The
registerSandbox method in the postgres.ts file silently succeeds even when the
UPDATE affects zero rows, which can leave orphan sandboxes running. After
executing the update operation on activeRuns with the ticketKey filter, check
the number of affected rows from the result. If the update affected zero rows,
throw an error to fail fast and alert that no active run exists for the given
ticketKey, rather than silently losing the sandbox linkage that cleanup paths
depend on.

Comment on lines +150 to +153
// Cost & usage — workflow-owned (accumulated PhaseUsage). costKnown is false
// when any phase cost couldn't be priced (e.g. Codex with no price lookup).
costUsd: real("cost_usd"),
costKnown: boolean("cost_known"),

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, find the schema file and check its current state
fd -e ts -p "schema.ts" apps/worker/src/db/

Repository: Blazity/ai-workflow

Length of output: 92


🏁 Script executed:

# Check the line count and specific lines around 150-153
if [ -f "apps/worker/src/db/schema.ts" ]; then
  echo "=== Lines 145-160 of schema.ts ==="
  sed -n '145,160p' apps/worker/src/db/schema.ts | cat -n
fi

Repository: Blazity/ai-workflow

Length of output: 816


🏁 Script executed:

# Look for the drizzle migration file
fd -e sql -p "drizzle" apps/worker/

Repository: Blazity/ai-workflow

Length of output: 162


🏁 Script executed:

# Check for any numeric imports in schema.ts
rg -A2 -B2 "from.*drizzle-orm/pg-core" apps/worker/src/db/schema.ts

Repository: Blazity/ai-workflow

Length of output: 119


🏁 Script executed:

# Get the full imports section from schema.ts
head -20 apps/worker/src/db/schema.ts | cat -n

Repository: Blazity/ai-workflow

Length of output: 739


🏁 Script executed:

# Check the drizzle migration file content
cat apps/worker/drizzle/0001_abnormal_dreaming_celestial.sql

Repository: Blazity/ai-workflow

Length of output: 1093


🏁 Script executed:

# Search for any existing numeric usage in the codebase
rg "numeric\(" apps/worker/src/db/ --type ts

Repository: Blazity/ai-workflow

Length of output: 45


🏁 Script executed:

# Check for other potential monetary fields in the schema
rg "Usd|Price|Cost|Money" apps/worker/src/db/schema.ts -i

Repository: Blazity/ai-workflow

Length of output: 510


🏁 Script executed:

# Verify numeric is available in drizzle-orm and check its documentation/usage patterns
grep -r "numeric" apps/worker/ --include="*.ts" --include="*.js"

Repository: Blazity/ai-workflow

Length of output: 747


Use fixed-precision storage for cost_usd to avoid monetary drift.

real is floating-point and will accumulate rounding errors when costs are rolled up or aggregated. PostgreSQL best practice for monetary values is numeric(p,s) (or at minimum double precision as a fallback). Update the schema definition and regenerate the migration to align them.

Suggested changes
 import {
   bigint,
   boolean,
   index,
   integer,
   jsonb,
+  numeric,
   pgTable,
   primaryKey,
-  real,
   text,
   timestamp,
 } from "drizzle-orm/pg-core";
-  costUsd: real("cost_usd"),
+  costUsd: numeric("cost_usd", { precision: 12, scale: 6 }),
--- a/apps/worker/drizzle/0001_abnormal_dreaming_celestial.sql
+++ b/apps/worker/drizzle/0001_abnormal_dreaming_celestial.sql
-  "cost_usd" real,
+  "cost_usd" numeric(12,6),
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/worker/src/db/schema.ts` around lines 150 - 153, The costUsd field in
the schema currently uses the real type for storing monetary values, which is
floating-point and will accumulate rounding errors during cost rollups and
aggregations. Replace the real type with numeric(p,s) (such as numeric(19,4) for
currency values) to provide fixed-precision decimal storage appropriate for
monetary amounts. After updating the schema definition for costUsd, regenerate
the database migration to ensure the schema changes are properly captured.

Comment on lines +972 to +980
await recordRunTelemetryStep({
runId: workflowRunId,
ticketKey: ticket.identifier,
ticketTitle: ticket.title,
ticketUrl: `${env.JIRA_BASE_URL.replace(/\/+$/, "")}/browse/${ticket.identifier}`,
model: activeModel ?? null,
totals: computeUsageTotals(phaseUsages, priceLookup, activeModel),
pr: prForTelemetry,
}).catch(() => {});

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Mark launched-but-unparsed phases as unknown before computing totals.

At Line 978, computeUsageTotals(...) only marks unknown when a phase key exists with null. Timeout/error exits skip writing that key, so runs can be persisted as costKnown: true with costUsd: 0 even after a phase actually ran but usage couldn’t be parsed.

Proposed fix
@@
-      await writeAndStartPhase(
+      phaseUsages["Research"] = null;
+      await writeAndStartPhase(
         sandboxId,
         researchPaths.input, researchInput,
         researchPaths.wrapper, researchScript,
       );
@@
-      await writeAndStartPhase(
+      phaseUsages["Impl"] = null;
+      await writeAndStartPhase(
         sandboxId,
         implPaths.input, implInput,
         implPaths.wrapper, implScript,
       );
@@
-        await writeAndStartPhase(
+        phaseUsages["Review"] = null;
+        await writeAndStartPhase(
           sandboxId,
           reviewPaths.input, reviewInput,
           reviewPaths.wrapper, reviewScript,
         );
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/worker/src/workflows/agent.ts` around lines 972 - 980, Before calling
computeUsageTotals on the phaseUsages object, you need to mark any phases that
were launched but don't have parsed usage data as unknown. Identify which phases
were launched during the workflow execution, then check if each launched phase
has an entry in phaseUsages. For any launched phase that is missing from
phaseUsages (indicating a timeout or error exit before usage could be parsed),
add it to phaseUsages with a null value to mark it as unknown. This ensures that
computeUsageTotals will properly recognize these incomplete phases and set
costKnown to false rather than incorrectly reporting costKnown as true with
costUsd of 0.

// them as absent). Best-effort — a failed purge must not fail the poll.
await new GateStore(getDb())
.purgeExpired()
.catch((err) => logger.warn({ error: err }, "poll_gate_purge_failed"));

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Correct the Pino logging pattern in Task 5, Step 6 to match the learnings.

The code snippet at line 1234 shows { error: err }, but the learnings (line 69) and actual context (poll.get.ts) use the correct pattern { err: (err as Error).message }. Pino's error serializer only applies to the err key; update the plan snippet to avoid guiding the implementer toward the incorrect pattern.

📝 Proposed correction
   // Housekeeping: physically drop expired gate rows (reads already treat
   // them as absent). Best-effort — a failed purge must not fail the poll.
   await new GateStore(getDb())
     .purgeExpired()
-    .catch((err) => logger.warn({ error: err }, "poll_gate_purge_failed"));
+    .catch((err) => logger.warn({ err: (err as Error).message }, "poll_gate_purge_failed"));
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
.catch((err) => logger.warn({ error: err }, "poll_gate_purge_failed"));
// Housekeeping: physically drop expired gate rows (reads already treat
// them as absent). Best-effort — a failed purge must not fail the poll.
await new GateStore(getDb())
.purgeExpired()
.catch((err) => logger.warn({ err: (err as Error).message }, "poll_gate_purge_failed"));
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/superpowers/plans/2026-06-10-redis-to-neon-postgres.md` at line 1234,
The Pino logging pattern in the logger.warn catch block uses the incorrect key
`error` instead of `err`. Update the logging object from `{ error: err }` to `{
err: (err as Error).message }` to match the correct Pino error serialization
pattern, which only applies to the `err` key. This ensures the documentation
guides the implementer toward the correct logging pattern consistent with the
learnings and actual implementation in poll.get.ts.

@kasin-it kasin-it merged commit 185ceda into dev Jun 15, 2026
7 checks passed
@coderabbitai coderabbitai Bot mentioned this pull request Jun 18, 2026
Merged
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant