This document defines baseline operational observability for the managed-cloud deployment model (Supabase + Netlify).
- Detect platform-impacting failures quickly.
- Preserve enough context for incident triage and audit.
- Track key user and agency workflow reliability.
- Every API response should include `X-Request-Id`.
- Edge function errors must log:
  - request identifier (or generated correlation ID)
  - endpoint/function name
  - principal type (`anon`, `authenticated`, `service`)
  - stable error code and message
- Avoid logging raw secrets and full PII values.
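
The logging rules above could be sketched roughly as follows. This is a minimal illustration, not the project's actual logger: the field names (`requestId`, `fn`, `principal`, `code`, `message`), the `SENSITIVE_KEYS` list, the `redact` and `buildErrorLog` helpers, and the fallback ID format are all assumptions.

```typescript
// Hypothetical log shape; the real schema may differ.
type Principal = "anon" | "authenticated" | "service";

interface ErrorLogInput {
  requestId?: string; // value of X-Request-Id, if one is available
  fn: string;         // endpoint/function name
  principal: Principal;
  code: string;       // stable error code
  message: string;
}

// Keys whose values should never reach the logs (assumed list).
const SENSITIVE_KEYS = ["authorization", "password", "token", "email"];

// Replace sensitive values with a fixed marker; other fields pass through.
function redact(context: Record<string, string>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const key of Object.keys(context)) {
    const sensitive = SENSITIVE_KEYS.indexOf(key.toLowerCase()) !== -1;
    out[key] = sensitive ? "[redacted]" : context[key];
  }
  return out;
}

// Fallback correlation ID for requests that arrive without an identifier.
// A real deployment would more likely use a UUID.
function generateCorrelationId(): string {
  return "gen-" + Date.now().toString(36) + "-" + Math.random().toString(36).slice(2, 10);
}

// Build one JSON log line carrying the four required fields.
function buildErrorLog(input: ErrorLogInput, context: Record<string, string> = {}): string {
  return JSON.stringify({
    level: "error",
    requestId: input.requestId ?? generateCorrelationId(),
    fn: input.fn,
    principal: input.principal,
    code: input.code,
    message: input.message,
    context: redact(context),
  });
}
```

The returned string can be passed to whatever log sink the function runtime provides. Emitting one JSON object per line keeps the entries searchable by `requestId` during triage.
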
- Export lifecycle transitions are persisted in `exports` (`pending -> processing -> completed|failed`).
- Moderation and submission events are persisted in `moderation_logs`.
- Rate-limit decisions are persisted in `api_rate_limits` (windowed counters).
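
To make the export lifecycle concrete, here is a small sketch of a transition guard that accepts only the moves listed above. The `ExportStatus` type and the `canTransition` and `transition` helpers are hypothetical names, and treating `completed` and `failed` as terminal states is an assumption drawn from the arrow notation.

```typescript
type ExportStatus = "pending" | "processing" | "completed" | "failed";

// Allowed next states, following pending -> processing -> completed|failed.
const NEXT: Record<ExportStatus, ExportStatus[]> = {
  pending: ["processing"],
  processing: ["completed", "failed"],
  completed: [], // assumed terminal
  failed: [],    // assumed terminal
};

function canTransition(from: ExportStatus, to: ExportStatus): boolean {
  return NEXT[from].indexOf(to) !== -1;
}

// Returns the new status, or throws so that an invalid change is never persisted.
function transition(from: ExportStatus, to: ExportStatus): ExportStatus {
  if (!canTransition(from, to)) {
    throw new Error("invalid export transition: " + from + " -> " + to);
  }
  return to;
}
```

Checking the transition before writing to `exports` means a rejected update surfaces as an error with a clear message, which is easier to audit than a silently overwritten status.
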
- Public API health:
  - success rate (`2xx`/`3xx`)
  - error rate (`4xx`/`5xx`) with `429` tracked separately
  - latency p50/p95/p99
- Submission pipeline:
  - comment submit success/failure counts
  - CAPTCHA verification failure rate
  - attachment upload warning/failure rate
- Export pipeline:
  - queue depth (`pending` count)
  - export completion time
  - export failure rate by error class
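
The public API health metrics above can be derived from raw request samples along these lines. This is a sketch under stated assumptions: the `RequestSample` and `ApiHealth` shapes and the `summarize` helper are hypothetical, `429` is excluded from the error rate rather than counted in both buckets, and percentiles use the nearest-rank method.

```typescript
interface RequestSample {
  status: number;    // HTTP status code
  latencyMs: number; // request duration in milliseconds
}

interface ApiHealth {
  successRate: number;     // share of 2xx/3xx responses
  errorRate: number;       // share of 4xx/5xx responses, excluding 429 (assumption)
  rateLimitedRate: number; // share of 429 responses, tracked separately
  p50: number;
  p95: number;
  p99: number;
}

// Nearest-rank percentile over an ascending-sorted array.
// Multiplying before dividing avoids floating-point drift in the rank.
function percentile(sorted: number[], p: number): number {
  if (sorted.length === 0) return 0;
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

function summarize(samples: RequestSample[]): ApiHealth {
  const total = samples.length;
  let success = 0;
  let errors = 0;
  let rateLimited = 0;
  for (const s of samples) {
    if (s.status === 429) rateLimited++;
    else if (s.status >= 400) errors++;
    else if (s.status >= 200) success++;
  }
  const latencies = samples.map((s) => s.latencyMs).sort((a, b) => a - b);
  const rate = (n: number): number => (total === 0 ? 0 : n / total);
  return {
    successRate: rate(success),
    errorRate: rate(errors),
    rateLimitedRate: rate(rateLimited),
    p50: percentile(latencies, 50),
    p95: percentile(latencies, 95),
    p99: percentile(latencies, 99),
  };
}
```

Keeping `429` in its own bucket means a burst of throttled clients does not inflate the general error rate.
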
- API error rate > 5% over 10 minutes.
- Export failure rate > 10% over 30 minutes.
- No successful comment submissions for 30 minutes during expected traffic windows.
- Sudden `429` spikes (possible abuse or misconfigured client integrations).
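
The alert conditions above could be evaluated over pre-aggregated window counts roughly as follows. The `WindowStats` shape, the alert names, and the `evaluateAlerts` helper are hypothetical. The list above gives no numeric threshold for a "sudden" `429` spike, so the `spikeFactor` multiplier (default 3) is purely an illustrative assumption.

```typescript
interface WindowStats {
  apiRequests: number;          // API requests in the last 10 minutes
  apiErrors: number;            // 4xx/5xx responses in the last 10 minutes
  exportsFinished: number;      // exports completed or failed in the last 30 minutes
  exportsFailed: number;        // exports failed in the last 30 minutes
  submissionsSucceeded: number; // successful comment submissions in the last 30 minutes
  expectedTraffic: boolean;     // true when inside an expected traffic window
  rateLimited: number;          // 429 responses in the current window
  rateLimitedBaseline: number;  // typical 429 count for a comparable window (assumed input)
}

// Returns the names of all alerts that should fire for the given window.
function evaluateAlerts(w: WindowStats, spikeFactor: number = 3): string[] {
  const alerts: string[] = [];
  // API error rate > 5% over 10 minutes.
  if (w.apiRequests > 0 && w.apiErrors / w.apiRequests > 0.05) {
    alerts.push("api_error_rate");
  }
  // Export failure rate > 10% over 30 minutes.
  if (w.exportsFinished > 0 && w.exportsFailed / w.exportsFinished > 0.1) {
    alerts.push("export_failure_rate");
  }
  // No successful submissions for 30 minutes during expected traffic.
  if (w.expectedTraffic && w.submissionsSucceeded === 0) {
    alerts.push("no_successful_submissions");
  }
  // Assumed spike rule: current 429 count exceeds spikeFactor times the baseline.
  // The floor of 1 keeps a zero baseline from firing on a single 429.
  if (w.rateLimited > spikeFactor * Math.max(w.rateLimitedBaseline, 1)) {
    alerts.push("rate_limit_spike");
  }
  return alerts;
}
```

The rate checks skip empty windows so that a period with no requests or no finished exports does not divide by zero. Silence during expected traffic is already covered by the submission check.
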
- Operational handling: `docs/OPERATIONS_RUNBOOK.md`
- Security audit process: `docs/SECURITY_AUDIT_GUIDE.md`
- API contract: `docs/API_V1.md`