OpenComments Observability Baseline

This document defines baseline operational observability for the managed-cloud deployment model (Supabase + Netlify).

Objectives

  • Detect platform-impacting failures quickly.
  • Preserve enough context for incident triage and audit.
  • Track the reliability of key user and agency workflows.

Logging Baseline

Application and Edge Functions

  • Every API response should include an X-Request-Id header.
  • Edge function errors must log:
    • request identifier (or generated correlation ID)
    • endpoint/function name
    • principal type (anon, authenticated, service)
    • stable error code and message
  • Avoid logging raw secrets and full PII values.
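A minimal sketch of the logging rules above, in TypeScript because Supabase Edge Functions run TypeScript on Deno. It reuses an incoming X-Request-Id or generates a correlation ID, echoes the ID on the response, and redacts secret-looking keys and email addresses before an error entry is built. The helper names (resolveRequestId, withRequestId, redact, buildErrorLog), the secret-key pattern, and the email-masking rule are hypothetical illustrations, not part of OpenComments.

```typescript
// Sketch only: helper names, secret-key patterns, and masking rules are hypothetical.

type PrincipalType = "anon" | "authenticated" | "service";

// Structurally compatible with the Fetch API Headers.get method.
interface HeaderReader {
  get(name: string): string | null;
}

interface EdgeErrorLog {
  requestId: string;
  fn: string; // endpoint or function name
  principal: PrincipalType;
  code: string; // stable, machine-readable error code
  message: string;
  context: Record<string, string>;
}

// Assumed patterns for keys whose values must never be logged.
const SECRET_KEY_PATTERN = /authorization|api[-_]?key|password|secret|token/i;
// Matches an email; the replacement keeps the first character of the local part and the domain.
const EMAIL_PATTERN = /^([^@])[^@]*(@.+)$/;

// Reuse the caller's X-Request-Id when present; otherwise generate a correlation ID.
// The default generator avoids runtime dependencies; crypto.randomUUID() is a stronger choice.
function resolveRequestId(
  headers: HeaderReader,
  generate: () => string = () =>
    `req-${Date.now().toString(36)}-${Math.random().toString(36).slice(2, 10)}`,
): string {
  const incoming = headers.get("x-request-id")?.trim();
  return incoming ? incoming : generate();
}

// Echo the request ID on every response.
function withRequestId(base: Record<string, string>, requestId: string): Record<string, string> {
  return { ...base, "X-Request-Id": requestId };
}

// Replace secret values and mask email addresses before anything reaches the log.
function redact(context: Record<string, string>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [key, value] of Object.entries(context)) {
    if (SECRET_KEY_PATTERN.test(key)) {
      out[key] = "[REDACTED]";
    } else if (EMAIL_PATTERN.test(value)) {
      out[key] = value.replace(EMAIL_PATTERN, "$1***$2");
    } else {
      out[key] = value;
    }
  }
  return out;
}

// Assemble the fields the baseline requires for an edge function error.
function buildErrorLog(
  headers: HeaderReader,
  fn: string,
  principal: PrincipalType,
  code: string,
  message: string,
  context: Record<string, string> = {},
): EdgeErrorLog {
  return { requestId: resolveRequestId(headers), fn, principal, code, message, context: redact(context) };
}
```

Writing each entry as a single JSON line, for example console.error(JSON.stringify(entry)), keeps every field searchable in the function logs.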

Database and Job Flows

  • Export lifecycle transitions are persisted in exports (pending -> processing -> completed|failed).
  • Moderation and submission events are persisted in moderation_logs.
  • Rate-limit decisions are persisted in api_rate_limits (windowed counters).
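The job-flow records above can be sketched as two small helpers: a guard that rejects export status changes outside pending -> processing -> completed|failed, and a fixed-window counter of the kind api_rate_limits could hold. This is an in-memory illustration under stated assumptions: the fixed-window scheme, the helper names, and the counter shape are hypothetical, and the real tables may enforce these rules differently (for example with database constraints).

```typescript
// Sketch only: names and the fixed-window scheme are assumptions, not the real schema.

type ExportStatus = "pending" | "processing" | "completed" | "failed";

// Allowed lifecycle edges: pending -> processing -> completed | failed.
const EXPORT_TRANSITIONS: Record<ExportStatus, readonly ExportStatus[]> = {
  pending: ["processing"],
  processing: ["completed", "failed"],
  completed: [], // terminal
  failed: [], // terminal
};

function canTransition(from: ExportStatus, to: ExportStatus): boolean {
  return EXPORT_TRANSITIONS[from].includes(to);
}

// One counter per client key and window.
interface WindowCounter {
  windowStart: number; // epoch ms at the start of the window
  count: number;
}

// Fixed-window limiter: at most `limit` requests per `windowMs` for each key.
function checkRateLimit(
  counters: Map<string, WindowCounter>,
  key: string,
  nowMs: number,
  windowMs: number,
  limit: number,
): { allowed: boolean; remaining: number } {
  const windowStart = Math.floor(nowMs / windowMs) * windowMs;
  const existing = counters.get(key);
  // Start a fresh counter when the key is new or its window has rolled over.
  const counter =
    existing && existing.windowStart === windowStart ? existing : { windowStart, count: 0 };
  counters.set(key, counter);
  if (counter.count >= limit) {
    return { allowed: false, remaining: 0 }; // caller responds with HTTP 429
  }
  counter.count += 1;
  return { allowed: true, remaining: limit - counter.count };
}
```

A denied decision corresponds to the 429 responses that the operational signals track separately.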

Operational Signals

  • Public API health:
    • success rate (2xx/3xx)
    • error rate (4xx/5xx), with 429 tracked separately
    • latency p50/p95/p99
  • Submission pipeline:
    • comment submit success/failure counts
    • CAPTCHA verification failure rate
    • attachment upload warning/failure rate
  • Export pipeline:
    • queue depth (pending count)
    • export completion time
    • export failure rate by error class
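The API health and export pipeline signals reduce to simple aggregations over a window of samples. The sketch below assumes a hypothetical sample shape (status, latencyMs) and hypothetical export record fields (createdAt, finishedAt, errorClass). It uses nearest-rank percentiles without interpolation, counts 429 inside the 4xx/5xx error rate while also reporting it on its own, and measures failure rate by error class as a share of finished exports. Each of those is a design choice the baseline leaves open.

```typescript
// Sketch only: sample and record shapes are hypothetical.

interface RequestSample {
  status: number; // HTTP status code
  latencyMs: number;
}

interface ApiHealth {
  total: number;
  successRate: number; // 2xx/3xx
  errorRate: number; // 4xx/5xx, including 429
  rate429: number; // 429 alone, tracked separately
  p50: number;
  p95: number;
  p99: number;
}

// Nearest-rank percentile over an ascending array; NaN when there is no data.
function percentile(sorted: readonly number[], p: number): number {
  if (sorted.length === 0) return NaN;
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1] ?? NaN;
}

function summarizeApi(samples: readonly RequestSample[]): ApiHealth {
  const total = samples.length;
  const ratio = (n: number): number => (total === 0 ? 0 : n / total);
  const count = (pred: (s: RequestSample) => boolean): number => samples.filter(pred).length;
  const latencies = samples.map((s) => s.latencyMs).sort((a, b) => a - b);
  return {
    total,
    successRate: ratio(count((s) => s.status >= 200 && s.status < 400)),
    errorRate: ratio(count((s) => s.status >= 400 && s.status < 600)),
    rate429: ratio(count((s) => s.status === 429)),
    p50: percentile(latencies, 50),
    p95: percentile(latencies, 95),
    p99: percentile(latencies, 99),
  };
}

interface ExportRecord {
  status: "pending" | "processing" | "completed" | "failed";
  createdAt: number; // epoch ms
  finishedAt?: number; // epoch ms, set once the export completes or fails
  errorClass?: string; // set on failure
}

interface ExportStats {
  queueDepth: number; // pending count
  medianCompletionMs: number; // createdAt -> finishedAt for completed exports
  failureRateByClass: Record<string, number>; // share of finished exports
}

function summarizeExports(records: readonly ExportRecord[]): ExportStats {
  const finished = records.filter((r) => r.status === "completed" || r.status === "failed");
  // Count failures per error class, then divide once to avoid accumulating rounding error.
  const failures: Record<string, number> = {};
  for (const r of finished) {
    if (r.status === "failed") {
      const cls = r.errorClass ?? "unknown";
      failures[cls] = (failures[cls] ?? 0) + 1;
    }
  }
  const failureRateByClass: Record<string, number> = {};
  for (const [cls, n] of Object.entries(failures)) {
    failureRateByClass[cls] = n / finished.length;
  }
  const durations = records
    .filter((r) => r.status === "completed" && r.finishedAt !== undefined)
    .map((r) => (r.finishedAt as number) - r.createdAt)
    .sort((a, b) => a - b);
  return {
    queueDepth: records.filter((r) => r.status === "pending").length,
    medianCompletionMs: percentile(durations, 50),
    failureRateByClass,
  };
}
```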

Alerting Starter Thresholds

  • API error rate > 5% over 10 minutes.
  • Export failure rate > 10% over 30 minutes.
  • No successful comment submissions for 30 minutes during expected traffic windows.
  • Sudden 429 spikes (possible abuse or misconfigured client integrations).
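One way to evaluate the starter thresholds is a pure function over counts that were already aggregated for each rule's window. This is a sketch under several assumptions: the input shape and alert names are hypothetical, the API error-rate rule counts every 4xx/5xx response (429 included), and because the baseline does not quantify "sudden", the 429 spike rule uses two hypothetical tuning values (fire when the current count is at least 20 and more than 3 times a baseline).

```typescript
// Sketch only: input shape, alert names, and the 429 spike rule are hypothetical.

const MINUTE_MS = 60 * 1000;
// The baseline does not quantify "sudden"; these two tuning values are assumptions.
const SPIKE_FACTOR = 3; // current 429 count must exceed 3x the baseline
const MIN_429_FOR_SPIKE = 20; // ignore spikes on tiny volumes

// Pre-aggregated counts for the window each rule names.
interface AlertInputs {
  nowMs: number;
  apiRequests10m: number;
  apiErrors10m: number; // 4xx/5xx responses in the last 10 minutes
  exportsFinished30m: number;
  exportsFailed30m: number;
  lastSuccessfulSubmitMs: number | null; // null if none is known
  inExpectedTrafficWindow: boolean;
  current429: number; // 429s in the current window
  baseline429: number; // typical 429s per window of the same length
}

function evaluateAlerts(i: AlertInputs): string[] {
  const alerts: string[] = [];
  // API error rate > 5% over 10 minutes.
  if (i.apiRequests10m > 0 && i.apiErrors10m / i.apiRequests10m > 0.05) {
    alerts.push("api_error_rate");
  }
  // Export failure rate > 10% over 30 minutes.
  if (i.exportsFinished30m > 0 && i.exportsFailed30m / i.exportsFinished30m > 0.1) {
    alerts.push("export_failure_rate");
  }
  // No successful comment submissions for 30 minutes during expected traffic.
  const quietMs =
    i.lastSuccessfulSubmitMs === null ? Infinity : i.nowMs - i.lastSuccessfulSubmitMs;
  if (i.inExpectedTrafficWindow && quietMs >= 30 * MINUTE_MS) {
    alerts.push("no_successful_submissions");
  }
  // Sudden 429 spike relative to the baseline.
  if (i.current429 >= MIN_429_FOR_SPIKE && i.current429 > SPIKE_FACTOR * i.baseline429) {
    alerts.push("rate_limit_spike");
  }
  return alerts;
}
```

Keeping the evaluation pure makes the thresholds easy to unit-test and to tune as real traffic data accumulates.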

Runbook Links

  • Operational handling: docs/OPERATIONS_RUNBOOK.md
  • Security audit process: docs/SECURITY_AUDIT_GUIDE.md
  • API contract: docs/API_V1.md