docs(self-hosted): explain self-hosted data flow #13745

develop-docs/self-hosted/data-flow.mdx (98 additions, 0 deletions)
---
title: Self-hosted Data Flow
sidebar_title: Data Flow
sidebar_order: 20
description: Learn about the data flow of self-hosted Sentry
---

This diagram shows the data flow of self-hosted Sentry. It is similar to the [Application Architecture](/application-architecture/overview/) diagram, but it focuses more on the self-hosted components.

```mermaid
graph LR
kafka@{ shape: cyl, label: "Kafka\n(eventstream)" }
redis@{ shape: cyl, label: "Redis" }
postgres@{ shape: cyl, label: "Postgres" }
memcached@{ shape: cyl, label: "Memcached" }
clickhouse@{ shape: cyl, label: "Clickhouse" }
smtp@{ shape: win-pane, label: "SMTP Server" }
symbol-server@{ shape: win-pane, label: "Public/Private Symbol Servers" }
internet@{ shape: trap-t, label: "Internet" }

internet --> nginx

nginx -- Event submitted by SDKs --> relay
nginx -- Web UI & API --> web

subgraph querier [Event Querier]
snuba-api --> clickhouse
end

subgraph processing [Event Processing]
kafka --> snuba-consumer --> clickhouse
snuba-consumer --> kafka
kafka --> snuba-replacer --> clickhouse
kafka --> snuba-subscription-scheduler --> clickhouse
kafka --> snuba-subscription-executor --> clickhouse
redis -- As a celery queue --> sentry-consumer
kafka --> sentry-consumer --> kafka
kafka --> sentry-post-process-forwarder --> kafka
sentry-post-process-forwarder -- Preventing concurrent processing of the same event --> redis

vroom-blob-storage@{ shape: cyl, label: "Blob Storage\n(default is filesystem)" }

kafka -- Profiling event processing --> vroom -- Republish to Kafka to be consumed by Snuba --> kafka
vroom --> snuba-api
vroom -- Store profiles data --> vroom-blob-storage

outgoing-monitors@{ shape: win-pane, label: "Outgoing HTTP Monitors" }
redis -- Fetching uptime configs --> uptime-checker -- Publishing uptime monitoring results --> kafka
uptime-checker --> outgoing-monitors
end

subgraph ui [Web User Interface]
sentry-blob-storage@{ shape: cyl, label: "Blob Storage\n(default is filesystem)" }

web --> worker
web --> postgres
web -- Caching layer --> memcached
web -- Queries on event (errors, spans, etc) data (to snuba-api) --> snuba-api
web -- Avatars, attachments, etc --> sentry-blob-storage
worker -- As a celery queue --> redis
worker --> postgres
worker -- Alert & digest emails --> smtp
web -- Sending test emails --> smtp
end

subgraph ingestion [Event Ingestion]
relay@{ shape: rect, label: 'Relay' }
sentry_ingest_consumer[sentry-ingest-consumers]

relay -- Process envelope into specific types --> kafka --> sentry_ingest_consumer -- Caching event data (to redis) --> redis
relay -- Register relay instance --> web
relay -- Fetching project configs (to redis) --> redis
sentry_ingest_consumer -- Symbolicate stack traces --> symbolicator --> symbol-server
sentry_ingest_consumer -- Save event payload to Nodestore --> postgres
sentry_ingest_consumer -- Republish to events topic --> kafka
end
```

### Event Ingestion Pipeline

1. Events from the SDKs are sent to the `relay` service (see the envelope submission sketch after this list).
2. Relay parses the incoming envelope and validates the DSN and project ID. It reads project config data from `redis`.
3. Relay builds a new payload to be consumed by the Sentry ingest consumers and sends it to `kafka`.
4. The Sentry `ingest-*` consumers (where `*` is the event type: errors, transactions, profiles, etc.) consume the event, cache it in `redis`, and start the `preprocess_event` task.
5. The `preprocess_event` task symbolicates stack traces with the `symbolicator` service and processes the event according to the event type's logic.
6. The `preprocess_event` task saves the event payload to the nodestore (by default, `postgres`).
7. The `preprocess_event` task publishes the event to the `events` topic in `kafka`.
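
For a concrete picture of step 1, here is a minimal, hedged sketch that submits an error envelope directly to Relay through nginx. It assumes the instance is reachable at `http://localhost:9000` (the default self-hosted port) and uses a placeholder project ID and public key taken from a project's DSN; the envelope endpoint path and `X-Sentry-Auth` header follow Sentry's envelope protocol.

```python
"""Sketch: submit a minimal error envelope straight to Relay.

Assumptions: the self-hosted instance is reachable at http://localhost:9000,
and PROJECT_ID / PUBLIC_KEY come from an existing project's DSN -- both
values below are placeholders.
"""
import json
import uuid
from datetime import datetime, timezone

import requests  # third-party; `pip install requests`

SENTRY_HOST = "http://localhost:9000"  # assumption: default nginx port
PROJECT_ID = "1"                       # placeholder
PUBLIC_KEY = "examplePublicKey"        # placeholder (key part of the DSN)

event_id = uuid.uuid4().hex
now = datetime.now(timezone.utc).isoformat()

# An envelope is newline-delimited JSON: envelope header, item header, item payload.
envelope_header = {"event_id": event_id, "sent_at": now}
item_header = {"type": "event"}
event_payload = {
    "event_id": event_id,
    "timestamp": now,
    "platform": "python",
    "level": "error",
    "message": "data-flow smoke test",
}

body = "\n".join(json.dumps(part) for part in (envelope_header, item_header, event_payload))

resp = requests.post(
    f"{SENTRY_HOST}/api/{PROJECT_ID}/envelope/",
    data=body.encode("utf-8"),
    headers={
        "Content-Type": "application/x-sentry-envelope",
        "X-Sentry-Auth": (
            f"Sentry sentry_version=7, sentry_key={PUBLIC_KEY}, "
            "sentry_client=data-flow-sketch/0.1"
        ),
    },
    timeout=10,
)
print(resp.status_code, resp.text)  # Relay responds with the accepted event_id on success
```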

### Event Processing Pipeline

1. The `snuba-consumer` service consumes events from the `events` topic and processes them. Some are republished to the `post-process-forwarder` topic in `kafka`, while others are written to `clickhouse` (see the consumer sketch after this list).
2. The Sentry consumer that reads the `post-process-forwarder` topic processes the events and republishes them to `kafka`.
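
To watch this stage of the flow, the hedged sketch below tails the `events` topic with a plain Kafka consumer. This is only an illustration: it assumes the `confluent-kafka` Python package is installed and that the broker is reachable on `localhost:9092`; inside the self-hosted Docker network the broker is normally addressed as `kafka:9092` instead.

```python
"""Sketch: peek at the `events` topic that the Snuba and Sentry consumers read.

Assumptions: `pip install confluent-kafka`, and the broker port is reachable
from wherever this script runs (localhost:9092 here).
"""
from confluent_kafka import Consumer

consumer = Consumer(
    {
        "bootstrap.servers": "localhost:9092",  # assumption: broker port published to the host
        "group.id": "data-flow-peek",           # throwaway consumer group, for inspection only
        "auto.offset.reset": "latest",
    }
)
consumer.subscribe(["events"])

try:
    while True:
        msg = consumer.poll(timeout=1.0)
        if msg is None:
            continue
        if msg.error():
            print("consumer error:", msg.error())
            continue
        # Each message is a payload republished by Sentry after ingestion/processing.
        print(f"offset={msg.offset()} key={msg.key()} bytes={len(msg.value())}")
finally:
    consumer.close()
```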

Contributor (author) comment on lines +79 to +92: I'm not sure about these. Can anyone take a look? @markstory @untitaker @hubertdeng123 @BYK

### Web User Interface

1. The `web` service is what you see: the Django application that serves Sentry's web UI and API (see the API request sketch after this list).
2. The `worker` service mainly consumes tasks from `redis`, which acts as a Celery queue. One notable task is sending alert and digest emails through the SMTP server.
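
A quick way to confirm that the `web` service is serving the API (and not just the UI) is to call an authenticated endpoint. The sketch below lists organizations through the standard `/api/0/organizations/` route; the host, port, and auth token are placeholders for your own instance.

```python
"""Sketch: confirm the `web` service is answering API requests.

Assumptions: the self-hosted instance is reachable at http://localhost:9000
and AUTH_TOKEN is an existing user or internal-integration auth token --
both values are placeholders.
"""
import requests  # third-party; `pip install requests`

SENTRY_HOST = "http://localhost:9000"  # assumption: default nginx port
AUTH_TOKEN = "<your-auth-token>"       # placeholder

resp = requests.get(
    f"{SENTRY_HOST}/api/0/organizations/",
    headers={"Authorization": f"Bearer {AUTH_TOKEN}"},
    timeout=10,
)
resp.raise_for_status()
for org in resp.json():
    print(org["slug"])  # these requests are handled by `web`, not `relay`
```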
