From 6ff84c9c85c43ea6abb942b2590649e6e6ac26d0 Mon Sep 17 00:00:00 2001
From: Reinaldy Rafli
Date: Sat, 17 May 2025 16:37:33 +0700
Subject: [PATCH] docs(self-hosted): explain self-hosted data flow

---
 develop-docs/self-hosted/data-flow.mdx | 98 ++++++++++++++++++++++++++
 1 file changed, 98 insertions(+)
 create mode 100644 develop-docs/self-hosted/data-flow.mdx

diff --git a/develop-docs/self-hosted/data-flow.mdx b/develop-docs/self-hosted/data-flow.mdx
new file mode 100644
index 0000000000000..b926c9af0ed4d
--- /dev/null
+++ b/develop-docs/self-hosted/data-flow.mdx
@@ -0,0 +1,98 @@
---
title: Self-hosted Data Flow
sidebar_title: Data Flow
sidebar_order: 20
description: Learn about the data flow of self-hosted Sentry
---

This diagram shows the data flow of self-hosted Sentry. It is similar to the [Application Architecture](/application-architecture/overview/) diagram, but focuses more on the self-hosted components.

```mermaid
graph LR
  kafka@{ shape: cyl, label: "Kafka\n(eventstream)" }
  redis@{ shape: cyl, label: "Redis" }
  postgres@{ shape: cyl, label: "Postgres" }
  memcached@{ shape: cyl, label: "Memcached" }
  clickhouse@{ shape: cyl, label: "Clickhouse" }
  smtp@{ shape: win-pane, label: "SMTP Server" }
  symbol-server@{ shape: win-pane, label: "Public/Private Symbol Servers" }
  internet@{ shape: trap-t, label: "Internet" }

  internet --> nginx

  nginx -- Event submitted by SDKs --> relay
  nginx -- Web UI & API --> web

  subgraph querier [Event Querier]
    snuba-api --> clickhouse
  end

  subgraph processing [Event Processing]
    kafka --> snuba-consumer --> clickhouse
    snuba-consumer --> kafka
    kafka --> snuba-replacer --> clickhouse
    kafka --> snuba-subscription-scheduler --> clickhouse
    kafka --> snuba-subscription-executor --> clickhouse
    redis -- As a celery queue --> sentry-consumer
    kafka --> sentry-consumer --> kafka
    kafka --> sentry-post-process-forwarder --> kafka
    sentry-post-process-forwarder -- Preventing concurrent processing of the same event --> redis

    vroom-blob-storage@{ shape: cyl, label: "Blob Storage\n(default is filesystem)" }

    kafka -- Profiling event processing --> vroom -- Republish to Kafka to be consumed by Snuba --> kafka
    vroom --> snuba-api
    vroom -- Store profiles data --> vroom-blob-storage

    outgoing-monitors@{ shape: win-pane, label: "Outgoing HTTP Monitors" }
    redis -- Fetching uptime configs --> uptime-checker -- Publishing uptime monitoring results --> kafka
    uptime-checker --> outgoing-monitors
  end

  subgraph ui [Web User Interface]
    sentry-blob-storage@{ shape: cyl, label: "Blob Storage\n(default is filesystem)" }

    web --> worker
    web --> postgres
    web -- Caching layer --> memcached
    web -- Queries on event (errors, spans, etc) data (to snuba-api) --> snuba-api
    web -- Avatars, attachments, etc --> sentry-blob-storage
    worker -- As a celery queue --> redis
    worker --> postgres
    worker -- Alert & digest emails --> smtp
    web -- Sending test emails --> smtp
  end

  subgraph ingestion [Event Ingestion]
    relay@{ shape: rect, label: "Relay" }
    sentry_ingest_consumer[sentry-ingest-consumers]

    relay -- Process envelope into specific types --> kafka --> sentry_ingest_consumer -- Caching event data (to redis) --> redis
    relay -- Register relay instance --> web
    relay -- Fetching project configs (to redis) --> redis
    sentry_ingest_consumer -- Symbolicate stack traces --> symbolicator --> symbol-server
    sentry_ingest_consumer -- Save event payload to Nodestore --> postgres
    sentry_ingest_consumer -- Republish to events topic --> kafka
  end
```

### Event Ingestion Pipeline

1. Events from the SDK are sent to the `relay` service (a minimal SDK sketch appears at the end of this page).
2. Relay parses the incoming envelope and validates that the DSN and project ID are valid. It reads project config data from `redis`.
3. Relay builds a new payload to be consumed by the Sentry ingest consumers and sends it to `kafka`.
4. The Sentry `ingest-*` consumers (where the `*` wildcard is the event type: errors, transactions, profiles, and so on) consume the event, cache it in `redis`, and start the `preprocess_event` task.
5. The `preprocess_event` task symbolicates stack traces with the `symbolicator` service and processes the event according to the event type's logic.
6. The `preprocess_event` task saves the event payload to Nodestore (by default, `postgres`).
7. The `preprocess_event` task publishes the event to the `events` topic in `kafka`.

### Event Processing Pipeline

1. The `snuba-consumer` service consumes events from the `events` topic and processes them. Some are republished to the `post-process-forwarder` topic in `kafka`; others are written to `clickhouse` (a sketch of this consume-and-republish pattern appears at the end of this page).
2. The Sentry consumer that reads the `post-process-forwarder` topic processes the events and republishes them to `kafka`.

### Web User Interface

1. The `web` service is what you see: the Django web UI and API that serve Sentry's frontend.
2. The `worker` service mainly consumes from `redis`, which acts as a Celery queue, to execute tasks. One notable task is sending emails to the SMTP server (a minimal Celery sketch appears at the end of this page).
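### Sketches

To make step 1 of the ingestion pipeline concrete, here is a minimal sketch of an SDK sending an event to a self-hosted instance. The DSN below is a placeholder; substitute the one shown in your project settings.

```python
import sentry_sdk

# Placeholder DSN for a self-hosted instance; nginx routes the
# submission to the relay service (see the diagram above).
sentry_sdk.init(
    dsn="http://examplePublicKey@sentry.example.com/1",
)

try:
    1 / 0
except ZeroDivisionError:
    # The captured event is wrapped in an envelope and POSTed to relay.
    sentry_sdk.capture_exception()
```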
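The consumers in the processing pipeline follow the usual Kafka consume-process-produce loop. The sketch below is illustrative only, not Snuba's actual code: it assumes the `confluent-kafka` Python client, a `kafka:9092` broker address as in the default compose setup, and a stand-in `handle_event` function.

```python
from confluent_kafka import Consumer, Producer

def handle_event(payload: bytes) -> None:
    # Stand-in for real processing (e.g., writing a row to ClickHouse).
    print(f"processing {len(payload)} bytes")

consumer = Consumer({
    "bootstrap.servers": "kafka:9092",
    "group.id": "example-snuba-consumer",
    "auto.offset.reset": "earliest",
})
producer = Producer({"bootstrap.servers": "kafka:9092"})

consumer.subscribe(["events"])

while True:
    msg = consumer.poll(timeout=1.0)
    if msg is None or msg.error():
        continue
    handle_event(msg.value())
    # Republish so downstream consumers (post-processing) can pick it up.
    producer.produce("post-process-forwarder", msg.value())
    producer.flush()
```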
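The relationship between `web`, `worker`, and `redis` is plain Celery: `web` enqueues tasks onto the Redis broker, and `worker` executes them. A minimal sketch, assuming the `redis` hostname from the compose setup; the task itself is hypothetical, not one of Sentry's actual tasks.

```python
from celery import Celery

# redis acts as the broker: web enqueues tasks, worker executes them.
app = Celery("example", broker="redis://redis:6379/0")

@app.task
def send_digest_email(user_id: int) -> None:
    # Stand-in for a real task such as sending alert/digest emails via SMTP.
    print(f"sending digest email to user {user_id}")

# The web service would enqueue it like this:
# send_digest_email.delay(42)
```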