docs(self-hosted): explain self-hosted data flow #13745

develop-docs/self-hosted/data-flow.mdx (98 additions, 0 deletions)
---
title: Self-hosted Data Flow
sidebar_title: Data Flow
sidebar_order: 20
description: Learn about the data flow of self-hosted Sentry
---

This diagram shows the data flow of self-hosted Sentry. It is similar to the [Application Architecture](/application-architecture/overview/) diagram, but it focuses more on the self-hosted components.

```mermaid
graph LR
kafka@{ shape: cyl, label: "Kafka\n(eventstream)" }
redis@{ shape: cyl, label: "Redis" }
postgres@{ shape: cyl, label: "Postgres" }
memcached@{ shape: cyl, label: "Memcached" }
clickhouse@{ shape: cyl, label: "Clickhouse" }
smtp@{ shape: win-pane, label: "SMTP Server" }
symbol-server@{ shape: win-pane, label: "Public/Private Symbol Servers" }
internet@{ shape: trap-t, label: "Internet" }

internet --> nginx

nginx -- Event submitted by SDKs --> relay
nginx -- Web UI & API --> web

subgraph querier [Event Querier]
snuba-api --> clickhouse
end

subgraph processing [Event Processing]
kafka --> snuba-consumer --> clickhouse
snuba-consumer --> kafka
kafka --> snuba-replacer --> clickhouse
kafka --> snuba-subscription-scheduler --> clickhouse
kafka --> snuba-subscription-executor --> clickhouse
redis -- As a celery queue --> sentry-consumer
kafka --> sentry-consumer --> kafka
kafka --> sentry-post-process-forwarder --> kafka
sentry-post-process-forwarder -- Preventing concurrent processing of the same event --> redis

vroom-blob-storage@{ shape: cyl, label: "Blob Storage\n(default is filesystem)" }

kafka -- Profiling event processing --> vroom -- Republish to Kafka to be consumed by Snuba --> kafka
vroom --> snuba-api
vroom -- Store profiles data --> vroom-blob-storage

outgoing-monitors@{ shape: win-pane, label: "Outgoing HTTP Monitors" }
redis -- Fetching uptime configs --> uptime-checker -- Publishing uptime monitoring results --> kafka
uptime-checker --> outgoing-monitors
end

subgraph ui [Web User Interface]
sentry-blob-storage@{ shape: cyl, label: "Blob Storage\n(default is filesystem)" }

web --> worker
web --> postgres
web -- Caching layer --> memcached
web -- Queries on event (errors, spans, etc) data (to snuba-api) --> snuba-api
web -- Avatars, attachments, etc --> sentry-blob-storage
worker -- As a celery queue --> redis
worker --> postgres
worker -- Alert & digest emails --> smtp
web -- Sending test emails --> smtp
end

subgraph ingestion [Event Ingestion]
relay@{ shape: rect, label: 'Relay' }
sentry_ingest_consumer[sentry-ingest-consumers]

relay -- Process envelope into specific types --> kafka --> sentry_ingest_consumer -- Caching event data (to redis) --> redis
relay -- Register relay instance --> web
relay -- Fetching project configs (to redis) --> redis
sentry_ingest_consumer -- Symbolicate stack traces --> symbolicator --> symbol-server
sentry_ingest_consumer -- Save event payload to Nodestore --> postgres
sentry_ingest_consumer -- Republish to events topic --> kafka
end
```

### Event Ingestion Pipeline

1. Events from the SDKs are sent to the `relay` service (see the envelope submission sketch after this list).
2. Relay parses the incoming envelope and validates the DSN and project ID. It reads project config data from `redis`.
3. Relay builds a new payload to be consumed by the Sentry ingest consumers and sends it to `kafka`.
4. The Sentry `ingest-*` consumers (where `*` is the event type: errors, transactions, profiles, etc.) consume the event, cache it in `redis`, and start the `preprocess_event` task.
5. The `preprocess_event` task symbolicates stack traces with the `symbolicator` service and processes the event according to the event type's logic.
6. The `preprocess_event` task saves the event payload to the nodestore (by default, `postgres`).
7. The `preprocess_event` task publishes the event to the `events` topic in `kafka`.
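
For a concrete picture of step 1, here is a minimal, hedged sketch that submits an error envelope directly to Relay through nginx. It assumes the instance is reachable at `http://localhost:9000` (the default self-hosted port) and uses a placeholder project ID and public key taken from a project's DSN; the envelope endpoint path and `X-Sentry-Auth` header follow Sentry's envelope protocol.

```python
"""Sketch: submit a minimal error envelope straight to Relay.

Assumptions: the self-hosted instance is reachable at http://localhost:9000,
and PROJECT_ID / PUBLIC_KEY come from an existing project's DSN -- both
values below are placeholders.
"""
import json
import uuid
from datetime import datetime, timezone

import requests  # third-party; `pip install requests`

SENTRY_HOST = "http://localhost:9000"  # assumption: default nginx port
PROJECT_ID = "1"                       # placeholder
PUBLIC_KEY = "examplePublicKey"        # placeholder (key part of the DSN)

event_id = uuid.uuid4().hex
now = datetime.now(timezone.utc).isoformat()

# An envelope is newline-delimited JSON: envelope header, item header, item payload.
envelope_header = {"event_id": event_id, "sent_at": now}
item_header = {"type": "event"}
event_payload = {
    "event_id": event_id,
    "timestamp": now,
    "platform": "python",
    "level": "error",
    "message": "data-flow smoke test",
}

body = "\n".join(json.dumps(part) for part in (envelope_header, item_header, event_payload))

resp = requests.post(
    f"{SENTRY_HOST}/api/{PROJECT_ID}/envelope/",
    data=body.encode("utf-8"),
    headers={
        "Content-Type": "application/x-sentry-envelope",
        "X-Sentry-Auth": (
            f"Sentry sentry_version=7, sentry_key={PUBLIC_KEY}, "
            "sentry_client=data-flow-sketch/0.1"
        ),
    },
    timeout=10,
)
print(resp.status_code, resp.text)  # Relay responds with the accepted event_id on success
```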

### Event Processing Pipeline

1. The `snuba-consumer` service consumes events from the `events` topic and processes them. Some are republished to the `post-process-forwarder` topic in `kafka`, while others are written to `clickhouse` (see the consumer sketch after this list).
2. The Sentry consumer that reads the `post-process-forwarder` topic processes the events and republishes them to `kafka`.
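
To watch this stage of the flow, the hedged sketch below tails the `events` topic with a plain Kafka consumer. This is only an illustration: it assumes the `confluent-kafka` Python package is installed and that the broker is reachable on `localhost:9092`; inside the self-hosted Docker network the broker is normally addressed as `kafka:9092` instead.

```python
"""Sketch: peek at the `events` topic that the Snuba and Sentry consumers read.

Assumptions: `pip install confluent-kafka`, and the broker port is reachable
from wherever this script runs (localhost:9092 here).
"""
from confluent_kafka import Consumer

consumer = Consumer(
    {
        "bootstrap.servers": "localhost:9092",  # assumption: broker port published to the host
        "group.id": "data-flow-peek",           # throwaway consumer group, for inspection only
        "auto.offset.reset": "latest",
    }
)
consumer.subscribe(["events"])

try:
    while True:
        msg = consumer.poll(timeout=1.0)
        if msg is None:
            continue
        if msg.error():
            print("consumer error:", msg.error())
            continue
        # Each message is a payload republished by Sentry after ingestion/processing.
        print(f"offset={msg.offset()} key={msg.key()} bytes={len(msg.value())}")
finally:
    consumer.close()
```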

Contributor (author) comment on lines +79 to +92: I'm not sure about these. Can anyone take a look? @markstory @untitaker @hubertdeng123 @BYK

### Web User Interface

1. The `web` service is what you see: the Django application that serves Sentry's web UI and API (see the API request sketch after this list).
2. The `worker` service mainly consumes tasks from `redis`, which acts as a Celery queue. One notable task is sending alert and digest emails through the SMTP server.
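
A quick way to confirm that the `web` service is serving the API (and not just the UI) is to call an authenticated endpoint. The sketch below lists organizations through the standard `/api/0/organizations/` route; the host, port, and auth token are placeholders for your own instance.

```python
"""Sketch: confirm the `web` service is answering API requests.

Assumptions: the self-hosted instance is reachable at http://localhost:9000
and AUTH_TOKEN is an existing user or internal-integration auth token --
both values are placeholders.
"""
import requests  # third-party; `pip install requests`

SENTRY_HOST = "http://localhost:9000"  # assumption: default nginx port
AUTH_TOKEN = "<your-auth-token>"       # placeholder

resp = requests.get(
    f"{SENTRY_HOST}/api/0/organizations/",
    headers={"Authorization": f"Bearer {AUTH_TOKEN}"},
    timeout=10,
)
resp.raise_for_status()
for org in resp.json():
    print(org["slug"])  # these requests are handled by `web`, not `relay`
```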
