docs(self-hosted): explain self-hosted data flow #13745
Open: aldy505 wants to merge 1 commit into getsentry:master from aldy505:docs/self-hosted/data-flow (+98 −0)
---
title: Self-hosted Data Flow
sidebar_title: Data Flow
sidebar_order: 20
description: Learn about the data flow of self-hosted Sentry
---

This diagram shows how data flows through a self-hosted Sentry installation. It is similar to the [Application Architecture](/application-architecture/overview/) overview, but focuses on the components that make up a self-hosted deployment.

```mermaid
graph LR
    kafka@{ shape: cyl, label: "Kafka\n(eventstream)" }
    redis@{ shape: cyl, label: "Redis" }
    postgres@{ shape: cyl, label: "Postgres" }
    memcached@{ shape: cyl, label: "Memcached" }
    clickhouse@{ shape: cyl, label: "ClickHouse" }
    smtp@{ shape: win-pane, label: "SMTP Server" }
    symbol-server@{ shape: win-pane, label: "Public/Private Symbol Servers" }
    internet@{ shape: trap-t, label: "Internet" }

    internet --> nginx

    nginx -- Events submitted by SDKs --> relay
    nginx -- Web UI & API --> web

    subgraph querier [Event Querier]
        snuba-api --> clickhouse
    end

    subgraph processing [Event Processing]
        kafka --> snuba-consumer --> clickhouse
        snuba-consumer --> kafka
        kafka --> snuba-replacer --> clickhouse
        kafka --> snuba-subscription-scheduler --> clickhouse
        kafka --> snuba-subscription-executor --> clickhouse
        redis -- As a celery queue --> sentry-consumer
        kafka --> sentry-consumer --> kafka
        kafka --> sentry-post-process-forwarder --> kafka
        sentry-post-process-forwarder -- Preventing concurrent processing of the same event --> redis

        vroom-blob-storage@{ shape: cyl, label: "Blob Storage\n(default is filesystem)" }

        kafka -- Profiling event processing --> vroom -- Republish to Kafka to be consumed by Snuba --> kafka
        vroom --> snuba-api
        vroom -- Store profiles data --> vroom-blob-storage

        outgoing-monitors@{ shape: win-pane, label: "Outgoing HTTP Monitors" }
        redis -- Fetching uptime configs --> uptime-checker -- Publishing uptime monitoring results --> kafka
        uptime-checker --> outgoing-monitors
    end

    subgraph ui [Web User Interface]
        sentry-blob-storage@{ shape: cyl, label: "Blob Storage\n(default is filesystem)" }

        web --> worker
        web --> postgres
        web -- Caching layer --> memcached
        web -- Queries on event (errors, spans, etc) data (to snuba-api) --> snuba-api
        web -- Avatars, attachments, etc --> sentry-blob-storage
        worker -- As a celery queue --> redis
        worker --> postgres
        worker -- Alert & digest emails --> smtp
        web -- Sending test emails --> smtp
    end

    subgraph ingestion [Event Ingestion]
        relay@{ shape: rect, label: "Relay" }
        sentry_ingest_consumer[sentry-ingest-consumers]

        relay -- Process envelope into specific types --> kafka --> sentry_ingest_consumer -- Caching event data (to redis) --> redis
        relay -- Register relay instance --> web
        relay -- Fetching project configs (to redis) --> redis
        sentry_ingest_consumer -- Symbolicate stack traces --> symbolicator --> symbol-server
        sentry_ingest_consumer -- Save event payload to Nodestore --> postgres
        sentry_ingest_consumer -- Republish to events topic --> kafka
    end
```
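The first hop in the diagram is `nginx`, which sends SDK traffic to `relay` and everything else to `web`. The Python sketch below approximates that split. It assumes SDKs post to project-scoped paths such as `/api/<project_id>/envelope/` (project IDs start at 1) or the legacy `/api/store/`, while Sentry's own web API lives under `/api/0/`. The exact `location` rules in the self-hosted nginx config may differ, and `route_request` is a hypothetical helper.

```python
import re

# Hypothetical approximation of the nginx routing shown in the diagram:
# SDK ingestion paths go to relay, everything else goes to web.
# Project IDs start at 1, so Sentry's web API under /api/0/ is not matched.
_RELAY_PATH = re.compile(r"^/api/(?:[1-9]\d*/|store/)")


def route_request(path: str) -> str:
    """Return the upstream service ("relay" or "web") for a request path."""
    if _RELAY_PATH.match(path):
        return "relay"
    return "web"


if __name__ == "__main__":
    for p in ("/api/42/envelope/", "/api/store/", "/api/0/projects/", "/issues/"):
        print(p, "->", route_request(p))
```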

### Event Ingestion Pipeline

1. SDKs send events through `nginx` to the `relay` service.
2. Relay parses the incoming envelope and validates the DSN and project ID, reading the project config from `redis`.
3. Relay builds a new payload for the Sentry ingest consumers and publishes it to `kafka`.
4. The Sentry `ingest-*` consumers (where `*` is the event type, such as errors, transactions, or profiles) consume the event, cache it in `redis`, and start the `preprocess_event` task.
5. The `preprocess_event` task symbolicates stack traces using the `symbolicator` service and processes the event according to its type.
6. The `preprocess_event` task saves the event payload to Nodestore (backed by `postgres` by default).
7. The `preprocess_event` task publishes the event to the `events` topic in `kafka`.
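Steps 2 and 3 can be sketched in Python as follows. The envelope layout (a JSON header line, then for each item a JSON item header followed by its payload, with an optional `length`) and the DSN shape `https://<public_key>@<host>/<project_id>` follow Sentry's SDK protocol. `PROJECT_CONFIGS`, `TOPICS`, and the helper names are hypothetical: a dict plays the role of the project configs Relay reads from `redis`, and the topic names are assumed from the `ingest-*` naming above. Real Relay performs far more validation than this.

```python
import json
from urllib.parse import urlsplit

# Hypothetical stand-in for the project configs Relay reads from Redis:
# project ID -> set of accepted public DSN keys.
PROJECT_CONFIGS = {"42": {"abc123"}}

# Hypothetical item-type -> Kafka topic mapping, following the `ingest-*` naming.
TOPICS = {"event": "ingest-events", "transaction": "ingest-transactions"}


def parse_dsn(dsn: str) -> tuple[str, str]:
    """Split a DSN of the form https://<public_key>@<host>/<project_id>."""
    parts = urlsplit(dsn)
    project_id = parts.path.rstrip("/").rsplit("/", 1)[-1]
    return parts.username or "", project_id


def parse_envelope(raw: bytes) -> tuple[dict, list[tuple[dict, bytes]]]:
    """Parse a newline-delimited envelope into its header and (item header, payload) pairs."""
    header_line, _, rest = raw.partition(b"\n")
    header = json.loads(header_line)
    items = []
    while rest.strip():
        item_line, _, rest = rest.partition(b"\n")
        item_header = json.loads(item_line)
        if "length" in item_header:
            # Explicit length: the payload is exactly that many bytes.
            n = item_header["length"]
            payload, rest = rest[:n], rest[n:]
            if rest.startswith(b"\n"):
                rest = rest[1:]
        else:
            # No length: the payload runs until the next newline.
            payload, _, rest = rest.partition(b"\n")
        items.append((item_header, payload))
    return header, items


def ingest(raw: bytes) -> list[tuple[str, bytes]]:
    """Validate the DSN, then return the (topic, payload) pairs to publish to Kafka."""
    header, items = parse_envelope(raw)
    public_key, project_id = parse_dsn(header["dsn"])
    if public_key not in PROJECT_CONFIGS.get(project_id, set()):
        raise PermissionError(f"unknown DSN for project {project_id}")
    return [(TOPICS[h["type"]], p) for h, p in items if h["type"] in TOPICS]


if __name__ == "__main__":
    raw = (
        b'{"dsn": "https://abc123@sentry.example.com/42"}\n'
        b'{"type": "event"}\n'
        b'{"message": "hello"}\n'
    )
    print(ingest(raw))  # [('ingest-events', b'{"message": "hello"}')]
```

Splitting parsing, validation, and routing into separate functions mirrors the order of the steps above: the envelope is read first, the DSN is checked against the cached project config, and only then are items handed to their topics.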

### Event Processing Pipeline

1. The `snuba-consumer` service consumes events from the `events` topic and processes them. Some events are republished to the `post-process-forwarder` topic in `kafka`; others are written to `clickhouse`.
2. The Sentry consumer subscribed to the `post-process-forwarder` topic processes those events and republishes them to `kafka`.
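The diagram also shows `sentry-post-process-forwarder` using `redis` to prevent concurrent processing of the same event. A common way to do this with Redis is `SET key value NX EX ttl`, which succeeds only when the key does not already exist. The sketch below mimics that behaviour with an in-memory dict so it runs without a Redis server. `LockStore`, `post_process`, the key format, and the TTL are hypothetical, and Sentry's actual locking code may work differently.

```python
import time


class LockStore:
    """Hypothetical in-memory stand-in for Redis `SET key value NX EX ttl`."""

    def __init__(self) -> None:
        self._expiry: dict[str, float] = {}

    def set_nx_ex(self, key: str, ttl: float) -> bool:
        """Set `key` only if it is absent or expired; return True on success."""
        now = time.monotonic()
        expires_at = self._expiry.get(key)
        if expires_at is not None and expires_at > now:
            return False  # another consumer currently holds the lock
        self._expiry[key] = now + ttl
        return True

    def delete(self, key: str) -> None:
        self._expiry.pop(key, None)


def post_process(store: LockStore, event_id: str, handled: list[str]) -> bool:
    """Process an event unless another consumer is already working on it."""
    key = f"post-process:{event_id}"  # hypothetical key format
    if not store.set_nx_ex(key, ttl=60):
        return False
    try:
        handled.append(event_id)  # placeholder for the real post-processing work
        return True
    finally:
        store.delete(key)  # release so later deliveries are not blocked forever
```

The TTL matters: if a consumer crashes while holding the lock, the key expires on its own instead of blocking that event indefinitely.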

### Web User Interface

1. The `web` service is the part you interact with: the Django application that serves Sentry's web UI and API.
2. The `worker` service mainly consumes tasks from `redis`, which acts as a Celery queue. One notable task is sending emails through the SMTP server.
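The `web`/`worker` split is a producer/consumer pattern: `web` enqueues tasks, and `worker` pulls them from the Celery queue in `redis` and runs them. The stdlib sketch below models this with `queue.Queue` standing in for the Redis broker, and builds an alert email with `email.message.EmailMessage`. The task name, addresses, `SMTP_HOST`, and port are hypothetical, and real Sentry uses Celery rather than this hand-written loop.

```python
import queue
import smtplib
from email.message import EmailMessage

SMTP_HOST = "smtp"  # hypothetical hostname of the SMTP server

# Hypothetical stand-in for the Celery queue that lives in Redis.
task_queue: queue.Queue = queue.Queue()


def build_alert_email(to: str, subject: str, body: str) -> EmailMessage:
    msg = EmailMessage()
    msg["From"] = "sentry@example.com"  # hypothetical sender address
    msg["To"] = to
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_email(msg: EmailMessage, host: str = SMTP_HOST, port: int = 25) -> None:
    """Deliver a message to the SMTP server."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.send_message(msg)


def run_worker(send=send_email) -> int:
    """Drain the queue, dispatching each task by name; return how many ran."""
    handlers = {
        "send_alert_email": lambda kwargs: send(build_alert_email(**kwargs)),
    }
    processed = 0
    while True:
        try:
            name, kwargs = task_queue.get_nowait()
        except queue.Empty:
            return processed
        handlers[name](kwargs)
        processed += 1
```

In this model, `web` would enqueue work with `task_queue.put(("send_alert_email", {"to": ..., "subject": ..., "body": ...}))`, and passing a different `send` callable to `run_worker` lets you exercise the flow without a live SMTP server.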
I'm not sure about these. Can anyone take a look? @markstory @untitaker @hubertdeng123 @BYK