
Commit 2d0ba80

[Feature] ✨ adding BACKEND support for OpenTelemetry (OTEL) (mage-ai#4476)
* 🔧 docker-compose changes
* 📝 mint updates
* ✨ otel integration with mage-ai server
* ⬆️ fixing requirements
* 🔧 fixing deps and versions
* 💄 adding missing dep in setup.py
* ♻️ cleanup
* 📝 adding docs
1 parent 73715ac commit 2d0ba80


9 files changed (+143, -5 lines)


docker-compose.yml (+3)

```diff
@@ -19,6 +19,9 @@ x-server_settings: &server_settings
  - ECS_TASK_DEFINITION=$ECS_TASK_DEFINITION
  - ENABLE_NEW_RELIC=$ENABLE_NEW_RELIC
  - ENABLE_PROMETHEUS=$ENABLE_PROMETHEUS
+ - OTEL_EXPORTER_OTLP_ENDPOINT=${OTEL_EXPORTER_OTLP_ENDPOINT}
+ - OTEL_EXPORTER_OTLP_HTTP_ENDPOINT=${OTEL_EXPORTER_OTLP_HTTP_ENDPOINT}
+ - OTEL_PYTHON_TORNADO_EXCLUDED_URLS=$OTEL_PYTHON_TORNADO_EXCLUDED_URLS
  - ENV=dev
  - GCP_PROJECT_ID=$GCP_PROJECT_ID
  - GCP_REGION=$GCP_REGION
```
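The compose file mixes `${VAR}` and `$VAR`; Docker Compose treats both the same way, and a variable that is unset on the host is substituted as an empty string (Compose prints a warning). The sketch below approximates that substitution with Python's `string.Template`, which accepts both forms. It is not Compose's real parser (modifiers such as `${VAR:-default}` are not handled), and `interpolate` is a hypothetical helper. It also shows why leaving the host variable unset keeps tracing off: the container receives `""`, which the server's `if OTEL_EXPORTER_OTLP_ENDPOINT:` check treats as false.

```python
from collections import defaultdict
from string import Template
from typing import Mapping


def interpolate(line: str, host_env: Mapping[str, str]) -> str:
    """Approximate Compose-style substitution for plain $VAR and ${VAR} references.

    Unset variables become empty strings, as Compose does (Compose also warns).
    Modifiers such as ${VAR:-default} are NOT handled by this sketch.
    """
    # defaultdict(str, ...) returns "" for any name missing from host_env
    return Template(line).substitute(defaultdict(str, host_env))


host_env = {"OTEL_EXPORTER_OTLP_ENDPOINT": "192.168.1.56:3417"}
print(interpolate("OTEL_EXPORTER_OTLP_ENDPOINT=${OTEL_EXPORTER_OTLP_ENDPOINT}", host_env))
# OTEL_EXPORTER_OTLP_ENDPOINT=192.168.1.56:3417
print(interpolate("OTEL_PYTHON_TORNADO_EXCLUDED_URLS=$OTEL_PYTHON_TORNADO_EXCLUDED_URLS", host_env))
# OTEL_PYTHON_TORNADO_EXCLUDED_URLS=

# With nothing set on the host, the container sees "" and `if OTEL_EXPORTER_OTLP_ENDPOINT:` is False
print(bool(interpolate("${OTEL_EXPORTER_OTLP_ENDPOINT}", {})))  # False
```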
@@ -0,0 +1,58 @@

````markdown
---
title: "Monitoring with OpenTelemetry"
sidebarTitle: "OpenTelemetry"
---

## Overview

This guide describes how to monitor your `mage_ai` application with OpenTelemetry. OpenTelemetry provides a unified way to collect telemetry data, such as traces and metrics, which are essential for understanding your application's performance and behavior.

## Prerequisites

Make sure the following prerequisites are in place before proceeding:

- An OpenTelemetry Collector is set up and configured to receive telemetry data.
- The `mage_ai` application is ready to be instrumented with OpenTelemetry.

## Configuring OpenTelemetry in `mage_ai`

### Setting Environment Variables

Configure the OpenTelemetry exporters by setting the following environment variables:

1. **OTLP HTTP Exporter**:
   Use `OTEL_EXPORTER_OTLP_HTTP_ENDPOINT` to specify the HTTP endpoint of your OpenTelemetry Collector.

   Example:

   ```bash
   export OTEL_EXPORTER_OTLP_HTTP_ENDPOINT="http://192.168.1.56:3418/v1/traces"
   ```

2. **OTLP Exporter**:
   The `OTEL_EXPORTER_OTLP_ENDPOINT` environment variable sets a general collector endpoint, which can be used for both gRPC and HTTP connections. The Mage server uses it for its gRPC trace exporter, and setting it is what enables OpenTelemetry tracing.

   Example:

   ```bash
   export OTEL_EXPORTER_OTLP_ENDPOINT="192.168.1.56:3417"
   ```

### Instrumentation in the Application

Both instrumentations below are enabled when `OTEL_EXPORTER_OTLP_ENDPOINT` is set; Tornado instrumentation is also enabled when `ENABLE_PROMETHEUS` is set.

1. **SQLAlchemy Instrumentation**:
   The application uses OpenTelemetry's `SQLAlchemyInstrumentor` to instrument database operations, producing traces for database interactions.

2. **Tornado Instrumentation**:
   To monitor HTTP server operations, the application uses OpenTelemetry's Tornado instrumentation, which collects data about HTTP requests and server performance.

## Telemetry Data Collection and Analysis

Once the environment variables are set and the application is running, OpenTelemetry collects telemetry data from the instrumentation described above.

1. **Data Collection**:
   The application sends the collected traces to the configured OpenTelemetry Collector endpoint.

2. **Analyzing Telemetry Data**:
   To visualize and analyze the data, connect your OpenTelemetry Collector to a backend that supports OpenTelemetry data, such as Grafana or Jaeger.
````
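Before pointing Mage at a collector, it can help to confirm that the value of `OTEL_EXPORTER_OTLP_ENDPOINT` is reachable from the server host. The sketch below accepts both forms used in the guide (`host:port` and a full URL such as `http://host:port/v1/traces`) and attempts a plain TCP connection. The fallback ports 4317 (OTLP/gRPC) and 4318 (OTLP/HTTP) are the standard OTLP defaults, not values from this commit, and `parse_endpoint` / `is_reachable` are hypothetical helpers. A successful connection only shows that something is listening, not that it speaks OTLP.

```python
import socket
from typing import Tuple
from urllib.parse import urlsplit

OTLP_GRPC_DEFAULT_PORT = 4317  # standard OTLP/gRPC port (assumption: collector uses defaults)
OTLP_HTTP_DEFAULT_PORT = 4318  # standard OTLP/HTTP port


def parse_endpoint(endpoint: str, default_port: int = OTLP_GRPC_DEFAULT_PORT) -> Tuple[str, int]:
    """Split 'host:port' or 'scheme://host:port/path' into (host, port)."""
    # urlsplit only fills in netloc after '//', so prefix it for bare host:port values
    parts = urlsplit(endpoint if "://" in endpoint else "//" + endpoint)
    if not parts.hostname:
        raise ValueError(f"no host in endpoint {endpoint!r}")
    return parts.hostname, parts.port or default_port


def is_reachable(endpoint: str, default_port: int = OTLP_GRPC_DEFAULT_PORT,
                 timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the endpoint succeeds within `timeout` seconds."""
    host, port = parse_endpoint(endpoint, default_port)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


print(parse_endpoint("192.168.1.56:3417"))                   # ('192.168.1.56', 3417)
print(parse_endpoint("http://192.168.1.56:3418/v1/traces"))  # ('192.168.1.56', 3418)
```

In practice one would call `is_reachable(os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"])` from the server host or container.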

docs/mint.json (+2, -1)

```diff
@@ -433,7 +433,8 @@
   "integrations/observability/metaplane",
   "integrations/observability/newrelic",
   "integrations/observability/sentry",
-  "integrations/observability/prometheus"
+  "integrations/observability/prometheus",
+  "integrations/observability/opentelemetry"
 ]
 },
 {
```

mage_ai/orchestration/db/__init__.py (+7)

```diff
@@ -23,6 +23,9 @@
     pool_pre_ping=True,
 )
 
+# Only import if OpenTelemetry is enabled
+if os.getenv('OTEL_EXPORTER_OTLP_ENDPOINT'):
+    from opentelemetry.instrumentation.sqlalchemy import SQLAlchemyInstrumentor
 
 if is_test():
     db_connection_url = f'sqlite:///{TEST_DB}'
@@ -50,6 +53,10 @@
     db_kwargs['connect_args']['options'] = '-c timezone=utc'
 
 try:
+    # if OpenTelemetry is enabled, instrument SQLAlchemy
+    if os.getenv('OTEL_EXPORTER_OTLP_ENDPOINT'):
+        SQLAlchemyInstrumentor().instrument(enable_commenter=True, commenter_options={})
+
     engine = create_engine(
         db_connection_url,
         **db_kwargs,
```
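`enable_commenter=True` asks the SQLAlchemy instrumentation to append a sqlcommenter-style comment to each SQL statement, so database-side query logs can be correlated with traces. The exact keys emitted depend on the instrumentation version and on `commenter_options`, which this commit leaves empty; they are not shown here. The stdlib-only sketch below approximates the format for a `traceparent` tag: keys and values are URL-encoded, values are single-quoted, pairs are sorted by key and comma-joined inside `/* ... */`, and a trailing `;` stays at the very end. `traceparent` follows the W3C Trace Context layout `version-trace_id-parent_id-flags`. Both function names are hypothetical.

```python
from typing import Dict
from urllib.parse import quote


def traceparent(trace_id: int, span_id: int, sampled: bool = True) -> str:
    """Build a W3C traceparent: 2-hex version, 32-hex trace id, 16-hex span id, 2-hex flags."""
    flags = 1 if sampled else 0
    return f"00-{trace_id:032x}-{span_id:016x}-{flags:02x}"


def add_sql_comment(sql: str, tags: Dict[str, str]) -> str:
    """Append a sqlcommenter-style comment (approximation), keeping a trailing ';' last."""
    if not tags:
        return sql
    # URL-encode keys and values, single-quote values, sort by key, join with commas
    pairs = ",".join(
        f"{quote(key, safe='')}='{quote(value, safe='')}'"
        for key, value in sorted(tags.items())
    )
    comment = f" /*{pairs}*/"
    sql = sql.rstrip()
    if sql.endswith(";"):
        return sql[:-1] + comment + ";"
    return sql + comment


tp = traceparent(0x4BF92F3577B34DA6A3CE929D0E0E4736, 0x00F067AA0BA902B7)
print(add_sql_comment("SELECT 1;", {"traceparent": tp}))
# SELECT 1 /*traceparent='00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01'*/;
```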

mage_ai/server/server.py (+38)

```diff
@@ -86,6 +86,7 @@
     ENABLE_PROMETHEUS,
     LDAP_ADMIN_USERNAME,
     OAUTH2_APPLICATION_CLIENT_ID,
+    OTEL_EXPORTER_OTLP_ENDPOINT,
     REDIS_URL,
     REQUESTS_BASE_PATH,
     REQUIRE_USER_AUTHENTICATION,
@@ -344,6 +345,43 @@ def make_app(template_dir: str = None, update_routes: bool = False):
         (r'/version-control', MainPageHandler),
     ]
 
+    if ENABLE_PROMETHEUS or OTEL_EXPORTER_OTLP_ENDPOINT:
+        from opentelemetry.instrumentation.tornado import TornadoInstrumentor
+        TornadoInstrumentor().instrument()
+        logger.info('OpenTelemetry instrumentation enabled.')
+
+    if OTEL_EXPORTER_OTLP_ENDPOINT:
+        logger.info(f'OTEL_EXPORTER_OTLP_ENDPOINT: {OTEL_EXPORTER_OTLP_ENDPOINT}')
+
+        from opentelemetry import trace
+        from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import (
+            OTLPSpanExporter,
+        )
+        from opentelemetry.sdk.resources import Resource
+        from opentelemetry.sdk.trace import TracerProvider
+        from opentelemetry.sdk.trace.export import BatchSpanProcessor
+
+        service_name = "mage-ai-server"
+        resource = Resource(attributes={
+            "service.name": service_name,
+        })
+
+        # Set up a TracerProvider and attach an OTLP exporter to it
+        trace.set_tracer_provider(TracerProvider(resource=resource))
+        tracer_provider = trace.get_tracer_provider()
+
+        # Configure OTLP exporter
+        otlp_exporter = OTLPSpanExporter(
+            # Endpoint of your OpenTelemetry Collector
+            endpoint=OTEL_EXPORTER_OTLP_ENDPOINT,
+            # Use insecure channel if your collector does not support TLS
+            insecure=True
+        )
+
+        # Attach the OTLP exporter to the TracerProvider
+        span_processor = BatchSpanProcessor(otlp_exporter)
+        tracer_provider.add_span_processor(span_processor)
+
     if ENABLE_PROMETHEUS:
         from opentelemetry import metrics
         from opentelemetry.exporter.prometheus import PrometheusMetricReader
```
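The server wraps the OTLP exporter in a `BatchSpanProcessor`, so finished spans are buffered and sent to the collector in batches rather than with one network call per span. The hypothetical, stdlib-only class below illustrates just that buffering idea; it is not the SDK's implementation. The real processor also runs a background worker thread, bounds its queue (dropping spans when it is full), exports on a timer, and flushes on shutdown. The default of 512 mirrors the SDK's default maximum export batch size, and the span names in the usage lines are made up.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ToyBatchProcessor:
    """Buffers finished span names and hands them to `export` in batches."""
    export: Callable[[List[str]], None]  # stand-in for an exporter such as OTLPSpanExporter
    max_batch_size: int = 512
    _buffer: List[str] = field(default_factory=list, init=False)

    def on_end(self, span_name: str) -> None:
        # Called when a span finishes; export only once a full batch has accumulated
        self._buffer.append(span_name)
        if len(self._buffer) >= self.max_batch_size:
            self.force_flush()

    def force_flush(self) -> None:
        # Send whatever is buffered (the real processor also does this on a timer and at shutdown)
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self.export(batch)


sent: List[List[str]] = []
processor = ToyBatchProcessor(export=sent.append, max_batch_size=2)
for name in ("GET /api/pipelines", "SELECT pipeline", "GET /api/blocks"):
    processor.on_end(name)
print(sent)  # [['GET /api/pipelines', 'SELECT pipeline']]
processor.force_flush()
print(sent)  # [['GET /api/pipelines', 'SELECT pipeline'], ['GET /api/blocks']]
```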

mage_ai/settings/__init__.py (+9)

```diff
@@ -109,6 +109,13 @@ def get_bool_value(value: str) -> bool:
 # If enabled, the /metrics route will expose Tornado server metrics
 ENABLE_PROMETHEUS = get_bool_value(os.getenv('ENABLE_PROMETHEUS', 'False'))
 
+# OpenTelemetry configuration
+OTEL_EXPORTER_OTLP_ENDPOINT = os.getenv('OTEL_EXPORTER_OTLP_ENDPOINT', None)
+OTEL_EXPORTER_OTLP_HTTP_ENDPOINT = os.getenv('OTEL_EXPORTER_HTTP_OTLP_ENDPOINT', None)
+OTEL_PYTHON_TORNADO_EXCLUDED_URLS = (
+    os.getenv('OTEL_PYTHON_TORNADO_EXCLUDED_URLS') or '/api/statuses'
+)
+
 DEFAULT_LOCALHOST_URL = 'http://localhost:6789'
 MAGE_PUBLIC_HOST = os.getenv('MAGE_PUBLIC_HOST') or DEFAULT_LOCALHOST_URL
 
@@ -164,6 +171,8 @@ def get_bool_value(value: str) -> bool:
     'SCHEDULER_TRIGGER_INTERVAL',
     'REQUIRE_USER_PERMISSIONS',
     'ENABLE_PROMETHEUS',
+    'OTEL_EXPORTER_OTLP_ENDPOINT',
+    'OTEL_EXPORTER_OTLP_HTTP_ENDPOINT',
     'OKTA_DOMAIN_URL',
     'OKTA_CLIENT_ID',
     'OKTA_CLIENT_SECRET',
```
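The settings change reads the HTTP endpoint from `OTEL_EXPORTER_HTTP_OTLP_ENDPOINT`, while the guide, `docker-compose.yml`, and `scripts/dev.sh` all use `OTEL_EXPORTER_OTLP_HTTP_ENDPOINT`. As committed, setting only the documented name would leave the HTTP setting as `None` (the server changes in this commit only use `OTEL_EXPORTER_OTLP_ENDPOINT`). The sketch below, a hypothetical `OtelSettings` class rather than Mage code, reads the documented name and normalizes empty strings to `None`, which matches how the server's truthiness checks already treat them.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

DEFAULT_TORNADO_EXCLUDED_URLS = "/api/statuses"  # default taken from the settings diff


@dataclass(frozen=True)
class OtelSettings:
    otlp_endpoint: Optional[str]
    otlp_http_endpoint: Optional[str]
    tornado_excluded_urls: str

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "OtelSettings":
        # `or None` turns unset *and* empty values (e.g. blank Compose substitutions) into None
        return cls(
            otlp_endpoint=env.get("OTEL_EXPORTER_OTLP_ENDPOINT") or None,
            otlp_http_endpoint=env.get("OTEL_EXPORTER_OTLP_HTTP_ENDPOINT") or None,
            tornado_excluded_urls=(
                env.get("OTEL_PYTHON_TORNADO_EXCLUDED_URLS") or DEFAULT_TORNADO_EXCLUDED_URLS
            ),
        )

    @property
    def tracing_enabled(self) -> bool:
        # Mirrors the server: the OTLP trace exporter is only configured when this is set
        return self.otlp_endpoint is not None


settings = OtelSettings.from_env({"OTEL_EXPORTER_OTLP_ENDPOINT": "192.168.1.56:3417"})
print(settings.tracing_enabled, settings.tornado_excluded_urls)  # True /api/statuses
```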

requirements.txt (+4, -2)

```diff
@@ -67,8 +67,10 @@ kubernetes>=28.1.0
 langchain>=0.0.222; python_version >= '3.8'
 mysql-connector-python~=8.2.0
 openai>=0.27.8, <1.0.0
-opentelemetry-exporter-prometheus==0.41b0
-opentelemetry-instrumentation-tornado==0.41b0
+opentelemetry-exporter-prometheus==0.43b0
+opentelemetry-instrumentation-tornado==0.43b0
+opentelemetry-exporter-otlp==1.22.0
+opentelemetry-instrumentation-sqlalchemy==0.43b0
 oracledb==1.3.1
 pinotdb==5.1.0
 prometheus_client>=0.18.0
```
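These pins move the Prometheus exporter and Tornado instrumentation from `0.41b0` to `0.43b0` and add `opentelemetry-exporter-otlp==1.22.0`. OpenTelemetry Python versions its beta contrib packages (`0.xxbN`) in step with its stable core packages (`1.x`), and 0.43b0 appears to pair with the 1.22.0 core release, so keeping them aligned avoids mixing incompatible releases. `setup.py` uses looser `~=` specifiers, including `opentelemetry-instrumentation-sqlalchemy~=0.42b0` where this file pins `==0.43b0`. A quick way to check what is actually installed is a small report built on `importlib.metadata`; `installed_versions` is a hypothetical helper.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional

# Distribution names taken from requirements.txt in this commit
OTEL_DISTRIBUTIONS = (
    "opentelemetry-exporter-prometheus",
    "opentelemetry-instrumentation-tornado",
    "opentelemetry-exporter-otlp",
    "opentelemetry-instrumentation-sqlalchemy",
)


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if it is missing."""
    versions: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(OTEL_DISTRIBUTIONS).items():
    print(f"{name}: {version or 'not installed'}")
```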

scripts/dev.sh (+18)

```diff
@@ -42,6 +42,21 @@ case $key in
     shift # past argument
     shift # past value
     ;;
+    --otel_exporter_otlp_endpoint)
+    OTEL_EXPORTER_OTLP_ENDPOINT="$3"
+    shift # past argument
+    shift # past value
+    ;;
+    --otel_exporter_otlp_http_endpoint)
+    OTEL_EXPORTER_OTLP_HTTP_ENDPOINT="$3"
+    shift # past argument
+    shift # past value
+    ;;
+    --otel_python_tornado_excluded_urls)
+    OTEL_PYTHON_TORNADO_EXCLUDED_URLS="$3"
+    shift # past argument
+    shift # past value
+    ;;
     --huggingface_api)
     HUGGINGFACE_API="$3"
     shift # past argument
@@ -143,6 +158,9 @@ export ECS_TASK_DEFINITION=$ECS_TASK_DEFINITION
 export ECS_CONTAINER_NAME=$ECS_CONTAINER_NAME
 export ENABLE_NEW_RELIC=$ENABLE_NEW_RELIC
 export ENABLE_PROMETHEUS=$ENABLE_PROMETHEUS
+export OTEL_EXPORTER_OTLP_ENDPOINT=$OTEL_EXPORTER_OTLP_ENDPOINT
+export OTEL_EXPORTER_OTLP_HTTP_ENDPOINT=$OTEL_EXPORTER_OTLP_HTTP_ENDPOINT
+export OTEL_PYTHON_TORNADO_EXCLUDED_URLS=$OTEL_PYTHON_TORNADO_EXCLUDED_URLS
 
 export GCP_PROJECT_ID=$GCP_PROJECT_ID
 export GCP_PATH_TO_CREDENTIALS=$GCP_PATH_TO_CREDENTIALS
```
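`scripts/dev.sh` now forwards `OTEL_PYTHON_TORNADO_EXCLUDED_URLS`, which the OpenTelemetry Tornado instrumentation reads from the environment to skip tracing for matching requests (`mage_ai/settings` defaults the Python-side value to `/api/statuses`; whether that default reaches the instrumentor is not visible in this commit). The OpenTelemetry Python HTTP utilities appear to treat the value as a comma-separated list of regular expressions and to skip a request when any of them is *found* in the URL (a search, not a full match). The sketch below approximates that behavior with `re`; `parse_excluded_urls` and `url_is_excluded` are hypothetical names, and the exact rules should be checked against the installed `opentelemetry-util-http` version.

```python
import re
from typing import List


def parse_excluded_urls(value: str) -> List[str]:
    """Split a comma-separated OTEL_PYTHON_*_EXCLUDED_URLS value into regex patterns."""
    return [part.strip() for part in value.split(",") if part.strip()]


def url_is_excluded(url: str, patterns: List[str]) -> bool:
    """True if any pattern is found anywhere in the URL (re.search, not fullmatch)."""
    if not patterns:
        return False
    return re.search("|".join(patterns), url) is not None


patterns = parse_excluded_urls("/api/statuses, /metrics")
print(url_is_excluded("/api/statuses?_limit=1", patterns))  # True
print(url_is_excluded("/api/pipelines", patterns))          # False
```

Because matching is a search, `/api/statuses` also excludes a URL such as `/api/statuses_archive`; anchoring a pattern (for example `^/api/statuses$`) narrows it.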

setup.py (+4, -2)

```diff
@@ -171,8 +171,10 @@ def readme():
         'nkeys~=0.1.0',
         'openai>=0.27.8, <1.0.0',
         'opensearch-py==2.0.0',
-        'opentelemetry-exporter-prometheus==0.41b0',
-        'opentelemetry-instrumentation-tornado==0.41b0',
+        'opentelemetry-exporter-prometheus==0.43b0',
+        'opentelemetry-instrumentation-tornado~=0.43b0',
+        'opentelemetry-exporter-otlp~=1.22.0',
+        'opentelemetry-instrumentation-sqlalchemy~=0.42b0',
         'oracledb==1.3.1',
         'pika==1.3.1',
         'pinotdb==5.1.0',
```
