
Commit 8c51e68

RFC for Open Telemetry Implementation for Presto.
1 parent 9d62ffe commit 8c51e68


4 files changed, +100 -0 lines changed


RFC-0009-open-telemetry.md

Lines changed: 100 additions & 0 deletions
# **RFC0009 for Presto**

## Enhancing OpenTelemetry Implementation in Presto

Proposers

* Suresh Babu Areekara
* Siddarth Ajay
* Ben Tony Joe

## [Related Issues]

* https://github.com/prestodb/presto/issues/23975

## Summary

The existing OpenTelemetry implementation (https://github.com/prestodb/presto/pull/18534) was an experimental feature. It captured a limited set of telemetry data (query state changes) and had no concept of child spans. The new implementation makes Presto's tracing more flexible by supporting both parent and child spans. Traces can also be propagated to the worker nodes.

## Background

OpenTelemetry is a serviceability framework that helps users gain insight into the performance and behaviour of their systems. It supports the generation, collection, and management of telemetry data such as traces.

OSS Presto previously had only a basic OpenTelemetry implementation.

![Traces existing implementation](/RFC-0009-open-telemetry/traces-existing-implementation-oss-presto.png)

## Proposed Implementation

Presto can be instrumented manually, which has the following advantages:

- More flexibility and control over instrumentation
- Easier customization of which operations are monitored
- Ability to pass additional information as span attributes and events

![Instrumentation flow](/RFC-0009-open-telemetry/tracing-instrumentation-flow.png)

- The OpenTelemetry SDK provides libraries for instrumenting applications to capture telemetry data (traces). It includes built-in integrations for common frameworks and supports custom instrumentation.
- Presto is instrumented using the OpenTelemetry API.
- Once instrumented, Presto starts a span and registers it with the OpenTelemetry SDK.
- The SDK creates a context, which associates the span with the overall flow, and attaches that context to the current (parent) span.
- While performing an operation, Presto adds the required attributes and events to the corresponding span.
- For sub-operations, Presto creates a child span, extracts the parent context, and attaches it to the child span as its parent context. This connects all parent and child spans.
- When an operation completes, its span is ended and the span state is updated. A nested child span is ended before its parent.
- The SDK keeps checking the flush trigger. When the trigger is reached, the pending spans are batched and sent to the backend.
- The backend is a system that stores, analyses, and visualizes this telemetry data. Common backends include Jaeger, Instana, and the Grafana stack.
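The parent/child flow above can be sketched with a toy, in-memory tracer. This is not the OpenTelemetry API. `Span`, `ToyTracer`, `start_span`, and the `query.id` attribute key are hypothetical names used only for illustration. The sketch assumes the "context" is simply a stack of active spans, so it shows three things:

- A child span inherits its parent's trace ID and records the parent's span ID.
- Attributes and events are attached to the span that is current at the time.
- The nested child is ended, and handed to the exporter, before its parent.

```python
import contextlib
import secrets
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Span:
    """Hypothetical span record; real OpenTelemetry spans carry more state."""
    name: str
    trace_id: str
    span_id: str
    parent_span_id: Optional[str]
    attributes: dict = field(default_factory=dict)
    events: list = field(default_factory=list)
    ended: bool = False

    def set_attribute(self, key: str, value: object) -> None:
        self.attributes[key] = value

    def add_event(self, name: str) -> None:
        self.events.append(name)


class ToyTracer:
    """Keeps the current span on a stack, standing in for the SDK's context."""

    def __init__(self) -> None:
        self._stack: List[Span] = []    # active spans, innermost last
        self.finished: List[Span] = []  # ended spans, as an exporter would see them

    @contextlib.contextmanager
    def start_span(self, name: str) -> Iterator[Span]:
        parent = self._stack[-1] if self._stack else None
        span = Span(
            name=name,
            # A child reuses its parent's trace ID; a root span starts a new trace.
            trace_id=parent.trace_id if parent else secrets.token_hex(16),
            span_id=secrets.token_hex(8),
            # Recording the parent's span ID is what connects the two spans.
            parent_span_id=parent.span_id if parent else None,
        )
        self._stack.append(span)
        try:
            yield span
        finally:
            self._stack.pop()
            span.ended = True
            self.finished.append(span)


tracer = ToyTracer()
with tracer.start_span("query") as query_span:
    query_span.set_attribute("query.id", "example-query-id")  # hypothetical key
    with tracer.start_span("planning") as planning_span:
        planning_span.add_event("plan created")

for s in tracer.finished:
    print(s.name, s.parent_span_id is None)  # prints "planning False", then "query True"
```

In real instrumentation, the OpenTelemetry SDK manages the current context, span IDs, and the export pipeline. The stack here only makes the parent lookup visible.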

![Context propagation](/RFC-0009-open-telemetry/context-propagation-coordinator-to-worker.png)

Context propagation lets signals be correlated with each other, regardless of where they are generated.

Context carries the information that the sending and receiving service, or execution unit, use to correlate one signal with another. For example, if service A calls service B, a span from service A whose ID is in the context can be used as the parent of the next span created in service B.

Propagation is the mechanism that moves context between services and processes. It serializes and deserializes the context object, carrying the relevant information from one service to another.

Propagation is usually handled by instrumentation libraries and is transparent to the user. If you need to propagate context manually, you can use the Propagators API.
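To show what "serialize and deserialize" means for OpenTelemetry's default W3C Trace Context propagator, the sketch below converts a span context to and from a `traceparent` header value, using a plain dictionary as the carrier. It hand-rolls only the version-00 format for clarity. `SpanContext`, `inject`, and `extract` are hypothetical stand-ins, not the OpenTelemetry Propagators API, which real code should use instead.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Version-00 traceparent: "00-<32 hex trace-id>-<16 hex parent-id>-<2 hex flags>".
# Only version 00 is handled in this sketch.
_TRACEPARENT_RE = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


@dataclass(frozen=True)
class SpanContext:
    trace_id: str
    span_id: str
    sampled: bool


def inject(ctx: SpanContext, carrier: dict) -> None:
    """Serialize the span context into the carrier as a traceparent header."""
    flags = "01" if ctx.sampled else "00"
    carrier["traceparent"] = f"00-{ctx.trace_id}-{ctx.span_id}-{flags}"


def extract(carrier: dict) -> Optional[SpanContext]:
    """Deserialize a traceparent header; return None if it is absent or invalid."""
    header = carrier.get("traceparent")
    if header is None:
        return None
    match = _TRACEPARENT_RE.fullmatch(header.strip())
    if match is None:
        return None
    trace_id, span_id, flags = match.groups()
    # All-zero IDs are invalid per the W3C Trace Context specification.
    if trace_id == "0" * 32 or span_id == "0" * 16:
        return None
    # The lowest flag bit is the "sampled" flag.
    return SpanContext(trace_id, span_id, sampled=bool(int(flags, 16) & 0x01))


headers: dict = {}
inject(SpanContext("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7", True), headers)
print(headers["traceparent"])  # 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
```

HTTP header names are case-insensitive. The dictionary here uses a lowercase key only to keep the sketch simple.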

OpenTelemetry maintains several official propagators. The default propagator uses the headers specified by the W3C Trace Context specification.

- Where Presto makes REST calls, this header is used for context propagation, as shown in the image above.
- The Presto coordinator fetches the current span context and injects it as the `traceparent` HTTP header. The worker extracts the header and uses it as the parent context when creating child spans.
- In other areas, the parent context is already available where the child span is created, so it is set directly as the child span's parent.
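A minimal, framework-free sketch of the coordinator-to-worker flow follows. `coordinator_send_task`, `worker_handle_task`, `make_traceparent`, and `parse_traceparent` are hypothetical names, not Presto classes. The `headers` dictionary stands in for an outgoing HTTP request. Real code would normally use an OpenTelemetry propagator rather than formatting the header by hand.

```python
import secrets
from typing import Dict, Tuple

_HEX = set("0123456789abcdef")


def make_traceparent(trace_id: str, span_id: str, sampled: bool = True) -> str:
    """Format a version-00 W3C traceparent header value."""
    return f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"


def parse_traceparent(value: str) -> Tuple[str, str]:
    """Return (trace_id, parent_span_id); raise ValueError on malformed input."""
    version, trace_id, span_id, flags = value.split("-")  # wrong part count -> ValueError
    if (version != "00" or len(trace_id) != 32 or len(span_id) != 16
            or len(flags) != 2 or not set(trace_id + span_id + flags) <= _HEX):
        raise ValueError(f"malformed traceparent: {value!r}")
    return trace_id, span_id


def coordinator_send_task(headers: Dict[str, str]) -> Tuple[str, str]:
    """Hypothetical coordinator side: start a span and inject it into the request."""
    trace_id, span_id = secrets.token_hex(16), secrets.token_hex(8)
    headers["traceparent"] = make_traceparent(trace_id, span_id)
    return trace_id, span_id


def worker_handle_task(headers: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical worker side: extract the parent context and start a child span."""
    trace_id, parent_span_id = parse_traceparent(headers["traceparent"])
    return {
        "trace_id": trace_id,              # same trace as the coordinator span
        "span_id": secrets.token_hex(8),   # fresh ID for the worker's child span
        "parent_span_id": parent_span_id,  # links the child to the coordinator span
    }


request_headers: Dict[str, str] = {}
coord_trace_id, coord_span_id = coordinator_send_task(request_headers)
child = worker_handle_task(request_headers)
```

Because the worker reuses the coordinator's trace ID and records its span ID as the parent, a tracing backend can display the worker's span nested under the coordinator's span.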

## [Optional] Other Approaches Considered

Based on the discussion, this section may be updated with feedback from reviewers.

## Adoption Plan

Presto OpenTelemetry can be configured by modifying the values in `presto-main/etc/telemetry.properties`:

```properties
otel-factory.name=otel
tracing-enabled=false
tracing-backend-url=<backend endpoint>
max-exporter-batch-size=256
max-queue-size=1024
schedule-delay=1000
exporter-timeout=1024
span-sampling=true
```

***otel-factory.name***: unique identifier of the OpenTelemetry factory implementation to be registered

***tracing-enabled***: boolean controlling whether tracing is on or off

***tracing-backend-url***: endpoint of the OTel collector or backend to which telemetry data is exported

***max-exporter-batch-size***: maximum number of spans exported in one batch

***max-queue-size***: maximum number of spans that can be queued before being processed for export

***schedule-delay***: delay between consecutive batch exports, controlling how frequently spans are exported

***exporter-timeout***: how long the span exporter waits for a batch of spans to be sent successfully before timing out

***span-sampling***: boolean to enable or disable sampling. If enabled, spans are generated only for major operations
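The batching properties above resemble the settings of the OpenTelemetry SDK's batch span processor, which queues ended spans and exports them in batches. This RFC does not say that Presto passes these values to such a processor; assuming it does, the simplified model below shows how `max-queue-size`, `max-exporter-batch-size`, and `schedule-delay` interact.

`BatchingModel` is hypothetical. It runs synchronously with an explicit clock in milliseconds, which is also an assumption, since the RFC states no units. It does not model `exporter-timeout`.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BatchingModel:
    """Simplified, single-threaded model of batch span export (hypothetical).

    In the real SDK, export runs on a background thread, so the queue can fill
    while an export is in flight. This model only shows the triggering rules.
    """
    max_queue_size: int = 1024        # cf. max-queue-size
    max_export_batch_size: int = 256  # cf. max-exporter-batch-size
    schedule_delay_ms: int = 1000     # cf. schedule-delay (units assumed)
    queue: List[str] = field(default_factory=list)
    exported_batches: List[List[str]] = field(default_factory=list)
    dropped: int = 0
    last_export_ms: int = 0

    def on_end(self, span: str, now_ms: int) -> None:
        """Called when a span ends: enqueue it, or drop it if the queue is full."""
        if len(self.queue) >= self.max_queue_size:
            self.dropped += 1
        else:
            self.queue.append(span)
        self.maybe_export(now_ms)

    def maybe_export(self, now_ms: int) -> None:
        """Export when a full batch is waiting or the schedule delay has elapsed."""
        delay_elapsed = now_ms - self.last_export_ms >= self.schedule_delay_ms
        while self.queue and (
            len(self.queue) >= self.max_export_batch_size or delay_elapsed
        ):
            batch = self.queue[: self.max_export_batch_size]
            del self.queue[: self.max_export_batch_size]
            self.exported_batches.append(batch)
            self.last_export_ms = now_ms
            delay_elapsed = False  # after one timed export, only full batches continue


model = BatchingModel(max_queue_size=8, max_export_batch_size=3, schedule_delay_ms=1000)
for i, t in enumerate([0, 100, 200, 300]):
    model.on_end(f"span-{i}", now_ms=t)
# span-0..span-2 form a full batch and are exported at t=200; span-3 waits.
model.maybe_export(now_ms=1300)  # the schedule delay has elapsed, so span-3 is exported
```

A larger batch size means fewer, bigger exports. A shorter schedule delay reduces how long a partial batch waits. A queue that is too small for the span rate causes spans to be dropped.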

## Test Plan

We have added unit tests for all of the OTel implementations. For a few major classes where spans are actually generated, the unit tests also assert on the generated spans.