Support persistent batch processor to prevent telemetry data loss #6940
Comments
The behavior of batch span processor is dictated by the specification, so changing its behavior to have a persistent queue would require changing the spec, which I expect would be an involved process. I have heard a variety of people interested in increasing the reliability of telemetry delivery. This requires several design changes:
These are not trivial challenges. There's currently a proposal to start an Audit logging SIG. On the surface, this seems unrelated to your request. But when you dig deeper, one of the primary challenges the audit SIG would face is improving the reliability of OpenTelemetry data delivery. Whatever improvements they make for reliable delivery of audit logs will also likely be available as an opt-in feature for traces, metrics, and logs. I think progress on this request is most likely to come from that area, so I encourage you to check it out and comment.
Hi @xhyzzZ. Sorry for the delay in responding, but I thought I should chime in to also let you know about the I think several of us would be curious to hear about your thoughts and hope that you can evaluate it. Thanks!
Hey @breedx-splk, thanks for the info. I just read the doc briefly. We have implemented our own persistent storage solution, and it looks almost the same as yours (two threads: one for writing and one for reading). I think there are two issues here that cause dropped spans:
Disk buffering, if I understand correctly, will resolve most of the dropped-span issues, with some limitations. But I have several questions I would like to confirm.
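To make the disk-buffering idea discussed above concrete, here is a minimal sketch that spills spans to a file when the in-memory queue is full instead of dropping them. It is not the disk buffering implementation referenced in this thread. The class `SpillingSpanBuffer`, its methods, and the assumption that each span is already serialized to a single-line string are all hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;

// Hypothetical sketch: a bounded in-memory queue that overflows to a file.
// Assumes every span has already been serialized to one line of text.
final class SpillingSpanBuffer {
    private final ArrayBlockingQueue<String> memory;
    private final Path spillFile;

    SpillingSpanBuffer(int maxQueueSize, Path spillFile) {
        this.memory = new ArrayBlockingQueue<>(maxQueueSize);
        this.spillFile = spillFile;
    }

    // Writer side: keep the span in memory if there is room, otherwise append it to disk.
    synchronized void add(String span) {
        if (memory.offer(span)) {
            return;
        }
        try {
            Files.write(spillFile, Collections.singletonList(span), StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reader side: return everything spilled so far and remove the file.
    synchronized List<String> drainSpilled() {
        try {
            if (!Files.exists(spillFile)) {
                return Collections.emptyList();
            }
            List<String> spans = Files.readAllLines(spillFile, StandardCharsets.UTF_8);
            Files.delete(spillFile);
            return spans;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    int inMemorySize() {
        return memory.size();
    }
}
```

A real implementation would also need to bound disk usage, handle partially written records after a crash, and decide how long spilled data stays valid, which is where the limitations mentioned above come in.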
Is your feature request related to a problem? Please describe.
We are using BatchSpanProcessor and have a scenario where, when traffic bursts, we lose some spans because we can't always tune the processor configs perfectly in time. Hence I am wondering whether there is a way to persist the data to make sure there is no data loss when traffic spikes.
Reference: If the configs are not tuned well, spans are dropped here silently, with only limited metrics: https://github.com/open-telemetry/opentelemetry-java/blob/main/sdk/trace/src/main/java/io/opentelemetry/sdk/trace/export/BatchSpanProcessor.java#L238
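The drop behavior described here can be illustrated with a simplified model of a bounded queue whose non-blocking add fails once the queue is full. This is only a sketch, not the actual BatchSpanProcessor code. `BoundedSpanBuffer`, `BurstDemo`, and the use of plain strings as spans are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Simplified, hypothetical model of a bounded in-memory span buffer.
final class BoundedSpanBuffer {
    private final ArrayBlockingQueue<String> queue;
    private final AtomicLong dropped = new AtomicLong();

    BoundedSpanBuffer(int maxQueueSize) {
        this.queue = new ArrayBlockingQueue<>(maxQueueSize);
    }

    // Non-blocking add: returns false and counts a drop when the queue is full.
    boolean add(String span) {
        if (queue.offer(span)) {
            return true;
        }
        dropped.incrementAndGet();
        return false;
    }

    // Drains up to maxBatchSize spans, as an exporter thread would.
    List<String> drainBatch(int maxBatchSize) {
        List<String> batch = new ArrayList<>();
        queue.drainTo(batch, maxBatchSize);
        return batch;
    }

    long droppedCount() {
        return dropped.get();
    }
}

final class BurstDemo {
    public static void main(String[] args) {
        BoundedSpanBuffer buffer = new BoundedSpanBuffer(3);
        // A burst of 5 spans arrives before the exporter gets a chance to drain.
        for (int i = 0; i < 5; i++) {
            buffer.add("span-" + i);
        }
        // prints: exported=3 dropped=2
        System.out.println("exported=" + buffer.drainBatch(10).size()
                + " dropped=" + buffer.droppedCount());
    }
}
```

With a queue size of 3 and a burst of 5, the last two spans never reach the exporter, and the only trace of them is the drop counter.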
Describe the solution you'd like
opentelemetry-java/sdk/trace-shaded-deps/src/main/java/io/opentelemetry/sdk/trace/internal/JcTools.java
Line 32 in efdacc1
Describe alternatives you've considered
N/A
Additional context
N/A