feat: add distributed tracing for webhook handling and PipelineRun timing #2605
ci-operator wants to merge 2 commits into tektoncd:main
Conversation
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request enhances the observability of Pipelines-as-Code by integrating OpenTelemetry distributed tracing. It lets operators and developers gain deeper insight into the performance and flow of webhook event processing and the stages of PipelineRun execution. By propagating trace context, it enables a unified view of operations across PaC and Tekton Pipelines, streamlining debugging and performance analysis.
Code Review
This pull request introduces OpenTelemetry distributed tracing to Pipelines-as-Code. Key changes include integrating tracing into the event handling flow, propagating trace context to PipelineRuns via annotations, and emitting timing spans for PipelineRun lifecycle events. The observability configuration has been updated to include new tracing options, but the removal of existing metrics-protocol and metrics-endpoint configurations requires clarification and documentation. Additionally, an improvement opportunity was identified to ensure consistent tracing data by always setting VCS repository and revision attributes, even when empty.
I am having trouble creating individual review comments, so my feedback is collected below.
config/305-config-observability.yaml (24-25)
The metrics-protocol and metrics-endpoint configurations are being removed from the data section. This change was not explicitly mentioned in the pull request description, which focuses on adding tracing. If these metrics were actively used, their removal could be a breaking change or an unintended side effect. Please clarify if this removal is intentional and, if so, document it in the PR description or release notes.
pkg/adapter/adapter.go (212-217)
For better consistency in tracing data, consider always setting the VCSRepositoryKey and VCSRevisionKey attributes, even if l.event.URL or l.event.SHA are empty. This ensures the attribute key is always present in the span, which can simplify querying and analysis in tracing backends. You could set them to an empty string or a placeholder like "unknown" if the values are not available, instead of omitting the attribute entirely.
```go
if l.event.URL != "" {
	span.SetAttributes(tracing.VCSRepositoryKey.String(l.event.URL))
} else {
	span.SetAttributes(tracing.VCSRepositoryKey.String(""))
}
if l.event.SHA != "" {
	span.SetAttributes(tracing.VCSRevisionKey.String(l.event.SHA))
} else {
	span.SetAttributes(tracing.VCSRevisionKey.String(""))
}
```
Force-pushed from 393b9d3 to 3870a7f.
/ok-to-test
Force-pushed from 3870a7f to bcbb64e.
@zakisk can you have a look pls
Force-pushed from bcbb64e to cf3108c.
feat: add distributed tracing for webhook handling and PipelineRun timing

- Emit a PipelinesAsCode:ProcessEvent span covering the full webhook event lifecycle.
- Emit waitDuration and executeDuration timing spans for completed PipelineRuns.
- Propagate trace context onto created PipelineRuns via the tekton.dev/pipelinerunSpanContext annotation.
- Configure the Knative observability framework to read tracing config from the pipelines-as-code-config-observability ConfigMap.
- Add tracing configuration guide and config examples.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Force-pushed from cf3108c to 501e750.
@ci-operator for the E2E run we're working on a permission workaround in this PR: #2611
/gemini review

Code Review
This pull request implements distributed tracing for Pipelines-as-Code using OpenTelemetry. It adds logic to extract trace context from incoming webhook headers, propagate it to PipelineRuns via a new annotation, and emit timing spans for event processing and PipelineRun execution. The PR also includes configuration updates and new documentation. Feedback points out that an error during JSON marshalling of the span context is currently ignored and should be logged to assist with debugging.
```go
if jsonBytes, err := json.Marshal(carrier); err == nil {
	if existing := pipelineRun.GetAnnotations()[keys.SpanContextAnnotation]; existing != "" {
		logging.FromContext(ctx).Warnf("overwriting pre-existing %s annotation on PipelineRun template; honoring initiating event trace context", keys.SpanContextAnnotation)
	}
	annotations[keys.SpanContextAnnotation] = string(jsonBytes)
}
```
The error from json.Marshal(carrier) is silently ignored. If marshalling fails for some reason, trace context propagation will silently fail. It would be better to log this error to aid in debugging potential tracing issues.
```go
if jsonBytes, err := json.Marshal(carrier); err != nil {
	logging.FromContext(ctx).Errorf("failed to marshal span context carrier: %v", err)
} else {
	if existing := pipelineRun.GetAnnotations()[keys.SpanContextAnnotation]; existing != "" {
		logging.FromContext(ctx).Warnf("overwriting pre-existing %s annotation on PipelineRun template; honoring initiating event trace context", keys.SpanContextAnnotation)
	}
	annotations[keys.SpanContextAnnotation] = string(jsonBytes)
}
```
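Both snippets revolve around a carrier map serialized into a PipelineRun annotation. A hedged, stdlib-only sketch of the round trip, with the controller side injecting the annotation and the watcher side recovering it (the helper names are illustrative; only the annotation key matches the PR, and a real implementation would hand the recovered carrier to a tracing SDK's propagator):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Annotation key mirroring the PR's keys.SpanContextAnnotation.
const spanContextAnnotation = "tekton.dev/pipelinerunSpanContext"

// injectCarrier serializes a W3C-style carrier map into the annotation,
// surfacing the marshal error instead of silently dropping it, as the
// review suggests.
func injectCarrier(annotations, carrier map[string]string) error {
	jsonBytes, err := json.Marshal(carrier)
	if err != nil {
		return fmt.Errorf("failed to marshal span context carrier: %w", err)
	}
	annotations[spanContextAnnotation] = string(jsonBytes)
	return nil
}

// extractCarrier is the watcher-side inverse: recover the carrier from the
// PipelineRun annotation so the trace can be continued.
func extractCarrier(annotations map[string]string) (map[string]string, error) {
	raw, ok := annotations[spanContextAnnotation]
	if !ok {
		return nil, fmt.Errorf("annotation %s not found", spanContextAnnotation)
	}
	carrier := map[string]string{}
	if err := json.Unmarshal([]byte(raw), &carrier); err != nil {
		return nil, err
	}
	return carrier, nil
}

func main() {
	ann := map[string]string{}
	_ = injectCarrier(ann, map[string]string{
		"traceparent": "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01",
	})
	c, _ := extractCarrier(ann)
	fmt.Println(c["traceparent"]) // prints the traceparent header value back out
}
```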
This conflicts with the recently merged 0faad24.
/ok-to-test
📝 Description of the Change
Add OpenTelemetry distributed tracing to Pipelines-as-Code. When tracing is enabled via the `pipelines-as-code-config-observability` ConfigMap, PaC emits trace spans for webhook event processing and PipelineRun lifecycle timing.

Controller: Emits a `PipelinesAsCode:ProcessEvent` span covering the full webhook event lifecycle — from SCM event receipt through PipelineRun creation. Propagates trace context onto created PipelineRuns via the `tekton.dev/pipelinerunSpanContext` annotation, enabling end-to-end traces when Tekton Pipelines also has tracing enabled.

Watcher: Emits `waitDuration` (creation → start) and `executeDuration` (start → completion) timing spans for completed PipelineRuns, using resource timestamps for accurate wall-clock timing.

Tracing is configured through the existing observability ConfigMap with three new keys:
`tracing-protocol`, `tracing-endpoint`, and `tracing-sampling-rate`. The controller's Knative observability configurator is pointed at the correct PaC-specific ConfigMap (`pipelines-as-code-config-observability`).

🔗 Linked GitHub Issue
https://issues.redhat.com/browse/SRVKP-8544
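As a hedged sketch, the three tracing keys described above could be consumed from the ConfigMap's data roughly like this (the struct, defaults, and parsing logic are assumptions for illustration, not the PR's actual configuration code):

```go
package main

import (
	"fmt"
	"strconv"
)

// tracingConfig mirrors the three new ConfigMap keys added by this PR.
type tracingConfig struct {
	Protocol     string
	Endpoint     string
	SamplingRate float64
}

// parseTracingConfig reads the keys from ConfigMap data, leaving the
// sampling rate at 0 (no traces sampled) when absent or malformed.
func parseTracingConfig(data map[string]string) tracingConfig {
	cfg := tracingConfig{
		Protocol: data["tracing-protocol"],
		Endpoint: data["tracing-endpoint"],
	}
	if v, err := strconv.ParseFloat(data["tracing-sampling-rate"], 64); err == nil {
		cfg.SamplingRate = v
	}
	return cfg
}

func main() {
	cfg := parseTracingConfig(map[string]string{
		"tracing-protocol":      "grpc",
		"tracing-endpoint":      "otel-collector.observability:4317",
		"tracing-sampling-rate": "0.1",
	})
	fmt.Println(cfg.Protocol, cfg.SamplingRate) // grpc 0.1
}
```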
🧪 Testing Strategy
Manually tested end-to-end with an OpenTelemetry collector and Tempo backend. Verified traces appear with correct span names, attributes, and parent-child relationships across PaC and Tekton Pipelines reconciler spans.
🤖 AI Assistance
AI (Claude) was used for code generation, debugging, and documentation. All code has been reviewed, tested, and deployed. Co-authored-by trailers are on each commit.
✅ Submitter Checklist
- The commit type prefix (`fix:`, `feat:`) matches the "Type of Change" I selected above.
- I ran `make test` locally.
- `make lint` requires `golangci-lint`, which is not installed locally; CI will validate.