OpenTelemetry refactoring #6381
Signed-off-by: Nicolas Lamirault <[email protected]>
Walkthrough
The pull request updates the OpenTelemetry Collector Helm chart configuration. A new dependency alias is added in the chart file without altering existing entries. The ConfigMap for dashboards now uses a revised naming convention for consistency, and credential management has been updated with new secret key mappings and additional ExternalSecret resources. Furthermore, the values files have been restructured to separate configurations for logs, metrics, and traces, with updated environment variables and telemetry settings.
Sequence Diagram(s)
sequenceDiagram
participant Dev as Developer/Helm Chart
participant Chart as Chart.yaml
participant Template as Templates (ConfigMap & Credentials)
participant K8s as Kubernetes API
participant SecretOp as ExternalSecrets Operator
Dev->>Chart: Deploy OpenTelemetry Collector
Chart->>Template: Process dependency and template updates
Template->>K8s: Apply ConfigMap and ExternalSecret resources
K8s->>SecretOp: Trigger secret retrieval for updated credentials
SecretOp-->>K8s: Provide updated secrets
Actionable comments posted: 2
🧹 Nitpick comments (3)
gitops/argocd/charts/opentelemetry/opentelemetry-collector/values-talos-homelab.yaml (1)
72-699: Previous configuration commented out rather than removed.
The previous collector configuration has been commented out but retained in the file. While this preserves the historical configuration for reference, consider cleaning up the file by removing the commented sections once the new configuration is confirmed to be working correctly. This would improve readability and maintainability of the configuration files.
gitops/argocd/charts/opentelemetry/opentelemetry-collector/values.yaml (2)
144-155: Consider enabling the redaction processor for PII protection.
The commented-out redaction processor would help protect sensitive information like credit card numbers. Consider evaluating and enabling this for production environments where logs might contain sensitive data.
1051-1065: Consider enabling the Open Agent Management Protocol (OpAMP).
The commented-out OpAMP configuration would enable remote management of the collector. Consider evaluating this for production environments to simplify configuration updates.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
gitops/argocd/charts/opentelemetry/alloy/templates/alloy/metrics/_metrics_prom_operator.alloy (3 hunks)
gitops/argocd/charts/opentelemetry/alloy/values.yaml (2 hunks)
gitops/argocd/charts/opentelemetry/opentelemetry-collector/Chart.yaml (1 hunks)
gitops/argocd/charts/opentelemetry/opentelemetry-collector/templates/configmap-dashboards.yaml (2 hunks)
gitops/argocd/charts/opentelemetry/opentelemetry-collector/templates/credentials.yaml (3 hunks)
gitops/argocd/charts/opentelemetry/opentelemetry-collector/values-talos-homelab.yaml (1 hunks)
gitops/argocd/charts/opentelemetry/opentelemetry-collector/values.yaml (11 hunks)
🔇 Additional comments (29)
gitops/argocd/charts/opentelemetry/alloy/templates/alloy/metrics/_metrics_prom_operator.alloy (1)
16-16: Syntax updates to match_labels key-value pairs.
The changes consistently convert the prometheus.io/operator keys to a quoted string format with an equals sign. This is a good practice for ensuring proper parsing of keys with special characters.
Also applies to: 30-30, 44-44
gitops/argocd/charts/opentelemetry/opentelemetry-collector/Chart.yaml (1)
19-22: New dependency for cluster-level metrics collection.
Adding a dedicated metrics collector for cluster-level metrics is a good practice for separation of concerns and makes the telemetry architecture more maintainable.
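For readers unfamiliar with Helm dependency aliases, a rough sketch of what such an entry can look like follows. The alias metrics-cluster is inferred from the component names mentioned elsewhere in this review, and the version is a placeholder; neither is copied from the PR.

```yaml
# Chart.yaml (illustrative only; alias and version are assumptions)
dependencies:
  - name: opentelemetry-collector
    repository: https://open-telemetry.github.io/opentelemetry-helm-charts
    version: "x.y.z"          # placeholder, not the version pinned in the PR
    alias: metrics-cluster    # values for this instance are read from the `metrics-cluster:` key
```

With an alias, the same upstream chart can be installed several times from one parent chart, each instance configured under its own top-level key in values.yaml.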
gitops/argocd/charts/opentelemetry/alloy/values.yaml (2)
12-12: Updated OTLP endpoint for metrics collection.
The service name in the endpoint has been updated to include the collector prefix, which better aligns with Kubernetes service naming conventions for the OpenTelemetry components.
30-33: ❓ Verification inconclusive
Commented out mount and volume configurations.
The removal of these mount and volume configurations suggests a change in how configuration is managed for the OpenTelemetry collector.
Also applies to: 36-40
🏁 Script executed:
#!/bin/bash
# Check if there are any references to portefaix-metrics in other configuration files
# to ensure this removal doesn't break functionality
rg -A 3 "portefaix-metrics" --type yaml
Length of output: 2226
Attention: Verify Consistency of Configuration Changes for the OpenTelemetry Collector
We've confirmed that in gitops/argocd/charts/opentelemetry/alloy/values.yaml, the mount and volume configurations for portefaix-metrics have been commented out at lines 30–33 and 36–40, indicating the intended removal. However, active references to portefaix-metrics still appear in the alloy-otel chart (in both the configmap template and the values file). Please verify that this discrepancy is intentional. If the new configuration management approach should apply uniformly, similar changes may be needed for alloy-otel.
Locations in the codebase:
- gitops/argocd/charts/opentelemetry/alloy/values.yaml: configuration sections at lines 30–33 and 36–40 are commented out.
- gitops/argocd/charts/opentelemetry/alloy-otel/templates/configmap-portefaix.yaml and gitops/argocd/charts/opentelemetry/alloy-otel/values.yaml: still contain active portefaix-metrics references.
gitops/argocd/charts/opentelemetry/opentelemetry-collector/values-talos-homelab.yaml (1)
6-20: Added environment configuration for OpenTelemetry components.
The addition of consistent environment variable configurations for logs, metrics, metrics-cluster, and traces is a good practice. Each component gets the same set of environment variables:
- OTEL_RESOURCE_ATTRIBUTES for pod IP
- Kubernetes metadata (K8S_NAMESPACE, K8S_NODE_NAME, K8S_CLUSTER_NAME)
This provides consistent attribution of telemetry data across all components.
Also applies to: 22-36, 38-52, 54-68
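A minimal sketch of how such a per-component environment block could be written with the chart's extraEnvs value. The helper variable K8S_POD_IP, the attribute key k8s.pod.ip, and the cluster name are assumptions made for illustration, not values taken from the PR.

```yaml
# values-talos-homelab.yaml (illustrative; repeated per alias: logs, metrics, metrics-cluster, traces)
logs:
  extraEnvs:
    - name: K8S_POD_IP                 # hypothetical helper variable
      valueFrom:
        fieldRef:
          fieldPath: status.podIP
    - name: OTEL_RESOURCE_ATTRIBUTES
      value: "k8s.pod.ip=$(K8S_POD_IP)"  # Kubernetes expands $(VAR) from variables defined earlier in the list
    - name: K8S_NAMESPACE
      valueFrom:
        fieldRef:
          fieldPath: metadata.namespace
    - name: K8S_NODE_NAME
      valueFrom:
        fieldRef:
          fieldPath: spec.nodeName
    - name: K8S_CLUSTER_NAME
      value: "example-cluster"         # placeholder cluster name
```

The Downward API fields used here (status.podIP, metadata.namespace, spec.nodeName) are resolved by the kubelet at pod start, so nothing has to be templated per node.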
gitops/argocd/charts/opentelemetry/opentelemetry-collector/templates/credentials.yaml (6)
25-25: Environment variable standardization aligns with Datadog's conventions.
This change from DATADOG_API_KEY to DD_API_KEY follows Datadog's recommended environment variable naming convention, which will make troubleshooting easier and match documentation examples.
49-49: Environment variable standardization aligns with Lightstep's conventions.
This change from LIGHTSTEP_TOKEN to LS_TOKEN follows Lightstep's recommended environment variable naming convention, which will make troubleshooting easier and match documentation examples.
97-120: Good addition of Kloudmate integration.
The external secret for Kloudmate follows the established pattern and will enable integration with this observability platform.
121-144: Good addition of HyperDX integration.
The external secret for HyperDX follows the established pattern and will enable integration with this observability platform.
145-168: Good addition of OneUptime integration.
The external secret for OneUptime follows the established pattern and will enable integration with this observability platform.
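The "established pattern" is not reproduced in this review. As a rough, generic illustration of an ExternalSecret resource, with every name below hypothetical and the apiVersion dependent on the installed External Secrets Operator release:

```yaml
# Generic ExternalSecret sketch; names, store, and keys are placeholders
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: opentelemetry-collector-example     # hypothetical resource name
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: example-store                      # hypothetical secret store
  target:
    name: opentelemetry-collector-example    # Kubernetes Secret that the operator creates
  data:
    - secretKey: EXAMPLE_API_KEY             # key written into the Kubernetes Secret
      remoteRef:
        key: EXAMPLE_API_KEY                 # hypothetical entry in the external store
```

Each secretKey becomes a key in the generated Secret, which is what makes renames such as DATADOG_API_KEY to DD_API_KEY visible to anything consuming that Secret.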
73-90: ❓ Verification inconclusive
Verify if these Grafana Cloud credentials are still needed.
You've commented out multiple Grafana Cloud credentials while retaining the OTEL collector token and ID. This change appears to align with the shift to the OTLP-based integration in the values.yaml file. Please ensure all dependent configurations have been updated to use the new authentication method.
🏁 Script executed:
#!/bin/bash
# Description: Check for any references to the commented-out Grafana Cloud credential variables
echo "Checking for references to deprecated Grafana Cloud credentials..."
rg --no-filename "GRAFANA_CLOUD_(METRICS|LOGS|TRACES)_(ID|APIKEY)" --type yaml
Length of output: 1539
Action Required: Revisit Grafana Cloud Credential Removal
It appears that while a block of Grafana Cloud credentials has been commented out (as seen in the changes to lines 73–90), active references to these same credentials still exist in the YAML configuration (e.g., definitions with - secretKey: GRAFANA_CLOUD_METRICS_ID, etc.). Given the move toward the OTLP-based integration (where only the OTEL collector token and ID are retained), please verify:
- Whether the active Grafana Cloud credential references (IDs and APIKEYs for Metrics, Logs, and Traces) are still required.
- That all dependent configurations have been updated to solely use the new authentication method.
Review and update accordingly to ensure consistency in credential usage across the codebase.
gitops/argocd/charts/opentelemetry/opentelemetry-collector/templates/configmap-dashboards.yaml (2)
25-25: Improved naming consistency for dashboard ConfigMaps.
The naming change adds "collector" to the ConfigMap name, making it more specific and consistent with the component it's related to.
54-54: Properly aligned configMapRef with the updated ConfigMap name.
This change ensures the GrafanaDashboard properly references the renamed ConfigMap. The consistent naming pattern improves clarity and maintainability.
gitops/argocd/charts/opentelemetry/opentelemetry-collector/values.yaml (16)
14-14: Improved labeling with consistent app.kubernetes.io/part-of label.
Adding the standardized Kubernetes recommended label app.kubernetes.io/part-of: opentelemetry-collector to all components improves resource organization and enables better filtering and selection across the cluster.
Also applies to: 232-232, 552-552, 780-780, 999-999
30-34: Enablement of Kubernetes attributes with deactivation of events.
The configuration now enables kubernetesAttributes extraction while disabling kubernetesEvents. This focuses collection on resource metadata rather than Kubernetes event logs, which is appropriate for most observability needs.
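In the upstream opentelemetry-collector Helm chart these switches live under presets. A small sketch, assuming the logs alias used in this chart layout:

```yaml
# values.yaml (illustrative; the `logs` alias is an assumption based on this review)
logs:
  presets:
    kubernetesAttributes:
      enabled: true     # adds the k8sattributes processor and the RBAC it needs
    kubernetesEvents:
      enabled: false    # no Kubernetes event collection from this instance
```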
40-43: Added standard OTLP, Jaeger, and Zipkin receivers.
Enabling these standard receivers improves interoperability with various telemetry sources. This is a good practice as it allows different application instrumentation libraries to connect to the collector.
46-50: Enhanced resource detection capability.
The addition of multiple detectors (k8snode, env, system) for resource detection improves the richness of metadata available for observability.
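A sketch of a resourcedetection processor using those three detectors. The timeout and override values are illustrative and not taken from the PR.

```yaml
# Collector config fragment (illustrative values)
config:
  processors:
    resourcedetection:
      detectors: [k8snode, env, system]   # env reads OTEL_RESOURCE_ATTRIBUTES
      timeout: 2s
      override: false                     # keep attributes already present on the resource
```

Detectors run in the listed order, and with override set to false an attribute already present on the telemetry is left untouched.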
52-134: Comprehensive Kubernetes attributes extraction configuration.
The detailed k8sattributes processor configuration will enable rich metadata extraction from Kubernetes, enhancing the context available for metrics, logs, and traces. The configuration extracts important labels and adds them as standard attributes, which improves the observability capabilities.
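The full 80-line block is not shown in this review. A much shorter sketch of the same processor follows, where the chosen metadata fields and the label mapping are assumptions for illustration only:

```yaml
# Collector config fragment (abbreviated, illustrative)
config:
  processors:
    k8sattributes:
      auth_type: serviceAccount
      passthrough: false
      extract:
        metadata:
          - k8s.namespace.name
          - k8s.pod.name
          - k8s.deployment.name
          - k8s.node.name
        labels:
          - tag_name: service.name          # attribute to set
            key: app.kubernetes.io/name     # pod label to read
            from: pod
      pod_association:
        - sources:
            - from: resource_attribute
              name: k8s.pod.ip
        - sources:
            - from: connection
```

The pod_association rules are tried in order: first a match on the k8s.pod.ip resource attribute, then a fallback to the IP of the incoming connection.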
137-143: Added cluster name enrichment through transform processor.
The transform processor adds the cluster name to resources, which is valuable for distinguishing telemetry across multiple clusters in a single observability platform.
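One way such enrichment can be expressed with OTTL, shown here for logs only. The attribute key k8s.cluster.name and the use of the K8S_CLUSTER_NAME environment variable are assumptions; the PR's actual statements are not visible in this review.

```yaml
# Collector config fragment (illustrative)
config:
  processors:
    transform:
      error_mode: ignore
      log_statements:
        - context: resource
          statements:
            # Stamp every resource with the cluster name taken from the environment
            - set(attributes["k8s.cluster.name"], "${env:K8S_CLUSTER_NAME}")
```

Equivalent metric_statements and trace_statements blocks would be needed for the other signal types.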
157-159: Updated exporter to use the gateway pattern.
Routing all telemetry through a central gateway (otlphttp/gateway) improves the architecture by centralizing export configurations and connections to external systems.
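A sketch of a named otlphttp exporter pointing at an in-cluster gateway. The Service hostname below is a placeholder and does not come from the PR.

```yaml
# Collector config fragment (illustrative; endpoint is hypothetical)
config:
  exporters:
    otlphttp/gateway:
      endpoint: http://otel-gateway.example.svc.cluster.local:4318
      compression: gzip
```

With this pattern the per-node and per-signal collectors need no vendor credentials; only the gateway holds the backend exporters.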
163-164: Updated logging configuration for better observability.
Setting the logs level to info and using JSON encoding will improve the collector's own logs observability and integration with log processing systems.
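For reference, the collector's own log settings sit under service.telemetry. A fragment matching the description above:

```yaml
# Collector config fragment
config:
  service:
    telemetry:
      logs:
        level: info
        encoding: json
```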
172-176: Pipeline configuration with added processors.
The logs pipeline now includes resourcedetection, k8sattributes, and transform processors, which will enrich logs with valuable context about their origin.
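A sketch of how such a pipeline might be wired together. The receiver, the memory_limiter and batch processors, and the exporter name are assumptions; only the three enrichment processors are named in the comment above.

```yaml
# Collector config fragment (illustrative ordering)
config:
  service:
    pipelines:
      logs:
        receivers: [otlp]                  # assumed receiver
        processors:
          - memory_limiter                 # assumed; conventionally placed first
          - resourcedetection
          - k8sattributes
          - transform
          - batch                          # assumed; conventionally placed last
        exporters: [otlphttp/gateway]      # assumed exporter
```

Processors run in the order listed, so enrichment happens before batching and export.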
184-209: Well-defined port configuration.
The explicit port configuration improves clarity on which ports should be exposed. Disabling unused ports (like Jaeger and Zipkin) reduces the attack surface.
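As an illustration, the upstream chart exposes a ports map where each entry can be toggled. The exact set enabled in the PR is not shown here, so the selection below is an assumption.

```yaml
# values.yaml fragment (illustrative selection of ports)
ports:
  otlp:
    enabled: true
    containerPort: 4317
    servicePort: 4317
    protocol: TCP
  otlp-http:
    enabled: true
    containerPort: 4318
    servicePort: 4318
    protocol: TCP
  jaeger-compact:
    enabled: false
  jaeger-thrift:
    enabled: false
  jaeger-grpc:
    enabled: false
  zipkin:
    enabled: false
```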
258-379: Comprehensive metrics collection configuration.
The metrics collection setup is thorough, with appropriate options for kubeletstats and hostmetrics. The exclusion lists for filesystem and network interfaces will help prevent collecting unnecessary metrics.
1022-1033: Secured sensitive information with secret references.
The use of extraEnvsFrom to reference secrets for various backends is a secure approach to handling credentials.
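A short sketch of the extraEnvsFrom mechanism. The Secret names are hypothetical; in this chart they would correspond to the targets of the ExternalSecret resources.

```yaml
# values.yaml fragment (Secret names are placeholders)
extraEnvsFrom:
  - secretRef:
      name: opentelemetry-collector-datadog     # hypothetical Secret
  - secretRef:
      name: opentelemetry-collector-honeycomb   # hypothetical Secret
```

Every key in the referenced Secrets becomes an environment variable in the collector container, which exporters can then read with the ${env:NAME} syntax.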
1068-1145: Comprehensive exporter configuration for multiple backends.
The gateway collector is well configured with exporters for various backends (Datadog, Grafana Cloud, Honeycomb, Lightstep, HyperDX, OneUptime, Loki, Tempo). This provides flexibility in choosing observability platforms.
1218-1246: Well-defined service configuration for the gateway.
The port and service configuration for the gateway component provides clear endpoints for different protocols (gRPC, HTTP) which simplifies client integrations.
1166-1179: ❓ Verification inconclusive
Verify batch processor configuration with multiple backends.
Using both standard batch processor and batch/datadog in the same pipeline could lead to inefficiencies. Consider evaluating if the specialized batch processor is necessary and its impact on performance.
Also applies to: 1186-1198, 1204-1216
🏁 Script executed:
#!/bin/bash
# Description: Check OpenTelemetry collector documentation for batch processor recommendations
echo "Searching for OpenTelemetry batch processor documentation and recommendations..."
rg --type yaml "batch/" -A 5 -B 5
Length of output: 4368
Review Update: Evaluate Combined Batch Processor Configuration
The configuration currently includes both the standard "batch" processor and the specialized "batch/datadog" processor. As seen in the file (gitops/argocd/charts/opentelemetry/opentelemetry-collector/values.yaml at lines 1166–1179, with similar settings at 1186–1198 and 1204–1216), this dual setup may lead to unexpected inefficiencies. Please review and confirm the following:
- Whether the inclusion of both processors is intentional.
- If the specialized "batch/datadog" processor is essential for your backend, or if relying solely on the standard "batch" processor would be sufficient.
- The impact on performance when processing multiple backends, ensuring there’s no redundant or conflicting behavior.
Make any necessary adjustments based on your evaluation and performance tests.
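One possible way to avoid double batching is to give each backend its own pipeline with a single batch processor. This is a sketch under assumptions: the pipeline names, the Honeycomb exporter name, and the batch sizes are illustrative, and the exporters themselves are assumed to be defined elsewhere in the config.

```yaml
# Collector config fragment (illustrative; exporter definitions omitted)
config:
  processors:
    batch: {}
    batch/datadog:
      send_batch_size: 10         # illustrative values, tune for the backend's intake limits
      send_batch_max_size: 100
      timeout: 10s
  service:
    pipelines:
      traces/datadog:
        receivers: [otlp]
        processors: [batch/datadog]
        exporters: [datadog]
      traces/default:
        receivers: [otlp]
        processors: [batch]
        exporters: [otlphttp/honeycomb]   # hypothetical exporter name
```

Splitting pipelines costs some extra memory per pipeline, so whether it beats a single shared batch processor is something to confirm with the performance tests mentioned above.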
721-734: ✅ Verification successful
Verify k8s_cluster receiver compatibility.
The k8s_cluster receiver has been added to the metrics pipeline, but ensure it's compatible with the version of OpenTelemetry collector being used, as this is a relatively new receiver.
🌐 Web query:
Is k8s_cluster receiver supported in the latest OpenTelemetry Collector Contrib?
💡 Result:
The k8s_cluster receiver is supported in the latest OpenTelemetry Collector Contrib distribution. Here’s a breakdown of its status and functionality:
Key Details
Stability:
- Metrics: Beta
- Logs (entity events): Development
This indicates production-ready metrics collection with evolving support for logs[10].
Deployment:
Designed for cluster-wide monitoring, it requires a single instance (deployed as a Deployment or DaemonSet). Multiple replicas will cause duplicate data[1][5].
Capabilities:
- Collects cluster-level metrics (node conditions, pod phases, container restarts)
- Gathers Kubernetes entity events via API server polling/watching[1][10].
- Integrates with Helm charts via the clusterMetrics preset for simplified setup[1][5].
Configuration Example
receivers:
  k8s_cluster:
    auth_type: serviceAccount
    node_conditions_to_report: [Ready, MemoryPressure]
    allocatable_types_to_report: [cpu, memory]
Supported Distributions
- Included in both the contrib and k8s distributions of the OpenTelemetry Collector[10].
For implementation guidance, see the Kubernetes Cluster Receiver documentation[1][5].
Citations:
- 1: https://opentelemetry.io/docs/platforms/kubernetes/collector/components/
- 2: https://docs.honeycomb.io/send-data/kubernetes/opentelemetry/components/
- 3: https://isitobservable.io/open-telemetry/how-to-observe-your-kubernetes-cluster-with-opentelemetry
- 4: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-OTLPSimplesetup.html
- 5: https://opentelemetry.io/docs/platforms/kubernetes/getting-started/
- 6: https://opentelemetry.io/docs/collector/configuration/
- 7: https://grafana.com/docs/grafana-cloud/monitor-infrastructure/kubernetes-monitoring/configuration/helm-chart-config/otel-collector/
- 8: https://docs.splunk.com/observability/gdi/opentelemetry/components/kubernetes-cluster-receiver.html
- 9: https://opentelemetry.io/docs/collector/installation/
- 10: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/k8sclusterreceiver/README.md
k8s_cluster receiver compatibility confirmed: no changes required.
The web query confirms that the k8s_cluster receiver is supported in the latest OpenTelemetry Collector Contrib distribution (with metrics at Beta stability). The current configuration in the values.yaml is appropriate for the metrics pipeline. Just ensure that your deployment matches the documented requirements for production use (e.g., using a single instance to avoid duplicate data).
      metrics:
        address: ${env:MY_POD_IP}:8888
  extensions:
    - health_check
  pipelines:
    logs:
      receivers:
        - filelog
Verify logs receiver configuration.
The logs pipeline still includes the filelog receiver on line 170, but it's not defined in the receivers section. This might cause configuration errors.
Apply this diff to fix the receiver configuration:
 logs:
   receivers:
-    - filelog
+    - otlp
Also applies to: 179-180
securityContext:
  runAsUser: 0
  runAsGroup: 0
💡 Verification agent
🧩 Analysis chain
Security concern: Container running as root.
Running the collector as root (uid 0, gid 0) presents security risks. Consider using a non-root user if possible, especially for components that don't require privileged access.
🏁 Script executed:
#!/bin/bash
# Description: Check if there's a specific requirement for the collector to run as root
echo "Checking for possible requirements for root access..."
fd . -t f --exec grep -l "hostNetwork:\s*true" {} \; || echo "No hostNetwork requirement found"
fd . -t f --exec grep -l "hostPID:\s*true" {} \; || echo "No hostPID requirement found"
fd . -t f --exec grep -l "hostPath" {} \; || echo "No hostPath volumes found"
Length of output: 699
Action: Update OpenTelemetry Collector Security Context to Use a Non-Root User
The file gitops/argocd/charts/opentelemetry/opentelemetry-collector/values.yaml (lines 210-213) currently sets the container to run as root (uid 0, gid 0). Our repository-wide checks did not uncover any dependencies (such as enabled hostNetwork, hostPID, or hostPath configurations directly related to the collector) that would justify running with root privileges. In light of this, please consider updating the security context to use non-root user and group IDs to mitigate potential security risks.
- File to update: gitops/argocd/charts/opentelemetry/opentelemetry-collector/values.yaml (lines 210-213)
- Suggested change:
securityContext:
  runAsUser: <non-root-user-id>
  runAsGroup: <non-root-group-id>
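As a concrete illustration of the suggestion, a hardened container security context could look like the sketch below. The UID/GID 10001 is an arbitrary non-root value chosen for the example, not one prescribed by the chart or the PR.

```yaml
# values.yaml fragment (illustrative; IDs are arbitrary non-root values)
securityContext:
  runAsNonRoot: true
  runAsUser: 10001
  runAsGroup: 10001
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
  capabilities:
    drop: [ALL]
```

Note that a collector reading host log files (for example through a filelog receiver over hostPath mounts) may genuinely need elevated permissions, so this is worth testing per component before applying it everywhere.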