Mimir out-of-order samples for target_info after Alloy migration #1271

Open

bzurkowski opened this issue Feb 25, 2025 · 0 comments

After migrating from Grafana Agent to Grafana Alloy by upgrading the Kubernetes monitoring Helm chart from v1 to v2, I have encountered recurring errors from the Mimir distributor about out-of-order samples in the target_info series:

ts=2025-02-25T14:46:20.230924388Z caller=push.go:130 level=error user=anonymous msg="push error" err="rpc error: code = Code(400) desc = failed pushing to ingester: user=anonymous: the sample has been rejected because another sample with a more recent timestamp has already been ingested and out-of-order samples are not allowed (err-mimir-sample-out-of-order). The affected sample has timestamp 2025-02-25T14:46:15.63Z and is from series {__name__=\"target_info\", cluster=\"pepe\"}"

Note that I am running a self-hosted Mimir setup with version 2.9.0.

These errors started appearing immediately after the migration, and they were not present before. Additionally, I noticed that the target_info metric was not collected prior to the migration.

Furthermore, I noticed that the affected series indeed share a duplicated label set: every minute, I receive approximately 50 errors for the same series with identical labels. As far as I know, the target_info metric should include differentiating labels such as job or instance, correct?

[screenshot of the affected target_info series omitted]
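
As far as I understand, a target_info series produced from OTLP resource attributes would normally look roughly like the following (the label values below are purely illustrative), with job and instance derived from the service.namespace, service.name, and service.instance.id resource attributes:

target_info{job="my-namespace/my-service", instance="my-service-0", cluster="pepe"}

In my case, however, the rejected series carries only the cluster label, as shown in the error above.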

I am unable to determine why this metric started being collected after the migration or why the mentioned error occurs. It seems that the metric originates from the otelcol.exporter.prometheus component and is controlled via the include_target_info setting, which defaults to true.
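
For illustration, this is roughly what that setting would look like at the Alloy component level. This is only a sketch: the component label and the remote-write destination below are placeholders, since I have not found a way to set this through the chart-generated configuration.

otelcol.exporter.prometheus "metrics_service" {
  // include_target_info defaults to true, which is what emits the target_info series;
  // setting it to false would stop producing it.
  include_target_info = false

  forward_to = [prometheus.remote_write.mysetup_prom.receiver]
}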

However, it appears that disabling collection of this metric via Helm values is not currently possible. I noticed recent changes related to fine-tuning exporter settings, such as this PR. Would it make sense to introduce a similar configuration option for this setting?

Additionally, do you have any insights into why ingestion of this metric might fail in my setup?

Below is my current monitoring configuration:

cluster:
  name: default
global:
  scrapeInterval: 60s
destinations:
  - name: mysetup-prom
    type: prometheus
    url: dummy
    urlFrom: env("prometheusUrl")
    auth:
      type: basic
      usernameKey: username
      passwordKey: password
    secret:
      create: false
      name: monitoring-credentials-prom
  - name: mysetup-logs
    type: loki
    url: dummy
    urlFrom: env("lokiUrl")
    auth:
      type: basic
      usernameKey: username
      passwordKey: password
    secret:
      create: false
      name: monitoring-credentials-loki
  - name: mysetup-traces
    type: otlp
    url: dummy
    urlFrom: env("tempoUrl")
    protocol: http
    auth:
      type: basic
      usernameKey: username
      passwordKey: password
    secret:
      create: false
      name: monitoring-credentials-tempo
    metrics:
      enabled: false
    logs:
      enabled: false
    traces:
      enabled: true
clusterMetrics:
  enabled: true
  opencost:
    enabled: false
  kepler:
    enabled: false
  windows-exporter:
    enabled: false
prometheusOperatorObjects:
  enabled: true
  crds:
    deploy: true
clusterEvents:
  enabled: true
podLogs:
  enabled: true
applicationObservability:
  enabled: true
  receivers:
    otlp:
      grpc:
        enabled: true
        port: 4317
      http:
        enabled: true
        port: 4318
  processors:
    grafanaCloudMetrics:
      enabled: true
  traces:
    filters:
      span:
        - attributes["http.target"] == "/metrics"
        - attributes["http.target"] == "/internal/health/readiness"
        - attributes["http.target"] == "/internal/health/liveness"
integrations:
  alloy:
    instances:
      - name: alloy
        labelSelectors:
          app.kubernetes.io/name:
            - alloy-metrics
            - alloy-singleton
            - alloy-logs
            - alloy-receiver
selfReporting:
  enabled: false
alloy-metrics:
  enabled: true
  logging:
    level: info
  alloy:
    clustering:
      enabled: true
    envFrom:
      - configMapRef:
          name: monitoring-config
    securityContext:
      allowPrivilegeEscalation: false
    resources:
      requests:
        cpu: 200m
        memory: 1Gi
      limits:
        cpu: 1000m
        memory: 1Gi
  controller:
    autoscaling:
      enabled: true
      minReplicas: 2
      maxReplicas: 5
      targetMemoryUtilizationPercentage: 70
  configReloader:
    securityContext:
      allowPrivilegeEscalation: false
    resources:
      requests:
        cpu: 10m
        memory: 50Mi
      limits:
        cpu: 100m
        memory: 50Mi
alloy-singleton:
  enabled: true
  logging:
    level: info
  alloy:
    envFrom:
      - configMapRef:
          name: monitoring-config
    securityContext:
      allowPrivilegeEscalation: false
    resources:
      requests:
        cpu: 100m
        memory: 100Mi
      limits:
        cpu: 1000m
        memory: 100Mi
  configReloader:
    securityContext:
      allowPrivilegeEscalation: false
    resources:
      requests:
        cpu: 10m
        memory: 50Mi
      limits:
        cpu: 100m
        memory: 50Mi
alloy-logs:
  enabled: true
  logging:
    level: info
  alloy:
    envFrom:
      - configMapRef:
          name: monitoring-config
    securityContext:
      allowPrivilegeEscalation: false
    resources:
      requests:
        cpu: 100m
        memory: 100Mi
      limits:
        cpu: 1000m
        memory: 100Mi
  configReloader:
    securityContext:
      allowPrivilegeEscalation: false
    resources:
      requests:
        cpu: 10m
        memory: 50Mi
      limits:
        cpu: 100m
        memory: 50Mi
alloy-receiver:
  enabled: true
  logging:
    level: info
  alloy:
    envFrom:
      - configMapRef:
          name: monitoring-config
    securityContext:
      allowPrivilegeEscalation: false
    resources:
      requests:
        cpu: 100m
        memory: 100Mi
      limits:
        cpu: 1000m
        memory: 100Mi
    extraPorts:
      - name: otlp-grpc
        port: 4317
        targetPort: 4317
        protocol: TCP
      - name: otlp-http
        port: 4318
        targetPort: 4318
        protocol: TCP
  configReloader:
    securityContext:
      allowPrivilegeEscalation: false
    resources:
      requests:
        cpu: 10m
        memory: 50Mi
      limits:
        cpu: 100m
        memory: 50Mi

Thanks in advance for your help!
