delete_matching_keys(resource.attributes, ".*") causes state modification for subsequent entries in batch #37647

Closed
@DValentiev

Component(s)

processor/transformprocessor, exporter/googlecloud

What happened?

Description

The googlecloud log exporter only supports a subset of the resource labels produced by the resourcedetection / k8sattributes processors.
Therefore, when using the googlecloud log exporter with those processors, I have to move resource.attributes to attributes.

To avoid duplicating data after merging resource.attributes into attributes, I deleted the resource attributes.
This left resource.attributes empty for subsequent entries in the processed batch.

Steps to Reproduce

  - merge_maps(attributes, resource.attributes, "upsert")
  - delete_matching_keys(resource.attributes, ".*")   

Expected Result

resource.attributes is unaffected between separate entries: deleting it while processing one entry does not change what later entries see.

Actual Result

resource.attributes state carries over between separate log entries in a batch.
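
The carry-over can be illustrated with a small Python model. This is only a sketch, not the collector's code: it assumes that log records grouped under one ResourceLog share a single resource attribute map and that the two statements run once per log record. The names `LogRecord`, `ResourceLogs`, `per_record_merge_and_delete`, and `merge_all_then_delete` are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class LogRecord:
    attributes: dict = field(default_factory=dict)


@dataclass
class ResourceLogs:
    # Assumption: one attribute map shared by every record in the group.
    resource_attributes: dict
    records: list


def per_record_merge_and_delete(group):
    """Mimic running both statements once per log record."""
    for record in group.records:
        # merge_maps(attributes, resource.attributes, "upsert")
        record.attributes.update(group.resource_attributes)
        # delete_matching_keys(resource.attributes, ".*")
        group.resource_attributes.clear()


def merge_all_then_delete(group):
    """Hypothetical alternative: merge into every record, then delete once."""
    for record in group.records:
        record.attributes.update(group.resource_attributes)
    group.resource_attributes.clear()


def make_group():
    return ResourceLogs(
        resource_attributes={"cloud.provider": "aws", "k8s.pod.name": "example-pod"},
        records=[LogRecord({"logtag": "F"}), LogRecord({"logtag": "F"})],
    )


buggy = make_group()
per_record_merge_and_delete(buggy)
# Only the first record receives the resource attributes.
print([sorted(r.attributes) for r in buggy.records])
# [['cloud.provider', 'k8s.pod.name', 'logtag'], ['logtag']]

two_pass = make_group()
merge_all_then_delete(two_pass)
# Both records receive the resource attributes before they are removed.
print([sorted(r.attributes) for r in two_pass.records])
# [['cloud.provider', 'k8s.pod.name', 'logtag'], ['cloud.provider', 'k8s.pod.name', 'logtag']]
```

In this model the first record empties the shared map before the second record is processed, which matches the debug output below, where LogRecord #1 is missing the merged attributes. Whether a two-pass ordering can be expressed in the transform processor configuration is not something this sketch establishes.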

Collector version

otel/opentelemetry-collector-contrib:0.114.0

Environment information

EKS 1.31

OpenTelemetry Collector configuration

mode: daemonset
securityContext: 
  privileged: true

image:
  repository: "otel/opentelemetry-collector-contrib"

podAnnotations:
  iam.amazonaws.com/role: ##########################

resources:
  requests:
    cpu: 50m
    memory: 256Mi
  limits:
    cpu: 100m
    memory: 512Mi

extraVolumes:
  - name: google-cloud-credentials
    secret:
      secretName: my-google-cloud-credentials

extraVolumeMounts:
  - name: google-cloud-credentials
    mountPath: /etc/google-cloud
    readOnly: true

extraEnvs:
  - name: GOOGLE_APPLICATION_CREDENTIALS 
    value: /etc/google-cloud/credentials.json

clusterRole:
  create: true
  rules:
    - apiGroups: [""]
      resources: ["pods", "namespaces", "nodes", "configmaps"]
      verbs: ["get", "watch", "list"]
    - apiGroups: ["apps"]
      resources: ["replicasets"]
      verbs: ["get", "list", "watch"]
    - apiGroups: ["extensions"]
      resources: ["replicasets"]
      verbs: ["get", "list", "watch"]

presets:
  kubernetesAttributes:
    enabled: false
    extractAllPodLabels: false
  logsCollection:
    enabled: true

config:
  receivers:
    filelog:
      exclude:
      - /var/log/pods/otel-logs_otel-opentelemetry-collector*_*/opentelemetry-collector/*.log
      include:
      - /var/log/pods/*/*/*.log
      include_file_name: false
      include_file_path: true
      operators:
      - id: container-parser
        max_log_size: 102400
        type: container
      retry_on_failure:
        enabled: true
      start_at: end

  processors:
    resourcedetection:
      detectors: 
        - eks
        - ec2
      eks:
        resource_attributes:
          k8s.cluster.name:
            enabled: true
    filter:
      error_mode: ignore
      logs:
        log_record:
          - body == nil or (Substring(body, 0, 11) != "[ELASTIC]{\"" and Substring(body, 0, 2) != "{\"")
    transform:
      error_mode: ignore
      log_statements:
        - context: log
          conditions:
              - body != nil and Substring(body, 0, 11) == "[ELASTIC]{\""
          statements:
            - set(body.string, Substring(body, 9, Len(body) - 9))
            - set(attributes["legacy-format"], "[ELASTIC]")
        - context: log
          conditions:
              - body != nil and Substring(body, 0, 2) == "{\""
          statements:
            - merge_maps(attributes, resource.attributes, "upsert")
            - delete_matching_keys(resource.attributes, ".*")            

            - set(cache, ParseJSON(body))
            - set(body, cache)

            - set(severity_text, "DEBUG")
            - set(severity_text, body["level"])
            - delete_key(body, "level")

            - set(body["logging.googleapis.com/trace"], attributes["TraceId"])
            - set(body["logging.googleapis.com/spanId"], attributes["SpanId"])
            
            - delete_key(body, "TraceId")
            - delete_key(body, "SpanId")
            
  exporters:
    googlecloud:
      project: ###############
      log:
        default_log_name: otel-collector
    debug:
      verbosity: detailed
            
  service:
    pipelines:
      logs:
        receivers: 
          - filelog
        processors: 
          - memory_limiter
          - filter
          - resourcedetection
          - transform
          - batch
        exporters: 
          - googlecloud
          - debug
      metrics: null
      traces: null

tolerations:
  - effect: NoSchedule
    operator: Exists

Log output

2025-02-03T14:56:20.997Z        info    ResourceLog #0
Resource SchemaURL: https://opentelemetry.io/schemas/1.6.1
ScopeLogs #0
ScopeLogs SchemaURL:
InstrumentationScope

LogRecord #0
ObservedTimestamp: 2025-02-03 14:56:20.642726291 +0000 UTC
Timestamp: 2025-02-03 14:56:20.60939152 +0000 UTC
SeverityText: INFO
SeverityNumber: Unspecified(0)
Body: Map({"message":"HTTP GET /health/ready","timestamp":"2025-02-03T14:56:20.609Z","type":"######","userType":"ANONYMOUS"})
Attributes:
     -> log.iostream: Str(stdout)
     -> log.file.path: Str(/var/log/pods/#########-68548f9946-4dwd5_af3cf95a-9dd2-4500-a5d6-b53ec560ec7d/######/0.log)
     -> logtag: Str(F)
     -> k8s.pod.name: Str(######-68548f9946-4dwd5)
     -> k8s.container.restart_count: Str(0)
     -> k8s.pod.uid: Str(af3cf95a-9dd2-4500-a5d6-b53ec560ec7d)
     -> k8s.container.name: Str(#########)
     -> k8s.namespace.name: Str(#############)
     -> cloud.provider: Str(aws)
     -> cloud.platform: Str(aws_eks)
     -> k8s.cluster.name: Str()
     -> cloud.region: Str(eu-west-1)
     -> cloud.account.id: Str(############)
     -> cloud.availability_zone: Str(eu-west-1a)
     -> host.id: Str(##########)
     -> host.image.id: Str(ami-04b9518ca3840ddd9)
     -> host.type: Str(m7i.4xlarge)
     -> host.name: Str(#############)
Trace ID:
Span ID:
Flags: 0
LogRecord #1
ObservedTimestamp: 2025-02-03 14:56:20.642795798 +0000 UTC
Timestamp: 2025-02-03 14:56:20.609430294 +0000 UTC
SeverityText: INFO
SeverityNumber: Unspecified(0)
Body: Map({"message":"HTTP 200","properties":{"code":200,"content-length":2,"responseTime":0},"timestamp":"2025-02-03T14:56:20.609Z","type":"##########","userType":"ANONYMOUS"})
Attributes:
     -> logtag: Str(F)
     -> log.file.path: Str(/var/log/pods/############-68548f9946-4dwd5_af3cf95a-9dd2-4500-a5d6-b53ec560ec7d/######/0.log)
     -> log.iostream: Str(stdout)
Trace ID:
Span ID:
Flags: 0

Additional context

Secondary bug / feature request:

The Google Cloud exporter should not pass all resource attributes as GCP log resource.labels, because the set of supported fields is restrictive:
https://cloud.google.com/logging/docs/structured-logging

Instead, most of the resource attributes should be passed as attributes, which attaches them as log entry labels.
