
delete_matching_keys(resource.attributes, ".*") causes state modification for subsequent entries in batch #37647


Closed
DValentiev opened this issue Feb 3, 2025 · 6 comments
Labels
processor/transform Transform processor question Further information is requested

Comments

@DValentiev

DValentiev commented Feb 3, 2025

Component(s)

processor/transformprocessor, exporter/googlecloud

What happened?

Description

googlecloud logs only supports a subset of the resource labels produced by the resourcedetection / k8sattributes processors.
Therefore, when using the googlecloud log exporter with the resourcedetection / k8sattributes processors, I have to move resource.attributes into attributes.

To avoid data duplication after merging resource.attributes into attributes, I deleted them.
This caused resource.attributes to be empty for subsequent resources in the processed batch.

Steps to Reproduce

  - merge_maps(attributes, resource.attributes, "upsert")
  - delete_matching_keys(resource.attributes, ".*")   
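
These two statements sit in a log-context transform processor. A minimal configuration that reproduces the behavior, condensed from the full configuration further below (pipeline wiring omitted), looks like:

```yaml
processors:
  transform:
    error_mode: ignore
    log_statements:
      - context: log
        statements:
          # Copies the resource attributes onto the current log record's
          # attributes...
          - merge_maps(attributes, resource.attributes, "upsert")
          # ...then empties the resource map. Because the resource map is
          # shared by all log records under that resource, every later
          # record in the batch now sees an empty resource.attributes.
          - delete_matching_keys(resource.attributes, ".*")
```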

Expected Result

Each resource entry's resource.attributes is independent: deleting keys while processing one log record should not affect subsequent entries.

Actual Result

resource.attributes state carries over between separate log entries in a batch.

Collector version

otel/opentelemetry-collector-contrib:0.114.0

Environment information

EKS 1.31

OpenTelemetry Collector configuration

mode: daemonset
securityContext: 
  privileged: true

image:
  repository: "otel/opentelemetry-collector-contrib"

podAnnotations:
  iam.amazonaws.com/role: ##########################

resources:
  requests:
    cpu: 50m
    memory: 256Mi
  limits:
    cpu: 100m
    memory: 512Mi

extraVolumes:
  - name: google-cloud-credentials
    secret:
      secretName: my-google-cloud-credentials

extraVolumeMounts:
  - name: google-cloud-credentials
    mountPath: /etc/google-cloud
    readOnly: true

extraEnvs:
  - name: GOOGLE_APPLICATION_CREDENTIALS 
    value: /etc/google-cloud/credentials.json

clusterRole:
  create: true
  rules:
    - apiGroups: [""]
      resources: ["pods", "namespaces", "nodes", "configmaps"]
      verbs: ["get", "watch", "list"]
    - apiGroups: ["apps"]
      resources: ["replicasets"]
      verbs: ["get", "list", "watch"]
    - apiGroups: ["extensions"]
      resources: ["replicasets"]
      verbs: ["get", "list", "watch"]

presets:
  kubernetesAttributes:
    enabled: false
    extractAllPodLabels: false
  logsCollection:
    enabled: true

config:
  receivers:
    filelog:
      exclude:
      - /var/log/pods/otel-logs_otel-opentelemetry-collector*_*/opentelemetry-collector/*.log
      include:
      - /var/log/pods/*/*/*.log
      include_file_name: false
      include_file_path: true
      operators:
      - id: container-parser
        max_log_size: 102400
        type: container
      retry_on_failure:
        enabled: true
      start_at: end

  processors:
    resourcedetection:
      detectors: 
        - eks
        - ec2
      eks:
        resource_attributes:
          k8s.cluster.name:
            enabled: true
    filter:
      error_mode: ignore
      logs:
        log_record:
          - body == nil or (Substring(body, 0, 11) != "[ELASTIC]{\"" and Substring(body, 0, 2) != "{\"")
    transform:
      error_mode: ignore
      log_statements:
        - context: log
          conditions:
              - body != nil and Substring(body, 0, 11) == "[ELASTIC]{\""
          statements:
            - set(body.string, Substring(body, 9, Len(body) - 9))
            - set(attributes["legacy-format"], "[ELASTIC]")
        - context: log
          conditions:
              - body != nil and Substring(body, 0, 2) == "{\""
          statements:
            - merge_maps(attributes, resource.attributes, "upsert")
            - delete_matching_keys(resource.attributes, ".*")            

            - set(cache, ParseJSON(body))
            - set(body, cache)

            - set(severity_text, "DEBUG")
            - set(severity_text, body["level"])
            - delete_key(body, "level")

            - set(body["logging.googleapis.com/trace"], attributes["TraceId"])
            - set(body["logging.googleapis.com/spanId"], attributes["SpanId"])
            
            - delete_key(body, "TraceId")
            - delete_key(body, "SpanId")
            
  exporters:
    googlecloud:
      project: ###############
      log:
        default_log_name: otel-collector
    debug:
      verbosity: detailed
            
  service:
    pipelines:
      logs:
        receivers: 
          - filelog
        processors: 
          - memory_limiter
          - filter
          - resourcedetection
          - transform
          - batch
        exporters: 
          - googlecloud
          - debug
      metrics: null
      traces: null

tolerations:
  - effect: NoSchedule
    operator: Exists

Log output

2025-02-03T14:56:20.997Z        info    ResourceLog #0
Resource SchemaURL: https://opentelemetry.io/schemas/1.6.1
ScopeLogs #0
ScopeLogs SchemaURL:
InstrumentationScope

LogRecord #0
ObservedTimestamp: 2025-02-03 14:56:20.642726291 +0000 UTC
Timestamp: 2025-02-03 14:56:20.60939152 +0000 UTC
SeverityText: INFO
SeverityNumber: Unspecified(0)
Body: Map({"message":"HTTP GET /health/ready","timestamp":"2025-02-03T14:56:20.609Z","type":"######","userType":"ANONYMOUS"})
Attributes:
     -> log.iostream: Str(stdout)
     -> log.file.path: Str(/var/log/pods/#########-68548f9946-4dwd5_af3cf95a-9dd2-4500-a5d6-b53ec560ec7d/######/0.log)
     -> logtag: Str(F)
     -> k8s.pod.name: Str(######-68548f9946-4dwd5)
     -> k8s.container.restart_count: Str(0)
     -> k8s.pod.uid: Str(af3cf95a-9dd2-4500-a5d6-b53ec560ec7d)
     -> k8s.container.name: Str(#########)
     -> k8s.namespace.name: Str(#############)
     -> cloud.provider: Str(aws)
     -> cloud.platform: Str(aws_eks)
     -> k8s.cluster.name: Str()
     -> cloud.region: Str(eu-west-1)
     -> cloud.account.id: Str(############)
     -> cloud.availability_zone: Str(eu-west-1a)
     -> host.id: Str(##########)
     -> host.image.id: Str(ami-04b9518ca3840ddd9)
     -> host.type: Str(m7i.4xlarge)
     -> host.name: Str(#############)
Trace ID:
Span ID:
Flags: 0
LogRecord #1
ObservedTimestamp: 2025-02-03 14:56:20.642795798 +0000 UTC
Timestamp: 2025-02-03 14:56:20.609430294 +0000 UTC
SeverityText: INFO
SeverityNumber: Unspecified(0)
Body: Map({"message":"HTTP 200","properties":{"code":200,"content-length":2,"responseTime":0},"timestamp":"2025-02-03T14:56:20.609Z","type":"##########","userType":"ANONYMOUS"})
Attributes:
     -> logtag: Str(F)
     -> log.file.path: Str(/var/log/pods/############-68548f9946-4dwd5_af3cf95a-9dd2-4500-a5d6-b53ec560ec7d/######/0.log)
     -> log.iostream: Str(stdout)
Trace ID:
Span ID:
Flags: 0

Additional context

Secondary bug / feature request:

The googlecloud exporter should not pass all resource labels as GCP log resource.labels, because the set of supported fields is restrictive:
https://cloud.google.com/logging/docs/structured-logging

Instead, most resource attributes should be passed as log attributes, which attaches them as log entry labels.

@DValentiev DValentiev added bug Something isn't working needs triage New item requiring triage labels Feb 3, 2025
@github-actions github-actions bot added processor/resourcedetection Resource detection processor exporter/googlecloud processor/k8sattributes k8s Attributes processor labels Feb 3, 2025
Contributor

github-actions bot commented Feb 3, 2025

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@dashpole
Contributor

dashpole commented Feb 3, 2025

I think this is because you are running the transform at the log context level. That means that those transforms are run for each log record. The issue is that when you process the first log record in the batch, it deletes all the resource attributes, so they don't exist when the next log record is processed. You can consider using the groupbyattributes processor (which is designed to do this), or delete the resource attributes in a subsequent processor to work around this issue.

@DValentiev
Author

> I think this is because you are running the transform at the log context level. That means that those transforms are run for each log record. The issue is that when you process the first log record in the batch, it deletes all the resource attributes, so they don't exist when the next log record is processed. You can consider using the groupbyattributes processor (which is designed to do this), or delete the resource attributes in a subsequent processor to work around this issue.

Thanks, understood.
It is not very clear that resource.attributes is shared across multiple log entries in the transform processor.
I will relabel this issue.

@DValentiev
Author

/label processor/transformprocessor -processor/k8sattributes -processor/resourcedetection

@dashpole dashpole added processor/transform Transform processor question Further information is requested and removed processor/k8sattributes k8s Attributes processor processor/resourcedetection Resource detection processor exporter/googlecloud needs triage New item requiring triage bug Something isn't working labels Feb 3, 2025
Contributor

github-actions bot commented Feb 3, 2025

Pinging code owners for processor/transform: @TylerHelmuth @kentquirk @bogdandrutu @evan-bradley @edmocosta. See Adding Labels via Comments if you do not have permissions to add labels yourself. For example, comment '/label priority:p2 -needs-triaged' to set the priority and remove the needs-triaged label.

@TylerHelmuth
Member

@DValentiev is the goal to get the resource attributes onto every log? If so, you can do this:

    transform:
      error_mode: ignore
      log_statements:        
        # For every log in a resource, add the resource attributes to the log attributes
        - context: log
          statements:
            - merge_maps(attributes, resource.attributes, "upsert")
        # Now that every log in the payload has their resource's attributes, delete the resource attributes.
        - context: resource
          statements:
            - delete_matching_keys(attributes, ".*")
