
[pkg/ottl] setting higher contexts in OTTL can result in unexpected transformations #32080

Open
@mut3

Description

Component(s)

pkg/ottl, pkg/stanza, processor/transform, receiver/filelog

What happened?

Description

We have a pipeline that uses a filelog receiver with operators that set certain fields in attributes. We then have a transform processor that reads those fields and uses set() to populate some resource attributes. The values that end up in a log's resource attributes appear to come from a different log, as if the transform were reading from one log and writing to another.
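
My working theory (hence the issue title): each statement under `context: log` is evaluated once per log record, but a write through `resource.attributes` lands on the resource that record belongs to. If several records share one resource, all of them write into the same map and whichever record is evaluated last wins. A minimal sketch of that shape (hypothetical pod names, pdata rendered as YAML):

Shared-resource sketch
    resourceLogs:
      - resource:
          attributes: {}                # shared by both records below
        scopeLogs:
          - logRecords:
              - attributes:
                  pod_name: pod-a       # set() writes k8s.pod.name = pod-a ...
              - attributes:
                  pod_name: pod-b       # ... then this record overwrites it with pod-b
    # After the transform, resource.attributes["k8s.pod.name"] is "pod-b" for
    # both records, so pod-a's log is exported with pod-b's resource attributes.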

Steps to Reproduce

I do not currently have a minimal reproducible config for this issue, but I have included a stripped and sanitized config below that contains the core logic/entities around the bug we are seeing.

Configuration
receivers:
      filelog:
        start_at: beginning
        operators:
          # route and parse log based on formatting
          - id: get-format
            type: router
            routes:
              - expr: body matches "^\\{"
                output: parser-docker
              - expr: body matches "^[^ Z]+ "
                output: parser-crio
              - expr: body matches "^[^ Z]+Z"
                output: parser-containerd
          - id: parser-crio
            type: regex_parser
            regex: ^(?P<time>[^ Z]+) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$
            timestamp:
              layout: 2006-01-02T15:04:05.999999999Z07:00
              layout_type: gotime
              parse_from: attributes.time
          - id: crio-recombine
            type: recombine
            combine_field: attributes.log
            combine_with: ""
            is_last_entry: attributes.logtag == 'F'
            output: extract_metadata_from_filepath
            source_identifier: attributes["log.file.path"]
          - id: parser-containerd
            type: regex_parser
            regex: ^(?P<time>[^ ^Z]+Z) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$
            timestamp:
              layout: "%Y-%m-%dT%H:%M:%S.%LZ"
              parse_from: attributes.time
          - id: containerd-recombine
            type: recombine
            combine_field: attributes.log
            combine_with: ""
            is_last_entry: attributes.logtag == 'F'
            output: extract_metadata_from_filepath
            source_identifier: attributes["log.file.path"]
          - id: parser-docker
            type: json_parser
            output: extract_metadata_from_filepath
            timestamp:
              layout: "%Y-%m-%dT%H:%M:%S.%LZ"
              parse_from: attributes.time
          - id: extract_metadata_from_filepath
            parse_from: attributes["log.file.path"]
            regex: ^.*\/(?P<namespace>[^_]+)_(?P<pod_name>[^_]+)_(?P<uid>[a-f0-9\-]+)\/(?P<container_name>[^\._]+)\/(?P<restart_count>\d+)\.log$
            type: regex_parser
          # move log to body, parse to attributes
          - type: move
            from: attributes.log
            to: body
          - type: json_parser
            parse_from: body
            parse_to: attributes

processors:
      transform/attribute_moves:
        error_mode: ignore
        log_statements:
          - context: log
            statements:
              - set(attributes["log.iostream"], attributes["stream"])
              - set(resource.attributes["k8s.container.name"], attributes["container_name"])
              - set(resource.attributes["k8s.namespace.name"], attributes["namespace"])
              - set(resource.attributes["k8s.pod.name"], attributes["pod_name"])
              - set(resource.attributes["k8s.container.restart_count"], attributes["restart_count"])
              - set(resource.attributes["k8s.pod.uid"], attributes["uid"])

Expected Result

Logs where attributes["pod_name"] == resource.attributes["k8s.pod.name"] and attributes["namespace"] == resource.attributes["k8s.namespace.name"].

This is the case for most of the logs emitted from our pipeline.

Actual Result

For some small percentage of our logs, the values of

  • resource.attributes["k8s.container.name"]
  • resource.attributes["k8s.namespace.name"]
  • resource.attributes["k8s.pod.name"]
  • resource.attributes["k8s.container.restart_count"]
  • resource.attributes["k8s.pod.uid"]

do not match their attribute counterparts, but instead appear to come from a different log event. The fields are internally consistent, as if they all came from that same other log event: for example, if resource.attributes["k8s.pod.name"] is wrong and holds the name of some other pod, then resource.attributes["k8s.namespace.name"] holds the namespace of that other pod.
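
If the root cause is that many records share one resource when the transform runs, a workaround I would expect to help (untested sketch; the groupbyattrs processor moves the listed record attributes onto the resource and regroups records by them, so each resulting resource describes exactly one pod/container) is to split before transforming and then rename in resource context:

Untested groupbyattrs sketch
    processors:
      groupbyattrs/split:
        keys:
          - namespace
          - pod_name
          - uid
          - container_name
          - restart_count
      transform/attribute_moves:
        error_mode: ignore
        log_statements:
          # the log.iostream statement would stay in log context as before
          - context: resource
            statements:
              - set(attributes["k8s.namespace.name"], attributes["namespace"])
              - set(attributes["k8s.pod.name"], attributes["pod_name"])
              - set(attributes["k8s.pod.uid"], attributes["uid"])
              - set(attributes["k8s.container.name"], attributes["container_name"])
              - set(attributes["k8s.container.restart_count"], attributes["restart_count"])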

Here is a sanitized example log JSON pulled from our Elasticsearch:

Sanitized incorrect log
{
  "_index": ".ds-otel-logs-stg-my-app-2024.03.07-000022",
  "_id": "Pz1km44B8ceE1n7Gbbpq",
  "_version": 1,
  "_score": 0,
  "_source": {
    "@timestamp": "2024-04-01T20:40:10.983479227Z",
    "Attributes": {
      "OTEL_collector_type": "daemonset",
      "canonical": {
        "owner": {
          "team_name": "unknown"
        }
      },
      "cloud": "aws",
      "cluster": "ci-stg",
      "container_name": "cilium-agent",
      "elasticsearch": {
        "index": {
          "suffix": "my-app"
        }
      },
      "environment": "stg",
      "log": {
        "file": {
          "path": "/var/log/pods/kube-system_cilium-hlkmw_fb7fae7c-d3ac-4308-885d-5d1426c96bcc/cilium-agent/0.log"
        },
        "iostream": "stderr"
      },
      "logtag": "F",
      "namespace": "kube-system",
      "pod_name": "cilium-hlkmw",
      "restart_count": "0",
      "stream": "stderr",
      "time": "2024-04-01T20:40:10.983479227Z",
      "uid": "fb7fae7c-d3ac-4308-885d-5d1426c96bcc"
    },
    "Body": "level=info msg=\"Delete endpoint request\" containerID=593b7f5fe7 subsys=daemon",
    "Resource": {
      "k8s": {
        "container": {
          "name": "my-app",
          "restart_count": "0"
        },
        "namespace": {
          "name": "my-app-namespace"
        },
        "pod": {
          "name": "my-app-65b4cf8769-bb66g",
          "uid": "33b38675-15a7-4f64-a3f7-f38a97455ce6"
        }
      }
    },
    "Scope": {
      "name": "",
      "version": ""
    },
    "SeverityNumber": 0,
    "TraceFlags": 0
  },
  "fields": {
    "Attributes.uid": [
      "fb7fae7c-d3ac-4308-885d-5d1426c96bcc"
    ],
    "Attributes.stream": [
      "stderr"
    ],
    "Attributes.environment": [
      "stg"
    ],
    "Attributes.canonical.owner.team_name": [
      "unknown"
    ],
    "Resource.k8s.container.restart_count": [
      "0"
    ],
    "Attributes.log.iostream": [
      "stderr"
    ],
    "Attributes.restart_count": [
      "0"
    ],
    "Attributes.pod_name": [
      "cilium-hlkmw"
    ],
    "Resource.k8s.namespace.name": [
      "my-app-namespace"
    ],
    "TraceFlags": [
      0
    ],
    "Resource.k8s.pod.name": [
      "my-app-65b4cf8769-bb66g"
    ],
    "Scope.name": [
      ""
    ],
    "Attributes.logtag": [
      "F"
    ],
    "Resource.k8s.pod.uid": [
      "33b38675-15a7-4f64-a3f7-f38a97455ce6"
    ],
    "Attributes.elasticsearch.index.suffix": [
      "my-app"
    ],
    "Attributes.cloud": [
      "aws"
    ],
    "SeverityNumber": [
      0
    ],
    "Attributes.log.file.path": [
      "/var/log/pods/kube-system_cilium-hlkmw_fb7fae7c-d3ac-4308-885d-5d1426c96bcc/cilium-agent/0.log"
    ],
    "Body": [
      "level=info msg=\"Delete endpoint request\" containerID=593b7f5fe7 subsys=daemon"
    ],
    "Attributes.time": [
      "2024-04-01T20:40:10.983Z"
    ],
    "Attributes.namespace": [
      "kube-system"
    ],
    "Resource.k8s.container.name": [
      "my-app"
    ],
    "@timestamp": [
      "2024-04-01T20:40:10.983Z"
    ],
    "Attributes.container_name": [
      "cilium-agent"
    ]
  }
}

Collector version

v0.96.0

Environment information

Environment

OS: Amazon Linux 2023 (Amazon's v1.25.16-eks AMI)
Compiler (if manually compiled): Collector Helm Chart v0.84.0

OpenTelemetry Collector configuration

See Steps to Reproduce

Log output

No response

Additional context

We never experience this issue when these same moves to resource attributes are done via operators on the filelog receiver (my guess at why is sketched after the config below):

All operator solution
receivers:
      filelog:
        start_at: beginning
        operators:
          - id: get-format
            routes:
              - expr: body matches "^\\{"
                output: parser-docker
              - expr: body matches "^[^ Z]+ "
                output: parser-crio
              - expr: body matches "^[^ Z]+Z"
                output: parser-containerd
            type: router
          - id: parser-crio
            regex: ^(?P<time>[^ Z]+) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$
            timestamp:
              layout: 2006-01-02T15:04:05.999999999Z07:00
              layout_type: gotime
              parse_from: attributes.time
            type: regex_parser
          - combine_field: attributes.log
            combine_with: ""
            id: crio-recombine
            is_last_entry: attributes.logtag == 'F'
            output: extract_metadata_from_filepath
            source_identifier: attributes["log.file.path"]
            type: recombine
          - id: parser-containerd
            regex: ^(?P<time>[^ ^Z]+Z) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$
            timestamp:
              layout: "%Y-%m-%dT%H:%M:%S.%LZ"
              parse_from: attributes.time
            type: regex_parser
          - combine_field: attributes.log
            combine_with: ""
            id: containerd-recombine
            is_last_entry: attributes.logtag == 'F'
            output: extract_metadata_from_filepath
            source_identifier: attributes["log.file.path"]
            type: recombine
          - id: parser-docker
            output: extract_metadata_from_filepath
            timestamp:
              layout: "%Y-%m-%dT%H:%M:%S.%LZ"
              parse_from: attributes.time
            type: json_parser
          - id: extract_metadata_from_filepath
            parse_from: attributes["log.file.path"]
            regex: ^.*\/(?P<namespace>[^_]+)_(?P<pod_name>[^_]+)_(?P<uid>[a-f0-9\-]+)\/(?P<container_name>[^\._]+)\/(?P<restart_count>\d+)\.log$
            type: regex_parser
          - from: attributes.stream
            to: attributes["log.iostream"]
            type: move
          - from: attributes.container_name
            to: resource["k8s.container.name"]
            type: move
          - from: attributes.namespace
            to: resource["k8s.namespace.name"]
            type: move
          - from: attributes.pod_name
            to: resource["k8s.pod.name"]
            type: move
          - from: attributes.restart_count
            to: resource["k8s.container.restart_count"]
            type: move
          - from: attributes.uid
            to: resource["k8s.pod.uid"]
            type: move
          - from: attributes.log
            to: body
            type: move
          - type: json_parser
            parse_from: body
            parse_to: attributes
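
My guess at why the operator version is immune (someone who knows pkg/stanza internals, please confirm): operators run on individual entries before they are converted to pdata, and each entry carries its own resource map, so a move into resource[...] only ever touches that one entry. Grouping into shared ResourceLogs happens afterwards, keyed on the resulting resource:

Per-entry resource sketch (hypothetical)
    # Before conversion, each stanza entry has a private resource:
    #   entry 1 -> resource: {k8s.pod.name: cilium-hlkmw, ...}
    #   entry 2 -> resource: {k8s.pod.name: my-app-65b4cf8769-bb66g, ...}
    # The adapter then groups entries whose resources are identical, so a
    # record can never inherit another pod's resource attributes.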

Metadata


    Labels

    bug, discussion needed, never stale, pkg/ottl, processor/transform, roadmapping
