
[exporter/elasticsearch] Duplicated data streams when using container name as index suffix #27590

Closed
mpostument opened this issue Oct 10, 2023 · 16 comments

Comments

@mpostument

Component(s)

exporter/elasticsearch, receiver/filelog

Describe the issue you're reporting

Hello, I am using the filelog receiver with the elasticsearch exporter. In the elasticsearch exporter I have enabled dynamic indexes:

      elasticsearch/logv2:
        logs_index: otel-logs-
        user: $ELASTIC_USER_V2
        password: $ELASTIC_PASSWORD_V2
        logs_dynamic_index:
           enabled: true

In the filelog receiver I am using the container name as the elasticsearch index suffix:

          - id: extract_metadata_from_filepath
            parse_from: attributes["log.file.path"]
            regex: ^.*\/(?P<namespace>[^_]+)_(?P<pod_name>[^_]+)_(?P<uid>[a-f0-9\-]+)\/(?P<container_name>[^\._]+)\/(?P<restart_count>\d+)\.log$
            type: regex_parser
          - type: copy
            from: resource["k8s.container.name"]
            to: attributes["elasticsearch.index.suffix"]

Most of the indexes in elasticsearch are fine, but some of the pods have a random ID in the container name:
service-one-1696941203
service-one-1696942203
service-two-1696942203
service-two-1696941203
service-three-1696943203

For those kinds of pods I am getting a separate data stream per pod, and within a short amount of time hundreds of data streams are created in Elasticsearch. How can I handle such cases using dynamic indexes?

@github-actions
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@mpostument mpostument changed the title to [exporter/elasticsearch] Duplicated data streams when using container name as index suffix Oct 10, 2023
@ycombinator
Contributor

I would like to take a look at this issue this week.

@JaredTan95 JaredTan95 removed the needs triage label Nov 27, 2023
@ycombinator
Contributor

Hi @mpostument I'm starting to work on this issue and would like to clarify something:

Most of the indexes in elasticsearch are fine, but some of the pods have a random ID in the container name: service-one-1696941203 service-one-1696942203 service-two-1696942203 service-two-1696941203 service-three-1696943203

For those kinds of pods I am getting a separate data stream per pod, and within a short amount of time hundreds of data streams are created in Elasticsearch. How can I handle such cases using dynamic indexes?

Looking at your configuration, the elasticsearch exporter seems to be behaving as expected. It's taking the value of attributes["elasticsearch.index.suffix"] and appending it to the value of the logs_index setting, which is otel-logs-. So it makes sense why you are ending up with several indices like otel-logs-service-one-1696941203, otel-logs-service-one-1696942203, otel-logs-service-two-1696942203, otel-logs-service-two-1696941203, otel-logs-service-three-1696943203, etc.

What would you expect or like to happen instead? Would you like all the data to go into a single index or data stream? If yes, could you disable or simply not use the logs_dynamic_index setting?

Apologies if I'm missing something obvious here. I'm a new contributor so that's quite possible. :)

@mpostument
Author

@ycombinator yes, that's right. I would want to write logs to three indexes: otel-logs-service-one, otel-logs-service-two and otel-logs-service-three, and ignore those IDs. Right now I am doing this in the filelog receiver config, but over time this list keeps growing and I need to manage every individual service:

          - type: copy
            from: resource["k8s.container.name"]
            to: attributes["elasticsearch.index.suffix"]
          - type: add
            field: attributes["elasticsearch.index.suffix"]
            value: service-one
            if: 'attributes["elasticsearch.index.suffix"] matches "^service-one-\\d+(?:-[a-zA-Z0-9]+)*$"'
          - type: add
            field: attributes["elasticsearch.index.suffix"]
            value: service-two
            if: 'attributes["elasticsearch.index.suffix"] matches "^service-two-\\d+$"'
          - type: add
            field: attributes["elasticsearch.index.suffix"]
            value: service-three
            if: 'attributes["elasticsearch.index.suffix"] matches "^service-three-\\d+$"'

@ycombinator
Contributor

ycombinator commented Dec 4, 2023

Thanks for the clarification, @mpostument, that helps.

Forgive me again if I'm misunderstanding something because I'm still pretty new to OTel, but could you parse out the service-XXXX part from resource["k8s.container.name"] using the regex_parser operator and then assign it to the elasticsearch.index.suffix attribute using the copy operator like so?

- id: extract_service_name
  type: regex_parser
  regex: (?P<service_name>service-\w+)
  parse_from: resource["k8s.container.name"]
- type: copy
  from: attributes["service_name"]
  to: attributes["elasticsearch.index.suffix"]

@mpostument
Author

Yes, but this is basically what I am doing right now, just a bit differently. The service-one, -two and -three in my example are just examples; services have random names with multiple dashes and so on, like super-awesome-service-218392183. I was not able to build a regex which covers all of them.

@ycombinator
Contributor

I see. I'm not sure it is Elasticsearch's or the Elasticsearch exporter's responsibility to understand the semantics of container names. In other words, I still think parsing out the desired index suffix is outside the scope of Elasticsearch or the Elasticsearch exporter.

Services have random names with multiple dashes and so on, like super-awesome-service-218392183. I was not able to build a regex which covers all of them.

It sounds to me like you want to extract just the service name from the container name and use that extracted service name as the Elasticsearch index suffix. If so, there has to be some pattern that can be used to separate the service name from the rest of the container name. Could you post a variety of container names? Perhaps I will be able to come up with a pattern to extract the service name from them.
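
If, for example, the variable part is always a trailing numeric ID (as in super-awesome-service-218392183), an untested sketch along the lines of the earlier suggestion might already cover it:

- id: extract_service_name
  type: regex_parser
  # lazily capture everything before the final "-<digits>" segment, e.g.
  # "super-awesome-service-218392183" -> service_name = "super-awesome-service"
  # (container names without a trailing numeric segment would not match and may log parse errors)
  regex: ^(?P<service_name>.+?)-\d+$
  parse_from: resource["k8s.container.name"]
- type: copy
  from: attributes["service_name"]
  to: attributes["elasticsearch.index.suffix"]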

@JaredTan95
Member

In the filelog receiver I am using the container name as the elasticsearch index suffix

This sounds a little scary. Let's say there are 100 deployments with 3 instances per deployment: your index count will be at least 100 * 3, and as Pods are repeatedly created, your index count will grow even larger.

I recommend not bringing frequently changing values (such as the deployment/pod name) into the index name.

@JaredTan95 JaredTan95 added the question label Dec 6, 2023
@mpostument
Author

@JaredTan95 what can you suggest to use as index name?

Right now I have an index per service. Even if I run a pod as a daemonset I still have one index per app. Here is my full config of the filelog receiver:

@ycombinator the service names are in this config.

receivers:
  filelog:
    exclude:
    - /var/log/pods/observability_observability-v2-otel-daemonset*_*/opentelemetry-collector-daemonset/*.log
    include:
    - /var/log/pods/*/*/*.log
    include_file_name: false
    include_file_path: true
    operators:
    - id: get-format
      routes:
      - expr: body matches "^\\{"
        output: parser-docker
      - expr: body matches "^[^ Z]+ "
        output: parser-crio
      - expr: body matches "^[^ Z]+Z"
        output: parser-containerd
      type: router
    - id: parser-crio
      regex: ^(?P<time>[^ Z]+) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$
      timestamp:
        layout: 2006-01-02T15:04:05.999999999Z07:00
        layout_type: gotime
        parse_from: attributes.time
      type: regex_parser
    - combine_field: attributes.log
      combine_with: ""
      id: crio-recombine
      is_last_entry: attributes.logtag == 'F'
      output: extract_metadata_from_filepath
      source_identifier: attributes["log.file.path"]
      type: recombine
    - id: parser-containerd
      regex: ^(?P<time>[^ ^Z]+Z) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$
      timestamp:
        layout: '%Y-%m-%dT%H:%M:%S.%LZ'
        parse_from: attributes.time
      type: regex_parser
    - combine_field: attributes.log
      combine_with: ""
      id: containerd-recombine
      is_last_entry: attributes.logtag == 'F'
      output: extract_metadata_from_filepath
      source_identifier: attributes["log.file.path"]
      type: recombine
    - id: parser-docker
      output: extract_metadata_from_filepath
      timestamp:
        layout: '%Y-%m-%dT%H:%M:%S.%LZ'
        parse_from: attributes.time
      type: json_parser
    - id: extract_metadata_from_filepath
      parse_from: attributes["log.file.path"]
      regex: ^.*\/(?P<namespace>[^_]+)_(?P<pod_name>[^_]+)_(?P<uid>[a-f0-9\-]+)\/(?P<container_name>[^\._]+)\/(?P<restart_count>\d+)\.log$
      type: regex_parser
    - from: attributes.stream
      to: attributes["log.iostream"]
      type: move
    - from: attributes.container_name
      to: resource["k8s.container.name"]
      type: move
    - from: attributes.namespace
      to: resource["k8s.namespace.name"]
      type: move
    - from: attributes.pod_name
      to: resource["k8s.pod.name"]
      type: move
    - from: attributes.restart_count
      to: resource["k8s.container.restart_count"]
      type: move
    - from: attributes.uid
      to: resource["k8s.pod.uid"]
      type: move
    - from: attributes.log
      to: body
      type: move
    - parse_from: body
      parse_to: attributes
      type: json_parser
    - from: resource["k8s.container.name"]
      to: attributes["elasticsearch.index.suffix"]
      type: copy
    - field: attributes["elasticsearch.index.suffix"]
      if: attributes["elasticsearch.index.suffix"] matches "pr-job-[0-9a-f]+"
      type: add
      value: pr-job
    - field: attributes["elasticsearch.index.suffix"]
      if: attributes["elasticsearch.index.suffix"] matches "^build-service-cnt-\\d+(?:-[a-zA-Z0-9]+)*$"
      type: add
      value: build-service-cnt
    - field: attributes["elasticsearch.index.suffix"]
      if: attributes["elasticsearch.index.suffix"] matches "^luzok-worker-static-file-migration-job-\\d+$"
      type: add
      value: luzok-worker-static-file-migration-job
    - field: attributes["elasticsearch.index.suffix"]
      if: attributes["elasticsearch.index.suffix"] matches "^luzok-worker-job-migration-db-\\d+$"
      type: add
      value: luzok-worker-job-migration-db
    - field: attributes["elasticsearch.index.suffix"]
      if: attributes["elasticsearch.index.suffix"] matches "^luzok-worker-job-migration-static-files-\\d+$"
      type: add
      value: luzok-worker-job-migration-static-files
    - field: attributes["elasticsearch.index.suffix"]
      if: attributes["elasticsearch.index.suffix"] matches "^luzok-worker-migration-job-db-\\d+$"
      type: add
      value: luzok-worker-job-migration-db
    start_at: beginning
    storage: file_storage

@JaredTan95
Copy link
Member

Index naming is a trade-off between your data volume and your business needs; there is no absolute advice.

In our scenario we store the logs of multiple k8s clusters in a single elasticsearch cluster, so it is acceptable for us to include the k8s cluster id in the index name (otlp-logs-{k8s-cluster-id}). Take this as a reference only.
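
For illustration, a minimal sketch of that approach, assuming the attributes processor is in the logs pipeline and using prod-eu-1 as a placeholder cluster id, would set a fixed suffix per collector deployment instead of a per-container one:

processors:
  attributes/es-index:
    actions:
      # placeholder cluster id; set a different value per cluster / collector deployment
      - key: elasticsearch.index.suffix
        value: prod-eu-1
        action: upsert

Combined with logs_index: otel-logs- and logs_dynamic_index enabled, every log record from that collector would then go to a single otel-logs-prod-eu-1 data stream.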

@mpostument
Copy link
Author

I tried a similar approach, but over time we had data dropped because we reached our field mapping limit. That's why I started using the index-per-app approach.

Contributor

github-actions bot commented Feb 6, 2024

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Feb 6, 2024
@djaglowski
Member

I am removing the filelog label since to me this does not appear to be a problem with the receiver but more a question of how to properly configure for this use case. Please feel free to tell me I'm wrong if I've missed something.

@github-actions github-actions bot added the Stale label Apr 15, 2024
@ycombinator ycombinator removed their assignment May 16, 2024
@github-actions github-actions bot removed the Stale label May 17, 2024
@github-actions github-actions bot added the Stale label Jul 17, 2024
Contributor

This issue has been closed as inactive because it has been stale for 120 days with no activity.

@github-actions github-actions bot closed this as not planned Sep 15, 2024