
[exporter/elasticsearch] Duplicated data streams when using container name as index suffix #27590

Closed
mpostument opened this issue Oct 10, 2023 · 16 comments

Comments

@mpostument

Component(s)

exporter/elasticsearch, receiver/filelog

Describe the issue you're reporting

Hello, I am using the filelog receiver with the elasticsearch exporter. In the elasticsearch exporter I have enabled dynamic indexes:

      elasticsearch/logv2:
        logs_index: otel-logs-
        user: $ELASTIC_USER_V2
        password: $ELASTIC_PASSWORD_V2
        logs_dynamic_index:
           enabled: true

In the filelog receiver I am using the container name as the elasticsearch index suffix:

          - id: extract_metadata_from_filepath
            parse_from: attributes["log.file.path"]
            regex: ^.*\/(?P<namespace>[^_]+)_(?P<pod_name>[^_]+)_(?P<uid>[a-f0-9\-]+)\/(?P<container_name>[^\._]+)\/(?P<restart_count>\d+)\.log$
            type: regex_parser
          - type: copy
            from: resource["k8s.container.name"]
            to: attributes["elasticsearch.index.suffix"]

Most of the indexes in elasticsearch are fine, but some of the pods have a random ID in the container name:
service-one-1696941203
service-one-1696942203
service-two-1696942203
service-two-1696941203
service-three-1696943203

For those kinds of pods I am getting a separate data stream per pod, and within a short amount of time hundreds of data streams are created in Elasticsearch. How can I handle such cases using dynamic indexes?

@github-actions
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@mpostument mpostument changed the title to [exporter/elasticsearch] Duplicated data streams when using container name as index suffix Oct 10, 2023
@ycombinator
Contributor

I would like to take a look at this issue this week.

@JaredTan95 JaredTan95 removed the needs triage label Nov 27, 2023
@ycombinator
Contributor

Hi @mpostument I'm starting to work on this issue and would like to clarify something:

Most of the indexes in elasticsearch are fine, but some of the pods have a random ID in the container name: service-one-1696941203 service-one-1696942203 service-two-1696942203 service-two-1696941203 service-three-1696943203

For those kinds of pods I am getting a separate data stream per pod, and within a short amount of time hundreds of data streams are created in Elasticsearch. How can I handle such cases using dynamic indexes?

Looking at your configuration, the elasticsearch exporter seems to be behaving as expected. It's taking the value of attributes["elasticsearch.index.suffix"] and appending it to the value of the logs_index setting, which is otel-logs-. So it makes sense why you are ending up with several indices like otel-logs-service-one-1696941203, otel-logs-service-one-1696942203, otel-logs-service-two-1696942203, otel-logs-service-two-1696941203, otel-logs-service-three-1696943203, etc.

What would you expect or like to happen instead? Would you like all the data to go into a single index or data stream? If yes, could you disable or simply not use the logs_dynamic_index setting?

Apologies if I'm missing something obvious here. I'm a new contributor so that's quite possible. :)

@mpostument
Author

@ycombinator yes, that's right. I would want to write logs to three indexes: otel-logs-service-one, otel-logs-service-two and otel-logs-service-three, and ignore those IDs. Right now I am doing this in the filelog receiver config, but over time this list keeps growing and I need to manage every individual service:

          - type: copy
            from: resource["k8s.container.name"]
            to: attributes["elasticsearch.index.suffix"]
          - type: add
            field: attributes["elasticsearch.index.suffix"]
            value: service-one
            if: 'attributes["elasticsearch.index.suffix"] matches "^service-one-\\d+(?:-[a-zA-Z0-9]+)*$"'
          - type: add
            field: attributes["elasticsearch.index.suffix"]
            value: service-two
            if: 'attributes["elasticsearch.index.suffix"] matches "^service-two-\\d+$"'
          - type: add
            field: attributes["elasticsearch.index.suffix"]
            value: service-three
            if: 'attributes["elasticsearch.index.suffix"] matches "^service-three-\\d+$"'

@ycombinator
Contributor

ycombinator commented Dec 4, 2023

Thanks for the clarification, @mpostument, that helps.

Forgive me again if I'm misunderstanding something because I'm still pretty new to OTel, but could you parse out the service-XXXX part from resource["k8s.container.name"] using the regex_parser operator and then assign it to the elasticsearch.index.suffix attribute using the copy operator like so?

- id: extract_service_name
  type: regex_parser
  regex: (?P<service_name>service-\w+)
  parse_from: resource["k8s.container.name"]
- type: copy
  from: attributes["service_name"]
  to: attributes["elasticsearch.index.suffix"]

@mpostument
Author

Yes, but this is basically what I am doing right now, just a bit differently. The service-one, -two and -three in my example are just examples; services have random names with multiple dashes and so on, like super-awesome-service-218392183. I was not able to build a regex which covers all of them.

@ycombinator
Contributor

I see. I'm not sure it is Elasticsearch's or the Elasticsearch exporter's responsibility to understand the semantics of container names. In other words, I still think parsing out the desired index suffix is outside the scope of Elasticsearch or the Elasticsearch exporter.

Services have random names with multiple dashes and so on, like super-awesome-service-218392183. I was not able to build a regex which covers all of them.

It sounds to me like you want to extract just the service name from the container name and use that extracted service name as the Elasticsearch index suffix. If so, there has to be some pattern that can be used to separate the service name from the rest of the container name. Could you post a variety of container names? Perhaps I will be able to come up with a pattern to extract the service name from them.
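
If, for example, the variable part is always a trailing numeric ID (as in super-awesome-service-218392183), an untested sketch along the lines of the earlier suggestion might already cover it:

- id: extract_service_name
  type: regex_parser
  # lazily capture everything before the final "-<digits>" segment, e.g.
  # "super-awesome-service-218392183" -> service_name = "super-awesome-service"
  # (container names without a trailing numeric segment would not match and may log parse errors)
  regex: ^(?P<service_name>.+?)-\d+$
  parse_from: resource["k8s.container.name"]
- type: copy
  from: attributes["service_name"]
  to: attributes["elasticsearch.index.suffix"]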

@JaredTan95
Member

In the filelog receiver I am using the container name as the elasticsearch index suffix

This sounds a little scary. Let's say there are 100 deployments with 3 instances per deployment: your index count will be at least 100 * 3, and as Pods are repeatedly created, your index count will grow even larger.

I recommend not bringing frequently changing values (such as the deployment/pod name) into the index name.

@JaredTan95 JaredTan95 added the question label Dec 6, 2023
@mpostument
Author

@JaredTan95 what can you suggest to use as index name?

Right now I have an index per service. Even if I run a pod as a daemonset I still have one index per app. Here is my full config of the filelog receiver:

@ycombinator the service names are in this config.

receivers:
  filelog:
    exclude:
    - /var/log/pods/observability_observability-v2-otel-daemonset*_*/opentelemetry-collector-daemonset/*.log
    include:
    - /var/log/pods/*/*/*.log
    include_file_name: false
    include_file_path: true
    operators:
    - id: get-format
      routes:
      - expr: body matches "^\\{"
        output: parser-docker
      - expr: body matches "^[^ Z]+ "
        output: parser-crio
      - expr: body matches "^[^ Z]+Z"
        output: parser-containerd
      type: router
    - id: parser-crio
      regex: ^(?P<time>[^ Z]+) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$
      timestamp:
        layout: 2006-01-02T15:04:05.999999999Z07:00
        layout_type: gotime
        parse_from: attributes.time
      type: regex_parser
    - combine_field: attributes.log
      combine_with: ""
      id: crio-recombine
      is_last_entry: attributes.logtag == 'F'
      output: extract_metadata_from_filepath
      source_identifier: attributes["log.file.path"]
      type: recombine
    - id: parser-containerd
      regex: ^(?P<time>[^ ^Z]+Z) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$
      timestamp:
        layout: '%Y-%m-%dT%H:%M:%S.%LZ'
        parse_from: attributes.time
      type: regex_parser
    - combine_field: attributes.log
      combine_with: ""
      id: containerd-recombine
      is_last_entry: attributes.logtag == 'F'
      output: extract_metadata_from_filepath
      source_identifier: attributes["log.file.path"]
      type: recombine
    - id: parser-docker
      output: extract_metadata_from_filepath
      timestamp:
        layout: '%Y-%m-%dT%H:%M:%S.%LZ'
        parse_from: attributes.time
      type: json_parser
    - id: extract_metadata_from_filepath
      parse_from: attributes["log.file.path"]
      regex: ^.*\/(?P<namespace>[^_]+)_(?P<pod_name>[^_]+)_(?P<uid>[a-f0-9\-]+)\/(?P<container_name>[^\._]+)\/(?P<restart_count>\d+)\.log$
      type: regex_parser
    - from: attributes.stream
      to: attributes["log.iostream"]
      type: move
    - from: attributes.container_name
      to: resource["k8s.container.name"]
      type: move
    - from: attributes.namespace
      to: resource["k8s.namespace.name"]
      type: move
    - from: attributes.pod_name
      to: resource["k8s.pod.name"]
      type: move
    - from: attributes.restart_count
      to: resource["k8s.container.restart_count"]
      type: move
    - from: attributes.uid
      to: resource["k8s.pod.uid"]
      type: move
    - from: attributes.log
      to: body
      type: move
    - parse_from: body
      parse_to: attributes
      type: json_parser
    - from: resource["k8s.container.name"]
      to: attributes["elasticsearch.index.suffix"]
      type: copy
    - field: attributes["elasticsearch.index.suffix"]
      if: attributes["elasticsearch.index.suffix"] matches "pr-job-[0-9a-f]+"
      type: add
      value: pr-job
    - field: attributes["elasticsearch.index.suffix"]
      if: attributes["elasticsearch.index.suffix"] matches "^build-service-cnt-\\d+(?:-[a-zA-Z0-9]+)*$"
      type: add
      value: build-service-cnt
    - field: attributes["elasticsearch.index.suffix"]
      if: attributes["elasticsearch.index.suffix"] matches "^luzok-worker-static-file-migration-job-\\d+$"
      type: add
      value: luzok-worker-static-file-migration-job
    - field: attributes["elasticsearch.index.suffix"]
      if: attributes["elasticsearch.index.suffix"] matches "^luzok-worker-job-migration-db-\\d+$"
      type: add
      value: luzok-worker-job-migration-db
    - field: attributes["elasticsearch.index.suffix"]
      if: attributes["elasticsearch.index.suffix"] matches "^luzok-worker-job-migration-static-files-\\d+$"
      type: add
      value: luzok-worker-job-migration-static-files
    - field: attributes["elasticsearch.index.suffix"]
      if: attributes["elasticsearch.index.suffix"] matches "^luzok-worker-migration-job-db-\\d+$"
      type: add
      value: luzok-worker-job-migration-db
    start_at: beginning
    storage: file_storage

@JaredTan95
Copy link
Member

Index naming is a trade-off between your data volume and your business needs; there is no absolute advice.

In our scenario we store the logs of multiple k8s clusters in a single elasticsearch cluster, so it is acceptable for us to include the k8s cluster id in the index name (otlp-logs-{k8s-cluster-id}). Take this as a reference only.
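
For illustration, a minimal sketch of that approach, assuming the attributes processor is in the logs pipeline and using prod-eu-1 as a placeholder cluster id, would set a fixed suffix per collector deployment instead of a per-container one:

processors:
  attributes/es-index:
    actions:
      # placeholder cluster id; set a different value per cluster / collector deployment
      - key: elasticsearch.index.suffix
        value: prod-eu-1
        action: upsert

Combined with logs_index: otel-logs- and logs_dynamic_index enabled, every log record from that collector would then go to a single otel-logs-prod-eu-1 data stream.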

@mpostument
Copy link
Author

I tried a similar approach, but over time we had data dropped because we reached our field mapping limit. That's why I started using the index-per-app approach.

Contributor

github-actions bot commented Feb 6, 2024

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Feb 6, 2024
@djaglowski
Member

I am removing the filelog label since to me this does not appear to be a problem with the receiver but more a question of how to properly configure for this use case. Please feel free to tell me I'm wrong if I've missed something.

@github-actions github-actions bot added the Stale label Apr 15, 2024
@ycombinator ycombinator removed their assignment May 16, 2024
@github-actions github-actions bot removed the Stale label May 17, 2024
@github-actions github-actions bot added the Stale label Jul 17, 2024
Contributor

This issue has been closed as inactive because it has been stale for 120 days with no activity.

@github-actions github-actions bot closed this as not planned Sep 15, 2024