[exporter/elasticsearch] Duplicated data streams when using container name as index suffix #27590
Comments
Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself.
I would like to take a look at this issue this week.
Hi @mpostument, I'm starting to work on this issue and would like to clarify something:
Looking at your configuration, the elasticsearch exporter seems to be behaving as expected: it takes the value of the `elasticsearch.index.suffix` attribute and appends it to the index name. What would you expect or like to happen instead? Would you like all the data to go into a single index or data stream? If so, could you disable or not use the dynamic index feature?
Apologies if I'm missing something obvious here. I'm a new contributor, so that's quite possible. :)
@ycombinator yes, that's right. I would want to write logs to three indexes: otel-logs-service-one, otel-logs-service-two, and otel-logs-service-three, and ignore those ids. Right now I am doing this in the filelog receiver config, but over time this list keeps growing and I need to manage every individual service:

```yaml
- type: copy
  from: resource["k8s.container.name"]
  to: attributes["elasticsearch.index.suffix"]
- type: add
  field: attributes["elasticsearch.index.suffix"]
  value: service-one
  if: 'attributes["elasticsearch.index.suffix"] matches "^service-one-\\d+(?:-[a-zA-Z0-9]+)*$"'
- type: add
  field: attributes["elasticsearch.index.suffix"]
  value: service-two
  if: 'attributes["elasticsearch.index.suffix"] matches "^service-two-\\d+$"'
- type: add
  field: attributes["elasticsearch.index.suffix"]
  value: service-three
  if: 'attributes["elasticsearch.index.suffix"] matches "^service-three-\\d+$"'
```
Thanks for the clarification, @mpostument, that helps. Forgive me again if I'm misunderstanding something because I'm still pretty new to OTel, but could you parse out the service name with operators like these?

```yaml
- id: extract_service_name
  type: regex_parser
  regex: (?P<service_name>service-\w+)
  parse_from: resource["k8s.container.name"]
- type: copy
  from: attributes["service_name"]
  to: attributes["elasticsearch.index.suffix"]
```
Yes, but this is basically what I am doing right now, just a bit differently. In my example, service-one, -two, and -three are just placeholders; the real services have random names, with multiple dashes and so on.
I see. I'm not sure it's Elasticsearch's or the Elasticsearch exporter's responsibility to understand the semantics of container names. In other words, I still think parsing out the desired index suffix is outside the scope of Elasticsearch and the Elasticsearch exporter.
It sounds to me like you want to extract just the service name from the container name and use that extracted service name as the Elasticsearch index suffix. If so, there has to be some pattern that can be used to separate the service name from the rest of the container name. Could you post a variety of container names? Perhaps I can come up with a pattern to extract the service name from them.
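(Going by the sample names elsewhere in this thread, the variable part is a trailing numeric id, so a single generic rule might replace the per-service entries. This is a sketch only, assuming that naming convention holds for all services:)

```yaml
# Sketch: strip a trailing "-<digits>" (plus any later "-<alnum>" segments)
# from the container name with one generic regex_parser, instead of one rule
# per service. Assumes every container name follows "<service>-<digits>...".
- id: extract_service_name
  type: regex_parser
  regex: ^(?P<service_name>.+?)-\d+(?:-[a-zA-Z0-9]+)*$
  parse_from: resource["k8s.container.name"]
- type: copy
  from: attributes["service_name"]
  to: attributes["elasticsearch.index.suffix"]
```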
This sounds a little scary, but let's say there are 100 deployments with 3 instances per deployment. Then your index count will be at least 100 * 3, and as Pods are repeatedly created, your index count will grow even more. I recommend not including frequently changing values (such as deployment/pod names) in the index name.
@JaredTan95 what can you suggest to use as the index name? Right now I have an index per service; even if I run a pod as a DaemonSet, I still have one index per app. Here is my full config of the filelog receiver. @ycombinator, the service names are in this config:

```yaml
receivers:
  filelog:
    exclude:
      - /var/log/pods/observability_observability-v2-otel-daemonset*_*/opentelemetry-collector-daemonset/*.log
    include:
      - /var/log/pods/*/*/*.log
    include_file_name: false
    include_file_path: true
    operators:
      - id: get-format
        routes:
          - expr: body matches "^\\{"
            output: parser-docker
          - expr: body matches "^[^ Z]+ "
            output: parser-crio
          - expr: body matches "^[^ Z]+Z"
            output: parser-containerd
        type: router
      - id: parser-crio
        regex: ^(?P<time>[^ Z]+) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$
        timestamp:
          layout: 2006-01-02T15:04:05.999999999Z07:00
          layout_type: gotime
          parse_from: attributes.time
        type: regex_parser
      - combine_field: attributes.log
        combine_with: ""
        id: crio-recombine
        is_last_entry: attributes.logtag == 'F'
        output: extract_metadata_from_filepath
        source_identifier: attributes["log.file.path"]
        type: recombine
      - id: parser-containerd
        regex: ^(?P<time>[^ ^Z]+Z) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$
        timestamp:
          layout: '%Y-%m-%dT%H:%M:%S.%LZ'
          parse_from: attributes.time
        type: regex_parser
      - combine_field: attributes.log
        combine_with: ""
        id: containerd-recombine
        is_last_entry: attributes.logtag == 'F'
        output: extract_metadata_from_filepath
        source_identifier: attributes["log.file.path"]
        type: recombine
      - id: parser-docker
        output: extract_metadata_from_filepath
        timestamp:
          layout: '%Y-%m-%dT%H:%M:%S.%LZ'
          parse_from: attributes.time
        type: json_parser
      - id: extract_metadata_from_filepath
        parse_from: attributes["log.file.path"]
        regex: ^.*\/(?P<namespace>[^_]+)_(?P<pod_name>[^_]+)_(?P<uid>[a-f0-9\-]+)\/(?P<container_name>[^\._]+)\/(?P<restart_count>\d+)\.log$
        type: regex_parser
      - from: attributes.stream
        to: attributes["log.iostream"]
        type: move
      - from: attributes.container_name
        to: resource["k8s.container.name"]
        type: move
      - from: attributes.namespace
        to: resource["k8s.namespace.name"]
        type: move
      - from: attributes.pod_name
        to: resource["k8s.pod.name"]
        type: move
      - from: attributes.restart_count
        to: resource["k8s.container.restart_count"]
        type: move
      - from: attributes.uid
        to: resource["k8s.pod.uid"]
        type: move
      - from: attributes.log
        to: body
        type: move
      - parse_from: body
        parse_to: attributes
        type: json_parser
      - from: resource["k8s.container.name"]
        to: attributes["elasticsearch.index.suffix"]
        type: copy
      - field: attributes["elasticsearch.index.suffix"]
        if: attributes["elasticsearch.index.suffix"] matches "pr-job-[0-9a-f]+"
        type: add
        value: pr-job
      - field: attributes["elasticsearch.index.suffix"]
        if: attributes["elasticsearch.index.suffix"] matches "^build-service-cnt-\\d+(?:-[a-zA-Z0-9]+)*$"
        type: add
        value: build-service-cnt
      - field: attributes["elasticsearch.index.suffix"]
        if: attributes["elasticsearch.index.suffix"] matches "^luzok-worker-static-file-migration-job-\\d+$"
        type: add
        value: luzok-worker-static-file-migration-job
      - field: attributes["elasticsearch.index.suffix"]
        if: attributes["elasticsearch.index.suffix"] matches "^luzok-worker-job-migration-db-\\d+$"
        type: add
        value: luzok-worker-job-migration-db
      - field: attributes["elasticsearch.index.suffix"]
        if: attributes["elasticsearch.index.suffix"] matches "^luzok-worker-job-migration-static-files-\\d+$"
        type: add
        value: luzok-worker-job-migration-static-files
      - field: attributes["elasticsearch.index.suffix"]
        if: attributes["elasticsearch.index.suffix"] matches "^luzok-worker-migration-job-db-\\d+$"
        type: add
        value: luzok-worker-job-migration-db
    start_at: beginning
    storage: file_storage
```
Index names should be a trade-off with your data volume and business needs; there is no absolute advice. In our scenario, we store the logs of multiple k8s clusters in a unified Elasticsearch cluster, so it is acceptable for us to include the k8s cluster id in the index name.
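(A static per-cluster suffix like that could be attached with the attributes processor. A minimal sketch; `cluster-01` is a made-up cluster id:)

```yaml
processors:
  attributes/es-index-suffix:
    actions:
      # "cluster-01" is a hypothetical cluster id; substitute your own.
      - key: elasticsearch.index.suffix
        value: cluster-01
        action: insert
```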
I tried a similar approach, but over time we had data dropped because we reached our limit on field mappings. That's why I started using the index-per-app approach.
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers.
Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself.
I am removing the filelog label since, to me, this does not appear to be a problem with the receiver but more a question of how to properly configure for this use case. Please feel free to tell me I'm wrong if I've missed something.
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers.
Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself.
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers.
Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself.
This issue has been closed as inactive because it has been stale for 120 days with no activity. |
Component(s)
exporter/elasticsearch, receiver/filelog
Describe the issue you're reporting
Hello, I am using the filelog receiver with the elasticsearch exporter. In the elasticsearch exporter I have enabled the dynamic index feature.
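(A minimal sketch of that exporter setup, assuming the exporter's `logs_dynamic_index` option; the endpoint and base index name here are placeholders:)

```yaml
exporters:
  elasticsearch:
    endpoints: [https://elasticsearch.example:9200]  # placeholder endpoint
    logs_index: otel-logs-  # base name; elasticsearch.index.suffix is appended
    logs_dynamic_index:
      enabled: true
```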
In the filelog receiver I am using the container name as the Elasticsearch index suffix.
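(That is, via the `copy` operator from the full config above:)

```yaml
- type: copy
  from: resource["k8s.container.name"]
  to: attributes["elasticsearch.index.suffix"]
```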
Most of the indexes in Elasticsearch are fine, but some pods have container names with a random id:
service-one-1696941203
service-one-1696942203
service-two-1696942203
service-two-1696941203
service-three-1696943203
For those kinds of pods I am getting a separate data stream per pod, and within a short amount of time hundreds of data streams are created in Elastic. How can I handle such cases using dynamic indexes?