Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reuse add_metadata_from_filepath #33912

Closed
zeitlinger opened this issue Jul 4, 2024 · 16 comments
Closed

Reuse add_metadata_from_filepath #33912

zeitlinger opened this issue Jul 4, 2024 · 16 comments

Comments

@zeitlinger
Copy link
Member

zeitlinger commented Jul 4, 2024

Component(s)

pkg/stanza, receiver/filelog

Is your feature request related to a problem? Please describe.

add_metadata_from_filepath is tied to the container operator.

Describe the solution you'd like

Would it make sense to move this feature to file consumer, so that it can be reused, e.g. in OTLP JSON File Receiver?

Describe alternatives you've considered

Could be done manually - if operators were supported for OTLP JSON File Receiver:

otlpjsonfile:
        include:
          - /var/log/pods/*/anti-fraud/*.log
        include_file_path: true
        operators:
          - id: extract_metadata_from_filepath
            type: regex_parser
            cache:
              size: 128
            parse_from: attributes["log.file.path"]
            regex: ^.*\/(?P<namespace>[^_]+)_(?P<pod_name>[^_]+)_(?P<uid>[a-f0-9\-]+)\/(?P<container_name>[^\._]+)\/(?P<restart_count>\d+)\.log$
          - from: attributes.restart_count
            to: resource["k8s.container.restart_count"]
            type: move
          - from: attributes.uid
            to: resource["k8s.pod.uid"]
            type: move
          - from: attributes.container_name
            to: resource["k8s.container.name"]
            type: move
          - from: attributes.namespace
            to: resource["k8s.namespace.name"]
            type: move
          - from: attributes.pod_name
            to: resource["k8s.pod.name"]
            type: move

Additional context

No response

@zeitlinger zeitlinger added enhancement New feature or request needs triage New item requiring triage labels Jul 4, 2024
Copy link
Contributor

github-actions bot commented Jul 4, 2024

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@ChrsMark
Copy link
Member

ChrsMark commented Jul 5, 2024

add_metadata_from_filepath is tied to the container operator for the reason that it's specific for the target technology: container logs from known runtimes which are written to known log file paths (usually that's the case but that's not guaranteed, that's why this is an optional setting which can be disabled).

So generalizing this, as is, for the entire file consumer sounds a bit weird unless we explicitly name it to something like add_metadata_from_pod_filepath. However, I'm not sure if that's a good practice since that might open the door for "packaging" into code patterns that can be supported through configuration. Do we want to potentially support (and encode) more "patterns" in the future? And if so what would be the acceptance criteria for this? More or less what was mentioned at #31959 (comment). Happy to hear others' opinions though.

Btw, how the log entries look like after they are parsed? I guess they are not wrapped in a container format, right?

@zeitlinger
Copy link
Member Author

How would you feel about adding add_metadata_from_pod_filepath to OTLP JSON File Receiver?

@zeitlinger
Copy link
Member Author

Btw, how the log entries look like after they are parsed? I guess they are not wrapped in a container format, right?

you mean for OTLP JSON File Receiver? There, they are wrapped in a container format - I've created #33846 to deal with that

@djaglowski
Copy link
Member

djaglowski commented Jul 8, 2024

How about a pod_path_parser which, given a field (e.g. parse_from: attributes["log.file.path"]) and optionally a type (e.g. pod_type: docker) will parse values according to the expected format?

@zeitlinger
Copy link
Member Author

How about a pod_path_parser which, given a field (e.g. parse_from: attributes["log.file.path"]) and optionally a type (e.g. pod_type: docker) will parse values according to the expected format?

sound good 😄

what is pod type and what are possible values?

@djaglowski
Copy link
Member

what is pod type and what are possible values?

Sorry, not the correct name. Maybe container_type or format is better?

I was thinking it would behave the same as the container parser's format setting, which accepts docker, crio, containerd, or if left blank will automatically detect the format (auto-detection is less efficient due to the necessary regex matching).

@ChrsMark
Copy link
Member

ChrsMark commented Jul 9, 2024

I like the idea of the path parser. I think that with some refactoring we could re-use what the container operator implements so as we provide the same functionality as part of the container operator along with the standalone operator as well. Similarly to what we do with the recombine operator.

Regarding scope/naming: the container operator can potentially handle docker logs written on the disk regardless of them being orchestrated by K8s or not. So it's not 100% tied to Pods.
In this, I believe the path parser should also focus on container level as well and be called container_path_parser or sth similar.

Then as far as the config option is concerned, I don't think that the format or the runtime really fits here. We can have the 3 different runtimes/formats logging at /var/log/pod/* in K8s envs (usually defined by the CRI). That's actually the case that the container operator embeds.
Hence I guess we should distinguish cases based on the environment/log_file_path_type/managed_by:

  1. K8s/CRI -> /var/log/pods/*/anti-fraud/*.log
  2. Docker -> /var/lib/docker/containers/08d1c419f017398a6587fcc858b75ac09875e07213964a84c2e351012a4899f8/08d1c419f017398a6587fcc858b75ac09875e07213964a84c2e351012a4899f8-json.log (which includes the container.id as it comes from the runtime)
  3. Any other case that might occur in the future?

As it was already mentioned the type can be detected in auto mode or defined directly.

I don't have any strong preference on the naming here, I would just suggest type: {k8s-cri, docker}.

@zeitlinger
Copy link
Member Author

great

I tried to add the functionality to

func (r *Resolver) Resolve(file *os.File) (attributes map[string]any, err error) {
- but I can't distinguish between regular and resource attributes there.

attributes map[entry.FieldInterface]any is not allowed by go - but I could do map[string]FieldEntry with

type FieldEntry struct {
	Key   FieldInterface
	Value any
}

ideas?

@djaglowski
Copy link
Member

I tried to add the functionality to

opentelemetry-collector-contrib/pkg/stanza/fileconsumer/attrs/attrs.go

I don't think this is what we were suggesting. The idea for adding a new parser means a new operator which is independent of fileconsumer. Maybe the closest example would be the uri parser.

@zeitlinger
Copy link
Member Author

OK - should operators be supported by json otlp receiver then?

@ChrsMark
Copy link
Member

Interesting, for some reason I took for granted that operators are also part of the otlpjsonfilereceiver at first place 😞.
I'm not familiar with the otlpjsonfilereceiver so I'm not sure if operators would fit there in general. We should ensure that operators would be used in general as part of this receiver and not just rely on this specific use-case. (@atoulme please chime in if you have feedback on this)

On the other hand I'm not sure if we could do sth similar to the Header operators which are part of fileconsumer: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/pkg/stanza/fileconsumer/config.go#L94
That also feels weird because the operation we need here is not strictly coupled to the content of the file I think.
@djaglowski did you have sth else in mind that we miss here maybe?

Based on the above I wonder if this specific feature should come through an ottl function since it seems to not be tightly coupled to the various receivers used to parse the content itself.

@djaglowski
Copy link
Member

I forgot that operators are not supported in this receiver, but it makes sense really. The purpose of the receiver is to read in fully formatted telemetry data. It's just a mechanism for transmission really.

With that in mind, I don't care if this functionality is implemented as an operator, but I do think it should happen outside of fileconsumer. It would make sense to me as a reuseable function or package shared by the container parser and this, and perhaps later a container path parser.

@ChrsMark
Copy link
Member

@djaglowski @zeitlinger shall we close this one? That was mostly needed for #33846 but I think that should also be fine now after the otlpjson connector.

@ChrsMark
Copy link
Member

ChrsMark commented Aug 1, 2024

Closing since #33846 was also closed. We can re-open if there is a specific use-case for this.

@ChrsMark ChrsMark closed this as completed Aug 1, 2024
@zeitlinger
Copy link
Member Author

@djaglowski @zeitlinger shall we close this one? That was mostly needed for #33846 but I think that should also be fine now after the otlpjson connector.

yes, the current solution is much better 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants