[receiver/filelog] Support detection of headers in header-based log formats (e.g. W3C) #18198
Comments
I think this functionality should be supported in some way, but it would be best if we can justify enhancements to each operator independently. This will avoid a scenario where loosely coupled operators have overly specific dependencies on each other. The A The changes to the
A perfect solution here would make minimal assumptions about the specific format of the header and introduce minimal complexity to the codebase. Still, I think it is necessary to assume that we are working with a header. In other words, I do not think we should solve for a case where metadata about the file is discovered and/or updated throughout the reading of the file. I have some ideas for what this should look like and will post those when I have a moment to organize them. |
I think it's fair to make the assumption that metadata would be in a header (that is, in a section before any log lines begin). |
Regarding changes to the I spoke with @BinaryFissionGames offline and identified a potential solution for isolating header metadata. I've added some additional context and suggestions:
Sample configuration:
receivers:
  filelog:
    include: foo*.log
    header:
      multiline_pattern: '...'
      metadata_operators:
        - type: regex_parser
          regex: '...'
    operators:
      - type: json_parser
        ... |
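To make the proposal above concrete, here is a minimal Python sketch of the idea, not the receiver's implementation. It assumes the header is the leading run of lines matching a pattern, that metadata is pulled out with a regex using named groups, and that the result is attached to every later record. The function name `read_with_header` and the record shape are hypothetical.

```python
import re
from typing import Dict, Iterable, Iterator


def read_with_header(
    lines: Iterable[str], header_pattern: str, metadata_regex: str
) -> Iterator[Dict]:
    """Yield one record per log line, carrying attributes parsed from the header.

    Assumption: the header only exists before the first log line. Once a
    non-header line is seen, later lines are always treated as log lines.
    """
    header_re = re.compile(header_pattern)
    meta_re = re.compile(metadata_regex)
    attributes: Dict[str, str] = {}
    in_header = True
    for raw in lines:
        line = raw.rstrip("\n")
        if in_header and header_re.search(line):
            match = meta_re.search(line)
            if match:
                # Named groups become attribute keys.
                attributes.update(match.groupdict())
            continue  # header lines are not emitted as log records
        in_header = False
        yield {"body": line, "attributes": dict(attributes)}


records = list(
    read_with_header(
        ["#Fields: a b", '{"x": 1}'],
        header_pattern=r"^#",
        metadata_regex=r"^#Fields: (?P<fields>.+)$",
    )
)
```

Here the header line is consumed rather than exported, and the single log line carries the `fields` attribute, so downstream operators never need to know how the header was read.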
@djaglowski Could you assign me to this issue? |
Completed with #18921 |
@BinaryFissionGames How can we enable this feature? I want to drop the header lines from IIS logs before exporting. |
@djaglowski How can a multiline header like glog's be supported?
For example, I need to extract the machine name MACHINE_XXX from the header. I tried the relevant configuration, but it seems that the header only supports line-by-line matching. |
@xieyuguang, you may be able to do this with the router operator:
filelog:
  ...
  header:
    pattern: '^.+: .+$'
    metadata_operators:
      - type: router
        routes:
          - output: create_at_parser
            expr: 'body matches "^Log file created at: .*$"'
          - output: running_on_parser
            expr: 'body matches "^ Running on machine: .*$"'
          ...
      - type: regex_parser
        id: create_at_parser
        regex: '^Log file created at: (?P<log.file.created_time>.+)$'
      - type: regex_parser
        id: running_on_parser
        regex: '^ Running on machine: (?P<machine.name>.+)$'
  ...
You can read more about this type of pipeline here. |
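The routing idea in the configuration above can be sketched in plain Python. This is only an illustration of the dispatch logic, not how the router operator is implemented: each header line is tested against a list of routes, and the first matching route extracts one attribute. The `ROUTES` table and `parse_header` function are hypothetical, the sample timestamp is made up, and the patterns tolerate optional leading whitespace as an assumption.

```python
import re
from typing import Dict, Iterable

# Hypothetical route table: (pattern with one capture group, attribute name).
ROUTES = [
    (re.compile(r"^\s*Log file created at: (.+)$"), "log.file.created_time"),
    (re.compile(r"^\s*Running on machine: (.+)$"), "machine.name"),
]


def parse_header(lines: Iterable[str]) -> Dict[str, str]:
    """Send each header line to the first route that matches it."""
    attributes: Dict[str, str] = {}
    for line in lines:
        for pattern, name in ROUTES:
            match = pattern.match(line)
            if match:
                attributes[name] = match.group(1)
                break  # assumption of this sketch: first matching route wins
    return attributes


header_attributes = parse_header(
    [
        "Log file created at: 2023/01/01 00:00:00",
        "Running on machine: MACHINE_XXX",
        "Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg",
    ]
)
```

Because each header line is handled independently, a multi-line header is covered without a multiline pattern, and lines that match no route are simply ignored.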
Component(s)
receiver/filelog
Is your feature request related to a problem? Please describe.
The W3C log format defines its fields through a list of headers. This allows any agent that is aware of these headers to parse any W3C log, even if the headers change mid-way through the log file (as they could in e.g. Microsoft IIS logs).
The filelog receiver currently does not support parsing these fields and using them to parse CSV lines.
Describe the solution you'd like
Ideally, there would be some way to configure the filelog receiver to recognize and pass these headers to the CSV parser so that the log lines can be parsed based on the headers.
In Stanza, this functionality was implemented in the following PRs:
Tangentially related:
The way it worked was that the filelog receiver would save the header line and add it as an attribute to each log record read from the file.
Later in the pipeline, the CSV parser could use this attribute as dynamic headers, so each log line was parsed according to the header attribute that the filelog receiver had added.
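The two-stage flow described above can be sketched in Python as follows. This is a simplified illustration, not Stanza's code: `attach_header` and `parse_with_header` are hypothetical names, the attribute key `header` is an assumption, and the sketch additionally replaces the stored header whenever a new `#Fields:` directive appears mid-file, which is a choice made here rather than a statement about Stanza's behavior.

```python
import csv
from typing import Dict, Iterable, Iterator


def attach_header(
    lines: Iterable[str], header_prefix: str = "#Fields: "
) -> Iterator[Dict]:
    """Reader stage: remember the latest header and attach it to each record."""
    header = None
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith(header_prefix):
            header = line[len(header_prefix):]
            continue
        if line.startswith("#"):
            continue  # other W3C directives (e.g. #Version) are skipped here
        yield {"body": line, "attributes": {"header": header}}


def parse_with_header(record: Dict, delimiter: str = " ") -> Dict[str, str]:
    """Parser stage: use the header attribute as the field names for this line."""
    header = record["attributes"]["header"]
    if header is None:
        raise ValueError("no header seen before this log line")
    names = header.split(delimiter)
    values = next(csv.reader([record["body"]], delimiter=delimiter))
    if len(names) != len(values):
        raise ValueError("field count does not match header")
    return dict(zip(names, values))


lines = [
    "#Version: 1.0",
    "#Fields: time cs-method cs-uri",
    "00:34:23 GET /foo",
    "#Fields: time cs-uri",
    "00:35:00 /bar",
]
parsed = [parse_with_header(record) for record in attach_header(lines)]
```

Keeping the header on the record itself means the parser stage stays stateless: it needs nothing beyond the record it is handed, which is what lets the two stages remain loosely coupled.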
Describe alternatives you've considered
I haven't thought of other solutions besides the one implemented in Stanza; I would love to hear other ideas!
Additional context
Sample W3C log line, for context:
W3C log