File input should have the ability to parse a log file's headers and attach them as labels to each entry's $label map. This needs to work under the following situations:
Start at end
Start at beginning
Start at end + starting with an offset other than 0 (halfway through a file)
Goals
This solution should not decrease the performance of the file input operator.
Changes
Added LabelRegex, an optional parameter for providing a regex that will be used for parsing headers
This regex should contain two capture groups: key and value
Added Labels map[string]string to the Fingerprint type. This map will store the labels derived from the headers
When NewFingerprint() is called, the map is initialized with fp.Labels = make(map[string]string)
Added ReadHeaders() method, which is called by ReadToEnd before a file's entries are read
This method reads from the beginning of a file until the regex stops matching, then returns.
Updated ReadToEnd() to attach the Labels on the fingerprint to each entry
Use Cases
The initial use case for this change is to allow a W3C plugin to be built, where we will need to detect the field names before parsing the entry. A future PR will enable the CSV parser to take an optional FieldLabel parameter, meaning the user will not need to define the fields in their stanza config.
Example label_regex value:
label_regex: '^#(?P<key>.*?): (?P<value>.*)'
For example, file input will read the file starting with this:
and then attach those headers as labels to each entry:
Once the CSV parser is updated, we can do this (some fields removed from this output to keep it small):
The pipeline config to handle this: