Is your feature request related to a problem? Please describe.
Yes. The problem is detecting and reading files in an environment that has no concept of a current file with a fixed name. The environment has a large pool of timestamped files that rotates continuously, and it has been challenging to accurately identify and read the "current" file within this pool. Because the files cannot be filtered effectively, CPU usage is excessive: the system has to read more than just the current file in order to check that none of the other files have been updated.
Describe the solution you'd like
We propose using a sequence of ordering filter rules to determine the most recent file. Where multiple groups are necessary, it would be more effective to use multiple receivers.
We also make some assumptions:
It might be possible to have only one group, which could simplify the process, assuming the user specifies a pattern that matches one group.
The most recent file could be determined by an integer in the filename, which would facilitate the process.
The filename format could be year, month, day, sequence number.
The solution should provide the capability to define alternate ordering strategies with different parsing/sorting techniques, such as:
Timestamp only
Integer only
Timestamp & integer, with primary sort based on timestamp and secondary sort based on integer.
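To make the three strategies concrete, here is a minimal Python sketch. It is not part of the proposal: the filename layout `app-YYYYMMDD-<sequence>.log`, the pattern, and the function names are all assumptions made up for illustration. Each strategy is just a different sort key over the same filenames.

```python
import re
from datetime import datetime

# Hypothetical filename layout (an assumption): app-YYYYMMDD-<sequence>.log
PATTERN = re.compile(r"^app-(?P<date>\d{8})-(?P<seq>\d+)\.log$")


def timestamp_key(name):
    """Timestamp only: order by the date embedded in the filename."""
    return datetime.strptime(PATTERN.match(name)["date"], "%Y%m%d")


def integer_key(name):
    """Integer only: order by the sequence number, compared numerically."""
    return int(PATTERN.match(name)["seq"])


def timestamp_then_integer_key(name):
    """Timestamp & integer: date is the primary key, sequence breaks ties."""
    return (timestamp_key(name), integer_key(name))


files = [
    "app-20230801-2.log",
    "app-20230802-1.log",
    "app-20230801-10.log",
    "app-20230802-3.log",
]

# The "current" file under the combined strategy is the maximum of the key.
current = max(files, key=timestamp_then_integer_key)
print(current)  # app-20230802-3.log
```

Note that the integer strategy has to compare numbers, not strings: as text, "10" sorts before "2", which would pick the wrong file.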
Lastly, we suggest creating a configuration section that applies these sorting methods in order of priority.
In the proposed solution, we will introduce a new top-level key, tentatively named file_name_filtering_rules. This key will have a list of filtering rules as its value, and these rules will be applied in sequence.
A single rule will comprise the following fields:
regex: A regular expression with a single capture group called value. This will be used against each filename, and the contents of value will be used for the rule.
sort_type: Determines how the values of value are compared and sorted. Valid entries are timestamp, integer, and alphabetical.
format: If sort_type is timestamp, this field determines how to parse the timestamp. The stanza timestamp parsing logic can likely be applied here.
ascending: A boolean value which, if true, signals to sort in ascending order. If false, it sorts in descending order.
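A rough Python sketch of how such a rule list might be evaluated follows. Several things here are assumptions rather than part of the proposal: the `Rule` class and the `parse_value` and `order_files` helpers are hypothetical names, `format` is read as a Python `strptime` layout instead of stanza's timestamp layouts, "applied in sequence" is interpreted as "the first rule is the primary sort key and later rules break ties", and a filename that does not match a rule raises an error.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Rule:
    """One entry of the proposed file_name_filtering_rules list (sketch)."""
    regex: str                    # must contain a capture group named "value"
    sort_type: str                # "timestamp", "integer", or "alphabetical"
    ascending: bool = True
    format: Optional[str] = None  # assumption: a strptime layout


def parse_value(rule, filename):
    """Extract the "value" capture group and convert it per sort_type."""
    match = re.search(rule.regex, filename)
    if match is None:
        raise ValueError(f"{filename!r} does not match {rule.regex!r}")
    raw = match.group("value")
    if rule.sort_type == "timestamp":
        return datetime.strptime(raw, rule.format)
    if rule.sort_type == "integer":
        return int(raw)
    if rule.sort_type == "alphabetical":
        return raw
    raise ValueError(f"unknown sort_type {rule.sort_type!r}")


def order_files(filenames, rules):
    """Sort filenames so that the first rule has the highest priority."""
    ordered = list(filenames)
    # Python's sort is stable, so sorting by the lowest-priority rule first
    # and the highest-priority rule last yields a multi-key ordering.
    for rule in reversed(rules):
        ordered.sort(key=lambda name, r=rule: parse_value(r, name),
                     reverse=not rule.ascending)
    return ordered


rules = [
    Rule(regex=r"app-(?P<value>\d{8})-", sort_type="timestamp",
         format="%Y%m%d", ascending=False),
    Rule(regex=r"-(?P<value>\d+)\.log$", sort_type="integer",
         ascending=False),
]
files = [
    "app-20230801-2.log",
    "app-20230802-1.log",
    "app-20230801-10.log",
    "app-20230802-3.log",
]

ordered = order_files(files, rules)
print(ordered[0])  # app-20230802-3.log
```

With both rules descending, the first element of the result is the candidate "current" file, so a receiver would only need to read `ordered[0]` instead of polling every file in the pool.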
The first step will involve creating an algorithm to group files based on a sequence of rotations, effectively sorting matching filenames into their respective groups.
It's not clear to me whether this proposal is attempting to address this in any way. Am I missing it? Let's say I have the following files - how does one group these into two groups?
Component(s)
receiver/filelog
Describe alternatives you've considered
No response
Additional context
No response