[receiver/filelog] Intelligent File Detection and Reading for Rotated Files #22998

Mrod1598 · 2023-06-01T13:48:59Z

Component(s)

receiver/filelog

Is your feature request related to a problem? Please describe.

Yes. The problem arises when detecting and reading files in an environment that has no concept of a current file with a
fixed name. Instead, there is a large pool of timestamped files that rotates continuously, and it has been challenging to
accurately identify and read the "current" file within that pool. Because these files cannot be filtered effectively, CPU
usage is excessive: the system has to read more than just the current file, since it must check that none of the other
files have been updated.

Describe the solution you'd like

We propose using a sequence of ordering filter rules to determine the most recent file. Where multiple groups are needed, it would be more effective to use multiple receivers.

We are also making some assumptions:

It might be possible to have only one group, which could simplify the process, assuming the user specifies a matching
pattern that matches one group.
The most recent file could be determined by an integer in the filename, which would facilitate the process.
The filename format could be year, month, day, sequence number.

Example:

err.2023053001.log
err.2023053002.log
err.2023053003.log
err.2023053101.log
err.2023053102.log
err.2023053103.log

The solution should provide the capability to define alternate ordering strategies with different parsing/sorting techniques
such as:

Timestamp only
Integer only
Timestamp & integer, with primary sort based on timestamp and secondary sort based on integer.
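
The "timestamp & integer" strategy can be illustrated with a minimal Python sketch. It is not part of the proposal: the pattern `NAME_RE` and the helper `sort_key` are hypothetical names, and the sketch assumes the `err.YYYYMMDDNN.log` layout from the example above.

```python
import re
from datetime import datetime

# Hypothetical pattern for the example names: 8 date digits, then a 2-digit sequence number.
NAME_RE = re.compile(r"err\.(?P<date>\d{8})(?P<seq>\d{2})\.log$")


def sort_key(filename):
    """Return (date, sequence); tuples compare by date first, then by sequence."""
    m = NAME_RE.search(filename)
    if m is None:
        raise ValueError(f"unexpected filename: {filename}")
    return (datetime.strptime(m["date"], "%Y%m%d"), int(m["seq"]))


files = [
    "err.2023053102.log",
    "err.2023053003.log",
    "err.2023053101.log",
    "err.2023053001.log",
    "err.2023053103.log",
    "err.2023053002.log",
]

# The "current" file is the one with the greatest (date, sequence) pair.
current = max(files, key=sort_key)
print(current)  # err.2023053103.log
```

Using `max` with a tuple key avoids sorting the whole pool when only the newest file is needed.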

Lastly, we suggest creating a configuration section that applies these sorting methods in order of priority.
In the proposed solution, we will introduce a new top-level key, tentatively named file_name_filtering_rules. This key will
have a list of filtering rules as its value, and these rules will be applied in sequence.

A single rule will comprise the following fields:

regex: A regular expression with a single capture group called value. This will be used against each filename, and the
contents of value will be used for the rule.

sort_type: Determines how the values of value are compared and sorted. Valid entries are timestamp, integer, and
alphabetical.

format: If sort_type is timestamp, this field determines how to parse the timestamp. The stanza timestamp parsing logic can likely be applied here.

ascending: A boolean value which, if true, signals to sort in ascending order. If false, it sorts in descending order.
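
To make the four fields concrete, here is a rough Python sketch of how a single rule might be evaluated. It is an illustration only, not the receiver's implementation: `Rule`, `extract_value`, and `sort_by_rule` are hypothetical names, Python's `strptime` stands in for the stanza timestamp parser, and dropping filenames that do not match the regex is an assumption the proposal does not state.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Rule:
    regex: str                    # must contain a named capture group called "value"
    sort_type: str                # "timestamp", "integer", or "alphabetical"
    format: Optional[str] = None  # only used when sort_type == "timestamp"
    ascending: bool = True


def extract_value(rule, filename):
    """Return the comparable value captured from filename, or None if the regex does not match."""
    m = re.search(rule.regex, filename)
    if m is None:
        return None
    raw = m.group("value")
    if rule.sort_type == "timestamp":
        return datetime.strptime(raw, rule.format)
    if rule.sort_type == "integer":
        return int(raw)
    if rule.sort_type == "alphabetical":
        return raw
    raise ValueError(f"unknown sort_type: {rule.sort_type}")


def sort_by_rule(rule, filenames):
    """Sort the matching filenames by the captured value (assumption: non-matching names are dropped)."""
    matched = [f for f in filenames if extract_value(rule, f) is not None]
    return sorted(matched, key=lambda f: extract_value(rule, f), reverse=not rule.ascending)


rule = Rule(regex=r"Error\.\d{8}(?P<value>\d{2})", sort_type="integer", ascending=False)
names = ["Error.2023053101.log", "Error.2023053103.log", "Error.2023053102.log", "notes.txt"]
print(sort_by_rule(rule, names))
# ['Error.2023053103.log', 'Error.2023053102.log', 'Error.2023053101.log']
```

With `ascending=False` the highest sequence number comes first, so the head of the list would be the candidate "current" file.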

Example Config:

filelog:
  include: [dir/Error.*.log]
  file_name_filtering_rules:
    - regex: '/dir/Error\.(?P<value>\d{8}).*'
      sort_type: timestamp
      format: '%Y%m%d'
      ascending: true
    - regex: '/dir/Error\.\d{8}(?P<value>\d{2}).*'
      sort_type: integer
      ascending: true
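
One possible reading of "applied in sequence" is that the first rule acts as the primary sort key and later rules break ties. The Python sketch below follows that reading; it is an assumption, not something the proposal confirms. `RULES`, `value_for`, and `order_files` are hypothetical names, the date format is written in Python `strptime` notation as `%Y%m%d`, and the file paths are made up to fit the example regexes.

```python
import re
from datetime import datetime

# The two rules from the example config, expressed as plain dicts.
RULES = [
    {"regex": r"/dir/Error\.(?P<value>\d{8}).*", "sort_type": "timestamp",
     "format": "%Y%m%d", "ascending": True},
    {"regex": r"/dir/Error\.\d{8}(?P<value>\d{2}).*", "sort_type": "integer",
     "ascending": True},
]


def value_for(rule, path):
    """Extract and parse the "value" capture group according to the rule's sort_type."""
    raw = re.search(rule["regex"], path).group("value")
    if rule["sort_type"] == "timestamp":
        return datetime.strptime(raw, rule["format"])
    if rule["sort_type"] == "integer":
        return int(raw)
    return raw  # alphabetical


def order_files(rules, paths):
    ordered = list(paths)
    # Sort by the lowest-priority rule first. sorted() is stable, so the
    # first rule in the list ends up as the primary key.
    for rule in reversed(rules):
        ordered = sorted(ordered, key=lambda p: value_for(rule, p),
                         reverse=not rule["ascending"])
    return ordered


paths = [
    "/dir/Error.2023053102.log",
    "/dir/Error.2023053003.log",
    "/dir/Error.2023053101.log",
    "/dir/Error.2023053001.log",
]
ordered = order_files(RULES, paths)
print(ordered[-1])  # /dir/Error.2023053102.log
```

With both rules ascending, the newest file is the last element of the result; a descending configuration would put it first instead.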

Describe alternatives you've considered

No response

Additional context

No response


github-actions · 2023-06-01T13:49:19Z

Pinging code owners:

receiver/filelog: @djaglowski

See Adding Labels via Comments if you do not have permissions to add labels yourself.

djaglowski · 2023-06-01T14:39:15Z

The first step will involve creating an algorithm to group files based on a sequence of rotations, effectively sorting matching filenames into their respective groups.

It's not clear to me whether this proposal is attempting to address this in any way. Am I missing it? Let's say I have the following files - how does one group these into two groups?

group1-20230601.log
group1-20230602.log
group2-20230601.log
group2-20230602.log

Mrod1598 · 2023-06-01T18:32:25Z

For now, grouping should be done through two different receivers.
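
As a rough illustration of the two-receiver approach, the Python sketch below partitions the four files from the question using one include glob per receiver. The receiver names and glob patterns are hypothetical, `fnmatch` is only a stand-in for the receiver's own glob matching, and plain lexicographic order is assumed to be enough because the date part has a fixed width.

```python
from fnmatch import fnmatch

# Hypothetical include globs, one per receiver instance.
RECEIVER_INCLUDES = {
    "filelog/group1": "group1-*.log",
    "filelog/group2": "group2-*.log",
}

files = [
    "group1-20230601.log",
    "group1-20230602.log",
    "group2-20230601.log",
    "group2-20230602.log",
]


def partition(files, includes):
    """Map each receiver name to the sorted list of files its glob matches."""
    return {
        name: sorted(f for f in files if fnmatch(f, pattern))
        for name, pattern in includes.items()
    }


groups = partition(files, RECEIVER_INCLUDES)
# Each receiver then only has to track the newest file in its own group.
current = {name: matched[-1] for name, matched in groups.items() if matched}
print(current)
# {'filelog/group1': 'group1-20230602.log', 'filelog/group2': 'group2-20230602.log'}
```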

djaglowski · 2023-06-30T18:24:21Z

Closed by #23633

Mrod1598 added the enhancement and needs triage labels Jun 1, 2023

github-actions bot added the receiver/filelog label Jun 1, 2023

JaredTan95 removed the needs triage label Jun 2, 2023

djaglowski assigned Mrod1598 Jun 6, 2023

This was referenced Jun 21, 2023

[receiver/filelog] Add Support for only reading the current file #23633

Merged

[receiver/filelog] Active File Grouping #23787

Open

[receiver/filelog] Support for Multiple Current Sorted Files #23788

Closed

djaglowski closed this as completed Jun 30, 2023

github-actions bot mentioned this issue Jul 1, 2024

Link Checker Report signalfx/splunk-otel-collector#5039

Closed
