Use the filestream input to read lines from active log files. It is the new, improved alternative to the log input. It comes with various improvements to the existing input:
- Checking of close_* options happens out of band. Thus, if an output is blocked, {beatname_uc} can close the reader and avoid keeping too many files open.
- Detailed metrics are available for all files that match the paths configuration regardless of the harvester_limit. This way, you can keep track of all files, even ones that are not actively read.
- The order of parsers is configurable. So it is possible to parse JSON lines and then aggregate the contents into a multiline event.
- Some position updates and metadata changes no longer depend on the publishing pipeline. If the pipeline is blocked, some changes are still applied to the registry.
- Only the most recent updates are serialized to the registry. In contrast, the log input has to serialize the complete registry on each ACK from the outputs. This makes the registry updates much quicker with this input.
- The input ensures that only offset updates are written to the registry append-only log. The log input writes the complete file state.
- Stale entries can be removed from the registry, even if there is no active input.
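As an illustration of the configurable parser order, the following is a minimal sketch, assuming {beatname_lc} resolves to filebeat. The path /var/log/app/*.json, the message key msg, and the line count of 3 are hypothetical values chosen for the example. The parsers run in the order they are listed, so each line is decoded as JSON first and the decoded messages are then aggregated into a multiline event.

```yaml
filebeat.inputs:
- type: filestream
  id: my-filestream-id
  paths:
    # Hypothetical location of newline-delimited JSON logs.
    - /var/log/app/*.json
  parsers:
    # Step 1: decode every line as a JSON object.
    - ndjson:
        target: ""
        message_key: msg
    # Step 2: combine a fixed number of decoded lines into one event.
    - multiline:
        type: count
        count_lines: 3
```

Swapping the two list entries would aggregate the raw lines first and only then attempt to decode the result as JSON.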
To configure this input, specify a list of glob-based paths
that must be crawled to locate and fetch the log lines.
Example configuration:
{beatname_lc}.inputs:
- type: filestream
  id: my-filestream-id
  paths:
    - /var/log/messages
    - /var/log/*.log
You can apply additional configuration settings (such as fields, include_lines, exclude_lines and so on) to the lines harvested from these files. The options that you specify are applied to all the files harvested by this input.
To apply different configuration settings to different files, you need to define multiple input sections:
{beatname_lc}.inputs:
- type: filestream <1>
  id: my-filestream-id
  paths:
    - /var/log/system.log
    - /var/log/wifi.log
- type: filestream <2>
  id: bar
  paths:
    - "/var/log/apache2/*"
  fields:
    apache: true
<1> Harvests lines from two files: system.log and wifi.log.
<2> Harvests lines from every file in the apache2 directory, and uses the fields configuration option to add a field called apache to the output.
WARNING: Filebeat does not support reading from network shares and cloud providers.
However, one of the limitations of these data sources can be mitigated if you configure Filebeat adequately.
By default, {beatname_uc} identifies files based on their inodes and device IDs. However, on network shares and cloud providers these values might change during the lifetime of the file. If this happens, {beatname_uc} thinks that the file is new and resends its whole content. To solve this problem, you can configure the file_identity option. Possible values besides the default inode_deviceid are path and inode_marker.
WARNING: Changing file_identity methods between runs may result in duplicated events in the output.
Selecting path instructs {beatname_uc} to identify files based on their paths. This is a quick way to avoid rereading files if inode and device IDs might change. However, keep in mind that if the files are rotated (renamed), they will be reread and resubmitted.
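A minimal sketch of selecting the path method, assuming {beatname_lc} resolves to filebeat and that the logs live under a hypothetical /logs mountpoint. The path method takes no further settings, so its value is left empty (~).

```yaml
filebeat.inputs:
- type: filestream
  id: my-filestream-id
  paths:
    # Hypothetical mountpoint on a network share.
    - /logs/*.log
  # Identify files by their path instead of inode and device ID.
  file_identity.path: ~
```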
The inode_marker option can be used if the inodes stay the same even if the device ID changes. If your files are rotated, choose this method instead of path where possible. You have to configure a marker file readable by {beatname_uc} and set its location in the path option of inode_marker.
The content of this file must be unique to the device. You can put the UUID of the device or mountpoint where the input is stored. Do not use this option on Windows, as file identifiers might be more volatile there. The following example one-liner generates a hidden marker file for the selected mountpoint /logs:
$ lsblk -o MOUNTPOINT,UUID | grep /logs | awk '{print $2}' >> /logs/.filebeat-marker
To set the generated file as a marker for file_identity, configure the input as follows:
{beatname_lc}.inputs:
- type: filestream
  id: my-filestream-id
  paths:
    - /logs/*.log
  file_identity.inode_marker.path: /logs/.filebeat-marker
When dealing with file rotation, avoid harvesting symlinks. Instead use the [filestream-input-paths] setting to point to the original file, and specify a pattern that matches the file you want to harvest and all of its rotated files. Also make sure your log rotation strategy prevents lost or duplicate messages. For more information, see [file-log-rotation].
Furthermore, to avoid duplicates of rotated log messages, do not use the path method for file_identity, or exclude the rotated files with the exclude_files option.
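The second alternative could look like the following sketch. It assumes {beatname_lc} resolves to filebeat, that exclude_files is set under prospector.scanner as in the filestream scanner options, and that rotated files carry a numeric or .gz suffix (for example app.log.1 or app.log.2.gz). The suffix patterns and the /logs path are hypothetical and must be adapted to the rotation scheme in use.

```yaml
filebeat.inputs:
- type: filestream
  id: my-filestream-id
  paths:
    # Hypothetical directory that also contains rotated files.
    - /logs/*
  file_identity.path: ~
  # Skip rotated files so their content is not read again under a new path.
  # Assumed naming scheme: a trailing number or a .gz extension.
  prospector.scanner.exclude_files: ['\.\d+$', '\.gz$']
```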