input-filestream.asciidoc

filestream input

filestream

Use the filestream input to read lines from active log files. It is the new, improved alternative to the log input. It comes with various improvements to the existing input:

  1. Checking of close_* options happens out of band. Thus, if an output is blocked, {beatname_uc} can close the reader and avoid keeping too many files open.

  2. Detailed metrics are available for all files that match the paths configuration regardless of the harvester_limit. This way, you can keep track of all files, even ones that are not actively read.

  3. The order of parsers is configurable. So it is possible to parse JSON lines and then aggregate the contents into a multiline event.

  4. Some position updates and metadata changes no longer depend on the publishing pipeline. If the pipeline is blocked, some changes are still applied to the registry.

  5. Only the most recent updates are serialized to the registry. In contrast, the log input has to serialize the complete registry on each ACK from the outputs. This makes the registry updates much quicker with this input.

  6. The input ensures that only offset updates are written to the registry's append-only log. The log input writes the complete file state.

  7. Stale entries can be removed from the registry, even if there is no active input.
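
As an illustration of item 3, the following sketch lists the ndjson parser before the multiline parser, so each line is decoded as JSON first and the decoded messages are then aggregated into one event. The path, the msg key, and the count of three lines are placeholder values chosen for this example, not recommended settings.

```yaml
{beatname_lc}.inputs:
- type: filestream
  id: my-filestream-id
  paths:
    - /var/log/app/*.log   # placeholder path
  parsers:
    # Parsers run in the order they are listed.
    - ndjson:
        target: ""
        message_key: msg   # assumed key that holds the log line
    - multiline:
        type: count
        count_lines: 3     # assumed: every event spans three lines
```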

To configure this input, specify a list of glob-based paths that must be crawled to locate and fetch the log lines.

Example configuration:

{beatname_lc}.inputs:
- type: filestream
  id: my-filestream-id
  paths:
    - /var/log/messages
    - /var/log/*.log

You can apply additional configuration settings (such as fields, include_lines, exclude_lines and so on) to the lines harvested from these files. The options that you specify are applied to all the files harvested by this input.
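
For instance, a minimal sketch that combines these options might look like the following. The regular expressions and the env field are illustrative assumptions, not values taken from a real deployment.

```yaml
{beatname_lc}.inputs:
- type: filestream
  id: my-filestream-id
  paths:
    - /var/log/*.log
  # Keep only lines that start with ERR or WARN (example patterns).
  include_lines: ['^ERR', '^WARN']
  # Drop lines that start with DBG (example pattern).
  exclude_lines: ['^DBG']
  # Add a custom field to every event from this input (hypothetical name and value).
  fields:
    env: staging
```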

To apply different configuration settings to different files, you need to define multiple input sections:

{beatname_lc}.inputs:
- type: filestream <1>
  id: my-filestream-id
  paths:
    - /var/log/system.log
    - /var/log/wifi.log
- type: filestream <2>
  id: bar
  paths:
    - "/var/log/apache2/*"
  fields:
    apache: true
  1. Harvests lines from two files: system.log and wifi.log.

  2. Harvests lines from every file in the apache2 directory, and uses the fields configuration option to add a field called apache to the output.

Reading files on network shares and cloud providers

Warning
Filebeat does not support reading from network shares and cloud providers.

However, one of the limitations of these data sources can be mitigated if you configure Filebeat appropriately.

By default, {beatname_uc} identifies files based on their inodes and device IDs. However, on network shares and cloud providers these values might change during the lifetime of the file. If this happens, {beatname_uc} thinks that the file is new and resends its whole content. To solve this problem, you can configure the file_identity option. Possible values besides the default inode_deviceid are path and inode_marker.

Warning
Changing file_identity methods between runs may result in duplicated events in the output.

Selecting path instructs {beatname_uc} to identify files based on their paths. This is a quick way to avoid rereading files if inode and device IDs might change. However, keep in mind that if the files are rotated (renamed), they will be reread and resubmitted.
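
A minimal sketch of an input that uses path-based identity, assuming log files under /logs that are not renamed by rotation:

```yaml
{beatname_lc}.inputs:
- type: filestream
  id: my-filestream-id
  paths:
    - /logs/*.log   # assumed location; files here must keep their names
  # Identify files by path instead of inode and device ID.
  file_identity.path: ~
```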

The option inode_marker can be used if the inodes stay the same even if the device ID changes. If your files are rotated, prefer this method over path when possible. You have to create a marker file that is readable by {beatname_uc} and set its location in the path option of inode_marker.

The content of this file must be unique to the device. You can put the UUID of the device or mountpoint where the input is stored. Do not use this option on Windows, where file identifiers might be more volatile. The following example one-liner generates a hidden marker file for the selected mountpoint /logs:

$ lsblk -o MOUNTPOINT,UUID | grep /logs | awk '{print $2}' >> /logs/.filebeat-marker

To use the generated file as the marker for file_identity, configure the input as follows:

{beatname_lc}.inputs:
- type: filestream
  id: my-filestream-id
  paths:
    - /logs/*.log
  file_identity.inode_marker.path: /logs/.filebeat-marker

Reading from rotating logs

When dealing with file rotation, avoid harvesting symlinks. Instead use the [filestream-input-paths] setting to point to the original file, and specify a pattern that matches the file you want to harvest and all of its rotated files. Also make sure your log rotation strategy prevents lost or duplicate messages. For more information, see [file-log-rotation].

Furthermore, to avoid duplicating rotated log messages, do not use the path method for file_identity, or exclude the rotated files with the exclude_files option.
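
If path-based identity cannot be avoided, a sketch of excluding rotated files could look like the following. It assumes that rotated files receive a numeric or .gz suffix (for example app.log.1 or app.log.2.gz) and that the option is set under prospector.scanner, as for other filestream scanner settings. Adjust the patterns to your rotation scheme.

```yaml
{beatname_lc}.inputs:
- type: filestream
  id: my-filestream-id
  paths:
    - /var/log/app/app.log*   # placeholder path
  file_identity.path: ~
  # Skip rotated copies so their content is not read a second time.
  # The suffix patterns below are assumptions about the rotation scheme.
  prospector.scanner.exclude_files: ['\.\d+$', '\.gz$']
```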