From 2f079f94462ff6aa93fd95989ee8ea89e61a526a Mon Sep 17 00:00:00 2001 From: Jimmy Kim Date: Mon, 17 Jun 2024 07:02:41 -0700 Subject: [PATCH] add line number option for filelogreceiver (#33530) **Description:** Adding an option to include line numbers as a record attribute to the filelogreceiver. **Testing:** Add unit tests **Documentation:** Add documentation on filelogreceiver for the new file line number option --------- Co-authored-by: Daniel Jaglowski --- .chloggen/add_include_file_record_number.yaml | 27 +++++++ pkg/stanza/docs/operators/file_input.md | 55 ++++++------- pkg/stanza/fileconsumer/attrs/attrs.go | 1 + pkg/stanza/fileconsumer/attrs/attrs_test.go | 4 +- pkg/stanza/fileconsumer/config.go | 60 +++++++------- pkg/stanza/fileconsumer/config_test.go | 1 + .../fileconsumer/internal/reader/factory.go | 50 ++++++------ .../fileconsumer/internal/reader/reader.go | 8 ++ pkg/stanza/operator/input/file/input_test.go | 45 +++++++++++ receiver/filelogreceiver/README.md | 81 ++++++++++--------- 10 files changed, 210 insertions(+), 122 deletions(-) create mode 100644 .chloggen/add_include_file_record_number.yaml diff --git a/.chloggen/add_include_file_record_number.yaml b/.chloggen/add_include_file_record_number.yaml new file mode 100644 index 000000000000..e990cb09037c --- /dev/null +++ b/.chloggen/add_include_file_record_number.yaml @@ -0,0 +1,27 @@ +# Use this changelog template to create an entry for release notes. + +# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix' +change_type: enhancement + +# The name of the component, or a single word describing the area of concern, (e.g. filelogreceiver) +component: filelogreceiver + +# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`). +note: If include_file_record_number is true, it will add the file record number as the attribute `log.file.record_number` + +# Mandatory: One or more tracking issues related to the change. You can use the PR number here if no issue exists. +issues: [33530] + +# (Optional) One or more lines of additional information to render under the primary note. +# These lines will be padded with 2 spaces and then inserted directly into the document. +# Use pipe (|) for multiline entries. +subtext: + +# If your change doesn't affect end users or the exported elements of any package, +# you should instead start your pull request title with [chore] or use the "Skip Changelog" label. +# Optional: The change log or logs in which this entry should be included. +# e.g. '[user]' or '[user, api]' +# Include 'user' if the change is relevant to end users. +# Include 'api' if there is a change to a library API. +# Default: '[user]' +change_logs: [user] diff --git a/pkg/stanza/docs/operators/file_input.md b/pkg/stanza/docs/operators/file_input.md index fadfe71852d6..efd77ec34504 100644 --- a/pkg/stanza/docs/operators/file_input.md +++ b/pkg/stanza/docs/operators/file_input.md @@ -4,35 +4,36 @@ The `file_input` operator reads logs from files. It will place the lines read in ### Configuration Fields -| Field | Default | Description | -| --- | --- | --- | -| `id` | `file_input` | A unique identifier for the operator. | -| `output` | Next in pipeline | The connected operator(s) that will receive all outbound entries. | -| `include` | required | A list of file glob patterns that match the file paths to be read. | -| `exclude` | [] | A list of file glob patterns to exclude from reading. 
|
-| `poll_interval` | 200ms | The duration between filesystem polls. |
-| `multiline` | | A `multiline` configuration block. See below for details. |
-| `force_flush_period` | `500ms` | Time since last read of data from file, after which currently buffered log should be send to pipeline. Takes `time.Time` as value. Zero means waiting for new data forever. |
-| `encoding` | `utf-8` | The encoding of the file being read. See the list of supported encodings below for available options. |
-| `include_file_name` | `true` | Whether to add the file name as the attribute `log.file.name`. |
-| `include_file_path` | `false` | Whether to add the file path as the attribute `log.file.path`. |
-| `include_file_name_resolved` | `false` | Whether to add the file name after symlinks resolution as the attribute `log.file.name_resolved`. |
-| `include_file_path_resolved` | `false` | Whether to add the file path after symlinks resolution as the attribute `log.file.path_resolved`. |
-| `include_file_owner_name` | `false` | Whether to add the file owner name as the attribute `log.file.owner.name`. Not supported for windows. |
-| `include_file_owner_group_name` | `false` | Whether to add the file group name as the attribute `log.file.owner.group.name`. Not supported for windows. |
+| Field                           | Default | Description |
+|---------------------------------| --- |--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| `id`                            | `file_input` | A unique identifier for the operator. |
+| `output`                        | Next in pipeline | The connected operator(s) that will receive all outbound entries. |
+| `include`                       | required | A list of file glob patterns that match the file paths to be read. |
+| `exclude`                       | [] | A list of file glob patterns to exclude from reading. |
+| `poll_interval`                 | 200ms | The duration between filesystem polls. |
+| `multiline`                     | | A `multiline` configuration block. See below for details. |
+| `force_flush_period`            | `500ms` | Time since last read of data from the file, after which the currently buffered log should be sent to the pipeline. Takes `time.Duration` as value. Zero means waiting for new data forever. |
+| `encoding`                      | `utf-8` | The encoding of the file being read. See the list of supported encodings below for available options. |
+| `include_file_name`             | `true` | Whether to add the file name as the attribute `log.file.name`. |
+| `include_file_path`             | `false` | Whether to add the file path as the attribute `log.file.path`. |
+| `include_file_name_resolved`    | `false` | Whether to add the file name after symlinks resolution as the attribute `log.file.name_resolved`. |
+| `include_file_path_resolved`    | `false` | Whether to add the file path after symlinks resolution as the attribute `log.file.path_resolved`. |
+| `include_file_owner_name`       | `false` | Whether to add the file owner name as the attribute `log.file.owner.name`. Not supported for windows. |
+| `include_file_owner_group_name` | `false` | Whether to add the file group name as the attribute `log.file.owner.group.name`. Not supported for windows. |
+| `include_file_record_number`    | `false` | Whether to add the record's number within its file as the attribute `log.file.record_number`. |
 | `preserve_leading_whitespaces` | `false` | Whether to preserve leading whitespaces. |
-| `preserve_trailing_whitespaces` | `false` | Whether to preserve trailing whitespaces. 
|
-| `start_at` | `end` | At startup, where to start reading logs from the file. Options are `beginning` or `end`. This setting will be ignored if previously read file offsets are retrieved from a persistence mechanism. |
+| `preserve_trailing_whitespaces` | `false` | Whether to preserve trailing whitespaces. |
+| `start_at`                      | `end` | At startup, where to start reading logs from the file. Options are `beginning` or `end`. This setting will be ignored if previously read file offsets are retrieved from a persistence mechanism. |
 | `fingerprint_size` | `1kb` | The number of bytes with which to identify a file. The first bytes in the file are used as the fingerprint. Decreasing this value at any point will cause existing fingerprints to forgotten, meaning that all files will be read from the beginning (one time). |
-| `max_log_size` | `1MiB` | The maximum size of a log entry to read before failing. Protects against reading large amounts of data into memory |.
-| `max_concurrent_files` | 1024 | The maximum number of log files from which logs will be read concurrently (minimum = 2). If the number of files matched in the `include` pattern exceeds half of this number, then files will be processed in batches. |
-| `max_batches` | 0 | Only applicable when files must be batched in order to respect `max_concurrent_files`. This value limits the number of batches that will be processed during a single poll interval. A value of 0 indicates no limit. |
-| `delete_after_read` | `false` | If `true`, each log file will be read and then immediately deleted. Requires that the `filelog.allowFileDeletion` feature gate is enabled. |
-| `attributes` | {} | A map of `key: value` pairs to add to the entry's attributes. |
-| `resource` | {} | A map of `key: value` pairs to add to the entry's resource. |
-| `header` | nil | Specifies options for parsing header metadata. Requires that the `filelog.allowHeaderMetadataParsing` feature gate is enabled. See below for details. |
-| `header.pattern` | required for header metadata parsing | A regex that matches every header line. |
-| `header.metadata_operators` | required for header metadata parsing | A list of operators used to parse metadata from the header. |
+| `max_log_size`                  | `1MiB` | The maximum size of a log entry to read before failing. Protects against reading large amounts of data into memory. |
+| `max_concurrent_files`          | 1024 | The maximum number of log files from which logs will be read concurrently (minimum = 2). If the number of files matched in the `include` pattern exceeds half of this number, then files will be processed in batches. |
+| `max_batches`                   | 0 | Only applicable when files must be batched in order to respect `max_concurrent_files`. This value limits the number of batches that will be processed during a single poll interval. A value of 0 indicates no limit. |
+| `delete_after_read`             | `false` | If `true`, each log file will be read and then immediately deleted. Requires that the `filelog.allowFileDeletion` feature gate is enabled. |
+| `attributes`                    | {} | A map of `key: value` pairs to add to the entry's attributes. |
+| `resource`                      | {} | A map of `key: value` pairs to add to the entry's resource. |
+| `header`                        | nil | Specifies options for parsing header metadata. Requires that the `filelog.allowHeaderMetadataParsing` feature gate is enabled. See below for details. |
+| `header.pattern`                | required for header metadata parsing | A regex that matches every header line. 
| +| `header.metadata_operators` | required for header metadata parsing | A list of operators used to parse metadata from the header. | Note that by default, no logs will be read unless the monitored file is actively being written to because `start_at` defaults to `end`. diff --git a/pkg/stanza/fileconsumer/attrs/attrs.go b/pkg/stanza/fileconsumer/attrs/attrs.go index 0b174a97a812..20a96a158f60 100644 --- a/pkg/stanza/fileconsumer/attrs/attrs.go +++ b/pkg/stanza/fileconsumer/attrs/attrs.go @@ -17,6 +17,7 @@ const ( LogFilePathResolved = "log.file.path_resolved" LogFileOwnerName = "log.file.owner.name" LogFileOwnerGroupName = "log.file.owner.group.name" + LogFileRecordNumber = "log.file.record_number" ) type Resolver struct { diff --git a/pkg/stanza/fileconsumer/attrs/attrs_test.go b/pkg/stanza/fileconsumer/attrs/attrs_test.go index c93cdef40d6e..b714975d460b 100644 --- a/pkg/stanza/fileconsumer/attrs/attrs_test.go +++ b/pkg/stanza/fileconsumer/attrs/attrs_test.go @@ -19,7 +19,7 @@ func TestResolver(t *testing.T) { for i := 0; i < 64; i++ { - // Create a 4 bit string where each bit represents the value of a config option + // Create a 6 bit string where each bit represents the value of a config option bitString := fmt.Sprintf("%06b", i) // Create a resolver with a config that matches the bit pattern of i @@ -54,7 +54,7 @@ func TestResolver(t *testing.T) { assert.Empty(t, attributes[LogFilePath]) } - // We don't have an independent way to resolve the path, so the only meangingful validate + // We don't have an independent way to resolve the path, so the only meaningful validate // is to ensure that the resolver returns nothing vs something based on the config. if r.IncludeFileNameResolved { expectLen++ diff --git a/pkg/stanza/fileconsumer/config.go b/pkg/stanza/fileconsumer/config.go index d09c82dfeaa8..6c7db8828a26 100644 --- a/pkg/stanza/fileconsumer/config.go +++ b/pkg/stanza/fileconsumer/config.go @@ -71,21 +71,22 @@ func NewConfig() *Config { // Config is the configuration of a file input operator type Config struct { - matcher.Criteria `mapstructure:",squash"` - attrs.Resolver `mapstructure:",squash"` - PollInterval time.Duration `mapstructure:"poll_interval,omitempty"` - MaxConcurrentFiles int `mapstructure:"max_concurrent_files,omitempty"` - MaxBatches int `mapstructure:"max_batches,omitempty"` - StartAt string `mapstructure:"start_at,omitempty"` - FingerprintSize helper.ByteSize `mapstructure:"fingerprint_size,omitempty"` - MaxLogSize helper.ByteSize `mapstructure:"max_log_size,omitempty"` - Encoding string `mapstructure:"encoding,omitempty"` - SplitConfig split.Config `mapstructure:"multiline,omitempty"` - TrimConfig trim.Config `mapstructure:",squash,omitempty"` - FlushPeriod time.Duration `mapstructure:"force_flush_period,omitempty"` - Header *HeaderConfig `mapstructure:"header,omitempty"` - DeleteAfterRead bool `mapstructure:"delete_after_read,omitempty"` - Compression string `mapstructure:"compression,omitempty"` + matcher.Criteria `mapstructure:",squash"` + attrs.Resolver `mapstructure:",squash"` + PollInterval time.Duration `mapstructure:"poll_interval,omitempty"` + MaxConcurrentFiles int `mapstructure:"max_concurrent_files,omitempty"` + MaxBatches int `mapstructure:"max_batches,omitempty"` + StartAt string `mapstructure:"start_at,omitempty"` + FingerprintSize helper.ByteSize `mapstructure:"fingerprint_size,omitempty"` + MaxLogSize helper.ByteSize `mapstructure:"max_log_size,omitempty"` + Encoding string `mapstructure:"encoding,omitempty"` + SplitConfig split.Config 
`mapstructure:"multiline,omitempty"` + TrimConfig trim.Config `mapstructure:",squash,omitempty"` + FlushPeriod time.Duration `mapstructure:"force_flush_period,omitempty"` + Header *HeaderConfig `mapstructure:"header,omitempty"` + DeleteAfterRead bool `mapstructure:"delete_after_read,omitempty"` + IncludeFileRecordNumber bool `mapstructure:"include_file_record_number,omitempty"` + Compression string `mapstructure:"compression,omitempty"` } type HeaderConfig struct { @@ -154,20 +155,21 @@ func (c Config) Build(set component.TelemetrySettings, emit emit.Callback, opts set.Logger = set.Logger.With(zap.String("component", "fileconsumer")) readerFactory := reader.Factory{ - TelemetrySettings: set, - FromBeginning: startAtBeginning, - FingerprintSize: int(c.FingerprintSize), - InitialBufferSize: scanner.DefaultBufferSize, - MaxLogSize: int(c.MaxLogSize), - Encoding: enc, - SplitFunc: splitFunc, - TrimFunc: trimFunc, - FlushTimeout: c.FlushPeriod, - EmitFunc: emit, - Attributes: c.Resolver, - HeaderConfig: hCfg, - DeleteAtEOF: c.DeleteAfterRead, - Compression: c.Compression, + TelemetrySettings: set, + FromBeginning: startAtBeginning, + FingerprintSize: int(c.FingerprintSize), + InitialBufferSize: scanner.DefaultBufferSize, + MaxLogSize: int(c.MaxLogSize), + Encoding: enc, + SplitFunc: splitFunc, + TrimFunc: trimFunc, + FlushTimeout: c.FlushPeriod, + EmitFunc: emit, + Attributes: c.Resolver, + HeaderConfig: hCfg, + DeleteAtEOF: c.DeleteAfterRead, + IncludeFileRecordNumber: c.IncludeFileRecordNumber, + Compression: c.Compression, } var t tracker.Tracker diff --git a/pkg/stanza/fileconsumer/config_test.go b/pkg/stanza/fileconsumer/config_test.go index 8635cf556631..340d7f7f5ee0 100644 --- a/pkg/stanza/fileconsumer/config_test.go +++ b/pkg/stanza/fileconsumer/config_test.go @@ -40,6 +40,7 @@ func TestNewConfig(t *testing.T) { assert.False(t, cfg.IncludeFilePathResolved) assert.False(t, cfg.IncludeFileOwnerName) assert.False(t, cfg.IncludeFileOwnerGroupName) + assert.False(t, cfg.IncludeFileRecordNumber) } func TestUnmarshal(t *testing.T) { diff --git a/pkg/stanza/fileconsumer/internal/reader/factory.go b/pkg/stanza/fileconsumer/internal/reader/factory.go index 370b07d9e816..646aebae3be6 100644 --- a/pkg/stanza/fileconsumer/internal/reader/factory.go +++ b/pkg/stanza/fileconsumer/internal/reader/factory.go @@ -30,19 +30,20 @@ const ( type Factory struct { component.TelemetrySettings - HeaderConfig *header.Config - FromBeginning bool - FingerprintSize int - InitialBufferSize int - MaxLogSize int - Encoding encoding.Encoding - SplitFunc bufio.SplitFunc - TrimFunc trim.Func - FlushTimeout time.Duration - EmitFunc emit.Callback - Attributes attrs.Resolver - DeleteAtEOF bool - Compression string + HeaderConfig *header.Config + FromBeginning bool + FingerprintSize int + InitialBufferSize int + MaxLogSize int + Encoding encoding.Encoding + SplitFunc bufio.SplitFunc + TrimFunc trim.Func + FlushTimeout time.Duration + EmitFunc emit.Callback + Attributes attrs.Resolver + DeleteAtEOF bool + IncludeFileRecordNumber bool + Compression string } func (f *Factory) NewFingerprint(file *os.File) (*fingerprint.Fingerprint, error) { @@ -64,17 +65,18 @@ func (f *Factory) NewReader(file *os.File, fp *fingerprint.Fingerprint) (*Reader func (f *Factory) NewReaderFromMetadata(file *os.File, m *Metadata) (r *Reader, err error) { r = &Reader{ - Metadata: m, - set: f.TelemetrySettings, - file: file, - fileName: file.Name(), - fingerprintSize: f.FingerprintSize, - initialBufferSize: f.InitialBufferSize, - maxLogSize: 
f.MaxLogSize, - decoder: decode.New(f.Encoding), - lineSplitFunc: f.SplitFunc, - deleteAtEOF: f.DeleteAtEOF, - compression: f.Compression, + Metadata: m, + set: f.TelemetrySettings, + file: file, + fileName: file.Name(), + fingerprintSize: f.FingerprintSize, + initialBufferSize: f.InitialBufferSize, + maxLogSize: f.MaxLogSize, + decoder: decode.New(f.Encoding), + lineSplitFunc: f.SplitFunc, + deleteAtEOF: f.DeleteAtEOF, + includeFileRecordNum: f.IncludeFileRecordNumber, + compression: f.Compression, } r.set.Logger = r.set.Logger.With(zap.String("path", r.fileName)) diff --git a/pkg/stanza/fileconsumer/internal/reader/reader.go b/pkg/stanza/fileconsumer/internal/reader/reader.go index 2a1416af70ba..df4c03498e92 100644 --- a/pkg/stanza/fileconsumer/internal/reader/reader.go +++ b/pkg/stanza/fileconsumer/internal/reader/reader.go @@ -15,6 +15,7 @@ import ( "go.uber.org/zap" "github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/decode" + "github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/fileconsumer/attrs" "github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/fileconsumer/emit" "github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/fileconsumer/internal/fingerprint" "github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/fileconsumer/internal/header" @@ -25,6 +26,7 @@ import ( type Metadata struct { Fingerprint *fingerprint.Fingerprint Offset int64 + RecordNum int64 FileAttributes map[string]any HeaderFinalized bool FlushState *flush.State @@ -48,6 +50,7 @@ type Reader struct { emitFunc emit.Callback deleteAtEOF bool needsUpdateFingerprint bool + includeFileRecordNum bool compression string } @@ -122,6 +125,11 @@ func (r *Reader) ReadToEnd(ctx context.Context) { continue } + if r.includeFileRecordNum { + r.RecordNum++ + r.FileAttributes[attrs.LogFileRecordNumber] = r.RecordNum + } + err = r.processFunc(ctx, token, r.FileAttributes) if err == nil { r.Offset = s.Pos() // successful emit, update offset diff --git a/pkg/stanza/operator/input/file/input_test.go b/pkg/stanza/operator/input/file/input_test.go index 0fcbeab36f33..2063e01e7f26 100644 --- a/pkg/stanza/operator/input/file/input_test.go +++ b/pkg/stanza/operator/input/file/input_test.go @@ -71,6 +71,51 @@ func TestAddFileResolvedFields(t *testing.T) { } } +// AddFileRecordNumber tests that the `log.file.record_number` is correctly included +// when IncludeFileRecordNumber is set to true +func TestAddFileRecordNumber(t *testing.T) { + t.Parallel() + operator, logReceived, tempDir := newTestFileOperator(t, func(cfg *Config) { + cfg.IncludeFileRecordNumber = true + }) + + // Create a file, then start + temp := openTemp(t, tempDir) + writeString(t, temp, "testlog1\ntestlog2\ntestlog3\n") + + require.NoError(t, operator.Start(testutil.NewUnscopedMockPersister())) + defer func() { + require.NoError(t, operator.Stop()) + }() + + e := waitForOne(t, logReceived) + require.Equal(t, "testlog1", e.Body) + require.Equal(t, int64(1), e.Attributes["log.file.record_number"]) + + e = waitForOne(t, logReceived) + require.Equal(t, "testlog2", e.Body) + require.Equal(t, int64(2), e.Attributes["log.file.record_number"]) + + e = waitForOne(t, logReceived) + require.Equal(t, "testlog3", e.Body) + require.Equal(t, int64(3), e.Attributes["log.file.record_number"]) + + // Write 3 more entries + writeString(t, temp, "testlog4\ntestlog5\ntestlog6\n") + + e = waitForOne(t, logReceived) + require.Equal(t, "testlog4", e.Body) + require.Equal(t, int64(4), 
e.Attributes["log.file.record_number"]) + + e = waitForOne(t, logReceived) + require.Equal(t, "testlog5", e.Body) + require.Equal(t, int64(5), e.Attributes["log.file.record_number"]) + + e = waitForOne(t, logReceived) + require.Equal(t, "testlog6", e.Body) + require.Equal(t, int64(6), e.Attributes["log.file.record_number"]) +} + // ReadExistingLogs tests that, when starting from beginning, we // read all the lines that are already there func TestReadExistingLogs(t *testing.T) { diff --git a/receiver/filelogreceiver/README.md b/receiver/filelogreceiver/README.md index 61dd6175ff3c..5edb62ba96bc 100644 --- a/receiver/filelogreceiver/README.md +++ b/receiver/filelogreceiver/README.md @@ -16,47 +16,48 @@ Tails and parses logs from files. ## Configuration -| Field | Default | Description | -|---------------------------------------|--------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `include` | required | A list of file glob patterns that match the file paths to be read. | -| `exclude` | [] | A list of file glob patterns to exclude from reading. This is applied against the paths matched by `include`. | -| `exclude_older_than` | | Exclude files whose modification time is older than the specified [age](#time-parameters). | -| `start_at` | `end` | At startup, where to start reading logs from the file. Options are `beginning` or `end`. | -| `multiline` | | A `multiline` configuration block. See [below](#multiline-configuration) for more details. | -| `force_flush_period` | `500ms` | [Time](#time-parameters) since last time new data was found in the file, after which a partial log at the end of the file may be emitted. | -| `encoding` | `utf-8` | The encoding of the file being read. See the list of [supported encodings below](#supported-encodings) for available options. | -| `preserve_leading_whitespaces` | `false` | Whether to preserve leading whitespaces. | -| `preserve_trailing_whitespaces` | `false` | Whether to preserve trailing whitespaces. | -| `include_file_name` | `true` | Whether to add the file name as the attribute `log.file.name`. | -| `include_file_path` | `false` | Whether to add the file path as the attribute `log.file.path`. | -| `include_file_name_resolved` | `false` | Whether to add the file name after symlinks resolution as the attribute `log.file.name_resolved`. | -| `include_file_path_resolved` | `false` | Whether to add the file path after symlinks resolution as the attribute `log.file.path_resolved`. | -| `include_file_owner_name` | `false` | Whether to add the file owner name as the attribute `log.file.owner.name`. Not supported for windows. | -| `include_file_owner_group_name` | `false` | Whether to add the file group name as the attribute `log.file.owner.group.name`. Not supported for windows. | -| `poll_interval` | 200ms | The [duration](#time-parameters) between filesystem polls. 
| +| Field | Default | Description | +|---------------------------------------|--------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| `include` | required | A list of file glob patterns that match the file paths to be read. | +| `exclude` | [] | A list of file glob patterns to exclude from reading. This is applied against the paths matched by `include`. | +| `exclude_older_than` | | Exclude files whose modification time is older than the specified [age](#time-parameters). | +| `start_at` | `end` | At startup, where to start reading logs from the file. Options are `beginning` or `end`. | +| `multiline` | | A `multiline` configuration block. See [below](#multiline-configuration) for more details. | +| `force_flush_period` | `500ms` | [Time](#time-parameters) since last time new data was found in the file, after which a partial log at the end of the file may be emitted. | +| `encoding` | `utf-8` | The encoding of the file being read. See the list of [supported encodings below](#supported-encodings) for available options. | +| `preserve_leading_whitespaces` | `false` | Whether to preserve leading whitespaces. | +| `preserve_trailing_whitespaces` | `false` | Whether to preserve trailing whitespaces. | +| `include_file_name` | `true` | Whether to add the file name as the attribute `log.file.name`. | +| `include_file_path` | `false` | Whether to add the file path as the attribute `log.file.path`. | +| `include_file_name_resolved` | `false` | Whether to add the file name after symlinks resolution as the attribute `log.file.name_resolved`. | +| `include_file_path_resolved` | `false` | Whether to add the file path after symlinks resolution as the attribute `log.file.path_resolved`. | +| `include_file_owner_name` | `false` | Whether to add the file owner name as the attribute `log.file.owner.name`. Not supported for windows. | +| `include_file_owner_group_name` | `false` | Whether to add the file group name as the attribute `log.file.owner.group.name`. Not supported for windows. | +| `include_file_record_number` | `false` | Whether to add the record number in the file as the attribute `log.file.record_number`. | +| `poll_interval` | 200ms | The [duration](#time-parameters) between filesystem polls. | | `fingerprint_size` | `1kb` | The number of bytes with which to identify a file. The first bytes in the file are used as the fingerprint. Decreasing this value at any point will cause existing fingerprints to forgotten, meaning that all files will be read from the beginning (one time) | -| `max_log_size` | `1MiB` | The maximum size of a log entry to read. A log entry will be truncated if it is larger than `max_log_size`. Protects against reading large amounts of data into memory. | -| `max_concurrent_files` | 1024 | The maximum number of log files from which logs will be read concurrently. If the number of files matched in the `include` pattern exceeds this number, then files will be processed in batches. | -| `max_batches` | 0 | Only applicable when files must be batched in order to respect `max_concurrent_files`. This value limits the number of batches that will be processed during a single poll interval. A value of 0 indicates no limit. | -| `delete_after_read` | `false` | If `true`, each log file will be read and then immediately deleted. 
Requires that the `filelog.allowFileDeletion` feature gate is enabled. Must be `false` when `start_at` is set to `end`. | -| `attributes` | {} | A map of `key: value` pairs to add to the entry's attributes. | -| `resource` | {} | A map of `key: value` pairs to add to the entry's resource. | -| `operators` | [] | An array of [operators](../../pkg/stanza/docs/operators/README.md#what-operators-are-available). See below for more details. | -| `storage` | none | The ID of a storage extension to be used to store file offsets. File offsets allow the receiver to pick up where it left off in the case of a collector restart. If no storage extension is used, the receiver will manage offsets in memory only. | -| `header` | nil | Specifies options for parsing header metadata. Requires that the `filelog.allowHeaderMetadataParsing` feature gate is enabled. See below for details. Must be `false` when `start_at` is set to `end`. | -| `header.pattern` | required for header metadata parsing | A regex that matches every header line. | -| `header.metadata_operators` | required for header metadata parsing | A list of operators used to parse metadata from the header. | -| `retry_on_failure.enabled` | `false` | If `true`, the receiver will pause reading a file and attempt to resend the current batch of logs if it encounters an error from downstream components. | -| `retry_on_failure.initial_interval` | `1s` | [Time](#time-parameters) to wait after the first failure before retrying. | -| `retry_on_failure.max_interval` | `30s` | Upper bound on retry backoff [interval](#time-parameters). Once this value is reached the delay between consecutive retries will remain constant at the specified value. | -| `retry_on_failure.max_elapsed_time` | `5m` | Maximum amount of [time](#time-parameters) (including retries) spent trying to send a logs batch to a downstream consumer. Once this value is reached, the data is discarded. Retrying never stops if set to `0`. -| `ordering_criteria.regex` | | Regular expression used for sorting, should contain a named capture groups that are to be used in `regex_key`. | -| `ordering_criteria.top_n` | 1 | The number of files to track when using file ordering. The top N files are tracked after applying the ordering criteria. | -| `ordering_criteria.sort_by.sort_type` | | Type of sorting to be performed (e.g., `numeric`, `alphabetical`, `timestamp`, `mtime`) | -| `ordering_criteria.sort_by.location` | | Relevant if `sort_type` is set to `timestamp`. Defines the location of the timestamp of the file. | -| `ordering_criteria.sort_by.format` | | Relevant if `sort_type` is set to `timestamp`. Defines the strptime format of the timestamp being sorted. | -| `ordering_criteria.sort_by.ascending` | | Sort direction | -| `compression` | | Indicate the compression format of input files. If set accordingly, files will be read using a reader that uncompresses the file before scanning its content. Options are `` or `gzip` | +| `max_log_size` | `1MiB` | The maximum size of a log entry to read. A log entry will be truncated if it is larger than `max_log_size`. Protects against reading large amounts of data into memory. | +| `max_concurrent_files` | 1024 | The maximum number of log files from which logs will be read concurrently. If the number of files matched in the `include` pattern exceeds this number, then files will be processed in batches. | +| `max_batches` | 0 | Only applicable when files must be batched in order to respect `max_concurrent_files`. 
This value limits the number of batches that will be processed during a single poll interval. A value of 0 indicates no limit. |
+| `delete_after_read`                   | `false`                              | If `true`, each log file will be read and then immediately deleted. Requires that the `filelog.allowFileDeletion` feature gate is enabled. Must be `false` when `start_at` is set to `end`. |
+| `attributes`                          | {}                                   | A map of `key: value` pairs to add to the entry's attributes. |
+| `resource`                            | {}                                   | A map of `key: value` pairs to add to the entry's resource. |
+| `operators`                           | []                                   | An array of [operators](../../pkg/stanza/docs/operators/README.md#what-operators-are-available). See below for more details. |
+| `storage`                             | none                                 | The ID of a storage extension to be used to store file offsets. File offsets allow the receiver to pick up where it left off in the case of a collector restart. If no storage extension is used, the receiver will manage offsets in memory only. |
+| `header`                              | nil                                  | Specifies options for parsing header metadata. Requires that the `filelog.allowHeaderMetadataParsing` feature gate is enabled. See below for details. Must be `false` when `start_at` is set to `end`. |
+| `header.pattern`                      | required for header metadata parsing | A regex that matches every header line. |
+| `header.metadata_operators`           | required for header metadata parsing | A list of operators used to parse metadata from the header. |
+| `retry_on_failure.enabled`            | `false`                              | If `true`, the receiver will pause reading a file and attempt to resend the current batch of logs if it encounters an error from downstream components. |
+| `retry_on_failure.initial_interval`   | `1s`                                 | [Time](#time-parameters) to wait after the first failure before retrying. |
+| `retry_on_failure.max_interval`       | `30s`                                | Upper bound on retry backoff [interval](#time-parameters). Once this value is reached the delay between consecutive retries will remain constant at the specified value. |
+| `retry_on_failure.max_elapsed_time`   | `5m`                                 | Maximum amount of [time](#time-parameters) (including retries) spent trying to send a logs batch to a downstream consumer. Once this value is reached, the data is discarded. Retrying never stops if set to `0`. |
+| `ordering_criteria.regex`             |                                      | Regular expression used for sorting, should contain named capture groups that are to be used in `regex_key`. |
+| `ordering_criteria.top_n`             | 1                                    | The number of files to track when using file ordering. The top N files are tracked after applying the ordering criteria. |
+| `ordering_criteria.sort_by.sort_type` |                                      | Type of sorting to be performed (e.g., `numeric`, `alphabetical`, `timestamp`, `mtime`). |
+| `ordering_criteria.sort_by.location`  |                                      | Relevant if `sort_type` is set to `timestamp`. Defines the location of the timestamp of the file. |
+| `ordering_criteria.sort_by.format`    |                                      | Relevant if `sort_type` is set to `timestamp`. Defines the strptime format of the timestamp being sorted. |
+| `ordering_criteria.sort_by.ascending` |                                      | Sort direction. |
+| `compression`                         |                                      | Indicate the compression format of input files. If set accordingly, files will be read using a reader that uncompresses the file before scanning its content. Options are `` or `gzip`. |
 
 Note that _by default_, no logs will be read from a file that is not actively being written to because `start_at` defaults to `end`.
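
As a usage sketch for the option this change introduces (illustrative only, not part of the diff above; it assumes a collector build that includes this change, the stock `debug` exporter, and a hypothetical log path):

```yaml
receivers:
  filelog:
    include: [ /var/log/myapp/*.log ]  # hypothetical path, adjust as needed
    start_at: beginning
    # New option from this patch: number each record within its source file.
    include_file_record_number: true

exporters:
  debug:
    verbosity: detailed

service:
  pipelines:
    logs:
      receivers: [filelog]
      exporters: [debug]
```

With `include_file_record_number: true`, each emitted record carries a `log.file.record_number` attribute that starts at `1` for the first record read from a file and increments by one per record, exactly as exercised by `TestAddFileRecordNumber` above.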