[receiver/filelog] Add docs for offset tracking (#30914)
**Description:**

This PR adds documentation notes on how to achieve fault tolerance for the `filelog` receiver's offset tracking.
The need for this is obvious but was also explained at
#20552 (comment).


---------

Signed-off-by: ChrsMark <chrismarkou92@gmail.com>
ChrsMark authored Feb 1, 2024
1 parent e0a18f0 commit 7319c8f
Showing 4 changed files with 102 additions and 1 deletion.
9 changes: 9 additions & 0 deletions examples/fault-tolerant-logs-collection/README.md
@@ -0,0 +1,9 @@
## Fault tolerant log collection example

The Filelog receiver can be made fault tolerant by using the following extensions:
- the [filestorage](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/extension/storage/filestorage) extension,
  to ensure that Collector restarts do not affect log collection and offset tracking.
- the [exporterhelper persistent-queue](https://github.com/open-telemetry/opentelemetry-collector/blob/main/exporter/exporterhelper/README.md#persistent-queue),
  to ensure that Collector restarts do not affect the delivery of already collected logs.

A full configuration example is provided in the [example config](./otel-col-config.yaml).
24 changes: 24 additions & 0 deletions examples/fault-tolerant-logs-collection/otel-col-config.yaml
@@ -0,0 +1,24 @@
receivers:
  filelog:
    include: [/var/log/busybox/simple.log]
    storage: file_storage/filelogreceiver

extensions:
  file_storage/filelogreceiver:
    directory: /var/lib/otelcol/file_storage/receiver
  file_storage/otlpoutput:
    directory: /var/lib/otelcol/file_storage/output

service:
  extensions: [file_storage/filelogreceiver, file_storage/otlpoutput]
  pipelines:
    logs:
      receivers: [filelog]
      exporters: [otlp/custom]
      processors: []

exporters:
  otlp/custom:
    endpoint: http://0.0.0.0:4242
    sending_queue:
      storage: file_storage/otlpoutput
63 changes: 63 additions & 0 deletions examples/my-config.yaml
@@ -0,0 +1,63 @@
receivers:
  filelog:
    include:
      #- /var/log/busybox/*.log
      - /var/lib/docker/containers/cf79f880f414937e7befa0e4d2770590a19d83058b4f5df0e1cd22d819c836b3/cf79f880f414937e7befa0e4d2770590a19d83058b4f5df0e1cd22d819c836b3-json.log
    #storage: file_storage/filelogreceiver
    #start_at: beginning
    operators:
      - id: get-format
        routes:
          - expr: body matches "^\\{"
            output: parser-docker
        type: router
      - id: parser-docker
        timestamp:
          layout: '%Y-%m-%dT%H:%M:%S.%LZ'
          parse_from: attributes.time
        type: json_parser
      - from: attributes.log
        to: body
        type: move

processors:
  transform:
    error_mode: ignore
    log_statements:
      - context: log
        statements:
          # Parse body as JSON and merge the resulting map with the cache map, ignoring non-json bodies.
          # cache is a field exposed by OTTL that is a temporary storage place for complex operations.
          - merge_maps(cache, ParseJSON(body), "upsert") where IsMatch(body, "^\\{")

          # Set attributes using the values merged into cache.
          # If the attribute doesn't exist in cache then nothing happens.
          - set(attributes["message"], cache["message"])
          - set(attributes["severity"], cache["log.level"])
          - merge_maps(attributes, cache, "upsert")

extensions:
  file_storage/filelogreceiver:
    directory: /home/chrismark/otelcol/file_storage/freceiver
  file_storage/otcouput:
    directory: /home/chrismark/otelcol/file_storage/output

service:
  extensions: [file_storage/filelogreceiver, file_storage/otcouput]
  pipelines:
    logs:
      receivers: [filelog]
      exporters: [otlp/elastic]
      processors: [transform]
  # telemetry:
  #   logs:
  #     level: "debug"

exporters:
  otlp/elastic:
    endpoint: http://0.0.0.0:8200
    sending_queue:
      storage: file_storage/otcouput
    tls:
      insecure: true
      insecure_skip_verify: true
7 changes: 6 additions & 1 deletion receiver/filelogreceiver/README.md
@@ -42,7 +42,7 @@ Tails and parses logs from files.
| `attributes` | {} | A map of `key: value` pairs to add to the entry's attributes. |
| `resource` | {} | A map of `key: value` pairs to add to the entry's resource. |
| `operators` | [] | An array of [operators](../../pkg/stanza/docs/operators/README.md#what-operators-are-available). See below for more details. |
- | `storage` | none | The ID of a storage extension to be used to store file checkpoints. File checkpoints allow the receiver to pick up where it left off in the case of a collector restart. If no storage extension is used, the receiver will manage checkpoints in memory only. |
+ | `storage` | none | The ID of a storage extension to be used to store file offsets. File offsets allow the receiver to pick up where it left off in the case of a collector restart. If no storage extension is used, the receiver will manage offsets in memory only. |
| `header` | nil | Specifies options for parsing header metadata. Requires that the `filelog.allowHeaderMetadataParsing` feature gate is enabled. See below for details. Must be `false` when `start_at` is set to `end`. |
| `header.pattern` | required for header metadata parsing | A regex that matches every header line. |
| `header.metadata_operators` | required for header metadata parsing | A list of operators used to parse metadata from the header. |
@@ -153,4 +153,9 @@ The above configuration will read logs from the "simple.log" file. Some examples
2023-06-20 12:50:00 DEBUG This is a test debug message
```

## Offset tracking

The `storage` setting defines the storage extension to be used for persisting file offsets.
While the `storage` setting can ensure that log files are consumed accurately, it is still possible that
logs are dropped while moving downstream through other components in the Collector.
For additional resiliency, see the [Fault tolerant log collection example](../../examples/fault-tolerant-logs-collection/README.md).
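A minimal sketch of wiring the `storage` setting to a `file_storage` extension follows; the directory, file paths, and the `debug` exporter here are illustrative assumptions, not part of this change:

```yaml
extensions:
  file_storage/filelogreceiver:
    # Directory must exist and be writable by the Collector process.
    directory: /var/lib/otelcol/file_storage/receiver

receivers:
  filelog:
    include: [/var/log/app/*.log]
    # Offsets are persisted here, so a restart resumes where the receiver left off.
    storage: file_storage/filelogreceiver

service:
  extensions: [file_storage/filelogreceiver]
  pipelines:
    logs:
      receivers: [filelog]
      processors: []
      exporters: [debug]
```

Without the `storage` setting, offsets live in memory only and a restart re-reads or skips data depending on `start_at`.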
