Description
Fluentd version: 0.14.25
Environment: running inside a debian:stretch-20180312 based container.
Dockerfile: here
We noticed a slow memory leak that built up over a month or so.
The same setup running Fluentd 0.12.41 had stable memory usage over the same period of time.
We are still investigating and trying to narrow down which version introduced this, but wanted to create a ticket to track it.
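For anyone trying to reproduce this, here is a minimal sketch of how the process RSS can be watched over time to make the slow growth visible. The pgrep-based pid lookup and the 60-second interval are assumptions for illustration, not part of our deployment:

#!/usr/bin/env ruby
# Minimal monitoring sketch (not part of our setup): poll the fluentd
# process RSS from /proc so slow growth shows up over time.
# The pgrep lookup and the 60-second interval are assumptions.
require 'time'

pid = `pgrep -o -f fluentd`.strip
abort 'fluentd process not found' if pid.empty?

loop do
  rss_kb = File.read("/proc/#{pid}/status")[/^VmRSS:\s+(\d+) kB/, 1]
  puts "#{Time.now.utc.iso8601} rss_kb=#{rss_kb}"
  sleep 60
end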
Config:
<source>
  @type tail
  path /var/log/containers/*.log
  pos_file /var/log/k8s-gcp-containers.log.pos
  tag reform.*
  read_from_head true
  format multi_format
  <pattern>
    format json
    time_key time
    time_format %Y-%m-%dT%H:%M:%S.%NZ
  </pattern>
  <pattern>
    format /^(?<time>.+) (?<stream>stdout|stderr) [^ ]* (?<log>.*)$/
    time_format %Y-%m-%dT%H:%M:%S.%N%:z
  </pattern>
</source>
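For context on the <pattern> ordering above: multi_format tries each pattern in order and keeps the first one that matches, roughly like this Ruby sketch (the sample lines are invented for illustration):

require 'json'

# Rough equivalent of the multi_format fallback above: try the JSON
# pattern first, then the stdout/stderr regex.
TEXT_RE = /^(?<time>.+) (?<stream>stdout|stderr) [^ ]* (?<log>.*)$/

def parse_line(line)
  JSON.parse(line)          # first <pattern>: format json
rescue JSON::ParserError
  m = TEXT_RE.match(line)   # second <pattern>: regex fallback
  m && { 'time' => m[:time], 'stream' => m[:stream], 'log' => m[:log] }
end

p parse_line('{"log":"hello\n","stream":"stdout","time":"2018-03-12T10:00:00.000000000Z"}')
p parse_line('2018-03-12T10:00:00.000000000Z stdout F hello world')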
<filter reform.**>
  @type parser
  format /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<log>.*)/
  reserve_data true
  suppress_parse_error_log true
  emit_invalid_record_to_error false
  key_name log
</filter>
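This filter parses glog-style prefixes out of the 'log' field. A quick way to sanity-check the regex (the sample line is made up):

# Quick check of the glog parsing regex from the filter above.
GLOG_RE = /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<log>.*)/

line = 'I0312 10:15:30.123456    1234 controller.go:87] synced pod default/foo'
if (m = GLOG_RE.match(line))
  puts "severity=#{m[:severity]} time=#{m[:time]} pid=#{m[:pid]} source=#{m[:source]}"
  puts "log=#{m[:log]}"
end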
<match reform.**>
  @type record_reformer
  enable_ruby true
  <record>
    # Extract local_resource_id from tag for 'k8s_container' monitored
    # resource. The format is:
    # 'k8s_container.<namespace_name>.<pod_name>.<container_name>'.
    "logging.googleapis.com/local_resource_id" ${"k8s_container.#{tag_suffix[4].rpartition('.')[0].split('_')[1]}.#{tag_suffix[4].rpartition('.')[0].split('_')[0]}.#{tag_suffix[4].rpartition('.')[0].split('_')[2].rpartition('-')[0]}"}
    # Rename the field 'log' to a more generic field 'message'. This way the
    # fluent-plugin-google-cloud knows to flatten the field as textPayload
    # instead of jsonPayload after extracting 'time', 'severity' and
    # 'stream' from the record.
    message ${record['log']}
  </record>
  tag ${if record['stream'] == 'stderr' then 'stderr' else 'stdout' end}
  remove_keys stream,log
</match>
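To unpack the ${...} expression in the record_reformer stanza above: tag_suffix[4] should be the container log file name, which for this tail path has the form <pod_name>_<namespace>_<container_name>-<container_id>.log, and the expression reorders those pieces. A worked example with made-up names:

# Worked example of the local_resource_id expression above.
# The concrete names are invented for illustration.
tag_suffix_4 = 'nginx-abc123_default_nginx-0123456789abcdef.log'

base = tag_suffix_4.rpartition('.')[0]          # strip the '.log' suffix
pod, namespace, container_and_id = base.split('_')
container = container_and_id.rpartition('-')[0] # strip the container id

puts "k8s_container.#{namespace}.#{pod}.#{container}"
# => k8s_container.default.nginx-abc123.nginx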
<match fluent.**>
  @type null
</match>
# This section is exclusive to k8s_container logs. These logs come with
# 'stderr'/'stdout' tags.
# We use a separate output stanza for 'k8s_node' logs with a smaller buffer
# because node logs are less important than users' container logs.
<match {stderr,stdout}>
  @type google_cloud
  # Try to detect JSON formatted log entries.
  detect_json true
  # Collect metrics in Prometheus registry about plugin activity.
  enable_monitoring true
  monitoring_type prometheus
  # Allow log entries from multiple containers to be sent in the same request.
  split_logs_by_tag false
  # Set the buffer type to file to improve reliability and reduce memory consumption.
  buffer_type file
  buffer_path /var/log/k8s-fluentd-buffers/kubernetes.containers.buffer
  # Set queue_full action to block, because we want to pause gracefully
  # when the load exceeds the limits instead of throwing an exception.
  buffer_queue_full_action block
  # Set the chunk limit conservatively to avoid exceeding the recommended
  # chunk size of 5MB per write request.
  buffer_chunk_limit 1M
  # Cap the combined memory usage of this buffer and the one below to
  # 1MiB/chunk * (6 + 2) chunks = 8 MiB
  buffer_queue_limit 6
  # Never wait more than 5 seconds before flushing logs in the non-error case.
  flush_interval 5s
  # Never wait longer than 30 seconds between retries.
  max_retry_wait 30
  # Disable the limit on the number of retries (retry forever).
  disable_retry_limit
  # Use multiple threads for processing.
  num_threads 2
  use_grpc false
  # Use Metadata Agent to get monitored resource.
  enable_metadata_agent true
</match>
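For the record, the arithmetic behind the buffer comments above; the queue limit of 2 for the k8s_node stanza (not shown in this excerpt) is inferred from the '(6 + 2)' comment:

# Worst-case buffered data implied by the comments above.
chunk_limit_mib       = 1  # buffer_chunk_limit 1M
container_queue_limit = 6  # buffer_queue_limit 6 (this stanza)
node_queue_limit      = 2  # inferred from the '(6 + 2)' comment

puts "#{chunk_limit_mib * (container_queue_limit + node_queue_limit)} MiB"
# => 8 MiB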