
PrometheusDuplicateTimestamps errors with log_to_metrics filter starting in fluent-bit 3.1.5 #9413

reneeckstein opened this issue Sep 23, 2024 · 2 comments

Bug Report

Describe the bug
After upgrading fluent-bit from 3.1.4 to 3.1.5, all our k8s clusters started reporting PrometheusDuplicateTimestamps alerts:
the Prometheus metric rate(prometheus_target_scrapes_sample_duplicate_timestamp_total[5m]) is above 0 and keeps increasing.
Prometheus is logging a lot of warnings like this:

ts=2024-09-23T16:22:32.820Z caller=scrape.go:1754 level=warn component="scrape manager" scrape_pool=serviceMonitor/platform-logging/fluent-bit/1 target=http://10.67.3.197:2021/metrics msg="Error on ingesting samples with different value but same timestamp" num_dropped=32
ts=2024-09-23T16:22:38.237Z caller=scrape.go:1754 level=warn component="scrape manager" scrape_pool=serviceMonitor/platform-logging/fluent-bit/1 target=http://10.67.4.81:2021/metrics msg="Error on ingesting samples with different value but same timestamp" num_dropped=1876
ts=2024-09-23T16:22:39.697Z caller=scrape.go:1754 level=warn component="scrape manager" scrape_pool=serviceMonitor/platform-logging/fluent-bit/1 target=http://10.67.13.208:2021/metrics msg="Error on ingesting samples with different value but same timestamp" num_dropped=4
ts=2024-09-23T16:22:41.643Z caller=scrape.go:1754 level=warn component="scrape manager" scrape_pool=serviceMonitor/platform-logging/fluent-bit/1 target=http://10.67.3.110:2021/metrics msg="Error on ingesting samples with different value but same timestamp" num_dropped=7
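
For context, the alert that fires here is the stock PrometheusDuplicateTimestamps rule shipped with kube-prometheus / kubernetes-mixin. A minimal sketch of an equivalent rule (the job selector, for duration, and severity are assumptions and vary per install):

groups:
  - name: prometheus
    rules:
      - alert: PrometheusDuplicateTimestamps
        # fires while Prometheus keeps dropping samples that share a timestamp
        # but carry different values for the same series
        expr: rate(prometheus_target_scrapes_sample_duplicate_timestamp_total{job="prometheus"}[5m]) > 0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: Prometheus is dropping samples with duplicate timestamps.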

To Reproduce

  • Steps to reproduce the problem:
    • Deploy fluent-bit 3.1.4 as a DaemonSet into a k8s cluster with the tail input config below and confirm that container logs and metrics show up as expected.
    • Update the fluent-bit image to 3.1.5 (or newer, up to and including 3.1.8) and inspect the /metrics endpoint on port 2021 (a quick duplicate check is sketched below).
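
A quick way to confirm the duplicates is to scrape the log_to_metrics exporter once and look for the same series (metric name plus label set) appearing more than once. A rough sketch, assuming the pod is reachable on port 2021 (127.0.0.1 below is a placeholder, e.g. via kubectl port-forward) and that label values contain no spaces:

# 127.0.0.1:2021 is a placeholder; reach the pod e.g. via kubectl port-forward <pod> 2021:2021
curl -s http://127.0.0.1:2021/metrics \
  | grep -v '^#' \
  | sed 's/} .*/}/' \
  | sort | uniq -cd
# any output means the same metric name + label set occurs more than once in a single scrape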

Expected behavior
No duplicate metrics on the additional /metrics endpoint exposed by the log_to_metrics feature (usually on port 2021), no warnings in the Prometheus logs, and no PrometheusDuplicateTimestamps alerts.


Your Environment

  • Version used: 3.1.5 and newer; tested up to 3.1.8, and the issue is still present.
  • Configuration: (Helm chart values)
serviceMonitor:
  enabled: true
  interval: 10s
  scrapeTimeout: 10s
  additionalEndpoints:
  - port: log-metrics
    path: /metrics
    interval: 10s
    scrapeTimeout: 10s

extraPorts:
  - port: 2021
    containerPort: 2021
    protocol: TCP
    name: log-metrics

config:
  service: |
    [SERVICE]
        Flush 1
        Daemon Off
        Log_Level info
        Parsers_File parsers.conf
        Parsers_File custom_parsers.conf
        HTTP_Server On
        HTTP_Listen 0.0.0.0
        HTTP_Port {{ .Values.service.port }}

  inputs: |
    [INPUT]
        Name tail
        Tag kube.*
        Alias tail_container_logs
        Path /var/log/containers/*.log
        multiline.parser docker, cri
        DB /var/log/flb_kube.db
        DB.locking true
        Mem_Buf_Limit 32MB
        Skip_Long_Lines On

  filters: |
    [FILTER]
        Name kubernetes
        Alias kubernetes_all
        Match kube.*
        Merge_Log On
        Keep_Log Off
        K8S-Logging.Parser On
        K8S-Logging.Exclude On
        Annotations Off
        Buffer_Size 1MB
        Use_Kubelet true

    [FILTER]
        name               log_to_metrics
        match              kube.*
        tag                log_counter_metric
        metric_mode        counter
        metric_name        kubernetes_messages
        metric_description This metric counts Kubernetes messages
        kubernetes_mode    true

  outputs: |
    [OUTPUT]
        name               prometheus_exporter
        match              log_counter_metric
        host               0.0.0.0
        port               2021
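
With this configuration the exporter on port 2021 should expose each log_to_metrics series exactly once per scrape; the Prometheus warnings above indicate that the same series is exposed more than once with different values. Purely as a hypothetical illustration (the log_metric_counter_ prefix and the label set are assumptions derived from metric_name and kubernetes_mode, not copied from a real scrape):

# expected: one sample per label set in a scrape
log_metric_counter_kubernetes_messages{namespace_name="default",pod_name="app-abc",container_name="app"} 42

# observed symptom: the same series repeated with a different value in the same scrape,
# which Prometheus rejects as "different value but same timestamp"
log_metric_counter_kubernetes_messages{namespace_name="default",pod_name="app-abc",container_name="app"} 42
log_metric_counter_kubernetes_messages{namespace_name="default",pod_name="app-abc",container_name="app"} 57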

  • Environment name and version (e.g. Kubernetes? What version?):
    • EKS; Kubernetes 1.30
  • Server type and version:
  • Operating System and version:
    • EKS on Bottlerocket OS 1.22.0 (aws-k8s-1.30) Kernel version 6.1.106 containerd://1.7.20+bottlerocket
  • Filters and plugins:
    • kubernetes, log_to_metrics

Additional context
It is very annoying that every k8s cluster with this common configuration reports PrometheusDuplicateTimestamps alerts.

edsiper (Member) commented Sep 26, 2024

@reneeckstein are you facing the same issue with v3.1.8? (We have some fixes in place for a similar problem.)

reneeckstein (Author) commented

@edsiper Yes, we are facing the same issue with fluent-bit v3.1.8. I'm looking forward to v3.1.9; I noticed two metrics-related commits on the master branch.
