
out_stackdriver: does not batch output records properly if passed a large chunk of records and can drop a majority of records #9374

Closed as not planned
@ryanohnemus

Description


Bug Report

Describe the bug
If you configure a tail input with a large Buffer_Chunk_Size and Buffer_Max_Size, the chunks it creates and passes through Fluent Bit can exceed the 10485760-byte maximum request size. Cloud Logging rejects these requests, and the stackdriver output plugin drops the records with the following error:

    "error": {
      "code": 400,
      "message": "Request payload size exceeds the limit: 10485760 bytes.",
      "status": "INVALID_ARGUMENT"
    }
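The limit in this error applies to the serialized request body, not to individual records. The following minimal Python sketch checks a body against it. It assumes a JSON body containing an `entries` list, as in the Cloud Logging `entries.write` REST method, and compact UTF-8 encoding. `REQUEST_LIMIT_BYTES`, `payload_size`, and `exceeds_request_limit` are hypothetical names. The plugin's real serialization may add fields or whitespace.

```python
import json

# Limit quoted in the 400 error returned by Cloud Logging (10 * 1024 * 1024).
REQUEST_LIMIT_BYTES = 10485760


def payload_size(body: dict) -> int:
    """Size in bytes of the compact, UTF-8 encoded JSON request body."""
    return len(json.dumps(body, separators=(",", ":")).encode("utf-8"))


def exceeds_request_limit(body: dict) -> bool:
    """True if the body is larger than the limit Cloud Logging enforces."""
    return payload_size(body) > REQUEST_LIMIT_BYTES


if __name__ == "__main__":
    small = {"entries": [{"textPayload": "hello"}]}
    # Roughly 11 MB of log text packed into one hypothetical request body.
    big = {"entries": [{"textPayload": "x" * 1000} for _ in range(11000)]}
    print(exceeds_request_limit(small))  # False
    print(exceeds_request_limit(big))    # True
```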

To Reproduce

  1. Use the following Fluent Bit input:
    [INPUT]
        name              tail
        read_from_head    false
        skip_long_lines   on
        path              /var/log/containers/*.log
        Tag               kube.*
        buffer_chunk_size 5M
        buffer_max_size   10M
        Exclude_Path      /var/log/containers/*fluent*
        Refresh_Interval  1
        mem_buf_limit     50MB
        threaded          on
  • Run a high-volume logging container on the same node as Fluent Bit.
  • The Fluent Bit tail input successfully reads all messages from the container (this can be verified with the Prometheus metrics):
    • fluentbit_input_records_total{name="tail.0"} 125000002
  • out_stackdriver fails to create properly sized requests to Cloud Logging:

    fluentbit_stackdriver_proc_records_total{grpc_code="-1",status="400",name="stackdriver.0"} 12033778
    fluentbit_stackdriver_proc_records_total{grpc_code="0",status="200",name="stackdriver.0"} 466224
    

Most of these records were dropped by the out_stackdriver plugin.

  • The error message shown above also appears in the Fluent Bit log.
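The size of the loss can be read directly from the two `fluentbit_stackdriver_proc_records_total` samples above by summing them per `status` label. The following rough Python sketch does this. Its regex assumes the exact Prometheus text lines shown in this report. `records_by_status` is a hypothetical helper.

```python
import re

# The two out_stackdriver samples copied from this report.
METRICS = """\
fluentbit_stackdriver_proc_records_total{grpc_code="-1",status="400",name="stackdriver.0"} 12033778
fluentbit_stackdriver_proc_records_total{grpc_code="0",status="200",name="stackdriver.0"} 466224
"""

# Captures the HTTP status label and the sample value.
SAMPLE = re.compile(r'^fluentbit_stackdriver_proc_records_total\{.*status="(\d+)".*\}\s+(\d+)$')


def records_by_status(text: str) -> dict[str, int]:
    """Sum processed-record counts per HTTP status label."""
    counts: dict[str, int] = {}
    for line in text.splitlines():
        m = SAMPLE.match(line.strip())
        if m:
            status, value = m.group(1), int(m.group(2))
            counts[status] = counts.get(status, 0) + value
    return counts


counts = records_by_status(METRICS)
processed = sum(counts.values())
dropped = processed - counts.get("200", 0)
print(counts)                                # {'400': 12033778, '200': 466224}
print(f"{dropped / processed:.1%} dropped")  # 96.3% dropped
```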

(This can most likely happen in any situation where a Fluent Bit chunk is larger than 10485760 bytes, even though chunks are normally only up to about 2MB.)

Expected behavior
The out_stackdriver plugin should split Cloud Logging payloads into batches that each stay below the 10485760-byte limit, rather than relying on every incoming chunk being under that limit. Based on https://docs.fluentbit.io/manual/v/1.8/administration/buffering-and-storage#chunks, I believe Fluent Bit chunks are around 2MB.
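One way to get this behavior is to serialize entries one at a time and flush a request just before the next entry would push the body over the limit. The following minimal Python sketch shows that greedy size-based batching. It assumes a body of exactly `{"entries": [...]}` encoded as compact JSON. `batch_entries` is a hypothetical helper, not part of Fluent Bit. The real plugin, which is written in C, would also have to reserve room for fields such as `logName`, `resource`, and `labels`.

```python
import json
from typing import Iterable, Iterator

# Limit from the Cloud Logging 400 error in this report.
REQUEST_LIMIT_BYTES = 10485760


def _encoded(entry: dict) -> bytes:
    """Compact JSON encoding of one entry, matching how it appears inside the list."""
    return json.dumps(entry, separators=(",", ":")).encode("utf-8")


def batch_entries(entries: Iterable[dict], limit: int = REQUEST_LIMIT_BYTES) -> Iterator[list[dict]]:
    """Yield batches whose {"entries":[...]} body is at most `limit` bytes.

    Assumption: the body contains only the "entries" key; any other request
    fields would have to be subtracted from `limit` by the caller.
    """
    envelope = len(b'{"entries":[]}')
    batch: list[dict] = []
    size = envelope
    for entry in entries:
        entry_size = len(_encoded(entry))
        if envelope + entry_size > limit:
            # An entry that can never fit must be handled separately (e.g. truncated).
            raise ValueError("single entry exceeds the request limit")
        # One extra byte for the comma separating it from the previous entry.
        extra = entry_size + (1 if batch else 0)
        if batch and size + extra > limit:
            yield batch
            batch, size = [], envelope
            extra = entry_size
        batch.append(entry)
        size += extra
    if batch:
        yield batch


if __name__ == "__main__":
    entries = [{"textPayload": "x" * 1000} for _ in range(11000)]
    print([len(b) for b in batch_entries(entries)])  # [10290, 710]
```

The sketch computes each batch's size incrementally rather than re-serializing the whole batch. This keeps the check linear in the number of entries, and the count matches the encoded body byte for byte.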

Your Environment

  • Version used: fluentbit-3.1.5
  • Configuration:
  • Environment name and version (e.g. Kubernetes? What version?): GKE 1.29
  • Server type and version:
  • Operating System and version: ContainerOS
  • Filters and plugins:

Additional context
