EKS 1.29 Windows node - Fluent Bit error #9409

Open
Chandramouli15 opened this issue Sep 23, 2024 · 1 comment
Chandramouli15 commented Sep 23, 2024

Fluent Bit intermittently fails with the error below. Restarting the DaemonSet resolves it temporarily, but the same error returns after 10-20 days.

While the error is occurring, application logs are not streamed to the CloudWatch log group.

We're using the AWS-managed amazon-cloudwatch-observability add-on, version v1.6.0-eksbuild.1.

Fluent Bit pod error:

[C:\build\fluent-bit\lib\chunkio\src\cio_memfs.c:50 errno=12] Not enough space
[2024/07/25 12:15:10] [error] [input chunk] could not create chunk file: tail.1:6716-1721909710.306189500.flb
[2024/07/25 12:15:10] [error] [input chunk] no available chunk
[C:\build\fluent-bit\lib\chunkio\src\cio_memfs.c:50 errno=12] Not enough space

I increased the buffer limit to 2.5 GB, but the same error still occurs.

apiVersion: v1
data:
application-log.conf: |
[INPUT]
    Name                tail
    Tag                 application.*
    Exclude_Path        C:\\var\\log\\containers\\fluent-bit*, C:\\var\\log\\containers\\cloudwatch-agent*
    Path                C:\\var\\log\\containers\\*.log
    Parser              docker
    DB                  C:\\var\\fluent-bit\\state\\flb_container.db
    Mem_Buf_Limit       2500MB
    Skip_Long_Lines     On
    Rotate_Wait         30
    Refresh_Interval    10
    Read_from_Head      ${READ_FROM_HEAD}

[INPUT]
    Name                tail
    Tag                 application.*
    Path                C:\\var\\log\\containers\\fluent-bit*
    Parser              docker
    DB                  C:\\var\\fluent-bit\\state\\flb_log.db
    Mem_Buf_Limit       2500MB
    Skip_Long_Lines     On
    Rotate_Wait         30
    Refresh_Interval    10
    Read_from_Head      ${READ_FROM_HEAD}

[INPUT]
    Name                tail
    Tag                 application.*
    Path                C:\\var\\log\\containers\\cloudwatch-agent*
    Parser              docker
    DB                  C:\\var\\fluent-bit\\state\\flb_cwagent.db
    Mem_Buf_Limit       2500MB
    Skip_Long_Lines     On
    Rotate_Wait         30
    Refresh_Interval    10
    Read_from_Head      ${READ_FROM_HEAD}

[OUTPUT]
    Name                cloudwatch_logs
    Match               application.*
    region              ${AWS_REGION}
    log_group_name      /aws/containerinsights/${CLUSTER_NAME}/application
    log_stream_prefix   ${HOST_NAME}-
    auto_create_group   true
    extra_user_agent    container-insights

dataplane-log.conf: |
[INPUT]
    Name                tail
    Tag                 dataplane.tail.*
    Path                C:\\ProgramData\\containerd\\root\\*.log, C:\\ProgramData\\Amazon\\EKS\\logs\\*.log
    Parser              dataplane_firstline
    DB                  C:\\var\\fluent-bit\\state\\flb_dataplane_tail.db
    Mem_Buf_Limit       2500MB
    Skip_Long_Lines     On
    Rotate_Wait         30
    Refresh_Interval    10
    Read_from_Head      ${READ_FROM_HEAD}

[INPUT]
    Name                tail
    Tag                 dataplane.tail.C.ProgramData.Amazon.EKS.logs.vpc-bridge
    Path                C:\\ProgramData\\Amazon\\EKS\\logs\\*.log.*
    Path_Key            file_name
    Parser              dataplane_firstline
    DB                  C:\\var\\fluent-bit\\state\\flb_dataplane_cni_tail.db
    Mem_Buf_Limit       2500MB
    Skip_Long_Lines     On
    Rotate_Wait         30
    Refresh_Interval    10
    Read_from_Head      ${READ_FROM_HEAD}

[FILTER]
    Name                aws
    Match               dataplane.*
    imds_version        v2

[OUTPUT]
    Name                cloudwatch_logs
    Match               dataplane.*
    region              ${AWS_REGION}
    log_group_name      /aws/containerinsights/${CLUSTER_NAME}/dataplane
    log_stream_prefix   ${HOST_NAME}-
    auto_create_group   true
    extra_user_agent    container-insights

fluent-bit.conf: |
[SERVICE]
    Flush               5
    Log_Level           error
    Daemon              off
    net.dns.resolver    LEGACY
    Parsers_File        parsers.conf

@INCLUDE application-log.conf
@INCLUDE dataplane-log.conf
@INCLUDE host-log.conf

host-log.conf: |
[INPUT]
    Name                winlog
    Channels            EKS, System
    DB                  C:\\var\\fluent-bit\\state\\flb_system_winlog.db
    Interval_Sec        60

[FILTER]
    Name                aws
    Match               winlog.*
    imds_version        v2

[OUTPUT]
    Name                cloudwatch_logs
    Match               winlog.*
    region              ${AWS_REGION}
    log_group_name      /aws/containerinsights/${CLUSTER_NAME}/host
    log_stream_prefix   ${HOST_NAME}.
    auto_create_group   true
    extra_user_agent    container-insights

parsers.conf: |
[PARSER]
    Name                docker
    Format              json
    Time_Key            time
    Time_Format         %b %d %H:%M:%S

[PARSER]
    Name                container_firstline
    Format              regex
    Regex               (?<log>(?<="log":")\S(?!\.).*?)(?<!\\)".*(?<stream>(?<="stream":").*?)".*(?<time>\d{4}-\d{1,2}-\d{1,2}T\d{2}:\d{2}:\d{2}\.\w*).*(?=})
    Time_Key            time
    Time_Format         %Y-%m-%dT%H:%M:%S.%LZ

[PARSER]
    Name                dataplane_firstline
    Format              regex
    Regex               (?<log>(?<="log":")\S(?!\.).*?)(?<!\\)".*(?<stream>(?<="stream":").*?)".*(?<time>\d{4}-\d{1,2}-\d{1,2}T\d{2}:\d{2}:\d{2}\.\w*).*(?=})
    Time_Key            time
    Time_Format         %Y-%m-%dT%H:%M:%S.%LZ

kind: ConfigMap

Note: I only modified the buffer limit in the default ConfigMap.
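
For context on the buffer change: `Mem_Buf_Limit` is applied per input instance, not to the Fluent Bit process as a whole, and `errno=12` from `cio_memfs.c` is an out-of-memory failure in the in-memory chunk backend. The sketch below only adds up the limits from the config above to show the worst-case in-memory buffering they permit. The helper `total_mem_buf_limit_mb` is hypothetical (not part of Fluent Bit), it handles only the `MB` suffix used here, and the pod's actual memory limit is not stated in this issue, so whether this total is the cause remains an open question.

```python
import re

def total_mem_buf_limit_mb(conf_text):
    """Sum every 'Mem_Buf_Limit <n>MB' entry; other unit suffixes are ignored."""
    pattern = r"^\s*Mem_Buf_Limit\s+(\d+)MB\s*$"
    return sum(int(n) for n in re.findall(pattern, conf_text, re.MULTILINE))

# Abbreviated stand-in for the ConfigMap above: one line per tail input that
# sets the limit (three in application-log.conf, two in dataplane-log.conf).
conf_text = "\n".join(["    Mem_Buf_Limit       2500MB"] * 5)

total_mb = total_mem_buf_limit_mb(conf_text)
print(f"worst-case in-memory buffering: {total_mb} MB")
```

If the combined limit is larger than the memory available to the pod, raising `Mem_Buf_Limit` further would not be expected to help on its own.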

@patrick-stephens
Contributor

Please provide the full details requested in the issue template, i.e. which version of Fluent Bit you are running. If it is not the latest, please try that first.
Could you also fix the formatting of the text? It's a little confusing.

What do the metrics show for input vs. output rates? Is there a spike when this happens, or is the buffer slowly filling up?
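
One way to answer the rate question, as a minimal sketch: Fluent Bit's built-in HTTP server exposes per-plugin counters as JSON at `/api/v1/metrics`. This assumes `HTTP_Server On` is added to `[SERVICE]` (the config in this issue does not enable it) and that the server listens on port 2020; both are assumptions, as are the helper names and the sample numbers below. The script compares two snapshots and reports records per second ingested by inputs versus delivered by outputs.

```python
import json
from urllib.request import urlopen

def fetch_metrics(url="http://127.0.0.1:2020/api/v1/metrics"):
    """Fetch one snapshot; assumes HTTP_Server is On and reachable at this URL."""
    with urlopen(url, timeout=5) as resp:
        return json.load(resp)

def totals(snapshot):
    """Sum ingested records over inputs and delivered records over outputs."""
    records_in = sum(m.get("records", 0) for m in snapshot.get("input", {}).values())
    records_out = sum(m.get("proc_records", 0) for m in snapshot.get("output", {}).values())
    return records_in, records_out

def rates(prev, curr, seconds):
    """Records per second in and out between two snapshots."""
    in0, out0 = totals(prev)
    in1, out1 = totals(curr)
    return (in1 - in0) / seconds, (out1 - out0) / seconds

# Made-up snapshots taken 60 s apart, for illustration only.
prev = {"input": {"tail.0": {"records": 1000}},
        "output": {"cloudwatch_logs.0": {"proc_records": 900}}}
curr = {"input": {"tail.0": {"records": 4000}},
        "output": {"cloudwatch_logs.0": {"proc_records": 2100}}}

in_rate, out_rate = rates(prev, curr, 60)
print(f"in: {in_rate} rec/s, out: {out_rate} rec/s")
```

An input rate that stays above the output rate over many samples would point to a slowly growing backlog, while a brief jump in the input rate would point to a spike. In practice `prev` and `curr` would come from two `fetch_metrics()` calls inside the pod.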
