FluentBit is hanging and stops collecting logs after a hot reload #10518

@quanghungb

Bug Report

Describe the bug
FluentBit hangs and stops collecting logs after a hot reload is triggered.

Context
We observed that 24 of the 46 pods of a DaemonSet show the same log pattern:

[2025/06/25 20:13:32] [engine] caught signal (SIGHUP)
[2025/06/25 20:13:32] [ info] reloading instance pid=1 tid=0x7f37941dae40
[2025/06/25 20:13:32] [ info] [reload] stop everything of the old context
[2025/06/25 20:13:32] [ warn] [engine] service will shutdown when all remaining tasks are flushed
[2025/06/25 20:13:32] [ info] [reload] start everything
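To help spot affected pods, the pattern above can be checked mechanically. This is only a rough sketch: the function name `looks_hung_after_reload` is hypothetical, and it assumes a pod is affected when the last non-empty log line is the `[reload] start everything` message with nothing logged after it. A reload that is merely slow would match too, so treat a hit as a hint rather than proof.

```python
# Marker taken from the log excerpt in this report.
RELOAD_START_MARKER = "[reload] start everything"


def looks_hung_after_reload(log_text: str) -> bool:
    """Return True if the last non-empty log line is the reload start message.

    Assumption (not confirmed by FluentBit docs): a healthy reload logs
    further lines after this marker, a hung one does not.
    """
    lines = [line for line in log_text.splitlines() if line.strip()]
    return bool(lines) and RELOAD_START_MARKER in lines[-1]


# Log tail copied from this report.
sample_tail = """\
[2025/06/25 20:13:32] [engine] caught signal (SIGHUP)
[2025/06/25 20:13:32] [ info] reloading instance pid=1 tid=0x7f37941dae40
[2025/06/25 20:13:32] [ info] [reload] stop everything of the old context
[2025/06/25 20:13:32] [ warn] [engine] service will shutdown when all remaining tasks are flushed
[2025/06/25 20:13:32] [ info] [reload] start everything
"""

if __name__ == "__main__":
    print(looks_hung_after_reload(sample_tail))
```

The log text for each pod could come from something like `kubectl logs --tail=20 <pod>`, with the check run once per pod of the DaemonSet.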

The behaviour is very similar to what is reported here:

FluentBit hangs, consumes almost no resources (CPU, memory), and collects no logs.

It seems to spend all its time sleeping:

cat /proc/1548315/stack
[<0>] hrtimer_nanosleep+0x95/0x120
[<0>] common_nsleep+0x40/0x50
[<0>] __x64_sys_clock_nanosleep+0xc7/0x130
[<0>] do_syscall_64+0x35/0x80
[<0>] entry_SYSCALL_64_after_hwframe+0x6c/0xd6

strace: Process 1548315 attached
restart_syscall(<... resuming interrupted clock_nanosleep ...>) = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, 0x7ffe5b0605c0) = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, 0x7ffe5b0605c0) = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, 0x7ffe5b0605c0) = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, 0x7ffe5b0605c0) = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, 0x7ffe5b0605c0) = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, 0x7ffe5b0605c0) = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, 0x7ffe5b0605c0) = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, 0x7ffe5b0605c0) = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, 0x7ffe5b0605c0) = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, 0x7ffe5b0605c0) = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, 0x7ffe5b0605c0) = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, 0x7ffe5b0605c0) = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, 0x7ffe5b0605c0) = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, 0x7ffe5b0605c0) = 0

Expected behavior
The reload should complete and logs should continue to be collected.

Your Environment

  • Version used: 3.2.4
  • Deployed by FluentBit Helm Chart fluent-bit-0.47.10 on Kubernetes v1.2.6

Additional context
We trigger the hot reload whenever the secret containing the certificate used by the FluentBit kafka input (mTLS required) is updated.
It impacts 24 of the 46 pods, so the impact on our log collection pipelines is huge :(
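For context, the log excerpt shows the reload arriving as a signal (`caught signal (SIGHUP)` on `pid=1`). Below is a minimal sketch of a signal-based trigger, assuming the caller runs where it can signal the FluentBit process (for example, sharing its PID namespace). The helper name `trigger_hot_reload` is hypothetical and this is not necessarily the tooling used in this setup.

```python
import os
import signal


def trigger_hot_reload(pid: int) -> None:
    """Send SIGHUP to the process with the given PID.

    The logs in this report show FluentBit reacting to SIGHUP with a reload;
    this helper only delivers the signal and does not wait for or verify
    the reload outcome.
    """
    os.kill(pid, signal.SIGHUP)
```

Because the signal is fire-and-forget, a watcher built this way cannot tell whether the reload completed; pairing it with a check of the logs after the reload would be needed to detect the hang described here.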
