Bug Report
Describe the bug
Fluent Bit hangs and stops collecting logs after a hot reload is triggered.
Context
We observed that 24 of the 46 pods of a DaemonSet present the same log pattern:
[2025/06/25 20:13:32] [engine] caught signal (SIGHUP)
[2025/06/25 20:13:32] [ info] reloading instance pid=1 tid=0x7f37941dae40
[2025/06/25 20:13:32] [ info] [reload] stop everything of the old context
[2025/06/25 20:13:32] [ warn] [engine] service will shutdown when all remaining tasks are flushed
[2025/06/25 20:13:32] [ info] [reload] start everything
The behaviour is very similar to what is reported here:
- Fluent-bit hanging and stopping operation #9927
- Hot reload stuck in progress after pausing inputs #9354 (but we do not see the pause logs, and it gets stuck after "[ info] [reload] start everything")
Fluent Bit is hung, consumes almost no resources (CPU, memory), and no logs are collected.
It seems to be spending its time sleeping:
cat /proc/1548315/stack
[<0>] hrtimer_nanosleep+0x95/0x120
[<0>] common_nsleep+0x40/0x50
[<0>] __x64_sys_clock_nanosleep+0xc7/0x130
[<0>] do_syscall_64+0x35/0x80
[<0>] entry_SYSCALL_64_after_hwframe+0x6c/0xd6
strace: Process 1548315 attached
restart_syscall(<... resuming interrupted clock_nanosleep ...>) = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, 0x7ffe5b0605c0) = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, 0x7ffe5b0605c0) = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, 0x7ffe5b0605c0) = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, 0x7ffe5b0605c0) = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, 0x7ffe5b0605c0) = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, 0x7ffe5b0605c0) = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, 0x7ffe5b0605c0) = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, 0x7ffe5b0605c0) = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, 0x7ffe5b0605c0) = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, 0x7ffe5b0605c0) = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, 0x7ffe5b0605c0) = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, 0x7ffe5b0605c0) = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, 0x7ffe5b0605c0) = 0
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, 0x7ffe5b0605c0) = 0
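If it helps with triage, userspace backtraces can also be captured from the hung process. A minimal sketch, assuming gdb is available on the node or in an ephemeral debug container (1548315 is the PID from the traces above):

```sh
# Attach to the hung Fluent Bit process, dump backtraces for all threads, then detach.
# 1548315 is the PID observed above; adjust as needed.
gdb -p 1548315 -batch -ex "set pagination off" -ex "thread apply all bt"
```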
Expected behavior
The reload should complete and logs should continue to be collected.
Your Environment
- Version used: 3.2.4
- Deployed with the Fluent Bit Helm chart fluent-bit-0.47.10 on Kubernetes v1.2.6
Additional context
We trigger the hot reload when the secret containing the certificate used by the Fluent Bit kafka input (mTLS is required) is updated (see the sketch below).
It affects 24 of 46 pods, so the impact on our log collection pipelines is significant :(
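For reference, this is roughly how the reload is triggered on each pod. A sketch, not our exact tooling: <namespace>, <pod-name>, and the container name are placeholders, and the curl alternative assumes Hot_Reload and the HTTP server are enabled in the [SERVICE] section:

```sh
# Send SIGHUP to the fluent-bit process (PID 1 in the container) to trigger
# a hot reload after the certificate secret has been rotated.
kubectl exec -n <namespace> <pod-name> -c fluent-bit -- kill -HUP 1

# Alternative: use the built-in HTTP endpoint (requires Hot_Reload On and HTTP_Server On).
curl -s -X POST http://127.0.0.1:2020/api/v2/reload
```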