Description
We're experiencing a critical issue where Fluent Bit pods are running and healthy according to Kubernetes, but they stop forwarding logs to Fluentd, causing significant log loss. This behavior occurs intermittently and is difficult to detect without manual inspection.
Fluent Bit pods appear healthy (Running status, no restarts), but logs are no longer being forwarded to Fluentd.
As a result, no logs are ingested or processed into Elasticsearch.
After restarting the Fluent Bit pods, logging resumes, but we lose all the logs generated before the restart.
We are using logging operator version 5.2.0, and our pods are running on OKE (Oracle Kubernetes Engine).
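For reference, one direction we are considering is wiring Fluent Bit's built-in health endpoint (served by its HTTP server on port 2020 when `HTTP_Server` and `Health_Check` are enabled in the `[SERVICE]` section) into a Kubernetes livenessProbe, so a stalled pod gets restarted automatically. A minimal sketch of the probe, assuming we can patch it onto the Fluent Bit container that the operator creates (the exact override mechanism is our assumption and still needs to be verified):

```yaml
# Hypothetical probe on the Fluent Bit container (the exact override path
# depends on how the logging operator lets us customize the DaemonSet).
livenessProbe:
  httpGet:
    # Fluent Bit serves /api/v1/health when HTTP_Server and Health_Check
    # are turned on in its [SERVICE] configuration.
    path: /api/v1/health
    port: 2020
  initialDelaySeconds: 30
  periodSeconds: 60
  # Restart the pod only after several consecutive failed health checks.
  failureThreshold: 3
```

The health endpoint only reflects Fluent Bit's own error/retry counters, so it may not catch a silent stall; that is partly why we are asking about metrics below.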
Is this a known issue, and what are the recommended mitigation strategies?
Can additional metrics or alerts be exposed to catch this failure?
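To make that question concrete, this is roughly the kind of alert we have in mind: Fluent Bit exposes Prometheus counters such as `fluentbit_output_proc_records_total`, so a rule that fires when a pod's output rate drops to zero while the pod is still Running would have caught this incident. A minimal sketch, assuming the Fluent Bit metrics endpoint is already scraped by Prometheus (the label names and thresholds below are illustrative assumptions):

```yaml
# Hypothetical PrometheusRule; fluentbit_output_proc_records_total is a
# standard Fluent Bit counter, but the pod label depends on how the metrics
# endpoint is scraped in a given cluster.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: fluentbit-forwarding-stalled
spec:
  groups:
    - name: fluentbit.rules
      rules:
        - alert: FluentBitForwardingStalled
          # Fires when a Fluent Bit pod has forwarded no records for 10 minutes.
          expr: sum by (pod) (rate(fluentbit_output_proc_records_total[10m])) == 0
          for: 10m
          labels:
            severity: critical
          annotations:
            summary: "Fluent Bit pod {{ $labels.pod }} has stopped forwarding logs"
```

We could also alert on `fluentbit_output_retries_failed_total` or on a growing gap between input and output record rates; guidance on which of these signals the operator already exposes out of the box would be appreciated.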