Description
original issue: elastic/beats#33258
Long story short: we run auditbeat as a DaemonSet on GKE clusters with slightly different versions; some nodes run docker, other nodes run containerd.
It runs with all the permissions it needs, and journald is already unregistered by an initContainer so that auditbeat can get audit events.
The problem is that some random auditbeat pods keep outputting this error until we restart them:
ERROR: get status request failed:failed to get audit status reply: no reply received
And if we restart a totally fine auditbeat pod, it might start outputting that error too.
It doesn't, however, stop writing audit logs to Elasticsearch. We get as many audit logs from the pods that output the error as from the other pods.
I traced down the error to this block of code:
Lines 496 to 498 in 6fba496
Wouldn't it be okay if msgs was empty? At this point we already got through this without any error:
Lines 480 to 494 in 6fba496
and func (c *NetlinkClient) Receive() already has the appropriate error checks here:
Lines 152 to 190 in 6fba496
Shouldn't len(msgs) == 0 be reported as a warning instead of an error?