[Metricbeat] Exponential backoff for http timeout in elasticsearch module

At the moment, the Elasticsearch stats APIs used by Metricbeat's elasticsearch module don't have any internal timeouts. This means that Elasticsearch keeps working on a stats request until it gets responses from all nodes or the unresponsive nodes die. We have recently observed some cases (elastic/elasticsearch#50241, for example) where a data node in a small cluster was responding very slowly but didn't disconnect from the cluster.

Meanwhile, Metricbeat was sending requests to Elasticsearch every 10 seconds with a 10-second response timeout (the default settings). Because Metricbeat gives up on a request after 10 seconds while Elasticsearch keeps processing it, every polling cycle left one more request in flight: 60 s / 10 s = 6 new in-flight requests per minute. These stats requests eventually accumulated on the master node until it crashed with an OOM error.

We are addressing this issue on the Elasticsearch side (https://github.com/elastic/elasticsearch/issues/55550), but I was hoping we could improve Metricbeat's behavior as well by introducing an exponential backoff for the timeout value.

Issue #17948, opened by imotov on Apr 23, 2020