Description
Elasticsearch version (bin/elasticsearch --version): 5.6.4
Plugins installed: []
JVM version (java -version): 1.8
OS version: Ubuntu
Description of the problem including expected versus actual behavior:
SnifferBuilder.setSniffAfterFailureDelayMillis is not honored: after a node failure, the next sniff is not scheduled with the supplied delay. Instead, the sniffer keeps using the interval configured through SnifferBuilder.setSniffIntervalMillis.
Steps to reproduce:
Use the following code, which was taken from the Elastic documentation:
SniffOnFailureListener sniffOnFailureListener = new SniffOnFailureListener();
RestClient restClient = RestClient.builder(new HttpHost("localhost", 9200))
        .setFailureListener(sniffOnFailureListener)
        .build();
Sniffer sniffer = Sniffer.builder(restClient)
        .setSniffAfterFailureDelayMillis(30000)
        .build();
sniffOnFailureListener.setSniffer(sniffer);
I think the cause of the bug is in the method Sniffer.Task.sniff().
The first call to Sniffer.Task.sniff() is made with the setSniffIntervalMillis delay, and it blocks in hostsSniffer.sniffHosts().
During that time, Sniffer.Task.sniff() is called for each failed node with the setSniffAfterFailureDelayMillis delay. Since the first call is still blocked inside Sniffer.Task.sniff(), these calls cannot reach scheduleNextRun with the setSniffAfterFailureDelayMillis delay, because the if (running.compareAndSet(false, true)) condition prevents them from doing so. Once all failed nodes have been sniffed, hostsSniffer.sniffHosts() returns and the first call sets the delay back to setSniffIntervalMillis.
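The sequence above can be reproduced with a small, self-contained model. This is only a sketch of the guard as described in this report, not the actual Sniffer source: the class name SniffGuardSketch, the scheduledDelays list, and the two delay values are hypothetical, and the on-failure call is made re-entrantly from inside the "blocked" sniff to stand in for a call arriving from another thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical, simplified model of the guard described above (not the real Sniffer code).
class SniffGuardSketch {
    // Illustrative values only.
    static final long SNIFF_INTERVAL_MILLIS = 300_000;
    static final long SNIFF_AFTER_FAILURE_DELAY_MILLIS = 30_000;

    private final AtomicBoolean running = new AtomicBoolean(false);
    // Records every delay that would have been passed to scheduleNextRun.
    final List<Long> scheduledDelays = new ArrayList<>();

    void sniff(long nextSniffDelayMillis, Runnable sniffHosts) {
        if (running.compareAndSet(false, true)) {
            try {
                sniffHosts.run(); // stands in for the blocking hostsSniffer.sniffHosts()
            } finally {
                scheduledDelays.add(nextSniffDelayMillis); // stands in for scheduleNextRun(...)
                running.set(false);
            }
        }
        // Otherwise a sniff is already in progress and this call is dropped,
        // together with the delay it was asked to schedule.
    }

    static List<Long> simulate() {
        SniffGuardSketch sniffer = new SniffGuardSketch();
        // The periodic sniff starts; while it is "blocked", a node failure
        // triggers a second sniff that asks for the shorter delay.
        sniffer.sniff(SNIFF_INTERVAL_MILLIS,
                () -> sniffer.sniff(SNIFF_AFTER_FAILURE_DELAY_MILLIS, () -> { }));
        return sniffer.scheduledDelays;
    }

    public static void main(String[] args) {
        System.out.println(simulate()); // prints [300000]
    }
}
```

Only the regular interval is ever scheduled in this model; the failure delay requested by the dropped call never reaches the scheduling step.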
I think that in the exception handling path you should set nextSniffDelayMillis to sniffAfterFailureDelayMillis, and in case of success set nextSniffDelayMillis to the value configured through setSniffIntervalMillis.
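One way to read this suggestion is sketched below. It is a minimal model under stated assumptions, not a patch against the real Sniffer class: ProposedDelaySketch, the Callable parameter, and the lastScheduledDelayMillis field are hypothetical names introduced for illustration, and the field assignment stands in for the call to scheduleNextRun.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of the proposed behaviour (not the real Sniffer code).
class ProposedDelaySketch {
    private final AtomicBoolean running = new AtomicBoolean(false);
    private final long sniffIntervalMillis;
    private final long sniffAfterFailureDelayMillis;
    // Stands in for the delay that would be passed to scheduleNextRun.
    long lastScheduledDelayMillis = -1;

    ProposedDelaySketch(long sniffIntervalMillis, long sniffAfterFailureDelayMillis) {
        this.sniffIntervalMillis = sniffIntervalMillis;
        this.sniffAfterFailureDelayMillis = sniffAfterFailureDelayMillis;
    }

    void sniff(Callable<Object> sniffHosts) {
        if (running.compareAndSet(false, true)) {
            // Success case: fall back to the regular interval.
            long nextSniffDelayMillis = sniffIntervalMillis;
            try {
                sniffHosts.call(); // stands in for hostsSniffer.sniffHosts()
            } catch (Exception e) {
                // Failure case: retry sooner, using the after-failure delay.
                nextSniffDelayMillis = sniffAfterFailureDelayMillis;
            } finally {
                lastScheduledDelayMillis = nextSniffDelayMillis;
                running.set(false);
            }
        }
    }

    public static void main(String[] args) {
        ProposedDelaySketch sniffer = new ProposedDelaySketch(300_000, 30_000);
        sniffer.sniff(() -> "ok");
        System.out.println(sniffer.lastScheduledDelayMillis); // prints 300000
        sniffer.sniff(() -> { throw new java.io.IOException("sniff failed"); });
        System.out.println(sniffer.lastScheduledDelayMillis); // prints 30000
    }
}
```

In this sketch the delay is chosen by the outcome of the sniff itself, so it no longer depends on which caller happened to win the compareAndSet guard.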