Java minimal rest client hangs with default SniffOnFailure listener enabled #25701

Closed
@ssesha

Description

Elasticsearch version: 2.4.0

Elasticsearch java rest client version: 5.2.2

Plugins installed: []

JVM version (java -version): 1.8.0_131

OS version (uname -a if on a Unix-like system): Ubuntu 16.04

Description of the problem including expected versus actual behavior:
The default SniffOnFailureListener on the REST client blocks the HttpAsyncClient reactor thread when a request encounters a java.net.ConnectException.

Steps to reproduce:

  1. Have two es nodes and let sniffer pick them up
  2. Shut down one node
  3. Client tries to connect to that node --> fails --> tries to sniff and hangs until maxRetryTimeoutMillis elapses

The failed callback triggers the sniffer https://github.com/elastic/elasticsearch/blob/master/client/rest/src/main/java/org/elasticsearch/client/RestClient.java#L374

However, the failed callback is handled by the reactor thread of the underlying HttpAsyncClient. Since the sniffer does a blocking performRequest using the same client instance, and the HttpClient can't handle that request because the reactor thread is blocked, it's effectively a deadlock until the SyncResponseListener times out after maxRetryTimeoutMillis, and no requests can be served at all during this time period. 😰
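To illustrate the mechanism described above without any Elasticsearch or HttpClient dependency, here is a minimal sketch. It assumes a single-threaded executor as a stand-in for the reactor thread, and the class and method names (`ReactorStarvationDemo`, `nestedBlockingCallTimesOut`) are made up for this example. It is not the RestClient code, just the same blocking pattern in isolation.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class ReactorStarvationDemo {

    /** Returns true if a blocking call made from the "reactor" thread timed out. */
    static boolean nestedBlockingCallTimesOut(long timeoutMillis) {
        // Assumption: one thread plays the role of the HttpAsyncClient reactor.
        ExecutorService reactor = Executors.newSingleThreadExecutor();
        try {
            // Stand-in for the failed callback, which runs on the reactor thread.
            Future<Boolean> failedCallback = reactor.submit(() -> {
                // Stand-in for the sniff request: it needs the same thread to run.
                Future<String> sniff = reactor.submit(() -> "nodes");
                try {
                    // Blocking wait, like a synchronous performRequest.
                    sniff.get(timeoutMillis, TimeUnit.MILLISECONDS);
                    return false;
                } catch (TimeoutException e) {
                    // The only thread that could serve the sniff is the one waiting.
                    return true;
                }
            });
            return failedCallback.get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            reactor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("timed out: " + nestedBlockingCallTimesOut(200));
    }
}
```

The nested task can only start once the callback returns, so the wait always runs into its timeout. In this sketch the timeout plays the role of maxRetryTimeoutMillis.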

I found a similar issue, https://issues.apache.org/jira/browse/HTTPCLIENT-1805, where the suggestion is to avoid potentially blocking or long-running operations in the callbacks, and more so in the failed callback, since it could block the reactor thread.

I guess the solution would be to trigger the retries as well as the sniffer on a separate thread pool internal to the RestClient, so that the HttpClient's dispatcher and reactor threads are freed up asap.
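A rough sketch of that idea, again using plain executors rather than the real client: the callback on the reactor thread only hands the blocking work to a second pool and returns immediately. The names `OffloadedCallbackDemo`, `sniffViaCallbackPool` and `callbackPool` are hypothetical, and how such a pool would actually be wired into RestClient is left open here.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class OffloadedCallbackDemo {

    /** Returns the "sniff" result obtained by blocking on a worker thread instead of the reactor. */
    static String sniffViaCallbackPool(long timeoutMillis) {
        // Assumption: one thread stands in for the reactor, one for the internal pool.
        ExecutorService reactor = Executors.newSingleThreadExecutor();
        ExecutorService callbackPool = Executors.newSingleThreadExecutor();
        try {
            // Blocking sniff: submits a request to the reactor and waits for it.
            Callable<String> blockingSniff = () -> {
                Future<String> sniff = reactor.submit(() -> "nodes");
                return sniff.get(timeoutMillis, TimeUnit.MILLISECONDS);
            };
            // Failed callback: runs on the reactor, but only hands off and returns.
            Callable<Future<String>> failedCallback = () -> callbackPool.submit(blockingSniff);

            Future<String> sniffResult = reactor.submit(failedCallback).get();
            return sniffResult.get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            reactor.shutdownNow();
            callbackPool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("sniff result: " + sniffViaCallbackPool(1000));
    }
}
```

Because the reactor thread is released as soon as the hand-off is done, it is free to serve the nested request, and the blocking wait happens on the worker thread instead.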
