Description
Today (5f6321a), when constructing an InternalEngine, we perform some potentially expensive operations, including:
at org.elasticsearch.index.IndexWarmer.warm(IndexWarmer.java:81)
at org.elasticsearch.index.IndexService.lambda$createShard$4(IndexService.java:402)
at org.elasticsearch.index.engine.InternalEngine$SearchFactory.newSearcher(InternalEngine.java:2244)
at org.apache.lucene.search.SearcherManager.getSearcher(SearcherManager.java:198)
at org.elasticsearch.index.engine.InternalEngine$ExternalSearcherManager.<init>(InternalEngine.java:326)
at org.elasticsearch.index.engine.InternalEngine.createSearcherManager(InternalEngine.java:598)
at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:238)
at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:184)
and
at org.elasticsearch.index.engine.InternalEngine.restoreVersionMapAndCheckpointTracker(InternalEngine.java:2820)
at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:257)
at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:184)
We create the Engine under IndexShard#mutex:
elasticsearch/server/src/main/java/org/elasticsearch/index/shard/IndexShard.java
Lines 1448 to 1458 in 0cfc9ff
This can block cluster state updates, because IndexShard#updateShardState requires the same mutex, so it has to wait until the engine has been constructed.
We should survey the things we do during the startup of all of the engines and make sure that none of them will block for too long.
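To make the contention concrete, here is a minimal sketch of the two patterns. It is not Elasticsearch code: `ShardSketch`, `startEngineUnderMutex`, `startEngineOutsideMutex` and `holdsMutex` are hypothetical names, and the `Supplier` stands in for the expensive `InternalEngine` constructor. The first method mirrors what the issue describes (construction while holding the mutex); the second builds the engine first and takes the mutex only to publish the result.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for IndexShard; only the locking pattern is modelled.
class ShardSketch {
    private final Object mutex = new Object();
    private Object engine; // guarded by mutex

    // Pattern described in the issue: the (possibly slow) factory runs while
    // the mutex is held, so anything else needing the mutex must wait.
    void startEngineUnderMutex(Supplier<Object> engineFactory) {
        synchronized (mutex) {
            engine = engineFactory.get();
        }
    }

    // Alternative sketch: do the slow work first, then hold the mutex only
    // for the cheap assignment. A real fix would also need to re-check the
    // shard state under the mutex (e.g. the shard may have been closed
    // meanwhile); that is omitted here.
    void startEngineOutsideMutex(Supplier<Object> engineFactory) {
        Object newEngine = engineFactory.get();
        synchronized (mutex) {
            engine = newEngine;
        }
    }

    // True if the calling thread currently holds the mutex.
    boolean holdsMutex() {
        return Thread.holdsLock(mutex);
    }

    public static void main(String[] args) {
        ShardSketch shard = new ShardSketch();
        shard.startEngineUnderMutex(() -> {
            System.out.println("under mutex: " + shard.holdsMutex());   // true
            return new Object();
        });
        shard.startEngineOutsideMutex(() -> {
            System.out.println("outside mutex: " + shard.holdsMutex()); // false
            return new Object();
        });
    }
}
```

In the first variant a concurrent caller that synchronizes on the same mutex, as IndexShard#updateShardState does, is blocked for the whole duration of the factory call; in the second it is blocked only for the assignment.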
Relates to https://discuss.elastic.co/t/187604, in which the engine takes multiple minutes to start up because it is loading global ordinals. This blocks a cluster state update for long enough that the node is removed from the cluster.