Description
With our current approach to phasing searches (query then fetch) we assume that the shards servicing a query will respond within similar time frames. We keep Lucene reader contexts open until all shards have responded to the query phase so that we can then fetch the documents for the final set of top doc IDs.
If there is a big gap in response times between the start of the query and the final set of top matches being returned from the query phase, we hang on to those resources for longer than we'd like. Frozen indices represent such a scenario: "hot" nodes undergoing constant indexing and search activity may be asked to hold several reader contexts open for longer while the slower frozen indices hit by the same query lag behind in producing their responses.
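For illustration, a minimal sketch of such a search spanning hot and frozen indices (the index pattern, field names and values here are made up). With `ignore_throttled=false` the frozen shards participate in the same query phase, so the hot shards' reader contexts stay open until the slowest frozen shard has answered:

```
# hypothetical index pattern and field names, for illustration only
GET /calls-hot-*,calls-frozen-*/_search?ignore_throttled=false
{
  "query": {
    "term": { "caller_number": "555-0100" }
  },
  "sort": [ { "@timestamp": "desc" } ],
  "size": 10
}
```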
One such scenario might be a phone company that keeps a "hot" window of 2 or 3 days' call data, but has an investigation team who occasionally need to search a phone number's calls going back a few years across frozen indices and are prepared to wait for the response.
In this case the long-running searches will impact the number of points-in-time that need to be kept open on the hot nodes. Of course the applications could use top-hits aggs instead of regular hits to make searches finish in the query phase (as sketched below). That would be a work-around, but the concern raised here is that the default behaviour may be a little "trappy".
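A rough sketch of that work-around (same hypothetical index pattern and field names as above): asking for `"size": 0` regular hits and retrieving the documents through a `top_hits` aggregation means the response can be assembled during the query phase, with no separate fetch round-trip:

```
# hypothetical index pattern and field names, for illustration only
GET /calls-hot-*,calls-frozen-*/_search?ignore_throttled=false
{
  "size": 0,
  "query": {
    "term": { "caller_number": "555-0100" }
  },
  "aggs": {
    "matching_calls": {
      "top_hits": {
        "size": 10,
        "sort": [ { "@timestamp": { "order": "desc" } } ]
      }
    }
  }
}
```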
Thoughts? cc @jpountz