Very large scroll search (i.e. reindex) can gradually slow down 

Version 7.7 added a better ability to cancel a search request (via this [PR](https://github.com/elastic/elasticsearch/pull/52822)). However, that change [added](https://github.com/elastic/elasticsearch/pull/52822/files#diff-9f5585442c620397550df0ceb9faf151e8042e3c7444254ebf562153073c1d3fR269) each task-cancellation check to a collection on the context searcher. That collection is [checked very frequently](https://github.com/elastic/elasticsearch/blob/v7.7.0/server/src/main/java/org/elasticsearch/search/internal/ContextIndexSearcher.java#L355), and its size can grow without bound. The memory footprint is not the issue; the problem is the number of iterations over the collection during very long-running scroll searches, such as those used by re-index. In testing, the problem started to show at around 50m documents, and search latency kept increasing as time went on.
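
The growth described above can be illustrated with a toy model. This is only a sketch, not Elasticsearch code: `ToySearcher`, `run_scroll`, and the assumption that every scroll batch registers exactly one more check on the same searcher are all hypothetical simplifications. It counts how many check callbacks run per batch when the searcher is reused versus re-created.

```python
class ToySearcher:
    """Hypothetical stand-in for a long-lived searcher; not Elasticsearch code."""

    def __init__(self):
        self.cancellation_checks = []  # never pruned while the searcher lives
        self.calls = 0                 # total callbacks invoked so far

    def add_check(self, check):
        self.cancellation_checks.append(check)

    def check_cancelled(self):
        # Every check walks the whole collection.
        for check in self.cancellation_checks:
            check()
            self.calls += 1


def run_scroll(batches, checks_per_batch, reuse_searcher):
    """Return the number of callbacks invoked during each scroll batch."""
    searcher = ToySearcher()
    per_batch = []
    for _ in range(batches):
        if not reuse_searcher:
            searcher = ToySearcher()      # fresh searcher for each phase
        searcher.add_check(lambda: None)  # assumed: one new check per batch
        before = searcher.calls
        for _ in range(checks_per_batch):
            searcher.check_cancelled()
        per_batch.append(searcher.calls - before)
    return per_batch


print(run_scroll(4, 10, reuse_searcher=True))   # [10, 20, 30, 40]
print(run_scroll(4, 10, reuse_searcher=False))  # [10, 10, 10, 10]
```

In this model the work per batch grows linearly with the number of batches already processed when the searcher is reused, which would match a latency that keeps climbing over a long scroll.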

Below is a test run re-indexing 180m documents; it shows search latency increasing and the search rate decreasing.

(7.9.1)
![image](https://user-images.githubusercontent.com/976291/100934058-cf801480-34b3-11eb-9056-93d93d47382a.png)

Hot threads will look similar to:
```
  2.9% (29.3ms out of 1s) cpu usage by thread 'elasticsearch[node1][search][T#93]'
     2/10 snapshots sharing following 20 elements
       app//org.elasticsearch.search.internal.ContextIndexSearcher$MutableQueryTimeout.checkCancelled(ContextIndexSearcher.java:357)
       app//org.elasticsearch.search.internal.ContextIndexSearcher.searchLeaf(ContextIndexSearcher.java:196)
       app//org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:185)
       app//org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:445)
       app//org.elasticsearch.search.query.QueryPhase.searchWithCollector(QueryPhase.java:343)
       app//org.elasticsearch.search.query.QueryPhase.executeInternal(QueryPhase.java:298)
       app//org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:150)
       app//org.elasticsearch.search.SearchService.lambda$executeQueryPhase$1(SearchService.java:485)
       app//org.elasticsearch.search.SearchService$$Lambda$5754/0x0000000801a8b040.get(Unknown Source)
       app//org.elasticsearch.search.SearchService$$Lambda$5270/0x0000000801a2d040.get(Unknown Source)
       app//org.elasticsearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:58)
       app//org.elasticsearch.action.ActionRunnable$$Lambda$5092/0x00000008019a2840.accept(Unknown Source)
       app//org.elasticsearch.action.ActionRunnable$2.doRun(ActionRunnable.java:73)
       app//org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
       app//org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:44)
       app//org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:710)
       app//org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
       java.base@14.0.1/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
       java.base@14.0.1/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
       java.base@14.0.1/java.lang.Thread.run(Thread.java:832)
```
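
To check a captured hot threads dump for this signature, something like the following sketch may help. The function name `threads_in_cancellation_check` is hypothetical, and the parsing assumes the text layout shown above (a `cpu usage by thread '...'` header followed by stack frames); other versions may format the output differently.

```python
import re

# Frame taken from the hot threads output above.
MARKER = "ContextIndexSearcher$MutableQueryTimeout.checkCancelled"
# Matches the header line that opens each thread's section.
THREAD_HEADER = re.compile(r"cpu usage by thread '([^']+)'")


def threads_in_cancellation_check(hot_threads_text):
    """Return names of threads whose sampled stacks contain MARKER."""
    matches = []
    current = None
    for line in hot_threads_text.splitlines():
        header = THREAD_HEADER.search(line)
        if header:
            current = header.group(1)
        elif MARKER in line and current is not None and current not in matches:
            matches.append(current)
    return matches


sample = """\
  2.9% (29.3ms out of 1s) cpu usage by thread 'elasticsearch[node1][search][T#93]'
     2/10 snapshots sharing following 20 elements
       app//org.elasticsearch.search.internal.ContextIndexSearcher$MutableQueryTimeout.checkCancelled(ContextIndexSearcher.java:357)
       app//org.elasticsearch.search.internal.ContextIndexSearcher.searchLeaf(ContextIndexSearcher.java:196)
"""
print(threads_in_cancellation_check(sample))
```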

This issue is fixed as of 7.10.0 by https://github.com/elastic/elasticsearch/pull/61062 and https://github.com/elastic/elasticsearch/issues/46523, which now re-create the searcher on each phase, even for scroll requests. This means the collection no longer grows unbounded. The same test as above was run on 7.10.0 and showed no signs of performance degradation.

For 7.7 through 7.9.x there is an easy workaround for this issue:
```
PUT _cluster/settings
{
  "persistent": {
    "search.low_level_cancellation" : false
  }
}
```
This prevents that collection from being used at all (also tested to fix the issue).
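
For scripting the same change, a minimal sketch using only the Python standard library might look like this. The helper name `build_disable_low_level_cancellation_request` and the base URL `http://localhost:9200` are assumptions for illustration; a secured cluster would also need authentication and TLS settings, which are not shown. The function only builds the request, so nothing is sent until it is passed to `urllib.request.urlopen`.

```python
import json
import urllib.request


def build_disable_low_level_cancellation_request(base_url):
    """Build (but do not send) the PUT _cluster/settings request shown above."""
    body = {"persistent": {"search.low_level_cancellation": False}}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Assumed address of a local, unsecured node.
req = build_disable_low_level_cancellation_request("http://localhost:9200")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))

# To apply the setting against a real cluster, send the request:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```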





