[RFC] Tracking Search latency and stats at Coordinator node level

Co-Author: @jainankitk 

### Problem Statement
As of today, we track search request latencies on a shard level via node stats. After every query/fetch phase is completed on a shard, we note down the time taken for each, keep accumulating those values and maintain an overall average value which is tracked under stats.

But we don’t have a mechanism to track search latencies around coordinator node. Coordinator node plays an important role in fanning out requests to individual shard/data-nodes, aggregating those responses and eventually sending response back to the client. We have seen multiple issues in the past where it becomes hard/impossible to reason latency related issues because of lack of insights into coordinator level stats and we ended up spending a lot of unnecessary time/bandwidth on figuring it out. Clients using search API only rely on overall took time(present as part of search response) which doesn’t offer much insights into time taken by different phases.

### Solution
We want to introduce below high level changes to give us more insights into coordinator level latencies.

1. **Per Request level tracking**: As part of this, we will offer further breakdown of existing took time in search response. To do this, we will introduce a new field(phase_took) in search response which will give more insights/visibility into overall time taken by different search phases(query/fetch/canMatch etc) to the clients.
```
{
    "took" : 29321,
    "phase_took" : { // new field
        "fetch" : 29253,
        "query" : 14,
    },
    "timed_out" : false,
    "_shards" : {
        "total" : 5,
        "successful" : 5,
        "skipped" : 0,
        "failed" : 0
    },
    "hits" : {
        "total" : {
        "value" : 0,
        "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
    }
}
```
2. **Add coordinator node level stats**: As part of this, we will also track search stats from coordinator node perspective. Currently we track search shard stats under IndicesStats. We can utilize same stats and also track overall search request level stats ie (total time taken by coordinator + communication time between coordinator and data node + execution time at data node)

```
"indices" : {
        "search" : {
          "open_contexts" : 1,
          "query_total" : 117336056,
          "query_time_in_millis" : 2244945904,
          "query_current" : 1,
          "fetch_total" : 47060301,
          "fetch_time_in_millis" : 78532219,
          "fetch_current" : 0,
          "scroll_total" : 466,
          "scroll_time_in_millis" : 318294065,
          "scroll_current" : 0,
          "suggest_total" : 0,
          "suggest_time_in_millis" : 0,
          "suggest_current" : 0,
          "request": { // New field. Contains overall search requests stats from coordinator node perspective.
             "query_time_in_millis" : 5244945904,
             "query_current" : 1,
             "query_total" : 7336056,
             "fetch_time_in_millis": 178532219,
             "fetch_current" : 0,
              "fetch_total" : 7060301
          }
        }
```


3. **Slow logs at coordinator level**: As of now, we only have the capability to enable slow logs at a shard level for desired search phase(query and fetch). See [this](https://opensearch.org/docs/1.2/opensearch/logs/#slow-logs).  Setting this threshold is tricky when customer usually sees latency spikes at a request level. Plus shard level slow logs doesn't offer a holistic view.
So as part of this, we will also add capabilities to capture slow logs at a request level along with different search phases from coordinator node perspective.

We can plan to use new prefix **index.search.request** to achieve this. 

```
// Setting on a whole search request level
index.search.request.slowlog.threshold.overall.warn: 10s
index.search.request.slowlog.threshold.overall.info: 5s
index.search.request.slowlog.threshold.overall.debug: 2s
index.search.request.slowlog.threshold.overall.trace: 500ms

// Setting on query phase level
index.search.request.slowlog.threshold.query.warn: 10s
.....

// Setting on fetch phase level
index.search.request.slowlog.threshold.fetch.warn: 1s
.....
```

### Approach
We want to design this in a way so that it is simplistic, extensible and has negligible performance impact.

#### Listener based approach
We can utilize a listener based approach which is pretty similar the way current shard level stats and slow log works. Via this mechanism we can solve all the three problems mentioned above.

Here, before the start and end of every phase at a coordinator node level, we will invoke listener which will help us keep track of the took time of individual search phases(query/fetch/canMatch)

**High level Listener interface** 
```
// Listener to track start and end of phases at a coordinator node level
public interface SearchRequestOperationsListener {
    
    void onRequestStart(SearchPhaseContext context) {}
    void onRequestEnd(SearchPhaseContext context) {}
   
    // Query phase
    void onQueryPhaseStart(SearchPhaseContext context) {}
    void onQueryPhaseEnd(SearchPhaseContext phase, long tookTime) {}
    void onQueryPhaseFailure(SearchPhaseContext context) {}
    
    // Fetch phase
    void onFetchPhaseStart(SearchPhaseContext context) {}
    void onFetchPhaseEnd(SearchPhaseContext phase, long tookTime) {}
    void onFetchPhaseFailure(SearchPhaseContext context) {}
    
        
    // Gives a capability to register multiple listeners and invoke them accordingly.
    final class CompositeListener implements SearchRequestOperationsListener {
        private final List<SearchRequestOperationsListener> listeners;
        private final Logger logger;

        CompositeListener(List<SearchRequestOperationsListener> listeners, Logger logger) {
            this.listeners = listeners;
            this.logger = logger;
        }

       public void onQueryPhaseStart(SearchPhaseContext context) {
           for (SearchRequestOperationsListener listener : listeners) {
               try {
                   listener.onQueryPhaseStart(context);
               } catch (Exception e) {}
           }
       }

       public void onQueryPhaseEnd(SearchPhaseContext context, long tookTime) {
           for (SearchRequestOperationsListener listener : listeners) {
               try {
                   listener.onQueryPhaseEnd(context, tookTime);
               } catch (Exception e) {}
           }
       }
    }
}
```

Above listener will be invoked from [AbstractSearchAsyncAction](https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/action/search/AbstractSearchAsyncAction.java) which serves as an entry point to the search request landing at coordinator node. This action has a mechanism to keep track as to when any phase starts or ends. See [this](https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/action/search/AbstractSearchAsyncAction.java#L425) and [this](https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/action/search/AbstractSearchAsyncAction.java#L362).

Below components will subscribe to above listener to achieve our goal.

-  **[SearchTimeProvider](https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/action/search/TransportSearchAction.java#L242):** Currently keeps track of overall took time in search response. To offer breakdown of each phase took time, we will use same and track those via above listener.
- **SearchRequestStats**: A new class to keep track of coordinator level stats. We will encapsulate it within existing [SearchStats](https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/index/search/stats/SearchStats.java). SearchRequestStats will also subscribe to above listener events.
- **SearchRequestSlowLogs**: Same as above. This will also implement listener and via those events will provide the capability to capture slow logs at coordinator/request level.

![latency_listener_based](https://user-images.githubusercontent.com/99425694/235818156-0375bfb3-6829-4e2f-9cd8-b89b7137f14e.png)


### Scenarios to cover

- **Search type**: We will handle both queryAndThenFetch and DFS search types of part of this.
- **SearchPhase**: We will cover all possible search phases which exists as of today. Basically every search phase implements base class [SearchPhase](https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/action/search/SearchPhase.java). This includes:
    1. CanMatchPreFilterPhase - Only valid for QueryAndThenFetch type. 
    2. PreQueryPhase/DFSPhase - Only valid for DFS search type.
    3. QueryPhase - Only Valid for QueryAndThenFetch type
    4. DFSQueryPhase - Only valid for DFS type.
    5. FetchSearchPhase - Valid for both types.
    6. ExpandSearchPhase - Valid for both types. This is executed if search request has inner hits and needs field collapsing
 
- **MultiSearch** - Multi search response has individual search request responses. We will accordingly provide phase wise breakdown.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[RFC] Tracking Search latency and stats at Coordinator node level #7334

Problem Statement

Solution

Approach

Listener based approach

Scenarios to cover

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development