Description
Is your feature request related to a problem? Please describe.
Today, search request latencies are tracked at the shard level via node stats. After each query or fetch phase completes on a shard, we record the time taken, accumulate those values, and maintain an overall average that is exposed under stats.
However, there is no mechanism to track search latencies at the coordinator node. The coordinator node plays an important role: it fans out requests to individual shards/data nodes, aggregates their responses, and eventually sends the response back to the client. We have seen multiple issues in the past where it was hard or impossible to reason about latency-related problems because of the lack of insight into coordinator-level stats, and we ended up spending a lot of unnecessary time and bandwidth figuring them out. Clients using the search API can rely only on the overall took time (present in the search response), which offers little insight into the time taken by the different phases.
Parent RFC: #7334
Describe the solution you'd like
Per-request tracking: As part of this, we will offer a further breakdown of the existing took time in the search response. To do this, we will introduce a new field (phase_took) in the search response, which gives clients more visibility into the overall time taken by the different search phases (query, fetch, canMatch, etc.).
{
"took" : 92,
"phase_took" : { // new field
"dfs_prequery" : 0,
"can_match" : 0,
"query" : 66,
"fetch" : 4,
"expand_search" : 0
},
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
}
}
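As a rough client-side illustration of how the new field could be consumed, the sketch below parses a response shaped like the example above and reports the slowest phase plus the portion of took that is not attributed to any listed phase. The helper name `phase_breakdown` is hypothetical, and the sketch assumes that phase_took values use the same unit as took (milliseconds). Treating the remainder as time spent outside the listed phases is an interpretation, not something this proposal guarantees.

```python
import json

def phase_breakdown(response: dict) -> dict:
    """Summarise per-phase times from a search response (hypothetical helper).

    Assumes `phase_took` values are in the same unit as `took`.
    """
    took = response["took"]
    phases = response.get("phase_took")  # absent when the feature is disabled
    if phases is None:
        return {"took": took, "phases": {}, "unattributed": None}
    attributed = sum(phases.values())
    return {
        "took": took,
        "phases": dict(phases),
        # Time in `took` that no listed phase accounts for (assumed interpretation).
        "unattributed": took - attributed,
    }

# Trimmed copy of the example response from this issue.
sample = json.loads("""
{
  "took": 92,
  "phase_took": {
    "dfs_prequery": 0,
    "can_match": 0,
    "query": 66,
    "fetch": 4,
    "expand_search": 0
  },
  "timed_out": false
}
""")

result = phase_breakdown(sample)
slowest = max(result["phases"], key=result["phases"].get)
print(slowest, result["unattributed"])
```

For the example response, the listed phases add up to 70 of the 92 reported in took, so the helper reports 22 as unattributed and query as the slowest phase.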
Additional context
Per-request phase_took times will be disabled by default, since applications will not expect this new response field. Users can enable the feature via a query parameter or a cluster setting. This gives users the flexibility to enable it at the cluster level while also turning it on or off as needed for individual requests.
// Query param
GET /_search?phase_took
GET /_search?phase_took=true
GET /_search?phase_took=false
// Cluster setting
"search.phase_took_enabled"
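The interaction between the two switches can be sketched as follows. This is only an illustration of one plausible precedence rule, not the actual implementation: it assumes that an explicit query parameter overrides the cluster setting, that a bare `?phase_took` is treated as true, and that the cluster setting defaults to false. The function names are hypothetical.

```python
from typing import Optional

def parse_flag(raw: Optional[str]) -> Optional[bool]:
    """Interpret the raw phase_took query parameter value.

    None means the parameter was not supplied; an empty string models the
    bare form `?phase_took` (assumed here to mean true).
    """
    if raw is None:
        return None
    value = raw.strip().lower()
    if value in ("", "true"):
        return True
    if value == "false":
        return False
    raise ValueError(f"invalid phase_took value: {raw!r}")

def phase_took_enabled(query_param: Optional[str],
                       cluster_setting: bool = False) -> bool:
    """Decide whether to include phase_took in the response.

    Assumed precedence: an explicit request-level value wins; otherwise
    fall back to the search.phase_took_enabled cluster setting.
    """
    flag = parse_flag(query_param)
    return cluster_setting if flag is None else flag

# Cluster setting on, but this request opts out.
print(phase_took_enabled("false", cluster_setting=True))
# Cluster setting off (default), request opts in with the bare flag.
print(phase_took_enabled(""))
```

Under these assumptions, a request can opt out even when the cluster-level setting is on, and opt in when it is off, which matches the flexibility described above.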
Status
✅ Done