Search Latency Tracking - Per Request Phase Took Time #9650

Closed
@dzane17

Description

Is your feature request related to a problem? Please describe.
Today, we track search request latencies at the shard level via node stats. After each query or fetch phase completes on a shard, we record the time taken, accumulate those values, and maintain an overall average that is reported under stats.
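To make the shard-level accounting concrete, here is a minimal sketch of deriving an average per-phase latency from cumulative counters. It assumes the `query_total` / `query_time_in_millis` and `fetch_total` / `fetch_time_in_millis` fields of the node stats `indices.search` section; the helper name and the sample numbers are made up for illustration.

```python
def average_latency_ms(search_stats, phase):
    """Hypothetical helper: average time per operation for one phase.

    `search_stats` is assumed to look like the `indices.search` section of
    node stats, i.e. cumulative counters such as `query_total` and
    `query_time_in_millis`.
    """
    total = search_stats[f"{phase}_total"]
    time_ms = search_stats[f"{phase}_time_in_millis"]
    # Avoid dividing by zero on a node that has not served any requests yet.
    return time_ms / total if total else 0.0


# Made-up sample values, not taken from a real cluster.
sample_stats = {
    "query_total": 200,
    "query_time_in_millis": 13200,
    "fetch_total": 150,
    "fetch_time_in_millis": 600,
}

for phase in ("query", "fetch"):
    print(f"{phase}: {average_latency_ms(sample_stats, phase):.1f} ms on average")
```

An average like this hides per-request variation and says nothing about time spent on the coordinator, which is the gap this issue is about.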

But we don’t have a mechanism to track search latencies at the coordinator node. The coordinator node plays an important role: it fans out requests to individual shards/data nodes, aggregates their responses, and eventually sends the response back to the client. We have seen multiple issues in the past where it was hard or impossible to reason about latency-related problems because of the lack of insight into coordinator-level stats, and we ended up spending a lot of unnecessary time and bandwidth figuring them out. Clients using the search API can rely only on the overall took time (present in the search response), which doesn’t offer much insight into the time taken by different phases.

Parent RFC: #7334

Describe the solution you'd like
Per-request tracking: As part of this, we will offer a further breakdown of the existing took time in the search response. To do this, we will introduce a new field (phase_took) in the search response, which will give clients more visibility into the time taken by each search phase (query, fetch, canMatch, etc.).

{
  "took" : 92,
  "phase_took" : {  // new field
    "dfs_prequery" : 0,
    "can_match" : 0,
    "query" : 66,
    "fetch" : 4,
    "expand_search" : 0
  },
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}
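As an illustration of how a client might consume the proposed field, the sketch below parses a trimmed copy of the response above and reports each phase alongside the part of `took` not attributed to any phase. The helper name `summarize_phase_took` is hypothetical, and treating the remainder as unattributed time is an assumption about how the numbers relate, not something the proposal guarantees.

```python
import json

# Trimmed copy of the sample response above. The "// new field" comment is
# dropped because strict JSON does not allow comments.
SAMPLE_RESPONSE = """
{
  "took": 92,
  "phase_took": {
    "dfs_prequery": 0,
    "can_match": 0,
    "query": 66,
    "fetch": 4,
    "expand_search": 0
  },
  "timed_out": false
}
"""


def summarize_phase_took(response):
    """Hypothetical helper: return (phases, unattributed_ms).

    Returns (None, None) when the response has no phase_took field,
    which is the default behaviour described in this issue.
    """
    phases = response.get("phase_took")
    if phases is None:
        return None, None
    # Assumption: whatever is left of `took` after summing the phases is
    # time not attributed to any listed phase.
    unattributed = response["took"] - sum(phases.values())
    return phases, unattributed


response = json.loads(SAMPLE_RESPONSE)
phases, unattributed = summarize_phase_took(response)
for name, ms in phases.items():
    print(f"{name}: {ms} ms")
print(f"not attributed to a phase: {unattributed} ms")
```

Because the field is opt-in, a client along these lines should tolerate its absence rather than assume it is always present.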

Additional context
Per-request phase_took times will be disabled by default, since applications will not expect this new response field. Users can enable the feature via a query parameter or a cluster setting. This lets users set a cluster-wide default while still turning the field on or off for individual requests.

// Query param
GET /_search?phase_took
GET /_search?phase_took=true
GET /_search?phase_took=false

// Cluster setting
"search.phase_took_enabled"
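The two switches above could be combined roughly as follows. This is only a sketch: it assumes that an explicit request parameter takes precedence over the cluster setting, that a bare `?phase_took` means true, and that the setting is applied through the standard `PUT /_cluster/settings` body. The host address and all helper names are hypothetical, and nothing is sent over the network.

```python
import json
from urllib.parse import urlencode

BASE_URL = "http://localhost:9200"  # assumption: address of a local cluster


def resolve_phase_took(request_param, cluster_enabled=False):
    """Hypothetical resolution of the effective on/off value.

    request_param is None when the parameter is absent, "" for a bare
    `?phase_took`, or the literal "true" / "false".
    Assumption: the request parameter, when present, overrides the
    cluster-level default.
    """
    if request_param is None:
        return cluster_enabled
    return request_param in ("", "true")


def search_url(phase_took=None):
    """Build a _search URL, adding the query parameter only when given."""
    url = f"{BASE_URL}/_search"
    if phase_took is None:
        return url
    return url + "?" + urlencode({"phase_took": str(phase_took).lower()})


def cluster_setting_body(enabled):
    """Body for PUT /_cluster/settings (assumes the setting can be set this way)."""
    return json.dumps({"persistent": {"search.phase_took_enabled": enabled}})


print(search_url(True))
print(cluster_setting_body(True))
print(resolve_phase_took("false", cluster_enabled=True))
```

Under these assumptions, a cluster with the setting enabled still returns no breakdown for a request sent with `phase_took=false`.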

Metadata

Labels: Search (Search query, autocomplete ...etc), enhancement (Enhancement or improvement to existing feature or request)
Project status: Done