Improve discover query #69049

@nik9000

I bumped into a `_search` request generated by Discover that had a few things in it that looked like they'd slow Elasticsearch down. I'm wondering if we can do anything to speed it up:

curl -XPOST -HContent-Type:application/json ????????   -d'{
  "version": true,  <--- do we really need this?
  "size": 500,      <--- this is fairly large. too big to fit on the screen, right?
  "sort": [
    {
      "@timestamp": {
        "order": "desc",
        "unmapped_type": "boolean" <---- wat. `boolean` is an odd choice for a date field
      }
    }
  ],
  "aggs": {    <---- having an agg in the same request as a non-zero `size` turns off agg caching and doesn't let us terminate early when fetching the top hits
    "2": {
      "date_histogram": {
        "field": "@timestamp",
        "fixed_interval": "30m",
        "time_zone": "America/Chicago",
        "min_doc_count": 1
      }
    }
  },
  "stored_fields": [  <---- stored_fields should be pretty rare. I'd expect leaving this off would mostly produce all you need and adding it will fetch more than you need.
    "*"
  ],
  "script_fields": {},
  "docvalue_fields": [   <----- that is a lot of fields. doc_values are a column store so aren't going to be efficient to fetch. I guess you do this to get a formatted date. https://github.com/elastic/elasticsearch/issues/55363 will help with that.
    {
      "field": "@timestamp",
      "format": "date_time"
    },
    .... 11 other date_time fields
  ],
  "_source": {   <----- this is confusing. I think it means "don't filter" but I'd have to look it up. It's way, way less confusing to leave this out if you don't need any filtering.
    "excludes": []
  },
  "query": {
    "bool": {   <----- This looks big but it is 100% ok. We'll rewrite it down to just the range query.
      "must": [],
      "filter": [
        {
          "match_all": {}
        },
        {
          "range": {
            "@timestamp": {
              "gte": "2020-06-08T05:00:00.000Z",
              "lte": "2020-06-09T02:06:38.662Z",
              "format": "strict_date_optional_time"
            }
          }
        }
      ],
      "should": [],
      "must_not": []
    }
  },
  "highlight": {
    "pre_tags": [
      "@kibana-highlighted-field@"
    ],
    "post_tags": [
      "@/kibana-highlighted-field@"
    ],
    "fields": {
      "*": {}       <--- this is expensive. There are 100 fields in this index. In this particular case the search query may be too simple to highlight. I'm not sure. But I am certain that if the query *isn't* super simple then this will be expensive.
    },
    "fragment_size": 2147483647 <---- this is asking ES to OOM if there are large documents.
  }
}
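
One way to act on several of the annotations above is to send the histogram and the hits as two separate requests. The following is a minimal Python sketch of that idea, not Kibana's actual implementation: `split_discover_request` and its `page_size` parameter are hypothetical names, and the choice to drop `version`, `stored_fields`, the empty `script_fields`, and the no-op `_source` filter assumes nothing downstream depends on them. It leaves `sort`, `docvalue_fields`, and `highlight` untouched.

```python
import copy


def split_discover_request(body, page_size=None):
    """Split one Discover-style _search body into (hits_body, aggs_body).

    Hypothetical helper for illustration only. The input dict is not modified.
    """
    hits = copy.deepcopy(body)

    # The aggregation moves to its own request, so remove it from the hits request.
    hits.pop("aggs", None)

    # Assumption: nothing reads the per-hit version or the stored fields.
    hits.pop("version", None)
    hits.pop("stored_fields", None)

    # An empty script_fields object adds nothing, so leave it out.
    if not hits.get("script_fields"):
        hits.pop("script_fields", None)

    # {"excludes": []} filters nothing; omitting _source is easier to read.
    if hits.get("_source") == {"excludes": []}:
        hits.pop("_source", None)

    # Optionally ask for fewer hits than the original request did.
    if page_size is not None:
        hits["size"] = page_size

    # The aggregation request returns no hits (size 0), which is the shape
    # Elasticsearch's shard request cache handles by default.
    aggs = {"size": 0}
    if "query" in body:
        aggs["query"] = copy.deepcopy(body["query"])
    if "aggs" in body:
        aggs["aggs"] = copy.deepcopy(body["aggs"])

    return hits, aggs
```

Whether two round trips end up faster than one combined request would need measuring; the sketch only shows what the two bodies could look like.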

Labels: Feature:Discover (Discover Application), Feature:Search (Querying infrastructure in Kibana), Team:DataDiscovery, t//enhancement, impact:low, loe:small
