Range field performance regression after Lucene compression change #53498

Closed
@markharwood

Description

Nightly benchmarks revealed a performance regression in searches on range fields following a Lucene change that added compression to binary doc values.
While the change reduced disk storage costs, the median search time for the affected benchmark query increased from 12 ms to 42 ms. The query ran against the "noaa" dataset with these clauses:

{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "station.country_code": "JA"
          }
        },
        {
          "range": {
            "TRANGE": {
              "gte": 0,
              "lte": 30
            }
          }
        }
      ]
    }
  }
}

This issue was opened to investigate the cause of the slowdown and possible fixes.
The range content is not expected to compress much, and given how LZ4 works, poorly compressible data should be fast to decompress. However, the Lucene change adds another cost: values are now loaded in groups of 32 documents rather than one document at a time, so retrieving a single document's value means decoding its whole group. This may be behind the slowdown.
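To make the suspected cost concrete, here is a minimal toy model of block-wise storage. It is not Lucene's implementation: `zlib` stands in for LZ4 (only because it ships with Python), the 16-byte range encoding is invented, and `BlockReader` and its single-block cache are hypothetical. It only counts how many bytes must be decoded when a query touches one document per group of 32.

```python
import struct
import zlib

BLOCK_SIZE = 32   # documents per compressed group, as described in this issue
VALUE_LEN = 16    # assumed toy encoding: two big-endian doubles per range value


def encode_range(lo, hi):
    """Encode a range as 16 bytes (illustrative, not Lucene's format)."""
    return struct.pack(">dd", lo, hi)


def build_blocks(values, block_size=BLOCK_SIZE):
    """Compress values in groups of block_size (zlib as a stand-in for LZ4)."""
    blocks = []
    for start in range(0, len(values), block_size):
        chunk = b"".join(values[start:start + block_size])
        blocks.append(zlib.compress(chunk))
    return blocks


class BlockReader:
    """Hypothetical reader: fetching one value decodes its whole group."""

    def __init__(self, blocks):
        self.blocks = blocks
        self.bytes_decoded = 0
        self._cached_id = -1
        self._cached = b""

    def get(self, doc_id):
        block_id, offset = divmod(doc_id, BLOCK_SIZE)
        if block_id != self._cached_id:
            # Cache miss: decode all values in the group.
            self._cached = zlib.decompress(self.blocks[block_id])
            self._cached_id = block_id
            self.bytes_decoded += len(self._cached)
        start = offset * VALUE_LEN
        return self._cached[start:start + VALUE_LEN]


values = [encode_range(float(i), float(i + 30)) for i in range(1024)]
reader = BlockReader(build_blocks(values))

# Sparse access pattern: the query matches one document in every group.
sparse_docs = list(range(0, len(values), BLOCK_SIZE))
for doc_id in sparse_docs:
    assert reader.get(doc_id) == values[doc_id]

bytes_needed = len(sparse_docs) * VALUE_LEN
amplification = reader.bytes_decoded / bytes_needed
print(f"decoded {reader.bytes_decoded} bytes to use {bytes_needed} "
      f"({amplification:.0f}x read amplification)")
```

In this model the worst case is a selective filter (such as the `term` clause above) that leaves matching documents spread thinly across groups, so almost every lookup pays for a full group decode. Whether that pattern explains the measured regression is exactly what this issue needs to establish.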

Labels

:Performance, :Search/Search, v7.7.0
