Description
Nightly benchmarks revealed a performance regression in searches on range fields following a change to add compression to binary doc values in Lucene.
While the change reduced disk storage costs, the benchmarks showed that the median search time for the affected query increased from 12ms to 42ms. The query ran against the "noaa" dataset with these clauses:
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "station.country_code": "JA"
          }
        },
        {
          "range": {
            "TRANGE": {
              "gte": 0,
              "lte": 30
            }
          }
        }
      ]
    }
  }
}
This issue was opened to investigate the reason for the slow-down and possible fixes.
The range content shouldn't compress much, which means it should be fast to decompress given how LZ4 compression works. However, the Lucene change adds another cost: values are now loaded in groups of 32 documents rather than one document at a time. This may be behind the slow-down.
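
To make the block-loading hypothesis concrete, here is a rough toy model (not Lucene code) that counts how many values would be decoded when each lookup pulls in a whole group of 32 documents, compared with touching a single value per lookup. The function names and the sparse match set are made up for illustration, and the model assumes a block is decoded once and then reused for any other matching document that falls in it.

```python
BLOCK_SIZE = 32  # documents per group, as described in this issue


def values_decoded_per_doc(matching_doc_ids):
    """Per-document access: each lookup touches exactly one value."""
    return len(matching_doc_ids)


def values_decoded_per_block(matching_doc_ids, block_size=BLOCK_SIZE):
    """Blocked access: every distinct block touched is decoded in full.

    Assumes a decoded block is reused for other matches in the same block.
    """
    blocks_touched = {doc_id // block_size for doc_id in matching_doc_ids}
    return len(blocks_touched) * block_size


# Hypothetical sparse match set: one matching document in every 100.
sparse_matches = list(range(0, 100_000, 100))

single = values_decoded_per_doc(sparse_matches)
blocked = values_decoded_per_block(sparse_matches)
amplification = blocked / single

print(f"per-doc: {single}, per-block: {blocked}, amplification: {amplification:.1f}x")
```

In this model the overhead is worst when matching documents are spread out so that each one lands in a different block, and it disappears when matches are contiguous. Whether the real query behaves like the sparse case would need to be confirmed by profiling.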