Description
Elasticsearch version (bin/elasticsearch --version): 6.3.2
Description of the problem including expected versus actual behavior:
After upgrading from 6.0.0 to any version >= 6.1.0, an extra DocValuesFieldExistsQuery [field=_primary_term] clause is added to our queries. This extra underlying query hurts performance. Based on the release notes, it appears to have been introduced in #27469, which resolves #24362.
Steps to reproduce:
- The mapping is defined as:
{
"mappings": {
"item": {
"item": {
"_all": {
"enabled": false
},
"properties": {
"id": {
"type": "integer"
},
"document_type": {
"type": "keyword",
"doc_values": false
},
"language": {
"type": "keyword",
"doc_values": false
},
"authorships": {
"type": "nested",
"properties": {
"user_id": {
"type": "keyword",
"doc_values": false
},
"contribution_types": {
"type": "keyword",
"doc_values": false
}
}
}
}
}
}
}
}
- The inserted document is:
{
"document_type": "5",
"language": "1"
}
- The query that, when profiled, produces the extra DocValuesFieldExistsQuery (_primary_term) is:
{
"profile": true,
"query": {
"bool": {
"filter": [{
"terms": {
"document_type": [1, 5]
}
}, {
"terms": {
"language": [1]
}
}]
}
}
}
The partial results from profiling the query above are:
(#ConstantScore(document_type:1 document_type:5) #ConstantScore(language:1)) #DocValuesFieldExistsQuery [field=_primary_term]
- The DocValuesFieldExistsQuery clause causes extra overhead, which adds to the response time.
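To check for this clause across many profiled queries, a small helper can scan the profile section of a search response. This is a minimal sketch, assuming the profile layout profile.shards[].searches[].query[] with nested "children" nodes; the helper names and the sample response are hypothetical, and the sample is hand-made from the description quoted above rather than real Elasticsearch output.

```python
from typing import Any, Dict, Iterator, List


def iter_descriptions(node: Dict[str, Any]) -> Iterator[str]:
    """Yield the description of a profiled query node, then those of its children."""
    yield node.get("description", "")
    for child in node.get("children", []):
        yield from iter_descriptions(child)


def find_extra_clauses(response: Dict[str, Any], marker: str = "_primary_term") -> List[str]:
    """Return every profiled query description that mentions `marker`.

    Assumes (hypothetically) the layout profile.shards[].searches[].query[].
    """
    hits: List[str] = []
    for shard in response.get("profile", {}).get("shards", []):
        for search in shard.get("searches", []):
            for query in search.get("query", []):
                hits.extend(d for d in iter_descriptions(query) if marker in d)
    return hits


# Hand-made, trimmed sample shaped like a profile response (not real output).
sample = {
    "profile": {
        "shards": [{
            "searches": [{
                "query": [{
                    "type": "BooleanQuery",
                    "description": "(#ConstantScore(document_type:1 document_type:5) "
                                   "#ConstantScore(language:1)) "
                                   "#DocValuesFieldExistsQuery [field=_primary_term]",
                    "children": [
                        {"type": "BooleanQuery",
                         "description": "#ConstantScore(document_type:1 document_type:5) "
                                        "#ConstantScore(language:1)"},
                        {"type": "DocValuesFieldExistsQuery",
                         "description": "DocValuesFieldExistsQuery [field=_primary_term]"},
                    ],
                }]
            }]
        }]
    }
}

for description in find_extra_clauses(sample):
    print(description)
```

Because the top-level Boolean description embeds its children's text, both the parent and the DocValuesFieldExistsQuery child are reported.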
Some extra findings: the extra DocValuesFieldExistsQuery only appears when there is a nested mapping and all of the filter clauses are "terms" queries, i.e. all take arrays. For example, if I change the language filter to a term query that filters on just 1, the extra Lucene query does not fire.
So if there is no nested mapping, this extra query never fires. If there is a nested mapping, we can work around it by including a single term query.
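The workaround above amounts to swapping one "terms" filter for a "term" filter. The sketch below builds both request bodies so they can be profiled side by side; the function name is hypothetical, and the claim that the "term" variant avoids the extra clause comes from the observation in this report, not from anything the code verifies. The bodies would be sent to the index's _search endpoint with any HTTP client.

```python
import json
from typing import Any, Dict


def build_filter_query(language_as_term: bool) -> Dict[str, Any]:
    """Build the profiled bool/filter query from this report (hypothetical helper).

    language_as_term=False reproduces the original query (two "terms" filters);
    language_as_term=True applies the workaround ("language" as a single "term").
    """
    language_filter = (
        {"term": {"language": 1}}
        if language_as_term
        else {"terms": {"language": [1]}}
    )
    return {
        "profile": True,
        "query": {
            "bool": {
                "filter": [
                    {"terms": {"document_type": [1, 5]}},
                    language_filter,
                ]
            }
        },
    }


original = build_filter_query(language_as_term=False)
workaround = build_filter_query(language_as_term=True)
print(json.dumps(workaround, indent=2))
```

Only the language filter differs between the two bodies, which isolates the "terms" versus "term" distinction when comparing profile output.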
I also profiled 6.0.0 and 6.0.1. Version 6.0.1 introduces another clause that appears to be a precursor of this change. Profiling the query above in ES 6.0.1 produces this partial profile output:
(#ConstantScore(document_type:1 document_type:5) #ConstantScore(language:1)) #(#*:* -_type:__*)
The #*:* -_type:__* clause adds unnecessary overhead as well; it seems to fire a MatchAllDocsQuery. For now, the only solution seems to be reverting to 6.0.0.
The performance impacts for us are the following:
- For our smaller recommendations index, average response times are 100+ ms versus 50 ms without the extra query.
- For one of our bigger indices (300 million docs), average response times are 700 ms versus 350 ms without it.
The query example above is simplified compared to our production use case, but the behavior is the same.
Is this behavior intentional?