Extra Lucene DocValuesFieldExistsQuery fired - due to nested mapping and _primary_term

**Elasticsearch version** (`bin/elasticsearch --version`): 6.3.2

**Description of the problem including expected versus actual behavior**:

After upgrading from 6.0.0 to any version >= 6.1.0, we see an extra *DocValuesFieldExistsQuery* [field=_primary_term] clause being executed. This extra underlying query hurts performance. Based on the release notes, it appears to have been introduced by https://github.com/elastic/elasticsearch/pull/27469, which resolves https://github.com/elastic/elasticsearch/issues/24362.

**Steps to reproduce**:

 1. The mapping is defined as:

```
{
    "mappings": {
        "item": {
            "_all": {
                "enabled": false
            },
            "properties": {
                "id": {
                    "type": "integer"
                },
                "document_type": {
                    "type": "keyword",
                    "doc_values": false
                },
                "language": {
                    "type": "keyword",
                    "doc_values": false
                },
                "authorships": {
                    "type": "nested",
                    "properties": {
                        "user_id": {
                            "type": "keyword",
                            "doc_values": false
                        },
                        "contribution_types": {
                            "type": "keyword",
                            "doc_values": false
                        }
                    }
                }
            }
        }
    }
}
```

 2. The inserted data is:

```
{
    "document_type": "5",
    "language": "1"
}
```
 3. The query which, when profiled, shows the extra DocValuesFieldExistsQuery (_primary_term) is:

```
{
    "profile": true,
    "query": {
        "bool": {
            "filter": [{
                "terms": {
                    "document_type": [1, 5]
                }
            }, {
                "terms": {
                    "language": [1]
                }
            }]
        }
    }
}
```

The partial output from profiling the query above is:

```
(#ConstantScore(document_type:1 document_type:5) #ConstantScore(language:1)) #DocValuesFieldExistsQuery [field=_primary_term]
``` 
The DocValuesFieldExistsQuery clause causes extra overhead, which adds to the response time.

Some extra findings: the extra DocValuesFieldExistsQuery *only* surfaces when the mapping contains a *nested* field *and* every filter clause is a `terms` query (that is, one that takes an array of values). For example, if I change the `language` filter to a `term` query that matches just `1`, the extra Lucene query does not fire.

So without a nested mapping, this extra query never fires. With a nested mapping, we can get around it by including at least one single-valued *term* query.
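
As a sketch of that workaround, assuming the same index and mapping as in the steps above, the `language` filter is written as a single-valued `term` query while `document_type` stays a `terms` query. The profile output for this variant is not reproduced here.

```
{
    "profile": true,
    "query": {
        "bool": {
            "filter": [{
                "terms": {
                    "document_type": [1, 5]
                }
            }, {
                "term": {
                    "language": 1
                }
            }]
        }
    }
}
```

This only helps when at least one filter can be restricted to a single value, which is not always the case for us.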

I also profiled 6.0.0 and 6.0.1. 6.0.1 introduces another clause, which seems like a precursor to this change. Profiling the query above in ES 6.0.1 produces this partial profile output:
```
(#ConstantScore(document_type:1 document_type:5) #ConstantScore(language:1)) #(#*:* -_type:__*)
```
The `#*:* -_type:__*` clause adds unnecessary overhead as well; it seems to fire a MatchAllDocsQuery. For now, the solution seems to be going back to 6.0.0.

The performance impact for us is as follows:
1. For our smaller recommendations index, we are getting average response times of 100+ ms, versus 50 ms without the extra query.
2. For one of our bigger indices (300 million docs), our average response times are 700 ms, versus 350 ms without it.

The query example above is simplified compared to our production use case, but the behavior still remains consistent.

Is this behavior intentional?
