
Extra Lucene DocValueExistsQuery fired - due to nested mapping and primary_terms #34067

Closed
@jeffreynscrbdee

Elasticsearch version (bin/elasticsearch --version): 6.3.2

Description of the problem including expected versus actual behavior:

After upgrading from 6.0.0 to any version >= 6.1.0, profiling shows an extra DocValuesFieldExistsQuery [field=_primary_term] clause being executed. This additional underlying query hurts performance. Based on the release notes, it may have been introduced by #27469, which resolves #24362.

Steps to reproduce:

  1. Mapping is defined as
{
    "mappings": {
        "item": {
            "_all": {
                "enabled": false
            },
            "properties": {
                "id": {
                    "type": "integer"
                },
                "document_type": {
                    "type": "keyword",
                    "doc_values": false
                },
                "language": {
                    "type": "keyword",
                    "doc_values": false
                },
                "authorships": {
                    "type": "nested",
                    "properties": {
                        "user_id": {
                            "type": "keyword",
                            "doc_values": false
                        },
                        "contribution_types": {
                            "type": "keyword",
                            "doc_values": false
                        }
                    }
                }
            }
        }
    }
}
  2. Data inserted is defined as
{
    "document_type": "5",
    "language": "1"
}
  3. The query that, when profiled, shows the extra DocValuesFieldExistsQuery (_primary_term) is
{
    "profile": true,
    "query": {
        "bool": {
            "filter": [{
                "terms": {
                    "document_type": [1, 5]
                }
            }, {
                "terms": {
                    "language": [1]
                }
            }]
        }
    }
}

The partial results from profiling the query above are

(#ConstantScore(document_type:1 document_type:5) #ConstantScore(language:1)) #DocValuesFieldExistsQuery [field=_primary_term]
  • The DocValuesFieldExistsQuery adds overhead that increases response time.
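
For anyone wanting to check for the extra clause without reading the profile output by eye, here is a minimal sketch that walks the `profile` section of a search response and looks for a given Lucene query type. It assumes the usual Profile API layout (`profile.shards[].searches[].query[]`, where each entry has a `type`, a `description`, and optional `children`). The helper names `iter_profiled_queries` and `has_query_type` and the trimmed `sample` response are hypothetical and only illustrate the shape; they are not taken from a real cluster.

```python
def iter_profiled_queries(response):
    """Yield every profiled query node, including nested children."""
    for shard in response.get("profile", {}).get("shards", []):
        for search in shard.get("searches", []):
            stack = list(search.get("query", []))
            while stack:
                node = stack.pop()
                yield node
                # Children hold the sub-clauses of compound queries.
                stack.extend(node.get("children", []))


def has_query_type(response, type_name):
    """Return True if any profiled node reports the given query type."""
    return any(
        node.get("type") == type_name
        for node in iter_profiled_queries(response)
    )


# Hypothetical, heavily trimmed response (assumed layout, made-up content).
sample = {
    "profile": {
        "shards": [{
            "searches": [{
                "query": [{
                    "type": "BooleanQuery",
                    "description": "(#ConstantScore(...)) #DocValuesFieldExistsQuery [field=_primary_term]",
                    "children": [{
                        "type": "DocValuesFieldExistsQuery",
                        "description": "DocValuesFieldExistsQuery [field=_primary_term]",
                    }],
                }],
            }],
        }],
    },
}

print(has_query_type(sample, "DocValuesFieldExistsQuery"))
```

Running the same check against responses from 6.0.0 and 6.3.2 would be one way to confirm the difference between versions.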

Some extra findings: the extra DocValuesFieldExistsQuery only appears when the mapping contains a nested field and every filter clause is a "terms" query (i.e., takes an array of values). For example, if I change the language filter to a "term" query on the single value 1, the extra Lucene query does not fire.

So without a nested mapping, the extra query never fires. With a nested mapping, we can work around it by including a single-value term query.
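
As a sketch of the workaround just described, the request body below is the same query as in the reproduction steps with only the `language` filter switched from `terms` to `term`. It is written as a Python dict so it can be handed to a client; the variable name `workaround_query` is hypothetical, and this only restates the observed behavior rather than a recommended fix.

```python
import json

# Same filters as the reproduction query, except that "language"
# is now a single-value "term" filter instead of a "terms" filter.
workaround_query = {
    "profile": True,
    "query": {
        "bool": {
            "filter": [
                {"terms": {"document_type": [1, 5]}},
                {"term": {"language": 1}},
            ]
        }
    },
}

# Serialize to the JSON body that would be sent to the _search endpoint.
print(json.dumps(workaround_query, indent=4))
```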

I also profiled 6.0.0 and 6.0.1. Version 6.0.1 introduces another clause, which looks like a precursor to this change. Profiling the query above in ES 6.0.1 produces this partial profile output:

(#ConstantScore(document_type:1 document_type:5) #ConstantScore(language:1)) #(#*:* -_type:__*)

The #*:* -_type:__* clause adds unnecessary overhead as well; it appears to fire a MatchAllDocsQuery. The only fix so far seems to be going back to 6.0.0.

The performance impacts for us are the following:

  1. For our smaller recommendations index, we are getting avg response times of 100+ ms vs 50 ms without the extra query.
  2. For one of our bigger indices (300 million docs), our avg response times are 700 ms vs 350 ms.

The query example above is simplified compared to our production use case, but the behavior still remains consistent.

Is this behavior intentional?

Labels: :Search/Search (Search-related issues that do not fall into other categories)