
Combined_fields query not behaving correctly inside bool query under certain circumstances #75223

Closed
@nemphys

Elasticsearch version (bin/elasticsearch --version): 7.13.3

Plugins installed: [ analysis_icu ]

JVM version (java -version): bundled

OS version (uname -a if on a Unix-like system): macOS Big Sur

Description of the problem including expected versus actual behavior:

As already reported in the last comments of #74037, I am facing an issue where a bool query whose should clause contains several combined_fields queries returns no results under certain circumstances.
I have tried adjusting the query in many different ways to isolate the cause, but the findings are quite confusing.

The base (greatly simplified) query that produces no results is shown below. Please note that the fields have been split into separate combined_fields queries according to their search_analyzer, as they are supposed to be, and that the only field that should return matches in this case is the 'id.ngram' field of the 3rd query:

{
    "query": {
        "bool": {
            "should": [
                {
                    "combined_fields": {
                        "query": "xxxxx",
                        "operator": "AND",
                        "fields": [
                            "name",
                            "description",
                            "manufacturer",
                            "category",
                            "category_leaf",
                            "color",
                            "sizes"
                        ]
                    }
                },
                {
                    "combined_fields": {
                        "query": "xxxxx",
                        "operator": "AND",
                        "fields": [
                            "name.std",
                            "description.std",
                            "manufacturer.std",
                            "category.std",
                            "category_leaf.std",
                            "color.std",
                            "sizes.std"
                        ]
                    }
                },
                {
                    "combined_fields": {
                        "query": "xxxxx",
                        "operator": "AND",
                        "fields": [
                            "id",
                            "mpn",
                            "id.ngram",
                            "mpn.ngram"
                        ]
                    }
                }
            ]
        }
    }
}
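
For anyone trying to reproduce this, the request body above can also be generated from the three field groups instead of being edited by hand. This is only a minimal sketch: the helper names (`combined_fields_clause`, `build_bool_should`) are made up for illustration and are not part of any Elasticsearch client, and nothing here sends a request.

```python
import json

# Field groups copied from the query above, one group per search_analyzer.
FIELD_GROUPS = [
    ["name", "description", "manufacturer", "category",
     "category_leaf", "color", "sizes"],
    ["name.std", "description.std", "manufacturer.std", "category.std",
     "category_leaf.std", "color.std", "sizes.std"],
    ["id", "mpn", "id.ngram", "mpn.ngram"],
]


def combined_fields_clause(text, fields, operator="AND"):
    """Build one combined_fields clause (hypothetical helper)."""
    return {
        "combined_fields": {
            "query": text,
            "operator": operator,
            "fields": list(fields),
        }
    }


def build_bool_should(text, field_groups):
    """Wrap one combined_fields clause per field group in bool/should."""
    clauses = [combined_fields_clause(text, group) for group in field_groups]
    return {"query": {"bool": {"should": clauses}}}


if __name__ == "__main__":
    # Prints a body equivalent to the base query shown above.
    print(json.dumps(build_bool_should("xxxxx", FIELD_GROUPS), indent=4))
```

The printed JSON can be pasted into a `_search` request against the affected index.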

Queries that do return the expected results are:

  1. Removal of the 2nd query (which is based on a [ "fold", "lowercase", "stop" ] token filter analyzer):
{
    "query": {
        "bool": {
            "should": [
                {
                    "combined_fields": {
                        "query": "xxxxx",
                        "operator": "AND",
                        "fields": [
                            "name",
                            "description",
                            "manufacturer",
                            "category",
                            "category_leaf",
                            "color",
                            "sizes"
                        ]
                    }
                },
                {
                    "combined_fields": {
                        "query": "xxxxx",
                        "operator": "AND",
                        "fields": [
                            "id",
                            "mpn",
                            "id.ngram",
                            "mpn.ngram"
                        ]
                    }
                }
            ]
        }
    }
}

Please note that removing the first query (same fields as the second, but with a more complex analyzer) does not fix the problem. Needless to say, removing both the first and the second queries does return results.

  2. Removal of all fields of the second query except for one (it does not matter which one, I have tried all of them):
{
    "query": {
        "bool": {
            "should": [
                {
                    "combined_fields": {
                        "query": "xxxxx",
                        "operator": "AND",
                        "fields": [
                            "name",
                            "description",
                            "manufacturer",
                            "category",
                            "category_leaf",
                            "color",
                            "sizes"
                        ]
                    }
                },
                {
                    "combined_fields": {
                        "query": "xxxxx",
                        "operator": "AND",
                        "fields": [
                            "name.std"
                        ]
                    }
                },
                {
                    "combined_fields": {
                        "query": "xxxxx",
                        "operator": "AND",
                        "fields": [
                            "id",
                            "mpn",
                            "id.ngram",
                            "mpn.ngram"
                        ]
                    }
                }
            ]
        }
    }
}
  3. Removal of all fields of the third query, except (of course) for 'id.ngram' (which is the matching field):
{
    "query": {
        "bool": {
            "should": [
                {
                    "combined_fields": {
                        "query": "xxxxx",
                        "operator": "AND",
                        "fields": [
                            "name",
                            "description",
                            "manufacturer",
                            "category",
                            "category_leaf",
                            "color",
                            "sizes"
                        ]
                    }
                },
                {
                    "combined_fields": {
                        "query": "xxxxx",
                        "operator": "AND",
                        "fields": [
                            "name.std",
                            "description.std",
                            "manufacturer.std",
                            "category.std",
                            "category_leaf.std",
                            "color.std",
                            "sizes.std"
                        ]
                    }
                },
                {
                    "combined_fields": {
                        "query": "xxxxx",
                        "operator": "AND",
                        "fields": [
                            "id.ngram"
                        ]
                    }
                }
            ]
        }
    }
}
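
The reductions described above can be enumerated mechanically, which may help when bisecting the problem on another index. The following is a rough sketch only: `build_body` and `variants` are hypothetical helpers, and the boolean attached to each variant simply records what was observed in this report (True = results returned), not something the code verifies.

```python
# Field groups copied from the base query, one group per search_analyzer.
BASE_GROUPS = [
    ["name", "description", "manufacturer", "category",
     "category_leaf", "color", "sizes"],
    ["name.std", "description.std", "manufacturer.std", "category.std",
     "category_leaf.std", "color.std", "sizes.std"],
    ["id", "mpn", "id.ngram", "mpn.ngram"],
]


def build_body(text, groups):
    """Build a bool/should body with one combined_fields clause per group."""
    return {"query": {"bool": {"should": [
        {"combined_fields": {"query": text, "operator": "AND",
                             "fields": list(fields)}}
        for fields in groups
    ]}}}


def variants(groups):
    """Yield (label, groups, returned_results_as_reported) tuples."""
    first, second, third = groups
    yield "base query", [first, second, third], False
    yield "without 2nd query", [first, third], True
    yield "without 1st query", [second, third], False
    yield "without 1st and 2nd queries", [third], True
    for field in second:
        # Any single remaining field in the 2nd query was reported to work.
        yield f"2nd query reduced to {field}", [first, [field], third], True
    yield "3rd query reduced to id.ngram", [first, second, ["id.ngram"]], True


if __name__ == "__main__":
    for label, groups, reported in variants(BASE_GROUPS):
        body = build_body("xxxxx", groups)
        n_clauses = len(body["query"]["bool"]["should"])
        print(f"{label}: {n_clauses} clause(s), reported results: {reported}")
```

Each generated body would still have to be sent to the cluster by hand (or with a client of your choice) to compare against the reported outcome.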

@jtibshirani please take a look at this; it might ring a bell.

Labels: :Search/Search, >bug, Team:Search
