Skip to content

SQL: wrong query when LIKE and NOT LIKE are used together with aggregate functions  #39931

Closed
@broodfusion

Description

@broodfusion

Elasticsearch version (bin/elasticsearch --version): 6.6.1

Plugins installed: []

JVM version (java -version):

java version "1.8.0_181"
Java(TM) SE Runtime Environment (build 1.8.0_181-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode)

OS version (uname -a if on a Unix-like system):
macOS Mojave 10.14.3

Description of the problem including expected versus actual behavior:
Not sure if this is a current limitation as I could not find any documentation explicitly stating this issue as described below:

When performing an SQL query using aggregate functions such as COUNT(), AVG() with LIKE and NOT LIKE, the actual performed query is not as expected.

Using the translate API, we can see the issue more clearly in Kibana:

POST /_xpack/sql/translate
{
  "query": "SELECT COUNT(*), station.keyword as station FROM call WHERE station.keyword LIKE 'B 4E%' and station.keyword NOT LIKE 'B 4E D%' GROUP BY station.keyword"
}

In the below response, we see that the wildcard in must and wildcard in must_not are exactly the same. Therefore the expected query data will not be returned correctly.

Response:

{
  "size" : 0,
  "query" : {
    "bool" : {
      "must" : [
        {
          "wildcard" : {
            "station.keyword" : {
              "wildcard" : "B 4E*",
              "boost" : 1.0
            }
          }
        },
        {
          "bool" : {
            "must_not" : [
              {
                "wildcard" : {
                  "station.keyword" : {
                    "wildcard" : "B 4E*",
                    "boost" : 1.0
                  }
                }
              }
            ],
            "adjust_pure_negative" : true,
            "boost" : 1.0
          }
        }
      ],
      "adjust_pure_negative" : true,
      "boost" : 1.0
    }
  },
  "_source" : false,
  "stored_fields" : "_none_",
  "aggregations" : {
    "groupby" : {
      "composite" : {
        "size" : 1000,
        "sources" : [
          {
            "21587" : {
              "terms" : {
                "field" : "station.keyword",
                "missing_bucket" : true,
                "order" : "asc"
              }
            }
          }
        ]
      }
    }
  }
}

Expected:

{
  "size" : 0,
  "query" : {
    "bool" : {
      "must" : [
        {
          "wildcard" : {
            "station.keyword" : {
              "wildcard" : "B 4E*",
              "boost" : 1.0
            }
          }
        },
        {
          "bool" : {
            "must_not" : [
              {
                "wildcard" : {
                  "station.keyword" : {
                    "wildcard" : "B 4E D*",
                    "boost" : 1.0
                  }
                }
              }
            ],
            "adjust_pure_negative" : true,
            "boost" : 1.0
          }
        }
      ],
      "adjust_pure_negative" : true,
      "boost" : 1.0
    }
  },
  "_source" : false,
  "stored_fields" : "_none_",
  "aggregations" : {
    "groupby" : {
      "composite" : {
        "size" : 1000,
        "sources" : [
          {
            "21587" : {
              "terms" : {
                "field" : "station.keyword",
                "missing_bucket" : true,
                "order" : "asc"
              }
            }
          }
        ]
      }
    }
  }
}

Steps to reproduce:

Please include a minimal but complete recreation of the problem, including
(e.g.) index creation, mappings, settings, query etc. The easier you make for
us to reproduce it, the more likely that somebody will take the time to look at it.

  1. create an index named call and a field station that has mapping:
"station" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          }
  1. Seed some data for station (optional)
B 4E Ph 1
B 4E Ph 2
B 4E D Ph 1
  1. Using the translate API, check the result of using the below
POST /_xpack/sql/translate
{
  "query": "SELECT COUNT(*), station.keyword as station FROM call WHERE station.keyword LIKE 'B 4E%' and station.keyword NOT LIKE 'B 4E D%' GROUP BY station.keyword"
}

Provide logs (if relevant):

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions