[BUG] query_string behavior using regex when shard failures occur due to too_complex_to_determinize_exception #18733

@burnthm

Description

Describe the bug

Summary of analysis:

  • I can reproduce the same behavior with the steps below (replicated on AOS 2.17).
  • The use case is the query_string query type, which supports regular expression syntax.
  • I translated the query_string repro query into the standard regexp query type and saw the expected behavior: a 400 response is returned for the too_complex_to_determinize_exception.
  • I compared the query profiles; they are nearly identical apart from a parent BoostQuery in the query_string version of the request.

My understanding is that query_string fails to surface shard failures, notably the too_complex_to_determinize_exception error.

It is especially worrisome that the too_complex_to_determinize_exception is surfaced neither during query execution nor in the cluster error logs. I only saw it in the cluster's application logs after running the regexp query type, which hit the same exception and actively threw it.

Related component

Search

To Reproduce

  1. Create test index
PUT test-regex
{
  "mappings": {
    "properties": {
      "f1": {
        "type": "keyword"
      }
    }
  }
}
  2. Add sample document
PUT test-regex/_doc/1
{
  "f1": "value"
}
  3. Send a request that fails silently with a 200 response
GET test-regex/_search
{
  "query": {
    "bool": {
      "filter": {
        "bool": {
          "must": [
            {
              "query_string": {
                "query": "f1:/.*value.*|.*__two__.*|.*__three__.*|.*__four__.*|.*__five__.*|.*__six__.*|.*__seven__.*|.*__eight__.*|.*__nine__.*/",
                "analyze_wildcard": true
              }
            }
          ]
        }
      }
    }
  }
}
  4. Send a request that succeeds with a 200 response and returns the sample document
GET test-regex/_search
{
  "query": {
    "bool": {
      "filter": {
        "bool": {
          "must": [
            {
              "query_string": {
                "query": "f1:/.*value.*|.*__two__.*|.*__three__.*|.*__four__.*|.*__five__.*|.*__six__.*|.*__seven__.*|.*__eight__.*/",
                "analyze_wildcard": true
              }
            }
          ]
        }
      }
    }
  }
}
  5. Observe the too_complex_to_determinize_exception error for the silently failing query using the _validate API
GET test-regex/_validate/query?explain=true
{
  "query": {
    "bool": {
      "filter": {
        "bool": {
          "must": [
            {
              "query_string": {
                "query": "f1:/.*value.*|.*__two__.*|.*__three__.*|.*__four__.*|.*__five__.*|.*__six__.*|.*__seven__.*|.*__eight__.*|.*__nine__.*/",
                "analyze_wildcard": true
              }
            }
          ]
        }
      }
    }
  }
}
  6. Convert the silently failing query_string regex into a standalone regexp query and observe a 400 response
POST test-regex/_search
{
  "query": {
    "regexp": {
      "f1": ".*value.*|.*__two__.*|.*__three__.*|.*__four__.*|.*__five__.*|.*__six__.*|.*__seven__.*|.*__eight__.*|.*__nine__.*"
    }
  }
}
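
The request bodies in the steps above differ only in the number of alternation terms and in the query type that wraps the pattern. The following Python sketch rebuilds them from a term list so the failing and passing cases are easy to compare. The helper names (`build_pattern`, `query_string_body`, `regexp_body`) are hypothetical and exist only for illustration. The sketch only constructs JSON bodies and does not contact a cluster.

```python
import json

# Alternation terms copied from the reproduction requests in this report.
TERMS = ["value", "__two__", "__three__", "__four__", "__five__",
         "__six__", "__seven__", "__eight__", "__nine__"]


def build_pattern(terms):
    """Join terms into the '.*a.*|.*b.*' alternation used in the report."""
    return "|".join(f".*{t}.*" for t in terms)


def query_string_body(field, terms):
    """Body for the query_string requests: the regex is wrapped in /.../."""
    return {"query": {"bool": {"filter": {"bool": {"must": [
        {"query_string": {
            "query": f"{field}:/{build_pattern(terms)}/",
            "analyze_wildcard": True,
        }}
    ]}}}}}


def regexp_body(field, terms):
    """Body for the standalone regexp request."""
    return {"query": {"regexp": {field: build_pattern(terms)}}}


if __name__ == "__main__":
    # Nine terms: the query_string request that fails silently in the report.
    print(json.dumps(query_string_body("f1", TERMS), indent=2))
    # Eight terms: the query_string request that returns the sample document.
    print(json.dumps(query_string_body("f1", TERMS[:8]), indent=2))
    # Nine terms as a regexp query: the request that returns 400.
    print(json.dumps(regexp_body("f1", TERMS), indent=2))
```

Dropping the last term (`TERMS[:8]`) gives the body of the request that succeeds, which makes it straightforward to bisect the number of alternatives at which the behavior changes.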

Expected behavior

The response to query [3] from the reproduction above should be a 4xx response to the client, similar to the response observed when executing [6]:

{
  "error": {
    "root_cause": [
      {
        "type": "query_shard_exception",
        "reason": "failed to create query: Determinizing .*value.*|.*__two__.*|.*__three__.*|.*__four__.*|.*__five__.*|.*__six__.*|.*__seven__.*|.*__eight__.*|.*__nine__.* would require more than 10000 effort.",
        "index": "test-regex",
        "index_uuid": "hcEaXmFTSL-UJ86Pfdq0DA"
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "test-regex",
        "node": "X5jNaaOERuCxyStZa7CPhQ",
        "reason": {
          "type": "query_shard_exception",
          "reason": "failed to create query: Determinizing .*value.*|.*__two__.*|.*__three__.*|.*__four__.*|.*__five__.*|.*__six__.*|.*__seven__.*|.*__eight__.*|.*__nine__.* would require more than 10000 effort.",
          "index": "test-regex",
          "index_uuid": "hcEaXmFTSL-UJ86Pfdq0DA",
          "caused_by": {
            "type": "too_complex_to_determinize_exception",
            "reason": "Determinizing .*value.*|.*__two__.*|.*__three__.*|.*__four__.*|.*__five__.*|.*__six__.*|.*__seven__.*|.*__eight__.*|.*__nine__.* would require more than 10000 effort.",
            "caused_by": {
              "type": "too_complex_to_determinize_exception",
              "reason": "Determinizing automaton with 80 states and 319 transitions would require more than 10000 effort."
            }
          }
        }
      }
    ]
  },
  "status": 400
}
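
For a client that wants to detect this specific failure, the nested `caused_by` chain in the error body above is the relevant part. Below is a minimal Python sketch that walks that chain for each failed shard. The function names are hypothetical, and the `sample` dictionary is a trimmed copy of the response above (reason strings omitted), not a new server response.

```python
def cause_chain(reason):
    """Return the exception types from a 'reason' object down its caused_by chain."""
    types = []
    while isinstance(reason, dict):
        types.append(reason.get("type"))
        reason = reason.get("caused_by")
    return types


def shard_failure_chains(response):
    """Map each failed shard number to its chain of exception types."""
    failed = response.get("error", {}).get("failed_shards", [])
    return {f["shard"]: cause_chain(f["reason"]) for f in failed}


# Trimmed from the expected 400 response shown in this report.
sample = {
    "error": {
        "type": "search_phase_execution_exception",
        "failed_shards": [
            {
                "shard": 0,
                "index": "test-regex",
                "reason": {
                    "type": "query_shard_exception",
                    "caused_by": {
                        "type": "too_complex_to_determinize_exception",
                        "caused_by": {
                            "type": "too_complex_to_determinize_exception"
                        },
                    },
                },
            }
        ],
    },
    "status": 400,
}

if __name__ == "__main__":
    for shard, chain in shard_failure_chains(sample).items():
        print(shard, " -> ".join(chain))
```

A body without an `error` object, such as the 200 response from query [3], yields an empty mapping here, which is exactly why the silent failure is hard to detect on the client side.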

Additional Details

No response

Metadata

Labels: Search (Search query, autocomplete ...etc), bug (Something isn't working)

Status: ✅ Done
