
@peteralfonsi
Contributor

Description

In query_string queries that use regex, TooComplexToDeterminizeException was incorrectly swallowed when lenient query behavior was enabled. lenient is intended to "ignore data type mismatches between the query and the document field," but TooComplexToDeterminizeException is raised from the same code path even though it has nothing to do with data type mismatches. As a result, query_string queries incorrectly returned 200, even though the same regex in a regexp query returns 400.
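
For reviewers, here is a minimal sketch of the intended behavior. It assumes the lenient handling is a catch block around query construction that falls back to a "match nothing" query. All types below (TooComplexToDeterminizeException, TypeMismatchException, MATCH_NO_DOCS, buildQuery) are hypothetical local stand-ins, not the real Lucene/OpenSearch classes, and this is not the actual code in this PR:

```java
import java.util.function.Supplier;

public class LenientSketch {

    // Hypothetical stand-in for Lucene's TooComplexToDeterminizeException,
    // defined locally so the sketch compiles without Lucene on the classpath.
    static class TooComplexToDeterminizeException extends RuntimeException {
        TooComplexToDeterminizeException(String msg) { super(msg); }
    }

    // Hypothetical stand-in for a data type mismatch,
    // e.g. a non-numeric term queried against a numeric field.
    static class TypeMismatchException extends RuntimeException {
        TypeMismatchException(String msg) { super(msg); }
    }

    // Hypothetical placeholder for the "match nothing" query a lenient parser falls back to.
    static final String MATCH_NO_DOCS = "MatchNoDocsQuery";

    // Builds a query, swallowing failures only when lenient is set AND the
    // failure is not a TooComplexToDeterminizeException.
    static String buildQuery(Supplier<String> builder, boolean lenient) {
        try {
            return builder.get();
        } catch (RuntimeException e) {
            if (lenient && !(e instanceof TooComplexToDeterminizeException)) {
                return MATCH_NO_DOCS; // data type mismatch: ignored, as lenient intends
            }
            throw e; // too-complex regex (or not lenient): propagate so the request fails
        }
    }

    public static void main(String[] args) {
        // A type mismatch is still swallowed under lenient; prints MatchNoDocsQuery.
        System.out.println(buildQuery(() -> { throw new TypeMismatchException("not a number"); }, true));

        // A too-complex regex now propagates even under lenient.
        try {
            buildQuery(() -> { throw new TooComplexToDeterminizeException("would require more than 10000 effort"); }, true);
        } catch (TooComplexToDeterminizeException e) {
            System.out.println("propagated: " + e.getMessage());
        }
    }
}
```

The design point is that leniency is an allow-list of "ignorable" failures rather than a blanket catch, so resource-limit errors such as determinization blowups still surface as a 400.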

A related question is why lenient was on in the first place within QueryStringQueryBuilder, given that the index setting defaults to false and I didn't specify it in the query body. I will raise a separate issue for this as I'm not sure if the current behavior is intended or not and I want to get feedback from others. Either way though, the fix in this PR should apply.

Testing: added coverage to the existing UT. Also manually tested the query from the issue:

curl -XGET "localhost:9200/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "query_string": {
      "query": "f1:/.*value.*|.*__two__.*|.*__three__.*|.*__four__.*|.*__five__.*|.*__six__.*|.*__seven__.*|.*__eight__.*|.*__nine__.*/",
      "analyze_wildcard": true
    }
  }
}'

This originally succeeded, but now correctly returns 400:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "query_shard_exception",
        "reason" : "failed to create query: Determinizing automaton with 89 states and 115 transitions would require more than 10000 effort.",
        "index" : "text_regex",
        "index_uuid" : "H5Dk6CxwTWuwYIDh1BEEQg"
      }
    ],
    "type" : "search_phase_execution_exception",
    "reason" : "all shards failed",
    "phase" : "query",
    "grouped" : true,
    "failed_shards" : [
      {
        "shard" : 0,
        "index" : "text_regex",
        "node" : "TQ_xeOJ-QD2z_bbHPLY-vw",
        "reason" : {
          "type" : "query_shard_exception",
          "reason" : "failed to create query: Determinizing automaton with 89 states and 115 transitions would require more than 10000 effort.",
          "index" : "text_regex",
          "index_uuid" : "H5Dk6CxwTWuwYIDh1BEEQg",
          "caused_by" : {
            "type" : "too_complex_to_determinize_exception",
            "reason" : "Determinizing automaton with 89 states and 115 transitions would require more than 10000 effort."
          }
        }
      }
    ]
  },
  "status" : 400
}

Related Issues

Resolves #18733

Check List

  • Functionality includes testing.
  • [N/A] API changes companion pull request created, if applicable.
  • [N/A] Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Peter Alfonsi <petealft@amazon.com>
@peteralfonsi peteralfonsi changed the title Propagate regex exception in query_string queries Propagate TooComplexToDeterminizeException in query_string regex queries Jul 31, 2025
@github-actions
Contributor

❌ Gradle check result for 89a1044: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@peteralfonsi
Contributor Author

#18872

Signed-off-by: Peter Alfonsi <petealft@amazon.com>
@github-actions
Contributor

❌ Gradle check result for dd7f617: null

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Peter Alfonsi <petealft@amazon.com>
@github-actions
Contributor

github-actions bot commented Aug 1, 2025

❕ Gradle check result for 3d0b614: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

@codecov

codecov bot commented Aug 1, 2025

Codecov Report

❌ Patch coverage is 0% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 72.81%. Comparing base (cff74ff) to head (3d0b614).
⚠️ Report is 9 commits behind head on main.

Files with missing lines | Patch % | Lines
...pensearch/index/search/QueryStringQueryParser.java | 0.00% | 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #18883      +/-   ##
============================================
+ Coverage     72.78%   72.81%   +0.02%     
- Complexity    68681    68697      +16     
============================================
  Files          5582     5582              
  Lines        315495   315495              
  Branches      45784    45784              
============================================
+ Hits         229625   229718      +93     
+ Misses        67223    67185      -38     
+ Partials      18647    18592      -55     


@peteralfonsi
Contributor Author

Not sure why Codecov reports line 802 of QueryStringQueryParser as uncovered; it should be executed by any test that takes the lenient query path, for example QueryStringQueryBuilderTests.testPrefixNumeric() and others.

@peteralfonsi
Contributor Author

@jainankitk could you take a look at this one?

Contributor

@jainankitk jainankitk left a comment


Thanks @peteralfonsi for addressing this issue. LGTM!

@jainankitk jainankitk merged commit f90b12c into opensearch-project:main Aug 1, 2025
30 of 31 checks passed
sunqijun1 pushed a commit to sunqijun1/OpenSearch that referenced this pull request Aug 4, 2025
…ies (opensearch-project#18883)

Signed-off-by: Peter Alfonsi <petealft@amazon.com>
Co-authored-by: Peter Alfonsi <petealft@amazon.com>
Signed-off-by: sunqijun.jun <sunqijun.jun@bytedance.com>
tandonks pushed a commit to tandonks/OpenSearch that referenced this pull request Aug 5, 2025
…ies (opensearch-project#18883)

Signed-off-by: Peter Alfonsi <petealft@amazon.com>
Co-authored-by: Peter Alfonsi <petealft@amazon.com>
vinaykpud pushed a commit to vinaykpud/OpenSearch that referenced this pull request Sep 26, 2025
…ies (opensearch-project#18883)

Signed-off-by: Peter Alfonsi <petealft@amazon.com>
Co-authored-by: Peter Alfonsi <petealft@amazon.com>

Labels

bug Something isn't working Search Search query, autocomplete ...etc

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] query_string behavior using regex when shard failures occur due to too_complex_to_determinize_exception

2 participants