Bugfix to guard against stack overflow errors caused by very large reg-ex input #2810

kartg · 2022-04-07T20:58:34Z

Description

This change fixes a code path that did not properly impose the index-level max_regex_length limit. Therefore, it was possibly to provide an arbitrarily large string as the include/exclude reg-ex value under search aggregations. This exposed the underlying node to crashes from a StackOverflowError, due to how the Lucene RegExp class processes strings using stack frames.

Signed-off-by: Kartik Ganesh gkart@amazon.com

Issues Resolved

closes #1992

Check List

New functionality includes testing.
- All tests pass
New functionality has been documented.
- New functionality has javadoc added
Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

…g-ex input This change fixes a code path that did not properly impose the index-level max_regex_length limit. Therefore, it was possibly to provide ar arbitrarily large string as the include/exclude reg-ex value under search aggregations. This exposed the underlying node to crashes from a StackOverflowError, due to how the Lucene RegExp class processes strings using stack frames. Signed-off-by: Kartik Ganesh <gkart@amazon.com>

Signed-off-by: Kartik Ganesh <gkart@amazon.com>

dblock

This is super great work!

opensearch-ci-bot · 2022-04-07T21:29:53Z

✅ Gradle Check success 23ee87a
Log 4281

Reports 4281

kartg · 2022-04-07T21:36:40Z

@dblock should this be backported?

andrross · 2022-04-07T21:55:48Z

server/src/main/java/org/opensearch/search/aggregations/bucket/terms/IncludeExclude.java

-    private Automaton toAutomaton() {
-        Automaton a = null;
+    private Automaton toAutomaton(@Nullable IndexSettings indexSettings) {
+        int maxRegexLength = indexSettings == null ? -1 : indexSettings.getMaxRegexLength();


When can index settings be null? Seems like this would result in an odd exception message of "...the allowed maximum of [-1]...".

Oops, good catch. This condition should really be:

if (maxRegexLength > 0 && include.length() > maxRegexLength) {

I'll make this change

andrross · 2022-04-07T21:56:42Z

server/src/main/java/org/opensearch/search/aggregations/bucket/terms/IncludeExclude.java

        if (include != null) {
-            a = include.toAutomaton();
+            if (include.length() > maxRegexLength) {
+                throw new IllegalArgumentException(


The exception text is identical in both cases, right? Maybe an opportunity to have a static helper like validateRegexLength(String regex, IndexSettings settings) or similar

…g-ex input (opensearch-project#2810) * Bugfix to guard against stack overflow errors caused by very large reg-ex input This change fixes a code path that did not properly impose the index-level max_regex_length limit. Therefore, it was possibly to provide ar arbitrarily large string as the include/exclude reg-ex value under search aggregations. This exposed the underlying node to crashes from a StackOverflowError, due to how the Lucene RegExp class processes strings using stack frames. Signed-off-by: Kartik Ganesh <gkart@amazon.com> * Adding integration tests for large string RegEx Signed-off-by: Kartik Ganesh <gkart@amazon.com> * Spotless Signed-off-by: Kartik Ganesh <gkart@amazon.com>

… by very large reg-ex input (#2817) * Bugfix to guard against stack overflow errors caused by very large reg-ex input (#2810) * Bugfix to guard against stack overflow errors caused by very large reg-ex input This change fixes a code path that did not properly impose the index-level max_regex_length limit. Therefore, it was possibly to provide ar arbitrarily large string as the include/exclude reg-ex value under search aggregations. This exposed the underlying node to crashes from a StackOverflowError, due to how the Lucene RegExp class processes strings using stack frames. Signed-off-by: Kartik Ganesh <gkart@amazon.com> * Adding integration tests for large string RegEx Signed-off-by: Kartik Ganesh <gkart@amazon.com> * Spotless Signed-off-by: Kartik Ganesh <gkart@amazon.com> * Updates to the large string reg-ex check (#2814) * Updates to the large string reg-ex check Removed the null-case for IndexSettings since this only occurs in tests. The tests now use a dummy Index Setting. This change also fixes a bug with the base case handling of max regex length in the check. Signed-off-by: Kartik Ganesh <gkart@amazon.com>

… by very large reg-ex input (#2816) * Bugfix to guard against stack overflow errors caused by very large reg-ex input (#2810) * Bugfix to guard against stack overflow errors caused by very large reg-ex input This change fixes a code path that did not properly impose the index-level max_regex_length limit. Therefore, it was possibly to provide ar arbitrarily large string as the include/exclude reg-ex value under search aggregations. This exposed the underlying node to crashes from a StackOverflowError, due to how the Lucene RegExp class processes strings using stack frames. Signed-off-by: Kartik Ganesh <gkart@amazon.com> * Adding integration tests for large string RegEx Signed-off-by: Kartik Ganesh <gkart@amazon.com> * Spotless Signed-off-by: Kartik Ganesh <gkart@amazon.com> * Updates to the large string reg-ex check (#2814) * Updates to the large string reg-ex check Removed the null-case for IndexSettings since this only occurs in tests. The tests now use a dummy Index Setting. This change also fixes a bug with the base case handling of max regex length in the check. Signed-off-by: Kartik Ganesh <gkart@amazon.com>

dreamer-89

@kartg : Thanks for pulling up this change and fixing the bug in a short time.

I wanted to share one observation I have around search API response, get feedback and decide if we need to track further. The API returns empty successful result instead of an exception when providing a large string (> max_regex_length) in include on a random field. I think one way to fix it, is to gate the length check in AggregatorFactories where other parsing exceptions are handled (though it would need to index settings reference and is much bigger change).
Or we can return a response suggesting the referenced field is not found.

curl -X POST "localhost:9200/test-index/_search?pretty" -H "Content-Type: application/json" -d '
{
  "query": {
  	"bool":{"filter":[]}
  },
  "aggs":{"suggestions":{"terms":{"field":"random_field", "include":"<large string>","execution_hint":"map","shard_size":10}}}
}

Actual Response: Empty result even though include field exceeds max_regex_length.

{
    "took": 3,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 0,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "suggestions": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": []
        }
    }
}

Expected: Should fail before with max_regex_length exception instead of empty response.

{
    "error": {
        "root_cause": [
            {
                "type": "illegal_argument_exception",
                "reason": "The length of regex [50001] used in the request has exceeded the allowed maximum of [1000]. This maximum can be set by changing the [index.max_regex_length] index level setting."
            }
        ],
        "type": "search_phase_execution_exception",
        "reason": "all shards failed",
        "phase": "query",
        "grouped": true,
        "failed_shards": [
            {
                "shard": 0,
                "index": "test-index",
                "node": "w4M35XX3Rtq10akMmt0crg",
                "reason": {
                    "type": "illegal_argument_exception",
                    "reason": "The length of regex [50001] used in the request has exceeded the allowed maximum of [1000]. This maximum can be set by changing the [index.max_regex_length] index level setting."
                }
            }
        ],
        "caused_by": {
            "type": "illegal_argument_exception",
            "reason": "The length of regex [50001] used in the request has exceeded the allowed maximum of [1000]. This maximum can be set by changing the [index.max_regex_length] index level setting.",
            "caused_by": {
                "type": "illegal_argument_exception",
                "reason": "The length of regex [50001] used in the request has exceeded the allowed maximum of [1000]. This maximum can be set by changing the [index.max_regex_length] index level setting."
            }
        }
    },
    "status": 400
}

kartg · 2022-04-08T22:53:48Z

@dreamer-89 Discussed here - #1992 (comment)

Assuming random_field is not a valid field on test-index, then this code path does not actually parse the input string as a RegExp. Given that this then returns an empty aggregation, this feels like reasonable behavior. Thoughts?

rishabhmaurya · 2022-04-11T19:41:13Z

Sorry for posting this late, but I'm not convinced with the way this change is implemented due to following reasons -

This change adds dependency on IndexSettings for IncludeExclude, which seems logically incorrect and down the line it may interrupt in modularity of codebase for such non-essential dependencies.
Also, making IndexSettings to be mandatory, for one of the method, again looks logically incorrect. There could be consumers, who are using this class, which don't have instance of IndexSettings or are not related to index.
Changing publicly exposed class constructor/public apis, doesn't seems like a good idea. If its really needed, it should be put on the deprecation path.

dblock · 2022-04-11T19:53:30Z

@kartg Take a look above, open another issue if you agree

kartg · 2022-04-11T20:33:55Z

@dblock Tracking in #2858

jromeroch · 2023-07-07T07:15:14Z

This fix can be backported to version 1.35? We faced this issue, but currently we can't upgrade as we have some blockers.

dblock · 2023-07-12T16:34:04Z

@jromeroch I think this can qualify, I'll mark it backport 1.x to start, but if auto-backport fails it will need to be done by someone (you?) by hand.

…g-ex input (#2810) * Bugfix to guard against stack overflow errors caused by very large reg-ex input This change fixes a code path that did not properly impose the index-level max_regex_length limit. Therefore, it was possibly to provide ar arbitrarily large string as the include/exclude reg-ex value under search aggregations. This exposed the underlying node to crashes from a StackOverflowError, due to how the Lucene RegExp class processes strings using stack frames. Signed-off-by: Kartik Ganesh <gkart@amazon.com> * Adding integration tests for large string RegEx Signed-off-by: Kartik Ganesh <gkart@amazon.com> * Spotless Signed-off-by: Kartik Ganesh <gkart@amazon.com> (cherry picked from commit 566ebfa) Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

…y very large reg-ex input (#8663) * Bugfix to guard against stack overflow errors caused by very large reg-ex input (#2810) * Bugfix to guard against stack overflow errors caused by very large reg-ex input This change fixes a code path that did not properly impose the index-level max_regex_length limit. Therefore, it was possibly to provide ar arbitrarily large string as the include/exclude reg-ex value under search aggregations. This exposed the underlying node to crashes from a StackOverflowError, due to how the Lucene RegExp class processes strings using stack frames. Signed-off-by: Kartik Ganesh <gkart@amazon.com> * Adding integration tests for large string RegEx Signed-off-by: Kartik Ganesh <gkart@amazon.com> * Spotless Signed-off-by: Kartik Ganesh <gkart@amazon.com> (cherry picked from commit 566ebfa) Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> * [Patch for 1.x] Use apache.logging.log4j.util.Strings.repeat to generate integration test data This is necessary because the Java-native String.repeat() method is not present in JDK8 Signed-off-by: Kartik Ganesh <gkart@amazon.com> --------- Signed-off-by: Kartik Ganesh <gkart@amazon.com> Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Kartik Ganesh <gkart@amazon.com>

jameshinsta · 2023-08-03T02:33:54Z

My understanding is that this will be included in 1.3.12 as it is marked for backport 1.x.

Does auto-backport then include all 1.x versions, so we should expect it in 1.3.11?

…g-ex input (#2810) * Bugfix to guard against stack overflow errors caused by very large reg-ex input This change fixes a code path that did not properly impose the index-level max_regex_length limit. Therefore, it was possibly to provide ar arbitrarily large string as the include/exclude reg-ex value under search aggregations. This exposed the underlying node to crashes from a StackOverflowError, due to how the Lucene RegExp class processes strings using stack frames. Signed-off-by: Kartik Ganesh <gkart@amazon.com> * Adding integration tests for large string RegEx Signed-off-by: Kartik Ganesh <gkart@amazon.com> * Spotless Signed-off-by: Kartik Ganesh <gkart@amazon.com> (cherry picked from commit 566ebfa) Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

…y very large reg-ex input (#16101) * Bugfix to guard against stack overflow errors caused by very large reg-ex input (#2810) * Bugfix to guard against stack overflow errors caused by very large reg-ex input This change fixes a code path that did not properly impose the index-level max_regex_length limit. Therefore, it was possibly to provide ar arbitrarily large string as the include/exclude reg-ex value under search aggregations. This exposed the underlying node to crashes from a StackOverflowError, due to how the Lucene RegExp class processes strings using stack frames. Signed-off-by: Kartik Ganesh <gkart@amazon.com> * Adding integration tests for large string RegEx Signed-off-by: Kartik Ganesh <gkart@amazon.com> * Spotless Signed-off-by: Kartik Ganesh <gkart@amazon.com> (cherry picked from commit 566ebfa) Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> * Use JDK8 compatible string repetition Signed-off-by: Daniel Widdis <widdis@gmail.com> * Change log Signed-off-by: Daniel Widdis <widdis@gmail.com> --------- Signed-off-by: Kartik Ganesh <gkart@amazon.com> Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Signed-off-by: Daniel Widdis <widdis@gmail.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Daniel Widdis <widdis@gmail.com>

kartg added 3 commits April 7, 2022 13:50

Adding integration tests for large string RegEx

a900ab4

Signed-off-by: Kartik Ganesh <gkart@amazon.com>

Spotless

23ee87a

Signed-off-by: Kartik Ganesh <gkart@amazon.com>

kartg requested a review from a team as a code owner April 7, 2022 20:58

dblock approved these changes Apr 7, 2022

View reviewed changes

dblock requested review from andrross, nknize and dreamer-89 April 7, 2022 21:06

dblock mentioned this pull request Apr 7, 2022

StackOverflow crash - large regex produced by Discover filter not limited by index.regex_max_length elastic/elasticsearch#82923

Closed

kartg merged commit 566ebfa into opensearch-project:main Apr 7, 2022

andrross reviewed Apr 7, 2022

View reviewed changes

tlfeng added bug Something isn't working Indexing & Search v3.0.0 Issues and PRs related to version 3.0.0 and removed v3.0.0 Issues and PRs related to version 3.0.0 labels Apr 8, 2022

kartg mentioned this pull request Apr 8, 2022

[Backport] [2.0] Bugfix to guard against stack overflow errors caused by very large reg-ex input #2816

Merged

5 tasks

kartg mentioned this pull request Apr 8, 2022

[Backport] [2.x] Bugfix to guard against stack overflow errors caused by very large reg-ex input #2817

Merged

5 tasks

tlfeng added the v2.0.0 Version 2.0.0 label Apr 8, 2022

dreamer-89 reviewed Apr 8, 2022

View reviewed changes

rishabhmaurya mentioned this pull request Apr 11, 2022

[BUG][DISTRIBUTION][2.0.0] org.gradle.api.GradleException: Compilation error #2858

Closed

dblock added the backport 1.x label Jul 12, 2023

opensearch-trigger-bot bot mentioned this pull request Jul 12, 2023

[Backport 1.x] Bugfix to guard against stack overflow errors caused by very large reg-ex input #8663

Merged

sandeshkr419 mentioned this pull request Jun 19, 2024

Refactor Grok validate pattern to iterative approach #14206

Merged

3 tasks

dbwiddis added the backport 1.3 Backport to 1.3 branch label Sep 26, 2024

opensearch-trigger-bot bot mentioned this pull request Sep 26, 2024

[Backport 1.3] Bugfix to guard against stack overflow errors caused by very large reg-ex input #16101

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bugfix to guard against stack overflow errors caused by very large reg-ex input #2810

Bugfix to guard against stack overflow errors caused by very large reg-ex input #2810

kartg commented Apr 7, 2022 •

edited

Loading

dblock left a comment

opensearch-ci-bot commented Apr 7, 2022

kartg commented Apr 7, 2022

andrross Apr 7, 2022

kartg Apr 7, 2022

kartg Apr 7, 2022

andrross Apr 7, 2022

kartg Apr 7, 2022

dreamer-89 left a comment •

edited

Loading

kartg commented Apr 8, 2022

rishabhmaurya commented Apr 11, 2022 •

edited

Loading

dblock commented Apr 11, 2022

kartg commented Apr 11, 2022

jromeroch commented Jul 7, 2023

dblock commented Jul 12, 2023

jameshinsta commented Aug 3, 2023

Bugfix to guard against stack overflow errors caused by very large reg-ex input #2810

Bugfix to guard against stack overflow errors caused by very large reg-ex input #2810

Conversation

kartg commented Apr 7, 2022 • edited Loading

Description

Issues Resolved

Check List

dblock left a comment

Choose a reason for hiding this comment

opensearch-ci-bot commented Apr 7, 2022

kartg commented Apr 7, 2022

andrross Apr 7, 2022

Choose a reason for hiding this comment

kartg Apr 7, 2022

Choose a reason for hiding this comment

kartg Apr 7, 2022

Choose a reason for hiding this comment

andrross Apr 7, 2022

Choose a reason for hiding this comment

kartg Apr 7, 2022

Choose a reason for hiding this comment

dreamer-89 left a comment • edited Loading

Choose a reason for hiding this comment

kartg commented Apr 8, 2022

rishabhmaurya commented Apr 11, 2022 • edited Loading

dblock commented Apr 11, 2022

kartg commented Apr 11, 2022

jromeroch commented Jul 7, 2023

dblock commented Jul 12, 2023

jameshinsta commented Aug 3, 2023

kartg commented Apr 7, 2022 •

edited

Loading

dreamer-89 left a comment •

edited

Loading

rishabhmaurya commented Apr 11, 2022 •

edited

Loading