-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bugfix to guard against stack overflow errors caused by very large reg-ex input #2810
Conversation
…g-ex input This change fixes a code path that did not properly impose the index-level max_regex_length limit. Therefore, it was possibly to provide ar arbitrarily large string as the include/exclude reg-ex value under search aggregations. This exposed the underlying node to crashes from a StackOverflowError, due to how the Lucene RegExp class processes strings using stack frames. Signed-off-by: Kartik Ganesh <gkart@amazon.com>
Signed-off-by: Kartik Ganesh <gkart@amazon.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is super great work!
@dblock should this be backported? |
private Automaton toAutomaton() { | ||
Automaton a = null; | ||
private Automaton toAutomaton(@Nullable IndexSettings indexSettings) { | ||
int maxRegexLength = indexSettings == null ? -1 : indexSettings.getMaxRegexLength(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When can index settings be null? Seems like this would result in an odd exception message of "...the allowed maximum of [-1]...".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, good catch. This condition should really be:
if (maxRegexLength > 0 && include.length() > maxRegexLength) {
I'll make this change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (include != null) { | ||
a = include.toAutomaton(); | ||
if (include.length() > maxRegexLength) { | ||
throw new IllegalArgumentException( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The exception text is identical in both cases, right? Maybe an opportunity to have a static helper like validateRegexLength(String regex, IndexSettings settings)
or similar
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
…g-ex input (opensearch-project#2810) * Bugfix to guard against stack overflow errors caused by very large reg-ex input This change fixes a code path that did not properly impose the index-level max_regex_length limit. Therefore, it was possibly to provide ar arbitrarily large string as the include/exclude reg-ex value under search aggregations. This exposed the underlying node to crashes from a StackOverflowError, due to how the Lucene RegExp class processes strings using stack frames. Signed-off-by: Kartik Ganesh <gkart@amazon.com> * Adding integration tests for large string RegEx Signed-off-by: Kartik Ganesh <gkart@amazon.com> * Spotless Signed-off-by: Kartik Ganesh <gkart@amazon.com>
…g-ex input (opensearch-project#2810) * Bugfix to guard against stack overflow errors caused by very large reg-ex input This change fixes a code path that did not properly impose the index-level max_regex_length limit. Therefore, it was possibly to provide ar arbitrarily large string as the include/exclude reg-ex value under search aggregations. This exposed the underlying node to crashes from a StackOverflowError, due to how the Lucene RegExp class processes strings using stack frames. Signed-off-by: Kartik Ganesh <gkart@amazon.com> * Adding integration tests for large string RegEx Signed-off-by: Kartik Ganesh <gkart@amazon.com> * Spotless Signed-off-by: Kartik Ganesh <gkart@amazon.com>
… by very large reg-ex input (#2817) * Bugfix to guard against stack overflow errors caused by very large reg-ex input (#2810) * Bugfix to guard against stack overflow errors caused by very large reg-ex input This change fixes a code path that did not properly impose the index-level max_regex_length limit. Therefore, it was possibly to provide ar arbitrarily large string as the include/exclude reg-ex value under search aggregations. This exposed the underlying node to crashes from a StackOverflowError, due to how the Lucene RegExp class processes strings using stack frames. Signed-off-by: Kartik Ganesh <gkart@amazon.com> * Adding integration tests for large string RegEx Signed-off-by: Kartik Ganesh <gkart@amazon.com> * Spotless Signed-off-by: Kartik Ganesh <gkart@amazon.com> * Updates to the large string reg-ex check (#2814) * Updates to the large string reg-ex check Removed the null-case for IndexSettings since this only occurs in tests. The tests now use a dummy Index Setting. This change also fixes a bug with the base case handling of max regex length in the check. Signed-off-by: Kartik Ganesh <gkart@amazon.com>
… by very large reg-ex input (#2816) * Bugfix to guard against stack overflow errors caused by very large reg-ex input (#2810) * Bugfix to guard against stack overflow errors caused by very large reg-ex input This change fixes a code path that did not properly impose the index-level max_regex_length limit. Therefore, it was possibly to provide ar arbitrarily large string as the include/exclude reg-ex value under search aggregations. This exposed the underlying node to crashes from a StackOverflowError, due to how the Lucene RegExp class processes strings using stack frames. Signed-off-by: Kartik Ganesh <gkart@amazon.com> * Adding integration tests for large string RegEx Signed-off-by: Kartik Ganesh <gkart@amazon.com> * Spotless Signed-off-by: Kartik Ganesh <gkart@amazon.com> * Updates to the large string reg-ex check (#2814) * Updates to the large string reg-ex check Removed the null-case for IndexSettings since this only occurs in tests. The tests now use a dummy Index Setting. This change also fixes a bug with the base case handling of max regex length in the check. Signed-off-by: Kartik Ganesh <gkart@amazon.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kartg : Thanks for pulling up this change and fixing the bug in a short time.
I wanted to share one observation I have around search API response, get feedback and decide if we need to track further. The API returns empty successful result instead of an exception when providing a large string (> max_regex_length) in include
on a random field. I think one way to fix it, is to gate the length check in AggregatorFactories
where other parsing exceptions are handled (though it would need to index settings reference and is much bigger change).
Or we can return a response suggesting the referenced field is not found.
curl -X POST "localhost:9200/test-index/_search?pretty" -H "Content-Type: application/json" -d '
{
"query": {
"bool":{"filter":[]}
},
"aggs":{"suggestions":{"terms":{"field":"random_field", "include":"<large string>","execution_hint":"map","shard_size":10}}}
}
Actual Response: Empty result even though include
field exceeds max_regex_length
.
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 0,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"aggregations": {
"suggestions": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": []
}
}
}
Expected: Should fail before with max_regex_length exception instead of empty response.
{
"error": {
"root_cause": [
{
"type": "illegal_argument_exception",
"reason": "The length of regex [50001] used in the request has exceeded the allowed maximum of [1000]. This maximum can be set by changing the [index.max_regex_length] index level setting."
}
],
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query",
"grouped": true,
"failed_shards": [
{
"shard": 0,
"index": "test-index",
"node": "w4M35XX3Rtq10akMmt0crg",
"reason": {
"type": "illegal_argument_exception",
"reason": "The length of regex [50001] used in the request has exceeded the allowed maximum of [1000]. This maximum can be set by changing the [index.max_regex_length] index level setting."
}
}
],
"caused_by": {
"type": "illegal_argument_exception",
"reason": "The length of regex [50001] used in the request has exceeded the allowed maximum of [1000]. This maximum can be set by changing the [index.max_regex_length] index level setting.",
"caused_by": {
"type": "illegal_argument_exception",
"reason": "The length of regex [50001] used in the request has exceeded the allowed maximum of [1000]. This maximum can be set by changing the [index.max_regex_length] index level setting."
}
}
},
"status": 400
}
@dreamer-89 Discussed here - #1992 (comment) Assuming |
Sorry for posting this late, but I'm not convinced with the way this change is implemented due to following reasons -
|
@kartg Take a look above, open another issue if you agree |
This fix can be backported to version 1.35? We faced this issue, but currently we can't upgrade as we have some blockers. |
@jromeroch I think this can qualify, I'll mark it backport 1.x to start, but if auto-backport fails it will need to be done by someone (you?) by hand. |
…g-ex input (#2810) * Bugfix to guard against stack overflow errors caused by very large reg-ex input This change fixes a code path that did not properly impose the index-level max_regex_length limit. Therefore, it was possibly to provide ar arbitrarily large string as the include/exclude reg-ex value under search aggregations. This exposed the underlying node to crashes from a StackOverflowError, due to how the Lucene RegExp class processes strings using stack frames. Signed-off-by: Kartik Ganesh <gkart@amazon.com> * Adding integration tests for large string RegEx Signed-off-by: Kartik Ganesh <gkart@amazon.com> * Spotless Signed-off-by: Kartik Ganesh <gkart@amazon.com> (cherry picked from commit 566ebfa) Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
…y very large reg-ex input (#8663) * Bugfix to guard against stack overflow errors caused by very large reg-ex input (#2810) * Bugfix to guard against stack overflow errors caused by very large reg-ex input This change fixes a code path that did not properly impose the index-level max_regex_length limit. Therefore, it was possibly to provide ar arbitrarily large string as the include/exclude reg-ex value under search aggregations. This exposed the underlying node to crashes from a StackOverflowError, due to how the Lucene RegExp class processes strings using stack frames. Signed-off-by: Kartik Ganesh <gkart@amazon.com> * Adding integration tests for large string RegEx Signed-off-by: Kartik Ganesh <gkart@amazon.com> * Spotless Signed-off-by: Kartik Ganesh <gkart@amazon.com> (cherry picked from commit 566ebfa) Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> * [Patch for 1.x] Use apache.logging.log4j.util.Strings.repeat to generate integration test data This is necessary because the Java-native String.repeat() method is not present in JDK8 Signed-off-by: Kartik Ganesh <gkart@amazon.com> --------- Signed-off-by: Kartik Ganesh <gkart@amazon.com> Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Kartik Ganesh <gkart@amazon.com>
My understanding is that this will be included in 1.3.12 as it is marked for backport 1.x. Does auto-backport then include all 1.x versions, so we should expect it in 1.3.11? |
…g-ex input (#2810) * Bugfix to guard against stack overflow errors caused by very large reg-ex input This change fixes a code path that did not properly impose the index-level max_regex_length limit. Therefore, it was possibly to provide ar arbitrarily large string as the include/exclude reg-ex value under search aggregations. This exposed the underlying node to crashes from a StackOverflowError, due to how the Lucene RegExp class processes strings using stack frames. Signed-off-by: Kartik Ganesh <gkart@amazon.com> * Adding integration tests for large string RegEx Signed-off-by: Kartik Ganesh <gkart@amazon.com> * Spotless Signed-off-by: Kartik Ganesh <gkart@amazon.com> (cherry picked from commit 566ebfa) Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
…y very large reg-ex input (#16101) * Bugfix to guard against stack overflow errors caused by very large reg-ex input (#2810) * Bugfix to guard against stack overflow errors caused by very large reg-ex input This change fixes a code path that did not properly impose the index-level max_regex_length limit. Therefore, it was possibly to provide ar arbitrarily large string as the include/exclude reg-ex value under search aggregations. This exposed the underlying node to crashes from a StackOverflowError, due to how the Lucene RegExp class processes strings using stack frames. Signed-off-by: Kartik Ganesh <gkart@amazon.com> * Adding integration tests for large string RegEx Signed-off-by: Kartik Ganesh <gkart@amazon.com> * Spotless Signed-off-by: Kartik Ganesh <gkart@amazon.com> (cherry picked from commit 566ebfa) Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> * Use JDK8 compatible string repetition Signed-off-by: Daniel Widdis <widdis@gmail.com> * Change log Signed-off-by: Daniel Widdis <widdis@gmail.com> --------- Signed-off-by: Kartik Ganesh <gkart@amazon.com> Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Signed-off-by: Daniel Widdis <widdis@gmail.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Daniel Widdis <widdis@gmail.com>
Description
This change fixes a code path that did not properly impose the index-level max_regex_length limit. Therefore, it was possibly to provide an arbitrarily large string as the include/exclude reg-ex value under search aggregations. This exposed the underlying node to crashes from a StackOverflowError, due to how the Lucene RegExp class processes strings using stack frames.
Signed-off-by: Kartik Ganesh gkart@amazon.com
Issues Resolved
closes #1992
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.