[BUG]"type": "index_out_of_bounds_exception", "reason": "index_out_of_bounds_exception: null" #8155

Closed
adiOpenSearch opened this issue Jun 19, 2023 · 9 comments
Labels
bug Something isn't working Search Search query, autocomplete ...etc v2.12.0 Issues and PRs related to version 2.12.0 v3.0.0 Issues and PRs related to version 3.0.0

Comments

@adiOpenSearch

Exception - index_out_of_bounds_exception: null

When querying OpenSearch, I get the following exception:

{
  "error": {
    "root_cause": [
      {
        "type": "index_out_of_bounds_exception",
        "reason": "index_out_of_bounds_exception: null"
...
"script_stack": [
  "java.base/java.nio.Buffer.checkIndex(Buffer.java:687)",
  "java.base/java.nio.DirectByteBuffer.get(DirectByteBuffer.java:269)",
  "org.apache.lucene.store.ByteBufferGuard.getByte(ByteBufferGuard.java:119)",
  "org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl.readByte(ByteBufferIndexInput.java:564)",
  "org.apache.lucene.codecs.lucene90.Lucene90NormsProducer$8.longValue(Lucene90NormsProducer.java:442)",
  "org.apache.lucene.search.LeafSimScorer.getNormValue(LeafSimScorer.java:47)",
  "org.apache.lucene.search.LeafSimScorer.score(LeafSimScorer.java:60)",
  "org.apache.lucene.search.SynonymQuery$SynonymScorer.score(SynonymQuery.java:548)",
  "org.apache.lucene.search.ConjunctionScorer.score(ConjunctionScorer.java:61)",
  "org.apache.lucene.search.DisjunctionSumScorer.score(DisjunctionSumScorer.java:41)",
  "org.apache.lucene.search.DisjunctionScorer.score(DisjunctionScorer.java:193)",
  "org.apache.lucene.search.ConjunctionScorer.score(ConjunctionScorer.java:61)",
  "org.apache.lucene.search.ReqOptSumScorer.score(ReqOptSumScorer.java:263)",
  "org.opensearch.script.ScoreScript.lambda$setScorer$4(ScoreScript.java:156)",
  "org.opensearch.script.ScoreScript.get_score(ScoreScript.java:168)",
  "def ",
  "^---- HERE"
]
...

Expected behavior
The fix for this issue is in neither version 2.5 nor version 2.7. Can it be fixed in the next release, version 2.9?


@adiOpenSearch adiOpenSearch added bug Something isn't working untriaged labels Jun 19, 2023
@adiOpenSearch adiOpenSearch changed the title [BUG] [BUG]"type": "index_out_of_bounds_exception", "reason": "index_out_of_bounds_exception: null" Jun 19, 2023
@saratvemulapalli
Member

Trying to understand, what are you querying? How do I reproduce this problem?

@adiOpenSearch
Author

When querying OpenSearch 2.5, I get the following exception:

{
  "error": {
    "root_cause": [
      {
        "type": "index_out_of_bounds_exception",
        "reason": "index_out_of_bounds_exception: null"
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "catalog_product_instances__20230413121800",
        "node": "8gNMNc4BS2qhHwnPoxbcLQ",
        "reason": {
          "type": "script_exception",
          "reason": "runtime error",
          "script_stack": [
            "java.base/java.nio.Buffer.checkIndex(Buffer.java:687)",
            "java.base/java.nio.DirectByteBuffer.get(DirectByteBuffer.java:269)",
            "org.apache.lucene.store.ByteBufferGuard.getByte(ByteBufferGuard.java:119)",
            "org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl.readByte(ByteBufferIndexInput.java:564)",
            "org.apache.lucene.codecs.lucene90.Lucene90NormsProducer$8.longValue(Lucene90NormsProducer.java:442)",
            "org.apache.lucene.search.LeafSimScorer.getNormValue(LeafSimScorer.java:47)",
            "org.apache.lucene.search.LeafSimScorer.score(LeafSimScorer.java:60)",
            "org.apache.lucene.search.SynonymQuery$SynonymScorer.score(SynonymQuery.java:548)",
            "org.apache.lucene.search.ConjunctionScorer.score(ConjunctionScorer.java:61)",
            "org.apache.lucene.search.DisjunctionSumScorer.score(DisjunctionSumScorer.java:41)",
            "org.apache.lucene.search.DisjunctionScorer.score(DisjunctionScorer.java:193)",
            "org.apache.lucene.search.ConjunctionScorer.score(ConjunctionScorer.java:61)",
            "org.apache.lucene.search.ReqOptSumScorer.score(ReqOptSumScorer.java:263)",
            "org.opensearch.script.ScoreScript.lambda$setScorer$4(ScoreScript.java:156)",
            "org.opensearch.script.ScoreScript.get_score(ScoreScript.java:168)",
            "def ",
            "^---- HERE"
          ],
          "script": "def expires_at = doc[\"variant.test_result.expires_at\"]; ...",
          "lang": "painless",
          "position": {
            "offset": 0,
            "start": 0,
            "end": 4
          },
          "caused_by": {
            "type": "index_out_of_bounds_exception",
            "reason": "index_out_of_bounds_exception: null"
          }
        }
      }
    ],
    "caused_by": {
      "type": "index_out_of_bounds_exception",
      "reason": "index_out_of_bounds_exception: null"
    }
  },
  "status": 400
}
Attached is a file with a sample of the full query being executed.
To call out the Painless script being run:

def expires_at = doc["variant.test_result.expires_at"];
def has_test = expires_at.size() > 0;
def boost_test = params.boosts.lab_tested > 0;

def score = 0;

if (has_test && boost_test && expires_at.value.toInstant().toEpochMilli() > params.now) {
  score += params.boosts.lab_tested;
}

def owner_priority = doc["owner_priority"];
def has_priority = owner_priority.size() > 0;

if (has_priority) {
  score += params.boosts.owner_priority * owner_priority.value;
}

return score + _score;
Changing the script to just

return _score;

still causes the query to throw the same exception, so the content of the script is very unlikely to be the cause of this error.

Analysis: there are two related Elasticsearch issues whose fixes may also resolve this one:
elastic/elasticsearch#82508
elastic/elasticsearch#90901

@adiOpenSearch
Author

The core of the problem appears to be caused by adding min_score: 1 to the query. Below is a shortened version of the query that is sent to ES. The noncritical parts have been removed because the full query is very long and difficult to read.
{
  "query": {
    "script_score": {
      "script": {
        "source": "return _score.isNaN() ? score : score + _score;",
        ...
      },
      "query": {...},
      "min_score": 1
    }
  },
  ...
}
The _score function does not return NaN when min_score is zero or absent. Even with the NaN check in place, however, we still get an "Index 16 out of bounds for length 16" failure when calling the _score method. We have no idea what the root cause is, because we can't get an exact line number for the failure. Here is the error as it comes back from the API:
{
  "took": 87,
  "timed_out": false,
  "_shards": {
    "total": 6,
    "successful": 5,
    "skipped": 0,
    "failed": 1,
    "failures": [
      {
        "shard": 4,
        "index": "catalog_product_instances__20230413003052",
        "node": "ArRTN3Z2SWGp9V30VK_f6w",
        "reason": {
          "type": "array_index_out_of_bounds_exception",
          "reason": "Index 16 out of bounds for length 16"
        }
      }
    ]
  },
  "hits": {
  ...

@Rishikesh1159 Rishikesh1159 added the Search Search query, autocomplete ...etc label Jun 27, 2023
@dblock
Member

dblock commented Jun 28, 2023

@adiOpenSearch could you please edit your issue to include instructions for the smallest possible repro of what does look like a bug? Try to reduce the query to the absolute minimum.

Next, if you want to help fix it, could you start by writing a REST test for it? https://github.com/opensearch-project/OpenSearch/blob/main/TESTING.md#testing-the-rest-layer

@macohen
Contributor

macohen commented Sep 14, 2023

@adiOpenSearch are you able to work on @dblock's suggestion? Do you need any help or guidance on this?

@adiOpenSearch
Author

@macohen Unfortunately, this error has only been observed on our production cluster. We are not able to reproduce it locally or in any pre-production environment. It only affects a subset of queries, and those queries are known to fail consistently for a period of time and then inexplicably start working again on their own.
It seems the core of the problem is caused by adding min_score: 1 to a query. Below is the query we tried in order to fix the problem (with some of the non-essential information removed for readability):
{
  "query": {
    "script_score": {
      "script": {
        "source": "return _score.isNaN() ? score : score + _score;",
        ...
      },
      "query": {...},
      "min_score": 1
    }
  },
  ...
}
The _score function does not return NaN when min_score is zero or absent. Even with the NaN check in place, however, we still get an "Index 16 out of bounds for length 16" failure when calling the _score method. We have no idea what the root cause is, because we can't get an exact line number for the failure. Here is the error as it comes back from the API:
{
  "took": 87,
  "timed_out": false,
  "_shards": {
    "total": 6,
    "successful": 5,
    "skipped": 0,
    "failed": 1,
    "failures": [
      {
        "shard": 4,
        "index": "catalog_product_instances__20230413003052",
        "node": "ArRTN3Z2SWGp9V30VK_f6w",
        "reason": {
          "type": "array_index_out_of_bounds_exception",
          "reason": "Index 16 out of bounds for length 16"
        }
      }
    ]
  },
  "hits": {
  ...
Our current workaround is to run the query against OpenSearch and, if it fails, run it again without the custom score. This typically yields a successful result, although not an ideally ranked one.
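
For anyone who needs the same stopgap, the retry-without-custom-score workaround can be sketched roughly as below. This is only an illustration under assumptions, not our production code: `SearchResult`, `ScoreFallback` and `searchWithFallback` are hypothetical names, the backend is modelled as a plain function from a query string to a result, and the caller is assumed to be able to build the same query with and without the `script_score` wrapper. The sketch treats both a thrown exception (the "all shards failed" case) and a partial shard failure (`_shards.failed` greater than zero) as a reason to retry.

```java
import java.util.function.Function;

// Hypothetical result holder: only the fields the fallback logic needs.
final class SearchResult {
    final int failedShards; // value of _shards.failed in the response
    final String body;      // raw response body, opaque to this sketch

    SearchResult(int failedShards, String body) {
        this.failedShards = failedShards;
        this.body = body;
    }
}

final class ScoreFallback {
    private ScoreFallback() {}

    // Runs the scored query first; if it throws or reports failed shards,
    // runs the plain query (same filters, no script_score) instead.
    static SearchResult searchWithFallback(Function<String, SearchResult> backend,
                                           String scoredQuery,
                                           String plainQuery) {
        try {
            SearchResult scored = backend.apply(scoredQuery);
            if (scored.failedShards == 0) {
                return scored; // fully successful, keep the custom ranking
            }
        } catch (RuntimeException e) {
            // e.g. search_phase_execution_exception: fall through to the retry
        }
        return backend.apply(plainQuery);
    }
}
```

The trade-off is the one described above: the fallback result is complete but ranked by the plain query score rather than the custom one.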

@msfroh
Collaborator

msfroh commented Sep 21, 2023

@adiOpenSearch -- based on the stack trace, this almost sounds like index corruption (which tends to be unlikely).

Is this happening on a single index? Or does it happen across multiple indices? If it's a single index, then it sounds like the norms for one of the segments were written incorrectly (and maybe can be repaired).

@adiOpenSearch
Author

adiOpenSearch commented Sep 22, 2023

@msfroh So far this issue occurs on a single index and against a particular query. We have also rolled the index multiple times.

@noCharger noCharger self-assigned this Nov 20, 2023
@msfroh
Collaborator

msfroh commented Jan 9, 2024

@noCharger -- We determined this was caused by ScriptScorer missing an override for the approximation method, right?

The exception comes because the MinScoreScorer's TwoPhaseIterator doesn't think that it's wrapped around a TwoPhaseIterator, so it doesn't check the ScriptScorer for a match before asking it for a score for the current doc:

// we need to check the two-phase iterator first
// otherwise calling score() is illegal
if (inTwoPhase != null && inTwoPhase.matches() == false) {
    return false;
}
curScore = in.score();
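
To make the failure mode concrete, here is a deliberately simplified toy model of it. None of these types are the real Lucene or OpenSearch classes: `ToyScorer`, `ToyTwoPhase`, `ToyInnerScorer`, `ToyWrapper` and `ToyMinScore` are hypothetical stand-ins, and the array lookup in `score` only imitates a norms read. The point it tries to show is that when a wrapper does not expose the inner two-phase check, the min-score filter sees `null`, skips `matches()`, and calls `score()` on a candidate document that never matched.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-ins only; NOT the Lucene Scorer / TwoPhaseIterator API.
interface ToyTwoPhase {
    boolean matches(int doc); // confirms whether a candidate really matches
}

interface ToyScorer {
    int[] candidates();       // approximation: docs that might match
    ToyTwoPhase twoPhase();   // null means "every candidate is a real match"
    float score(int doc);     // only legal for confirmed matches
}

// Inner scorer: doc 5 is a candidate, but only docs 0..2 really match.
final class ToyInnerScorer implements ToyScorer {
    private final float[] norms = {0.5f, 2.0f, 3.0f};

    public int[] candidates() { return new int[] {0, 1, 2, 5}; }
    public ToyTwoPhase twoPhase() { return doc -> doc < norms.length; }
    public float score(int doc) { return norms[doc]; } // out of bounds for doc 5
}

// Wrapper in the role of the script scorer. With exposeTwoPhase == false it
// models the missing override: the inner confirmation step is hidden.
final class ToyWrapper implements ToyScorer {
    private final ToyScorer in;
    private final boolean exposeTwoPhase;

    ToyWrapper(ToyScorer in, boolean exposeTwoPhase) {
        this.in = in;
        this.exposeTwoPhase = exposeTwoPhase;
    }

    public int[] candidates() { return in.candidates(); }
    public ToyTwoPhase twoPhase() { return exposeTwoPhase ? in.twoPhase() : null; }
    public float score(int doc) { return in.score(doc) + 1.0f; }
}

// Min-score filter in the role of MinScoreScorer, using the same guard
// as the snippet above.
final class ToyMinScore {
    private ToyMinScore() {}

    static List<Integer> collect(ToyScorer in, float minScore) {
        ToyTwoPhase inTwoPhase = in.twoPhase();
        List<Integer> hits = new ArrayList<>();
        for (int doc : in.candidates()) {
            // check the two-phase step first, otherwise score() is illegal
            if (inTwoPhase != null && !inTwoPhase.matches(doc)) {
                continue;
            }
            if (in.score(doc) >= minScore) {
                hits.add(doc);
            }
        }
        return hits;
    }
}
```

In this model, `ToyMinScore.collect(new ToyWrapper(new ToyInnerScorer(), true), 2.0f)` filters normally, while passing `false` makes the filter score the unconfirmed doc 5 and fail with an out-of-bounds exception, which is the shape of the error in the stack trace.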

Projects
Status: Done