Skip to content

[BUG] Long IN value list cause stack overflow error #1469

@dai-chen

Description

@dai-chen

What is the bug?

StackOverflowError occurs and OpenSearch JVM exits if the value list in IN expression is very long.

How can one reproduce the bug?

$ head big-in-list.txt
{ "query": "select * from opensearch_dashboards_sample_data_flights where FlightDelayMin in (1, 2, 3 ...)" }

$ curl -X POST "localhost:9200/_plugins/_sql" -H 'Content-Type: application/json' --data "@big-in-list.txt"
curl: (52) Empty reply from server

OpenSearch log:

java.lang.StackOverflowError
        ...
	at org.opensearch.sql.data.type.WideningTypeRule.distance(WideningTypeRule.java:51)
	at org.opensearch.sql.data.type.WideningTypeRule.lambda$distance$0(WideningTypeRule.java:50)
	at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
	at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1655)
	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
	at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
	at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
	at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.base/java.util.stream.ReferencePipeline.reduce(ReferencePipeline.java:558)
	at org.opensearch.sql.data.type.WideningTypeRule.distance(WideningTypeRule.java:51)
	at org.opensearch.sql.data.type.WideningTypeRule.distance(WideningTypeRule.java:40)
	at org.opensearch.sql.expression.function.FunctionSignature.match(FunctionSignature.java:53)
	at org.opensearch.sql.expression.function.DefaultFunctionResolver.resolve(DefaultFunctionResolver.java:49)
	at org.opensearch.sql.expression.function.BuiltinFunctionRepository.getFunctionBuilder(BuiltinFunctionRepository.java:177)
	at org.opensearch.sql.expression.function.BuiltinFunctionRepository.resolve(BuiltinFunctionRepository.java:159)
	at org.opensearch.sql.expression.function.BuiltinFunctionRepository.compile(BuiltinFunctionRepository.java:135)
	at org.opensearch.sql.expression.function.BuiltinFunctionRepository.compile(BuiltinFunctionRepository.java:119)
	at org.opensearch.sql.analysis.ExpressionAnalyzer.visitCompare(ExpressionAnalyzer.java:234)
	at org.opensearch.sql.analysis.ExpressionAnalyzer.visitIn(ExpressionAnalyzer.java:221)
	at org.opensearch.sql.analysis.ExpressionAnalyzer.visitIn(ExpressionAnalyzer.java:222)
	at org.opensearch.sql.analysis.ExpressionAnalyzer.visitIn(ExpressionAnalyzer.java:222)
	at org.opensearch.sql.analysis.ExpressionAnalyzer.visitIn(ExpressionAnalyzer.java:222)
	at org.opensearch.sql.analysis.ExpressionAnalyzer.visitIn(ExpressionAnalyzer.java:222)
	at org.opensearch.sql.analysis.ExpressionAnalyzer.visitIn(ExpressionAnalyzer.java:222)

What is the expected behavior?

The query can return correct results as expected or terminate properly at compile time.

What is your host/environment?

  • Plugins: latest code in main branch

Do you have any additional context?

Root Cause

The root cause I found is IN expression is compiled recursively into comparisons connected by OR. Each value in list cause the recursive call and stack go deeper. Code: https://github.com/opensearch-project/sql/blob/main/core/src/main/java/org/opensearch/sql/analysis/ExpressionAnalyzer.java#L222

  private Expression visitIn(
      UnresolvedExpression field, List<UnresolvedExpression> valueList, AnalysisContext context) {
    if (valueList.size() == 1) {
      return visitCompare(new Compare("=", field, valueList.get(0)), context);
    } else if (valueList.size() > 1) {
      return DSL.or(
          visitCompare(new Compare("=", field, valueList.get(0)), context),
          visitIn(field, valueList.subList(1, valueList.size()), context));
    } else {
      throw new SemanticCheckException("Values in In clause should not be empty");
    }
  }

Solutions

Possible solution to solve this problem include:

  • Add value list length check and document the limitation
  • Rewrite the recursion in visitIn function above to for loop
  • Compile IN expression to field = ANY(ARRAY(values)) which maybe optimal at both compile and runtime

Metadata

Metadata

Assignees

Labels

SQLbugSomething isn't working

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions