-
Notifications
You must be signed in to change notification settings - Fork 176
Description
Query Information
PPL Command/Query:
source=opensearch_dashboards_sample_data_logs
| eval RamMB=CEIL(machine.ram/1000000.0)
| stats bucket_nullable=false MIN(RamMB) as Min_RAM_MB, percentile(RamMB, 80) as P80_RAM_MB
Expected Result:
The query should return both the minimum value and the 80th percentile of the RamMB field in a single result row, similar to how MIN and MAX work together.
Actual Result:
{
"error": {
"reason": "There was internal problem at backend",
"details": "java.sql.SQLException: exception while executing query: Index 1 out of bounds for length 0",
"type": "RuntimeException"
},
"status": 500
}
Dataset Information
Dataset/Schema Type
- OpenTelemetry (OTEL)
- Simple Schema for Observability (SS4O)
- Open Cybersecurity Schema Framework (OCSF)
- Custom (OpenSearch Dashboards sample data)
Index Mapping
{
"mappings": {
"properties": {
"machine": {
"properties": {
"ram": { "type": "long" },
"os": { "type": "text" }
}
}
}
}
}
Sample Data
{
"machine": {
"ram": 8589934592,
"os": "win 8"
}
}
Bug Description
Issue Summary:
The stats
command fails with an "Index out of bounds" error when combining PERCENTILE()
or MEDIAN()
functions with standard aggregation functions like MIN()
or MAX()
in the same stats command.
Steps to Reproduce:
- Execute:
source=opensearch_dashboards_sample_data_logs | eval RamMB=CEIL(machine.ram/1000000.0) | stats MIN(RamMB), percentile(RamMB, 80)
- Observe error: "Index 1 out of bounds for length 0"
- Execute with MAX instead:
source=opensearch_dashboards_sample_data_logs | eval RamMB=CEIL(machine.ram/1000000.0) | stats MAX(RamMB), percentile(RamMB, 80)
- Same error occurs
- Execute with MEDIAN:
source=opensearch_dashboards_sample_data_logs | eval RamMB=CEIL(machine.ram/1000000.0) | stats MIN(RamMB), median(RamMB)
- Same error occurs
Verified Working Scenarios:
stats MIN(RamMB), MAX(RamMB)
- ✅ Worksstats percentile(RamMB, 80)
- ✅ Worksstats median(RamMB)
- ✅ Works
Impact:
Users cannot combine approximate aggregation functions (percentile, median) with standard aggregations (MIN, MAX) in a single stats command, requiring multiple separate queries and manual result merging.
Environment Information
OpenSearch Version:
OpenSearch 3.3.0-SNAPSHOT
Additional Details:
- Bug affects both PPL
percentile()
andmedian()
functions - Bug occurs regardless of aggregation order
- No test coverage exists for combining percentile/median with other aggregations
Tentative Root Cause Analysis
Very preliminary, the issue appears to be in the aggregation argument processing pipeline:
File: ./opensearch/src/main/java/org/opensearch/sql/opensearch/request/AggregateAnalyzer.java
Line 486-487:
case PERCENTILE_APPROX -> {
PercentilesAggregationBuilder aggBuilder =
helper
.build(args.getFirst(), AggregationBuilders.percentiles(aggFieldName))
.percentiles(helper.inferValue(args.get(1), Double.class)); // Line 487
The code assumes args.get(1)
exists, but when percentile is combined with other aggregations, the Calcite query optimizer may restructure the arguments differently, resulting in an empty or incorrectly sized args list for the percentile aggregation.
Related File: ./opensearch/src/main/java/org/opensearch/sql/opensearch/response/agg/SinglePercentileParser.java
Line 30:
Streams.stream(((Percentiles) agg).iterator()).findFirst().get().getValue());
The .get()
call without checking if a value is present could also contribute to failures if the percentile aggregation returns no results.
Tentative Proposed Fix
This is a preliminary analysis and requires further investigation:
- Add defensive checks in
AggregateAnalyzer.java
to validate args list size before accessing indices - Investigate Calcite optimization to understand why args are restructured when multiple aggregations are present
- Add null-safety in
SinglePercentileParser.java
using.orElse()
or.orElseThrow()
with a descriptive error - Add integration tests in
./integ-test/src/test/java/org/opensearch/sql/ppl/StatsCommandIT.java
to cover combinations of percentile/median with other aggregations
Metadata
Metadata
Assignees
Labels
Type
Projects
Status