Description
Query Information
PPL Command/Query:
source=jaeger-span-2024-05-08 | dedup references.refType
Expected Result:
The query should return deduplicated results based on the references.refType field values (e.g., "CHILD_OF", "FOLLOWS_FROM", and null), similar to how source=jaeger-span-2024-05-08 | fields references.refType successfully returns 4226 results with these values.
Actual Result:
The query returns 0 results:
{
"schema": [...],
"datarows": [],
"total": 0,
"size": 0
}
Dataset Information
Dataset/Schema Type
- OpenTelemetry (OTEL)
- Simple Schema for Observability (SS4O)
- Open Cybersecurity Schema Framework (OCSF)
- Custom (details below)
Index Mapping
{
"mappings": {
"properties": {
"references": {
"type": "nested",
"dynamic": "false",
"properties": {
"refType": {
"type": "keyword",
"ignore_above": 256
},
"spanID": {
"type": "keyword",
"ignore_above": 256
},
"traceID": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
Sample Data
{
"traceID": "3c61c828d40ae3c8b553f0ba2d185898",
"spanID": "1dc84d1a2920a6bf",
"operationName": "oteldemo.ProductCatalogService/GetProduct",
"references": [
{
"refType": "CHILD_OF",
"traceID": "3c61c828d40ae3c8b553f0ba2d185898",
"spanID": "047eded12fcda381"
}
]
}
Bug Description
Issue Summary:
The dedup command returns zero results when applied to fields within nested objects, even though the same field can be successfully queried using the fields command.
Steps to Reproduce:
- Create an index with a nested field mapping (e.g., Jaeger span data with a nested references field)
- Insert documents containing nested objects with the target field populated
- Run source=<index> | fields <nested_field> - this works correctly
- Run source=<index> | dedup <nested_field> - this returns 0 results
Minimal Reproduction:
# Create test index
curl -X PUT "localhost:9200/test-nested-dedup" -H 'Content-Type: application/json' -d '{
"mappings": {
"properties": {
"items": {
"type": "nested",
"properties": {
"name": {"type": "keyword"}
}
}
}
}
}'
# Insert test data
curl -X POST "localhost:9200/test-nested-dedup/_bulk" -H 'Content-Type: application/json' -d '
{"index":{"_id":"1"}}
{"items":[{"name":"apple"}]}
{"index":{"_id":"2"}}
{"items":[{"name":"banana"}]}
'
# Refresh index
curl -X POST "localhost:9200/test-nested-dedup/_refresh"
# This works - returns 2 results
curl -X POST "localhost:9200/_plugins/_ppl" -H 'Content-Type: application/json' -d '{
"query": "source=test-nested-dedup | fields items.name"
}'
# This fails - returns 0 results
curl -X POST "localhost:9200/_plugins/_ppl" -H 'Content-Type: application/json' -d '{
"query": "source=test-nested-dedup | dedup items.name"
}'
Impact:
This bug prevents users from deduplicating results based on fields within nested objects.
Environment Information
OpenSearch Version:
OpenSearch 3.3.0-SNAPSHOT
Additional Details:
The issue occurs with the Calcite-based query engine (V3).
Technical Analysis
Tentative Root Cause:
Note: This is a preliminary analysis and requires further investigation.
The dedup command internally generates an IS NOT NULL filter on the dedup field to exclude null values before applying the deduplication logic. The query execution plan shows:
LogicalFilter(condition=[IS NOT NULL($42)])
This filter is then converted to an OpenSearch exists query:
{
"query": {
"exists": {
"field": "references.refType"
}
}
}
However, in OpenSearch an exists query on a field inside a nested path (e.g., references.refType) must be wrapped in a nested query to function correctly:
{
"query": {
"nested": {
"path": "references",
"query": {
"exists": {
"field": "references.refType"
}
}
}
}
}
Verification:
- Direct exists query on references.refType: 0 results
- exists query wrapped in nested query: 2565 results (correct)
Code Location:
The issue appears to be in /opensearch/src/main/java/org/opensearch/sql/opensearch/request/PredicateAnalyzer.java around lines 577-586:
// OpenSearch DSL does not handle IS_NULL / IS_NOT_NULL on nested fields correctly
checkForNestedFieldOperands(call);
Expression a = call.getOperands().get(0).accept(this);
QueryExpression operand = QueryExpression.create((TerminalExpression) a);
return call.getKind() == SqlKind.IS_NOT_NULL ? operand.exists() : operand.notExists();
The code has a comment acknowledging the limitation and includes a check (checkForNestedFieldOperands) that throws an exception for nested fields. However, this check only detects when the operand type is ArraySqlType (the nested array itself), not when accessing a field within a nested object (e.g., references.refType).
Tentative Proposed Fix:
Note: This is a preliminary analysis and requires further investigation.
The fix would require:
- Enhanced Detection: Modify the nested field detection logic to identify field paths that traverse nested objects (e.g., references.refType where references is nested).
- Nested Query Wrapping: When generating an exists query for a field within a nested object, wrap it in a nested query with the appropriate path.
- Implementation Approach:
  - Extract the nested path from the field name (e.g., references from references.refType)
  - Check if this path corresponds to a nested field in the index mapping
  - If yes, generate a nested query wrapper around the exists query
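The implementation approach above can be sketched in plain Java. This is only an illustration under stated assumptions, not the plugin's actual code: NestedExistsSketch, findNestedPath, and existsQuery are hypothetical names, the set of nested paths is assumed to have been read from the index mapping beforehand, and the DSL is modelled as nested maps rather than OpenSearch query builders. It handles a single level of nesting only.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch: none of these names exist in the SQL plugin.
public class NestedExistsSketch {

  /**
   * Returns the longest dotted prefix of fieldName that is mapped as nested,
   * e.g. "references" for "references.refType". nestedPaths is assumed to be
   * derived from the index mapping.
   */
  public static Optional<String> findNestedPath(String fieldName, Set<String> nestedPaths) {
    int end = fieldName.lastIndexOf('.');
    while (end > 0) {
      String prefix = fieldName.substring(0, end);
      if (nestedPaths.contains(prefix)) {
        return Optional.of(prefix);
      }
      end = fieldName.lastIndexOf('.', end - 1);
    }
    return Optional.empty();
  }

  /**
   * Builds the exists query as a map. If the field sits under a nested path,
   * the exists clause is wrapped in a nested clause with that path.
   */
  public static Map<String, Object> existsQuery(String fieldName, Set<String> nestedPaths) {
    Map<String, Object> exists = Map.of("exists", Map.of("field", fieldName));
    return findNestedPath(fieldName, nestedPaths)
        .<Map<String, Object>>map(path -> {
          Map<String, Object> nested = new LinkedHashMap<>();
          nested.put("path", path);
          nested.put("query", exists);
          return Map.of("nested", nested);
        })
        .orElse(exists);
  }

  public static void main(String[] args) {
    Set<String> nestedPaths = Set.of("references");
    // Field under a nested path: wrapped in a nested clause.
    System.out.println(existsQuery("references.refType", nestedPaths));
    // Top-level field: plain exists clause.
    System.out.println(existsQuery("operationName", nestedPaths));
  }
}
```

In the real analyzer the same decision would presumably be made where the IS NOT NULL predicate is converted, with the nested paths taken from whatever mapping metadata is available at that point; that wiring is not shown here and would need to be confirmed against the code.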
Example Fix Location:
In PredicateAnalyzer.java, the postfix() method would need to:
- Detect if the field is within a nested object
- Generate an appropriate nested query DSL instead of a simple exists query
Note: This analysis is preliminary. A complete fix would require further investigation!
Related Issues
- [FEATURE] Support nested (object) fields with Calcite #3452