Skip to content

[BUG] PPL aggregation operations (stats, top) return null for nested field values #4564

@alexey-temnikov

Description

@alexey-temnikov

Query Information

PPL Command/Query:

source=test-nested-agg | stats COUNT() by request.path
source=test-nested-agg | top request.path

Expected Result:
Aggregation operations should return grouped counts by the nested field values, similar to how fields command successfully retrieves the values.

Actual Result:

  • stats COUNT() by nested.field returns a single row with null as the grouping field value
  • top nested.field returns null with total count
  • fields nested.field works correctly (returns all values)
{
  "schema": [
    {"name": "COUNT()", "type": "bigint"},
    {"name": "request.path", "type": "string"}
  ],
  "datarows": [
    [10, null]
  ]
}

Dataset Information

Dataset/Schema Type

  • Open Cybersecurity Schema Framework (OCSF)
  • OpenTelemetry (OTEL)
  • Custom (minimal reproduction below)

Index Mapping

{
  "mappings": {
    "properties": {
      "product": {"type": "keyword"},
      "request": {
        "type": "nested",
        "properties": {
          "path": {"type": "keyword"},
          "method": {"type": "keyword"}
        }
      }
    }
  }
}

Sample Data

{"product":"eks","request":{"path":"/api/v1/pods","method":"GET"}}
{"product":"eks","request":{"path":"/api/v1/services","method":"GET"}}
{"product":"eks","request":{"path":"/api/v1/pods","method":"POST"}}

Bug Description

Issue Summary:
PPL aggregation operations (stats, top) return null for nested field values instead of performing proper aggregation. The fields command works correctly on the same nested fields, confirming the data exists.

Steps to Reproduce:

# 1. Create test index with nested field
curl -X PUT "http://localhost:9200/test-nested-agg" -H 'Content-Type: application/json' -d '{
  "mappings": {
    "properties": {
      "product": {"type": "keyword"},
      "request": {
        "type": "nested",
        "properties": {
          "path": {"type": "keyword"},
          "method": {"type": "keyword"}
        }
      }
    }
  }
}'

# 2. Insert test data
curl -X POST "http://localhost:9200/test-nested-agg/_bulk" -H 'Content-Type: application/json' -d '
{"index":{}}
{"product":"eks","request":{"path":"/api/v1/pods","method":"GET"}}
{"index":{}}
{"product":"eks","request":{"path":"/api/v1/services","method":"GET"}}
{"index":{}}
{"product":"eks","request":{"path":"/api/v1/pods","method":"POST"}}
'

curl -X POST "http://localhost:9200/test-nested-agg/_refresh"

# 3. Test fields command (works correctly)
curl -X POST "http://localhost:9200/_plugins/_ppl" -H 'Content-Type: application/json' -d '{
  "query": "source=test-nested-agg | fields request.path"
}'
# Returns: 3 rows with values ✅

# 4. Test aggregation (fails - returns null)
curl -X POST "http://localhost:9200/_plugins/_ppl" -H 'Content-Type: application/json' -d '{
  "query": "source=test-nested-agg | stats COUNT() by request.path"
}'
# Returns: 1 row with null ❌

Environment Information

OpenSearch Version:
OpenSearch 3.4.0-SNAPSHOT (also affects 3.3.0)

Root Cause Analysis

Tentative Root Cause:

Note: This is a preliminary analysis and requires further investigation.

OpenSearch requires a nested aggregation wrapper to aggregate on nested fields. PPL/SQL generates standard aggregations without this wrapper.

Verification with OpenSearch DSL:

Without nested wrapper (what PPL generates):

curl -X POST "http://localhost:9200/test-nested-agg/_search?size=0" -H 'Content-Type: application/json' -d '{
  "aggs": {
    "by_path": {
      "terms": {"field": "request.path"}
    }
  }
}'
# Result: Empty buckets ❌

With nested wrapper (correct approach):

curl -X POST "http://localhost:9200/test-nested-agg/_search?size=0" -H 'Content-Type: application/json' -d '{
  "aggs": {
    "nested_request": {
      "nested": {"path": "request"},
      "aggs": {
        "by_path": {
          "terms": {"field": "request.path"}
        }
      }
    }
  }
}'
# Result: Correct aggregation with 2 buckets ✅

Code Location:

The issue is in the aggregation generation layer:

This is a preliminary analysis and requires further investigation.

Related Issues

This bug is part of the broader nested field support gap:

Note: This issue specifically tracks aggregation operations returning null. Related issues #4487 and #4508 affect the query/filter layer with a similar underlying cause but different manifestations.

Metadata

Metadata

Assignees

Labels

PPLPiped processing languagebugSomething isn't working

Type

No type

Projects

Status

Done

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions