[FEATURE] PPL should support terms aggregation with multiple buckets #4208

@noCharger

Description

Is your feature request related to a problem?
Currently, PPL translates a stats command that groups by multiple fields into a composite aggregation instead of OpenSearch's native multi_terms aggregation. This leads to:

  1. Inconsistent results between PPL and DSL queries
  2. Suboptimal performance for multi-field grouping

For example, this PPL query:

source = big5
| where `@timestamp` >= '2023-01-05 00:00:00' and `@timestamp` < '2023-01-05 05:00:00'
| stats count() by `process.name`, `cloud.region`
| sort - `count()`

currently generates this composite aggregation:

{
  "aggregations": {
    "composite_buckets": {
      "composite": {
        "size": 1000,
        "sources": [
          {
            "process.name": {
              "terms": {
                "field": "process.name",
                "missing_bucket": true,
                "missing_order": "first",
                "order": "asc"
              }
            }
          },
          {
            "cloud.region": {
              "terms": {
                "field": "cloud.region",
                "missing_bucket": true,
                "missing_order": "first",
                "order": "asc"
              }
            }
          }
        ]
      },
      "aggregations": {
        "count()": {
          "value_count": {
            "field": "_index"
          }
        }
      }
    }
  }
}
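
To make the current translation concrete, here is a minimal sketch that rebuilds the request body above from a list of group-by fields. The function name composite_group_by and its parameters are hypothetical and are not taken from the PPL plugin; only the JSON shape comes from the output shown above.

```python
import json


def composite_group_by(fields, size=1000):
    """Build a composite aggregation body like the one PPL emits today.

    Hypothetical helper for illustration only: it mirrors the JSON shown
    in this issue, not the plugin's actual code.
    """
    # One "terms" source per group-by field, with the same settings
    # that appear in the generated request above.
    sources = [
        {
            field: {
                "terms": {
                    "field": field,
                    "missing_bucket": True,
                    "missing_order": "first",
                    "order": "asc",
                }
            }
        }
        for field in fields
    ]
    return {
        "aggregations": {
            "composite_buckets": {
                "composite": {"size": size, "sources": sources},
                # count() is expressed as a value_count on _index.
                "aggregations": {
                    "count()": {"value_count": {"field": "_index"}}
                },
            }
        }
    }


if __name__ == "__main__":
    body = composite_group_by(["process.name", "cloud.region"])
    print(json.dumps(body, indent=2))
```

Note that every source is ordered "asc" by key, so the `sort - count()` step in the query is not expressed anywhere in this request body.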

What solution would you like?

Implement an optimization that uses the native multi_terms aggregation when a query groups by multiple fields. The above query should generate:

{
  "aggs": {
    "important_terms": {
      "multi_terms": {
        "terms": [
          {
            "field": "process.name"
          },
          {
            "field": "cloud.region"
          }
        ]
      }
    }
  }
}
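
The requested translation could be sketched along these lines. The function name multi_terms_group_by and the two-field check are assumptions made for illustration; the aggregation name important_terms is copied from the desired output above, and nothing here reflects the plugin's real implementation.

```python
import json


def multi_terms_group_by(fields, name="important_terms"):
    """Build a multi_terms aggregation body for several group-by fields.

    Hypothetical helper for illustration only; it reproduces the JSON
    requested in this issue.
    """
    # Assumption: this optimization only applies when grouping by two or
    # more fields. A single field would use a plain terms aggregation.
    if len(fields) < 2:
        raise ValueError("multi_terms needs at least two fields")
    return {
        "aggs": {
            name: {
                "multi_terms": {
                    "terms": [{"field": field} for field in fields]
                }
            }
        }
    }


if __name__ == "__main__":
    body = multi_terms_group_by(["process.name", "cloud.region"])
    print(json.dumps(body, indent=2))
```

Each multi_terms bucket carries its own doc_count, which is presumably why the desired output has no separate count() sub-aggregation.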

Metadata

Labels

PPL (Piped processing language), enhancement (New feature or request)

Projects

Status

Done
