Skip to content

[FEATURE] Support nested aggregation #2813

@LantaoJin

Description

@LantaoJin

Is your feature request related to a problem?
In order to interact with nested fields in an aggregation, a nested aggregation must be used.
OpenSearch DSL query:

GET logs/_search
{
  "query": {
    "match": { "response": "200" }
  },
  "aggs": {
    "pages": {
      "nested": {
        "path": "pages"
      },
      "aggs": {
        "min_load_time": { "min": { "field": "pages.load_time" } }
      }
    }
  }
}

Translated OpenSearch SQL query:

SELECT min(nested(pages.load_time, pages)) FROM logs WHERE response = 200;

However, above nested aggregation query hasn't supported yet in SQL plugin. Aggregation on nested fields directly will lead to incorrect results. Some related issues: #2739, #2529

What solution would you like?
Support nested aggregation with follow syntaxes:
SQL: agg_func(nested(field | [field, path]))
PPL: | stats agg_func(nested(field | [field, path]))

What alternatives have you considered?
For PPL query, an alternative to calculate aggregation on nested field is searching out all nested field data then aggregate:

source = jaeger-span-* | fields references.refType | stats count(references.refType)

For SQL query:

SELECT count(t.temp) FROM
   (SELECT nested(references.refType, references) as temp FROM jaeger-span-*) t

But both of them have performance problem and the results are limited by memory and query_size parameters.

Do you have any additional context?
Add any other context or screenshots about the feature request here.

Metadata

Metadata

Assignees

Labels

PPLPiped processing languageenhancementNew feature or request

Type

No type

Projects

Status

In review

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions