Closed as duplicate of #4282
Labels: PPL (Piped processing language), enhancement (New feature or request)
Description
Is your feature request related to a problem?
Currently, PPL translates grouping by multiple fields (`stats ... by field1, field2`) into a composite aggregation instead of using OpenSearch's native multi_terms aggregation. This leads to:
- Inconsistent results between a PPL query and the equivalent DSL query
- Suboptimal performance for multi-field grouping
For example, this PPL query:
source = big5
| where `@timestamp` >= '2023-01-05 00:00:00' and `@timestamp` < '2023-01-05 05:00:00'
| stats count() by `process.name`, `cloud.region`
| sort - `count()`
is currently translated into this composite aggregation:
{
  "aggregations": {
    "composite_buckets": {
      "composite": {
        "size": 1000,
        "sources": [
          {
            "process.name": {
              "terms": {
                "field": "process.name",
                "missing_bucket": true,
                "missing_order": "first",
                "order": "asc"
              }
            }
          },
          {
            "cloud.region": {
              "terms": {
                "field": "cloud.region",
                "missing_bucket": true,
                "missing_order": "first",
                "order": "asc"
              }
            }
          }
        ]
      },
      "aggregations": {
        "count()": {
          "value_count": {
            "field": "_index"
          }
        }
      }
    }
  }
}
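Composite buckets come back in ascending key order (with missing values first, per the `missing_order` setting above) and in pages of at most `size` buckets, each response carrying an `after_key` cursor for the next request. Honoring `sort - count()` over the full result therefore means collecting every page before sorting. The sketch below simulates that flow in memory. `count_by`, `composite_order`, `fetch_page` and `stats_count_sorted_desc` are hypothetical stand-ins, not plugin or client APIs, and field values are assumed to be strings.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple

# One value per group-by field; None stands for a missing value (missing_bucket: true).
BucketKey = Tuple[Optional[str], ...]


def count_by(docs: Iterable[Dict[str, str]], fields: List[str]) -> Counter:
    """Count documents per combination of group-by values (what count() reports per bucket)."""
    return Counter(tuple(doc.get(f) for f in fields) for doc in docs)


def composite_order(key: BucketKey) -> Tuple[Tuple[bool, str], ...]:
    """Mirror "order": "asc" with "missing_order": "first": a missing value sorts before any present one."""
    return tuple((v is not None, v if v is not None else "") for v in key)


def fetch_page(counts: Counter, size: int, after_key: Optional[BucketKey] = None):
    """Hypothetical in-memory stand-in for one composite request: the next `size` buckets after `after_key`."""
    ordered = sorted(counts, key=composite_order)
    if after_key is not None:
        ordered = [k for k in ordered if composite_order(k) > composite_order(after_key)]
    page = [(k, counts[k]) for k in ordered[:size]]
    next_after = page[-1][0] if page else None
    return page, next_after


def stats_count_sorted_desc(docs, fields: List[str], size: int) -> List[Tuple[BucketKey, int]]:
    """Collect every composite page, then apply `sort - count()` on the client side."""
    counts = count_by(docs, fields)
    buckets: List[Tuple[BucketKey, int]] = []
    page, after_key = fetch_page(counts, size)
    while page:  # an empty page means every bucket has been seen
        buckets.extend(page)
        page, after_key = fetch_page(counts, size, after_key)
    # sorted() is stable, so buckets with equal counts keep their composite (key) order.
    return sorted(buckets, key=lambda kv: -kv[1])
```

The extra pass is the point of contrast: the composite pages arrive in key order, so the count ordering only exists after the last page has been merged.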
What solution would you like?
Add an optimization that uses the native multi_terms aggregation when grouping by multiple fields. The above query should then generate:
{
  "aggs": {
    "important_terms": {
      "multi_terms": {
        "terms": [
          {
            "field": "process.name"
          },
          {
            "field": "cloud.region"
          }
        ]
      }
    }
  }
}
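As a rough sketch, the target request can be built directly from the list of `by` fields. `build_multi_terms_agg` and `bucket_to_row` are hypothetical names, not part of the SQL/PPL plugin, and the aggregation name `important_terms` is simply reused from the example above. Since every multi_terms bucket reports a `doc_count`, this sketch reads `count()` from it instead of adding a `value_count` sub-aggregation; the row conversion assumes each bucket exposes its key as a list in the same order as the `terms` sources.

```python
from typing import Any, Dict, List, Optional


def build_multi_terms_agg(group_by: List[str], name: str = "important_terms",
                          size: Optional[int] = None) -> Dict[str, Any]:
    """Hypothetical helper: map the fields of `stats count() by f1, f2, ...` onto a multi_terms aggregation."""
    if len(group_by) < 2:
        # Design choice of this sketch: a single group-by field stays on the existing translation path.
        raise ValueError("multi_terms path is only taken for two or more group-by fields")
    body: Dict[str, Any] = {"terms": [{"field": f} for f in group_by]}
    if size is not None:
        body["size"] = size  # optional cap on the number of buckets returned
    return {"aggs": {name: {"multi_terms": body}}}


def bucket_to_row(group_by: List[str], bucket: Dict[str, Any]) -> Dict[str, Any]:
    """Turn one multi_terms bucket into a PPL result row; doc_count plays the role of count()."""
    # Assumes bucket["key"] lists the values in the same order as the "terms" sources.
    row = dict(zip(group_by, bucket["key"]))
    row["count()"] = bucket["doc_count"]
    return row
```

Called with `["process.name", "cloud.region"]` and no `size`, the builder returns exactly the request body shown above.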
Metadata
Status: Done