generated from amazon-archives/__template_Custom
-
Notifications
You must be signed in to change notification settings - Fork 181
Closed
Labels
PPLPiped processing languagePiped processing languagecalcitecalcite migration releatedcalcite migration releatedenhancementNew feature or requestNew feature or requestpushdownpushdown related issuespushdown related issuesv3.3.0
Description
Is your feature request related to a problem?
Currently, PPL's span() function with fixed intervals doesn't provide the same flexibility as OpenSearch's native auto_date_histogram aggregation. This leads to:
- Suboptimal bucket distribution for time series analysis (as 1000 fixed)
- Need for manual tuning of time intervals
- Potential for too many or too few buckets
- Inconsistent results between PPL and DSL queries
For example, this PPL query uses a fixed 1-hour interval:
source = custom-big5
| eval range_bucket = case(
`metrics.size` < -10, 'range_1',
`metrics.size` >= -10 and `metrics.size` < 10, 'range_2',
`metrics.size` >= 10 and `metrics.size` < 100, 'range_3',
`metrics.size` >= 100 and `metrics.size` < 1000, 'range_4',
`metrics.size` >= 1000 and `metrics.size` < 2000, 'range_5',
`metrics.size` >= 2000, 'range_6'
)
| stats min(`metrics.tmin`) as tmin, avg(`metrics.size`) as tavg, max(`metrics.size`) as tmax
by range_bucket, span(`@timestamp`, 1h) as auto_spanWhile the equivalent DSL query uses auto_date_histogram for optimal bucket distribution:
{
"aggs": {
"tmax": {
"range": {
"field": "metrics.size",
"ranges": [...]
},
"aggs": {
"date": {
"auto_date_histogram": {
"field": "@timestamp",
"buckets": 20
}
}
}
}
}
}
What solution would you like?
Implement auto_date_histogram support in PPL with:
New function syntax for auto-bucketing:
source = custom-big5
| stats count() by auto_span(`@timestamp`, buckets=20)
Key features:
- Dynamic interval selection based on data range
- Target bucket count specification
- Automatic interval rounding to human-readable values
- Consistent bucket count regardless of time range
What alternatives have you considered?
Enhanced span() function with auto mode:
span(`@timestamp`, 'auto', target_buckets=20)
Hybrid approach with min/max intervals:
adaptive_span(`@timestamp`, buckets=20, min_interval='1m', max_interval='1d')
Metadata
Metadata
Assignees
Labels
PPLPiped processing languagePiped processing languagecalcitecalcite migration releatedcalcite migration releatedenhancementNew feature or requestNew feature or requestpushdownpushdown related issuespushdown related issuesv3.3.0
Type
Projects
Status
New
Status
Done