-
Notifications
You must be signed in to change notification settings - Fork 176
Description
Problem Statement
Now OpenSearch PPL lacks the implementation of streamstats
. This streamstats
command is conceptually different from the traditional stats
command because it performs cumulative or rolling statistical calculations in a streaming manner as events are processed, rather than computing final aggregates after the entire dataset is scanned.
While OpenSearch PPL already supports aggregation through the stats
and eventstats
command, it does not currently provide streaming aggregations or window-based cumulative metrics similar to streamstats
.
Differences between stats
, eventstats
and streamstats
All of these commands can be used to generate aggregations such as average, sum, and maximum. The key difference is that stats
in PPL aggregates data globally or by groups in a single result row per group, while eventstats
performs similar aggregations but retains the original events and appends the aggregated values as new fields. Whereas streamstats
updates and outputs running calculations for every incoming event
, making it especially useful for cumulative or time-series analyses.
The differences between these commands are summarized in the following table:
Feature | stats | eventstats | streamstats |
---|---|---|---|
Transformation Behavior | Converts all events into an aggregation result table, removing original event structure. | Adds aggregation results as new fields to the original events while keeping the original structure. | Calculates cumulative or sliding statistics event-by-event within the stream and adds them as new fields. |
Output Format | Outputs only aggregated values, does not retain original events. | Retains original events and adds overall summary fields. | Retains original events and adds running or window-based statistics fields. |
Aggregation Scope | Based on all events in the search (or groups defined by BY ). |
Based on all relevant events, then the result is added back to each event in the group. | Computes statistics incrementally in event order; supports window , time_window , or reset conditions to limit the scope. |
Use Cases | When only aggregated results are needed (e.g., total count, average). | When aggregated statistics are needed alongside original event data. | When cumulative or rolling statistics are required (e.g., running totals, moving averages). |
Key Arguments | BY <field> for grouping. |
Same as stats , supports BY and allnum . |
Additional options: window , time_window , reset_before , reset_after , reset_on_change , current , etc. |
Performance Considerations | Reduces data volume significantly. | Higher memory usage because all events are retained; controlled by max_mem_usage_mb . |
Stream-based processing; memory usage depends on window size and reset strategy. Controlled by max_mem_usage_mb and maxresultrows . |
Syntax
streamstats
[reset_before="("<eval-expression>")"]
[reset_after="("<eval-expression>")"]
[current=<bool>]
[window=<int>]
[global=<bool>]
<stats-agg-term>...
[<by-clause>]
Required arguments
- stats-agg-term
Optional arguments
- by-clause
- current
- window
- global
- reset_after
- reset_before
(more details will be in the "docs/user/ppl/cmd/streamstats.rst" TBD)
Usage Examples
Basic Streaming Calculation
-- Running count of consecutive error minutes
... | streamstats count(eval(errorCount>0)) as consecutiveErrorMinutes
Optional Arguments Usage
-- Disable current event contribution and group by host
... | streamstats current=f values(consecutiveErrorMinutes) as previousResults by host
-- Sliding window average over the last 10 events
... | streamstats window=10 avg(count) as avg_count
-- Conditional reset before calculation
... | streamstats count(eval(errorCount>0)) as consecutiveErrorMinutes by source reset_before="REDACTED"match(errorCount)
-- Conditional reset after calculation
... | streamstats reset_after=total_errors<10 c as running_count
-- Group-scoped sliding window with global=f
... | streamstats current=f window=2 global=f last(current_value) as prev_value
Implement RFC
-
For the Boolean type of current, opensearch strictly controls the input to true/false, but a large number of use cases use “current = f”. Do we need to adapt it?
-
How to implement arguments reset?…… TBD. dynamic offset/ session partition
-
Strange behavior for argument
global
-According to the doc, theglobal
argument only works when both thewindow
andby
are set. In this case, we should setglobal
to false to achieve group-by-group windowing. However, ifglobal
is set to true (or does nothing, as it defaults to true), how should we interpret the query's behavior? Attached below are the documentation and my experimental query results, which leave me somewhat puzzled.




Metadata
Metadata
Assignees
Labels
Type
Projects
Status