[FEATURE]Add ppl eventstats command functionality #3024

@YANG-DB

Description

PPL eventstats command

Description

The eventstats command enriches your event data with calculated summary statistics. It operates by analyzing specified fields within your events, computing various statistical measures, and then appending these results as new fields to each original event.

Key aspects of eventstats:

  1. It performs calculations across the entire result set or within defined groups.
  2. The original events remain intact, with new fields added to contain the statistical results.
  3. The command is particularly useful for comparative analysis, identifying outliers, or providing additional context to individual events.

Difference between stats and eventstats

The stats and eventstats commands are both used for calculating statistics, but they have some key differences in how they operate and what they produce:

  • Output Format:
    • stats: Produces a summary table with only the calculated statistics.
    • eventstats: Adds the calculated statistics as new fields to the existing events, preserving the original data.
  • Event Retention:
    • stats: Reduces the result set to only the statistical summary, discarding individual events.
    • eventstats: Retains all original events and adds new fields with the calculated statistics.
  • Use Cases:
    • stats: Best for creating summary reports or dashboards. Often used as a final command to summarize results.
    • eventstats: Useful when you need to enrich events with statistical context for further analysis or filtering. Can be used mid-search to add statistics that can be used in subsequent commands.
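The contrast above can be sketched in plain Python. This is only an illustrative model of the semantics described in this issue, not the actual implementation; the helper names `stats_avg` and `eventstats_avg` and the sample rows are hypothetical.

```python
from collections import defaultdict

def group_means(events, field, by):
    """Mean of `field` per distinct value of `by` (hypothetical helper)."""
    groups = defaultdict(list)
    for e in events:
        groups[e[by]].append(e[field])
    return {k: sum(v) / len(v) for k, v in groups.items()}

def stats_avg(events, field, by):
    """Models `stats avg(field) by by`: one summary row per group."""
    means = group_means(events, field, by)
    return [{by: k, f"avg({field})": m} for k, m in means.items()]

def eventstats_avg(events, field, by):
    """Models `eventstats avg(field) by by`: every event is kept and enriched."""
    means = group_means(events, field, by)
    return [{**e, f"avg({field})": means[e[by]]} for e in events]

# Made-up sample data for illustration only.
events = [
    {"b": "x", "c": 10},
    {"b": "x", "c": 30},
    {"b": "y", "c": 5},
]

summary = stats_avg(events, "c", "b")        # 2 rows: one per group
enriched = eventstats_avg(events, "c", "b")  # 3 rows: original events plus avg(c)
```

In this model `summary` loses the individual `c` values, while `enriched` keeps them next to the group average, which is what allows a later filter such as "events above their group average".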

Syntax

eventstats <aggregation>... [by-clause]

(See "docs/ppl-lang/ppl-eventstats-command.md" for details.)

Event Aggregations

See additional command details

  • source = table | eventstats avg(a)
  • source = table | where a < 50 | eventstats avg(c)
  • source = table | eventstats max(c) by b
  • source = table | eventstats count(c) by b | head 5
  • source = table | eventstats stddev_samp(c)
  • source = table | eventstats stddev_pop(c)
  • source = table | eventstats percentile(c, 90)
  • source = table | eventstats percentile_approx(c, 99)
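To make the difference between `stddev_samp` and `stddev_pop` concrete, here is a small Python sketch that appends both values to every event, with no by-clause, so the statistics cover the whole result set. It assumes the usual definitions (sample standard deviation divides by n - 1, population standard deviation by n); the helper name and sample values are hypothetical.

```python
import statistics

def eventstats_stddev(events, field):
    """Models `eventstats stddev_samp(field), stddev_pop(field)` without a by-clause."""
    values = [e[field] for e in events]
    samp = statistics.stdev(values)   # sample standard deviation (n - 1 denominator)
    pop = statistics.pstdev(values)   # population standard deviation (n denominator)
    return [
        {**e, f"stddev_samp({field})": samp, f"stddev_pop({field})": pop}
        for e in events
    ]

# Made-up sample data for illustration only.
events = [{"c": v} for v in [2, 4, 4, 4, 5, 5, 7, 9]]
enriched = eventstats_stddev(events, "c")
```

Every row in `enriched` carries the same two values, because the statistics are computed once over all events and then attached to each of them.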

Limitation: distinct aggregations cannot be used in eventstats:

  • source = table | eventstats distinct_count(c) (throws an exception)

Aggregations With Span

  • source = table | eventstats count(a) by span(a, 10) as a_span
  • source = table | eventstats sum(age) by span(age, 5) as age_span | head 2
  • source = table | eventstats avg(age) by span(age, 20) as age_span, country | sort - age_span | head 2
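A rough Python model of grouping by a numeric span follows. It assumes that `span(a, 10)` maps each value to the lower bound of its fixed-width bucket (0, 10, 20, ...) and that the alias is exposed as a field on each event; both points are assumptions for illustration, and the helper names are hypothetical.

```python
from collections import defaultdict

def span_bucket(value, width):
    """Lower bound of the fixed-width bucket containing `value` (assumed semantics)."""
    return (value // width) * width

def eventstats_count_by_span(events, field, width, alias):
    """Models `eventstats count(field) by span(field, width) as alias`."""
    counts = defaultdict(int)
    for e in events:
        counts[span_bucket(e[field], width)] += 1
    enriched = []
    for e in events:
        bucket = span_bucket(e[field], width)
        enriched.append({**e, alias: bucket, f"count({field})": counts[bucket]})
    return enriched

# Made-up sample data for illustration only.
events = [{"a": 3}, {"a": 12}, {"a": 17}, {"a": 25}]
enriched = eventstats_count_by_span(events, "a", 10, "a_span")
```

Here the values 12 and 17 share the bucket starting at 10, so both events receive a count of 2, while 3 and 25 sit alone in their buckets.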

Aggregations With TimeWindow Span (tumble windowing function)

  • source = table | eventstats sum(productsAmount) by span(transactionDate, 1d) as age_date | sort age_date
  • source = table | eventstats sum(productsAmount) by span(transactionDate, 1w) as age_date, productId
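The time-window case can be modeled the same way. The sketch below assumes a `1d` span is a tumbling window aligned to midnight, so each event belongs to exactly one day; the alignment, the helper names, and the sample rows are assumptions, while the field names come from the examples above.

```python
from collections import defaultdict
from datetime import datetime

def day_window_start(ts):
    """Start of the one-day tumbling window containing `ts` (assumed midnight-aligned)."""
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)

def eventstats_sum_by_day(events, amount_field, time_field, alias):
    """Models `eventstats sum(amount_field) by span(time_field, 1d) as alias`."""
    totals = defaultdict(int)
    for e in events:
        totals[day_window_start(e[time_field])] += e[amount_field]
    enriched = []
    for e in events:
        window = day_window_start(e[time_field])
        enriched.append({**e, alias: window, f"sum({amount_field})": totals[window]})
    return enriched

# Made-up sample data for illustration only.
events = [
    {"transactionDate": datetime(2024, 1, 1, 9, 0), "productsAmount": 10},
    {"transactionDate": datetime(2024, 1, 1, 18, 30), "productsAmount": 5},
    {"transactionDate": datetime(2024, 1, 2, 0, 15), "productsAmount": 7},
]
enriched = eventstats_sum_by_day(events, "productsAmount", "transactionDate", "age_date")
```

Because the windows do not overlap, each event contributes to exactly one daily total, and both events from the first day see the same sum.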

Aggregations Group by Multiple Times

  • source = table | eventstats avg(age) as avg_state_age by country, state | eventstats avg(avg_state_age) as avg_country_age by country
  • source = table | eventstats avg(age) as avg_city_age by country, state, city | eval new_avg_city_age = avg_city_age - 1 | eventstats avg(new_avg_city_age) as avg_state_age by country, state | where avg_state_age > 18 | eventstats avg(avg_state_age) as avg_adult_country_age by country
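One consequence of chaining eventstats is worth spelling out: because the first command attaches `avg_state_age` to every event, the second average runs over events rather than over states, so each state is weighted by its number of events. The Python sketch below models the first chained example under that reading; the helper name and sample rows are hypothetical.

```python
from collections import defaultdict

def eventstats_avg(events, field, alias, by):
    """Models `eventstats avg(field) as alias by <by...>`, keeping every event."""
    groups = defaultdict(list)
    for e in events:
        groups[tuple(e[k] for k in by)].append(e[field])
    means = {key: sum(vals) / len(vals) for key, vals in groups.items()}
    return [{**e, alias: means[tuple(e[k] for k in by)]} for e in events]

# Made-up sample data for illustration only.
events = [
    {"country": "US", "state": "CA", "age": 20},
    {"country": "US", "state": "CA", "age": 40},
    {"country": "US", "state": "NY", "age": 60},
]

# eventstats avg(age) as avg_state_age by country, state
step1 = eventstats_avg(events, "age", "avg_state_age", ["country", "state"])
# | eventstats avg(avg_state_age) as avg_country_age by country
step2 = eventstats_avg(step1, "avg_state_age", "avg_country_age", ["country"])
```

In this model the two CA events each carry an `avg_state_age` of 30 and the NY event carries 60, so `avg_country_age` is (30 + 30 + 60) / 3 = 40, not the unweighted mean of the two state averages (45).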

Labels: PPL (Piped processing language), enhancement (New feature or request)

Status: Done
