
Normalization pipeline aggregations #51005

Closed
@polyfractal

Description


This proposal is for one (or several) pipeline aggs that can perform normalization of the metrics. For example, given the series of data:

[5, 5, 10, 50, 10, 20]

A user might want to normalize those in different ways:

  • Rescale [-1, 1]
    • [-1, -1, -0.78, 1, -0.78, -0.33]
  • Rescale [0, 100]
    • [0, 0, 11.11, 100, 11.11, 33.33]
  • Percentage of sum [0, 100%]
    • [5%, 5%, 10%, 50%, 10%, 20%]
  • Mean normalization ((x - mean) / (max - min))
    • [-0.26, -0.26, -0.15, 0.74, -0.15, 0.07]
  • Z-score normalization (mean of zero, stdev of 1)
    • [-0.68, -0.68, -0.39, 1.94, -0.39, 0.19]
  • Softmax (0-1 range, sum to 1, larger values have more weight)
    • [2.862E-20, 2.862E-20, 4.248E-18, 0.999, 4.248E-18, 9.357E-14]

etc etc
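
For concreteness, here is a rough sketch of the arithmetic behind the examples above, applied to the sample series. It is purely illustrative Java, not Elasticsearch API or proposed syntax; the z-score uses the sample standard deviation (n - 1) and mean normalization uses (x - mean) / (max - min), which is what the numbers above correspond to.

```java
import java.util.Arrays;
import java.util.stream.DoubleStream;

// Illustration only: the arithmetic behind the examples above. Nothing here is
// Elasticsearch code; all names are made up for this sketch.
public class NormalizationExamples {

    // linear rescale onto [lo, hi]
    static double[] rescale(double[] v, double lo, double hi) {
        double min = DoubleStream.of(v).min().getAsDouble();
        double max = DoubleStream.of(v).max().getAsDouble();
        return DoubleStream.of(v).map(x -> lo + (x - min) * (hi - lo) / (max - min)).toArray();
    }

    // each value as a percentage of the series total
    static double[] percentOfSum(double[] v) {
        double sum = DoubleStream.of(v).sum();
        return DoubleStream.of(v).map(x -> 100.0 * x / sum).toArray();
    }

    // (x - mean) / (max - min)
    static double[] meanNormalize(double[] v) {
        double mean = DoubleStream.of(v).average().getAsDouble();
        double min = DoubleStream.of(v).min().getAsDouble();
        double max = DoubleStream.of(v).max().getAsDouble();
        return DoubleStream.of(v).map(x -> (x - mean) / (max - min)).toArray();
    }

    // (x - mean) / stdev, using the sample standard deviation (n - 1)
    static double[] zScore(double[] v) {
        double mean = DoubleStream.of(v).average().getAsDouble();
        double variance = DoubleStream.of(v).map(x -> (x - mean) * (x - mean)).sum() / (v.length - 1);
        double stdev = Math.sqrt(variance);
        return DoubleStream.of(v).map(x -> (x - mean) / stdev).toArray();
    }

    // exp(x) / sum(exp(x)), shifted by the max for numerical stability
    static double[] softmax(double[] v) {
        double max = DoubleStream.of(v).max().getAsDouble();
        double denom = DoubleStream.of(v).map(x -> Math.exp(x - max)).sum();
        return DoubleStream.of(v).map(x -> Math.exp(x - max) / denom).toArray();
    }

    public static void main(String[] args) {
        double[] series = {5, 5, 10, 50, 10, 20};
        System.out.println(Arrays.toString(rescale(series, -1, 1)));
        System.out.println(Arrays.toString(rescale(series, 0, 100)));
        System.out.println(Arrays.toString(percentOfSum(series)));
        System.out.println(Arrays.toString(meanNormalize(series)));
        System.out.println(Arrays.toString(zScore(series)));
        System.out.println(Arrays.toString(softmax(series)));
    }
}
```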

The two obvious use cases are rescaling values to a [0, 1] range to make it easier to compare relative magnitudes, and normalizing to percentage of the sum for percentage charts.

More advanced functions like z-score are useful for their statistical properties, softmax handles negative numbers nicely, etc. But I'm not sure how useful they would be in practice, since this operates over bucket values rather than raw values (which is where normalization/centering/standardizing typically has value).

In any case, a pipeline agg could accept the values from a multi-bucket agg (like a date_histo) and perform the normalization to produce a new set of metrics. I'm unsure how the syntax would look. If it were a single-purpose agg (percentage_of_sum), it's easy. But if we want a multi-purpose agg that supports several functions, we either need a selectable function or something like MovingFunction, where the user specifies a script (with helper methods).
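
To help frame the "selectable function" option, here is a rough, hypothetical sketch in Java where a single normalize-style agg picks a built-in normalizer by a method name. None of the class or method names (NormalizeMethodSketch, normalize, the method-name strings) are actual Elasticsearch code; this is only meant to show the shape of that design.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.function.UnaryOperator;
import java.util.stream.DoubleStream;

// Hypothetical sketch of the "selectable function" design: one multi-purpose
// normalize agg whose method parameter picks a built-in normalizer by name.
public class NormalizeMethodSketch {

    // each normalizer maps the bucket values of a multi-bucket agg to a new series
    private static final Map<String, UnaryOperator<double[]>> METHODS = Map.of(
        "percent_of_sum", v -> {
            double sum = DoubleStream.of(v).sum();
            return DoubleStream.of(v).map(x -> 100.0 * x / sum).toArray();
        },
        "rescale_0_1", v -> {
            double min = DoubleStream.of(v).min().getAsDouble();
            double max = DoubleStream.of(v).max().getAsDouble();
            return DoubleStream.of(v).map(x -> (x - min) / (max - min)).toArray();
        }
    );

    static double[] normalize(String method, double[] bucketValues) {
        UnaryOperator<double[]> fn = METHODS.get(method);
        if (fn == null) {
            throw new IllegalArgumentException("unknown normalization method [" + method + "]");
        }
        return fn.apply(bucketValues);
    }

    public static void main(String[] args) {
        double[] bucketValues = {5, 5, 10, 50, 10, 20};
        System.out.println(Arrays.toString(normalize("percent_of_sum", bucketValues)));
        System.out.println(Arrays.toString(normalize("rescale_0_1", bucketValues)));
    }
}
```

The script-based alternative (à la MovingFunction) would instead expose these normalizers as helper methods the user calls from a script, trading a fixed list of methods for flexibility.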
