Description
This proposal is for one (or several) pipeline aggs that can perform normalization of the metrics. For example, given the series of data:
[5, 5, 10, 50, 10, 20]
A user might want to normalize those in different ways:
- Rescale [-1, 1]
[-1, -1, -0.78, 1, -0.78, -0.33]
- Rescale [0, 100]
[0, 0, 11.11, 100, 11.11, 33.33]
- Percentage of sum [0, 100%]
[5%, 5%, 10%, 50%, 10%, 20%]
- Mean normalization ((x - mean) / (max - min))
[-0.26, -0.26, -0.15, 0.74, -0.15, 0.07]
- Z-score normalization (mean of zero, stdev of 1)
[-0.68, -0.68, -0.39, 1.94, -0.39, 0.19]
- Softmax (0-1 range, sum to 1, larger values have more weight)
[2.862E-20, 2.862E-20, 4.248E-18, 0.999, 4.248E-18, 9.357E-14]
etc etc
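
As a rough illustration of the functions listed above, here is a minimal Python sketch. The proposal does not spell out formulas, so these are assumptions: mean normalization uses the conventional (x - mean) / (max - min), z-score uses the sample standard deviation (n - 1), and softmax subtracts the maximum before exponentiating for numerical stability. All function names are hypothetical and not part of any existing API.

```python
import math
import statistics

def rescale(values, lo, hi):
    """Min-max rescale into [lo, hi]. Assumes the values are not all equal."""
    vmin, vmax = min(values), max(values)
    span = vmax - vmin
    return [lo + (hi - lo) * (v - vmin) / span for v in values]

def percent_of_sum(values):
    """Each value as a percentage of the total. Assumes a non-zero sum."""
    total = sum(values)
    return [100.0 * v / total for v in values]

def mean_normalize(values):
    """(x - mean) / (max - min), the conventional definition (assumed here)."""
    mean = statistics.fmean(values)
    span = max(values) - min(values)
    return [(v - mean) / span for v in values]

def z_score(values):
    """(x - mean) / stdev, using the sample standard deviation (n - 1)."""
    mean = statistics.fmean(values)
    stdev = statistics.stdev(values)
    return [(v - mean) / stdev for v in values]

def softmax(values):
    """exp(x) / sum(exp(x)), shifted by max(x) to avoid overflow."""
    vmax = max(values)
    exps = [math.exp(v - vmax) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

series = [5, 5, 10, 50, 10, 20]
print("rescale [-1, 1]:", [round(x, 2) for x in rescale(series, -1, 1)])
print("rescale [0, 100]:", [round(x, 2) for x in rescale(series, 0, 100)])
print("percent of sum:", percent_of_sum(series))
print("mean normalization:", [round(x, 2) for x in mean_normalize(series)])
print("z-score:", [round(x, 2) for x in z_score(series)])
print("softmax:", softmax(series))
```

A constant series (max equal to min) would divide by zero in the rescale and mean-normalization functions, so a real implementation would need to decide what to return in that case.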
The two obvious use cases are rescaling values to a [0, 1] range to make it easier to compare relative magnitudes, and normalizing to percentage of the sum for percentage charts.
More advanced functions like z-score are useful for their statistical properties, and softmax handles negative numbers nicely. I'm not sure how useful they would be in practice, though, since this operates over bucket values rather than raw values (which is where normalization, centering, and standardizing typically have value).
In any case, a pipeline agg could accept the values from a multi-bucket agg (like a date_histo) and perform the normalization to produce a new set of metrics. I'm unsure how the syntax would look. A single-purpose agg (percentage_of_sum) would be easy. But if we want a multi-function agg, we need either a selectable function or something like MovingFunction, where the user specifies a script (with helper methods).
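
To make the "selectable function" option a bit more concrete, here is a small Python sketch of the idea: a registry maps a method name to a normalization function, and the agg applies the chosen function to one metric across all buckets. Everything here is hypothetical (the `METHODS` registry, the `normalize_buckets` helper, the method names, and the made-up bucket data); it models the design only and is not a proposed request syntax.

```python
from typing import Callable, Dict, List

def rescale_0_1(values: List[float]) -> List[float]:
    """Min-max rescale into [0, 1]. Assumes the values are not all equal."""
    vmin, vmax = min(values), max(values)
    return [(v - vmin) / (vmax - vmin) for v in values]

def percent_of_sum(values: List[float]) -> List[float]:
    """Each value as a percentage of the total. Assumes a non-zero sum."""
    total = sum(values)
    return [100.0 * v / total for v in values]

# Hypothetical registry: the user selects a function by name.
METHODS: Dict[str, Callable[[List[float]], List[float]]] = {
    "rescale_0_1": rescale_0_1,
    "percent_of_sum": percent_of_sum,
}

def normalize_buckets(buckets, metric, method, out_key="normalized"):
    """Return copies of the buckets with a normalized metric added."""
    if method not in METHODS:
        raise ValueError(
            f"unknown method {method!r}; expected one of {sorted(METHODS)}"
        )
    values = [b[metric] for b in buckets]
    normalized = METHODS[method](values)
    return [{**b, out_key: n} for b, n in zip(buckets, normalized)]

# Made-up buckets standing in for the output of a multi-bucket agg.
buckets = [
    {"key": "b1", "sales": 5},
    {"key": "b2", "sales": 5},
    {"key": "b3", "sales": 10},
    {"key": "b4", "sales": 50},
    {"key": "b5", "sales": 10},
    {"key": "b6", "sales": 20},
]

for bucket in normalize_buckets(buckets, "sales", "percent_of_sum"):
    print(bucket)
```

The trade-off compared with a script-based approach is that a fixed registry is easy to validate and document, while a script gives users functions the registry does not anticipate.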