Add standard deviation / variance sampling to extended stats aggregation #49554

Closed
@costin

Description

Elasticsearch currently offers standard deviation (STDDEV) and variance (VAR) only in population form. There is also a sampling form which, depending on the data size, can yield significantly different results.
As it's just a matter of a (somewhat) different formula, it should be straightforward to expand the current Extended Stats implementation to support this variant as well.
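
For reference, the two forms under their textbook definitions are written out below. This is a sketch of the standard formulas, not a statement of how Extended Stats computes them internally; the symbols n (count), x_i (values) and \bar{x} (avg) are notation introduced here for illustration.

```latex
% Population form: divide by n
\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2
         = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2

% Sampling form: divide by n - 1 (Bessel's correction), defined for n > 1
s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2
    = \frac{n}{n-1}\,\sigma^2
```

The ratio n / (n - 1) is why the gap depends on data size: for n = 2 the sampling variance is double the population variance, while for large n the two converge.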

To avoid ambiguity going forward, the current std_deviation could be aliased to std_deviation_population (same for variance), so one could easily pick the desired type while also being clear about which type the default fields are.

The improved response can look something like this:

{
    ...

    "aggregations": {
        "grades_stats": {
            "count": 2,
            "min": 50.0,
            "max": 100.0,
            "avg": 75.0,
            "sum": 150.0,
            "sum_of_squares": 12500.0,
            "variance": 625.0,
            "variance_population": 625.0,  // same as "variance"
            "variance_sampling": ...,
            "std_deviation": 25.0,
            "std_deviation_population": 25.0,  // same as "std_deviation"
            "std_deviation_sampling": ...,
            "std_deviation_bounds": {
                "upper": 125.0,
                "lower": 25.0
            }
        }
    }
}
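
To make the proposed fields concrete, here is a minimal Python sketch that derives them from the running sums already present in the response (count, sum, sum_of_squares). The function name `extended_stats` is hypothetical, and the sampling fields assume the textbook n - 1 divisor; this is not the Elasticsearch implementation.

```python
import math


def extended_stats(values):
    """Hypothetical helper: compute population and sampling statistics
    from count, sum and sum of squares, using the proposed field names."""
    n = len(values)
    if n == 0:
        raise ValueError("at least one value is required")

    total = sum(values)
    sum_of_squares = sum(v * v for v in values)
    avg = total / n

    # Population variance: divide by n (clamped at 0 against rounding error).
    variance_population = max(0.0, sum_of_squares / n - avg * avg)

    # Sampling variance: divide by n - 1; undefined for a single value.
    if n > 1:
        variance_sampling = max(0.0, (sum_of_squares - n * avg * avg) / (n - 1))
    else:
        variance_sampling = float("nan")

    return {
        "count": n,
        "avg": avg,
        "sum": total,
        "sum_of_squares": sum_of_squares,
        "variance": variance_population,  # alias of the population form
        "variance_population": variance_population,
        "variance_sampling": variance_sampling,
        "std_deviation": math.sqrt(variance_population),
        "std_deviation_population": math.sqrt(variance_population),
        "std_deviation_sampling": math.sqrt(variance_sampling),
    }


if __name__ == "__main__":
    # The two grades implied by the example response: min 50, max 100.
    for name, value in extended_stats([50.0, 100.0]).items():
        print(f"{name}: {value}")
```

With the two values implied by the example (50 and 100), the population fields reproduce 625.0 and 25.0, while under the n - 1 assumption the sampling variance would be 1250.0, twice the population figure.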
