Description
Currently Elasticsearch offers standard deviation (STDDEV) and variance (VAR), both in population form. However, there is also the sampling form (dividing by n - 1 instead of n), which can yield significantly different results, especially for small data sets.
As it's just a matter of a (somewhat) different formula, it should be straightforward to extend the current Extended Stats implementation to support this variant as well.
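To illustrate how small the formula change is, here is a minimal Python sketch. It assumes only the running count, sum and sum of squares that Extended Stats already reports. The function names are hypothetical, and returning NaN for fewer than two values is an assumption, not existing Elasticsearch behavior. The sampling variance is simply the population variance scaled by n / (n - 1).

```python
def population_variance(count: int, total: float, sum_of_squares: float) -> float:
    """Population form: summed squared deviations divided by n."""
    mean = total / count
    return sum_of_squares / count - mean * mean


def sample_variance(count: int, total: float, sum_of_squares: float) -> float:
    """Sampling (Bessel-corrected) form: summed squared deviations divided by n - 1."""
    if count < 2:
        # Assumption: report NaN, since the sampling form is undefined for n < 2.
        return float("nan")
    mean = total / count
    # sum((x - mean)^2) == sum_of_squares - n * mean^2
    return (sum_of_squares - count * mean * mean) / (count - 1)


values = [50.0, 100.0]
n, s, sq = len(values), sum(values), sum(v * v for v in values)
print(population_variance(n, s, sq), sample_variance(n, s, sq))  # 625.0 1250.0
```

With only two values the two forms differ by a factor of 2, which is why the distinction matters most for small data sets.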
Potentially, to avoid any ambiguity going forward, the current std_deviation could be aliased to std_deviation_population (same for variance), so one could easily pick the desired type while also being clear about which type the default fields are.
The improved response can look something like this:
{
  ...
  "aggregations": {
    "grades_stats": {
      "count": 2,
      "min": 50.0,
      "max": 100.0,
      "avg": 75.0,
      "sum": 150.0,
      "sum_of_squares": 12500.0,
      "variance": 625.0,
      "variance_population": 625.0, // same as "variance"
      "variance_sampling": ...,
      "std_deviation": 25.0,
      "std_deviation_population": 25.0, // same as "std_deviation"
      "std_deviation_sampling": ...,
      "std_deviation_bounds": {
        "upper": 125.0,
        "lower": 25.0
      }
    }
  }
}
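As a rough sketch of how the proposed fields (including the aliases) could be derived from the existing running totals, the Python below builds them from count, sum and sum_of_squares. The function name extended_stats_fields is hypothetical. It assumes the bounds keep using the population standard deviation with sigma = 2, which matches the 125.0 / 25.0 values above. min and max are omitted because they cannot be derived from these totals.

```python
import math


def extended_stats_fields(count: int, total: float, sum_of_squares: float,
                          sigma: float = 2.0) -> dict:
    """Hypothetical helper producing the proposed response fields."""
    avg = total / count
    # Summed squared deviations, clamped at 0 to absorb floating-point error.
    sq_dev = max(sum_of_squares - count * avg * avg, 0.0)
    var_pop = sq_dev / count
    # Assumption: the sampling form is NaN when there are fewer than two values.
    var_samp = sq_dev / (count - 1) if count > 1 else float("nan")
    std_pop = math.sqrt(var_pop)
    return {
        "count": count,
        "avg": avg,
        "sum": total,
        "sum_of_squares": sum_of_squares,
        "variance": var_pop,                 # default stays population
        "variance_population": var_pop,      # alias of "variance"
        "variance_sampling": var_samp,
        "std_deviation": std_pop,
        "std_deviation_population": std_pop, # alias of "std_deviation"
        "std_deviation_sampling": math.sqrt(var_samp),
        "std_deviation_bounds": {
            "upper": avg + sigma * std_pop,
            "lower": avg - sigma * std_pop,
        },
    }


fields = extended_stats_fields(2, 150.0, 12500.0)
print(fields["variance_sampling"], round(fields["std_deviation_sampling"], 3))  # 1250.0 35.355
```

For the grades example, this would fill the two elided sampling fields with 1250.0 and roughly 35.355, while the existing population values stay unchanged.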