Invalid variance computed in extended_stats aggregation #37303

Closed
@raphz

Description

Elasticsearch version (bin/elasticsearch --version): 6.1.4

Plugins installed: n/a

JVM version (java -version): 1.8.191

OS version (uname -a if on a Unix-like system): CentOS 7.5

Description of the problem including expected versus actual behavior:
Under some conditions, the variance returned by an extended_stats aggregation is negative, which should never happen.

The variance is an average of squared deviations, so it cannot be negative. What makes it negative here is the way it is computed (probably as the difference of two positive numbers: "sum_of_squares / count" and "avg * avg"). Because floating-point numbers have finite precision, the two numbers are almost, but not exactly, the same, and their difference can come out slightly below zero.
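
The cancellation can be reproduced outside Elasticsearch with the "sum", "sum_of_squares" and "count" values from the response in this report. This is a minimal Java sketch, not the actual Elasticsearch code: the class and method names are hypothetical, and the formula is an assumed form that is algebraically equivalent to the difference described above.

```java
// Hypothetical sketch: names and formula are assumptions, not Elasticsearch source.
final class NaiveVarianceDemo {

    // Variance written as a difference of two nearly equal positive quantities:
    // (sum_of_squares - sum^2 / count) / count == sum_of_squares / count - avg^2
    static double naiveVariance(double sum, double sumOfSquares, long count) {
        return (sumOfSquares - (sum * sum) / count) / count;
    }

    public static void main(String[] args) {
        // Inputs copied from the aggregation response in this issue.
        double variance = naiveVariance(149.85000000000002, 7485.0075000000015, 3);
        System.out.println("variance      = " + variance);
        // Math.sqrt of a negative number is NaN, which would explain std_deviation.
        System.out.println("std_deviation = " + Math.sqrt(variance));
    }
}
```

With these inputs the two terms differ only by rounding error, so the subtraction yields a tiny negative number instead of zero, and the square root of that number is NaN.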

Proposed solution:

At the very least, prevent negative values from appearing in the variance: add a "Math.max(0.0, ...)" to the existing computation formula.
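
A minimal sketch of the proposed clamp, assuming the variance is currently derived from the running sum, sum of squares and count. The class and method names are hypothetical and the formula is an assumption, not a copy of the Elasticsearch implementation.

```java
// Hypothetical sketch of the proposed fix; not the actual Elasticsearch code.
final class ClampedVarianceDemo {

    // Same assumed formula as the naive computation, clamped at zero so that
    // rounding error can no longer produce a negative variance.
    static double clampedVariance(double sum, double sumOfSquares, long count) {
        double raw = (sumOfSquares - (sum * sum) / count) / count;
        return Math.max(0.0, raw);
    }

    public static void main(String[] args) {
        // Inputs copied from the aggregation response in this issue.
        double variance = clampedVariance(149.85000000000002, 7485.0075000000015, 3);
        System.out.println("variance      = " + variance);
        System.out.println("std_deviation = " + Math.sqrt(variance));
    }
}
```

The clamp only hides the symptom: it guarantees a non-negative variance and a defined standard deviation, but it does not recover the precision lost in the subtraction.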

Steps to reproduce:

Using the attached zip file, do the following:

  • in env.properties: set the host/port of the Elasticsearch instance (default: localhost:9200)
  • run ./reproduce.sh my-test-index (or any other index name, beware, this index will be dropped!)

What does it do?

  • drop the given index
  • recreate it with a specific mapping: index: long, amount: double
  • load 3 documents with the same amount: 49.95
  • ask for an "extended_stats" aggregation on the amounts
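
The script itself is only available in the attached zip, so the following Java sketch is an assumption about what the equivalent REST requests could look like on Elasticsearch 6.x. The class and method names are hypothetical; the "document" type and the "amount" aggregation name are taken from the logs below, and the program only prints the requests rather than sending them.

```java
// Hypothetical sketch: prints requests that should be equivalent to the steps above.
// The real reproduce.sh is not shown in this issue and may differ.
final class ReproRequests {

    // Mapping with index: long and amount: double, under the "document" type.
    static String mappingBody() {
        return "{\"mappings\":{\"document\":{\"properties\":{"
             + "\"index\":{\"type\":\"long\"},"
             + "\"amount\":{\"type\":\"double\"}}}}}";
    }

    // One document; every document carries the same amount, 49.95.
    static String documentBody(int index) {
        return "{\"index\":" + index + ",\"amount\":49.95}";
    }

    // extended_stats aggregation on the amount field.
    static String searchBody() {
        return "{\"aggs\":{\"amount\":{\"extended_stats\":{\"field\":\"amount\"}}}}";
    }

    public static void main(String[] args) {
        String idx = "my-test-index";
        System.out.println("DELETE /" + idx);
        System.out.println("PUT /" + idx + " " + mappingBody());
        for (int i = 1; i <= 3; i++) {
            System.out.println("POST /" + idx + "/document " + documentBody(i));
        }
        System.out.println("POST /" + idx + "/_refresh");
        System.out.println("POST /" + idx + "/_search?pretty " + searchBody());
    }
}
```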

Provide logs (if relevant):

{"acknowledged":true}

{"acknowledged":true,"shards_acknowledged":true,"index":"my-test-index"}

{"_index":"my-test-index","_type":"document","_id":"BErfN2gBWLZgAGEHAdb8","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1}

{"_index":"my-test-index","_type":"document","_id":"BUrfN2gBWLZgAGEHAtY2","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1}

{"_index":"my-test-index","_type":"document","_id":"BkrfN2gBWLZgAGEHAtZd","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1}

{"_shards":{"total":10,"successful":5,"failed":0}}
{
 "took" : 1,
 "timed_out" : false,
 "_shards" : {
   "total" : 5,
   "successful" : 5,
   "skipped" : 0,
   "failed" : 0
 },
 "hits" : {
   "total" : 3,
   "max_score" : 1.0,
   "hits" : [
     {
       "_index" : "my-test-index",
       "_type" : "document",
       "_id" : "BUrfN2gBWLZgAGEHAtY2",
       "_score" : 1.0,
       "_source" : {
         "amount" : 49.95,
         "index" : 2
       }
     },
     {
       "_index" : "my-test-index",
       "_type" : "document",
       "_id" : "BErfN2gBWLZgAGEHAdb8",
       "_score" : 1.0,
       "_source" : {
         "amount" : 49.95,
         "index" : 1
       }
     },
     {
       "_index" : "my-test-index",
       "_type" : "document",
       "_id" : "BkrfN2gBWLZgAGEHAtZd",
       "_score" : 1.0,
       "_source" : {
         "amount" : 49.95,
         "index" : 3
       }
     }
   ]
 },
 "aggregations" : {
   "amount" : {
     "count" : 3,
     "min" : 49.95,
     "max" : 49.95,
     "avg" : 49.95000000000001,
     "sum" : 149.85000000000002,
     "sum_of_squares" : 7485.0075000000015,
     "variance" : -3.0316490059097606E-13,
     "std_deviation" : "NaN",
     "std_deviation_bounds" : {
       "upper" : "NaN",
       "lower" : "NaN"
     }
   }
 }
}

File used to reproduce:
to-reproduce-it.zip
