Accuracy metric fails when the dependent variable is of type 'boolean' or 'integer' #49796

@przemekwitek

I ran a classification analysis on a synthetic dataset that tries to detect a circle on a plane.
I indexed docs containing points on a 2D plane together with a dependent variable ("is the point inside a unit circle"). The analysis finished correctly, but then I tried to evaluate the results using the following request:

{
  "index": "circle-ml",
  "query": {
    "term": {
      "ml.is_training": false
    }
  },
  "evaluation": {
    "classification": {
      "actual_field": "in_unit_circle",
      "predicted_field": "ml.in_unit_circle_prediction.keyword",
      "metrics": {
        "accuracy": {},
        "multiclass_confusion_matrix": {}
      }
    }
  }
}

The evaluation reported an accuracy of 0 because it could not find any document for which the dependent variable was equal to the prediction.
The problem is that the dependent variable is a boolean while the prediction is a string, and the Painless script that compares them is:

doc[''{0}''].value == doc[''{1}''].value
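
To illustrate why the reported accuracy is exactly 0, here is a minimal Python sketch of a strict equality check between boolean actual values and string predictions. The sample values and the `strict_accuracy` helper are made up for illustration; this is not the actual evaluation code.

```python
# Hypothetical sample: actual values are booleans, predictions are keyword strings.
actual = [True, False, True, True]
predicted = ["true", "false", "true", "false"]


def strict_accuracy(actual, predicted):
    """Fraction of documents where actual == predicted, with no type coercion."""
    matches = sum(1 for a, p in zip(actual, predicted) if a == p)
    return matches / len(actual)


# A boolean never equals a string, so no document counts as a match.
print(strict_accuracy(actual, predicted))  # 0.0
```

Three of the four predictions are semantically correct here, yet none of them is counted, which matches the behaviour described above.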

Two solutions I see here are:

  • (simpler) relax the equality check so that it treats the boolean true and the string "true" as equal
  • (more involved) make the C++ code report the prediction using the type of the dependent variable. That type can be passed down from Java.
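
The simpler option could look roughly like the following Python sketch, which compares both sides by a canonical string form. The helper names (`canonical`, `relaxed_equal`, `relaxed_accuracy`) are hypothetical, and the assumption that predictions are stored as lowercase "true"/"false" or plain decimal integers comes only from the example in this issue.

```python
def canonical(value):
    """Render a value as the string a keyword field is assumed to hold."""
    # Check bool before anything else: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def relaxed_equal(actual, predicted):
    """Treat values as equal when their canonical string forms match."""
    return canonical(actual) == canonical(predicted)


def relaxed_accuracy(actuals, predictions):
    """Fraction of documents whose actual and predicted values match."""
    matches = sum(1 for a, p in zip(actuals, predictions) if relaxed_equal(a, p))
    return matches / len(actuals)


# Booleans and integers now match their string predictions.
print(relaxed_equal(True, "true"))  # True
print(relaxed_equal(1, "1"))        # True
print(relaxed_accuracy([True, False, True, True],
                       ["true", "false", "true", "false"]))  # 0.75
```

In the Painless script this would presumably mean comparing the string representations of both field values instead of the raw values; that mapping is a guess and has not been checked against the script engine.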

The same scenario should also be reproduced for a dependent variable of an integer type.
