Closed
I've run a classification analysis on a synthetic dataset that tries to detect a circle on a plane.
I've indexed docs containing points on a 2D plane together with a dependent variable ("is the point inside a unit circle"). The analysis finished correctly, but then I tried to evaluate the results using the following request:
{
  "index": "circle-ml",
  "query": {
    "term": {
      "ml.is_training": false
    }
  },
  "evaluation": {
    "classification": {
      "actual_field": "in_unit_circle",
      "predicted_field": "ml.in_unit_circle_prediction.keyword",
      "metrics": {
        "accuracy": {},
        "multiclass_confusion_matrix": {}
      }
    }
  }
}
The evaluation reported an accuracy of 0, as it could not find any point for which dependent_variable was equal to the prediction.
The problem is that the dependent variable is a boolean while the prediction is a string, and the Painless script compares the two values directly:
doc[''{0}''].value == doc[''{1}''].value
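To illustrate why a type-sensitive comparison produces an accuracy of 0, here is a minimal Python analogy (not Painless, and not the actual evaluation code). The sample documents and the `prediction` key are made up for illustration; the only point is that a boolean never compares equal to its string form.

```python
# Hypothetical sample docs: the actual field is a boolean,
# while the prediction is stored as a keyword string.
docs = [
    {"in_unit_circle": True, "prediction": "true"},
    {"in_unit_circle": False, "prediction": "false"},
    {"in_unit_circle": True, "prediction": "false"},
]


def strict_accuracy(docs):
    """Fraction of docs where actual == predicted under type-sensitive equality."""
    matches = sum(1 for d in docs if d["in_unit_circle"] == d["prediction"])
    return matches / len(docs)


# No doc matches, even the two where the prediction is semantically right.
print(strict_accuracy(docs))
```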
Two solutions I see here are:
- (simpler) relax the equality check so that it treats boolean true and string "true" as equal
- (more involved) make the C++ code report the prediction using the type of the dependent variable. The type of the dependent variable can be passed down from Java.
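The first (simpler) option could look roughly like the following Python sketch, which compares canonical string forms instead of raw values. This is only an assumption about how a relaxed check might behave, not the fix adopted in the codebase; `canonical` and `relaxed_equal` are hypothetical helper names.

```python
def canonical(value):
    """Map a field value to a canonical string so mixed types can be compared."""
    # bool must be checked first: in Python, bool is a subclass of int,
    # and str(True) would give "True" rather than "true".
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def relaxed_equal(actual, predicted):
    """Equality check that treats True and "true" (or 1 and "1") as equal."""
    return canonical(actual) == canonical(predicted)
```

With this check, a boolean dependent variable and a string prediction match whenever they denote the same class, and the same normalization also covers integer-valued dependent variables.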
Also, the same scenario should be reproduced for integer types.