Description
The multi-bucket functionality has highlighted that it would be nice to be able to label our anomaly results to say why they were created.
One possible way to do this would be to add a string-array field to certain types of results. It could be called labels, explanation, or maybe there is a better name.
A multi-bucket anomaly might then look like this:
{
  "job_id": "it-ops-kpi",
  "result_type": "record",
  "probability": 0.00000332668,
  "record_score": 72.9929,
  "initial_record_score": 65.7923,
  "bucket_span": 300,
  "detector_index": 0,
  "is_interim": false,
  "timestamp": 1454944200000,
  "function": "low_sum",
  "function_description": "sum",
  "typical": [
    1806.48
  ],
  "actual": [
    288
  ],
  "field_name": "events_per_min",
  "explanation": [
    "multi-bucket"
  ]
}
The explanation field contains zero or more strings that indicate why the result was created. There may be many possible reasons, but we should be rigorous about documenting which strings can be used, so that people who want to search for them know what to search for.
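To make the idea of a documented, searchable set of strings concrete, here is a minimal Python sketch. It assumes the field ends up being called explanation; the KNOWN_EXPLANATIONS registry and the helper names are hypothetical, and "multi-bucket" is the only value taken from this proposal.

```python
import json

# Hypothetical registry of documented explanation strings.
# Only "multi-bucket" comes from the proposal; others would be added as documented.
KNOWN_EXPLANATIONS = frozenset({"multi-bucket"})


def explanations_of(result):
    """Return the explanation strings of a result, or [] if the field is absent."""
    return list(result.get("explanation", []))


def undocumented_explanations(result):
    """Return any explanation strings that are not in the documented registry."""
    return [e for e in explanations_of(result) if e not in KNOWN_EXPLANATIONS]


def filter_by_explanation(results, reason):
    """Keep only the results whose explanation array contains `reason`."""
    return [r for r in results if reason in explanations_of(r)]


# A trimmed version of the record shown above, plus one without the field.
labelled = json.loads(
    '{"job_id": "it-ops-kpi", "result_type": "record", "explanation": ["multi-bucket"]}'
)
unlabelled = json.loads('{"job_id": "it-ops-kpi", "result_type": "record"}')

multi_bucket_only = filter_by_explanation([labelled, unlabelled], "multi-bucket")
```

Treating a missing field the same as an empty array is a choice made for this sketch; it keeps results written before the change searchable without special cases.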
Should this field be available for both influencer results and record results or just record results?
This change would require a corresponding change to parsing and serialisation on the Java side, and a UI change to make the reasons visible to end users.
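The shape of the parsing and serialisation change could look roughly like the sketch below. It is written in Python purely for illustration and is not the Java implementation; the AnomalyRecord class, its reduced set of fields, and the decision to omit an empty array on output are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class AnomalyRecord:
    """Hypothetical, heavily reduced stand-in for a record result."""
    job_id: str
    record_score: float
    # New optional field: defaults to an empty list when not supplied.
    explanation: list = field(default_factory=list)

    @classmethod
    def from_dict(cls, doc):
        # Documents written before the change have no "explanation" key,
        # so fall back to an empty list rather than failing.
        return cls(
            job_id=doc["job_id"],
            record_score=doc["record_score"],
            explanation=list(doc.get("explanation", [])),
        )

    def to_dict(self):
        doc = {
            "job_id": self.job_id,
            "result_type": "record",
            "record_score": self.record_score,
        }
        # Assumption: an empty array is omitted rather than written out.
        if self.explanation:
            doc["explanation"] = list(self.explanation)
        return doc


old_style = AnomalyRecord.from_dict({"job_id": "it-ops-kpi", "record_score": 72.9929})
new_style = AnomalyRecord.from_dict(
    {"job_id": "it-ops-kpi", "record_score": 72.9929, "explanation": ["multi-bucket"]}
)
```

Whether an empty array should be written out or omitted is one of the details that would need to be settled alongside the field name.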
Originally it was thought that users could use the same functionality to add arbitrary annotations to results. The current thinking is that the two use cases are better served by separate functionality, so elastic/elasticsearch#33376 has been raised to discuss user annotations.