Description
When you run an ML anomaly detection job that also does categorization, you end up with category definition results and anomaly results. The category definitions have a `category_id` field of type `long` that stores the unique category ID within the job. The anomalies contain a `keyword` field `mlcategory` that stores the unique category ID that the anomaly relates to. The reason this is a `keyword` is that all by/over/partition fields are added to anomaly results as keywords; the by/over/partition fields are not strongly typed within the core analytics code.
This discrepancy in how the category ID is stored makes it harder to use generic Kibana functionality to tie the two types of document together. It would be nicer if either the anomaly results had a `category_id` field or the category definitions had an `mlcategory` field.
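To illustrate the mismatch, here is a minimal Python sketch of the client-side join that is needed today. It assumes the documents have already been fetched as plain dicts; the function name and any fields other than `category_id` and `mlcategory` are hypothetical.

```python
from typing import Any, Dict, List, Optional


def join_anomalies_to_categories(
    anomalies: List[Dict[str, Any]],
    category_definitions: List[Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Pair each anomaly with its category definition, if one exists."""
    # category_id is stored as a long, so the lookup is keyed by int.
    by_id = {d["category_id"]: d for d in category_definitions}

    joined = []
    for anomaly in anomalies:
        raw: Optional[str] = anomaly.get("mlcategory")
        # mlcategory is stored as a keyword (string), so it has to be
        # converted before it can be matched against category_id.
        definition = by_id.get(int(raw)) if raw is not None else None
        joined.append({"anomaly": anomaly, "category_definition": definition})
    return joined
```

The explicit `int(...)` conversion is the step that generic tooling cannot be expected to perform, which is why a shared field of the same type on both document types would help.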
There is a further complication. Many documents we write to the ML results have a `result_type` field that indicates the document type. However, some do not. Category definitions are one such type of ML result. The way category definition documents are found is to do an `exists` query on the `category_id` field. Since this practice is widely used, it would be a bad idea to include the `category_id` field in any other type of ML result.
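As a rough sketch of that lookup, the following Python function builds a query body that finds category definitions by the existence of `category_id`. The `job_id` filter and the function name are assumptions added for illustration; the exact query used by existing code may differ.

```python
from typing import Any, Dict


def category_definition_query(job_id: str) -> Dict[str, Any]:
    """Build a search body matching category definitions for one job."""
    return {
        "query": {
            "bool": {
                "filter": [
                    # Assumed: results carry a job_id field to filter on.
                    {"term": {"job_id": job_id}},
                    # Category definitions have no result_type, so they
                    # are identified by the presence of category_id.
                    {"exists": {"field": "category_id"}},
                ]
            }
        }
    }
```

Any other result type that gained a `category_id` field would also match the `exists` clause, which is the reason for keeping that field exclusive to category definitions.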
As a result, the way to allow easier joining of category definitions and anomaly results is to add a field `mlcategory` of type `keyword` to category definition documents. This can easily be added to all category definition documents, both for new jobs and for pre-existing jobs whenever their category definitions are updated for any reason. Only jobs created in the version where the functionality is added, or a later version, could guarantee the presence of `mlcategory` in category definition documents, but older jobs that are actively running would acquire it over time.
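The proposed change amounts to something like the sketch below, written in Python for illustration only. The real change would live in the code that writes category definitions; the function name here is hypothetical.

```python
from typing import Any, Dict


def with_mlcategory(category_definition: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the document with mlcategory added as a string."""
    doc = dict(category_definition)
    # Mirror the numeric category_id as a keyword-style string so it can
    # be matched directly against the mlcategory field on anomalies.
    doc["mlcategory"] = str(doc["category_id"])
    return doc
```

Because the value is derived entirely from `category_id`, it can be filled in whenever an existing category definition is rewritten, which is how older running jobs would pick it up over time.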
While this change is being made, a further change should be made to make querying ML results easier and more consistent in the future. We should add a `result_type` field with value `category_definition` to category definition documents. We will not be able to take advantage of this for a long time: we'll have to stick with querying for the existence of `category_id`. But by adding the `result_type` now we will create the possibility to simplify things further in a few years' time, say in version 9.
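A sketch of what the transition might look like for code that classifies documents, assuming documents are plain dicts; the function name is hypothetical.

```python
from typing import Any, Dict


def is_category_definition(doc: Dict[str, Any]) -> bool:
    """Decide whether an ML results document is a category definition."""
    result_type = doc.get("result_type")
    if result_type is not None:
        # Documents that declare a type can be trusted directly.
        return result_type == "category_definition"
    # Older category definitions have no result_type, so fall back to
    # checking for the presence of category_id.
    return "category_id" in doc
```

Once no documents without `result_type` remain, the fallback branch could be dropped and the check reduced to a single comparison.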