Add mlcategory and result_type to category definition docs #60108

@droberts195

When you run an ML anomaly detection job that also does categorization, you end up with both category definition results and anomaly results.

The category definitions have a category_id field of type long that stores the unique category ID within the job. The anomalies contain a keyword field mlcategory that stores the unique category ID that the anomaly relates to. The reason this is a keyword is that all by/over/partition fields are added to anomaly results as keywords; the by/over/partition fields are not strongly typed within the core analytics code.

This discrepancy in how the category ID is stored makes it harder to use generic Kibana functionality to tie the two types of document together. It would be nicer if either the anomaly results had a category_id field or the category definitions had a mlcategory field.
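To make the mismatch concrete, here is a minimal sketch of a client-side join between the two document types. The sample documents are trimmed-down illustrations rather than real output, and the helper name `join_anomalies_to_categories` is hypothetical. The point is only that the long `category_id` has to be converted to a string before it can be matched against the keyword `mlcategory`.

```python
# Illustrative, trimmed-down documents (assumed shapes, not real job output).
category_definitions = [
    {"job_id": "my_job", "category_id": 1, "terms": "connection refused"},
    {"job_id": "my_job", "category_id": 2, "terms": "disk full"},
]
anomalies = [
    {"job_id": "my_job", "result_type": "record", "mlcategory": "2"},
]


def join_anomalies_to_categories(anomalies, category_definitions):
    """Pair each anomaly with its category definition, or None if not found.

    category_id is a long and mlcategory is a keyword, so the ID is cast to
    str to build a comparable key. Category IDs are only unique within a
    job, so job_id is part of the key as well.
    """
    by_key = {
        (doc["job_id"], str(doc["category_id"])): doc
        for doc in category_definitions
    }
    return [
        (anomaly, by_key.get((anomaly["job_id"], anomaly["mlcategory"])))
        for anomaly in anomalies
    ]


pairs = join_anomalies_to_categories(anomalies, category_definitions)
```

Generic tooling that matches documents on a shared field name and type cannot apply this kind of cast, which is why a common field is preferable.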

There is a further complication. Many documents we write to the ML results have a result_type field that indicates the document type. However, some do not. Category definitions are one such type of ML result. The way category definition documents are found is to do an exists query on the category_id field. Since this practice is widely used, it would be a bad idea to include the category_id field in any other type of ML result.

As a result, the way to allow easier joining of category definitions and anomaly results is to add a field mlcategory of type keyword to category definition documents. This can easily be added to all category definition documents, both for new jobs and for pre-existing jobs whenever their category definitions are updated for any reason. Only jobs created in the version where the functionality is added or later could guarantee the presence of mlcategory in category definition documents, but older jobs that are actively running would acquire it over time.
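As a rough illustration of the proposed document shape (not the actual implementation, which would live in the code that writes ML results), the sketch below derives the new field from the existing one. The function name `with_mlcategory` and the sample document are hypothetical.

```python
def with_mlcategory(category_definition):
    """Return a copy of a category definition with a keyword-style mlcategory.

    The value is the string form of the existing long category_id, so it can
    be matched directly against the mlcategory field of anomaly results.
    """
    enriched = dict(category_definition)  # leave the input untouched
    enriched["mlcategory"] = str(category_definition["category_id"])
    return enriched


# Illustrative document; real category definitions carry more fields.
original = {"job_id": "my_job", "category_id": 7}
updated = with_mlcategory(original)
```

Because category_id is kept unchanged, existing code that finds category definitions by the presence of category_id would continue to work.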

While this change is being made, a further change should be made to make querying ML results easier and more consistent in the future. We should add a result_type field with value category_definition to category definition documents. We will not be able to take advantage of this for a long time - we'll have to stick with querying for the existence of category_id. But by adding the result_type now we will create the possibility to simplify things further in a few years' time, say in version 9.
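The two ways of finding category definitions can be sketched as Elasticsearch query bodies built in Python. This is only an illustration under assumptions: the function name, the `legacy_docs_possible` flag and the `job_id` filter are introduced here for the example. The `exists` and `term` clauses are standard query DSL.

```python
def category_definitions_query(job_id, legacy_docs_possible=True):
    """Build a query body that selects category definitions for one job.

    While documents without result_type may still exist, the only safe
    discriminator is the presence of category_id. Once every document is
    known to carry result_type, a plain term filter is enough.
    """
    if legacy_docs_possible:
        type_clause = {"exists": {"field": "category_id"}}
    else:
        type_clause = {"term": {"result_type": "category_definition"}}
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"job_id": job_id}},
                    type_clause,
                ]
            }
        }
    }


today = category_definitions_query("my_job")
future = category_definitions_query("my_job", legacy_docs_possible=False)
```

The second form matches how other result types that already have result_type can be selected, which is the consistency the extra field is meant to buy.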
