[ML] Consider unifying datafeed and job configuration #34231

Open
@davidkyle

Description

The original design of the datafeed envisioned it as a general-purpose tool that could be used with different types of jobs rather than just anomaly detection. As the code has evolved, the datafeed has become a single-purpose tool dedicated to feeding data to anomaly detection jobs (query delay, aggregations, writing to autodetect) and is not easily adaptable to future use cases or job types. It was also imagined that a single datafeed could feed multiple jobs, but aggregations reduce the data volume efficiently enough that this has never been required, and because the ideal aggregation interval is a function of bucket span, it is not always appropriate to feed the same data to multiple jobs with different bucket spans. To some extent, multi-bucket anomalies have also mitigated this requirement.

The change to move configuration out of the cluster state (#32905) has shown that the current arrangement is vulnerable to inconsistencies, because the datafeed and job are defined in separate documents that can change independently. Given that a datafeed is tightly coupled to its job, the configuration could be defined inside the job itself (which is how the UI already presents the datafeed, as part of the job). That would simplify the code, since only one document needs to be read, and would guarantee consistency. This need not break the REST API: the datafeeds can be extracted from the jobs without the client having any knowledge of where they came from.
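For illustration only, a unified configuration could nest the datafeed settings inside the job document. The sketch below is hypothetical: the `datafeed_config` field name and its placement are assumptions, while the surrounding fields follow the existing anomaly detection job and datafeed schemas.

```json
{
  "job_id": "farequote",
  "description": "Hypothetical unified config with the datafeed nested inside the job",
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [
      { "function": "mean", "field_name": "responsetime", "by_field_name": "airline" }
    ]
  },
  "data_description": {
    "time_field": "time"
  },
  "datafeed_config": {
    "indices": ["farequote"],
    "query": { "match_all": {} },
    "query_delay": "60s",
    "frequency": "150s"
  }
}
```

The existing datafeed endpoints could then be implemented by reading and writing this nested section, so clients would continue to see a standalone datafeed.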

I'm not advocating making the change today, but if the burden of maintaining separate configurations for datafeeds and anomaly detection jobs grows, the refactor should be made.

Metadata

Labels

:ml (Machine learning)
