[ML] Validate existing cluster state differently to newly submitted configs #30084

Closed
@elasticmachine

Original comment by @droberts195:

If we're going to introduce completely new job types in the future, we need to change the way unknown job/datafeed cluster state is validated.

While trying to add categorizer jobs, which are quite similar to anomaly_detector jobs, I ran into the following problem:

  • Logically, a categorizer job should have no detectors
  • But the AnalysisConfig class requires detectors
  • There are two possible solutions that seem reasonable at first glance:
    1. Have categorizer jobs have a categorization_config instead of analysis_config
    2. Change analysis_config so that detectors is not required if the job_type is categorizer
  • Unfortunately neither of these works:
    1. Old nodes will ignore categorization_config when parsing metadata, but will then fail because Job requires an analysis_config
    2. Old nodes will not tolerate an analysis_config with no detectors
  • This leaves a messy workaround: categorizer jobs must carry an analysis_config that includes unnecessary fields. New nodes will ignore these fields and mask them when printing the config in REST responses, but old nodes will show the unnecessary bits
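
The two failure modes above can be illustrated with a minimal sketch. It assumes, as the bullets describe, that an old node ignores unknown top-level fields but insists on an analysis_config with at least one detector. The class `OldNodeStrictParser` and its `parseJob` method are hypothetical stand-ins for illustration, not the real Job or AnalysisConfig parsing code.

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an old node's strict parsing.
// This is NOT the real Elasticsearch Job/AnalysisConfig code.
public class OldNodeStrictParser {

    /**
     * Returns the number of detectors, or throws if the config is not
     * acceptable to an "old" node that only knows anomaly_detector jobs.
     */
    public static int parseJob(Map<String, Object> source) {
        // Unknown top-level fields such as categorization_config are never read,
        // so they are silently ignored rather than rejected.
        Object analysisConfig = source.get("analysis_config");
        if (!(analysisConfig instanceof Map)) {
            // Option 1 fails here: the job has no analysis_config at all.
            throw new IllegalArgumentException("analysis_config is required");
        }
        Object detectors = ((Map<?, ?>) analysisConfig).get("detectors");
        if (!(detectors instanceof List) || ((List<?>) detectors).isEmpty()) {
            // Option 2 fails here: analysis_config is present but has no detectors.
            throw new IllegalArgumentException("detectors are required");
        }
        return ((List<?>) detectors).size();
    }
}
```

Under these assumptions, a categorizer job written by a new node breaks old nodes either way, because validation happens unconditionally when the cluster state is parsed.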

I think the only long-term solution that allows the necessary degree of extensibility is to hold Jobs as arbitrary Map<String, Object> or BytesReference when parsing from cluster state, and only interpret what's in the Map or BytesReference if the job_type is understood. This is pretty much how index settings work.
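
A rough sketch of that idea, using the Map<String, Object> variant: the job is stored verbatim when read from cluster state, and strict interpretation is deferred until the job_type is known to be understood. The class `LenientJobMetadata`, its methods, and the hard-coded set of understood types are all hypothetical names chosen for illustration; none of them is an existing Elasticsearch API.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch: keeps a job as an opaque map and only
// interprets it when this node understands the job_type.
public class LenientJobMetadata {

    // Job types this (hypothetical) node version can interpret.
    private static final Set<String> UNDERSTOOD_TYPES = Set.of("anomaly_detector");

    private final Map<String, Object> raw;

    private LenientJobMetadata(Map<String, Object> raw) {
        // Defensive copy so the stored cluster state cannot be mutated later.
        this.raw = Collections.unmodifiableMap(new LinkedHashMap<>(raw));
    }

    /** Reads a job from cluster state without validating its contents. */
    public static LenientJobMetadata fromClusterState(Map<String, Object> source) {
        return new LenientJobMetadata(source);
    }

    public String jobType() {
        return String.valueOf(raw.get("job_type"));
    }

    public boolean isUnderstood() {
        return UNDERSTOOD_TYPES.contains(jobType());
    }

    /** The untouched config, preserved so it can be re-serialised as-is. */
    public Map<String, Object> raw() {
        return raw;
    }

    /**
     * Strict interpretation, applied only to understood job types.
     * Returns empty for job types this node does not know.
     */
    public Optional<Integer> detectorCount() {
        if (!isUnderstood()) {
            return Optional.empty();
        }
        Object analysisConfig = raw.get("analysis_config");
        if (!(analysisConfig instanceof Map)) {
            throw new IllegalArgumentException("analysis_config is required");
        }
        Object detectors = ((Map<?, ?>) analysisConfig).get("detectors");
        if (!(detectors instanceof List) || ((List<?>) detectors).isEmpty()) {
            throw new IllegalArgumentException("detectors are required");
        }
        return Optional.of(((List<?>) detectors).size());
    }
}
```

With this shape, a node that does not know the categorizer type would still load the cluster state and carry the job through unchanged, while newly submitted configs of understood types could continue to be validated strictly.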
