
[ML] Update index mappings on process start, not job open #37607

Closed
@droberts195

Description


At present we update index mappings for the state and results indices in TransportOpenJobAction. This dates back to the 5.4 to 5.5 upgrade, when we knew 5.4 jobs would not run in 5.5 because there were special checks to prevent it.

Unfortunately, upgrading the index mappings when a job is opened is not sufficient to ensure that the mappings are correct by the time documents that rely on them are indexed. When a rolling upgrade is performed with ML jobs open, documents can be indexed before the mappings are correct. Those documents cause dynamic mappings to be created, so when a job is subsequently opened (possibly a different job), an error results because the existing mappings cannot be updated.

The solution is to update the mappings on process open, not on job open. This is similar to the change made in e194d8e on #37483. (Thankfully with that one we noticed the problem in the initial review phase.)
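The ordering problem described above can be illustrated with a toy model. This is a hypothetical sketch, not Elasticsearch code: ToyIndex, indexDocument and putMapping are invented names, the rule "a field's type can never change once mapped" is a simplification of real mapping behaviour, and the dynamic type "float" is an assumption chosen only to show a clash with the explicit "double" mapping for multi_bucket_impact.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model of index mappings; NOT the Elasticsearch API.
public class MappingRaceSketch {

    static final class ToyIndex {
        // field name -> mapped type; once a field is mapped its type cannot change
        private final Map<String, String> mappings = new HashMap<>();

        // Indexing a document with an unmapped field creates a dynamic mapping;
        // if the field is already mapped, the existing mapping is kept.
        void indexDocument(String field, String dynamicType) {
            mappings.putIfAbsent(field, dynamicType);
        }

        // Explicit mapping update: succeeds if the field is unmapped or the type
        // is identical, fails if it would change an existing type.
        void putMapping(String field, String type) {
            String existing = mappings.get(field);
            if (existing != null && !existing.equals(type)) {
                throw new IllegalArgumentException(
                        "cannot change [" + field + "] from [" + existing + "] to [" + type + "]");
            }
            mappings.put(field, type);
        }

        String typeOf(String field) {
            return mappings.get(field);
        }
    }

    // Old ordering: mappings are only updated on job open, so a process that is
    // already running after a rolling upgrade can index a result document first.
    static boolean oldOrderingSucceeds() {
        ToyIndex results = new ToyIndex();
        results.indexDocument("multi_bucket_impact", "float"); // dynamic mapping created first
        try {
            results.putMapping("multi_bucket_impact", "double"); // a later job open
            return true;
        } catch (IllegalArgumentException e) {
            return false; // the clash that blocks the mapping update
        }
    }

    // Proposed ordering: mappings are updated when the process starts,
    // before it writes any documents.
    static boolean newOrderingSucceeds() {
        ToyIndex results = new ToyIndex();
        results.putMapping("multi_bucket_impact", "double");
        results.indexDocument("multi_bucket_impact", "float"); // no-op: field already mapped
        return "double".equals(results.typeOf("multi_bucket_impact"));
    }

    public static void main(String[] args) {
        System.out.println("old ordering ok: " + oldOrderingSucceeds()); // false
        System.out.println("new ordering ok: " + newOrderingSucceeds()); // true
    }
}
```

The point of the model is only the ordering: whichever of "document indexed" and "mapping updated" happens first decides the field's type, so moving the update to the moment the process starts removes the window in which a dynamic mapping can win.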

Although the problem has existed since 5.6, version 6.5 is more likely to suffer from it because (a) the validation for enabled=false was tightened in #33933, and (b) 6.5 introduced the multi_bucket_impact field with mapping type double.

The only workaround to recover from dynamic mappings that clash with the desired mappings is to reindex the affected index while preserving all its aliases, which is difficult. Therefore we should fix this as a priority for 6.6.1.

The fix will only prevent mapping inconsistencies from being created in the future. It will not help anyone whose mappings are already inconsistent. I will paste the steps to recover by reindexing into this issue once they are validated.

Labels: :ml (Machine learning), >bug