[ML] _all requests can suffer "job not found" errors

(Migrated from https://github.com/elastic/elasticsearch/issues/37545#issuecomment-455237072 to improve visibility.)

The failure of https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.5+matrix-java-periodic/ES_BUILD_JAVA=java11,ES_RUNTIME_JAVA=java11,nodes=virtual&&linux/166/ showed that a request performing an ML operation on `_all` can return an error saying that it could not find an entity it expected to find.

For example, closing `_all` jobs might return an error saying that job `foo` does not exist. Similarly, stopping `_all` datafeeds might return an error saying that datafeed `bar` does not exist.

This seems completely crazy, as it's obvious that `_all` should only include entities that exist.

The reason this can happen is that our actions chain together multiple base-level Elasticsearch actions, and entities can be deleted in between these base-level steps. For example:

1. Alice requests force delete of job `foo`
2. Bob requests close `_all` jobs
3. Bob's request to close `_all` jobs expands `_all` to `foo` and `bar`
4. Alice's request to force delete `foo` removes the config associated with job `foo`
5. Bob's request to close `_all` jobs attempts to find the config for job `foo`
6. Bob's request to close `_all` fails because the config for job `foo` does not exist
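To make the interleaving concrete, here is a minimal single-threaded Java sketch that replays the six steps in that order. It is a hypothetical model, not Elasticsearch code: the `CloseAllRace` class, the in-memory map standing in for the job config index, and the error message text are all illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified model of the race; not Elasticsearch code.
public class CloseAllRace {

    // Stand-in for the job config index: job id -> config.
    static final Map<String, String> configs = new ConcurrentHashMap<>();

    // Step 3: expand "_all" into a snapshot of the job ids that exist right now.
    static List<String> expandAll() {
        return new ArrayList<>(configs.keySet());
    }

    // Step 5: look up the config for a job; fails if it was deleted in the meantime.
    static String getConfig(String jobId) {
        String config = configs.get(jobId);
        if (config == null) {
            throw new NoSuchElementException("No known job with id '" + jobId + "'");
        }
        return config;
    }

    // Replays the steps in order and returns the error Bob's request ends with (or null).
    static String replay() {
        configs.clear();
        configs.put("foo", "config-foo");
        configs.put("bar", "config-bar");
        List<String> expanded = expandAll();   // Bob: _all -> foo and bar (order may vary)
        configs.remove("foo");                 // Alice: force delete removes foo's config
        try {
            for (String jobId : expanded) {
                getConfig(jobId);              // Bob: fetch each config in order to close the job
            }
            return null;
        } catch (NoSuchElementException e) {
            return e.getMessage();             // Bob: the whole _all request fails
        }
    }

    public static void main(String[] args) {
        System.out.println(replay()); // No known job with id 'foo'
    }
}
```

Whatever order the expansion returns, the lookup for `foo` fails, because its config disappeared between the expansion and the lookup.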

Although the test failure that highlighted this problem came from a 6.5 test run, I suspect the problem is worse in 6.6 and above, because there expanding `_all` requires searching for configs in an index rather than just looking in the cluster state (which is held in memory on all nodes).

ML actions that operate on `_all` should silently ignore failures to find entities from the original expansion of `_all`, on the assumption that these entities have been deleted by a concurrent request.
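A minimal hypothetical sketch of how the proposed behavior could look, not the actual ML implementation: the close step remembers whether its job IDs came from expanding `_all`, skips missing configs in that case, and still fails when a job was named explicitly. `TolerantCloseAll`, `closeJobs` and the `expandedFromAll` flag are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the proposed fix; not Elasticsearch code.
public class TolerantCloseAll {

    // Stand-in for the job config index: job id -> config.
    static final Map<String, String> configs = new ConcurrentHashMap<>();

    // Closes the given jobs. If the ids came from expanding "_all", a missing
    // config is assumed to mean the job was deleted concurrently and is skipped;
    // an explicitly named job that does not exist is still an error.
    static List<String> closeJobs(List<String> jobIds, boolean expandedFromAll) {
        List<String> closed = new ArrayList<>();
        for (String jobId : jobIds) {
            String config = configs.get(jobId);
            if (config == null) {
                if (expandedFromAll) {
                    continue; // deleted since _all was expanded: ignore silently
                }
                throw new NoSuchElementException("No known job with id '" + jobId + "'");
            }
            // The real close logic would use `config` here.
            closed.add(jobId);
        }
        return closed;
    }

    public static void main(String[] args) {
        configs.put("bar", "config-bar");              // foo was deleted after expansion
        List<String> expanded = List.of("foo", "bar"); // snapshot taken when _all was expanded
        System.out.println(closeJobs(expanded, true)); // [bar]
    }
}
```

Keeping the error for explicitly named jobs means a request for a genuinely non-existent job (for example, a typo) is still reported, while `_all` requests no longer fail because of concurrent deletions.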
