[7.16] [ML] Audit job open failures and stop any corresponding datafeed (#80665) #80678

Merged: 2 commits into elastic:7.16 on Nov 12, 2021

droberts195
Contributor

Backports the following commits to 7.16:

…stic#80665)

The anomaly detection code contained an assumption dating back
to 2016 that if a job failed then its datafeed would notice and
stop itself. That works if the job fails on a node after it has
successfully started up. But it doesn't work if the job fails
during the startup sequence. If the job is being started for the
first time then the datafeed won't be running, so there's no
problem, but if the job fails while it's being reassigned to a
new node then the assumption breaks down, because the datafeed is
started but not assigned to any node at that instant.

This PR addresses this by making the job force-stop its own
datafeed if it fails during its startup sequence and the datafeed
is started.

Fixes elastic#48934

Additionally, auditing of job failures during the startup
sequence is moved so that it happens for all failure scenarios
instead of just one.

Fixes elastic#80621
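
The two behaviours described above (force-stopping a started datafeed when the job fails during startup, and auditing every startup failure) can be modelled with a small sketch. This is not the Elasticsearch implementation, which is written in Java; every name here (`Datafeed`, `Auditor`, `open_job`, the step callables) is a hypothetical stand-in chosen for illustration only.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Iterable, List, Optional


class DatafeedState(Enum):
    STOPPED = "stopped"
    STARTED = "started"


@dataclass
class Datafeed:
    # Hypothetical model of a datafeed: it can be STARTED while not being
    # assigned to any node, which is the reassignment case in the description.
    state: DatafeedState = DatafeedState.STOPPED
    assigned_node: Optional[str] = None

    def force_stop(self) -> None:
        self.state = DatafeedState.STOPPED
        self.assigned_node = None


@dataclass
class Auditor:
    # Hypothetical audit sink that simply records messages in memory.
    messages: List[str] = field(default_factory=list)

    def error(self, job_id: str, message: str) -> None:
        self.messages.append(f"[{job_id}] {message}")


def open_job(
    job_id: str,
    startup_steps: Iterable[Callable[[], None]],
    datafeed: Datafeed,
    auditor: Auditor,
) -> bool:
    """Run the startup steps; on any failure, audit it and stop the datafeed."""
    for step in startup_steps:
        try:
            step()
        except Exception as exc:
            # Audit in one place so every failing step is covered,
            # not just one particular failure scenario.
            auditor.error(job_id, f"job failed to open: {exc}")
            # An unassigned datafeed cannot notice the failure itself,
            # so the job stops it directly if it is started.
            if datafeed.state is DatafeedState.STARTED:
                datafeed.force_stop()
            return False
    return True


def failing_step() -> None:
    raise RuntimeError("could not restore model state")


# Reassignment case: the datafeed is started but has no node yet.
datafeed = Datafeed(state=DatafeedState.STARTED, assigned_node=None)
auditor = Auditor()
opened = open_job("job-1", [lambda: None, failing_step], datafeed, auditor)
```

In this model `opened` is `False`, the datafeed ends up stopped, and the auditor holds one message for the failure. Placing the audit call in the shared failure path is what makes it apply to every startup step, under the assumptions stated above.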
droberts195 added the auto-merge-without-approval (Automatically merge pull request when CI checks pass; NB doesn't wait for reviews!) and backport labels on Nov 11, 2021
Contributor Author

droberts195 left a comment

The new test won't work on 7.16, as 7.16 can load 6.3 model state.

elasticsearchmachine merged commit 34b2990 into elastic:7.16 on Nov 12, 2021
droberts195 deleted the backport/7.16/pr-80665 branch on November 12, 2021 at 15:28
Labels
auto-merge-without-approval, backport, v7.16.0
3 participants