Skip to content

[ML] Do not make autoscaling decision when memory is undetermined #90259

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

dimitris-athanasiou
Copy link
Contributor

When a processor autoscaling decider was added in #89645, an unwanted change of behaviour sneaked in. In particular, if we cannot determine required memory capacity, we previously returned new AutoscalingDeciderResult(null) where as we now return an autoscaling result with no memory capacity and whatever the result of the processor decider is.

Previously, if we returned a result with null capacity, the cluster would remain as-is. Now, it is possible to cause unwanted scaling.

This commit fixes this by checking if the memory decider result was undetermined and returns an empty result if so.

Also, some logging warnings have been added to pop up scenarios that shouldn't happen like when the memory tracker is not called by the master node or it has no memory estimate for anomaly detection or analytics jobs.

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/ml-core (Team:ML)

Copy link
Contributor

@droberts195 droberts195 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, although it would be nice to assert that we always set both node size and tier size or neither of them (if that's possible without making a test fail - I might be missing a valid situation where it can happen).

@dimitris-athanasiou
Copy link
Contributor Author

@elasticmachine update branch

When a processor autoscaling decider was added in elastic#89645, an unwanted
change of behaviour sneaked in. In particular, if we cannot determine
required memory capacity, we previously returned `new AutoscalingDeciderResult(null)`
where as we now return an autoscaling result with no memory capacity
and whatever the result of the processor decider is.

Previously, if we returned a result with null capacity, the cluster
would remain as-is. Now, it is possible to cause unwanted scaling.

This commit fixes this by checking if the memory decider result
was undetermined and returns an empty result if so.

Also, some logging warnings have been added to pop up scenarios
that shouldn't happen like when the memory tracker is not called
by the master node or it has no memory estimate for anomaly detection
or analytics jobs.
@dimitris-athanasiou dimitris-athanasiou force-pushed the do-not-make-autoscaling-decision-when-memory-is-undetermined branch from e930a74 to c31dbad Compare September 23, 2022 07:54
Copy link
Member

@davidkyle davidkyle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dimitris-athanasiou
Copy link
Contributor Author

run elasticsearch-ci/part-1

@dimitris-athanasiou dimitris-athanasiou merged commit 81a8b7d into elastic:main Sep 26, 2022
dimitris-athanasiou added a commit to dimitris-athanasiou/elasticsearch that referenced this pull request Sep 26, 2022
…astic#90259)

When a processor autoscaling decider was added in elastic#89645, an unwanted
change of behaviour sneaked in. In particular, if we cannot determine
required memory capacity, we previously returned `new AutoscalingDeciderResult(null)`
where as we now return an autoscaling result with no memory capacity
and whatever the result of the processor decider is.

Previously, if we returned a result with null capacity, the cluster
would remain as-is. Now, it is possible to cause unwanted scaling.

This commit fixes this by checking if the memory decider result
was undetermined and returns an empty result if so.

Also, some logging warnings have been added to pop up scenarios
that shouldn't happen like when the memory tracker is not called
by the master node or it has no memory estimate for anomaly detection
or analytics jobs.
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
8.5

elasticsearchmachine pushed a commit that referenced this pull request Sep 26, 2022
…0259) (#90335)

When a processor autoscaling decider was added in #89645, an unwanted
change of behaviour sneaked in. In particular, if we cannot determine
required memory capacity, we previously returned `new AutoscalingDeciderResult(null)`
where as we now return an autoscaling result with no memory capacity
and whatever the result of the processor decider is.

Previously, if we returned a result with null capacity, the cluster
would remain as-is. Now, it is possible to cause unwanted scaling.

This commit fixes this by checking if the memory decider result
was undetermined and returns an empty result if so.

Also, some logging warnings have been added to pop up scenarios
that shouldn't happen like when the memory tracker is not called
by the master node or it has no memory estimate for anomaly detection
or analytics jobs.
@dimitris-athanasiou dimitris-athanasiou deleted the do-not-make-autoscaling-decision-when-memory-is-undetermined branch September 26, 2022 09:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:ml Machine learning >non-issue Team:ML Meta label for the ML team v8.5.0 v8.6.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants