[ML] prevent accidentally asking for more resources when scaling down and improve scaling size estimations #74691


Conversation

benwtrent
Member

@benwtrent benwtrent commented Jun 29, 2021

This commit addresses two problems:

  • Our memory estimations are not very exact. Consequently, it is possible to request too much or too little by a handful of KBs. While this is not a large issue in ESS, it may be for custom tier sizes.
  • When scaling down, it was possible that part of the scale down was actually a scale up! This was due to floating point rounding errors and poor estimations. Even though our estimations are now better, it is best to NOT request higher resources in a scale down, no matter what.

One of the ways we improve the calculation is in the JVM size calculation. Instead of the knot point being 2gb, it has been changed to 1.2gb. This accounts for the "window of uncertainty" in JVM sizes.
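
For illustration, the node-size-to-JVM-size mapping behaves roughly like the sketch below. This is not the real dynamicallyCalculateJvmSizeFromNodeSize: the 40% factor below the knot point comes from the diff in this PR, while the 25% factor and the 2GB ceiling above it are inferred from the test values discussed later in this thread, so treat the exact constants as assumptions.

```java
// Sketch of the node-size -> JVM-size mapping with the new 1280MB knot point.
// Constants above the knot point (25% factor, 2GB ceiling) are assumptions
// inferred from the test tuples below, not the real implementation.
public final class JvmSizeSketch {

    private static final long MB = 1024L * 1024L;
    private static final long GB = 1024L * MB;
    private static final long KNOT_POINT = 1280 * MB; // was 2GB before this change

    static long jvmSizeFromNodeSize(long nodeSizeBytes) {
        if (nodeSizeBytes < KNOT_POINT) {
            return (long) (nodeSizeBytes * 0.40);
        }
        return Math.min((long) (nodeSizeBytes * 0.25), 2 * GB);
    }

    public static void main(String[] args) {
        // Node sizes echo the (node size, JVM size) tuples from the test below.
        for (long gb : new long[] { 2, 4, 8, 16, 32, 64 }) {
            System.out.printf("%2dGB node -> %4dMB JVM%n", gb, jvmSizeFromNodeSize(gb * GB) / MB);
        }
    }
}
```

Because the knot point now sits well below 2gb, a 2gb container that reports slightly less than 2gb still resolves to the same branch as an exact 2gb node.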

closes: #74709

@benwtrent benwtrent added :ml Machine learning >bug labels Jun 29, 2021
@benwtrent benwtrent force-pushed the feature/ml-fix-autoscaling-node-size-calculation branch from 87e8c94 to 6e35a2b Compare June 29, 2021 15:07
@benwtrent benwtrent marked this pull request as ready for review June 29, 2021 15:50
@elasticmachine elasticmachine added the Team:ML Meta label for the ML team label Jun 29, 2021
@elasticmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)

@benwtrent benwtrent changed the title [ML] improve memory calculation estimation for scaling decisions [ML] prevent accidentally asking for more resources when scaling down and improve scaling size estimations Jun 29, 2021
Contributor

@droberts195 droberts195 left a comment

I left a few comments on specific lines.

A more general comment is that I am not sure how much effect all the changes of ceil to round and the switching between integer and floating point arithmetic at different steps of the calculations really have. I would expect those changes to only make a difference of a few bytes here and there. The biggest change by far seems to be the 2GB -> 1280MB change to the cutoff for choosing which of two possible values to use for our ill-defined inverse function. If this is correct then it would be good to change the PR description to reflect this.

@@ -139,8 +129,8 @@ private static long allowedBytesForMl(long machineMemory, Long jvmSize, int maxM
// 2GB node -> 66%
// 16GB node -> 87%
// 64GB node -> 90%
-long memoryPercent = Math.min(90, (int)Math.ceil(((machineMemory - jvmSize - OS_OVERHEAD) / (double)machineMemory) * 100.0D));
-return (long)(machineMemory * (memoryPercent / 100.0));
+double memoryPercent = Math.min(0.90, (machineMemory - jvmSize - OS_OVERHEAD) / (double)machineMemory);
Contributor

Suggested change
double memoryPercent = Math.min(0.90, (machineMemory - jvmSize - OS_OVERHEAD) / (double)machineMemory);
double memoryProportion = Math.min(0.90, (machineMemory - jvmSize - OS_OVERHEAD) / (double)machineMemory);
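
To make the size of the rounding effect concrete, here is a small standalone comparison of the two calculations in the hunk above for a 2GB node. The 200MB OS_OVERHEAD and 512MB JVM size are assumptions chosen to reproduce the "2GB node -> 66%" comment, and the new return statement is assumed since the diff only shows the proportion line.

```java
// Standalone comparison of the old integer-percent calculation and the new
// double-proportion calculation for a 2GB node. OS_OVERHEAD = 200MB and
// jvmSize = 512MB are assumptions that reproduce the "2GB node -> 66%" comment.
public final class AllowedBytesComparison {

    private static final long MB = 1024L * 1024L;
    private static final long OS_OVERHEAD = 200 * MB; // assumed value

    static long oldAllowedBytes(long machineMemory, long jvmSize) {
        long memoryPercent = Math.min(90, (int) Math.ceil(((machineMemory - jvmSize - OS_OVERHEAD) / (double) machineMemory) * 100.0D));
        return (long) (machineMemory * (memoryPercent / 100.0));
    }

    static long newAllowedBytes(long machineMemory, long jvmSize) {
        double memoryProportion = Math.min(0.90, (machineMemory - jvmSize - OS_OVERHEAD) / (double) machineMemory);
        return (long) (machineMemory * memoryProportion); // return statement assumed
    }

    public static void main(String[] args) {
        long machineMemory = 2048 * MB;
        long jvmSize = 512 * MB;
        long oldBytes = oldAllowedBytes(machineMemory, jvmSize);
        long newBytes = newAllowedBytes(machineMemory, jvmSize);
        // Ceiling to a whole percent overstates the allowance by roughly 16MB on a
        // 2GB node, which is more than a few bytes for custom tier sizes.
        System.out.printf("old: %d bytes, new: %d bytes, difference: %d bytes%n",
            oldBytes, newBytes, oldBytes - newBytes);
    }
}
```

On small nodes, then, the integer-percent rounding can matter by megabytes rather than bytes.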

public static long dynamicallyCalculateJvmSizeFromNodeSize(long nodeSize) {
-    if (nodeSize < ByteSizeValue.ofGb(2).getBytes()) {
+    if (nodeSize < ByteSizeValue.ofMb(1280).getBytes()) {
         return (long) (nodeSize * 0.40);
Contributor

It would be good to add a comment here that the reason this is different to the value used in the JVM heap size formula is that that formula does not have a well-defined inverse, so we pick a value here that makes the inverse pick the correct value for some node sizes we know Cloud uses, in particular 2GB.
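
As a rough illustration of why the cutoff sits below 2GB (reusing the assumed 40%/25% factors from the earlier sketch, not the real implementation): a container provisioned as 2GB that reports slightly less than 2GB only lands on the intended branch if the cutoff is below the reported size.

```java
// Rough illustration of the discontinuity: with the old 2GB cutoff, a "2GB"
// container reporting slightly less falls onto the 40% branch; with the 1280MB
// cutoff it stays on the branch that matches what Cloud actually allocates.
// The 40% / 25% factors and the 2GB ceiling are assumptions, as in the earlier sketch.
public final class KnotPointDemo {

    private static final long MB = 1024L * 1024L;

    static long jvmSize(long nodeSizeBytes, long cutoffBytes) {
        return nodeSizeBytes < cutoffBytes
            ? (long) (nodeSizeBytes * 0.40)
            : Math.min((long) (nodeSizeBytes * 0.25), 2048 * MB);
    }

    public static void main(String[] args) {
        long reported = 2047 * MB; // a "2GB" container reporting 1MB less
        System.out.println("2GB cutoff:    " + jvmSize(reported, 2048 * MB) / MB + "MB"); // ~818MB, far from the true ~512MB
        System.out.println("1280MB cutoff: " + jvmSize(reported, 1280 * MB) / MB + "MB"); // ~511MB, close to the true ~512MB
    }
}
```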

Comment on lines 58 to 64
Tuple.tuple(1073741824L, 432013312L), // 1GB and true JVM size
Tuple.tuple(2147483648L, 536870912L), // 2GB ...
Tuple.tuple(4294967296L, 1073741824L), // 4GB ...
Tuple.tuple(8589934592L, 2147483648L), // 8GB ...
Tuple.tuple(17179869184L, 2147483648L), // 16GB ...
Tuple.tuple(34359738368L, 2147483648L), // 32GB ...
Tuple.tuple(68719476736L, 2147483648L) // 64GB ...
Contributor

Does this test still pass if you add in some values that are close to the round multiples, for example 0.9GB, 1.1GB, 1.9GB and 2.1GB? If it doesn't then we need to check with the Cloud team how exact their node sizes are across different Cloud providers.

Also, 58GB is a node size that gets used with one of the IaaS providers I believe.

Member Author

@droberts195 I will add those conditions!

For these, I went through every node size in GCP and looked at the native size and JVM size. I can work on doing that with some other providers.
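
A rough sketch of the kind of perturbation check being discussed, in plain Java rather than the project's test framework. The ±10MB wiggle matches what is described further down in this thread, and the expected values and mapping constants are assumptions taken from the tuples above.

```java
// Sketch of a perturbation check: nudge each known node size by +/-10MB and
// verify the derived JVM size stays near the expected value. The mapping and
// the 3MB tolerance are assumptions, not the project's real test.
public final class NodeSizePerturbationCheck {

    private static final long MB = 1024L * 1024L;
    private static final long GB = 1024L * MB;

    static long jvmSizeFromNodeSize(long nodeSizeBytes) {
        // Same assumed mapping as the earlier sketch: 40% below 1280MB, else 25% capped at 2GB.
        if (nodeSizeBytes < 1280 * MB) {
            return (long) (nodeSizeBytes * 0.40);
        }
        return Math.min((long) (nodeSizeBytes * 0.25), 2 * GB);
    }

    public static void main(String[] args) {
        long[][] cases = { // { node size, expected JVM size }, taken from the tuples above
            { 2 * GB, 512 * MB },
            { 4 * GB, 1 * GB },
            { 8 * GB, 2 * GB },
            { 16 * GB, 2 * GB },
            { 64 * GB, 2 * GB },
        };
        long wiggle = 10 * MB;
        for (long[] c : cases) {
            for (long delta : new long[] { -wiggle, 0, wiggle }) {
                long jvm = jvmSizeFromNodeSize(c[0] + delta);
                // Perturbing the node size shifts the percentage-based result by at most a few MB.
                if (Math.abs(jvm - c[1]) > 3 * MB) {
                    throw new AssertionError("node " + (c[0] + delta) + " -> jvm " + jvm + ", expected ~" + c[1]);
                }
            }
        }
        System.out.println("All perturbed node sizes map to the expected JVM size (within 3MB).");
    }
}
```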

@@ -586,6 +596,19 @@ public AutoscalingDeciderResult scale(Settings configuration, AutoscalingDecider
.build()));
}

static AutoscalingCapacity ensureScaleDown(AutoscalingCapacity scaleDownResult, AutoscalingCapacity currentCapacity) {
Contributor

The fact that this method is needed suggests there are bugs in the other calculations. It's good to have it as a safety net, but I think it would be good if it could log a warning if it changes the answer by more than the few bytes that could be attributed to rounding errors.

Member Author

> but I think it would be good if it could log a warning if it changes the answer by more than the few bytes that could be attributed to rounding errors.

100%, I will add a log line.

I do think we need this method, as this has been an issue in a couple of known scenarios and it can be exceptionally frustrating to troubleshoot.
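
A minimal sketch of what such a safety net plus warning could look like. The Capacity record here is a hypothetical stand-in for the real AutoscalingCapacity type, and System.err stands in for the decider's logger, so this shows the shape of the idea rather than the actual implementation.

```java
// Minimal sketch of an ensureScaleDown safety net that never returns more than the
// current capacity and warns when the correction exceeds plausible rounding error.
// "Capacity" is a hypothetical stand-in for AutoscalingCapacity; the real code differs.
public final class EnsureScaleDownSketch {

    // Hypothetical, simplified capacity: total tier memory and per-node memory, in bytes.
    record Capacity(long tierBytes, long nodeBytes) { }

    private static final long ROUNDING_SLOP_BYTES = 1024; // anything bigger is suspicious

    static Capacity ensureScaleDown(Capacity scaleDownResult, Capacity currentCapacity) {
        // Never ask for more than we currently have when the decision is a scale down.
        Capacity corrected = new Capacity(
            Math.min(scaleDownResult.tierBytes(), currentCapacity.tierBytes()),
            Math.min(scaleDownResult.nodeBytes(), currentCapacity.nodeBytes())
        );
        long tierDelta = scaleDownResult.tierBytes() - corrected.tierBytes();
        long nodeDelta = scaleDownResult.nodeBytes() - corrected.nodeBytes();
        if (tierDelta > ROUNDING_SLOP_BYTES || nodeDelta > ROUNDING_SLOP_BYTES) {
            // More than rounding error was shaved off: the upstream calculation asked for
            // extra resources during a scale down, which points at a bug worth investigating.
            System.err.printf("scale down requested more than current capacity; corrected by tier=%d bytes, node=%d bytes%n",
                tierDelta, nodeDelta);
        }
        return corrected;
    }

    public static void main(String[] args) {
        Capacity current = new Capacity(8L << 30, 4L << 30);                  // 8GB tier, 4GB node
        Capacity requested = new Capacity((8L << 30) + (50 << 20), 4L << 30); // asks for 50MB too much
        System.out.println(ensureScaleDown(requested, current));
    }
}
```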

@mark-vieira
Contributor

@elasticmachine retest this please

Contributor

@droberts195 droberts195 left a comment

I still have one question, which is about how sensitive we are to a 2GB container reporting exactly 2GB available within the container.

If it turned out that a container provisioned as a 2GB container actually only has, say, 1.999GB memory within the container then we have a problem with the discontinuity in our JVM heap size function occurring at the 2GB mark.

If our code no longer goes haywire when the machine memory is close to, but not exactly, 1GB or 2GB, then I am happy to merge this now.

But if there's still sensitivity just above or below 1GB or 2GB then we should make extra tweaks to move the sensitivity into an area that Cloud isn't near at the moment.

Comment on lines +59 to +60
Tuple.tuple(1073741824L, 432013312L), // 1GB and true JVM size
Tuple.tuple(2147483648L, 536870912L), // 2GB ...
Contributor

Does this test fail for any of 0.9GB, 1.1GB, 1.9GB and 2.1GB? If it does then we may have a problem that will hit us in production.

Member Author

@droberts195 please see lines: https://github.com/elastic/elasticsearch/pull/74691/files#diff-633121fb7250eeda7eb60908a4d89e0d0007a60c2e01fe452c6e962933208822R72

I add and subtract 10MB from every node value and make sure things still pass.

Additionally, we never allow things to be below 1GB. When autoscaling, that is the smallest size we will ever ask for.

Member Author

@droberts195 I will additionally add a clause where the trueJvm isn't provided for either the native bytes size or the inverse check. This way we use the dynamic jvm calculation.

Everything still checks out fine.

Contributor

> please see lines: https://github.com/elastic/elasticsearch/pull/74691/files#diff-633121fb7250eeda7eb60908a4d89e0d0007a60c2e01fe452c6e962933208822R72

Sorry, I didn't look closely enough.

> Additionally, we never allow things to be below 1GB. When autoscaling, that is the smallest size we will ever ask for.

The situation I am thinking of is where Docker reports a container as 1GB but the programs running inside it think the machine they're running on has 0.9999GB. But it sounds like this is covered.

This PR LGTM to merge then 🚀

@benwtrent
Member Author

@elasticmachine update branch

@benwtrent
Member Author

run elasticsearch-ci/part-2

@benwtrent benwtrent merged commit f9991e6 into elastic:master Jun 30, 2021
@benwtrent benwtrent deleted the feature/ml-fix-autoscaling-node-size-calculation branch June 30, 2021 17:49
benwtrent added a commit to benwtrent/elasticsearch that referenced this pull request Jun 30, 2021
… and improve scaling size estimations (elastic#74691)

benwtrent added a commit to benwtrent/elasticsearch that referenced this pull request Jun 30, 2021
… and improve scaling size estimations (elastic#74691)

benwtrent added a commit that referenced this pull request Jul 1, 2021
…g down and improve scaling size estimations (#74691) (#74780)

benwtrent added a commit that referenced this pull request Jul 1, 2021
…ng down and improve scaling size estimations (#74691) (#74781)

benwtrent added a commit that referenced this pull request Jul 1, 2021
…ng down and improve scaling size estimations (#74691) (#74782)

benwtrent added a commit that referenced this pull request Jul 1, 2021
…handles that (#74822)

context commit may be null. This should only really happen early in a cluster's life cycle or if a node was just recently brought online, mainly because the current node sizes have not yet been discovered and cached.

This change should really have been part of #74691
Labels
>bug :ml Machine learning Team:ML Meta label for the ML team v7.15.0 v8.0.0-alpha1
Development

Successfully merging this pull request may close these issues.

ML is causing a scale up when its actually requesting for a scale down