[ML] prevent accidentally asking for more resources when scaling down and improve scaling size estimations #74691
Conversation
Force-pushed from 87e8c94 to 6e35a2b
Pinging @elastic/ml-core (Team:ML)
I left a few comments on specific lines.
A more general comment is that I am not sure how much effect all the changes of `ceil` to `round`, and switching between integer and floating point arithmetic at different steps of the calculations, really have. I would expect those changes to only make a difference of a few bytes here and there. The biggest change by far seems to be the 2GB -> 1280MB change to the cutoff for choosing which of two possible values to pick for our ill-defined inverse function. If this is correct then it would be good to change the PR description to reflect this.
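To make the inverse-function issue concrete, here is a minimal self-contained sketch. The 40% rate below the knot appears in the diff below; the 25% rate and 2GB cap above it are assumptions inferred from the test values later in this conversation, not the exact production formula.

```java
// Minimal sketch, NOT the production implementation. Assumes: 40% of node memory
// below the knot (as in the diff below), roughly 25% above it, capped at 2GB
// (inferred from the test tuples in this PR).
public class JvmKnotSketch {
    static final long MB = 1024L * 1024L;
    static final long GB = 1024L * MB;
    static final long KNOT = 1280L * MB; // the new cutoff; previously 2GB

    static long jvmFromNode(long nodeSize) {
        if (nodeSize < KNOT) {
            return (long) (nodeSize * 0.40);
        }
        return Math.min(2 * GB, (long) (nodeSize * 0.25));
    }

    // The inverse has to guess which branch produced the JVM size. A 512MB heap could
    // come from a 1.28GB node (40%) or a 2GB node (25%); the knot decides which wins.
    static long nodeFromJvm(long jvmSize) {
        long smallNode = (long) (jvmSize / 0.40);
        return smallNode < KNOT ? smallNode : (long) (jvmSize / 0.25);
    }

    public static void main(String[] args) {
        // With KNOT = 2GB the inverse of 512MB resolves to 1280MB; with KNOT = 1280MB
        // it resolves to 2GB, which is a node size Cloud actually uses.
        System.out.println(nodeFromJvm(512 * MB) / MB + "MB");
    }
}
```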
@@ -139,8 +129,8 @@ private static long allowedBytesForMl(long machineMemory, Long jvmSize, int maxM
// 2GB node -> 66%
// 16GB node -> 87%
// 64GB node -> 90%
long memoryPercent = Math.min(90, (int)Math.ceil(((machineMemory - jvmSize - OS_OVERHEAD) / (double)machineMemory) * 100.0D));
return (long)(machineMemory * (memoryPercent / 100.0));
double memoryPercent = Math.min(0.90, (machineMemory - jvmSize - OS_OVERHEAD) / (double)machineMemory);
Suggested change:
- double memoryPercent = Math.min(0.90, (machineMemory - jvmSize - OS_OVERHEAD) / (double)machineMemory);
+ double memoryProportion = Math.min(0.90, (machineMemory - jvmSize - OS_OVERHEAD) / (double)machineMemory);
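For anyone comparing the removed and added arithmetic above, a minimal self-contained sketch; OS_OVERHEAD = 200MB is an assumed value purely for illustration, since the real constant is not shown in this hunk.

```java
// Sketch only; OS_OVERHEAD = 200MB is an assumed value for illustration.
public class MemoryPercentSketch {
    static final long MB = 1024L * 1024L;
    static final long OS_OVERHEAD = 200L * MB;

    // Old form: round the percentage up to a whole percent, then apply it.
    static long oldAllowedBytes(long machineMemory, long jvmSize) {
        long memoryPercent = Math.min(90,
            (int) Math.ceil(((machineMemory - jvmSize - OS_OVERHEAD) / (double) machineMemory) * 100.0D));
        return (long) (machineMemory * (memoryPercent / 100.0));
    }

    // New form: keep the proportion as a double, so there is no whole-percent rounding step.
    static long newAllowedBytes(long machineMemory, long jvmSize) {
        double memoryProportion = Math.min(0.90, (machineMemory - jvmSize - OS_OVERHEAD) / (double) machineMemory);
        return (long) (machineMemory * memoryProportion);
    }

    public static void main(String[] args) {
        long machineMemory = 2048L * MB;
        long jvmSize = 512L * MB;
        // Any difference between the two comes from the ceil-to-whole-percent step above.
        System.out.println(oldAllowedBytes(machineMemory, jvmSize) - newAllowedBytes(machineMemory, jvmSize));
    }
}
```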
if (nodeSize < ByteSizeValue.ofGb(2).getBytes()) {
    return (long) (nodeSize * 0.40);
public static long dynamicallyCalculateJvmSizeFromNodeSize(long nodeSize) {
    if (nodeSize < ByteSizeValue.ofMb(1280).getBytes()) {
It would be good to add a comment here that the reason this is different to the value used in the JVM heap size formula is that that formula does not have a well-defined inverse, so we pick a value here that makes the inverse pick the correct value for some node sizes we know Cloud uses, in particular 2GB.
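A sketch of how such a comment could read in context; the constants stand in for the ByteSizeValue calls in the real method, and the 25%/2GB branch above the cutoff is an assumption rather than the committed code.

```java
// Sketch of the kind of in-code comment being requested; plain constants stand in
// for the ByteSizeValue calls used in the real method.
public final class JvmCutoffCommentSketch {
    private static final long MB = 1024L * 1024L;
    private static final long GB = 1024L * MB;

    public static long dynamicallyCalculateJvmSizeFromNodeSize(long nodeSize) {
        // NOTE: this cutoff intentionally differs from the one in the JVM heap size
        // formula. That formula has no well-defined inverse, so we choose a value here
        // that makes the inverse pick the correct answer for node sizes Cloud is known
        // to use, in particular 2GB.
        if (nodeSize < 1280L * MB) {
            return (long) (nodeSize * 0.40);
        }
        // Above the cutoff the rate and cap are assumptions for this sketch.
        return Math.min(2 * GB, (long) (nodeSize * 0.25));
    }
}
```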
Tuple.tuple(1073741824L, 432013312L),   // 1GB and true JVM size
Tuple.tuple(2147483648L, 536870912L),   // 2GB ...
Tuple.tuple(4294967296L, 1073741824L),  // 4GB ...
Tuple.tuple(8589934592L, 2147483648L),  // 8GB ...
Tuple.tuple(17179869184L, 2147483648L), // 16GB ...
Tuple.tuple(34359738368L, 2147483648L), // 32GB ...
Tuple.tuple(68719476736L, 2147483648L)  // 64GB ...
Does this test still pass if you add in some values that are close to the round multiples, for example 0.9GB, 1.1GB, 1.9GB and 2.1GB? If it doesn't then we need to check with the Cloud team how exact their node sizes are across different Cloud providers.
Also, 58GB is a node size that gets used with one of the IaaS providers I believe.
@droberts195 I will add those conditions!
For these, I went through every node size in GCP and looked at the native size and JVM size. I can work on doing that with some other providers.
@@ -586,6 +596,19 @@ public AutoscalingDeciderResult scale(Settings configuration, AutoscalingDecider
        .build()));
}

static AutoscalingCapacity ensureScaleDown(AutoscalingCapacity scaleDownResult, AutoscalingCapacity currentCapacity) {
The fact that this method is needed suggests there are bugs in the other calculations. It's good to have it as a safety net, but I think it would be good if it could log a warning if it changes the answer by more than the few bytes that could be attributed to rounding errors.
> but I think it would be good if it could log a warning if it changes the answer by more than the few bytes that could be attributed to rounding errors.

100%, I will add a log line.

I do think we need this method, as this has been an issue in a couple of known scenarios and it can be exceptionally frustrating to troubleshoot.
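A minimal sketch of what such a clamp-plus-warning could look like; the Capacity record, its accessor names, and the 1MB tolerance are assumptions rather than the code that was merged.

```java
import java.util.logging.Logger;

// Self-contained sketch; "Capacity" is a stand-in for AutoscalingCapacity, and the
// 1MB tolerance is an assumed threshold for "more than a rounding error".
public class EnsureScaleDownSketch {
    private static final Logger logger = Logger.getLogger("ml-autoscaling-sketch");
    private static final long ROUNDING_TOLERANCE_BYTES = 1024L * 1024L;

    record Capacity(long tierBytes, long nodeBytes) {}

    static Capacity ensureScaleDown(Capacity scaleDownResult, Capacity currentCapacity) {
        // Never ask for more than we currently have when the decision is a scale down.
        Capacity clamped = new Capacity(
            Math.min(scaleDownResult.tierBytes(), currentCapacity.tierBytes()),
            Math.min(scaleDownResult.nodeBytes(), currentCapacity.nodeBytes())
        );
        long tierDelta = scaleDownResult.tierBytes() - clamped.tierBytes();
        long nodeDelta = scaleDownResult.nodeBytes() - clamped.nodeBytes();
        if (tierDelta > ROUNDING_TOLERANCE_BYTES || nodeDelta > ROUNDING_TOLERANCE_BYTES) {
            // A large correction means some upstream estimate asked for MORE memory
            // during a scale down, which is the bug this safety net guards against.
            logger.warning("scale down clamped by [" + tierDelta + "] tier bytes and [" + nodeDelta + "] node bytes");
        }
        return clamped;
    }
}
```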
@elasticmachine retest this please
I still have one question, which is about how sensitive we are to a 2GB container reporting exactly 2GB available within the container.
If it turned out that a container provisioned as a 2GB container actually only has, say, 1.999GB memory within the container then we have a problem with the discontinuity in our JVM heap size function occurring at the 2GB mark.
If our code no longer goes haywire when the machine memory is close to, but not exactly, 1GB or 2GB, then I am happy to merge this now.
But if there's still sensitivity just above or below 1GB or 2GB then we should make extra tweaks to move the sensitivity into an area that Cloud isn't near at the moment.
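To make the sensitivity concern concrete, a small self-contained sketch (again assuming 40% below the knot and 25% above, which may not match the production formula exactly) of how the discontinuity moves when the knot changes from 2GB to 1280MB:

```java
// Sketch with assumed rates (40% below the knot, 25% above): shows the JVM size the
// formula would pick for node sizes just below and above 2GB, with the old 2GB knot
// versus the new 1280MB knot.
public class KnotSensitivitySketch {
    static final long MB = 1024L * 1024L;

    static long jvm(long nodeSize, long knot) {
        return nodeSize < knot ? (long) (nodeSize * 0.40) : (long) (nodeSize * 0.25);
    }

    public static void main(String[] args) {
        long[] nodes = {1900L * MB, 2048L * MB, 2150L * MB}; // ~1.9GB, 2GB, ~2.1GB
        for (long node : nodes) {
            // Old knot (2GB): a container reporting slightly under 2GB lands on the 40%
            // branch, a big jump from the 25% branch used at exactly 2GB.
            // New knot (1280MB): all three sizes use the same branch, so a slightly
            // under-reported 2GB container no longer changes the answer dramatically.
            System.out.printf("node=%dMB oldKnot=%dMB newKnot=%dMB%n",
                node / MB, jvm(node, 2048L * MB) / MB, jvm(node, 1280L * MB) / MB);
        }
    }
}
```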
Tuple.tuple(1073741824L, 432013312L), // 1GB and true JVM size
Tuple.tuple(2147483648L, 536870912L), // 2GB ...
Does this test fail for any of 0.9GB, 1.1GB, 1.9GB and 2.1GB? If it does then we may have a problem that will hit us in production.
@droberts195 please see lines: https://github.com/elastic/elasticsearch/pull/74691/files#diff-633121fb7250eeda7eb60908a4d89e0d0007a60c2e01fe452c6e962933208822R72
I add and subtract 10MB from every node value and make sure things still pass.
Additionally, we never allow things to be below 1GB. When autoscaling, that is the smallest size we will ever ask for.
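For the shape of that perturbation check, here is a minimal JUnit-style sketch; nodeToJvm and the 5MB tolerance are hypothetical stand-ins, not the actual test added in the PR.

```java
import static org.junit.Assert.assertTrue;
import org.junit.Test;

// Sketch of the kind of perturbation check described above; nodeToJvm(...) is a
// hypothetical stand-in for the production node-size-to-JVM-size calculation, and
// the 5MB tolerance is an assumption about "close enough".
public class NodeJvmPerturbationSketchTests {

    private static final long MB = 1024L * 1024L;
    private static final long GB = 1024L * MB;

    // Hypothetical stand-in: 40% below a 1280MB knot, 25% above, capped at 2GB.
    private static long nodeToJvm(long nodeSize) {
        if (nodeSize < 1280L * MB) {
            return (long) (nodeSize * 0.40);
        }
        return Math.min(2 * GB, (long) (nodeSize * 0.25));
    }

    @Test
    public void testJvmSizeIsStableWithin10MbOfKnownNodeSizes() {
        long[] knownNodeSizes = {1 * GB, 2 * GB, 4 * GB, 8 * GB, 16 * GB, 32 * GB, 64 * GB};
        for (long nodeSize : knownNodeSizes) {
            long nominal = nodeToJvm(nodeSize);
            // A container reporting 10MB more or less than its nominal size must not
            // flip to a different branch of the formula and change the JVM size wildly.
            assertTrue(Math.abs(nodeToJvm(nodeSize - 10 * MB) - nominal) < 5 * MB);
            assertTrue(Math.abs(nodeToJvm(nodeSize + 10 * MB) - nominal) < 5 * MB);
        }
    }
}
```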
@droberts195 I will additionally add a clause where the trueJvm isn't provided for either the native bytes size or the inverse check. This way we use the dynamic JVM calculation.
Everything still checks out fine.
> please see lines: https://github.com/elastic/elasticsearch/pull/74691/files#diff-633121fb7250eeda7eb60908a4d89e0d0007a60c2e01fe452c6e962933208822R72

Sorry, I didn't look closely enough.

> Additionally, we never allow things to be below 1GB. When autoscaling, that is the smallest size we will ever ask for.

The situation I am thinking of is where Docker reports a container as 1GB but the programs running inside it think the machine they're running on has 0.9999GB. But it sounds like this is covered.
This PR LGTM to merge then 🚀
…toscaling-node-size-calculation
@elasticmachine update branch
…thub.com:benwtrent/elasticsearch into feature/ml-fix-autoscaling-node-size-calculation
run elasticsearch-ci/part-2
… and improve scaling size estimations (elastic#74691)

This commit addresses two problems:

- Our memory estimations are not very exact. Consequently, it's possible to request too much or too little by a handful of KBs; while this is not a large issue in ESS, for custom tier sizes it may be.
- When scaling down, it was possible that part of the scale down was actually a scale up! This was due to some floating point rounding errors and poor estimations. Even though our estimations are better, it is best to NOT request higher resources in a scale down, no matter what.

One of the ways we improve the calculation is during JVM size calculations. Instead of having the knot point be `2gb`, it has been changed to `1.2gb`. This accounts for the "window of uncertainty" for JVM sizes.

closes: elastic#74709
Backported (same commit message as above) in:
- …g down and improve scaling size estimations (#74691) (#74780)
- …ng down and improve scaling size estimations (#74691) (#74781)
- …ng down and improve scaling size estimations (#74691) (#74782)