Closed
Description
Issue
Versions: 7.11-7.13
Fixed in: 7.14+
Due to poor memory estimation, a request that should scale the ML tier down can accidentally trigger a scale up instead.
Here is a response that epitomizes the scenario:
"ml": {
"required_capacity": {
"node": {
"memory": 2520765440
},
"total": {
"memory": 2520765440
}
},
"current_capacity": {
"node": {
"storage": 0,
"memory": 2147483648
},
"total": {
"storage": 0,
"memory": 6442450944
}
},
"current_nodes": [
{
"name": "instance-0000000099"
},
{
"name": "instance-0000000100"
},
{
"name": "instance-0000000101"
}
],
"deciders": {
"ml": {
"required_capacity": {
"node": {
"memory": 2520765440
},
"total": {
"memory": 2520765440
}
},
"reason_summary": "Requesting scale down as tier and/or node size could be smaller",
"reason_details": {
"waiting_analytics_jobs": [],
"waiting_anomaly_jobs": [],
"configuration": {},
"perceived_current_capacity": {
"node": {
"memory": 2503160627
},
"total": {
"memory": 6074310888
}
},
"required_capacity": {
"node": {
"memory": 2520765440
},
"total": {
"memory": 2520765440
}
},
"reason": "Requesting scale down as tier and/or node size could be smaller"
}
}
}
}
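The excerpt above is the per-policy object returned by the autoscaling capacity API (GET _autoscaling/capacity). As a minimal Python sketch of how to spot the mismatch programmatically, the snippet below fetches that response and compares the ML policy's required node memory against its current node memory. The endpoint URL, credentials, and the policy name "ml" are assumptions for illustration.

# Sketch: compare ML's required node memory with the current node memory
# from the autoscaling capacity API. URL and credentials are placeholders.
import requests

ES_URL = "http://localhost:9200"   # assumed endpoint
AUTH = ("elastic", "changeme")     # assumed credentials

resp = requests.get(f"{ES_URL}/_autoscaling/capacity", auth=AUTH)
resp.raise_for_status()
ml = resp.json()["policies"]["ml"]  # policy name assumed to be "ml"

required_node = ml["required_capacity"]["node"]["memory"]
current_node = ml["current_capacity"]["node"]["memory"]

if required_node > current_node:
    # Even when the decider's reason says "scale down", an inflated
    # required_capacity forces the orchestrator onto a bigger node size.
    print(f"Scale UP implied: required {required_node} > current {current_node}")
else:
    print(f"Scale down / no-op: required {required_node} <= current {current_node}")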
Note that the current node size is actually 2 GB (2147483648 bytes), but ML's estimate is inflated because values are rounded inappropriately (2520765440 bytes). Since the required node capacity exceeds the current node size, this actually caused a scale up instead of the intended scale down.
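To make the mismatch concrete, the following arithmetic uses only the two numbers from the response above:

# The current node is exactly 2 GiB, but ML's rounded estimate is ~2.35 GiB,
# which no longer fits on a 2 GiB node, so the orchestrator must scale up.
current_node_bytes = 2147483648    # current_capacity.node.memory
required_node_bytes = 2520765440   # required_capacity.node.memory

print(current_node_bytes / 2**30)                 # 2.0 GiB
print(required_node_bytes / 2**30)                # ~2.35 GiB
print(required_node_bytes > current_node_bytes)   # True -> scale up, not down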
Workaround
If you are running an affected Elasticsearch version and this scenario occurs, you can statically set the minimum and maximum autoscaling sizes for ML in Elastic Cloud.
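As a rough sketch of what pinning the ML tier looks like, the fragment below shows the relevant part of an Elastic Cloud deployment-update topology element. The field names (autoscaling_min, autoscaling_max) and the 2048 MB values are assumptions based on the Cloud deployments API; verify them against the Elastic Cloud API documentation, or simply set the minimum and maximum sizes on the ML tier in the Cloud console's deployment edit page.

# Sketch only: fragment of a deployment-update payload that pins the ML
# tier's autoscaling range so the bug cannot select a larger node size.
# Field names and values are assumptions; confirm against the Cloud API docs.
ml_topology_override = {
    "id": "ml",
    "autoscaling_min": {"value": 2048, "resource": "memory"},  # 2 GB floor
    "autoscaling_max": {"value": 2048, "resource": "memory"},  # 2 GB ceiling
}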