
ML is causing a scale up when it's actually requesting a scale down #74709

Closed
@benwtrent

Description


Issue

Versions: 7.11-7.13

Fixed in: 7.14+

Due to poor memory estimations, a scale down request can end up asking for more per-node capacity than the cluster actually has, which accidentally triggers a scale up.

Here is a response that epitomizes the scenario:

    "ml": {
      "required_capacity": {
        "node": {
          "memory": 2520765440
        },
        "total": {
          "memory": 2520765440
        }
      },
      "current_capacity": {
        "node": {
          "storage": 0,
          "memory": 2147483648
        },
        "total": {
          "storage": 0,
          "memory": 6442450944
        }
      },
      "current_nodes": [
        {
          "name": "instance-0000000099"
        },
        {
          "name": "instance-0000000100"
        },
        {
          "name": "instance-0000000101"
        }
      ],
      "deciders": {
        "ml": {
          "required_capacity": {
            "node": {
              "memory": 2520765440
            },
            "total": {
              "memory": 2520765440
            }
          },
          "reason_summary": "Requesting scale down as tier and/or node size could be smaller",
          "reason_details": {
            "waiting_analytics_jobs": [],
            "waiting_anomaly_jobs": [],
            "configuration": {},
            "perceived_current_capacity": {
              "node": {
                "memory": 2503160627
              },
              "total": {
                "memory": 6074310888
              }
            },
            "required_capacity": {
              "node": {
                "memory": 2520765440
              },
              "total": {
                "memory": 2520765440
              }
            },
            "reason": "Requesting scale down as tier and/or node size could be smaller"
          }
        }
      }
    }

Note how the actual node size is 2GB (2147483648 bytes), but ML's required capacity estimate comes out at 2520765440 bytes because values are rounded inappropriately. Since the required per-node memory exceeds the real node size, this caused a scale up instead of the intended scale down.
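
To make the failure mode concrete, here is a minimal, hypothetical sketch of the per-node capacity check the orchestrator effectively performs. This is not the actual Elasticsearch or Elastic Cloud code; the two numbers are taken from the response above, everything else is illustrative:

    // Hypothetical, simplified model of the per-node capacity check; not the
    // real autoscaling code. It only illustrates why an inflated requirement
    // flips a "scale down" into a scale up.
    public class ScaleDecisionSketch {
        public static void main(String[] args) {
            long currentNodeMemory  = 2_147_483_648L; // actual node size: 2 GB
            long requiredNodeMemory = 2_520_765_440L; // ML's rounded-up estimate (~2.35 GB)

            // The decider's summary says "scale down", but the orchestrator
            // only compares required vs. current capacity per node.
            if (requiredNodeMemory > currentNodeMemory) {
                System.out.println("Scale up: required " + requiredNodeMemory
                        + " B exceeds current node size " + currentNodeMemory + " B");
            } else {
                System.out.println("Node size is sufficient; the tier can shrink");
            }
        }
    }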

Workaround

If you are running an Elasticsearch version that suffers from this and the scenario occurs, you can work around it by statically setting the minimum and maximum autoscaling sizes for ML in Elastic Cloud, as sketched below.
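
For illustration, this is roughly what pinning the ML tier looks like in an Elastic Cloud deployment plan. The topology element and its autoscaling_min/autoscaling_max fields are based on my understanding of the Elastic Cloud deployments API, and the 4096 MB size is a placeholder, so treat the exact field names and values as assumptions to verify for your deployment (the same limits can also be set from the deployment's autoscaling settings in the Cloud console):

    {
      "id": "ml",
      "autoscaling_min": { "value": 4096, "resource": "memory" },
      "autoscaling_max": { "value": 4096, "resource": "memory" }
    }

Setting the minimum and maximum to the same value effectively fixes the ML tier size, so the bad estimate cannot force an unwanted scale up until you can upgrade to 7.14+.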
