Autoscale Group creates Infinite Number Of VMs when unable to startup #9318

Open
@btzq

Description

ISSUE TYPE
  • Bug Report
COMPONENT NAME
VM Autoscale Feature
CLOUDSTACK VERSION
4.19.0
CONFIGURATION

Any Autoscale VM Setup

OS / ENVIRONMENT

Linux Ubuntu

SUMMARY

When an Autoscale Group is set up, it works fine under normal conditions.

However, when an Autoscale Group is given a template that cannot start a VM (for whatever reason, e.g. a corrupted template or storage issues), the VM goes through the start -> stop -> error states.

When a VM enters the error state, the Autoscale Group creates another VM as a retry.

If that VM also fails, the group tries again, and it ends up in an infinite loop.
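The loop described above can be illustrated with a toy model. This is only a sketch of the reported behaviour, not CloudStack code: the function name `simulate_unbounded`, the `can_start` callback, and the assumption that only running VMs count towards the group size are all hypothetical.

```python
def simulate_unbounded(min_members, cycles, can_start):
    """Toy model of the reported loop (hypothetical, not CloudStack code)."""
    running = 0
    errored = 0
    for _ in range(cycles):
        # Assumption: only running VMs count towards the group size,
        # so errored VMs never satisfy the minimum.
        while running < min_members:
            if can_start():
                running += 1
            else:
                errored += 1
                break  # this cycle gives up; the next cycle retries
    return running, errored


# With a template that never starts, one errored VM is left per cycle.
running, errored = simulate_unbounded(min_members=2, cycles=1000,
                                      can_start=lambda: False)
print(running, errored)
```

In this model the number of errored VMs grows with the number of monitoring cycles, regardless of the configured minimum or maximum.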

We hit this issue ourselves when our storage service went down. We have approximately 15 VMs in our setup, but because of this bug the Autoscale Groups kept scaling and we ended up with 30,000 VMs. Refer to the screenshot below:

[Screenshot: WhatsApp Image 2024-06-30 at 8 37 47 AM]

Note: our Max Replica = 4 and Min Replica = 2 for each Autoscale Rule.

To resolve this, we had to disable the autoscale rules and manually mark the affected VMs as destroyed in the DB using the command below:

update cloud.vm_instance set state='Destroyed', removed ='2024-06-30 02:59:49' where name like 'autoscale%' and state = 'error'

We managed to resolve it ourselves because we use it within known groups of users. However, if this had happened on a public cloud, where the users are unknown, it would have been catastrophic, because we would have had to contact each and every customer to repeat the above steps (a cloud operator should not access or interfere with customer environments).

STEPS TO REPRODUCE
Create an Autoscale Group with a template that cannot be spun up (e.g. corrupted), or simulate a storage failure while an Autoscale Group is active.
EXPECTED RESULTS
A maximum number of retries when scaling up results in VMs in the 'Error' state, to prevent infinite scaling.
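A minimal sketch of what such a retry cap could look like, assuming a per-group counter of consecutive failed scale-ups. The names `ScaleUpGuard`, `max_consecutive_failures` and `run_cycles` are hypothetical and do not correspond to any existing CloudStack class or setting.

```python
class ScaleUpGuard:
    """Suspends scale-up after too many consecutive failed VM starts.

    Hypothetical illustration only, not CloudStack code.
    """

    def __init__(self, max_consecutive_failures=3):
        self.max_consecutive_failures = max_consecutive_failures
        self.consecutive_failures = 0
        self.suspended = False

    def record(self, started_ok):
        if started_ok:
            # A successful start resets the failure streak.
            self.consecutive_failures = 0
            return
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.max_consecutive_failures:
            self.suspended = True

    def may_scale_up(self):
        return not self.suspended


def run_cycles(guard, cycles, can_start):
    """Attempt one scale-up per cycle until the guard suspends the group."""
    attempts = 0
    for _ in range(cycles):
        if not guard.may_scale_up():
            break
        attempts += 1
        guard.record(can_start())
    return attempts


guard = ScaleUpGuard(max_consecutive_failures=3)
# Stops after three failed attempts instead of using all 1000 cycles.
print(run_cycles(guard, cycles=1000, can_start=lambda: False))
```

Counting consecutive failures rather than total failures means an occasional failed start does not suspend a healthy group; whether the group should resume automatically or need operator action is left open here.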
ACTUAL RESULTS
Infinite VM scaling, causing huge resource consumption and bottlenecking our cloud services.
