##### ISSUE TYPE
- Bug Report
##### COMPONENT NAME
API
##### CLOUDSTACK VERSION
4.11.2 (earlier versions not tested)
##### CONFIGURATION
VMware 6.5u2 (should be irrelevant)
##### OS / ENVIRONMENT
Management server: CentOS 7 (irrelevant)
##### SUMMARY
The temporary VMware worker VM used to export a volume snapshot is not recycled (i.e. removed) after the timeout is reached (even when configured to be removed). The root cause seems to be that we do not stop the running OVF export task, which causes the attempt to remove the worker VM to fail (i.e. it is stuck in the pending state).
##### STEPS TO REPRODUCE
Set the following global settings:
- job.expire.minutes : 1
- job.cancel.threshold.minutes : 1
- vmware.clean.old.worker.vms : true
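For reference, global settings like these can be changed through the `updateConfiguration` API, for example with CloudMonkey (the exact client invocation below is a sketch; adjust to your environment):

```shell
# Sketch: apply the global settings above via CloudMonkey (cmk).
cmk update configuration name=job.expire.minutes value=1
cmk update configuration name=job.cancel.threshold.minutes value=1
cmk update configuration name=vmware.clean.old.worker.vms value=true
# Some settings only take effect after a management-server restart.
```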
Create a volume snapshot.
After 2 × (job.expire.minutes + job.cancel.threshold.minutes) = 4 minutes = 240 seconds, the job will time out / fail with the following message in the logs:
2019-06-17 11:35:26,459 INFO [c.c.h.v.m.VmwareManagerImpl] (DirectAgentCronJob-40:ctx-dbfaf1a2) (logid:bff5a57c) Worker VM expired, seconds elapsed: 252
2019-06-17 11:35:26,463 INFO [c.c.h.v.r.VmwareResource] (DirectAgentCronJob-40:ctx-dbfaf1a2) (logid:bff5a57c) Recycle pending worker VM: 935686f0ee0b4b518ca2b50597650c75
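The expiry window works out as a quick sanity check (plain arithmetic, not CloudStack code):

```python
# Worker-VM expiry window with the settings above:
# 2 * (job.expire.minutes + job.cancel.threshold.minutes), in seconds.
job_expire_minutes = 1
job_cancel_threshold_minutes = 1

window_seconds = 2 * (job_expire_minutes + job_cancel_threshold_minutes) * 60
print(window_seconds)  # 240

# The log above reports 252 seconds elapsed, i.e. past the 240-second
# window, so the worker VM is correctly detected as expired.
print(252 > window_seconds)  # True
```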
but the tasks remain on the vCenter side (shown in the attached screenshot):
The way to clean it up manually is to stop/kill the OVF export task in vCenter; after that, the VM reconfiguration task and finally the removal task (not visible in the image) are executed, and the worker VM is removed as per the expected behavior.
The issue seems to be that ACS does NOT kill the OVF export task once the timeouts are reached, which is the required step.
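The required ordering can be illustrated with a small, self-contained sketch (hypothetical stand-in classes, not the vSphere SDK): destroying the worker VM only succeeds once any running task on it, such as the OVF export, has been cancelled, so the recycler must cancel first.

```python
# Sketch of the required recycle order, using hypothetical stand-in
# classes (not the actual vim25/vSphere API).
class Task:
    def __init__(self, name):
        self.name = name
        self.state = "running"

    def cancel(self):
        self.state = "cancelled"


class WorkerVM:
    def __init__(self, name):
        self.name = name
        self.tasks = []
        self.destroyed = False

    def destroy(self):
        # Mirrors the observed vCenter behaviour: removal stays pending
        # while a task (e.g. the OVF export) is still running.
        if any(t.state == "running" for t in self.tasks):
            raise RuntimeError("destroy pending: a task is still running")
        self.destroyed = True


def recycle(vm):
    # The step the report says is missing: cancel running tasks first,
    # then destroy the worker VM.
    for task in vm.tasks:
        if task.state == "running":
            task.cancel()
    vm.destroy()


vm = WorkerVM("935686f0ee0b4b518ca2b50597650c75")
vm.tasks.append(Task("ExportVm (OVF)"))
recycle(vm)
print(vm.destroyed)  # True
```

Without the cancel step, `destroy()` in this sketch raises, which corresponds to the removal task being stuck in the pending state.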
##### EXPECTED RESULTS
The OVF task is stopped/killed and the worker VM gets removed.
##### ACTUAL RESULTS
The worker VM keeps running, as does the OVF export task.