Closed
Description
ISSUE TYPE
- Bug Report
COMPONENT NAME
System VMs
(Maybe also KVM hosts)
CLOUDSTACK VERSION
4.11.0
CONFIGURATION
OS / ENVIRONMENT
4.11.0 environment with VMware
SUMMARY
If mgmt server <-> agent communication is interrupted while an action is in progress (for example, the mgmt server restarts while the SSVM is performing a snapshot), the SSVM cannot reconnect and logs the following error:
2018-05-09 11:37:09,403 INFO [cloud.agent.Agent] (Agent-Handler-9:null) Lost connection to host: 10.220.136.127. Dealing with the remaining commands...
2018-05-09 11:37:09,404 INFO [cloud.agent.Agent] (Agent-Handler-9:null) Cannot connect because we still have 1 commands in progress.
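To make the reported behaviour concrete, here is a minimal sketch of a reconnect guard that refuses to reconnect while commands are still running. It is only a model inferred from the log lines above, not CloudStack's actual `Agent` code; the class `ReconnectGuardSketch` and its methods `commandStarted`, `commandFinished` and `tryReconnect` are hypothetical names introduced for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of the behaviour seen in the log; NOT the real cloud.agent.Agent class.
final class ReconnectGuardSketch {
    // Number of commands the agent is currently executing.
    private final AtomicInteger inProgress = new AtomicInteger();

    void commandStarted() {
        inProgress.incrementAndGet();
    }

    void commandFinished() {
        inProgress.decrementAndGet();
    }

    // Assumed guard: reconnecting is refused while any command is still running.
    boolean tryReconnect() {
        int pending = inProgress.get();
        if (pending > 0) {
            System.out.println("Cannot connect because we still have "
                    + pending + " commands in progress.");
            return false;
        }
        System.out.println("Reconnecting to management server.");
        return true;
    }

    public static void main(String[] args) {
        ReconnectGuardSketch agent = new ReconnectGuardSketch();
        agent.commandStarted();   // e.g. a long-running snapshot export
        agent.tryReconnect();     // refused while the export is running
        agent.commandFinished();  // export ends
        agent.tryReconnect();     // now allowed
    }
}
```

Under this assumption, a single long-running command such as a snapshot export keeps the agent offline for its whole duration, which matches the symptom described in this report.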
STEPS TO REPRODUCE
While a volume snapshot is exporting the OVF, restart the management server.
EXPECTED RESULTS
SSVM reconnects.
ACTUAL RESULTS
The storage VM does not reconnect to the management server and logs an error such as:
INFO [cloud.agent.Agent] (Agent-Handler-9:null) Lost connection to host: 10.220.136.127. Dealing with the remaining commands...
INFO [cloud.agent.Agent] (Agent-Handler-9:null) Cannot connect because we still have 1 commands in progress.
Once the job has finished, the SSVM reconnects, but until then all other jobs fail unless another secondary storage VM is up and running.
Even though the backup job is forced to complete on the secondary storage side, it is left in the DB in the backing-up state forever, so it does not make sense for the agent to wait for it to finish at all.