SSVM cannot reconnect after connection disruption if there is an active event. #2633

@PaulAngus

ISSUE TYPE
  • Bug Report
COMPONENT NAME
System VMs
(Maybe also KVM hosts)
CLOUDSTACK VERSION
4.11.0
CONFIGURATION
OS / ENVIRONMENT

4.11.0 environment with VMware

SUMMARY

If communication between the management server and the agent is interrupted while an action is in progress (for example, the management server restarts while the SSVM is performing a snapshot), the SSVM cannot reconnect and logs the following error:
2018-05-09 11:37:09,403 INFO [cloud.agent.Agent] (Agent-Handler-9:null) Lost connection to host: 10.220.136.127. Dealing with the remaining commands...
2018-05-09 11:37:09,404 INFO [cloud.agent.Agent] (Agent-Handler-9:null) Cannot connect because we still have 1 commands in progress.

STEPS TO REPRODUCE
While a volume snapshot is exporting the OVF, restart the management server.
EXPECTED RESULTS
SSVM reconnects.
ACTUAL RESULTS
The SSVM does not reconnect to the management server and logs errors such as:
INFO  [cloud.agent.Agent] (Agent-Handler-9:null) Lost connection to host: 10.220.136.127. Dealing with the remaining commands...
INFO  [cloud.agent.Agent] (Agent-Handler-9:null) Cannot connect because we still have 1 commands in progress.

Once the job has finished, the SSVM reconnects, but until then all other jobs fail unless another secondary storage VM is up and running.
The backup job, even though it is forced to complete from secondary storage, is left in the DB in the "backing up" state forever, so it does not make sense that the agent even waits for it to finish.
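
To illustrate the behaviour the log messages suggest, here is a minimal sketch of a reconnect check gated on an in-flight command counter. This is not the actual `cloud.agent.Agent` code: the class `ReconnectGate`, its methods, and the "Reconnecting" message are hypothetical names made up for illustration. Only the "Cannot connect because we still have N commands in progress." text comes from the log above.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch, NOT CloudStack source code: models an agent that
 * refuses to reconnect while commands are still in progress.
 */
class ReconnectGate {
    // Number of commands the agent is currently executing.
    private final AtomicInteger inProgress = new AtomicInteger(0);

    // Called when a command (e.g. a snapshot export) starts.
    void commandStarted() {
        inProgress.incrementAndGet();
    }

    // Called when a command completes or fails.
    void commandFinished() {
        inProgress.decrementAndGet();
    }

    /** Returns true only if no commands are in progress. */
    boolean tryReconnect() {
        int pending = inProgress.get();
        if (pending > 0) {
            // Message text taken from the log in this report.
            System.out.println("Cannot connect because we still have "
                    + pending + " commands in progress.");
            return false;
        }
        // Assumed placeholder message, not a real CloudStack log line.
        System.out.println("Reconnecting to management server...");
        return true;
    }

    public static void main(String[] args) {
        ReconnectGate gate = new ReconnectGate();
        gate.commandStarted();   // long-running snapshot export begins
        gate.tryReconnect();     // refused: one command still in progress
        gate.commandFinished();  // export ends
        gate.tryReconnect();     // allowed
    }
}
```

With a gate like this, a single long-running command keeps the agent disconnected for its whole duration, which matches the symptom reported here: other jobs fail until the snapshot export finishes.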
