Skip to content

Ceph RBD primary storage fails connection and renders node unusable #2611

@giorgiomassar8

Description

@giorgiomassar8
ISSUE TYPE
  • Bug Report
COMPONENT NAME

Cloudstack agent

CLOUDSTACK VERSION

4.11

CONFIGURATION

KVM cluster with CEPH backed RDB primary storage

OS / ENVIRONMENT

Ubuntu 16.04 / 14.04

SUMMARY

On a perfectly working 4.10 node with KVM hypervisor and Ceph RBD primary storage, after upgrading to 4.11, cloudstack agent is unable to connect the BRD pool in libvirt, giving just a generic "operation not supported" error in its logs:

2018-04-06 16:27:37,650 INFO [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-2:null) (logid:91b4e1df) Attempting to create storage pool be80af6a-7201-3410-8da4-9b3b58c4954f (RBD) in libvirt
2018-04-06 16:27:37,652 WARN [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-2:null) (logid:91b4e1df) Storage pool be80af6a-7201-3410-8da4-9b3b58c4954f was not found running in libvirt. Need to create it.
2018-04-06 16:27:37,653 INFO [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-2:null) (logid:91b4e1df) Didn't find an existing storage pool be80af6a-7201-3410-8da4-9b3b58c4954f by UUID, checking for pools with duplicate paths
2018-04-06 16:27:37,664 ERROR [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-2:null) (logid:91b4e1df) Failed to create RBD storage pool: org.libvirt.LibvirtException: failed to connect to the RADOS monitor on: storagepool1:6789,: Operation not supported
2018-04-06 16:27:42,762 INFO [cloud.agent.Agent] (Agent-Handler-4:null) (logid Lost connection to the server. Dealing with the remaining commands...

Exactly the same pool was previously working before upgrade:

2018-04-06 12:53:52,847 INFO [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-3:null) (logid:14dace5e) Attempting to create storage pool be80af6a-7201-3410-8da4-9b3b58c4954f (RBD) in libvirt
2018-04-06 12:53:52,850 INFO [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-3:null) (logid:14dace5e) Found existing defined storage pool be80af6a-7201-3410-8da4-9b3b58c4954f, using it.
2018-04-06 12:53:52,850 INFO [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-3:null) (logid:14dace5e) Trying to fetch storage pool be80af6a-7201-3410-8da4-9b3b58c4954f from libvirt
2018-04-06 12:53:53,171 INFO [cloud.agent.Agent] (agentRequest-Handler-2:null) (logid:14dace5e) Proccess agent ready command, agent id = 46

STEPS TO REPRODUCE

Take an existing and working cloudstack cluster with 4.10, with RDB primary storage and Ubuntu 14.04 based agents and upgrade them to version 4.11 of the agent.

EXPECTED RESULTS

The cluster should be working fine, the agents should be connecting and the RDB pool should be correctly opened in libvirt.

ACTUAL RESULTS

Cloudstack agents fails to boot with a generic "Failed to create RBD storage pool: org.libvirt.LibvirtException: failed to connect to the RADOS monitor on: storagepool1:6789,: Operation not supported" error and loops in a failed state, rendering the machine unusable.

WORKAROUND

To workaround the issue I have tried to use the following XML config (dumped from another node where it is correctly running) and define the pool directly in libvirt, and it worked as expected:

be80af6a-7201-3410-8da4-9b3b58c4954f be80af6a-7201-3410-8da4-9b3b58c4954f cephstor1

virsh pool-define test.xml
Pool be80af6a-7201-3410-8da4-9b3b58c4954f defined from test.xml

root@compute6:~# virsh pool-start be80af6a-7201-3410-8da4-9b3b58c4954f
Pool be80af6a-7201-3410-8da4-9b3b58c4954f started

root@compute6:~# virsh pool-info be80af6a-7201-3410-8da4-9b3b58c4954f
Name: be80af6a-7201-3410-8da4-9b3b58c4954f
UUID: be80af6a-7201-3410-8da4-9b3b58c4954f
State: running
Persistent: yes
Autostart: no
Capacity: 10.05 TiB
Allocation: 2.22 TiB
Available: 2.71 TiB

And now the cloudstack agent correctly starts:

2018-04-09 10:29:19,989 INFO [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-2:null) (logid:f0021131) Attempting to create storage pool be80af6a-7201-3410-8da4-9b3b58c4954f (RBD) in libvirt
2018-04-09 10:29:19,990 INFO [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-2:null) (logid:f0021131) Found existing defined storage pool be80af6a-7201-3410-8da4-9b3b58c4954f, using it.
2018-04-09 10:29:19,991 INFO [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-2:null) (logid:f0021131) Trying to fetch storage pool be80af6a-7201-3410-8da4-9b3b58c4954f from libvirt
2018-04-09 10:29:20,372 INFO [cloud.agent.Agent] (agentRequest-Handler-2:null) (logid:f0021131) Proccess agent ready command, agent id = 56

Metadata

Metadata

Type

No type

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions