CSI snapshots of an erasure-coded-backed volume store snapshot data in the metadata pool #4761

Open
@Champ-Goblem

Description

For reference, I asked about the following in the Ceph Slack: https://ceph-storage.slack.com/archives/C05522L7P60/p1723555634369239

Describe the bug

When creating snapshots of erasure-coded RBD volumes, the snapshot data is stored in the replicated metadata pool.

An investigation of the ceph-csi code shows that the data pool option is not set during CreateSnapshot->GenVolFromVolID.
This instance of rbdVolume is passed to doSnapshotClone, which calls createRBDClone.
During the clone process:

  • a snapshot is created of the original volume
  • a new rbd image is created from the snapshot of the original image <- issue happens here
  • the original snapshot is deleted

cloneRbdImageFromSnapshot copies the RBD image options from cloneRbdVol, which is created from the rbdVolume generated by GenVolFromVolID. These options are passed to librbd.CloneImage, so the clone's data is written to the replicated metadata pool rather than staying in the original erasure-coded pool.
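The effect of the missing option can be illustrated with a minimal, self-contained sketch. It does not use ceph-csi or go-ceph; `imageSpec`, `cloneOptions`, and `placement` are hypothetical names introduced only for illustration, and the model assumes that an image created without a data pool option keeps its data objects in the same pool as its header.

```go
package main

import "fmt"

// imageSpec is a hypothetical, simplified stand-in for the fields of
// ceph-csi's rbdVolume that matter here; it is not the real struct.
type imageSpec struct {
	Pool     string // pool holding the image header and metadata
	DataPool string // optional separate data pool (e.g. erasure-coded)
}

// cloneOptions models the option set handed to the clone call. When
// DataPool is empty, no data pool option is set at all.
func cloneOptions(s imageSpec) map[string]string {
	opts := map[string]string{"pool": s.Pool}
	if s.DataPool != "" {
		opts["data_pool"] = s.DataPool
	}
	return opts
}

// placement reports which pool would receive the clone's data objects
// under the assumption stated above: the data pool if set, else the pool.
func placement(opts map[string]string) string {
	if dp, ok := opts["data_pool"]; ok {
		return dp
	}
	return opts["pool"]
}

func main() {
	// As reported: the volume rebuilt from its ID carries no data pool.
	fromVolID := imageSpec{Pool: "ec-metadatapool-us-east-1b"}
	fmt.Println("reported:", placement(cloneOptions(fromVolID)))

	// Expected: the parent's data pool is carried over to the clone.
	withDataPool := imageSpec{
		Pool:     "ec-metadatapool-us-east-1b",
		DataPool: "ec-blockpool-us-east-1b",
	}
	fmt.Println("expected:", placement(cloneOptions(withDataPool)))
}
```

In this model the only difference between the two cases is whether the data pool survives the round trip through the volume ID, which matches the behavior described above.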

This is visible when inspecting the snapshot image with rbd info, where the data_pool field is missing:

rbd info -p ec-metadatapool-us-east-1b csi-snap-12f5524f-de0d-4c21-bc4f-af843960337b
rbd image 'csi-snap-12f5524f-de0d-4c21-bc4f-af843960337b':
	size 30 GiB in 7680 objects
	order 22 (4 MiB objects)
	snapshot_count: 1
	id: 1933d7e8dc4d58
	block_name_prefix: rbd_data.1933d7e8dc4d58
	format: 2
	features: layering, deep-flatten, operations
	op_features: clone-child
	flags: 
	create_timestamp: Sat Aug 10 09:40:32 2024
	access_timestamp: Sat Aug 10 09:40:32 2024
	modify_timestamp: Sat Aug 10 09:40:32 2024
	parent: ec-metadatapool-us-east-1b/csi-vol-5aea1d4e-6575-492e-ad9d-f1c378d9f21c@a0684712-eadf-40b9-8d33-e46711a19fc2
	overlap: 30 GiB

Compare this to the original image:

rbd image 'csi-vol-5aea1d4e-6575-492e-ad9d-f1c378d9f21c':
	size 30 GiB in 7680 objects
	order 22 (4 MiB objects)
	snapshot_count: 88
	id: 17e84e87be3a0c
	data_pool: ec-blockpool-us-east-1b
	block_name_prefix: rbd_data.17.17e84e87be3a0c
	format: 2
	features: layering, exclusive-lock, object-map, fast-diff, deep-flatten, data-pool, operations
	op_features: clone-parent, snap-trash
	flags: 
	create_timestamp: Thu Aug  8 14:44:59 2024
	access_timestamp: Mon Aug 12 10:24:16 2024
	modify_timestamp: Thu Aug  8 14:44:59 2024
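The comparison above can also be done programmatically. The following is a rough sketch that scans the plain-text output of `rbd info` for the data_pool field; `dataPoolOf` is a hypothetical helper, not part of ceph-csi, and the sample inputs are abbreviated from the two outputs shown above.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// dataPoolOf scans the text output of `rbd info` and returns the value of
// the data_pool field, or ok=false when the field is absent.
func dataPoolOf(info string) (pool string, ok bool) {
	sc := bufio.NewScanner(strings.NewReader(info))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if strings.HasPrefix(line, "data_pool:") {
			return strings.TrimSpace(strings.TrimPrefix(line, "data_pool:")), true
		}
	}
	return "", false
}

func main() {
	// Abbreviated from the snapshot image output in this issue.
	snap := "rbd image 'csi-snap-12f5524f-de0d-4c21-bc4f-af843960337b':\n" +
		"\tid: 1933d7e8dc4d58\n" +
		"\tblock_name_prefix: rbd_data.1933d7e8dc4d58\n"

	// Abbreviated from the original image output in this issue.
	orig := "rbd image 'csi-vol-5aea1d4e-6575-492e-ad9d-f1c378d9f21c':\n" +
		"\tid: 17e84e87be3a0c\n" +
		"\tdata_pool: ec-blockpool-us-east-1b\n"

	for name, info := range map[string]string{"snapshot": snap, "original": orig} {
		if pool, ok := dataPoolOf(info); ok {
			fmt.Printf("%s image: data_pool = %s\n", name, pool)
		} else {
			fmt.Printf("%s image: no data_pool field\n", name)
		}
	}
}
```

A check like this could serve as the basis for the requested test case, asserting that the image created for a snapshot reports the same data pool as its parent.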

Steps to reproduce

Create a CSI snapshot of an erasure-coded volume, then inspect the resulting csi-snap image with rbd info.

Expected behavior

The snapshot should remain within the original erasure-coded pool and not have data copied to the replicated metadata pool.

Labels

bug, component/rbd, keepalive, need test case
