Description
For reference, I asked about the following in the Ceph Slack: https://ceph-storage.slack.com/archives/C05522L7P60/p1723555634369239
Describe the bug
When creating snapshots of erasure-coded RBD volumes, the snapshot data is stored in the replicated metadata pool.
From the investigation into the ceph-csi code, the data pool option is not set during CreateSnapshot -> GenVolFromVolID. This instance of rbdVolume is passed to doSnapshotClone, which calls createRBDClone.
During the clone process:
- a snapshot is created of the original volume
- a new rbd image is created from the snapshot of the original image <- issue happens here
- the original snapshot is deleted
cloneRbdImageFromSnapshot copies the RBD image options from the cloneRbdVol, which is created from the rbdVolume generated by GenVolFromVolID. These options are passed to librbd.CloneImage, which therefore stores the clone's data in the replicated metadata pool rather than keeping it in the original erasure-coded data pool.
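To illustrate the suspected flow, here is a rough, self-contained Go sketch. It is not the ceph-csi code: the volume type and the deriveClone and cloneOptions helpers are hypothetical stand-ins for rbdVolume, the cloneRbdVol construction, and the image options passed to the clone call, and the option map is a simplification of the real librbd image options.

```go
package main

import "fmt"

// volume is a hypothetical, simplified stand-in for ceph-csi's rbdVolume.
// Only the fields relevant to this issue are modelled.
type volume struct {
	Pool     string // pool holding the image header and metadata
	DataPool string // erasure-coded pool for data objects; empty if not set
	Name     string
}

// deriveClone models building the clone volume from the source volume:
// the clone inherits whatever pool settings the source struct carries.
func deriveClone(src volume, name string) volume {
	return volume{Pool: src.Pool, DataPool: src.DataPool, Name: name}
}

// cloneOptions models the image options handed to the clone call.
// The "data_pool" key is only present when the volume has a data pool,
// so an empty DataPool means the clone's data lands in Pool.
func cloneOptions(v volume) map[string]string {
	opts := map[string]string{}
	if v.DataPool != "" {
		opts["data_pool"] = v.DataPool
	}
	return opts
}

func main() {
	// As described above: the volume generated from the volume ID
	// does not have its data pool populated.
	src := volume{Pool: "ec-metadatapool-us-east-1b", Name: "csi-vol"}
	fmt.Println(cloneOptions(deriveClone(src, "csi-snap"))) // no data_pool key

	// Assumed fix: populate the data pool on the source volume (for
	// example from the existing image's metadata) before cloning.
	src.DataPool = "ec-blockpool-us-east-1b"
	fmt.Println(cloneOptions(deriveClone(src, "csi-snap"))) // data_pool is set
}
```

If this model matches the real code path, populating the data pool on the volume returned by GenVolFromVolID before the clone is created should be enough for the option to reach the clone call.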
This can be seen when inspecting the image with rbd info; the image is missing the data_pool field:
rbd info -p ec-metadatapool-us-east-1b csi-snap-12f5524f-de0d-4c21-bc4f-af843960337b
rbd image 'csi-snap-12f5524f-de0d-4c21-bc4f-af843960337b':
size 30 GiB in 7680 objects
order 22 (4 MiB objects)
snapshot_count: 1
id: 1933d7e8dc4d58
block_name_prefix: rbd_data.1933d7e8dc4d58
format: 2
features: layering, deep-flatten, operations
op_features: clone-child
flags:
create_timestamp: Sat Aug 10 09:40:32 2024
access_timestamp: Sat Aug 10 09:40:32 2024
modify_timestamp: Sat Aug 10 09:40:32 2024
parent: ec-metadatapool-us-east-1b/csi-vol-5aea1d4e-6575-492e-ad9d-f1c378d9f21c@a0684712-eadf-40b9-8d33-e46711a19fc2
overlap: 30 GiB
Compare this to the original image:
rbd image 'csi-vol-5aea1d4e-6575-492e-ad9d-f1c378d9f21c':
size 30 GiB in 7680 objects
order 22 (4 MiB objects)
snapshot_count: 88
id: 17e84e87be3a0c
data_pool: ec-blockpool-us-east-1b
block_name_prefix: rbd_data.17.17e84e87be3a0c
format: 2
features: layering, exclusive-lock, object-map, fast-diff, deep-flatten, data-pool, operations
op_features: clone-parent, snap-trash
flags:
create_timestamp: Thu Aug 8 14:44:59 2024
access_timestamp: Mon Aug 12 10:24:16 2024
modify_timestamp: Thu Aug 8 14:44:59 2024
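To check many images for this problem, the comparison above can be automated. The following is a minimal Go sketch that scans the plain-text output of rbd info for the data_pool field. The dataPoolFromInfo helper is a hypothetical name, and the sketch assumes the "key: value" line layout shown in the outputs above.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// dataPoolFromInfo scans the plain-text output of `rbd info` and returns
// the value of the data_pool field. The boolean is false when the field
// is absent, which is the symptom described in this issue.
func dataPoolFromInfo(info string) (string, bool) {
	sc := bufio.NewScanner(strings.NewReader(info))
	for sc.Scan() {
		// Split each line at the first colon into key and value.
		key, value, found := strings.Cut(strings.TrimSpace(sc.Text()), ":")
		if found && key == "data_pool" {
			return strings.TrimSpace(value), true
		}
	}
	return "", false
}

func main() {
	// Excerpts of the two outputs shown above.
	snapshot := "id: 1933d7e8dc4d58\nblock_name_prefix: rbd_data.1933d7e8dc4d58\nformat: 2"
	original := "id: 17e84e87be3a0c\ndata_pool: ec-blockpool-us-east-1b\nformat: 2"

	for name, info := range map[string]string{"snapshot": snapshot, "original": original} {
		pool, ok := dataPoolFromInfo(info)
		fmt.Printf("%s: data_pool=%q present=%v\n", name, pool, ok)
	}
}
```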
Steps to reproduce
Steps to reproduce the behavior: create a snapshot of an erasure-coded RBD volume.
Expected behavior
The snapshot should remain within the original erasure-coded pool and not have data copied to the replicated metadata pool.