Description
Describe the feature you'd like to have
The ability to set librbd QoS settings on a PV to limit how much IO it can consume from the Ceph cluster.
The exact limits would be specified through the storage-class configuration. Ideally we would support two different types of limits:
- static rbd_qos_iops_limit and rbd_qos_bps_limit per volume
- dynamic rbd_qos_iops_limit and rbd_qos_bps_limit per volume as a function of the PV size (e.g. 3 IOPS per GB, 100 MB/s per TB), with a configurable rbd_qos_schedule_tick_min
With storage classes of the second type, a PVC could effectively request a number of IOPS by adjusting the capacity it requests according to the ratio configured in the storage-class definition (see the sketch below).
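A minimal sketch of how the capacity-based variant might be expressed, assuming hypothetical parameter names (qosIOPSPerGB, qosBPSPerGB, qosScheduleTickMin) that do not exist in ceph-csi today; the provisioner name, clusterID, and pool follow the existing ceph-csi RBD storage-class layout, and the usual secret parameters are omitted for brevity:

```yaml
# Hypothetical storage class for capacity-based QoS.
# The qos* parameter names are illustrative only and are not part of the
# current ceph-csi API; clusterID, pool, and the provisioner name match the
# existing ceph-csi RBD driver. Secret parameters are omitted for brevity.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rbd-capacity-qos
provisioner: rbd.csi.ceph.com
parameters:
  clusterID: <cluster-id>
  pool: replicapool
  # Capacity-based ratios: 3 IOPS per GB, 100 MB/s per TB.
  qosIOPSPerGB: "3"
  qosBPSPerGB: "100000"        # bytes/sec per GB (100 MB/s per TB)
  qosScheduleTickMin: "50"     # would map to rbd_qos_schedule_tick_min (ms)
reclaimPolicy: Delete
allowVolumeExpansion: true
```

A static variant of the class would instead carry absolute values (e.g. hypothetical qosIOPSLimit / qosBPSLimit parameters) that the provisioner applies unchanged to every image.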
What is the value to the end user?
Many users were frustrated by IO noisy-neighbor issues in early Ceph deployments that catered to OpenStack environments. Once folks started to implement QEMU throttling at the virtio-blk/scsi layer, this became much more manageable. Capacity-based IOPS further improved the situation by providing a familiar, public-cloud-like experience (vs. static per-volume limits).
We want Kubernetes and OpenShift users to have improved noisy neighbor isolation too!
How will we know we have a good solution?
- Configure ceph-csi to use the rbd-nbd approach.
- Provision a volume from a storage class configured as above (see the example PVC below).
- The CSI provisioner would set the QoS limits on the backing RBD image.
- An fio test against the PV would confirm that the IOPS limits are being enforced.
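To make the acceptance test concrete, a PVC against the hypothetical class sketched above; the expected limit is simply the 3 IOPS per GB ratio applied to the requested size, ignoring the GB/GiB distinction for the illustration:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: qos-test-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: rbd-capacity-qos
  resources:
    requests:
      # At 3 IOPS per GB, a ~30 GB request should result in the provisioner
      # setting rbd_qos_iops_limit = 90 on the backing image, and an fio run
      # against the mounted PV should plateau at roughly 90 IOPS.
      storage: 30Gi
```

The values the provisioner actually applied could be checked on the Ceph side with rbd config image list <pool>/<image> before running fio against the mounted PV.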
Once the volume-resize work is finished, we'll also need to ensure that new limits are applied when a volume is expanded (e.g. growing a 30 GB volume to 60 GB at 3 IOPS per GB should raise the IOPS limit from 90 to 180).