Description
Hello all,
We're running DRBD in a two-node cluster using protocol C. DRBD runs inside a carrier-grade product that is always on. Over the last two months, even during quiet periods with low disk usage, the DRBD device on the primary node hits 100% util% as reported by iostat.
This is baffling because there is no unusual traffic on the system. The 100% util% occurs even during off-hours. Once it starts happening, it recurs every 4 or 5 minutes. This has an obvious negative effect on system load, which spikes to a 1-minute load average of 10 on a 4-CPU node.
The only workaround I have found is to reboot the DRBD primary node and then, once it is back up, reboot the secondary node. This makes the 100% util% problem go away for roughly a month.
I'm running DRBD version 9.0.30. I understand 9.2 is out, but we can't upgrade DRBD easily, though we will eventually.
Can anyone please point me in the right direction of where to look? Could this issue have been addressed in a later DRBD version, and if so, which one?
Here's how the iostat output looks when DRBD hits 100% util%:

```
Time      DEV    tps   rkB/s  wkB/s  areq-sz  aqu-sz    await   svctm   %util
08:57:34  drbd0  3.00   0.00  12.00     4.00    6.21  2069.33  333.33  100.00
```
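In case it helps with diagnosis, here is a minimal capture loop I'm considering leaving running to correlate the spikes with DRBD state. This is a sketch, not production tooling: it assumes drbd-utils is installed and that drbd0 is the affected device, and the 90% threshold and poll interval are arbitrary choices.

```sh
#!/bin/sh
# Poll drbd0 utilization; when it crosses 90%, snapshot DRBD state so the
# spike can be correlated with replication/resync activity.
# Assumptions: drbd-utils installed, device name drbd0, 90% threshold.
while true; do
    # Two 1-second extended reports; keep %util (last field) of the last one.
    util=$(iostat -dx 1 2 drbd0 | awk '/drbd0/ {u=$NF} END {print int(u)}')
    if [ "$util" -gt 90 ]; then
        date
        drbdsetup status --verbose --statistics   # per-device/per-peer counters
        dmesg | grep -i drbd | tail -n 20         # recent kernel-side DRBD messages
    fi
    sleep 5
done
```

The idea is to see whether the spikes coincide with resync traffic or other replication activity in the status output.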
Thank you for any help.