Description
Hello all,
We're running DRBD in a two-node cluster using protocol C. DRBD runs inside a carrier-grade product that is always on. Over the last two months, even during quiet periods with low disk usage, the DRBD device on the primary node hits 100% util% as reported by iostat.
This is baffling because there is no unusual traffic on the system. The 100% util% occurs even during off-hours. Once it starts happening, it recurs every 4 or 5 minutes. This has an obvious negative effect on system load, which spikes to a 1-minute load average of 10 on a 4-CPU node.
The only workaround I have found is to reboot the DRBD primary node and then, once it is back up, reboot the secondary node. This makes the 100% util% problem go away for roughly a month.
I'm running DRBD version 9.0.30. I understand 9.2 is out, but we can't upgrade DRBD easily, though we will eventually.
Can anyone please point me in the right direction of where to look? Could this issue have been addressed in a later DRBD version, and if so, which one?
Here's how the iostat output looks when DRBD hits 100% util%:

```
Time      DEV    tps   rkB/s  wkB/s  areq-sz  aqu-sz    await   svctm   %util
08:57:34  drbd0  3.00   0.00  12.00     4.00    6.21  2069.33  333.33  100.00
```
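In case it helps with diagnosis, here is a minimal capture loop I'm considering leaving running to correlate the spikes with DRBD state. This is a sketch, not production tooling: it assumes drbd-utils is installed and that drbd0 is the affected device, and the 90% threshold and poll interval are arbitrary choices.

```sh
#!/bin/sh
# Poll drbd0 utilization; when it crosses 90%, snapshot DRBD state so the
# spike can be correlated with replication/resync activity.
# Assumptions: drbd-utils installed, device name drbd0, 90% threshold.
while true; do
    # Two 1-second extended reports; keep %util (last field) of the last one.
    util=$(iostat -dx 1 2 drbd0 | awk '/drbd0/ {u=$NF} END {print int(u)}')
    if [ "$util" -gt 90 ]; then
        date
        drbdsetup status --verbose --statistics   # per-device/per-peer counters
        dmesg | grep -i drbd | tail -n 20         # recent kernel-side DRBD messages
    fi
    sleep 5
done
```

The idea is to see whether the spikes coincide with resync traffic or other replication activity in the status output.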
Thank you for any help.