DRBD hits 100% util% with low disk usage #114

@Rav19

Description

Hello all,

We're running DRBD in a two-node cluster using protocol C. DRBD is running in a carrier-grade product that is always on. Over the last two months, even during quiet periods with low disk usage, the DRBD util% on the master instance has been hitting 100% as reported by iostat.
This is baffling because there is no unusual traffic on the system; the 100% util% happens even during off hours. Once it starts happening, it recurs every 4 or 5 minutes. Obviously, this has a negative effect on system load: the 1-minute load average spikes to 10 on a 4-CPU node.

The only workaround I have found is to reboot the DRBD master node and, once it is back up, reboot the DRBD slave node. This makes the 100% util% problem go away for around a month or so.

I'm running DRBD version 9.0.30. I understand 9.2 is out, but we can't upgrade DRBD easily, though we will eventually.

Can anyone please point me in the right direction of where to look? Could this issue have been addressed in a later DRBD version, and if so, which one?

Here's how the iostat output looks when DRBD hits 100% util%:

Time      DEV    tps   rkB/s  wkB/s  areq-sz  aqu-sz    await   svctm   %util
08:57:34  drbd0  3.00   0.00  12.00     4.00    6.21  2069.33  333.33  100.00
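In case it helps with diagnosis, here is a minimal sketch of how we could log these spikes as they occur. It assumes sysstat's extended `iostat -dx` format, where the device name is the first field and %util is the last column (column order varies between sysstat versions, so adjust the filter if yours differs):

```shell
# Sketch, assuming `iostat -dx` extended output with the device name in
# the first field and %util in the last. Keeps only drbd devices at or
# above 99% util, so spikes can be captured in a log for later review.
iostat -dx 5 | awk '$1 ~ /^drbd/ && $NF+0 >= 99 { print; fflush() }' >> drbd_util_spikes.log
```

The `$NF+0` coerces the last field to a number so header lines are skipped, and `fflush()` keeps the log current despite awk's output buffering when writing to a file.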

Thank you for any help.
