The DRBD driver
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Philipp-Reisner authored and Jens Axboe committed Oct 1, 2009
1 parent 1a35e0f commit b411b36
Showing 35 changed files with 23,140 additions and 0 deletions.
588 changes: 588 additions & 0 deletions Documentation/blockdev/drbd/DRBD-8.3-data-packets.svg
459 changes: 459 additions & 0 deletions Documentation/blockdev/drbd/DRBD-data-packets.svg
16 changes: 16 additions & 0 deletions Documentation/blockdev/drbd/README.txt
@@ -0,0 +1,16 @@
Description

DRBD is a shared-nothing, synchronously replicated block device. It
is designed to serve as a building block for high availability
clusters and in this context, is a "drop-in" replacement for shared
storage. Simplistically, you could see it as a network RAID 1.

Please visit http://www.drbd.org to find out more.

The files included here are intended to help in understanding the implementation:

DRBD-8.3-data-packets.svg, DRBD-data-packets.svg
relate some functions to the write packets.

conn-states-8.dot, disk-states-8.dot, node-states-8.dot
The subgraphs of DRBD's state transitions.
18 changes: 18 additions & 0 deletions Documentation/blockdev/drbd/conn-states-8.dot
@@ -0,0 +1,18 @@
digraph conn_states {
StandAlone -> WFConnection [ label = "ioctl_set_net()" ]
WFConnection -> Unconnected [ label = "unable to bind()" ]
WFConnection -> WFReportParams [ label = "in connect() after accept" ]
WFReportParams -> StandAlone [ label = "checks in receive_param()" ]
WFReportParams -> Connected [ label = "in receive_param()" ]
WFReportParams -> WFBitMapS [ label = "sync_handshake()" ]
WFReportParams -> WFBitMapT [ label = "sync_handshake()" ]
WFBitMapS -> SyncSource [ label = "receive_bitmap()" ]
WFBitMapT -> SyncTarget [ label = "receive_bitmap()" ]
SyncSource -> Connected
SyncTarget -> Connected
SyncSource -> PausedSyncS
SyncTarget -> PausedSyncT
PausedSyncS -> SyncSource
PausedSyncT -> SyncTarget
Connected -> WFConnection [ label = "* on network error" ]
}
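The connection-state graph above can be explored programmatically. The following is a minimal sketch, not part of the commit: it transcribes the edges of conn-states-8.dot into a Python dictionary and runs a breadth-first search over them. The names `CONN_EDGES` and `shortest_path` are hypothetical, the first state is spelled `StandAlone` as in drbd-connection-state-overview.dot, and the edge labelled "* on network error" is encoded only as the literal `Connected -> WFConnection` edge.

```python
from collections import deque

# Edges transcribed from conn-states-8.dot (labels omitted).
# Hypothetical helper data, not taken from the DRBD sources.
CONN_EDGES = {
    "StandAlone": ["WFConnection"],
    "WFConnection": ["Unconnected", "WFReportParams"],
    "WFReportParams": ["StandAlone", "Connected", "WFBitMapS", "WFBitMapT"],
    "WFBitMapS": ["SyncSource"],
    "WFBitMapT": ["SyncTarget"],
    "SyncSource": ["Connected", "PausedSyncS"],
    "SyncTarget": ["Connected", "PausedSyncT"],
    "PausedSyncS": ["SyncSource"],
    "PausedSyncT": ["SyncTarget"],
    "Connected": ["WFConnection"],
    "Unconnected": [],  # no outgoing edge in this particular graph
}


def shortest_path(edges, start, goal):
    """Breadth-first search; returns a list of states, or None if unreachable."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in edges.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


print(shortest_path(CONN_EDGES, "StandAlone", "Connected"))
# ['StandAlone', 'WFConnection', 'WFReportParams', 'Connected']
```

A search like this is only as complete as the graph it is given; drbd-connection-state-overview.dot shows further states (for example WFSyncUUID) that this file does not model.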
16 changes: 16 additions & 0 deletions Documentation/blockdev/drbd/disk-states-8.dot
@@ -0,0 +1,16 @@
digraph disk_states {
Diskless -> Inconsistent [ label = "ioctl_set_disk()" ]
Diskless -> Consistent [ label = "ioctl_set_disk()" ]
Diskless -> Outdated [ label = "ioctl_set_disk()" ]
Consistent -> Outdated [ label = "receive_param()" ]
Consistent -> UpToDate [ label = "receive_param()" ]
Consistent -> Inconsistent [ label = "start resync" ]
Outdated -> Inconsistent [ label = "start resync" ]
UpToDate -> Inconsistent [ label = "ioctl_replicate" ]
Inconsistent -> UpToDate [ label = "resync completed" ]
Consistent -> Failed [ label = "io completion error" ]
Outdated -> Failed [ label = "io completion error" ]
UpToDate -> Failed [ label = "io completion error" ]
Inconsistent -> Failed [ label = "io completion error" ]
Failed -> Diskless [ label = "sending notify to peer" ]
}
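As a rough illustration of how the disk-state graph above can be queried, the sketch below stores each edge of disk-states-8.dot as a `(source, target, label)` tuple. It is not part of the commit; `DISK_EDGES`, `successors` and `sources_for` are hypothetical names chosen for this example.

```python
# Edges transcribed from disk-states-8.dot as (source, target, label).
# Hypothetical helper data, not taken from the DRBD sources.
DISK_EDGES = [
    ("Diskless", "Inconsistent", "ioctl_set_disk()"),
    ("Diskless", "Consistent", "ioctl_set_disk()"),
    ("Diskless", "Outdated", "ioctl_set_disk()"),
    ("Consistent", "Outdated", "receive_param()"),
    ("Consistent", "UpToDate", "receive_param()"),
    ("Consistent", "Inconsistent", "start resync"),
    ("Outdated", "Inconsistent", "start resync"),
    ("UpToDate", "Inconsistent", "ioctl_replicate"),
    ("Inconsistent", "UpToDate", "resync completed"),
    ("Consistent", "Failed", "io completion error"),
    ("Outdated", "Failed", "io completion error"),
    ("UpToDate", "Failed", "io completion error"),
    ("Inconsistent", "Failed", "io completion error"),
    ("Failed", "Diskless", "sending notify to peer"),
]


def successors(state):
    """All states directly reachable from `state`, sorted by name."""
    return sorted({dst for src, dst, _ in DISK_EDGES if src == state})


def sources_for(label):
    """All states that have an outgoing edge carrying `label`."""
    return sorted({src for src, _, lbl in DISK_EDGES if lbl == label})


print(sources_for("io completion error"))
# ['Consistent', 'Inconsistent', 'Outdated', 'UpToDate']
print(successors("Failed"))
# ['Diskless']
```

The two queries reflect what the graph states: every state with an attached disk can move to Failed on an I/O completion error, and Failed leads only to Diskless.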
85 changes: 85 additions & 0 deletions Documentation/blockdev/drbd/drbd-connection-state-overview.dot
@@ -0,0 +1,85 @@
// vim: set sw=2 sts=2 :
digraph {
rankdir=BT
bgcolor=white

node [shape=plaintext]
node [fontcolor=black]

StandAlone [ style=filled,fillcolor=gray,label=StandAlone ]

node [fontcolor=lightgray]

Unconnected [ label=Unconnected ]

CommTrouble [ shape=record,
label="{communication loss|{Timeout|BrokenPipe|NetworkFailure}}" ]

node [fontcolor=gray]

subgraph cluster_try_connect {
label="try to connect, handshake"
rank=max
WFConnection [ label=WFConnection ]
WFReportParams [ label=WFReportParams ]
}

TearDown [ label=TearDown ]

Connected [ label=Connected,style=filled,fillcolor=green,fontcolor=black ]

node [fontcolor=lightblue]

StartingSyncS [ label=StartingSyncS ]
StartingSyncT [ label=StartingSyncT ]

subgraph cluster_bitmap_exchange {
node [fontcolor=red]
fontcolor=red
label="new application (WRITE?) requests blocked\lwhile bitmap is exchanged"

WFBitMapT [ label=WFBitMapT ]
WFSyncUUID [ label=WFSyncUUID ]
WFBitMapS [ label=WFBitMapS ]
}

node [fontcolor=blue]

cluster_resync [ shape=record,label="{<any>resynchronisation process running\l'concurrent' application requests allowed|{{<T>PausedSyncT\nSyncTarget}|{<S>PausedSyncS\nSyncSource}}}" ]

node [shape=box,fontcolor=black]

// drbdadm [label="drbdadm connect"]
// handshake [label="drbd_connect()\ndrbd_do_handshake\ndrbd_sync_handshake() etc."]
// comm_error [label="communication trouble"]

//
// edges
// --------------------------------------

StandAlone -> Unconnected [ label="drbdadm connect" ]
Unconnected -> StandAlone [ label="drbdadm disconnect\lor serious communication trouble" ]
Unconnected -> WFConnection [ label="receiver thread is started" ]
WFConnection -> WFReportParams [ headlabel="accept()\land/or \lconnect()\l" ]

WFReportParams -> StandAlone [ label="during handshake\lpeers do not agree\labout something essential" ]
WFReportParams -> Connected [ label="data identical\lno sync needed",color=green,fontcolor=green ]

WFReportParams -> WFBitMapS
WFReportParams -> WFBitMapT
WFBitMapT -> WFSyncUUID [minlen=0.1,constraint=false]

WFBitMapS -> cluster_resync:S
WFSyncUUID -> cluster_resync:T

edge [color=green]
cluster_resync:any -> Connected [ label="resync done",fontcolor=green ]

edge [color=red]
WFReportParams -> CommTrouble
Connected -> CommTrouble
cluster_resync:any -> CommTrouble
edge [color=black]
CommTrouble -> Unconnected [label="receiver thread is stopped" ]

}
14 changes: 14 additions & 0 deletions Documentation/blockdev/drbd/node-states-8.dot
@@ -0,0 +1,14 @@
digraph node_states {
Secondary -> Primary [ label = "ioctl_set_state()" ]
Primary -> Secondary [ label = "ioctl_set_state()" ]
}

digraph peer_states {
Secondary -> Primary [ label = "recv state packet" ]
Primary -> Secondary [ label = "recv state packet" ]
Primary -> Unknown [ label = "connection lost" ]
Secondary -> Unknown [ label = "connection lost" ]
Unknown -> Primary [ label = "connected" ]
Unknown -> Secondary [ label = "connected" ]
}

13 changes: 13 additions & 0 deletions MAINTAINERS
@@ -1758,6 +1758,19 @@ S: Maintained
F: drivers/scsi/dpt*
F: drivers/scsi/dpt/

DRBD DRIVER
P: Philipp Reisner
P: Lars Ellenberg
M: drbd-dev@lists.linbit.com
L: drbd-user@lists.linbit.com
W: http://www.drbd.org
T: git git://git.drbd.org/linux-2.6-drbd.git drbd
T: git git://git.drbd.org/drbd-8.3.git
S: Supported
F: drivers/block/drbd/
F: lib/lru_cache.c
F: Documentation/blockdev/drbd/

DRIVER CORE, KOBJECTS, AND SYSFS
M: Greg Kroah-Hartman <gregkh@suse.de>
T: quilt kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/
2 changes: 2 additions & 0 deletions drivers/block/Kconfig
@@ -271,6 +271,8 @@ config BLK_DEV_CRYPTOLOOP
instead, which can be configured to be on-disk compatible with the
cryptoloop device.

source "drivers/block/drbd/Kconfig"

config BLK_DEV_NBD
tristate "Network block device support"
depends on NET
1 change: 1 addition & 0 deletions drivers/block/Makefile
@@ -36,5 +36,6 @@ obj-$(CONFIG_BLK_DEV_UB) += ub.o
obj-$(CONFIG_BLK_DEV_HD) += hd.o

obj-$(CONFIG_XEN_BLKDEV_FRONTEND) += xen-blkfront.o
obj-$(CONFIG_BLK_DEV_DRBD) += drbd/

swim_mod-objs := swim.o swim_asm.o
82 changes: 82 additions & 0 deletions drivers/block/drbd/Kconfig
@@ -0,0 +1,82 @@
#
# DRBD device driver configuration
#

comment "DRBD disabled because PROC_FS, INET or CONNECTOR not selected"
depends on !PROC_FS || !INET || !CONNECTOR

config BLK_DEV_DRBD
tristate "DRBD Distributed Replicated Block Device support"
depends on PROC_FS && INET && CONNECTOR
select LRU_CACHE
default n
help

NOTE: In order to authenticate connections you have to select
CRYPTO_HMAC and a hash function as well.

DRBD is a shared-nothing, synchronously replicated block device. It
is designed to serve as a building block for high availability
clusters and in this context, is a "drop-in" replacement for shared
storage. Simplistically, you could see it as a network RAID 1.

Each minor device has a role, which can be 'primary' or 'secondary'.
On the node with the primary device the application is supposed to
run and to access the device (/dev/drbdX). Every write is sent to
the local 'lower level block device' and, across the network, to the
node with the device in 'secondary' state. The secondary device
simply writes the data to its lower level block device.

DRBD can also be used in dual-Primary mode (device writable on both
nodes), which means it can exhibit shared disk semantics in a
shared-nothing cluster. Needless to say, on top of dual-Primary
DRBD a cluster file system is necessary to maintain cache
coherency.

For automatic failover you need a cluster manager (e.g. heartbeat).
See also: http://www.drbd.org/, http://www.linux-ha.org

If unsure, say N.

config DRBD_TRACE
tristate "DRBD tracing"
depends on BLK_DEV_DRBD
select TRACEPOINTS
default n
help

Say Y here if you want to be able to trace various events in DRBD.

If unsure, say N.

config DRBD_FAULT_INJECTION
bool "DRBD fault injection"
depends on BLK_DEV_DRBD
help

Say Y here if you want to simulate IO errors, in order to test DRBD's
behavior.

The actual simulation of IO errors is done by writing 3 values to
/sys/module/drbd/parameters/

enable_faults: bitmask of...
1 meta data write
2 meta data read
4 resync data write
8 resync data read
16 data write
32 data read
64 read ahead
128 kmalloc of bitmap
256 allocation of EE (epoch_entries)

fault_devs: bitmask of minor numbers
fault_rate: frequency in percent

Example: Simulate data write errors on /dev/drbd0 with a probability of 5%.
echo 16 > /sys/module/drbd/parameters/enable_faults
echo 1 > /sys/module/drbd/parameters/fault_devs
echo 5 > /sys/module/drbd/parameters/fault_rate

If unsure, say N.
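The three fault-injection parameters described in the DRBD_FAULT_INJECTION help text are plain bitmasks and a percentage, so the values to write can be computed rather than worked out by hand. The sketch below is an illustration only, not part of the commit: the `Fault` member names, `device_mask` and `sysfs_commands` are hypothetical, and the bare "read" entries for bits 2 and 8 are read as meta data read and resync data read. It only builds the `echo` command strings and does not touch `/sys`.

```python
from enum import IntFlag


class Fault(IntFlag):
    """Bits of enable_faults as listed in the help text (member names are made up)."""
    META_DATA_WRITE = 1
    META_DATA_READ = 2      # assumption: the bare "read" entry for bit 2
    RESYNC_WRITE = 4
    RESYNC_READ = 8         # assumption: the bare "read" entry for bit 8
    DATA_WRITE = 16
    DATA_READ = 32
    READ_AHEAD = 64
    BITMAP_KMALLOC = 128
    EE_ALLOC = 256


def device_mask(minors):
    """fault_devs is a bitmask of minor numbers: bit n selects /dev/drbdn."""
    mask = 0
    for minor in minors:
        mask |= 1 << minor
    return mask


def sysfs_commands(faults, minors, rate_percent):
    """Build the three echo commands; nothing is written to /sys here."""
    if not 0 <= rate_percent <= 100:
        raise ValueError("fault_rate is a percentage")
    base = "/sys/module/drbd/parameters"
    return [
        f"echo {int(faults)} > {base}/enable_faults",
        f"echo {device_mask(minors)} > {base}/fault_devs",
        f"echo {rate_percent} > {base}/fault_rate",
    ]


# Reproduces the example from the help text: data write errors on /dev/drbd0 at 5%.
for line in sysfs_commands(Fault.DATA_WRITE, [0], 5):
    print(line)
```

Combining faults is a bitwise OR, for example `Fault.DATA_WRITE | Fault.DATA_READ` yields 48, and `device_mask([0, 2])` yields 5 for minors 0 and 2.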
8 changes: 8 additions & 0 deletions drivers/block/drbd/Makefile
@@ -0,0 +1,8 @@
drbd-y := drbd_bitmap.o drbd_proc.o
drbd-y += drbd_worker.o drbd_receiver.o drbd_req.o drbd_actlog.o
drbd-y += drbd_main.o drbd_strings.o drbd_nl.o

drbd_trace-y := drbd_tracing.o

obj-$(CONFIG_BLK_DEV_DRBD) += drbd.o
obj-$(CONFIG_DRBD_TRACE) += drbd_trace.o