Replicated Block Device backed by FoundationDB
This is an implementation of a block device in userspace which uses FoundationDB as a backend. It provides a replicated block device for non-replicated workloads so they can benefit from transparent block-level replication and enhanced fault tolerance.
Inspired by spullara/nbd
I ran a small benchmark on a two-node FoundationDB cluster (Linux running on MacBooks with SSDs, not tuned for FDB at all). An FIO benchmark on a 1 GB file reached 10K random read/write IOPS with 4 KB blocks, with latency below 10 ms (direct I/O was used). Sequential reads were able to saturate a 1 Gbit network link.
Postgres running in VirtualBox showed 900 TPS on the TPC-B pgbench workload with a 1 GB database.
Volume ownership is currently handled with lease tokens and FDB transactions: ownership is transferred to the new client transactionally, and any in-flight write requests from the old client are discarded.
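To illustrate the idea, here is a minimal in-memory sketch of lease-token fencing. It is not the driver's actual code: the `volume` type, the `acquire` and `write` functions, and the use of a mutex in place of an FDB transaction are all assumptions made for illustration. The point it shows is that the token check and the write happen atomically, so a write issued under an old lease is rejected once a new client has taken over.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrStaleLease is returned when a write carries a token that is no longer current.
var ErrStaleLease = errors.New("stale lease token")

// volume is a hypothetical in-memory stand-in for per-volume state.
// In a real deployment this state would live in the database.
type volume struct {
	mu     sync.Mutex        // stands in for transactional isolation (assumption)
	lease  uint64            // token of the current owner
	blocks map[uint64][]byte // block index -> block contents
}

func newVolume() *volume {
	return &volume{blocks: make(map[uint64][]byte)}
}

// acquire bumps the lease token and returns the new value.
// Every token handed out earlier becomes stale at this point.
func (v *volume) acquire() uint64 {
	v.mu.Lock()
	defer v.mu.Unlock()
	v.lease++
	return v.lease
}

// write checks the token and stores the block in one critical section,
// so a write from a previous owner can never land after a handover.
func (v *volume) write(token, block uint64, data []byte) error {
	v.mu.Lock()
	defer v.mu.Unlock()
	if token != v.lease {
		return ErrStaleLease
	}
	v.blocks[block] = append([]byte(nil), data...) // store a private copy
	return nil
}

func main() {
	v := newVolume()

	oldClient := v.acquire()
	fmt.Println(v.write(oldClient, 0, []byte("a"))) // <nil>

	newClient := v.acquire()                        // ownership moves to a new client
	fmt.Println(v.write(oldClient, 0, []byte("b"))) // stale lease token
	fmt.Println(v.write(newClient, 0, []byte("c"))) // <nil>
}
```

With a transactional store, the same effect would presumably come from reading the lease key inside the write transaction, so that a concurrent ownership change makes the old client's transaction fail instead of committing.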
It's an early version. Several important features are not implemented yet (such as IOPS limits and volume size estimation), but it works and it's relatively fast!
Commands are documented in the CLI:
$ ./fdbbd --help
NAME:
fdbbd - block device using FoundationDB as a backend.
Our motto: still more performant and reliable than EBS
USAGE:
fdbbd [global options] command [command options] [arguments...]
VERSION:
0.1.0
COMMANDS:
create Create a new volume
list List all volumes
attach Attach the volume
delete Delete the volume
help, h Shows a list of commands or help for one command
GLOBAL OPTIONS:
--help, -h show help
--version, -v print the version
- Set up a FoundationDB cluster.
- Build the driver:
sh build.sh
- Create a new volume:
$ ./fdbbd create --size 1GB myvolume
- If the nbd kernel module is not loaded, load it:
$ sudo modprobe nbd
- Attach the volume to the system:
sudo ./fdbbd attach --bpt 4 myvolume /dev/nbd0
- Create a directory to mount the volume:
mkdir nbdmount
- Create a file system on your block device. XFS is a good option:
sudo mkfs.xfs /dev/nbd0
- Mount the attached volume:
sudo mount /dev/nbd0 nbdmount/
- Done! You have a replicated volume!
This project uses the Network Block Device (NBD) kernel module underneath. A Unix pipe is used to talk to the kernel, and the driver then translates the NBD protocol into FoundationDB calls.
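As a rough sketch of what that translation involves, the snippet below splits an NBD-style request (a byte offset plus a length) into per-block operations and derives a key for each block. The 4 KB block size, the `splitRequest` and `blockKey` helpers, and the key layout are hypothetical and chosen only for illustration; the driver's real layout may differ.

```go
package main

import "fmt"

// blockSize is an assumed block size; the actual driver may use another value.
const blockSize = 4096

// blockOp describes the part of a request that falls inside one block.
type blockOp struct {
	Index  uint64 // block index within the volume
	Offset uint32 // byte offset inside that block
	Length uint32 // number of bytes touched in that block
}

// splitRequest breaks a byte range into per-block operations.
// NBD requests carry a 64-bit offset and a 32-bit length.
func splitRequest(offset uint64, length uint32) []blockOp {
	var ops []blockOp
	remaining := uint64(length)
	for remaining > 0 {
		within := offset % blockSize
		n := blockSize - within // bytes left in the current block
		if n > remaining {
			n = remaining
		}
		ops = append(ops, blockOp{
			Index:  offset / blockSize,
			Offset: uint32(within),
			Length: uint32(n),
		})
		offset += n
		remaining -= n
	}
	return ops
}

// blockKey builds a hypothetical key for one block of a volume.
// Fixed-width hex keeps keys ordered by block index.
func blockKey(volume string, index uint64) string {
	return fmt.Sprintf("%s/%016x", volume, index)
}

func main() {
	// An unaligned 8 KB request starting at byte 6144 touches three blocks.
	for _, op := range splitRequest(6144, 8192) {
		fmt.Println(blockKey("myvolume", op.Index), op.Offset, op.Length)
	}
}
```

Keeping keys ordered by block index means a sequential read maps onto a contiguous key range, which is one plausible reason to choose a layout like this.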
A few features are planned for future releases, ordered by importance:
- Bulk insert support via batch transactions
- IOPS isolation
- CSI implementation
- Snapshots
- Volume size estimation (using roaring bitmaps or similar)
- Client-side encryption
- Control panel