layout | title |
---|---|
doc |
RDF Delta Patch Server |
The RDF Delta Patch Server provides the backend for a fault-tolerant cluster of machines such as Fuseki servers.
The patch server provides a number of different configuration options.
- Local file system using RocksDB for patch storage.
- Fully fault tolerant cloud form, with multiple RDF Delta servers and an object store for patch storage
- Fully fault tolerant cloud form, with multiple RDF Delta servers with separate Zookeeper servers.
- Local file system using plain files for patch storage
- Single server, in-memory patch logs.
When the server starts with a local mode, the storage choice (plain file, or local database) for existing patch logs is preserved. The server setup influences the storage for patch logs when they are created.
The fault tolerant form uses Apache Zookeeper as its fault-tolerant storage for indexes and either Zookeeper or AWS S3 (or compatible) for storing the content of patches.
The recommended deployment choices are using RocksDB storage (--store
), with
either a filesystem which should be on reliable storage or a backed up
filesystem, for smaller installations, and fully fault tolerant installation for
24x7 availability.
The in-memory option is useful for testing and development because it starts quickly and can run in the testing JVM process.
The plain file configuration is useful to explore how the system works.
The fully fault-tolerant configuration provides the best high availability system but requires more machines and is more complex to administer.
A single-server delta patch does not run arbitrary application code, which makes it more reliable, and it starts quickly. If a patch can not be stored, the Fuseki server will not commit its transaction but the Fuseki servers continue proving query operations. Restarting a patch server does not require other servers ot be restarted.
Where 24x7 continuous operation is not required, the simpler configuration of a single patch server may be more appropriate. Coupled with a file system that is itself sufficiently reliable (e.g. a disk array or RAID system), or run on a server machine whose filesystem is regularly backed up, it does provide a high degree of availability.
Argument | Configuration |
---|---|
--store |
Local database storage |
--base |
Plain file storage |
--mem |
In-emory, development/testing mode |
--zk |
Zookeeper connection string "host1:port1,host2:port2,host3:port3" |
The Embedded Zookeeper form has several additional arguments:
Argument for Zookeeper |
|
---|---|
--zkConf |
Configuration file |
--zkData |
Storage for the embedded Zookeeper |
--zkPort |
Port for the embedded Zookeeper |
When using an external Zookeeper connection, the following arguments are available:
Argument for External Zookeeper |
|
---|---|
--zkRootDir |
Root directory to use in Zookeeper to store patch info (defaults to /delta and is ignored if embedded Zookeeper is used) |
Patches can be stored in the Zookeeper database or externally in AWS S3, or a storage that provides the S3 API (other objects store may be supported in the future). An external object store is the preferred choice because it limits the growth of the Zookeeper database.
The S3 patch storage argument:
Argument for S3 | |
---|---|
--s3bucket |
S3 bucket name (required) |
--s3region |
S3 region |
--s3endpoint |
URL for an S3 alternative endpoint |
This example is for running a single patch log server on the default port (1066) with a RockDB database for patches. Patch logs are stored in the directory "Delta" which must exist.
dcmd server --store Delta
This example is for a high availability patch log service, consisting of 3 patch log servers, each using using the embedded Apache Zookeeper, and using AWS S3 for patch storage.
The three patch log server should be run on different machines.
The connection string for Zookeeper lists the machiens and the Zookeeper port used for coordination. Suppose we have machines "server1", "server2", "server3" with zookeeper running on port 1110 on each machine.
dcmd server --port 1060 \ --zkPort 1110 --zkData LocalDirectory \ --zk="server1:1110,server2:1110,server3:1110" \ --s3bucket=some-bucket-name --s3region=region
This command is run on each of the machines "server1", "server2" and "server3"
If for testing, multiple servers are run on one machine, the zookeeper and delta server ports must be different across all machines.