
Riak tuning 1

Matthew Von-Maszewski edited this page Aug 14, 2013 · 10 revisions

WARNING: the research for this page is not complete

Summary:

In Riak, leveldb achieves higher read and write throughput when the number of Erlang schedulers is limited to half the number of CPUs. Tests have shown throughput improvements of 15% to 80%.

The scheduler limit is set in the vm.args file:

+S x:x

where "x" is the number of schedulers Erlang may use (the first value sets how many scheduler threads are created, the second how many are online). Erlang's default value of "x" is the total number of CPUs in the system. For Riak installations using leveldb, the recommendation is to set "x" to half the number of CPUs. Virtual environments have not yet been tested.

Example: for a 24-CPU system:

+S 12:12
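The halving rule above can be expressed as a small helper that proposes a vm.args line for a given CPU count. This is a minimal sketch, not a Riak tool: the function name `recommended_scheduler_flag` is hypothetical, rounding odd CPU counts down is an assumption the page does not address, and the cutoff below 8 CPUs follows the Discussion's advice that the change is not recommended there. `os.cpu_count()` reports logical CPUs, which matches how this page counts a 12-core hyper-threaded server as 24 CPUs.

```python
import os
from typing import Optional


def recommended_scheduler_flag(cpus: Optional[int] = None) -> Optional[str]:
    """Return a vm.args line such as "+S 12:12", or None to keep Erlang's default.

    Hypothetical helper. Assumptions: odd CPU counts are rounded down, and
    os.cpu_count() (logical CPUs, including hyper-threads) is the CPU count.
    """
    if cpus is None:
        cpus = os.cpu_count()  # may return None if the count cannot be determined
    if cpus is None or cpus < 8:
        # Below 8 CPUs the change is not recommended; leave the default in place.
        return None
    x = cpus // 2  # half the CPUs; at 8 CPUs this gives the +S 4:4 floor
    return f"+S {x}:{x}"


if __name__ == "__main__":
    print(recommended_scheduler_flag(24))  # +S 12:12
    print(recommended_scheduler_flag(8))   # +S 4:4
    print(recommended_scheduler_flag(4))   # None
    print(recommended_scheduler_flag())    # result depends on this machine
```

Because 8 CPUs halves to exactly 4, refusing to suggest a value below 8 CPUs also keeps the result at or above the +S 4:4 floor.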

Discussion:

We have tested a limited number of CPU configurations and customer loads. In all cases, performance increased when the +S option was added to the vm.args file to reduce the number of Erlang schedulers. The working hypothesis is that the Erlang schedulers perform enough "busy wait" work to cause context switches away from leveldb, even when leveldb is the only task in the system with real work to do.

The tests included 8-CPU systems (physical cores only, no hyper-threading) and 24-CPU systems (12 physical cores with hyper-threading). All were 64-bit Intel platforms. Generalized findings:

  • servers running a higher number of vnodes (64) had larger performance gains than those running fewer (8)
  • servers running SSD arrays had larger performance gains than those running SATA arrays
  • Get and Write operations showed performance gains; 2i query operations (leveldb iterators) were unchanged
  • not recommended for servers with fewer than 8 CPUs (go no lower than +S 4:4)

Performance improvements were as high as 80% over extended, heavily loaded intervals on servers with SSD arrays and 64 vnodes. No test resulted in worse performance due to the addition of +S x:x.

The +S x:x configuration change does not have to be applied to an entire Riak cluster at once. It may be applied to a single server for verification. Steps: update the vm.args file, then restart the Riak node. Changes to the scheduler count made from the Erlang command line were ineffective.
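The "update the vm.args file" step can be scripted when rolling the change out one node at a time. The sketch below is a hypothetical helper, `set_scheduler_flag`, that works on the file's text rather than on a particular path (vm.args locations vary by install, so none is assumed). It assumes one flag per line and recognizes only an existing line of the form "+S n:n"; any such line is replaced, extra copies are dropped, and the flag is appended if none exists. Restarting the node afterwards is still a separate, manual step.

```python
import re

# Matches an existing scheduler line such as "+S 24:24" (surrounding whitespace allowed).
# Assumption: only the "+S x:y" form is recognized; other forms are left untouched.
_SCHED_LINE = re.compile(r"^\s*\+S\s+\d+:\d+\s*$")


def set_scheduler_flag(vm_args: str, schedulers: int) -> str:
    """Return vm.args text with exactly one "+S x:x" line set to `schedulers`.

    Hypothetical helper: replaces the first matching +S line in place,
    removes any further +S lines, and appends the flag if none was found.
    """
    new_line = f"+S {schedulers}:{schedulers}"
    out = []
    replaced = False
    for line in vm_args.splitlines():
        if _SCHED_LINE.match(line):
            if not replaced:
                out.append(new_line)  # first +S line: substitute the new setting
                replaced = True
            continue  # skip the old line (and any duplicate +S lines)
        out.append(line)
    if not replaced:
        out.append(new_line)  # no existing +S line: add one at the end
    return "\n".join(out) + "\n"
```

For example, with illustrative sample content such as `"-name riak@127.0.0.1\n+K true\n+A 64\n"`, calling `set_scheduler_flag(text, 12)` returns the same lines followed by `+S 12:12`. Returning new text instead of editing the file in place makes it easy to diff the result before writing it and restarting the node.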

This configuration change has been running in at least one large, multi-datacenter production environment for several months.
