Skip to content

Ormuzd/leveldb

 
 

Repository files navigation

leveldb: A key-value store
Authors: Sanjay Ghemawat (sanjay@google.com) and Jeff Dean (jeff@google.com)

The original Google README is now README.GOOGLE.

This repository contains the Google source code as modified to benefit
the Riak environment.  The typical Riak environment has two attributes
that necessitate leveldb adjustments, both in options and code:

- production servers: Riak often runs in heavy Internet environments:
  servers with many CPU cores, lots of memory, and 24x7 disk activity.
  Basho's leveldb takes advantage of the environment by adding
  hardware CRC calculation, increasing Bloom filter accuracy, and
  defaulting to integrity checking enabled.

- multiple databases open: Riak opens 8 to 64 databases
  simultaneously.  Google's leveldb supports this, but its background
  compaction thread can fall behind.  leveldb will "stall" new user
  writes whenever the compaction thread gets too far behind.  Basho's
  leveldb modification include multiple thread blocks that each
  contain prioritized threads for specific compaction activities.


This repository follows the Basho standard for branch management 
as of November 28, 2013.  The standard is found here:

https://github.com/basho/riak/wiki/Basho-repository-management

In summary, the "develop" branch contains the most recently reviewed
engineering work.  The "master" branch contains the most recently
released work, i.e. distributed as part of a Riak release.


Those wishing to truly savor the benefits of Basho's modifications
need to initialize a new leveldb::Options structure similar to the
following before each call to leveldb::DB::Open:

    leveldb::Options * options;

    options=new Leveldb::Options;

    options.filter_policy=leveldb::NewBloomFilterPolicy2(16);
    options.block_cache=leveldb::NewLRUCache(536870912);
    options.max_open_files=500;
    options.write_buffer_size=62914560;
    options.env=leveldb::Env::Default();


Memory usage by leveldb is largely controlled by three
leveldb::Options variables: write_buffer_size, max_open_files, and
size used to initial block_cache.

- write_buffer_size: Riak randomizes the write_buffer_size across its
  multiple databases between 30Mbytes and 60Mbytes.  Internal, leveldb
  could have two of these active for an open database.  Google's
  default is 4Mbyte.  Basho's leveldb has been tuned to the 30Mbyte to
  60Mbyte range, but does not block larger values (though larger
  values are just untested).

- max_open_files: Riak 1.4 simplified calculating the memory impact of
  max_open_files.  Basho's leveldb now simply multiplies
  max_open_files by 4Mbytes to establish the file cache limit.  The
  file cache is therefore really a limit on many files' metadata can
  fit.  max_open_files was originally a file handle limit, but file
  metadata is now a much larger resource concern.  The impact of
  Basho's larger table files and Google's recent addition of Bloom
  filters made the accounting switch necessary.  Where server
  resources allow, the value of max_open_files should exceed the count
  of .sst table files within the database directory.  Memory budgeting
  for this variable is more important to random read operations than
  the block_cache.

- block_cache: The block cache holds data blocks that leveldb has
  recently retrieved from .sst table files.  Any given block contains
  one or more complete key/value pairs.  The cache speeds up repeat
  access to the same key and potential access to adjacent keys.  Basho
  has tested this as high as 2Gbytes.  (Note: the older leveldb
  released in Riak 1.2 had a bug that hurt performance with values
  above 8Mbytes … that was fixed in the Riak 1.3 release)

So estimated memory usage of a database =

    write_buffer_size*2 
  + max_open_files*4194304 
  + (size in NewLRUCache initialization)