Off Heap Maps, Persistent Maps
(available with version 2.x)
Motivation/Target: FST's OffHeap Maps excel in:
- ease of use (can put any serializable object of arbitrary complexity [cyclic references etc.] )
- fast iteration of values
- simple persistence using memory mapped files
They are not meant as an IPC mechanism and do not support concurrent access from several processes. You might want to check out HugeCollections for an off-heap map implementation focusing on concurrent access and IPC.
The base implementation is a map of <Bytes,Bytes>. This means the key object must be convertible to a byte sequence, and you have to provide a maximum key length (required by the internal hash tree implementation and to ease memory management, e.g. free lists).
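To make the "byte sequence with a maximum length" requirement concrete, here is a minimal sketch of how a String key could be turned into a fixed-length byte key. The class `FixedLengthAsciiKey` and its `encode` method are hypothetical names for illustration only; this is not FST's actual encoding, and zero-padding is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of FST: one possible way to map a String key
// to a byte sequence bounded by a maximum key length.
final class FixedLengthAsciiKey {

    // Encodes an ASCII key into exactly maxKeyLen bytes, zero-padded on the right.
    static byte[] encode(String key, int maxKeyLen) {
        if (!StandardCharsets.US_ASCII.newEncoder().canEncode(key)) {
            throw new IllegalArgumentException("key is not pure ASCII: " + key);
        }
        byte[] raw = key.getBytes(StandardCharsets.US_ASCII);
        if (raw.length > maxKeyLen) {
            throw new IllegalArgumentException(
                    "key longer than " + maxKeyLen + " bytes: " + key);
        }
        // Arrays.copyOf pads the tail with 0 bytes up to maxKeyLen.
        return Arrays.copyOf(raw, maxKeyLen);
    }

    public static void main(String[] args) {
        byte[] key = encode("test", 16);
        System.out.println(key.length + " bytes: " + Arrays.toString(key));
    }
}
```

With a fixed key width, every key occupies the same number of bytes, which is one reason a maximum key length simplifies index and memory management.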
Currently there is only one ready-to-go implementation derived from the binary <Bytes,Bytes> map: FSTAsciiStringOffheapMap.
If you have a look at the source, you'll notice it's pretty easy to add flavours supporting key types other than String.
FSTConfiguration conf = FSTConfiguration.createDefaultConfiguration(); // conf used for serialization
conf.registerClass(TestRec.class); // avoid save of classnames in serialization
int maxkeylen = 16;
FSTAsciiStringOffheapMap store = new FSTAsciiStringOffheapMap(maxkeylen, 2*FSTAsciiStringOffheapMap.GB, [estimated number of elements], conf);
store.put("test", new TestRec("test") );
....
store.free(); // release the map+mem. put/get after free will segfault !
Note:
- keep keys as short as possible, as lookup performance degrades with max key length.
- the standard JDK Map interface is not implemented (and never will be).
- values have to be Serializable
- no dynamic resize if memory is insufficient (not that bad, as the OS will swap out non-accessed memory, so you can size generously in advance)
- can use polymorphic values
It's also possible to back the allocated memory with a memory mapped file. The advantage is that if your process crashes, the OS tracks unwritten changes and writes them back to disk asynchronously. Note that even if you allocate large files, only the amount of memory actually written is consumed on your hard disk/SSD. Upon restart, FST automatically restores the map and its indices.
This way you get rather failsafe and fast object persistence.
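The underlying OS mechanism can be tried out with plain JDK classes. The following is a minimal sketch of a memory mapped region that is written, then mapped again and read back. `MappedFileSketch` and its methods are hypothetical names; it only illustrates memory mapping itself and is not how FST lays out its map or index.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration of memory mapped persistence using only the JDK.
final class MappedFileSketch {

    static final int REGION_BYTES = 4096;

    // Maps the first REGION_BYTES of the file read/write, creating it if needed.
    // The mapping stays valid after the channel is closed.
    static MappedByteBuffer map(Path file) {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            return channel.map(FileChannel.MapMode.READ_WRITE, 0, REGION_BYTES);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes a value through the mapping; force() asks the OS to flush it now
    // instead of waiting for asynchronous write-back.
    static void store(Path file, long value) {
        MappedByteBuffer region = map(file);
        region.putLong(0, value);
        region.force();
    }

    // Maps the file again (as a restarted process would) and reads the value.
    static long restore(Path file) {
        return map(file).getLong(0);
    }

    // Stores a value in a temporary file and reads it back via a fresh mapping.
    static long roundTrip(long value) {
        try {
            Path file = Files.createTempFile("mapped-sketch", ".mmf");
            file.toFile().deleteOnExit();
            store(file, value);
            return restore(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(42L));
    }
}
```

The explicit `force()` call is only there to make the sketch deterministic; without it the OS decides when dirty pages reach the disk.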
FSTConfiguration conf = FSTConfiguration.createDefaultConfiguration();
conf.registerClass(TestRec.class); // optimization: avoid writing classname to stream
int maxkeylen = 16; // max key length, as in the in-memory example
FSTAsciiStringOffheapMap<TestRec> store = new FSTAsciiStringOffheapMap<>("/tmp/test.mmf", maxkeylen, 8*FSTAsciiStringOffheapMap.GB, 100000, conf);
Performance
Serialization is the limiting factor. Expect put rates of around 500,000 to 1 million per second (depending on the length of keys and the size of value objects). Note that updating an object means get+put, so you'll get half the throughput if you want to update existing records.
values() iteration performance is the same as get/put.
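Why an update costs a get plus a put can be illustrated with a stand-in store that keeps values in serialized form. This is a sketch under assumptions: `SerializedStoreSketch` and `Counter` are hypothetical, it uses JDK serialization and an on-heap HashMap rather than FST and off-heap memory, and it only shows that a deserialized copy has to be written back explicitly.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a store that holds values as serialized bytes.
final class SerializedStoreSketch {

    static final class Counter implements Serializable {
        private static final long serialVersionUID = 1L;
        int hits;
    }

    private final Map<String, byte[]> bytesByKey = new HashMap<>();

    // Serializes the value and stores the resulting bytes under the key.
    void put(String key, Serializable value) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
            out.flush();
            bytesByKey.put(key, bytes.toByteArray());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Deserializes a fresh copy on every call; returns null for unknown keys.
    Object get(String key) {
        byte[] bytes = bytesByKey.get(key);
        if (bytes == null) {
            return null;
        }
        try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SerializedStoreSketch store = new SerializedStoreSketch();
        store.put("a", new Counter());
        Counter copy = (Counter) store.get("a"); // get: deserialize a copy
        copy.hits++;                             // changes only the copy
        store.put("a", copy);                    // put: serialize it back
        System.out.println(((Counter) store.get("a")).hits);
    }
}
```

Each update therefore pays for one deserialization and one serialization, which matches the halved throughput described above.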
Note that you might need to tweak your OS settings under high write load. If the virtual memory manager is too eager to write back changed memory regions, write throughput might drop.
Performance can be very bad (depending on the working set) if the file is larger than your server's main memory. However, memory has become cheap, and without off-heaping it's hard to make use of large server memory capacity (GC).
Has been tested on Linux only so far.