Shared Strings
Shared Strings (or string deduplication) is a relatively lightweight way to decrease the size of a serialized FlatBuffer.
FlatBuffers store strings by pointers. Writing a regular string involves:
- Allocating a vector to hold the string
- Writing the string into the allocated spot
- Writing a pointer into the table or vector that holds the string
Shared Strings in FlatSharp defer these writes until later. So instead of writing "Dog" 10 times, FlatSharp can keep track of the 10 locations in the buffer that need to be updated to point at "Dog".
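The deferral described above can be modeled with a small toy writer. This is only a sketch of the idea, not FlatSharp's implementation: the class `ToySharedStringWriter`, its methods, and the helper `read_string` are hypothetical names, and the layout is simplified to 4-byte little-endian offset slots followed by length-prefixed, null-terminated strings.

```python
import struct


class ToySharedStringWriter:
    """Toy model of deferred string writes (hypothetical, not FlatSharp code)."""

    def __init__(self):
        self.buffer = bytearray()
        # Maps each distinct string to the offset slots still waiting for it.
        self.pending = {}

    def write_string_reference(self, value):
        # Reserve a 4-byte slot now; the string bytes are written later.
        slot = len(self.buffer)
        self.buffer += b"\x00" * 4
        self.pending.setdefault(value, []).append(slot)

    def flush(self):
        # Write each distinct string once, then patch every slot that needs it.
        for value, slots in self.pending.items():
            target = self._write_string(value)
            for slot in slots:
                # Offsets are stored relative to the slot that holds them.
                struct.pack_into("<I", self.buffer, slot, target - slot)
        self.pending.clear()

    def _write_string(self, value):
        data = value.encode("utf-8")
        # Align the length prefix to a 4-byte boundary.
        while len(self.buffer) % 4:
            self.buffer.append(0)
        start = len(self.buffer)
        self.buffer += struct.pack("<I", len(data)) + data + b"\x00"
        return start


def read_string(buffer, slot):
    # Follow the relative offset stored at `slot` and decode the string there.
    (offset,) = struct.unpack_from("<I", buffer, slot)
    start = slot + offset
    (length,) = struct.unpack_from("<I", buffer, start)
    return bytes(buffer[start + 4:start + 4 + length]).decode("utf-8")
```

Calling `write_string_reference("Dog")` ten times and then `flush()` leaves a single copy of "Dog" in the buffer, with all ten slots pointing at it.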
Shared Strings are not a compression technique, and they do not make reading from or writing to buffers any faster except in contrived cases. LZ4 or other fast compression algorithms will generally produce smaller output than Shared Strings, but may be slower.
Shared Strings are useful only in narrow cases. A canonical example is a collection of property bags:
attribute "fs_sharedString";
table DataSet
{
Items:[Item];
}
table Item
{
Pairs:[KeyValuePair];
}
table KeyValuePair
{
Name:string (fs_sharedString);
Value:string;
}
With a relatively small number of Name values, and a large number of items, the use of Shared Strings could achieve significant space savings.
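A rough back-of-the-envelope estimate shows where the savings come from. The sketch below assumes the standard FlatBuffers string layout (a 4-byte length prefix, the UTF-8 bytes, a null terminator) and approximates alignment by rounding each string up to a multiple of 4 bytes. The function names and the sample property names and counts are made up for illustration. The 4-byte offset stored per reference is needed either way, so it is left out.

```python
def flatbuffer_string_size(value):
    """Approximate bytes used by one stored string.

    4-byte length prefix + UTF-8 bytes + null terminator,
    rounded up to a multiple of 4 as an alignment approximation.
    """
    raw = 4 + len(value.encode("utf-8")) + 1
    return (raw + 3) // 4 * 4


def estimate_name_bytes(occurrences, shared):
    """Total bytes spent on Name strings.

    `occurrences` maps each distinct name to how many times it appears.
    With sharing, each distinct name is assumed to be written exactly once.
    """
    total = 0
    for name, count in occurrences.items():
        copies = 1 if shared else count
        total += copies * flatbuffer_string_size(name)
    return total


# Hypothetical data set: 100,000 items, each with the same three property names.
occurrences = {"Color": 100_000, "Weight": 100_000, "Manufacturer": 100_000}

unshared = estimate_name_bytes(occurrences, shared=False)  # 4,400,000 bytes
shared = estimate_name_bytes(occurrences, shared=True)     # 44 bytes
```

The best case shown here assumes every name is written exactly once. A writer that occasionally writes a name more than once would land somewhere between the two figures.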
- Annotate your string fields (or string vector fields) with the fs_sharedString attribute.
- Optionally, configure the shared string writer using ISerializer<T>.WithSettings(...). The FlatSharp shared string writer is used by default; however, you may inject your own custom shared string writer or remove the shared string writer entirely. The FlatSharp default shared string writer is a flush-on-evict hash table. The size of this hash table is configurable via a constructor.
FlatSharp provides a default implementation of ISharedStringWriter that is optimized for a balance of size reduction and speed. Custom implementations using Dictionary are trivial and can be seen in the FlatSharp samples.
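To illustrate what "flush-on-evict" means, here is a minimal conceptual sketch. It is not FlatSharp's code and does not implement ISharedStringWriter: the class `FlushOnEvictTable`, the `write_string` callback, and the use of CRC32 as the hash are all assumptions made for the example. Each slot holds one string plus the buffer locations waiting for it. When a different string hashes to an occupied slot, the old entry is flushed (written out) and replaced.

```python
import zlib


class FlushOnEvictTable:
    """Fixed-size table that flushes an entry when another string evicts it.

    Hypothetical sketch: `write_string(value, locations)` stands in for
    "write the string once and patch all pending locations".
    """

    def __init__(self, capacity, write_string):
        self.slots = [None] * capacity  # each slot: (value, [locations]) or None
        self.write_string = write_string

    def add(self, value, location):
        # CRC32 is used only to get a deterministic hash for this example.
        index = zlib.crc32(value.encode("utf-8")) % len(self.slots)
        entry = self.slots[index]
        if entry is not None and entry[0] == value:
            # Same string seen again: remember one more location to patch.
            entry[1].append(location)
            return
        if entry is not None:
            # A different string owns this slot: flush it before replacing it.
            self.write_string(entry[0], entry[1])
        self.slots[index] = (value, [location])

    def flush(self):
        # At the end of serialization, write out whatever is still pending.
        for index, entry in enumerate(self.slots):
            if entry is not None:
                self.write_string(entry[0], entry[1])
                self.slots[index] = None
```

The trade-off is visible with a deliberately tiny table. With a capacity of 1, the sequence "a", "a", "b", "a" writes "a" twice, because "b" evicts it in between. A Dictionary-based writer would write each string exactly once, but its memory use grows with the number of distinct strings, whereas the fixed-size table bounds memory and accepts some duplication.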