Deserialization Modes
Deserialization modes are the primary configuration knob that FlatSharp exposes and the best way to adjust read performance. The right mode depends on your data access patterns, the size of your data, and what your benchmarks show. This page aims to demystify deserialization modes.
Before we get into a full breakdown of the different performance options, it's useful to have some context on how FlatSharp actually works. FlatSharp treats tables and structs similarly internally. Let's pretend we have this struct:
struct Location
{
    X : float;
    Y : float;
    Z : float;
}
FlatSharp will emit a base class for this struct with approximately this signature:
public class Location
{
    public virtual float X { get; set; }
    public virtual float Y { get; set; }
    public virtual float Z { get; set; }
}
And when serializing, FlatSharp will generate some code that looks approximately like this:
public static void WriteLocation<TSpanWriter>(
    TSpanWriter spanWriter,
    Span<byte> span,
    Location value,
    int offset,
    SerializationContext context) where TSpanWriter : ISpanWriter
{
    spanWriter.WriteFloat(span, value.X, (offset + 0), context);
    spanWriter.WriteFloat(span, value.Y, (offset + 4), context);
    spanWriter.WriteFloat(span, value.Z, (offset + 8), context);
}
Reasonably simple: we're writing each field of the struct at the predefined offset relative to the base offset.
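To make the offsets concrete, here is a minimal sketch that does not use FlatSharp at all. `LocationBytes`, `Write`, and `ReadFloat` are hypothetical names chosen for illustration, and the sketch assumes 4-byte little-endian floats, which is how the FlatBuffers format stores scalars.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helper, not FlatSharp code: lays out a 12-byte Location by hand.
public static class LocationBytes
{
    public static void Write(Span<byte> span, int offset, float x, float y, float z)
    {
        WriteFloat(span, offset + 0, x); // X lives at the base offset
        WriteFloat(span, offset + 4, y); // Y follows 4 bytes later
        WriteFloat(span, offset + 8, z); // Z follows 8 bytes later
    }

    public static float ReadFloat(ReadOnlySpan<byte> span, int offset)
        => BitConverter.Int32BitsToSingle(
            BinaryPrimitives.ReadInt32LittleEndian(span.Slice(offset, 4)));

    private static void WriteFloat(Span<byte> span, int offset, float value)
        => BinaryPrimitives.WriteInt32LittleEndian(
            span.Slice(offset, 4), BitConverter.SingleToInt32Bits(value));

    public static void Main()
    {
        byte[] buffer = new byte[16];
        Write(buffer, 4, 1.5f, 2.5f, 3.5f);          // struct starts at base offset 4
        Console.WriteLine(ReadFloat(buffer, 4 + 8)); // reads Z back from base + 8
    }
}
```

Reading a field back only needs the base offset plus the field's fixed offset, which is the property the deserialization modes build on.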
Deserializing is more interesting. When deserializing, FlatSharp will generate a subclass of Location that overrides X, Y, and Z:
public class LocationReader<TInputBuffer> : Location where TInputBuffer : IInputBuffer
{
    ...
    public LocationReader(TInputBuffer buffer, int offset) { ... }

    public override float X
    {
        get => ...
        set => ...
    }

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    private static float ReadIndex0Value(TInputBuffer buffer, int offset) => buffer.ReadFloat(offset);
}
Whichever mode you choose, parsing an object with FlatSharp returns a subclass of the type you requested. How that subclass is implemented depends on the deserialization option you select.
GreedyMutable deserialization is the simplest to understand. The full object graph is deserialized at once, and the input buffer is not needed after the fact. Code for a GreedyMutable deserializer looks like this:
public class LocationReader<TInputBuffer> : Location where TInputBuffer : IInputBuffer
{
    private float index0Value;

    public LocationReader(TInputBuffer buffer, int offset)
    {
        this.index0Value = ReadIndex0Value(buffer, (offset + 0));
    }

    public override float X
    {
        get => this.index0Value;
        // When using Greedy instead of GreedyMutable, setters throw a NotMutableException.
        set => this.index0Value = value;
    }

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    private static float ReadIndex0Value(TInputBuffer buffer, int offset) => buffer.ReadFloat(offset);
}
Notably, the buffer parameter is not retained after the constructor has finished, which means you are free to reuse it immediately after the deserialization operation has concluded.
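The practical effect of not retaining the buffer can be sketched without FlatSharp. The two classes below are hypothetical stand-ins, not generated code: `GreedyLocation` copies its value in the constructor, while `BufferBackedLocation` keeps a reference to the array for contrast. Zeroing the array stands in for returning it to a pool.

```csharp
using System;

// Hypothetical stand-in for a greedy reader: copies the value and forgets the buffer.
public sealed class GreedyLocation
{
    public float X { get; }

    public GreedyLocation(byte[] buffer, int offset)
        => X = BitConverter.ToSingle(buffer, offset);
}

// Hypothetical contrast case: holds on to the buffer and re-reads it on each access.
public sealed class BufferBackedLocation
{
    private readonly byte[] buffer;
    private readonly int offset;

    public BufferBackedLocation(byte[] buffer, int offset)
    {
        this.buffer = buffer;
        this.offset = offset;
    }

    public float X => BitConverter.ToSingle(this.buffer, this.offset);
}

public static class BufferReuseDemo
{
    public static void Main()
    {
        byte[] buffer = BitConverter.GetBytes(1.5f); // 4 bytes holding X
        var greedy = new GreedyLocation(buffer, 0);
        var backed = new BufferBackedLocation(buffer, 0);

        Array.Clear(buffer, 0, buffer.Length); // simulate recycling the buffer

        Console.WriteLine(greedy.X == 1.5f); // True: the value was copied up front
        Console.WriteLine(backed.X == 0f);   // True: this reader sees the zeroed bytes
    }
}
```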
Lazy deserialization is the opposite of Greedy. In Greedy mode, everything is preallocated and stored. In Lazy mode, nothing is preallocated or stored:
public class LocationReader<TInputBuffer> : Location where TInputBuffer : IInputBuffer
{
    private readonly TInputBuffer buffer;
    private readonly int offset;

    public LocationReader(TInputBuffer buffer, int offset)
    {
        this.buffer = buffer;
        this.offset = offset;
    }

    public override float X
    {
        get => ReadIndex0Value(this.buffer, this.offset + 0);
        // Lazy is always immutable (with the exception of the WriteThrough attribute)
        set => throw new NotMutableException();
    }

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    private static float ReadIndex0Value(TInputBuffer buffer, int offset) => buffer.ReadFloat(offset);
}
As we see here, Lazy is as advertised. Properties are read only when they are accessed, and repeated accesses of the same property result in repeated trips to the InputBuffer. Crucially, Lazy maintains a reference to the InputBuffer, so the buffer must remain valid for as long as the deserialized object is in use. If your access patterns are sparse, Lazy deserialization can be very effective, since cycles are not wasted reading data that isn't used.
Progressive can be thought of as Lazy-with-caching. The difference between Lazy and Progressive mode is that Progressive will memoize the results of the reads from the underlying buffer.
public class LocationReader<TInputBuffer> : Location where TInputBuffer : IInputBuffer
{
    private readonly TInputBuffer buffer;
    private readonly int offset;
    private bool hasIndex0Value;
    private float index0Value;

    public LocationReader(TInputBuffer buffer, int offset)
    {
        this.buffer = buffer;
        this.offset = offset;
    }

    public override float X
    {
        get
        {
            if (!this.hasIndex0Value)
            {
                this.index0Value = ReadIndex0Value(this.buffer, this.offset + 0);
                this.hasIndex0Value = true;
            }

            return this.index0Value;
        }
        // Progressive is always immutable (unless using write through)
        set => throw new NotMutableException();
    }

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    private static float ReadIndex0Value(TInputBuffer buffer, int offset) => buffer.ReadFloat(offset);
}
So we see the primary difference between Progressive and Lazy is the addition of two fields in the generated class, as well as an if statement inside the getter.
Progressive is a great choice when you cannot anticipate the access patterns of your deserialized FlatBuffer. It usually won't be the fastest (though it is possible to contrive a case where it is), but it will never be the slowest. Greedy is not performant when only a small slice of your buffer is accessed, and Lazy deteriorates when elements in the buffer are accessed repeatedly.
For repeated accesses, Progressive is faster than Lazy at the expense of more memory. For situations where fields are accessed at most once, Progressive will be slower than Lazy.
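One way to see this trade-off is to count how often each style of reader touches the buffer. The sketch below is self-contained and uses hypothetical types (`CountingBuffer`, `LazyX`, `ProgressiveX`) that only mimic the shape of the generated code shown earlier; they are not FlatSharp types.

```csharp
using System;

// Hypothetical buffer that counts reads; it is not FlatSharp's IInputBuffer.
public sealed class CountingBuffer
{
    private readonly byte[] data;

    public CountingBuffer(byte[] data) => this.data = data;

    public int Reads { get; private set; }

    public float ReadFloat(int offset)
    {
        this.Reads++;
        return BitConverter.ToSingle(this.data, offset);
    }
}

// Lazy style: every access goes back to the buffer.
public sealed class LazyX
{
    private readonly CountingBuffer buffer;

    public LazyX(CountingBuffer buffer) => this.buffer = buffer;

    public float X => this.buffer.ReadFloat(0);
}

// Progressive style: the first access reads the buffer, later ones hit the cached field.
public sealed class ProgressiveX
{
    private readonly CountingBuffer buffer;
    private bool hasValue;
    private float value;

    public ProgressiveX(CountingBuffer buffer) => this.buffer = buffer;

    public float X
    {
        get
        {
            if (!this.hasValue)
            {
                this.value = this.buffer.ReadFloat(0);
                this.hasValue = true;
            }

            return this.value;
        }
    }
}

public static class ReadCountDemo
{
    public static void Main()
    {
        var lazyBuffer = new CountingBuffer(BitConverter.GetBytes(1.5f));
        var progressiveBuffer = new CountingBuffer(BitConverter.GetBytes(1.5f));
        var lazy = new LazyX(lazyBuffer);
        var progressive = new ProgressiveX(progressiveBuffer);

        for (int i = 0; i < 1000; i++)
        {
            _ = lazy.X;
            _ = progressive.X;
        }

        Console.WriteLine(lazyBuffer.Reads);        // 1000
        Console.WriteLine(progressiveBuffer.Reads); // 1
    }
}
```

The extra `bool` and `float` fields in `ProgressiveX` are the memory cost, and the `if` check is the small overhead paid even when a field is read only once.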
So, we've seen what kind of code FlatSharp will generate for you depending on your configuration. When should you use which options? The best answer is, of course, to benchmark. However, answers to the following should help inform your choices.
FlatSharp is really fast, even with the default GreedyMutable settings. Don't preemptively optimize. Greedy and GreedyMutable also work well because they guarantee that you can recycle your IInputBuffer object immediately. However, using Greedy deserialization on buffers with lots of data can cause garbage collection spikes, since all of the objects are allocated at once rather than being amortized as you use the buffer. GreedyMutable is left as the default because it is the most straightforward and the most like other serialization libraries.
Lazy is great when your access patterns are sparse and at-most-once, or your buffers are enormous. If you're touching individual properties more than once, then Lazy will likely be slower than other options. Lazy also means that the deserialized objects carry references to the source buffer.
With Progressive, each piece of data is read from the buffer at most once, which is useful when access patterns cannot be anticipated and full Greedy mode is not appropriate. For repeated accesses, Progressive mode approaches the speed of Greedy and is much faster than Lazy. For sparse accesses, it is only slightly slower than Lazy and much faster than Greedy. These characteristics make it a great choice for nearly all scenarios.
When dealing with large FlatBuffers, it can be very helpful to use Lazy or Progressive. These modes allow pay-as-you-go semantics, so allocations are spread out instead of spiking all at once as they do with Greedy. This allows the GC to operate at a steady state and collect most objects in Generation 0. Combining Lazy/Progressive with write through or value-type structs on struct members can be particularly useful if you need to update large FlatBuffers, as this can be done in place without a Parse -> Update -> Reserialize flow that consumes memory and copies far more data than necessary. More information can be found in the write through sample.
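The idea behind an in-place update can be sketched as follows. This is only an illustration of the concept under stated assumptions, not FlatSharp's generated write-through code: `WriteThroughLocation` is a hypothetical class whose setter writes straight into the backing array, and the 1 MB array stands in for a large serialized buffer.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical illustration of the write-through idea; not FlatSharp's generated code.
public sealed class WriteThroughLocation
{
    private readonly byte[] buffer;
    private readonly int offset;

    public WriteThroughLocation(byte[] buffer, int offset)
    {
        this.buffer = buffer;
        this.offset = offset;
    }

    public float X
    {
        // Reads the 4 bytes at the field's offset on every access.
        get => BitConverter.Int32BitsToSingle(
            BinaryPrimitives.ReadInt32LittleEndian(this.buffer.AsSpan(this.offset, 4)));

        // Writes the new value directly into the same 4 bytes of the buffer.
        set => BinaryPrimitives.WriteInt32LittleEndian(
            this.buffer.AsSpan(this.offset, 4), BitConverter.SingleToInt32Bits(value));
    }
}

public static class WriteThroughDemo
{
    public static void Main()
    {
        byte[] buffer = new byte[1024 * 1024]; // stand-in for a large serialized buffer
        var location = new WriteThroughLocation(buffer, 512);

        location.X = 42f; // touches only 4 bytes; nothing is reallocated or copied

        // A fresh reader over the same bytes observes the update without reserializing.
        Console.WriteLine(new WriteThroughLocation(buffer, 512).X == 42f); // True
    }
}
```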