
Vectors

James Courtney edited this page Mar 8, 2021 · 6 revisions

FlatBuffers supports vectors (lists) through the general syntax of:

table SomeTable
{
   SomeVector:[SomeType];
}

FlatSharp supports the following types for Vectors:

  • IList<T> / IReadOnlyList<T>
  • T[]
  • Memory<byte> / ReadOnlyMemory<byte>

This page provides some detail on when it is appropriate to choose each type of vector.

The general guidance is to use IList and IReadOnlyList, which provide a balance between performance and the principle of least surprise.

However, there are always exceptions:

  • You need Key/Value lookups. In this case, refer to Indexed Vectors.
  • You need to map a chunk of the input buffer as raw bytes without copying (perhaps a nested flat buffer or a large vector). In this case, use a Memory<byte> vector. This will generally point to a location in the input buffer without any copies.
  • The vectors are known to be small and/or fast access is very important. In these cases, an Array may be useful.
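
For reference, the vector type is selected in the FBS schema with FlatSharp's `fs_vector` attribute. Only the `Memory` value is taken from the example later on this page; the other attribute strings below are assumptions that follow the same pattern — consult the FlatSharp schema documentation for the exact supported values:

```
table VectorExamples
{
   // No attribute: FlatSharp defaults to IList<SomeType>.
   Items:[SomeType];

   // Assumed attribute values, modeled on the "Memory" example below:
   ReadOnlyItems:[SomeType] (fs_vector:"IReadOnlyList");
   FastItems:[SomeType] (fs_vector:"Array");
   RawBytes:[ubyte] (fs_vector:"Memory");
}
```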

The discussion on the rest of this page is closely related to the concept of Deserialization Modes. The reader is assumed to have digested the contents of that article before reading the rest of this one.

Lists

By virtue of interfaces, IList<T> and IReadOnlyList<T> give FlatSharp lots of flexibility to satisfy the requested deserialization option in a non-surprising way. FlatSharp provides an implementation of IList<T> that sits directly on top of an IInputBuffer and allows lazy index-based access.

| Deserialization Mode | Behavior | Actual Type |
|----------------------|----------|-------------|
| Lazy | Elements are instantiated on demand. A new `T` is created for each element accessed. | `FlatBufferList<T>` |
| PropertyCache | Elements are instantiated on demand. A new `T` is created for each element accessed. This is to avoid large array allocations. | `FlatBufferList<T>` |
| VectorCache | A new vector is allocated and initialized the first time the vector property is accessed. The elements of the vector access their data according to VectorCache rules. | `ReadOnlyCollection<T>` |
| VectorCacheMutable | Same as VectorCache. | `List<T>` |
| Greedy | A new vector is allocated and recursively initialized at deserialization time. The elements of the vector are greedily initialized. | `ReadOnlyCollection<T>` |
| GreedyMutable | Same as Greedy. | `List<T>` |

Lists are great choices for nearly all scenarios, match developer expectations about deserialization behavior, and are fast enough for most cases. Unless the buffer contains binary vectors or small vectors that need very fast access, it is recommended to use lists.
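
As a minimal sketch of how the mode changes list behavior (this assumes a hypothetical generated `SomeTable` class with an `IList<SomeType>` vector, and a FlatSharp version whose `FlatBufferSerializer` constructor accepts a deserialization option directly):

```csharp
using FlatSharp;

byte[] buffer = GetSerializedTable(); // hypothetical helper returning a serialized SomeTable

// Lazy: the returned IList<SomeType> wraps the input buffer directly,
// and every index access constructs a fresh element object.
var lazy = new FlatBufferSerializer(FlatBufferDeserializationOption.Lazy)
    .Parse<SomeTable>(buffer);
bool same = ReferenceEquals(lazy.SomeVector[0], lazy.SomeVector[0]); // false: two instances

// Greedy: the entire vector is materialized once, during Parse;
// repeated accesses return the same cached element instance.
var greedy = new FlatBufferSerializer(FlatBufferDeserializationOption.Greedy)
    .Parse<SomeTable>(buffer);
```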

Arrays

Arrays are the simplest vector type. However, arrays do not allow FlatSharp to be lazy about initialization, since they cannot be subclassed. This leads to FlatSharp producing sometimes-surprising behavior when using arrays. Deserialization of arrays is always "greedy" with respect to the array itself, though array elements obey the Deserialization Mode's rules about how lazy or greedy to be.

| Deserialization Mode | Behavior |
|----------------------|----------|
| Lazy | A new array is allocated and fully initialized each time the vector property is accessed. The elements of the array access their data lazily. |
| PropertyCache | A new array is allocated and initialized the first time the vector property is accessed. The elements of the array access their data according to PropertyCache rules. |
| VectorCache | A new array is allocated and initialized the first time the vector property is accessed. The elements of the array access their data according to VectorCache rules. |
| Greedy | A new array is allocated and recursively initialized at deserialization time. The elements of the array are greedily initialized. |

When to consider Arrays:

  • You need the fastest possible access to a vector of data. The CLR overhead for accessing array members is nearly nothing. Be sure to measure the overhead of IList first.
  • When using Greedy deserialization. The vector is going to be allocated no matter what in this mode, so arrays can make sense.

When not to use Arrays:

  • You are using the Lazy deserialization option. This will force a new array allocation each time the vector is accessed through the table.
  • The vectors are very large. Arrays must be allocated all at once and can't be initialized lazily like IList.

Other Notes:

Arrays in .NET are mutable. However, any changes will not be reflected back into the source buffer. When using Deserialization Modes other than Lazy, array mutations will be visible to other readers (but not written to the buffer).
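
A short sketch of that visibility rule (again assuming a hypothetical generated `SomeTable` whose `SomeVector` property is a `SomeType[]` array; names are illustrative):

```csharp
using FlatSharp;

byte[] buffer = GetSerializedTable(); // hypothetical helper returning a serialized SomeTable

var serializer = new FlatBufferSerializer(FlatBufferDeserializationOption.Greedy);
SomeTable parsed = serializer.Parse<SomeTable>(buffer);

// The mutation is visible to every reader holding 'parsed'...
parsed.SomeVector[0] = new SomeType();

// ...but 'buffer' is untouched: re-parsing it yields the original data.
SomeTable fresh = serializer.Parse<SomeTable>(buffer);
```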

Memory

FlatSharp exposes a final kind of vector: Memory<byte> and ReadOnlyMemory<byte>. These two are special because they allow returning a reference into the IInputBuffer used to deserialize the original object.

table Packet {
   Source:string;
   Destination:string;
   MessageKind:string;
   NestedFlatBuffer:[ubyte] (fs_vector:"Memory");
}

When accessing packet.NestedFlatBuffer, the Memory<byte> that comes back will reference into the original input buffer.

byte[] message = ...;
var parsed = FlatBufferSerializer.Default.Parse<Packet>(message);

// This points into the original "message" array. No copies necessary!
var payload = FlatBufferSerializer.Default.Parse<SomePayload>(parsed.NestedFlatBuffer);

When to consider Memory:

Your vector carries large binary payloads, such as files, compressed data, images, or nested FlatBuffers that you wish to avoid copying. Memory provides the ultimate in efficiency by pointing into the original buffer.

When to avoid Memory:

Because Memory<byte> is a pointer into the original input buffer, any modifications made will be written back to the original input buffer. This can be great for some scenarios, but may lead to erroneous behavior if the developer is unaware of this quirk. Using Greedy deserialization will create a copy of the input data.
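
A sketch of that write-through behavior using the `Packet` table above (this assumes a lazy deserialization mode and a non-null, writable `Memory<byte>` property; the exact property type may vary by FlatSharp version):

```csharp
using System;
using FlatSharp;

byte[] message = GetPacketBytes(); // hypothetical helper returning a serialized Packet
var parsed = FlatBufferSerializer.Default.Parse<Packet>(message);

// Under a lazy mode, this Memory<byte> points into 'message' itself:
Memory<byte> nested = parsed.NestedFlatBuffer;
nested.Span[0] = 0xFF; // this byte changes inside 'message' too!

// Under Greedy, NestedFlatBuffer would be a copy, and the write
// above would leave 'message' unmodified.
```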
