Description
Background and Motivation
BinaryFormatter is getting removed in .NET 9, but our customers need to be able to read the payloads that:
- were serialized with BF (using previous .NET versions) and persisted (to disk/db etc)
- are being generated by software they have no control over (example: 3rd party clients calling an existing web service with public API that allows for BF).
Our primary goal is to allow the users to read BF payloads in a secure manner from untrusted input. The principles:
- Treating every input as potentially hostile.
- No type loading of any kind (to avoid remote code execution).
- No recursion of any kind (to avoid unbounded recursion, stack overflow and denial of service).
- No buffer pre-allocation based on size provided in payload (to avoid running out of memory and denial of service).
- Using collision-resistant dictionary to store records referenced by other records.
- Only primitive types can be instantiated implicitly. Arrays can be instantiated on demand (with a default max size limit). Other types are never instantiated.
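The "no buffer pre-allocation" principle above can be illustrated with a minimal sketch. It is not the library's implementation; `BoundedReader`, `ReadDeclaredLength` and the 4096-byte chunk size are hypothetical names and values chosen for illustration. The idea is that a length read from the payload is treated only as an upper bound on how much to read, and memory grows only as bytes actually arrive.

```csharp
using System;
using System.IO;

static class BoundedReader
{
    // Reads exactly `declaredLength` bytes from `input`.
    // The declared length comes from untrusted data, so it is never used to size
    // a buffer up front; the result grows only as real bytes are received.
    public static byte[] ReadDeclaredLength(Stream input, int declaredLength, int chunkSize = 4096)
    {
        if (declaredLength < 0)
        {
            throw new InvalidDataException("Negative length in payload.");
        }

        using var collected = new MemoryStream();
        // The scratch buffer is capped by chunkSize, regardless of the declared length.
        byte[] chunk = new byte[Math.Min(chunkSize, Math.Max(declaredLength, 1))];
        int remaining = declaredLength;

        while (remaining > 0)
        {
            int read = input.Read(chunk, 0, Math.Min(chunk.Length, remaining));
            if (read == 0)
            {
                // The payload claimed more data than it contains.
                throw new EndOfStreamException("Payload ended before the declared length was reached.");
            }

            collected.Write(chunk, 0, read);
            remaining -= read;
        }

        return collected.ToArray();
    }
}
```

With this shape, a two-byte payload that claims to hold `int.MaxValue` bytes fails after allocating a single small chunk, instead of attempting a 2 GB allocation first.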
We also want to make the APIs easy to use, so that customers are not tempted to use the OOB package with the copy of BinaryFormatter (and remain vulnerable to various attacks). That is why the public API surface is currently very narrow. We could expose more information, but we don't want to confuse the users or require them to become familiar with the BF specification to get simple tasks done. Example: null can be represented using three different serialization records (ObjectNull, ObjectNullMultiple and ObjectNullMultiple256). The public APIs just return null, rather than a record that represents it.
The new APIs need to be shipped in a new OOB package that supports older monikers, as we have first party customers running on Full Framework that are going to use it.
Proposed API
namespace System.Runtime.Serialization.BinaryFormat;
public static class NrbfReader
{
/// <summary>
/// Checks if given buffer starts with <see href="https://learn.microsoft.com/openspecs/windows_protocols/ms-nrbf/a7e578d3-400a-4249-9424-7529d10d1b3c">NRBF payload header</see>.
/// </summary>
/// <param name="bytes">The buffer to inspect.</param>
/// <returns><see langword="true" /> if it starts with NRBF payload header; otherwise, <see langword="false" />.</returns>
public static bool StartsWithPayloadHeader(byte[] bytes);
/// <summary>
/// Checks if given stream starts with <see href="https://learn.microsoft.com/openspecs/windows_protocols/ms-nrbf/a7e578d3-400a-4249-9424-7529d10d1b3c">NRBF payload header</see>.
/// </summary>
/// <param name="stream">The stream to inspect. The stream must be both readable and seekable.</param>
/// <returns><see langword="true" /> if it starts with NRBF payload header; otherwise, <see langword="false" />.</returns>
/// <exception cref="ArgumentNullException"><paramref name="stream" /> is <see langword="null" />.</exception>
/// <exception cref="NotSupportedException">The stream does not support reading or seeking.</exception>
/// <exception cref="ObjectDisposedException">The stream was closed.</exception>
/// <remarks><para>When this method returns, <paramref name="stream" /> will be restored to its original position.</para></remarks>
public static bool StartsWithPayloadHeader(Stream stream);
/// <summary>
/// Reads the provided NRBF payload.
/// </summary>
/// <param name="payload">The NRBF payload.</param>
/// <param name="options">Options to control behavior during parsing.</param>
/// <param name="leaveOpen">
/// <see langword="true" /> to leave <paramref name="payload"/> open
/// after the reading is finished; otherwise, <see langword="false" />.
/// </param>
/// <returns>A <see cref="SerializationRecord"/> that represents the root object.
/// It can be either <see cref="PrimitiveTypeRecord{T}"/>,
/// a <see cref="ClassRecord"/> or an <see cref="ArrayRecord"/>.</returns>
/// <exception cref="ArgumentNullException"><paramref name="payload"/> is <see langword="null" />.</exception>
/// <exception cref="ArgumentException"><paramref name="payload"/> does not support reading or is already closed.</exception>
/// <exception cref="SerializationException">Reading from <paramref name="payload"/> encounters invalid NRBF data.</exception>
/// <exception cref="DecoderFallbackException">Reading from <paramref name="payload"/>
/// encounters an invalid UTF8 sequence.</exception>
public static SerializationRecord Read(Stream payload, PayloadOptions? options = default, bool leaveOpen = false);
/// <param name="recordMap">
/// When this method returns, contains a mapping of <see cref="SerializationRecord.ObjectId" /> to the associated serialization record.
/// This parameter is treated as uninitialized.
/// </param>
public static SerializationRecord Read(Stream payload, out IReadOnlyDictionary<int, SerializationRecord> recordMap, PayloadOptions? options = default, bool leaveOpen = false);
/// <summary>
/// Reads the provided Binary Format payload that is expected to contain an instance of any class (or struct) that is not an <seealso cref="Array"/> or a primitive type.
/// </summary>
/// <returns>A <seealso cref="ClassRecord"/> that represents the root object.</returns>
public static ClassRecord ReadClassRecord(Stream payload, PayloadOptions? options = default, bool leaveOpen = false);
}
public sealed class PayloadOptions
{
public PayloadOptions() { }
public TypeNameParseOptions? TypeNameParseOptions { get; set; }
/// <summary>
/// Gets or sets a value that indicates whether type name truncation is undone.
/// </summary>
/// <value><see langword="true" /> if truncated type names should be reassembled; otherwise, <see langword="false" />.</value>
/// <remarks>
/// Example:
/// TypeName: "Namespace.TypeName`1[[Namespace.GenericArgName"
/// LibraryName: "AssemblyName]]"
/// Is combined into "Namespace.TypeName`1[[Namespace.GenericArgName, AssemblyName]]"
/// </remarks>
public bool UndoTruncatedTypeNames { get; set; }
}
/// <summary>
/// Abstract class that represents the serialization record.
/// </summary>
/// <remarks>
/// Every instance returned to the end user can be either <seealso cref="PrimitiveTypeRecord{T}"/>,
/// a <seealso cref="ClassRecord"/> or an <seealso cref="ArrayRecord"/>.
/// </remarks>
public abstract class SerializationRecord
{
internal SerializationRecord(); // others can't derive from this type
/// <summary>
/// Gets the type of the record.
/// </summary>
/// <value>The type of the record.</value>
public abstract RecordType RecordType { get; }
/// <summary>
/// Gets the ID of the record.
/// </summary>
/// <value>The ID of the record.</value>
public abstract int ObjectId { get; }
/// <summary>
/// Compares the type and assembly name read from the payload against the specified type.
/// </summary>
/// <remarks>
/// <para>This method takes type forwarding into account.</para>
/// <para>This method does NOT take into account member names or their types.</para>
/// </remarks>
/// <param name="type">The type to compare against.</param>
/// <returns><see langword="true" /> if the serialized type and assembly name match provided type; otherwise, <see langword="false" />.</returns>
public virtual bool IsTypeNameMatching(Type type);
}
/// <summary>
/// Record type.
/// </summary>
/// <remarks>
/// <para>
/// The enumeration does not contain all values supported by the <see href="https://learn.microsoft.com/openspecs/windows_protocols/ms-nrbf/954a0657-b901-4813-9398-4ec732fe8b32">
/// [MS-NRBF] 2.1.2.1</see>, but only those supported by the <see cref="NrbfReader"/>.
/// </para>
/// </remarks>
public enum RecordType : byte
{
SerializedStreamHeader,
ClassWithId,
// SystemClassWithMembers and ClassWithMembers are not supported by design (require type loading) and not included
SystemClassWithMembersAndTypes = 4,
ClassWithMembersAndTypes,
BinaryObjectString,
BinaryArray,
MemberPrimitiveTyped,
MemberReference,
ObjectNull,
MessageEnd,
BinaryLibrary,
ObjectNullMultiple256,
ObjectNullMultiple,
ArraySinglePrimitive,
ArraySingleObject,
ArraySingleString
}
/// <summary>
/// Represents a record that itself represents the primitive value of <typeparamref name="T"/> type.
/// </summary>
/// <typeparam name="T">The type of the primitive value.</typeparam>
/// <remarks>
/// <para>
/// The NRBF specification considers the following types to be primitive:
/// <see cref="string"/>, <see cref="bool"/>, <see cref="byte"/>, <see cref="sbyte"/>,
/// <see cref="char"/>, <see cref="short"/>, <see cref="ushort"/>,
/// <see cref="int"/>, <see cref="uint"/>, <see cref="long"/>, <see cref="ulong"/>,
/// <see cref="float"/>, <see cref="double"/>, <see cref="decimal"/>,
/// <see cref="DateTime"/> and <see cref="TimeSpan"/>.
/// </para>
/// <para>Other serialization records are represented with <see cref="ClassRecord"/> or <see cref="ArrayRecord"/>.</para>
/// </remarks>
public abstract class PrimitiveTypeRecord<T> : SerializationRecord
{
private protected PrimitiveTypeRecord(T value);
public T Value { get; }
}
/// <summary>
/// Defines the core behavior for NRBF class records and provides a base for derived classes.
/// </summary>
public abstract class ClassRecord : SerializationRecord
{
private protected ClassRecord(ClassInfo classInfo);
public TypeName TypeName { get; }
public IEnumerable<string> MemberNames { get; }
/// <summary>
/// Checks if member of given name was present in the payload.
/// </summary>
/// <param name="memberName">The name of the member.</param>
/// <returns><see langword="true" /> if it was present, otherwise <see langword="false" />.</returns>
/// <remarks>
/// <para>
/// It's recommended to use this method when dealing with payload that may contain
/// different versions of the same type.
/// </para>
/// </remarks>
public bool HasMember(string memberName);
public string? GetString(string memberName);
public bool GetBoolean(string memberName);
public byte GetByte(string memberName);
public sbyte GetSByte(string memberName);
public short GetInt16(string memberName);
public ushort GetUInt16(string memberName);
public char GetChar(string memberName);
public int GetInt32(string memberName);
public uint GetUInt32(string memberName);
public float GetSingle(string memberName);
public long GetInt64(string memberName);
public ulong GetUInt64(string memberName);
public double GetDouble(string memberName);
public decimal GetDecimal(string memberName);
public TimeSpan GetTimeSpan(string memberName);
public DateTime GetDateTime(string memberName);
/// <summary>
/// Retrieves an array for the provided <paramref name="memberName"/>.
/// </summary>
/// <param name="memberName">The name of the field.</param>
/// <param name="allowNulls">Specifies whether null values are allowed.</param>
/// <returns>The array itself or null.</returns>
/// <exception cref="KeyNotFoundException">Member of such name does not exist.</exception>
/// <exception cref="InvalidOperationException">Member of such name has value of a different type.</exception>
public T?[]? GetArrayOfPrimitiveType<T>(string memberName, bool allowNulls = true);
/// <summary>
/// Retrieves the <see cref="SerializationRecord" /> of the provided <paramref name="memberName"/>.
/// </summary>
/// <param name="memberName">The name of the field.</param>
/// <returns>The serialization record, which can be any of <see cref="PrimitiveTypeRecord{T}"/>,
/// <see cref="ClassRecord"/>, <see cref="ArrayRecord"/> or <see langword="null" />.
/// </returns>
/// <exception cref="KeyNotFoundException"><paramref name="memberName" /> does not refer to a known member. You can use <see cref="HasMember(string)"/> to check if given member exists.</exception>
/// <exception cref="InvalidOperationException">The specified member is not a <see cref="SerializationRecord"/>, but just a raw primitive value.</exception>
public SerializationRecord? GetSerializationRecord(string memberName);
/// <summary>
/// Retrieves the value of the provided <paramref name="memberName"/>.
/// </summary>
/// <param name="memberName">The name of the member.</param>
/// <returns>The value.</returns>
/// <exception cref="KeyNotFoundException"><paramref name="memberName" /> does not refer to a known member. You can use <see cref="HasMember(string)"/> to check if given member exists.</exception>
/// <exception cref="InvalidOperationException">Member of such name has value of a different type.</exception>
public ClassRecord? GetClassRecord(string memberName);
/// <returns>
/// <para>For primitive types like <see cref="int"/>, <see langword="string"/> or <see cref="DateTime"/> returns their value.</para>
/// <para>For nulls, returns a null.</para>
/// <para>For other types that are not arrays, returns an instance of <see cref="ClassRecord"/>.</para>
/// <para>For single-dimensional arrays returns <see cref="ArrayRecord{T}"/> where the generic type is the primitive type or <see cref="ClassRecord"/>.</para>
/// <para>For jagged and multi-dimensional arrays, returns an instance of <see cref="ArrayRecord"/>.</para>
/// </returns>
public object? GetRawValue(string memberName);
}
/// <summary>
/// Defines the core behavior for NRBF array records and provides a base for derived classes.
/// </summary>
public abstract class ArrayRecord : SerializationRecord
{
private protected ArrayRecord(ArrayInfo arrayInfo);
/// <summary>
/// When overridden in a derived class, gets a buffer of integers that represent the number of elements in every dimension.
/// </summary>
/// <value>A buffer of integers that represent the number of elements in every dimension.</value>
public abstract ReadOnlySpan<int> Lengths { get; }
/// <summary>
/// Gets the rank of the array.
/// </summary>
/// <value>The rank of the array.</value>
public int Rank { get; }
/// <summary>
/// Gets the type of the array.
/// </summary>
/// <value>The type of the array.</value>
public BinaryArrayType ArrayType { get; }
/// <summary>
/// Gets the name of the array element type.
/// </summary>
/// <value>The name of the array element type.</value>
public abstract TypeName ElementTypeName { get; }
/// <summary>
/// Allocates an array and fills it with the data provided in the serialized records (in case of primitive types like <see cref="string"/> or <see cref="int"/>) or the serialized records themselves.
/// </summary>
/// <param name="expectedArrayType">Expected array type.</param>
/// <param name="allowNulls">
/// <see langword="true" /> to permit <see langword="null" /> values within the array;
/// otherwise, <see langword="false" />.
/// </param>
/// <returns>An array filled with the data provided in the serialized records.</returns>
/// <exception cref="InvalidOperationException"><paramref name="expectedArrayType" /> does not match the data from the payload.</exception>
public Array GetArray(Type expectedArrayType, bool allowNulls = true);
}
/// <summary>
/// Binary array type.
/// </summary>
/// <remarks>
/// BinaryArrayType enumeration is described in <see href="https://learn.microsoft.com/openspecs/windows_protocols/ms-nrbf/4dbbf3a8-6bc4-4dfc-aa7e-36a35be6ff58">[MS-NRBF] 2.4.1.1</see>.
/// </remarks>
public enum BinaryArrayType : byte
{
/// <summary>
/// A single-dimensional array.
/// </summary>
Single = 0,
/// <summary>
/// An array whose elements are arrays. The elements of a jagged array can be of different dimensions and sizes.
/// </summary>
Jagged = 1,
/// <summary>
/// A multi-dimensional rectangular array.
/// </summary>
Rectangular = 2,
/// <summary>
/// A single-dimensional array where the lower bound index is greater than 0.
/// </summary>
SingleOffset = 3,
/// <summary>
/// A jagged array where the lower bound index is greater than 0.
/// </summary>
JaggedOffset = 4,
/// <summary>
/// Multi-dimensional arrays where the lower bound index of at least one of the dimensions is greater than 0.
/// </summary>
RectangularOffset = 5
}
/// <summary>
/// Defines the core behavior for NRBF single dimensional, zero-indexed array records and provides a base for derived classes.
/// </summary>
public abstract class ArrayRecord<T> : ArrayRecord
{
private protected ArrayRecord(ArrayInfo arrayInfo);
/// <summary>
/// Gets the length of the array.
/// </summary>
/// <value>The length of the array.</value>
public int Length { get; }
/// <summary>
/// When overridden in a derived class, allocates an array of <typeparamref name="T"/> and fills it with the data provided in the serialized records (in case of primitive types like <see cref="string"/> or <see cref="int"/>) or the serialized records themselves.
/// </summary>
/// <param name="allowNulls">
/// <see langword="true" /> to permit <see langword="null" /> values within the array;
/// otherwise, <see langword="false" />.
/// </param>
/// <returns>An array filled with the data provided in the serialized records.</returns>
public abstract T?[] GetArray(bool allowNulls = true);
}
Usage Examples
The implementation with no dependency on dotnet/runtime can be found here.
Reading a class serialized with BF to a file
ClassRecord rootRecord = NrbfReader.ReadClassRecord(File.OpenRead("persistedPayload.bf"));
Sample output = new()
{
// using the dedicated methods to read primitive values
Integer = rootRecord.GetInt32(nameof(Sample.Integer)),
Text = rootRecord.GetString(nameof(Sample.Text)),
// using dedicated method to read an array of bytes
ArrayOfBytes = rootRecord.GetArrayOfPrimitiveType<byte>(nameof(Sample.ArrayOfBytes)),
// using GetClassRecord to read a class record
ClassInstance = new()
{
Text = rootRecord
.GetClassRecord(nameof(Sample.ClassInstance))!
.GetString(nameof(Sample.Text))
}
};
[Serializable]
public class Sample
{
public int Integer;
public string? Text;
public byte[]? ArrayOfBytes;
public Sample? ClassInstance;
}
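When the persisted payloads may come from different versions of Sample, the HasMember and IsTypeNameMatching methods proposed above can be combined. The following is a sketch only, assuming the API ships exactly as proposed and reusing the Sample class from this example; `VersionTolerantReader`, `ReadSample` and the scenario of an older Sample version without the Text field are hypothetical.

```csharp
using System.IO;
using System.Runtime.Serialization.BinaryFormat; // namespace as proposed above

static class VersionTolerantReader
{
    public static Sample ReadSample(Stream payload)
    {
        ClassRecord root = NrbfReader.ReadClassRecord(payload);

        // Reject payloads whose root is not the type we expect.
        // This compares type and assembly names only; no type is loaded.
        if (!root.IsTypeNameMatching(typeof(Sample)))
        {
            throw new InvalidDataException("The payload does not contain a Sample instance.");
        }

        return new Sample
        {
            Integer = root.GetInt32(nameof(Sample.Integer)),
            // Hypothetical scenario: older payloads were written before Text existed,
            // so check for the member instead of letting the getter throw.
            Text = root.HasMember(nameof(Sample.Text))
                ? root.GetString(nameof(Sample.Text))
                : null,
        };
    }
}
```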
Checking if Stream contains BF payload
The users need to be able to check if given Stream contains BF data, as they might want to migrate the data on demand to new serialization format:
static T Pseudocode<T>(Stream payload, NewSerializer newSerializer)
{
    if (NrbfReader.StartsWithPayloadHeader(payload))
    {
        // legacy BF payload: read it, then overwrite it using the new format
        T fromPayload = UseThePayloadReaderToReadTheData<T>(payload);
        payload.Seek(0, SeekOrigin.Begin);
        newSerializer.Serialize(payload, fromPayload);
        payload.Flush();
        return fromPayload;
    }
    else
    {
        // the data has already been migrated to the new format
        return newSerializer.Deserialize<T>(payload);
    }
}
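The contract documented for StartsWithPayloadHeader(Stream) (readable and seekable stream, position restored on return) can be sketched as follows. This is an illustration rather than the library's implementation; `HeaderProbe` and `LooksLikeNrbf` are hypothetical names. It assumes the SerializedStreamHeader layout as described in [MS-NRBF]: a one-byte record type with value 0, followed by four little-endian 32-bit integers (RootId, HeaderId, MajorVersion = 1, MinorVersion = 0), 17 bytes in total.

```csharp
using System;
using System.Buffers.Binary;
using System.IO;

static class HeaderProbe
{
    // Assumed header size: 1 (record type) + 4 * 4 (RootId, HeaderId, MajorVersion, MinorVersion).
    private const int HeaderSize = 17;

    public static bool LooksLikeNrbf(Stream stream)
    {
        if (stream is null) throw new ArgumentNullException(nameof(stream));
        if (!stream.CanRead || !stream.CanSeek)
        {
            throw new NotSupportedException("The stream must support reading and seeking.");
        }

        long start = stream.Position;
        try
        {
            byte[] header = new byte[HeaderSize];
            int total = 0;
            // Stream.Read may return fewer bytes than requested, so loop until done.
            while (total < HeaderSize)
            {
                int read = stream.Read(header, total, HeaderSize - total);
                if (read == 0) return false; // too short to contain a header
                total += read;
            }

            return header[0] == 0 // SerializedStreamHeader record type
                && BinaryPrimitives.ReadInt32LittleEndian(header.AsSpan(9, 4)) == 1   // MajorVersion
                && BinaryPrimitives.ReadInt32LittleEndian(header.AsSpan(13, 4)) == 0; // MinorVersion
        }
        finally
        {
            // Restore the original position, whatever the outcome.
            stream.Position = start;
        }
    }
}
```

Restoring the position in a finally block is what lets the caller hand the same stream to either reader afterwards without seeking.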
SzArrays
Single dimension, zero-indexed arrays are expected to be the most frequently used arrays.
SerializationRecord rootObject = NrbfReader.Read(File.OpenRead("persistedPayload.bf"));
if (rootObject is ArrayRecord<string> arrayOfStrings)
{
string?[] strings = arrayOfStrings.GetArray();
}
Other arrays
BF supports:
- jagged arrays
- multi-dimensional arrays
- non-zero indexed arrays
They are all represented by internal types that derive from ArrayRecord. The users can use the API to instantiate such arrays, but they need to provide the expected array type. By doing that we make this advanced scenario possible and safe (the library is not loading any types, if there is a type mismatch it throws).
public abstract class ArrayRecord : SerializationRecord
{
public Array GetArray(Type expectedArrayType, bool allowNulls = true);
}
ArrayRecord arrayRecord = (ArrayRecord)NrbfReader.Read(File.OpenRead("persistedPayload.bf"));
if (arrayRecord.ArrayType == BinaryArrayType.Jagged)
{
int[][][] array = (int[][][])arrayRecord.GetArray(expectedArrayType: typeof(int[][][]));
}
For more usages of this API please refer to JaggedArraysTests.cs, RectangularArraysTests.cs and CustomOffsetArrays.cs.
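A rectangular array can be handled in the same way. The sketch below assumes the API ships exactly as proposed above and that the persisted file contains an int[,] as the root object; the file name and the two-dimensional shape are illustrative assumptions, not something the library prescribes.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization.BinaryFormat; // namespace as proposed above

using FileStream stream = File.OpenRead("persistedPayload.bf");
SerializationRecord root = NrbfReader.Read(stream);

if (root is ArrayRecord { ArrayType: BinaryArrayType.Rectangular, Rank: 2 } rectangular)
{
    // Lengths exposes the element count of every dimension without allocating the array.
    ReadOnlySpan<int> lengths = rectangular.Lengths;
    Console.WriteLine($"Payload declares a {lengths[0]} x {lengths[1]} array.");

    // The caller states the expected type; the proposal specifies that a mismatch
    // results in InvalidOperationException rather than any type loading.
    int[,] grid = (int[,])rectangular.GetArray(expectedArrayType: typeof(int[,]));
}
```

Checking ArrayType and Rank first keeps the cast on the last line from being the only line of defense against an unexpected payload shape.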
Arrays of non-primitive types
Arrays of non-primitive types are represented as ArrayRecord<ClassRecord> or just ArrayRecord.
ArrayRecord<ClassRecord> rootRecord = (ArrayRecord<ClassRecord>)NrbfReader.Read(File.OpenRead("persistedPayload.bf"));
ClassRecord[] classRecords = rootRecord.GetArray(allowNulls: false)!;
Sample[] output = classRecords
.Select(classRecord => new Sample()
{
Integer = classRecord.GetInt32(nameof(Sample.Integer)),
Text = classRecord.GetString(nameof(Sample.Text))
})
.ToArray();
Risks
If the new APIs are not easy to use, some of the users might choose the new OOB package with a copy of BF and remain vulnerable to all attacks. This defeats the purpose of our initiative and must be avoided.