[NRBF] Don't use Unsafe.As when decoding DateTime(s) #105749

Merged
merged 4 commits into from
Aug 21, 2024
Changes from 2 commits
@@ -240,7 +240,7 @@ private static List<T> DecodeFromNonSeekableStream(BinaryReader reader, int coun
}
else if (typeof(T) == typeof(DateTime))
{
- values.Add((T)(object)Utils.BinaryReaderExtensions.CreateDateTimeFromData(reader.ReadInt64()));
+ values.Add((T)(object)Utils.BinaryReaderExtensions.CreateDateTimeFromData(reader.ReadUInt64()));
}
else
{
@@ -257,7 +257,7 @@ private static SerializationRecord DecodeMemberPrimitiveTypedRecord(BinaryReader
PrimitiveType.Single => new MemberPrimitiveTypedRecord<float>(reader.ReadSingle()),
PrimitiveType.Double => new MemberPrimitiveTypedRecord<double>(reader.ReadDouble()),
PrimitiveType.Decimal => new MemberPrimitiveTypedRecord<decimal>(decimal.Parse(reader.ReadString(), CultureInfo.InvariantCulture)),
- PrimitiveType.DateTime => new MemberPrimitiveTypedRecord<DateTime>(Utils.BinaryReaderExtensions.CreateDateTimeFromData(reader.ReadInt64())),
+ PrimitiveType.DateTime => new MemberPrimitiveTypedRecord<DateTime>(Utils.BinaryReaderExtensions.CreateDateTimeFromData(reader.ReadUInt64())),
// String is handled with a record, never on it's own
_ => new MemberPrimitiveTypedRecord<TimeSpan>(new TimeSpan(reader.ReadInt64())),
};
@@ -77,10 +77,10 @@ internal SerializationRecord TryToMapToUserFriendly()
}
else if (MemberValues.Count == 2
Member

  • This assumes that the BF format is set in stone and that new key/value pairs won't be added in the future. Is that a safe assumption to make?

  • If somebody constructs a malicious payload with extra TimeSpan, DateTime or Guid fields, or with fields of an unexpected type, this pattern match won't kick in, no exception will be thrown, and we return the raw data. Is that the desired behavior for the Nrbf reader? (As far as I can tell, the reader tends to throw on anything unexpected or invalid instead of accepting it silently.)

Member Author

These are great questions.

BinaryFormatter can represent the same primitive value using different record types, depending on the context.
In this case, when DateTime is the root object it is expressed as SystemClassWithMembersAndTypesRecord which is just a type name + key/value dictionary. In other cases, it can be represented as MemberPrimitiveTypedRecord<T> (or a raw 8 bytes).

My goal was to hide this from the end users and always map it to PrimitiveTypeRecord<T> so users don't need to become experts in this area.

This assumes that the BF format is set in stone and that new key/value pairs won't be added in the future.

If we ever extend the binary representation of the given types, we may need to handle the versioning here.

If somebody constructs a malicious payload with extra TimeSpan, DateTime or Guid fields, or with fields of an unexpected type, this pattern match won't kick in, no exception will be thrown, and we return the raw data. Is that the desired behavior for the Nrbf reader?

It's allowed to create a type that is called System.DateTime and has a different layout. In such cases we are going to return a ClassRecord and the users will need to handle it.

SerializationRecord rootObject = NrbfDecoder.Decode(payload);
if (rootObject is PrimitiveTypeRecord<DateTime> primitiveRecord)
{
    // DateTime
}
else if (rootObject is ClassRecord classRecord)
{
    // something else
}

@jkotas this is just the way I see it, please let me know if something is not clear or some other changes are needed

Member

@jkotas jkotas Aug 21, 2024

My goal was to hide this from the end users and always map it to PrimitiveTypeRecord<T>

You are not always mapping it to PrimitiveTypeRecord<T>.

You are only mapping it to PrimitiveTypeRecord<T> if the input has a specific shape. You are not mapping it for all possible valid input shapes. For example, if the payload was produced by .NET Framework 1.x (I am sure there are a bunch of such payloads still alive in the wild), it will be missing the dateData field and it is not going to be mapped. However, the classic BF deserializer is going to handle it just fine. If somebody runs into this case, they will have to do double the work: they will need to handle both the mapped and the non-mapped cases.

In general, I would expect the behavior to be either:

  • 100% compatible with classic BF deserializer
  • Exception to be thrown

&& HasMember("ticks") && HasMember("dateData")
- && MemberValues[0] is long value && MemberValues[1] is ulong
+ && MemberValues[0] is long && MemberValues[1] is ulong dateData
Member

Are MemberValues[0] and MemberValues[1] the same bits just typed differently?

Member Author

No. "ticks" is "dateData" with the ticks mask applied:

public long Ticks => (long)(_dateData & TicksMask);

// Serialize both the old and the new format
info.AddValue(TicksField, Ticks);
info.AddValue(DateDataField, _dateData);
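
To make the relationship between the two fields concrete, here is a minimal sketch of the bit layout, assuming the split used in this PR's `CreateDateTimeFromData` (low 62 bits are ticks, top 2 bits are the kind). `DateDataLayout`, `Split` and `Combine` are hypothetical names for illustration only; they are not part of System.Formats.Nrbf.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class DateDataLayout
{
    // Low 62 bits hold the ticks; the top 2 bits hold the kind.
    public const ulong TicksMask = 0x3FFFFFFF_FFFFFFFFUL;

    // Splits raw dateData into the "ticks" value and the raw kind bits.
    // Kind bits 0, 1 and 2 match DateTimeKind.Unspecified, Utc and Local;
    // 3 is the "ambiguous DST" local value that has no public DateTimeKind.
    public static (long Ticks, int KindBits) Split(ulong dateData)
        => ((long)(dateData & TicksMask), (int)(dateData >> 62));

    // Rebuilds dateData from ticks and a public DateTimeKind.
    public static ulong Combine(long ticks, DateTimeKind kind)
        => ((ulong)kind << 62) | ((ulong)ticks & TicksMask);
}
```

For a UTC value, `Split(Combine(utc.Ticks, DateTimeKind.Utc))` gives back the same ticks with kind bits equal to 1, which is why "ticks" alone cannot round-trip the kind.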

Member

Is it safe to depend on the order of the fields in the payload? In other words, is the exact order of fields part of the BF contract for given type?

Member

SerializationInfo doesn't officially document the order, but in practice it enumerates elements in the same order in which they're added, and some types are sensitive to this ordering. It's akin to how if a dictionary / hashtable changes the order of enumeration or if a sort routine changes the relative order of "equal" elements, things break.
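
A small sketch of the behavior described above, assuming a runtime where `SerializationInfo` and `FormatterConverter` are still available (they are marked obsolete with SYSLIB0050 on recent .NET versions). `SerializationInfoOrderDemo` is a hypothetical name; the point is only that enumeration follows insertion order in practice, not that this order is documented.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.Serialization;

// Hypothetical demo type, for illustration only.
public static class SerializationInfoOrderDemo
{
    // Adds two values and returns the names in enumeration order.
    public static List<string> EnumerateNames()
    {
#pragma warning disable SYSLIB0050 // Type or member is obsolete
        SerializationInfo info = new(typeof(DateTime), new FormatterConverter());
#pragma warning restore SYSLIB0050
        info.AddValue("ticks", 0L);
        info.AddValue("dateData", 0UL);

        var names = new List<string>();
        foreach (SerializationEntry entry in info)
        {
            names.Add(entry.Name);
        }

        return names;
    }
}
```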

Member

@jkotas jkotas Aug 14, 2024

I meant it in connection with your other comment: Is the order considered a documented .NET Framework detail that it is ok to depend on, or is the order an undocumented .NET Framework detail that we should not depend on?

My hunch is that it should be the latter. The de-serializing constructor is explicitly coded to accept any order, or to accept one of the fields missing completely.

Member Author

It was my mistake to rely on the order of the fields. I am going to push a fix in a minute.
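
One way to avoid depending on field order is to look members up by name. The following is a minimal sketch under that assumption, not the code that was pushed to this PR: `DateTimeMemberLookup` and `TryGetDateData` are hypothetical names, and the dictionary stands in for whatever name-to-value view the record exposes. The fallback to "ticks" mirrors the legacy behavior discussed in this review, where an old payload carries only that field.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only.
public static class DateTimeMemberLookup
{
    // Finds the raw dateData by member name, regardless of member order.
    // Prefers "dateData"; falls back to the legacy "ticks" field when
    // "dateData" is absent (assumption: ticks are then used as the raw data).
    public static bool TryGetDateData(IReadOnlyDictionary<string, object> members, out ulong dateData)
    {
        if (members.TryGetValue("dateData", out var raw) && raw is ulong value)
        {
            dateData = value;
            return true;
        }

        if (members.TryGetValue("ticks", out raw) && raw is long ticks)
        {
            dateData = (ulong)ticks;
            return true;
        }

        dateData = 0;
        return false;
    }
}
```

Whether an unexpected shape should then fall through silently or throw is exactly the design question raised earlier in this thread; the sketch leaves that decision to the caller.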

&& TypeNameMatches(typeof(DateTime)))
{
- return Create(Utils.BinaryReaderExtensions.CreateDateTimeFromData(value));
+ return Create(Utils.BinaryReaderExtensions.CreateDateTimeFromData(dateData));
}
else if(MemberValues.Count == 4
&& HasMember("lo") && HasMember("mid") && HasMember("hi") && HasMember("flags")
@@ -3,14 +3,18 @@

using System.Globalization;
using System.IO;
using System.Reflection;
using System.Reflection.Metadata;
using System.Runtime.CompilerServices;
using System.Runtime.Serialization;
using System.Threading;

namespace System.Formats.Nrbf.Utils;

internal static class BinaryReaderExtensions
{
private static object? s_baseAmbiguousDstDateTime;

internal static BinaryArrayType ReadArrayType(this BinaryReader reader)
{
byte arrayType = reader.ReadByte();
@@ -70,36 +74,67 @@ internal static object ReadPrimitiveValue(this BinaryReader reader, PrimitiveTyp
PrimitiveType.Single => reader.ReadSingle(),
PrimitiveType.Double => reader.ReadDouble(),
PrimitiveType.Decimal => decimal.Parse(reader.ReadString(), CultureInfo.InvariantCulture),
- PrimitiveType.DateTime => CreateDateTimeFromData(reader.ReadInt64()),
+ PrimitiveType.DateTime => CreateDateTimeFromData(reader.ReadUInt64()),
_ => new TimeSpan(reader.ReadInt64()),
};

// TODO: fix https://github.com/dotnet/runtime/issues/102826
/// <summary>
/// Creates a <see cref="DateTime"/> object from raw data with validation.
/// </summary>
- /// <exception cref="SerializationException"><paramref name="data"/> was invalid.</exception>
- internal static DateTime CreateDateTimeFromData(long data)
+ /// <exception cref="SerializationException"><paramref name="dateData"/> was invalid.</exception>
+ internal static DateTime CreateDateTimeFromData(ulong dateData)
{
// Copied from System.Runtime.Serialization.Formatters.Binary.BinaryParser

// Use DateTime's public constructor to validate the input, but we
// can't return that result as it strips off the kind. To address
// that, store the value directly into a DateTime via an unsafe cast.
// See BinaryFormatterWriter.WriteDateTime for details.
ulong ticks = dateData & 0x3FFFFFFF_FFFFFFFFUL;
DateTimeKind kind = (DateTimeKind)(dateData >> 62);

try
{
- const long TicksMask = 0x3FFFFFFFFFFFFFFF;
- _ = new DateTime(data & TicksMask);
+ return ((uint)kind <= (uint)DateTimeKind.Local) ? new DateTime((long)ticks, kind) : CreateFromAmbiguousDst(ticks);
}
catch (ArgumentException ex)
{
// Bad data
throw new SerializationException(ex.Message, ex);
}

- return Unsafe.As<long, DateTime>(ref data);
[MethodImpl(MethodImplOptions.NoInlining)]
static DateTime CreateFromAmbiguousDst(ulong ticks)
{
// There's no public API to create a DateTime from an ambiguous DST, and we
// can't use private reflection to access undocumented .NET Framework APIs.
// However, the ISerializable pattern *is* a documented protocol, so we can
// use DateTime's serialization ctor to create a zero-tick "ambiguous" instance,
// then keep reusing it as the base to which we can add our tick offsets.

if (s_baseAmbiguousDstDateTime is not DateTime baseDateTime)
{
#pragma warning disable SYSLIB0050 // Type or member is obsolete
SerializationInfo si = new(typeof(DateTime), new FormatterConverter());
si.AddValue("ticks", 0L); // legacy value (serialized as long) - specify both just to be safe
Member

How does specifying ticks make us safe?

I think it can only hide problems and produce invalid values instead of throwing an exception. I cannot think of a case where it actually helps.

Member Author

I cannot speak on behalf of @GrabYourPitchforks (who authored the code), but my understanding is that initially the SerializationInfo for DateTime contained only the ticks field. Later dateData was introduced, but the runtime kept emitting the old field so that the payload could still be deserialized by older runtimes.

I've double checked the code:

https://github.com/microsoft/referencesource/blob/51cf7850defa8a17d815b4700b67116e3fa283c2/mscorlib/system/datetime.cs#L388-L389

case TicksField:
_dateData = (ulong)Convert.ToInt64(enumerator.Value, CultureInfo.InvariantCulture);
foundTicks = true;

And you are right: if this code were executed on a very old runtime, we would produce an invalid result. I've removed it and added a comment.

si.AddValue("dateData", 0xC0000000_00000000UL); // new value (serialized as ulong)

#if NET
baseDateTime = CallPrivateSerializationConstructor(si, new StreamingContext(StreamingContextStates.All));
#else
ConstructorInfo ci = typeof(DateTime).GetConstructor(
BindingFlags.Instance | BindingFlags.NonPublic,
binder: null,
new Type[] { typeof(SerializationInfo), typeof(StreamingContext) },
modifiers: null);

baseDateTime = (DateTime)ci.Invoke(new object[] { si, new StreamingContext(StreamingContextStates.All) });
#endif

#pragma warning restore SYSLIB0050 // Type or member is obsolete
Volatile.Write(ref s_baseAmbiguousDstDateTime, baseDateTime); // it's ok if two threads race here
}

return baseDateTime.AddTicks((long)ticks);
}

#if NET
[UnsafeAccessor(UnsafeAccessorKind.Constructor)]
extern static DateTime CallPrivateSerializationConstructor(SerializationInfo si, StreamingContext ct);
#endif
Member

@stephentoub stephentoub Jul 31, 2024

Is using a private constructor like this from a separate package considered safe / supported? We have a bunch of types now that implement ISerializable but that either throw from their deserialization ctor or don't have one at all... Will DateTime never be on the same plan?

Member

I agree. System.Formats.Nrbf should not depend on any of the built-in legacy infrastructure for binary serialization.

Member Author

This ctor is part of the ISerializable protocol and it's supported by other serializers as well (example: DataContractSerializer). AFAIK we have no plans to remove these ctors.

tagging @GrabYourPitchforks who suggested this solution in #102826 (comment)

Member

AFAIK we have no plans to remove these ctors.

We've already made some of them throw PlatformNotSupportedException, e.g.

protected Regex(SerializationInfo info, StreamingContext context) =>
throw new PlatformNotSupportedException();

and entirely removed them from others, e.g.
public sealed class OperatingSystem : ISerializable, ICloneable

Member

Should we have a public API that allows us to create the specific DateTime value?

Member Author

Should we have a public API that allows us to create the specific DateTime value?

Yes, but it would not solve the problem, as this package needs to support older target framework monikers, including netstandard2.0.

Member

I think both Unsafe.As and reflection are ok for existing targets. The existing targets are set in stone and we can make assumptions about them.

Unsafe.As and reflection are less than ideal for the future. They limit the changes we can make later.

}

internal static bool? IsDataAvailable(this BinaryReader reader, long requiredBytes)
43 changes: 42 additions & 1 deletion src/libraries/System.Formats.Nrbf/tests/EdgeCaseTests.cs
@@ -1,4 +1,5 @@
- using System.IO;
+ using System.Collections.Generic;
+ using System.IO;
using System.Runtime.Serialization.Formatters;
using System.Runtime.Serialization.Formatters.Binary;
using Microsoft.DotNet.XUnitExtensions;
@@ -103,4 +104,44 @@ public void FormatterTypeStyleOtherThanTypesAlwaysAreNotSupportedByDesign(Format

Assert.Throws<NotSupportedException>(() => NrbfDecoder.Decode(ms));
}

public static IEnumerable<object[]> CanReadAllKindsOfDateTimes_Arguments
{
get
{
yield return new object[] { new DateTime(1990, 11, 24, 0, 0, 0, DateTimeKind.Local) };
yield return new object[] { new DateTime(1990, 11, 25, 0, 0, 0, DateTimeKind.Utc) };
yield return new object[] { new DateTime(1990, 11, 26, 0, 0, 0, DateTimeKind.Unspecified) };
}
}

[Theory]
[MemberData(nameof(CanReadAllKindsOfDateTimes_Arguments))]
public void CanReadAllKindsOfDateTimes_DateTimeIsTheRootRecord(DateTime input)
{
using MemoryStream stream = Serialize(input);

PrimitiveTypeRecord<DateTime> dateTimeRecord = (PrimitiveTypeRecord<DateTime>)NrbfDecoder.Decode(stream);

Assert.Equal(input.Ticks, dateTimeRecord.Value.Ticks);
Assert.Equal(input.Kind, dateTimeRecord.Value.Kind);
}

[Serializable]
public class ClassWithDateTime
{
public DateTime Value;
}

[Theory]
[MemberData(nameof(CanReadAllKindsOfDateTimes_Arguments))]
public void CanReadAllKindsOfDateTimes_DateTimeIsMemberOfTheRootRecord(DateTime input)
{
using MemoryStream stream = Serialize(new ClassWithDateTime() { Value = input });

ClassRecord classRecord = NrbfDecoder.DecodeClassRecord(stream);

Assert.Equal(input.Ticks, classRecord.GetDateTime(nameof(ClassWithDateTime.Value)).Ticks);
Assert.Equal(input.Kind, classRecord.GetDateTime(nameof(ClassWithDateTime.Value)).Kind);
}
}