Skip to content

Improved uninitialized object creation performance #98016

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Feb 8, 2024

Conversation

mikernet
Copy link
Contributor

@mikernet mikernet commented Feb 6, 2024

Added CreateUninitializedCache to speed up uninitialized object creation.

@ghost ghost added community-contribution Indicates that the PR has been added by a community member area-System.Reflection labels Feb 6, 2024
@ghost
Copy link

ghost commented Feb 6, 2024

Tagging subscribers to this area: @dotnet/area-system-reflection
See info in area-owners.md if you want to be subscribed.

Issue Details
  • Adapt ActivatorCache for uninitialized usage.

  • CoreCLR GetActivationInfo method now also returns info for types without default ctors.

  • Remove CoreCLR GetUninitializedObject method.

Author: mikernet
Assignees: -
Labels:

area-System.Reflection, community-contribution

Milestone: -

@mikernet mikernet force-pushed the feat-faster-uninit-objects branch from f5c7618 to 071f79d Compare February 6, 2024 03:38
@jkotas
Copy link
Member

jkotas commented Feb 6, 2024

Improved uninitialized object creation performance

Improving performance of this API has dubious value. This API exists for backward compatibility only. We are not interested in promote its use. When we implemented the performance improvements for Activator.CreateInstance some time ago, we intentionally left this API out.

@mikernet
Copy link
Contributor Author

mikernet commented Feb 6, 2024

@jkotas

Improving performance of this API has dubious value. This API exists for backward compatibility only. We are not interested in promote its use. When we implemented the performance improvements for Activator.CreateInstance some time ago, we intentionally left this API out.

Would help with perf on binary serializer we have that is performance critical. We need to support creating types without default ctors, and it works better without calling default ctors at all for a couple different reasons.

@jkotas
Copy link
Member

jkotas commented Feb 6, 2024

Would help with perf on binary serializer we have that is performance critical.

We have multi-year project to eradicate binary serializers from .NET ecosystem due to security issues that they introduce.

@mikernet
Copy link
Contributor Author

mikernet commented Feb 6, 2024

We have multi-year project to eradicate binary serializers from .NET ecosystem due to security issues that they introduce.

I know, was a bad design that couldn't be fixed without breaking it. Not an issue with ours.

@jkotas
Copy link
Member

jkotas commented Feb 6, 2024

Well, all binary serializers that create uninitialized objects are questionable design.

@mikernet
Copy link
Contributor Author

mikernet commented Feb 6, 2024

Respectfully, that's an overly broad generalization that doesn't always apply. The only time it creates uninitialized objects is when the object itself is aware that it will be created as uninitialized during deserialization and it provides the logic to use for initializing itself in that case.

@mikernet
Copy link
Contributor Author

mikernet commented Feb 6, 2024

Perhaps I should clarify this statement:

it works better without calling default ctors at all

It works better without calling default ctors at all on objects that provide deserialization behavior, which when using our lib, is most DTOs.

@mikernet
Copy link
Contributor Author

mikernet commented Feb 6, 2024

@jkotas I guess I'll pause putting any more effort into this for now. Please let me know if there is any point in me continuing down this path or not. I don't want to spend more time on a PR that isn't going to get merged.

@jkotas
Copy link
Member

jkotas commented Feb 6, 2024

The only time it creates uninitialized objects is when the object itself is aware that it will be created as uninitialized during deserialization and it provides the logic to use for initializing itself in that case..

If the type is explicitly designed for binary serializer, part of the design can be a constructor that prepares the instance for binary serialization. Empty or near empty constructor costs nothing given the total costs involved in reflection-based serialization.

If the serializer serializes external type that is not explicitly designed for it, skipping the constructor is dangerous. I have spent a lot of time over the years debugging obscure crashes caused by binary serializers creating external uninitialized objects. One example from many: #67491 (comment)

@mikernet
Copy link
Contributor Author

mikernet commented Feb 6, 2024

If the type is explicitly designed for binary serializer, part of the design can be a constructor that prepares the instance for binary serialization.

Calling a parameterized constructor on a generic T is quite slow. We have some compiled expression stuff that makes it quick on normal .NET, but on NAOT it is much slower than creating uninitialized instances and then calling a method to deserialize (though the interpreted expressions are still quite a bit faster than Activator.CreateInstance with parameters in this case).

I'm aware of the alternatives in this case, but they all pose usability issues, especially with NRTs. It's a bit of a unique design that has lots of advantages and great usability. I'm not keen on sacrificing that.

@mikernet
Copy link
Contributor Author

mikernet commented Feb 6, 2024

If the serializer serializes external type that is not explicitly designed for it, skipping the constructor is dangerous.

I know, that's why we don't do that :)

@mikernet
Copy link
Contributor Author

mikernet commented Feb 6, 2024

Ah, right, I also remembered another reason why the constructor approach won't actually work here at all - it would need to take a ref struct parameter, which interpreted expressions and Activator.CreateInstance don't support at all.

@mikernet
Copy link
Contributor Author

mikernet commented Feb 6, 2024

Note, I could get around the ref struct constructor parameter like this, but then I'm back at the GetUninitializedObject perf issue so I've gotten nowhere 😅

var pfnCtor = typeof(Test).GetConstructors()[0].MethodHandle.GetFunctionPointer();
var ctor = (SpanCtor<Test>)Activator.CreateInstance(typeof(SpanCtor<Test>), null, pfnCtor)!;
var t = (Test)RuntimeHelpers.GetUninitializedObject(typeof(Test));
ctor(t, "Test");
t.Dump();

delegate void SpanCtor<TObject>(TObject uninitialized, ReadOnlySpan<char> p);

class Test
{
    public string Value;

    public Test(ReadOnlySpan<char> p) 
    {
        Value = p.ToString();
    }
}

@mikernet
Copy link
Contributor Author

mikernet commented Feb 6, 2024

I did a search across github for usages of GetUninitializedObject. Some entries that appeared notable that could benefit from this:

  • WPF
  • Odin Serializer
  • MongoDB BSON serializer
  • protobuf-net
  • Castle Windsor
  • IKVM
  • Telerik Just Mock
  • NCache

You might not want to "encourage" people to use it because it is "unsafe", but it's a tool that is useful in some situations when used carefully, just like all the other unsafe low-level things the runtime provides. The work to make it fast is basically done here, aside from some minor exception type fixes. The benefit is 15x perf improvement, the cost is 1 extra field in the activator cache.

@MichalStrehovsky
Copy link
Member

Ah, right, I also remembered another reason why the constructor approach won't actually work here at all - it would need to take a ref struct parameter, which interpreted expressions and Activator.CreateInstance don't support at all.

If you had the CreateInstanceFactoryTyped API from #23716, would that address your use case?

@MichalPetryka
Copy link
Contributor

Ah, right, I also remembered another reason why the constructor approach won't actually work here at all - it would need to take a ref struct parameter, which interpreted expressions and Activator.CreateInstance don't support at all.

If you had the CreateInstanceFactoryTyped API from #23716, would that address your use case?

I'd be happy with that and CreateFieldAccessor for my serialization scenarios, however for AOT scenarios I'll need to resurrect my #85197 and expand it to delegates to inline calls to those.

@jkotas
Copy link
Member

jkotas commented Feb 6, 2024

Ah, right, I also remembered another reason why the constructor approach won't actually work here at all - it would need to take a ref struct parameter, which interpreted expressions and Activator.CreateInstance don't support at all.

You can fix this problem by requiring the types that want to participate in your serialization scheme to provide a factory method with a specific signature. The factory method invocation can be invoked via cached delegate. It will be much faster than Activator.CreateInstance`` or GetUninitializedObject`. The factory method can be generated by source generators if you do not want your users to write it.

@jkotas
Copy link
Member

jkotas commented Feb 6, 2024

@jkotas I guess I'll pause putting any more effort into this for now. Please let me know if there is any point in me continuing down this path or not.

I guess I can live with it if it is a nice clean change.

@jkotas
Copy link
Member

jkotas commented Feb 6, 2024

The work to make it fast is basically done here, aside from some minor exception type fixes.

I do not think that it just minor exception type fixes. There are other subtle differences between Activator.CreateInstance and GetUninitializedObject, e.g. in how static constructor triggers are handled. Trying to prep the activation info for both Activator.CreateInstance and GetUninitializedObject at the same time does not look like the best approach. It increases the one-time cost of Activator.CreateInstance if nothing else.

@mikernet
Copy link
Contributor Author

mikernet commented Feb 6, 2024

@MichalStrehovsky

if you had the CreateInstanceFactoryTyped API

Actually no, not in this case (though the use-case being discussed here isn't my only usage of GetUninitializedInstance), because it needs generic params.

@jkotas

Empty or near empty constructor costs nothing given the total costs involved in reflection-based serialization.

Just FYI, it's not reflection-based, quite the opposite. If you want an example:

Example
public abstract class Page : IBinaryProcessable
{
    // Header:
    //
    // Size     Type    Field               Description
    // ----     ----    -----               -----------
    //    8     ulong   Transaction ID      The ID of the transaction that last modified this page
    //    4     uint    Page number         Physical page number in the file
    //    1     byte    Struct type         The type of structure this page is a part of
    //    1     byte    Data type           The kind of data the structure this page belongs to holds
    //    1     byte    Page level          The page level in the tree (zero for leaf level pages)
    //    2     ushort  Owner ID            The ID of the schema object that owns this page
    //    2     ushort  Free space          The amount of free space left on the page
    //    4     uint    Previous page       Previous page number at the same level
    //    4     uint    Next page           Next page number at the same level
    //    2     ushort  Active start offset Active page segment index offset from start
    //    2     ushort  Active end offset   Active page segment index offset from end
    //   33     n/a     Reserved            Reserved for future use
    // ----
    //   64

    /// <summary>
    /// Size of the page header.
    /// </summary>
    public const int HeaderSize = 64;

    /// <summary>
    /// The number of reserved bytes at the end of the header
    /// </summary>
    private const int HeaderReservedBytes = 33;

    // Footer: 
    //
    // Size     TypeOf  Field           Description
    // ----     ----    -----           -----------
    //    4     int     Checksum        CRC32 Checksum to detect corruption
    //    8     ulong   Transaction ID  Repeated transaction ID to detect tearing on writes
    // ----
    //   12

    /// <summary>
    /// Size of the page footer.
    /// </summary>
    public const int FooterSize = 12;

    // Footer offsets are measured from the end of the page

    private const int FooterChecksumOffset = 12;
    private const int FooterTransactionIdOffset = 8;

    // Persisted fields

    private long _transactionId;
    private int _pageNumber;
    private PageStructType _structType;
    private PageDataType _dataType;
    private byte _level;
    private short _ownerId;
    private short _freeEntrySpace;
    private int _previousPageNumber;
    private int _nextPageNumber;
    private short _activeStartOffset = short.MaxValue;
    private short _activeEndOffset = short.MaxValue;

    /// <summary>
    /// Initializes a new page instance.
    /// </summary>
    protected Page(long transactionId, int pageNumber, PageStructType structType, PageDataType dataType, short ownerId, byte level)
    {
        _transactionId = transactionId;
        _pageNumber = pageNumber;
        _structType = structType;
        _dataType = dataType;
        _ownerId = ownerId;
        _level = level;
    }

    void IBinaryProcessable.Process<TByteOrder>(ref BinaryProcessor<TByteOrder> processor)
    {
        ProcessHeader(ref processor);
        ProcessEntries(ref processor);
        ProcessChecksum(ref processor);
    }

    protected abstract void ProcessEntries<TByteOrder>(ref BinaryProcessor<TByteOrder> processor) where TByteOrder : IByteOrder;

    /// <summary>
    /// Processes the header and the verification transaction ID used to detect page tears.
    /// The checksum is NOT processed - ProcessChecksum() must be called to process the
    /// checksum.
    /// </summary>
    internal void ProcessHeader<TByteOrder>(ref BinaryProcessor<TByteOrder> processor) where TByteOrder : IByteOrder
    {
        processor.Position = 0;

        // Process header fields

        processor.Process(ref _transactionId);
        processor.Process(ref _pageNumber);
        processor.Process(ref _structType);
        processor.Process(ref _dataType);
        processor.Process(ref _level);
        processor.Process(ref _ownerId);
        processor.Process(ref _freeEntrySpace);
        processor.Process(ref _previousPageNumber);
        processor.Process(ref _nextPageNumber);
        processor.Process(ref _activeStartOffset);
        processor.Process(ref _activeEndOffset);

        Debug.Assert(processor.Position == HeaderSize - HeaderReservedBytes, "Page header was not processed to the correct position.");

        // Verify footer transaction ID

        processor.Position = processor.Buffer.Length - FooterTransactionIdOffset;

        if (!processor.Verify(_transactionId))
        {
            static void Throw() => throw new PageVerifyException(true, "Page is torn.");
            Throw();
        }

        Debug.Assert(processor.Position == processor.Buffer.Length, "Page footer was not processed to the correct length.");
    }

    /// <summary>
    /// Reads and verifies the checksum (for reading processers) or generates and writes the checksum (for writing processors) directly 
    /// from the processor buffer. Checksum verification can be done immediately after initialization. Checksum generation must be done
    /// as the last processing step after the header and entries have been processed so that all the data has been written to the buffer.
    /// </summary>
    internal void ProcessChecksum<TByteOrder>(ref BinaryProcessor<TByteOrder> processor) where TByteOrder : IByteOrder
    {
        int checksum = Crc32Checksum.Calculate(processor.Buffer.Slice(0, processor.Buffer.Length - FooterSize - _freeEntrySpace));

        processor.Position = processor.Buffer.Length - FooterChecksumOffset;

        if (!processor.Verify(checksum))
        {
            static void Throw() => throw new PageVerifyException(false, "Page data is corrupt.");
            Throw();
        }

        Debug.Assert(processor.Position == processor.Buffer.Length - FooterTransactionIdOffset, "Page checksum was not processed to the correct position.");
    }
}

@jkotas
Copy link
Member

jkotas commented Feb 6, 2024

it's not reflection-based, quite the opposite. If you want an example:

If you use RuntimeHelpers.GetUninitializedObject, it is reflection-based, at least partially.

@mikernet
Copy link
Contributor Author

mikernet commented Feb 6, 2024

Well...sure...but I wouldn't call a seriailizer "reflection-based" just because it uses Activator.CreateInstance either.

@mikernet
Copy link
Contributor Author

mikernet commented Feb 6, 2024

Trying to prep the activation info for both Activator.CreateInstance and GetUninitializedObject at the same time does not look like the best approach. It increases the one-time cost of Activator.CreateInstance if nothing else.

I would be happy to refactor things so that GetUninitializedObject uses a different cached object on RuntimeType if that's what you prefer. That's a fairly trivial refactoring.

@mikernet
Copy link
Contributor Author

mikernet commented Feb 6, 2024

There are other subtle differences between Activator.CreateInstance and GetUninitializedObject, e.g. in how static constructor triggers are handled.

I don't believe my approach changes how that works, but if I missed something about that, let me know.

@mikernet mikernet marked this pull request as ready for review February 6, 2024 23:10
@mikernet mikernet force-pushed the feat-faster-uninit-objects branch 2 times, most recently from 63aa3ed to 069618c Compare February 7, 2024 00:12
@mikernet mikernet force-pushed the feat-faster-uninit-objects branch from 069618c to 9a1dc54 Compare February 7, 2024 01:05
@AustinWise
Copy link
Contributor

Empty or near empty constructor costs nothing given the total costs involved in reflection-based serialization.

Just FYI, it's not reflection-based, quite the opposite. If you want an example:

If you are going to make all classes that participate in serialization implement an interface, you could also make them implement a static abstract method for creating new object. For example:

https://gist.github.com/AustinWise/e6a7ef2b72eded2a947d261ec8ee6422

Obviously this gets more complicated when inheritance is involved and requires changing all existing implementations of the interface. This PR prompted me to reflect on some similar serialization systems I maintain and how newer C# features could be used to replaced GetUninitializedObject.

@mikernet
Copy link
Contributor Author

mikernet commented Feb 7, 2024

@AustinWise One of the main motivations for our lib is eliminating the need to maintain separate serialize/deserialize logic - as per the example, there is one Process method that handles both, so you don't need to worry about mismatched behavior. The static interface wouldn't eliminate any of the boilerplate in the message you replied to - it would just enforce the signature of CreateFromReader (which admittedly is a bit less error-prone and nicer than having to wing the signature, so if I did go that route, that's probably what I would do).

@mikernet
Copy link
Contributor Author

mikernet commented Feb 7, 2024

And yeah, inheritance, which we do use quite extensively in our DB engine, would get quite squirrely with a static factory method, interface or not. We have really nice inheritance usability right now.

@mikernet
Copy link
Contributor Author

mikernet commented Feb 7, 2024

Just realized the boilerplate and all the inheritance issues can be avoided by splitting the instance creation from the deserialization and doing some CRGP (curiously recurring generic pattern). Hmmm...

interface IBinaryProcessable<T> : IBinaryProcessable where T : IBinaryProcessable<T>
{
    public static abstract T CreateInstance();
}

interface IBinaryProcessable
{
    public void Process(ref BinaryProcessor processor);
}

ref struct BinaryProcessor
{
    public void Process(ref int value)
    {
        value = 5;
    }
    
    public T Read<T>() where T : IBinaryProcessable<T>
    {
        T value = T.CreateInstance();
        value .Process(ref this);
        return value ;
    }
    
    public void Write<T>(T value) where T : IBinaryProcessable<T>
    {
        if (!typeof(T).IsValueType && value != null && value.GetType() != typeof(T))
            throw new ArgumentException($"Cannot process subclasses of a processable object directly. Consider configuring a '{typeof(MultiTypeProcessor<,>)}'.");
            
        value.Process(ref this);
    }
}

Then DTOs with inheritance can be implemented same as I have it now. Each type just needs to provide a factory that creates an empty instance to be used for deserialization. NRT warnings might still need to be suppressed if you want to provide an instance that doesn't have all the fields initialized (i.e. to avoid extraneous allocations that would need to be replaced during deserialization), but that's not terrible, at all, given the benefits. Subtypes can reuse the base type's processing logic without any fuss:

public class BaseDto : IBinaryProcessable<BaseDto>
{
    public int Value;
    
    static BaseDto IBinaryProcessable<BaseDto>.CreateInstance() => new BaseDto();
    
    void IBinaryProcessable.Process(ref BinaryProcessor processor) => Process(ref processor);
    
    protected virtual void Process(ref BinaryProcessor processor)
    {
        processor.Process(ref Value);
    }
}

public class DerivedDto : BaseDto, IBinaryProcessable<DerivedDto>
{
    public int Value2;
    
    static DerivedDto IBinaryProcessable<DerivedDto>.CreateInstance() => new DerivedDto();

    protected override void Process(ref BinaryProcessor processor)
    {
        base.Process(ref processor);
        processor.Process(ref Value2);
    }
}

var binaryProcessor = new BinaryProcessor();

var baseDto = binaryProcessor.Read<BaseDto>();
var derivedDto = binaryProcessor.Read<DerivedDto>();

binaryProcessor.Write<BaseDto>(derivedDto); // Exception, as expected

So I guess that's the new setup 😅

@mikernet
Copy link
Contributor Author

mikernet commented Feb 7, 2024

You can even do:

public interface IBinaryProcessable<T> : IBinaryProcessable where T : IBinaryProcessable<T>
{
    // Use this instead of catching MissingMethodException when trying to create an instance to avoid
    // catching MissingMethodExceptions that the constructor might throw.
    private static readonly bool _noDefaultCtor = !typeof(T).IsValueType && typeof(T).GetConstructor(Type.EmptyTypes) == null;

    public static virtual T CreateInstance()
    {
        if (typeof(T).IsValueType)
            return default!

        if (_noDefaultCtor)
            throw new BinaryProcessException($"Type '{typeof(T)}' does not have a public default constructor. Override '{typeof(IBinaryProcessable<T>)}.{nameof(CreateInstance)}()' to provide a factory.");

        return Activator.CreateInstance<T>();
    }
}

To avoid even more boilerplate if you just want to use a default ctor. Love it. Thanks for the ideas that led to this revelation 🥳

Co-authored-by: Jan Kotas <jkotas@microsoft.com>
Copy link
Member

@jkotas jkotas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

@jkotas jkotas merged commit 9f3a439 into dotnet:main Feb 8, 2024
@mikernet
Copy link
Contributor Author

mikernet commented Feb 8, 2024

@jkotas Somewhat related: Activator.CreateInstance<T>() has the potential for almost 2.5x speed improvement on CoreCLR using similar techniques to what I did here: https://github.com/Singulink/Singulink.Reflection.ObjectFactory#benchmarks

@mikernet mikernet deleted the feat-faster-uninit-objects branch February 8, 2024 14:14
@jkotas
Copy link
Member

jkotas commented Feb 8, 2024

Your library does not implement the exact same behavior as Activator.CreateInstance (e.g. it does not wrap the exceptions in TargetInvocationException, the methods that take arguments do not perform the same set of argument conversions).

I can believe that the perf of reflection APIs can be improved, but it is likely a lot less than what you think because of compatibility. We know that you can build much faster reflection APIs if you can change the behavior. In .NET 8, we have done that by introducing MethodInvoker and ConstructorInvoker that are designed for the dependency injection use case, but do not have the exact same behavior as the classic reflection APIs.

@mikernet
Copy link
Contributor Author

mikernet commented Feb 8, 2024

I was just talking about generic Activator.CreateInstance<T>() with no parameters. The API proposal mentioned above to create activator delegates by delegate signature (similar to my method) will have a lot of fans for that, I'm sure :) But yeah, I forgot about the exception wrapping. Going to try replicating that behavior just out of curiosity.

@mikernet
Copy link
Contributor Author

mikernet commented Feb 8, 2024

Yeah, adding a try/catch/throw completely eliminates any benefit, haha.

@mikernet
Copy link
Contributor Author

mikernet commented Feb 9, 2024

@jkotas

In .NET 8, we have done that by introducing MethodInvoker and ConstructorInvoker

Didn't know about that before you mentioned it. Wow does that ever make a big difference on AOT.

Before:

| ObjectPrivateCtor_ObjectFactory          |  77.0504 ns | 
| ObjectPrivateCtor_ObjectFactory_Delegate |  76.3781 ns |
| ObjectWithParam_ObjectFactory_Delegate   | 125.5081 ns |
| ObjectFactory_StructWithParams_Delegate  | 193.1942 ns |

After:

| ObjectPrivateCtor_ObjectFactory          |  24.5102 ns |
| ObjectPrivateCtor_ObjectFactory_Delegate |  24.0892 ns |
| ObjectWithParam_ObjectFactory_Delegate   |  32.2363 ns |
| StructWithParams_ObjectFactory_Delegate  |  48.9843 ns |

Thanks ❤️ v2.1 release incoming just 24h after v2.0 😅

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Reflection community-contribution Indicates that the PR has been added by a community member tenet-performance Performance related issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants