Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

[WIP] Further improve performance of Memory<T>.Span property getter #22829

Closed
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
32 changes: 32 additions & 0 deletions src/System.Private.CoreLib/shared/System/Buffers/MemoryManager.cs
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
// See the LICENSE file in the project root for more information.

using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

namespace System.Buffers
{
Expand All @@ -11,6 +12,19 @@ namespace System.Buffers
/// </summary>
public abstract class MemoryManager<T> : IMemoryOwner<T>, IPinnable
{
#pragma warning disable CS0414
// It's imperative that this be the first field in the MemoryManager<T> instance
// and that it have a negative value. Because it's the first field in the object,
// it occupies the same position as String.Length and SzArray.Length; but because
// this value is negative and those values are not, we can easily determine whether
// any given Memory<T> is backed by a variable-length object (string, array) or
// a fixed-length object (MemoryManager).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't the runtime decide to rearrange the fields in this class?

Copy link
Member Author

@GrabYourPitchforks GrabYourPitchforks Feb 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we make it a single IntPtr field per your suggestion this should address the rearrangement concern within the base class. The runtime cannot rearrange fields between a base class and its derived types, so we're good there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still, this padding is penalizing MemoryManager path to make the other path a bit faster. And it is pretty hacky change. I do not think this is a good tradeoff.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had believed the most common usage of Memory<T> was to be backed by String or SzArray. (In fact, I had believed this to be overwhelmingly so.) If this assumption was invalid, then yeah, we shouldn't take the bulk of this PR.

There is still one thing we can take advantage of if we wish: in 32-bit procs, String and SzArray have the exact same layout. This means we could condition the if (typeof(T) == typeof(char)) check on 64-bit only, which should give a perf boost to 32-bit procs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had believed the most common usage of Memory was to be backed by String or SzArray

I think that's true for workloads like those in ASP.NET. But I expect that won't be true for certain apps or domains, in particular those that interop with native memory as the lingua franca. The improvements cited for the strings/arrays cases appears to generally be < 10%, whereas the regression for the MemoryManager case looks to be closer to ~40%.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The improvements cited for the strings/arrays cases appears to generally be < 10%, whereas the regression for the MemoryManager case looks to be closer to ~40%.

There are three very specific improvements this PR provides when compared to baseline: (a) improving the performance for String/SzArray, (b) reducing the codegen size by around ~20%, and (c) reducing stack spillage in the calling method due to needing to have temporary Span instances on the stack.

The trade-off for gaining (b) and (c) was to introduce the indirection in the MemoryManager code path. (And remember: even with this regression we're still within 5% of 2.1 RTM since we already made substantial improvements in this code path between 2.1 and 3.0 preview earlier.)

It is possible to make this PR so that we give up (b) and (c). This will improve perf for all three cases: String / SzArray / and MemoryManager. The trade-off is that we reduce the codegen size by only an insignificant amount (~3 - 5%), and we'd still have the dummy -1 field in MemoryManager.

If this still isn't an appropriate trade-off, that's fine. We can just close the PR - no harm. :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose I shouldn't have said "only an insignificant amount (~3 - 5%)" above. Since this property getter is marked with aggressive inlining it's not unreasonable to think it'd be expanded multiple times in the calling method, so even small improvements would add up.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @GrabYourPitchforks. If I'm understanding you correctly, you're saying there's a version of this change that improves the throughout for strings and arrays equivalent to this one, improves rather than regresses the throughput for MemoryManager, and still shrinks code size some albeit not as much as in this change? That sounds better to me, personally.

private readonly int _dummy = -1;
#if BIT64
private readonly int _dummyPadding = 0;
#endif
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than an ifdef with padding, could this just be a -1 IntPtr?

#pragma warning restore CS0414

/// <summary>
/// Returns a <see cref="System.Memory{T}"/>.
/// </summary>
Expand All @@ -21,6 +35,24 @@ public abstract class MemoryManager<T> : IMemoryOwner<T>, IPinnable
/// </summary>
public abstract Span<T> GetSpan();

/// <summary>
/// A special helper method which serves as a stub for the call to GetSpan(). It's
/// non-virtual and forbids inlining to keep the call site as compact as possible.
/// Additionally, since the span is deconstructed rather than returned by value,
/// it means the caller only has to allocate space for a single Int32 on the stack.
/// </summary>
[MethodImpl(MethodImplOptions.NoInlining)]
internal ByReference<T> GetSpanAndDeconstruct(out int spanLength)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced it's ok to penalize the MemoryManager case in this way.

{
// Returns ByReference<T> because the C# compiler sees the 'out' parameter and
// reasons that returning 'ref T' might lead to a reference escaping a local
// scope. The ByReference<T> return type prevents this false positive error.

Span<T> span = GetSpan();
spanLength = span.Length;
return new ByReference<T>(ref MemoryMarshal.GetReference(span));
}

/// <summary>
/// Returns a handle to the memory that has been pinned and hence its address can be taken.
/// </summary>
Expand Down
115 changes: 89 additions & 26 deletions src/System.Private.CoreLib/shared/System/ReadOnlyMemory.cs
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,14 @@

using Internal.Runtime.CompilerServices;

#if BIT64
using nuint = System.UInt64;
using nint = System.Int64;
#else
using nuint = System.UInt32;
using nint = System.Int32;
#endif

namespace System
{
/// <summary>
Expand Down Expand Up @@ -225,34 +233,84 @@ public unsafe ReadOnlySpan<T> Span
get
{
ref T refToReturn = ref Unsafe.AsRef<T>(null);
int lengthOfUnderlyingSpan = 0;
uint desiredLength = 0;

// Copy this field into a local so that it can't change out from under us mid-operation.

object tmpObject = _object;

if (tmpObject != null)
{
if (typeof(T) == typeof(char) && tmpObject.GetType() == typeof(string))
uint lengthOfUnderlyingSpan;

// Our underlying object can be a string (only when T = char), an SzArray U[] where
// T = U or T and U are convertible (e.g., int[] <-> uint[]), or a MemoryManager<T>.
// On 32-bit platforms, string and array have the exact same shape: length immediately
// followed by the string / array data. On 64-bit platforms, arrays have one extra
// 32-bit padding value between the length field and the start of the data. This means
// that we only need to special-case string when T is char _and_ we're running 64-bit.
//
// We're going to optimistically treat the underlying object as a string or an array
// first, since it's almost always the case that this reflects reality anyway.

if (IntPtr.Size == 8 && typeof(T) == typeof(char))
{
// Special-case string since it's the most common for ROM<char>.
// Optimistically assume it's a string for now. If this isn't the case, 'refToReturn'
// will point to the 32-bit padding field in the SzArray or MemoryManager<T>.

refToReturn = ref Unsafe.As<char, T>(ref Unsafe.As<string>(tmpObject).GetRawStringData());
lengthOfUnderlyingSpan = Unsafe.As<string>(tmpObject).Length;
}
else if (RuntimeHelpers.ObjectHasComponentSize(tmpObject))
{
// We know the object is not null, it's not a string, and it is variable-length. The only
// remaining option is for it to be a T[] (or a U[] which is blittable to T[], like int[]
// and uint[]). Otherwise somebody used private reflection to set this field, and we're not
// too worried about type safety violations at this point.

// 'tmpObject is T[]' below also handles things like int[] <-> uint[] being convertible
Debug.Assert(tmpObject is T[]);
// Read the length field of the string or SzArray (will be non-negative) or the
// _dummy field of the MemoryManager<T> (will be negative). We do it via the
// GetRawData() extension method rather than unsafe casting to string or SzArray
// and accessing its Length property because the JIT might try to out-clever us
// and assume Length can never return a non-negative value, which would break
// the checks later in this method.

lengthOfUnderlyingSpan = Unsafe.As<byte, uint>(ref tmpObject.GetRawData());

// If our initial guess that this was a string is correct, skip straight to the
// error checking at the end of the method.

if (tmpObject.GetType() == typeof(string))
{
goto CheckOffsetsAndReturn;
}

refToReturn = ref Unsafe.As<byte, T>(ref Unsafe.As<T[]>(tmpObject).GetRawSzArrayData());
lengthOfUnderlyingSpan = Unsafe.As<T[]>(tmpObject).Length;
// Otherwise, assume it's an SzArray instead of a string, and skip over the
// 32-bit padding. If our guess is still wrong and the underlying object is
// actually a MemoryManager<T>, this will point to the first field of the derived
// MemoryManager<T> instance, or it will point just past the end of the object
// if no fields are present in derived types. In either case the GC tracks it properly.

refToReturn = ref Unsafe.AddByteOffset(ref refToReturn, 4);

#if DEBUG
// Make sure the layout of string / SzArray didn't change on us.

Debug.Assert(
Unsafe.AreSame(
ref Unsafe.As<T, byte>(ref refToReturn),
ref Unsafe.As<Array>(tmpObject).GetRawSzArrayData()),
"We miscalculated where the start of data of the SzArray is.");
#endif
}
else
{
// T != char so we don't need to check for string, _or_ we're in a 32-bit proc
// where string and SzArray have the same layout. In either case optimistically
// assume the underlying object is SzArray. See comments above for why code is
// written this way.

refToReturn = ref Unsafe.As<byte, T>(ref Unsafe.As<Array>(tmpObject).GetRawSzArrayData());
lengthOfUnderlyingSpan = Unsafe.As<byte, uint>(ref tmpObject.GetRawData());
}

// If the underlying object really was a MemoryManager<T>, the "length" we read earlier
// will actually read the _dummy field on the base class, which is set to a negative
// value. Since string and SzArray can't have a negative length, this serves as a very
// inexpensive check to see if a MemoryManager<T> is in use.

if ((int)lengthOfUnderlyingSpan < 0)
{
// We know the object is not null, and it's not variable-length, so it must be a MemoryManager<T>.
// Otherwise somebody used private reflection to set this field, and we're not too worried about
Expand All @@ -261,23 +319,29 @@ public unsafe ReadOnlySpan<T> Span
// constructor or other public API which would allow such a conversion.

Debug.Assert(tmpObject is MemoryManager<T>);
Span<T> memoryManagerSpan = Unsafe.As<MemoryManager<T>>(tmpObject).GetSpan();
refToReturn = ref MemoryMarshal.GetReference(memoryManagerSpan);
lengthOfUnderlyingSpan = memoryManagerSpan.Length;
refToReturn = ref Unsafe.As<MemoryManager<T>>(tmpObject).GetSpanAndDeconstruct(out int tempLengthOfUnderlyingSpan).Value;
lengthOfUnderlyingSpan = (uint)tempLengthOfUnderlyingSpan;
}

CheckOffsetsAndReturn:

// If the Memory<T> or ReadOnlyMemory<T> instance is torn, this property getter has undefined behavior.
// We try to detect this condition and throw an exception, but it's possible that a torn struct might
// appear to us to be valid, and we'll return an undesired span. Such a span is always guaranteed at
// least to be in-bounds when compared with the original Memory<T> instance, so using the span won't
// AV the process.

int desiredStartIndex = _index & RemoveFlagsBitMask;
int desiredLength = _length;
nuint desiredStartIndex = (uint)(_index & RemoveFlagsBitMask);
desiredLength = (uint)_length;

#if BIT64
// See comment in Span<T>.Slice for how this works.
if ((ulong)(uint)desiredStartIndex + (ulong)(uint)desiredLength > (ulong)(uint)lengthOfUnderlyingSpan)
// This is a modified version of the code in Span<T>.Slice to check for out-of-bounds access.
// Since we know all inputs are at most UInt32.MaxValue, signed arithmetic over the Int64
// domain will work since we can't integer overflow or underflow. This allows us to save a
// register compared to the normal Span<T>.Slice logic, which requires a temporary register
// to hold the result of the addition.

if ((nint)desiredStartIndex > (nint)(nuint)lengthOfUnderlyingSpan - (nint)(nuint)desiredLength)
{
ThrowHelper.ThrowArgumentOutOfRangeException();
}
Expand All @@ -288,11 +352,10 @@ public unsafe ReadOnlySpan<T> Span
}
#endif

refToReturn = ref Unsafe.Add(ref refToReturn, desiredStartIndex);
lengthOfUnderlyingSpan = desiredLength;
refToReturn = ref Unsafe.Add(ref refToReturn, (IntPtr)(void*)desiredStartIndex);
}

return new ReadOnlySpan<T>(ref refToReturn, lengthOfUnderlyingSpan);
return new ReadOnlySpan<T>(ref refToReturn, (int)desiredLength);
}
}

Expand Down