-
Notifications
You must be signed in to change notification settings - Fork 2.6k
[WIP] Further improve performance of Memory<T>.Span property getter #22829
Conversation
I'm not convinced it is. This is the only way today to use a pointer/length with Memory, and we're already seeing developers relying on that. It of course still might be worth a trade-off here, but we should still aim not to regress it. |
// it occupies the same position as String.Length and SzArray.Length; but because | ||
// this value is negative and those values are not, we can easily determine whether | ||
// any given Memory<T> is backed by a variable-length object (string, array) or | ||
// a fixed-length object (MemoryManager). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't the runtime decide to rearrange the fields in this class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we make it a single IntPtr field per your suggestion this should address the rearrangement concern within the base class. The runtime cannot rearrange fields between a base class and its derived types, so we're good there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still, this padding is penalizing MemoryManager
path to make the other path a bit faster. And it is pretty hacky change. I do not think this is a good tradeoff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had believed the most common usage of Memory<T>
was to be backed by String
or SzArray
. (In fact, I had believed this to be overwhelmingly so.) If this assumption was invalid, then yeah, we shouldn't take the bulk of this PR.
There is still one thing we can take advantage of if we wish: in 32-bit procs, String
and SzArray
have the exact same layout. This means we could condition the if (typeof(T) == typeof(char))
check on 64-bit only, which should give a perf boost to 32-bit procs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had believed the most common usage of Memory was to be backed by String or SzArray
I think that's true for workloads like those in ASP.NET. But I expect that won't be true for certain apps or domains, in particular those that interop with native memory as the lingua franca. The improvements cited for the strings/arrays cases appears to generally be < 10%, whereas the regression for the MemoryManager case looks to be closer to ~40%.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The improvements cited for the strings/arrays cases appears to generally be < 10%, whereas the regression for the MemoryManager case looks to be closer to ~40%.
There are three very specific improvements this PR provides when compared to baseline: (a) improving the performance for String/SzArray, (b) reducing the codegen size by around ~20%, and (c) reducing stack spillage in the calling method due to needing to have temporary Span instances on the stack.
The trade-off for gaining (b) and (c) was to introduce the indirection in the MemoryManager code path. (And remember: even with this regression we're still within 5% of 2.1 RTM since we already made substantial improvements in this code path between 2.1 and 3.0 preview earlier.)
It is possible to make this PR so that we give up (b) and (c). This will improve perf for all three cases: String / SzArray / and MemoryManager. The trade-off is that we reduce the codegen size by only an insignificant amount (~3 - 5%), and we'd still have the dummy -1
field in MemoryManager.
If this still isn't an appropriate trade-off, that's fine. We can just close the PR - no harm. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose I shouldn't have said "only an insignificant amount (~3 - 5%)" above. Since this property getter is marked with aggressive inlining it's not unreasonable to think it'd be expanded multiple times in the calling method, so even small improvements would add up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @GrabYourPitchforks. If I'm understanding you correctly, you're saying there's a version of this change that improves the throughout for strings and arrays equivalent to this one, improves rather than regresses the throughput for MemoryManager, and still shrinks code size some albeit not as much as in this change? That sounds better to me, personally.
private readonly int _dummy = -1; | ||
#if BIT64 | ||
private readonly int _dummyPadding = 0; | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than an ifdef with padding, could this just be a -1 IntPtr?
/// it means the caller only has to allocate space for a single Int32 on the stack. | ||
/// </summary> | ||
[MethodImpl(MethodImplOptions.NoInlining)] | ||
internal ByReference<T> GetSpanAndDeconstruct(out int spanLength) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not convinced it's ok to penalize the MemoryManager case in this way.
@jkotas @stephentoub Is it worthwhile for me to experiment with a change where we can make all three scenarios faster, but at the cost of increasing the size of Edit: Jan, you mentioned that smuggling a fake Length field through this type was hacky and you seemed unsettled by it. Is there something that would help ease your concerns here? It's also ok for us to say "no - everything is fine as is." :) |
I do not think that this optimization is worth looking into. I think there needs to be a ballance between performance and understandability/maintanability of the code. The fake length field does not strike the right ballance for me. |
Thanks both for the feedback! I'm going to go ahead and close this since the Length trick really was the primary mechanism to increase perf, and if that doesn't sit well with folks then this probably isn't worth pursuing further. |
Working through further experiments with minimizing the overhead of the
{ReadOnly}Memory<T>.Span
property getter. Trying to minimize the number of comparisons performed, minimize the stack space required, and utilize register allocations more effectively.Results from an x64 run are below. I didn't perform an x86 run yet.
Codegen for the x64 implementation of the
ReadOnlyMemory<char>.Span
property getter is also around 20% smaller: 118 bytes before, 95 bytes after.The regression in the from
MemoryManager<T>
runs isn't surprising due to the extra call now required here. This was a deliberate trade-off: ifMemoryManager<T>
usage is expected to be rare, deprioritize it and optimize the codegen for the more common cases. Since we already made improvements in this code path between 2.1 and 3.0 (see #20386), we're still within around 5% of the 2.1 performance numbers.Marked WIP because it's not fully implemented (haven't touched
Memory<T>.Span
yet) or tested.