-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PGO: Profile Buffer.Memmove's length #96311
Conversation
…Tests.WaitAsyncChain (dotnet#96187)" This reverts commit 476a455.
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch Issue DetailsCloses #84192 This PR adds a new generic way to profile integer values for various needs and uses it to profile [MethodImpl(MethodImplOptions.NoInlining)]
void Test(ReadOnlySpan<int> src, Span<int> dst) => src.CopyTo(dst); Current Tier1 codegen: ; Assembly listing for method Program.Test
sub rsp, 40
mov rax, bword ptr [rcx]
mov r8d, dword ptr [rcx+0x08]
mov rcx, bword ptr [rdx]
mov edx, dword ptr [rdx+0x08]
cmp r8d, edx
ja SHORT G_M34050_IG04
shl r8, 2
mov rdx, rax
call [System.Buffer:Memmove(byref,byref,ulong)]
nop
add rsp, 40
ret
G_M34050_IG04: ;; offset=0x0029
call [System.ThrowHelper:ThrowArgumentException_DestinationTooShort()]
int3 New codegen if this method is mostly invoked for ; Assembly listing for method Program.Test
sub rsp, 40
vzeroupper
mov rax, bword ptr [rcx]
mov ecx, dword ptr [rcx+0x08]
mov r10, bword ptr [rdx]
mov edx, dword ptr [rdx+0x08]
cmp ecx, edx
ja SHORT G_M34050_IG06
mov r8d, ecx
shl r8, 2
cmp r8, 40
jne SHORT G_M34050_IG05
;;
;; unroll to perform 2 SIMD moves for length = 40
vmovdqu ymm0, ymmword ptr [rax]
vmovdqu xmm1, xmmword ptr [rax+0x18]
vmovdqu ymmword ptr [r10], ymm0
vmovdqu xmmword ptr [r10+0x18], xmm1
G_M34050_IG04:
add rsp, 40
ret
G_M34050_IG05:
mov rcx, r10
mov rdx, rax
call [System.Buffer:Memmove(byref,byref,ulong)]
jmp SHORT G_M34050_IG04
G_M34050_IG06:
call [System.ThrowHelper:ThrowArgumentException_DestinationTooShort()]
int3 Open questions/items:
The current impl is a bit fragile due to lack of context-sensitive profiling but it's not a subject to change as part of this PR. The newly added infra can be used for other things we might want to profile in tier0.
|
ffd2fdf
to
451b138
Compare
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
/azp run runtime-coreclr pgo, runtime-coreclr libraries-pgo, runtime-coreclr pgostress |
Azure Pipelines successfully started running 3 pipeline(s). |
PTAL @AndyAyersMS @dotnet/jit-contrib when you have time I didn't change the PGO format (I am re-using type-histogram). I am using |
/azp run runtime-coreclr pgo, runtime-coreclr libraries-pgo, runtime-coreclr pgostress |
Azure Pipelines successfully started running 3 pipeline(s). |
ValueHistogramIntCount = (DescriptorMin * 8) | FourByte | AlignPointer, | ||
ValueHistogramLongCount = (DescriptorMin * 8) | EightByte, | ||
ValueHistogram = (DescriptorMin * 9) | EightByte, | ||
GetLikelyValue = (DescriptorMin * 10) | EightByte, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These need proper handling in dotnet-pgo and crossgen2. Also, GetLikelyValue
should either be removed or should be computed by crossgen2.
cc @davidwrighton, this is once again going to break backwards compatibility of a .mibc produced by a new dotnet-pgo and consumed by an old crossgen2. Do we need to do anything special here? Also, do we need to bump R2R minor version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jakobbotsch I've removed the compressed GetLikelyValue
for now, will implement it separately.
What kind of additional handling for dotnet-pgo and crossgen2 we talk about besides the compressed one?
I've validated that dotnet-pgo works properly, e.g.:
(dotnet-pgo dump mibc)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if you plan to implement it separately, it makes sense to add it now (and implementing the minimum format-handling stuff for it) so we don't need to update the format twice.
What kind of additional handling for dotnet-pgo and crossgen2 we talk about besides the compressed one?
Seems like PgoFormat.Merge
needs to be updated at least. Looks like the encoder/decoder (EncodePgoData
/ParsePgoData
) might be ok since this reuses EightByte
. That's the one where backwards compatibility was broken when I added MethodHandle
last time, so maybe this change doesn't break backwards compatibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I put my request changes in to cover this sort of thing. I think you might be OK, but what you need to do is validate that the new instrumentation doesn't cause problems with old dotnet-pgo/crossgen2. If it does, we can have another discussion, but the first thing to see is if it doesn't. The MIBC format is designed to allow new things to appear, but this only works in a subset of ways that it is possible to add new content to the file, and @jakobbotsch didn't follow those restrictions with the last PGO data format change he made. (Honestly, though, that was my fault, as I'd forgotten about them.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, and when you do this validation, please write down what scenarios you tested. I'll probably want to add a few weird wrinkles to it all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated dotnet-pgo (merge and dump) and SPMI to be aware of value probes but it looked like they were fine as is too (didn't crash).
also, I've done some testing that @davidwrighton requested and published my results and steps via mail
Co-authored-by: Jakob Botsch Nielsen <Jakob.botsch.nielsen@gmail.com>
… into generic-probing-memmove
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just two minor nits on comments/dump text.
You need to change the JIT-EE GUID. Do this need SPMI changes? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please do some verification to find out if this sort of profiling data breaks old copies of dotnet-pgo. We have customers who do not always use the latest dotnet-pgo on their data, and it would be good to be compatible.
Co-authored-by: Andy Ayers <andya@microsoft.com>
Yep, made changes in b06db31 and updated the guid. |
Addressed internally |
Closes #84192
This PR adds a new generic way to profile integer values for various needs and, as an example, uses it to profile
Buffer.Memmove
's size in order to unroll it for the most popular size, example:Buffer.Memmove(dst, src, len); // non-constant len
to:
if Instrumented Tier0 says that
len
is usually 42.Codegen example:
Current Tier1 codegen:
New codegen if this method is invoked for
src
being 10 integers (10*4=40 bytes) according to PGO data:Benchmark:
Sadly, for now it requires R2R=0 to work properly, because:
I have some workarounds for it in mind for future PRs.
SpanHelpers.SequenceEqual
(to unroll fast path)String.Equals
(to unroll fast path)a / b (divisor's value)