Write barrier optimizations for ARM64 Windows #22003
This looks correct to me. Nice perf wins discovered too.
It looks like you changed from two compare/branches to two compares and one branch, not the other way round...
@AndyAyersMS Sorry for the confusion. My comment was in reference to the ways my change diverges from the optimizations done on ARM64 Unix. With that in mind, the cmp+ccmp in The checked write barrier, where the cmp+ccmp thing was also done, has not been modified in this way. It will be interesting to get data about whether the address tends to be within the bounds of the heap, but until then I'll leave it alone.
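For readers following the compare/branch discussion, here is a rough C model of the two shapes of range check being compared. It is only a sketch: `heap_range`, `in_range_two_branches`, `in_range_one_branch`, and `check_agree` are hypothetical names invented for illustration, not identifiers from the barrier code, and a C compiler is free to emit either instruction sequence for either function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the heap bounds the barrier consults. */
typedef struct {
    uintptr_t low;   /* inclusive lower bound */
    uintptr_t high;  /* exclusive upper bound */
} heap_range;

/* Two compares, two branches: leave as soon as either test fails. */
static bool in_range_two_branches(const heap_range *r, uintptr_t p)
{
    if (p < r->low)
        return false;
    if (p >= r->high)
        return false;
    return true;
}

/* Two compares, one branch: both comparisons are evaluated and their
 * results combined, which is roughly the shape that cmp + ccmp followed
 * by a single conditional branch expresses in ARM64 assembly. */
static bool in_range_one_branch(const heap_range *r, uintptr_t p)
{
    return (bool)((p >= r->low) & (p < r->high));
}

/* Sanity helper: the two forms must agree for any address. */
static void check_agree(const heap_range *r, uintptr_t p)
{
    assert(in_range_two_branches(r, p) == in_range_one_branch(r, p));
}
```

Which form is faster likely depends on how often the address falls inside the range, which is the data the comment above says would be interesting to collect.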
LGTM Thanks
@dotnet-bot test Windows_NT arm64 Cross Checked Innerloop Build and Test
LGTM, thank you!
@@ -366,57 +466,58 @@ NotInHeap
     ; if ([x14] == x15) goto end
     ldr x13, [x14]
     cmp x13, x15
-    beq shadowupdateend
+    beq ShadowUpdateEnd
A nit - can you please fix alignment of the label?
@dotnet-bot test Windows_NT arm64 Cross Checked Innerloop Build and Test
…rrier_updates_arm64 Write barrier optimizations for ARM64 Windows Commit migrated from dotnet/coreclr@9fb7676
This change is a step towards unification of the ARM64 write barrier logic between Windows and Unix. It brings over some of the changes that were done for Unix in #12334 such as using a literal pool to hold heap location/geometry information used in the barriers.
Parts of the code have been tweaked in pursuit of performance gains.
Sampling a write barrier-heavy test after these changes shows a ~7-12% decrease in the time spent in the barrier relative to the current, post-optimization version that's in use on Unix today.
Once this is in, I plan to port the deltas over to Unix so that the barriers will be in sync. I left the CLR writewatch and manually managed card bundles stuff alone on Windows for now since it's not enabled yet, but I'll likely experiment with those in the near future and check in the remaining pieces after doing so.