-
Notifications
You must be signed in to change notification settings - Fork 720
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve x86 inline object allocations #19514
Improve x86 inline object allocations #19514
Conversation
0xdaryl
commented
May 17, 2024
•
edited
Loading
edited
- Call NoZeroInit anewarray helper when appropriate on x86
- Misc. x86 allocation path improvements
- Remove obsolete TR_EnableNewAllocationProfiling code
- Misc. x86 allocation path improvements
- Cleanup x86 object initialization path
- Move x86 inlined allocation verification out of line
- Move off-heap dataAddr allocation into a separate function
- Use ScratchRegisterManager in x86 inline allocations
- Misc. code cleanup for readability
- Remove allocated object cache line alignment code on Power and x86
- Remove deprecated shouldAlignTLHAlloc() checks on TR::Nodes
- Compare variable array sizes as 32-bits for x86 inline allocations
- Add knobs to disable x86 object allocation features
- Misc. readability and formatting improvements to x86 inline allocation
- Do not perform zero initialization of TLH objects when batch clearing enabled
cad8c21
to
42f5414
Compare
Jenkins test sanity.functional,sanity.openjdk xlinux,win jdk21 |
#ifdef J9VM_GC_NON_ZERO_TLH | ||
// If we can skip zero init, and it is not outlined new, we use the new TLH | ||
// same logic also appears later, but we need to do this before generate the helper call | ||
// | ||
if (node->canSkipZeroInitialization() && !comp->getOption(TR_DisableDualTLH) && !comp->getOptions()->realTimeGC()) | ||
if (node->canSkipZeroInitialization() && (enableTLHBatchClearing || !comp->getOption(TR_DisableDualTLH)) && !comp->getOptions()->realTimeGC()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does the code guarded by this condition only handle new
and newarray
? For example, would we allocate anewarray
and multianewarray
differently somehow ? Maybe those don't reach this evaluator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
multianewarray initialization doesn't reach here, but anewarray does. I'm not sure why it isn't included here (it never was based on the long history of this code), but Z and P both support it. I can add a case for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 6ddd240
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay thanks.
Should the commit message say "Do not perform zero initialization of TLH objects when batch clearing enabled" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, fixed.
Jenkins test sanity.functional,sanity.openjdk xlinux,win jdk21 |
I think this PR build filled the xlinux machines with cores and took them offline. |
248b639
to
37ada16
Compare
37ada16
to
e532c6c
Compare
/** | ||
* @param[in] eaxReal : address of the newly allocated object | ||
* @param[in] nextTLHAllocReg : the new TLH alloc pointer after the object is allocated | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it worth clearly stating what the difference is with respect to genHeapAlloc
and the conditions under which either one gets used ? Just so there is more clarity (in addition to the re-naming and doc you added already, that are all steps in the right direction) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will rename genHeapAlloc
to genHeapAllocForDiscontiguousArraysOrRealtime
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in latest force push.
I had reviewed this before, but added one more minor comment/request. I have also asked @ymanton to review this change and will consider a merge after you reply to the comment and Younes gives his review approval. |
… enabled Separates the notion of batch clearing from dual TLH on x86. Assumes batch clearing is enabled via the TR_EnableBatchClear environment variable. Signed-off-by: Daryl Maier <maier@ca.ibm.com>
Signed-off-by: Daryl Maier <maier@ca.ibm.com>
* Add an environment variable TR_DisableAllocationAlignment that disables cache line alignment of newly allocated objects * Guard inline TLH prefetching code with CodeGenerator TLH prefetch enablement check Signed-off-by: Daryl Maier <maier@ca.ibm.com>
In some cases, 64-bit comparisons were used which were inadvertently forcing the out-of-line allocation path to be taken. This was still functionally correct, but poor for performance. Ensure a 32-bit comparison is used. Signed-off-by: Daryl Maier <maier@ca.ibm.com>
No longer set nor used anywhere in OMR or OpenJ9. Signed-off-by: Daryl Maier <maier@ca.ibm.com>
The code has been disabled on Power for many years. On x86, its performance was evaluated recently on modern Intel architectures and found to be detrimental ito performance on allocation-intensive workloads. The code is no longer useful and adds to technical debt, so remove it. Signed-off-by: Daryl Maier <maier@ca.ibm.com>
Signed-off-by: Daryl Maier <maier@ca.ibm.com>
Signed-off-by: Daryl Maier <maier@ca.ibm.com>
This improves readability and code density of the inline allocation sequence. Signed-off-by: Daryl Maier <maier@ca.ibm.com>
This improves readability and code density. Signed-off-by: Daryl Maier <maier@ca.ibm.com>
* Remove dead code * Rename genZeroInitObject|genZeroInitObject2 to genZeroInitEntireObject|genZeroInitEntireObject2 for clarity of purpose Signed-off-by: Daryl Maier <maier@ca.ibm.com>
* Create a convenience function insertAllocationPrefetch() for inserting prefetches * Rename genHeapAlloc2 to genHeapAllocNoArraylets * Rename genHeapAlloc to genHeapAllocForDiscontiguousArraysOrRealtime * Rename some variables for clarity * Misc dead code removal and reformatting Signed-off-by: Daryl Maier <maier@ca.ibm.com>
Signed-off-by: Daryl Maier <maier@ca.ibm.com>
* Distinguish between "arraylet" and "hybrid arraylet" in naming where appropriate * Various refactorings for readability and code density * Eliminate obsolete comments Signed-off-by: Daryl Maier <maier@ca.ibm.com>
e532c6c
to
c098ff4
Compare
Jenkins compile xlinux jdk17 |
zeroInitScratchReg, cg); | ||
generateRegImmInstruction(TR::InstOpCode::ADD8RegImms, node, segmentReg, 8, cg); | ||
generateRegImmInstruction(TR::InstOpCode::SUB8RegImms, node, numBytesToZeroInitReg, 8, cg); | ||
generateRegImmInstruction(TR::InstOpCode::CMP8RegImms, node, numBytesToZeroInitReg, 0, cg); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the cmp necessary here or does the sub update the flags already?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think for this particular branch (JG, which does not rely on the carry flag) the CMP to 0 can be eliminated because the relevant flags will have been set up already by the SUB. I will change it, but I want to re-run the tests (internally) though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Testing this change internally on x86-64 Linux with sanity and openjdk was successful. I've amended and rebased the commits.
Avoid REP STOS zero initialization for arrays whose length is below a prescribed threshold checked at runtime. Use faster GPR stores instead. Move REP STOS initialization out of line. Signed-off-by: Daryl Maier <maier@ca.ibm.com>
If an anewarray allocation must be performed out-of-line, be sure to call the NoZeroInit version if zero initialization can be avoided. Signed-off-by: Daryl Maier <maier@ca.ibm.com>
c098ff4
to
b767300
Compare
Jenkins test sanity.functional,sanity.openjdk xlinux,win jdk21 |
I'll wait for @ymanton to formally approve this, and the checks to pass, before I merge. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Going to take a look at this today. Tagging @ehsankianifar as he is working on fixing the issue we are seeing on Z. |
The bulk of the work in this PR addresses technical debt issues, as this code became very difficult to understand and make changes to over time. The code was refactored, dead code eliminated, comments added, more meaningful variable and function names created, etc. Hopefully it is in a better state now. In terms of pure enhancements, the main contribution was to improve the performance of zero initializing shorter arrays with better instruction sequences, and providing knobs to turn off features like TLH prefetching. |