Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve x86 inline object allocations #19514

Merged

Conversation

0xdaryl
Copy link
Contributor

@0xdaryl 0xdaryl commented May 17, 2024

  • Call NoZeroInit anewarray helper when appropriate on x86
  • Misc. x86 allocation path improvements
  • Remove obsolete TR_EnableNewAllocationProfiling code
  • Misc. x86 allocation path improvements
  • Cleanup x86 object initialization path
  • Move x86 inlined allocation verification out of line
  • Move off-heap dataAddr allocation into a separate function
  • Use ScratchRegisterManager in x86 inline allocations
  • Misc. code cleanup for readability
  • Remove allocated object cache line alignment code on Power and x86
  • Remove deprecated shouldAlignTLHAlloc() checks on TR::Nodes
  • Compare variable array sizes as 32-bits for x86 inline allocations
  • Add knobs to disable x86 object allocation features
  • Misc. readability and formatting improvements to x86 inline allocation
  • Do not perform zero initialization of TLH objects when batch clearing enabled

@0xdaryl
Copy link
Contributor Author

0xdaryl commented May 17, 2024

Jenkins test sanity.functional,sanity.openjdk xlinux,win jdk21

#ifdef J9VM_GC_NON_ZERO_TLH
// If we can skip zero init, and it is not outlined new, we use the new TLH
// same logic also appears later, but we need to do this before generate the helper call
//
if (node->canSkipZeroInitialization() && !comp->getOption(TR_DisableDualTLH) && !comp->getOptions()->realTimeGC())
if (node->canSkipZeroInitialization() && (enableTLHBatchClearing || !comp->getOption(TR_DisableDualTLH)) && !comp->getOptions()->realTimeGC())
Copy link
Contributor

@vijaysun-omr vijaysun-omr May 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does the code guarded by this condition only handle new and newarray ? For example, would we allocate anewarray and multianewarray differently somehow ? Maybe those don't reach this evaluator.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

multianewarray initialization doesn't reach here, but anewarray does. I'm not sure why it isn't included here (it never was based on the long history of this code), but Z and P both support it. I can add a case for that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 6ddd240

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay thanks.

Should the commit message say "Do not perform zero initialization of TLH objects when batch clearing enabled" ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, fixed.

@0xdaryl
Copy link
Contributor Author

0xdaryl commented May 21, 2024

Jenkins test sanity.functional,sanity.openjdk xlinux,win jdk21

@pshipton
Copy link
Member

I think this PR build filled the xlinux machines with cores and took them offline.

@0xdaryl 0xdaryl changed the title Improve x86 inline object allocations WIP: Improve x86 inline object allocations May 24, 2024
@0xdaryl 0xdaryl force-pushed the enablebatchclear2.noalign.cleanup branch from 248b639 to 37ada16 Compare May 24, 2024 14:53
@0xdaryl 0xdaryl force-pushed the enablebatchclear2.noalign.cleanup branch from 37ada16 to e532c6c Compare May 24, 2024 20:34
@0xdaryl 0xdaryl changed the title WIP: Improve x86 inline object allocations Improve x86 inline object allocations May 27, 2024
/**
* @param[in] eaxReal : address of the newly allocated object
* @param[in] nextTLHAllocReg : the new TLH alloc pointer after the object is allocated
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth clearly stating what the difference is with respect to genHeapAlloc and the conditions under which either one gets used ? Just so there is more clarity (in addition to the re-naming and doc you added already, that are all steps in the right direction) ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will rename genHeapAlloc to genHeapAllocForDiscontiguousArraysOrRealtime

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in latest force push.

@vijaysun-omr
Copy link
Contributor

I had reviewed this before, but added one more minor comment/request. I have also asked @ymanton to review this change and will consider a merge after you reply to the comment and Younes gives his review approval.

0xdaryl added 14 commits May 28, 2024 08:48
… enabled

Separates the notion of batch clearing from dual TLH on x86.  Assumes batch
clearing is enabled via the TR_EnableBatchClear environment variable.

Signed-off-by: Daryl Maier <maier@ca.ibm.com>
Signed-off-by: Daryl Maier <maier@ca.ibm.com>
* Add an environment variable TR_DisableAllocationAlignment that disables cache
  line alignment of newly allocated objects
* Guard inline TLH prefetching code with CodeGenerator TLH prefetch enablement check

Signed-off-by: Daryl Maier <maier@ca.ibm.com>
In some cases, 64-bit comparisons were used which were inadvertently forcing
the out-of-line allocation path to be taken.  This was still functionally
correct, but poor for performance.

Ensure a 32-bit comparison is used.

Signed-off-by: Daryl Maier <maier@ca.ibm.com>
No longer set nor used anywhere in OMR or OpenJ9.

Signed-off-by: Daryl Maier <maier@ca.ibm.com>
The code has been disabled on Power for many years.

On x86, its performance was evaluated recently on modern Intel architectures
and found to be detrimental ito performance on allocation-intensive workloads.

The code is no longer useful and adds to technical debt, so remove it.

Signed-off-by: Daryl Maier <maier@ca.ibm.com>
Signed-off-by: Daryl Maier <maier@ca.ibm.com>
Signed-off-by: Daryl Maier <maier@ca.ibm.com>
This improves readability and code density of the inline allocation sequence.

Signed-off-by: Daryl Maier <maier@ca.ibm.com>
This improves readability and code density.

Signed-off-by: Daryl Maier <maier@ca.ibm.com>
* Remove dead code
* Rename genZeroInitObject|genZeroInitObject2 to
  genZeroInitEntireObject|genZeroInitEntireObject2 for clarity of purpose

Signed-off-by: Daryl Maier <maier@ca.ibm.com>
* Create a convenience function insertAllocationPrefetch() for inserting prefetches
* Rename genHeapAlloc2 to genHeapAllocNoArraylets
* Rename genHeapAlloc to genHeapAllocForDiscontiguousArraysOrRealtime
* Rename some variables for clarity
* Misc dead code removal and reformatting

Signed-off-by: Daryl Maier <maier@ca.ibm.com>
Signed-off-by: Daryl Maier <maier@ca.ibm.com>
* Distinguish between "arraylet" and "hybrid arraylet" in naming where appropriate
* Various refactorings for readability and code density
* Eliminate obsolete comments

Signed-off-by: Daryl Maier <maier@ca.ibm.com>
@0xdaryl 0xdaryl force-pushed the enablebatchclear2.noalign.cleanup branch from e532c6c to c098ff4 Compare May 28, 2024 12:49
@0xdaryl
Copy link
Contributor Author

0xdaryl commented May 28, 2024

Jenkins compile xlinux jdk17

zeroInitScratchReg, cg);
generateRegImmInstruction(TR::InstOpCode::ADD8RegImms, node, segmentReg, 8, cg);
generateRegImmInstruction(TR::InstOpCode::SUB8RegImms, node, numBytesToZeroInitReg, 8, cg);
generateRegImmInstruction(TR::InstOpCode::CMP8RegImms, node, numBytesToZeroInitReg, 0, cg);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the cmp necessary here or does the sub update the flags already?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for this particular branch (JG, which does not rely on the carry flag) the CMP to 0 can be eliminated because the relevant flags will have been set up already by the SUB. I will change it, but I want to re-run the tests (internally) though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing this change internally on x86-64 Linux with sanity and openjdk was successful. I've amended and rebased the commits.

Avoid REP STOS zero initialization for arrays whose length is below a prescribed
threshold checked at runtime.  Use faster GPR stores instead.  Move REP STOS
initialization out of line.

Signed-off-by: Daryl Maier <maier@ca.ibm.com>
If an anewarray allocation must be performed out-of-line, be sure
to call the NoZeroInit version if zero initialization can be avoided.

Signed-off-by: Daryl Maier <maier@ca.ibm.com>
@0xdaryl 0xdaryl force-pushed the enablebatchclear2.noalign.cleanup branch from c098ff4 to b767300 Compare May 29, 2024 17:45
@vijaysun-omr
Copy link
Contributor

Jenkins test sanity.functional,sanity.openjdk xlinux,win jdk21

@vijaysun-omr
Copy link
Contributor

I'll wait for @ymanton to formally approve this, and the checks to pass, before I merge.

Copy link
Member

@ymanton ymanton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@vijaysun-omr vijaysun-omr merged commit 43bc29c into eclipse-openj9:master May 30, 2024
9 checks passed
@vijaysun-omr
Copy link
Contributor

Since we want to keep all codegens aware of key changes in one, I thought I'd mention @zl-wang @r30shah and @knn-k here so that they can consider whatever feels relevant to their platforms

@r30shah
Copy link
Contributor

r30shah commented May 30, 2024

Going to take a look at this today. Tagging @ehsankianifar as he is working on fixing the issue we are seeing on Z.

@0xdaryl
Copy link
Contributor Author

0xdaryl commented May 30, 2024

The bulk of the work in this PR addresses technical debt issues, as this code became very difficult to understand and make changes to over time. The code was refactored, dead code eliminated, comments added, more meaningful variable and function names created, etc. Hopefully it is in a better state now.

In terms of pure enhancements, the main contribution was to improve the performance of zero initializing shorter arrays with better instruction sequences, and providing knobs to turn off features like TLH prefetching.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants