Checklist of features for new JIT backends #5455

Open
0xdaryl opened this issue Apr 11, 2019 · 74 comments
@0xdaryl
Contributor

0xdaryl commented Apr 11, 2019

I'm compiling a list of features that a new OpenJ9 JIT backend would need to implement to be reasonably "feature complete" with the others. These features are typically performance features above and beyond a basic, functional OpenJ9 JIT.

This could also serve a secondary purpose as a cross checklist for existing platforms to determine if they are missing any opportunities.

At the moment, I am particularly interested in the work that has been done to support features in Java 9, 11, 12, ... to be sure we don't miss any of that work in the AArch64 implementation.

This is a bit of a brain dump, but I hope to provide some structure to it once everyone has provided their input. I would appreciate it if those that are familiar with a particular backend could review this list and add anything you think is relevant. You don't necessarily have to go into great detail here: it will either be enough for me to track down the feature myself, or I can ask you about it.

FYI @andrewcraik @fjeremic @gita-omr, but input from anyone is welcome.

  • software concurrent scavenge

  • constant dynamic

  • nestmates

  • read barriers

  • write barriers

  • field watch

  • JNI dispatch

  • lock reservation

  • recompilation

  • on-stack replacement? (@andrewcraik)

  • AOT / SVM? (@dsouzai)

  • per code-cache helpers

  • JSR292?

  • DLT?

    • no specific support seems needed, but DLT symbols are excluded in some situations
  • JProfiling?

  • J9-specific IL opcodes, including:

    • ArrayStoreCHK
    • ArrayCHK
    • SpineCHKs and variants, arraylets in general
    • loadFence, storeFence, fullFence
  • Platform-specific inlining

    • unsafe natives
    • juc methods
    • optimized String.hashCode
    • String.indexOf
    • String case conversions (UTF16 & Latin1)
    • String compression
    • currentTimeMillis/nanoTime inlining
  • inlined helpers

    • monitor enter/exit
    • checkcast
    • instanceof
    • new/newarray/anewarray
  • implicit NULLCHK (via signal handler)

  • implicit DIVCHK (via signal handler)

  • transactional memory (tstart/tfinish/tabort/tcommit)

  • what is "asyncCheckGCMapPatching" @0dvictor ?

@DanHeidinga
Member

nestmates

This should be handled by the VM's resolve helpers. I'm not aware of any JIT surface to this feature.

VarHandles

New in Java 9, similar to JSR 292's MethodHandles but for field access under different modes.
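To make "field access under different modes" concrete, here is a minimal sketch of one field accessed through a single VarHandle in several modes. The class and field names are made up for illustration; only the `java.lang.invoke` calls are real API.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

final class VarHandleModes {
    // Hypothetical field, accessed only through the VarHandle below.
    int counter;

    static final VarHandle COUNTER;
    static {
        try {
            COUNTER = MethodHandles.lookup()
                    .findVarHandle(VarHandleModes.class, "counter", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Exercises several access modes on the same field and returns the final value.
    static int demo() {
        VarHandleModes obj = new VarHandleModes();
        COUNTER.set(obj, 1);                       // plain write
        COUNTER.setRelease(obj, 2);                // release write
        int seen = (int) COUNTER.getAcquire(obj);  // acquire read, observes 2
        COUNTER.compareAndSet(obj, seen, 10);      // atomic compare-and-set, 2 -> 10
        COUNTER.getAndAdd(obj, 5);                 // atomic add, 10 -> 15
        return (int) COUNTER.getVolatile(obj);     // volatile read
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Each access mode carries its own memory-ordering requirement, which is the part a backend would care about when these calls are reduced to loads, stores, and fences.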

JNI Dispatch

In addition to DirectToJNI, there's also the recentish work on atomic-free VM access.

@andrewcraik
Contributor

VarHandles are just like MethodHandles by the time you get to codegen. Nestmates are only seen as part of the resolve paths and not really for perf.

@andrewcraik
Contributor

No special support needed for JProfiling - it is all done in trees.

@andrewcraik
Contributor

Constant dynamic is transparent in codegen

@dsouzai
Contributor

dsouzai commented Apr 12, 2019

AOT / SVM?

Validation (SVM): common code

Codegen: This is where the majority of the work would need to be done. The addMetaDataForCode* method of the various instructions would handle adding the necessary external relocations. However, those aren't the only locations; the various snippets will also have to make sure they add the necessary external relocations. I'm sure there are lots of other places as well, which could be gleaned by looking at the existing codegens. The codegen would also need to know when to generate different instructions under AOT (for example, if an address isn't guaranteed to fit in 32 bits, even though it happens to do so during compilation). At the moment there would be the need for yet another J9AheadOfTimeCompile.cpp file; however, once the consolidation work is done (#4803) that won't be necessary.
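The "fits in 32 bits during compilation but is not guaranteed to at load time" point can be sketched as a small decision function. This is only a model of the reasoning: `Encoding`, `choose`, and the policy of always taking the wide form under AOT are assumptions for illustration, not the actual codegen logic.

```java
final class AotAddressEncoding {
    // Hypothetical instruction-sequence choices for materializing an address.
    enum Encoding { SHORT_32_BIT, FULL_64_BIT }

    // True when the value survives a round trip through a signed 32-bit int.
    static boolean fitsInSigned32(long address) {
        return address == (int) address;
    }

    // Assumed policy: under AOT the address seen at compile time is replaced
    // by relocation later, so its current magnitude proves nothing.
    static Encoding choose(long address, boolean compilingForAot) {
        if (compilingForAot) {
            return Encoding.FULL_64_BIT;
        }
        return fitsInSigned32(address) ? Encoding.SHORT_32_BIT : Encoding.FULL_64_BIT;
    }

    public static void main(String[] args) {
        System.out.println(choose(0x1000L, false));
        System.out.println(choose(0x1000L, true));
        System.out.println(choose(0x1_0000_0000L, false));
    }
}
```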

Relocation: TR_RelocationTarget would need to be extended to handle circumstances like POWER, where there are different ways a pointer can be patched (i.e., different ways to patch a 5-instruction sequence).

Any other considerations @mstoodle ?

EDIT

For other SVM changes, also see #15121 (comment)

@0xdaryl
Contributor Author

0xdaryl commented Apr 17, 2019

Complete atomicCompareAndSwapReturn support on Z, Power, and AArch64: eclipse-omr/omr#3759
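For anyone unfamiliar with the two usual shapes of a compare-and-swap result, the sketch below shows them at the Java level. It assumes the helper in question distinguishes a status-returning form from a value-returning form; that mapping is my assumption, and the wrapper names here are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class CasReturnForms {
    // Status form: reports only whether the swap happened.
    static boolean casStatus(AtomicInteger cell, int expected, int update) {
        return cell.compareAndSet(expected, update);
    }

    // Value form: returns the value that was found in the cell (the witness),
    // which equals 'expected' exactly when the swap succeeded.
    static int casValue(AtomicInteger cell, int expected, int update) {
        return cell.compareAndExchange(expected, update);
    }

    public static void main(String[] args) {
        AtomicInteger cell = new AtomicInteger(5);
        System.out.println(casStatus(cell, 5, 6)); // swap succeeds
        System.out.println(casValue(cell, 5, 7));  // swap fails, witness is returned
        System.out.println(cell.get());
    }
}
```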

@0xdaryl
Contributor Author

0xdaryl commented May 21, 2019

Field watch: AArch64 (#8038), AArch32 (#8040)

@0xdaryl
Contributor Author

0xdaryl commented May 28, 2019

DLT for AArch64. #5917 tracks the work to enable. Disabled in AArch64 by #5919.

@0xdaryl
Contributor Author

0xdaryl commented May 28, 2019

OSR for AArch64 #5921

@0xdaryl
Contributor Author

0xdaryl commented Jun 10, 2019

Lock reservation: AArch64 (#12097)
Lock reservation optimization: #2344

@0xdaryl
Contributor Author

0xdaryl commented Jun 12, 2019

CompactLocals (ability to map compacted stack in some linkage) : #5910

@0xdaryl
Contributor Author

0xdaryl commented Jun 12, 2019

JITaaS relocations

@0xdaryl
Contributor Author

0xdaryl commented Jun 20, 2019

Quad recognized methods (hopefully not required for long!)

@0xdaryl
Contributor Author

0xdaryl commented Jul 3, 2019

Interface PICs (e.g., #6325)

@0xdaryl
Contributor Author

0xdaryl commented Jul 3, 2019

Hardware transactional memory support. Enable CodeGenerator getSupportsTM(). Implement evaluators for tstart, tfinish, tabort.

@fjeremic
Contributor

fjeremic commented Jul 3, 2019

@jdmpapin
Contributor

jdmpapin commented Jul 4, 2019

I'd like to clarify a few points about nestmates here.

nestmates

This should be handled by the VM's resolve helpers. I'm not aware of any JIT surface to this feature.

The JIT interaction is that with nestmates, invokevirtual and invokeinterface no longer necessarily do virtual and interface dispatch (respectively). They might instead call a private method directly, and because private methods are not in the vtable, direct dispatch is the only possible implementation. The JIT compiler recognizes this situation in the resolved case during both IL generation and inlining so that it can treat it as a direct call as required. In the unresolved case, the resolution path has to detect this situation at runtime.

Nestmates are only seen as part of the resolve paths and not really for perf.

This is true, but I want to emphasize that code generator support is required for nestmates to work correctly. The runtime resolution path must be capable of carrying out a direct dispatch when indicated by the VM. The current design for doing so requires a pointer to the "virtual" J2I thunk in the PIC data, which needs a relocation for AOT. In the case of invokeinterface, we also need to do a type check.

(It was technically possible for invokeinterface to require direct dispatch before nestmates for final methods of Object, which are also not kept in the vtable. However, calling those through invokeinterface is unusual bytecode, and the JIT compiler would simply refuse to compile methods containing such calls. Only with nestmates did it become important to conditionally do direct dispatch based on the result of resolution at runtime.)
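A small Java example of the source shape that leads to this situation may help. As far as I know, javac targeting Java 11 or later typically compiles the call in `peek` as an invokevirtual of a private method rather than going through a synthetic accessor; the class names are made up for illustration.

```java
final class NestmateDispatch {
    private final int secret = 42;

    // Private instance method: it has no vtable slot, so a call to it
    // can only be implemented as a direct dispatch.
    private int readSecret() {
        return secret;
    }

    // Nested class, and therefore a nestmate of NestmateDispatch.
    static final class Peer {
        int peek(NestmateDispatch target) {
            // Nestmate access to a private method of the host class.
            return target.readSecret();
        }
    }

    public static void main(String[] args) {
        System.out.println(new Peer().peek(new NestmateDispatch()));
    }
}
```

When the call site is resolved at compile time the JIT can treat it as a direct call; when it is unresolved, the runtime resolution path has to discover the same fact.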

@0xdaryl
Contributor Author

0xdaryl commented Jul 7, 2019

Internal pointers. AArch64 (#6367), ARM (#6368)

@0xdaryl
Contributor Author

0xdaryl commented Jul 11, 2019

Arraycopy transformations from value propagation (TR::arraycopy and TR::ArrayCHK node support). Disabled via TR_DisableArrayCopyOpts.

AArch64 (#12122)

@0xdaryl
Contributor Author

0xdaryl commented Jul 12, 2019

Inline dynamic cast class evaluation for checkcast : #5291

@0xdaryl
Contributor Author

0xdaryl commented Jul 12, 2019

Inlining support of MultiANewArray for 2 dimensional arrays: x86 (#2408), P (#2424), Z (#11088), AArch64 (#12367).
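For reference, the Java-level construct behind this is a two-dimensional allocation with both dimensions given, which javac compiles to a multianewarray bytecode. A minimal sketch with a hypothetical class name:

```java
final class TwoDimAllocation {
    // 'new int[rows][cols]' becomes a multianewarray with two dimensions:
    // one outer array of 'rows' references plus 'rows' zeroed int arrays.
    static int[][] allocate(int rows, int cols) {
        return new int[rows][cols];
    }

    public static void main(String[] args) {
        int[][] grid = allocate(3, 4);
        System.out.println(grid.length + " x " + grid[0].length);
    }
}
```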

@0xdaryl
Contributor Author

0xdaryl commented Jul 12, 2019

CodeGenerator SupportsProfiledInlining. Currently enabled for X,P,Z. AArch64 (#6451) and ARM32 (#6452).

@0xdaryl
Contributor Author

0xdaryl commented Jul 12, 2019

CodeGenerator SupportsAutoSIMD. Currently enabled on X,P,Z. AArch64 (#6453) and ARM32 (#6454).

@gita-omr
Contributor

Please note that autoSIMD is supported only to the degree that vector opcodes are implemented in a particular codegen. The optimizer attempts to vectorize a loop and then asks the codegen whether a particular opcode has a vector version (there is a codegen method for that, which takes the opcode as a parameter). I am pretty sure only x, z, and p have some number of supported opcodes.
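The optimizer/codegen handshake described here can be modelled roughly as follows. Everything in this sketch is hypothetical (the `Op` enum, `canVectorize`, and the predicate standing in for the per-opcode codegen query); it only illustrates that a loop is vectorized when every opcode in it has a vector form.

```java
import java.util.EnumSet;
import java.util.Set;
import java.util.function.Predicate;

final class AutoSimdQuery {
    // Hypothetical scalar opcodes that might appear in a loop body.
    enum Op { LOAD, STORE, ADD, MUL, DIV }

    // Optimizer side: ask the backend about each opcode and vectorize
    // only if all of them have a vector version.
    static boolean canVectorize(Set<Op> loopBody, Predicate<Op> hasVectorForm) {
        return loopBody.stream().allMatch(hasVectorForm);
    }

    public static void main(String[] args) {
        // Backend side: an assumed set of opcodes with vector evaluators.
        Set<Op> vectorOps = EnumSet.of(Op.LOAD, Op.STORE, Op.ADD, Op.MUL);
        System.out.println(canVectorize(EnumSet.of(Op.LOAD, Op.ADD, Op.STORE), vectorOps::contains));
        System.out.println(canVectorize(EnumSet.of(Op.LOAD, Op.DIV, Op.STORE), vectorOps::contains));
    }
}
```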

@0xdaryl
Contributor Author

0xdaryl commented Jul 21, 2019

Exception Directed Optimization (EDO)

@0xdaryl
Contributor Author

0xdaryl commented Jul 29, 2019

CodeGenerator support for GlRegDeps:

setSupportsGlRegDeps();
setSupportsGlRegDepOnFirstBlock();

And by extension, global register allocation.

For AArch64 (#6606).

@0xdaryl
Contributor Author

0xdaryl commented Jul 29, 2019

Method recompilation

@0xdaryl
Contributor Author

0xdaryl commented Jul 29, 2019

Support atomic free JNI. #2576, AArch64 (#6608), ARM32 (#6609)

By extension, enable directToJNI in the JIT.

@0xdaryl
Contributor Author

0xdaryl commented Jul 30, 2019

A code generator must set setSupportsDivCheck() if it provides an implementation for the DIVCHK IL Opcode. This is needed before Walker will create trees for integer type division and remainder operations. Otherwise, the compilation will fail with an unimplemented opcode for any division or remainder operations.
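As a reminder of the Java semantics a DIVCHK implementation has to preserve, the sketch below shows the cases a backend needs to get right: a zero divisor must raise ArithmeticException, while `Integer.MIN_VALUE / -1` must not trap and simply overflows. The helper names are hypothetical.

```java
final class DivCheckSemantics {
    // Returns a / b, or 'fallback' when the divide-by-zero check fires.
    static int divideOr(int a, int b, int fallback) {
        try {
            return a / b;
        } catch (ArithmeticException e) {
            return fallback;
        }
    }

    // Returns a % b, or 'fallback' when the divide-by-zero check fires.
    static int remainderOr(int a, int b, int fallback) {
        try {
            return a % b;
        } catch (ArithmeticException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println(divideOr(7, 0, -1));                 // zero divisor: exception path
        System.out.println(divideOr(Integer.MIN_VALUE, -1, 0)); // overflow: no exception
        System.out.println(remainderOr(Integer.MIN_VALUE, -1, 5));
    }
}
```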

@0xdaryl
Contributor Author

0xdaryl commented Oct 22, 2019

Enable prefetchInsertion optimization. Requires backend support for prefetch instructions. AArch64 eclipse-omr/omr#4494, ARM32 eclipse-omr/omr#4495

@0xdaryl
Contributor Author

0xdaryl commented Dec 9, 2019

Enable support for compressed references.

@0xdaryl
Contributor Author

0xdaryl commented Dec 9, 2019

Implement lock reservation. AArch64 (#8032), AArch32 (#8033)

@0xdaryl
Contributor Author

0xdaryl commented Dec 9, 2019

Implement support for balanced GC: AArch64 (#8034), AArch32 (#8035)

@0xdaryl
Contributor Author

0xdaryl commented Dec 9, 2019

Implicit NULLCHKs: AArch64 (#8036)

@0xdaryl
Contributor Author

0xdaryl commented Dec 9, 2019

Implicit DIVCHKs: AArch64 (#8037)

@0xdaryl
Contributor Author

0xdaryl commented Dec 18, 2019

Support ternary (soon-to-be-renamed-to select) opcodes. eclipse-omr/omr#4682

@0xdaryl
Contributor Author

0xdaryl commented Jan 24, 2020

Exploit cached lastITable in JIT interface dispatch (#8390) : AArch64 (#8400), Z (#8399), ARM (#8401)

@0xdaryl
Contributor Author

0xdaryl commented Feb 18, 2020

Live registers: eclipse-omr/omr#4843

@0xdaryl
Contributor Author

0xdaryl commented Apr 3, 2020

Support for value types: #9105

@0xdaryl
Contributor Author

0xdaryl commented Apr 14, 2020

Support for inlined monitor cache for monent/monexit: #9240

@0xdaryl
Contributor Author

0xdaryl commented Apr 23, 2020

Guarded devirtualization: AArch64 (#9334), AArch32 (#9335), common solution (#9333)

@0xdaryl
Contributor Author

0xdaryl commented Apr 23, 2020

ASSOCREG pseudo-instruction: AArch64 (#9350)

@0xdaryl
Contributor Author

0xdaryl commented May 28, 2020

Patchable GCR guards: Epic: #9730

@0xdaryl
Contributor Author

0xdaryl commented Jan 26, 2021

Support newInstanceImpl optimization: AArch64 (#11790)

@0xdaryl
Contributor Author

0xdaryl commented Feb 22, 2021

Support mixed references: AArch64 (#11977), general issue (#8847)

@0xdaryl
Contributor Author

0xdaryl commented Feb 26, 2021

Allocation TLH prefetching (via CodeGen enableTLHPrefetching): AArch64 (#12068)

@0xdaryl
Contributor Author

0xdaryl commented Mar 4, 2021

JIT Server support: AArch64 (#12121)

@0xdaryl
Contributor Author

0xdaryl commented Mar 11, 2021

Implement supportsPassThroughCopyToNewVirtualRegister to enable RegDepCopyRemoval optimization. AArch64 (#12188)

@0xdaryl
Contributor Author

0xdaryl commented Apr 21, 2021

Enable CodeGenerator supportsInliningOfIsInstance: AArch64 (#12518)

@0xdaryl
Contributor Author

0xdaryl commented Apr 29, 2021

Enable CodeGenerator supportsByteswap capability when evaluators are available for that opcode. This is really something that needs to happen in OMR, but is being tracked here for convenience. AArch64 (eclipse-omr/omr#5971)

@0xdaryl
Contributor Author

0xdaryl commented Jul 15, 2021

Ensure CodeGenerator getSupportsAlignedAccessOnly() returns the correct answer for the target processor. SequentialStoreSimplifier may do the wrong thing otherwise.

@fjeremic
Contributor

Ensure CodeGenerator getSupportsAlignedAccessOnly() returns the correct answer for the target processor. SequentialStoreSimplifier may do the wrong thing otherwise.

Does this mean if a CodeGenerator doesn't do anything w.r.t. this API we may have functional bugs?

@0xdaryl
Contributor Author

0xdaryl commented Jul 15, 2021

Does this mean if a CodeGenerator doesn't do anything w.r.t. this API we may have functional bugs?

Yes. Mis-aligned data accesses were being generated on AArch64, and there was no means of encoding instructions to access that data.

I believe this API defaults to "false", so if a processor can (for example) access 4 bytes on a mis-aligned boundary then it should be fine. We have a PR coming on AArch64 that sets this flag to fix its problem.
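To illustrate the kind of transformation at stake, here is a rough Java model of combining four adjacent byte stores into one 4-byte store, and of why an "aligned access only" answer has to veto that at an odd offset. It assumes the simplifier works along these lines; `mayCombine` and the other names are hypothetical, and offsets here are relative to the array start rather than real machine addresses.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

final class SequentialStores {
    // Four adjacent byte stores: the shape a store-combining pass looks for.
    static byte[] storeBytewise(int value, int offset) {
        byte[] buf = new byte[8];
        buf[offset]     = (byte) value;
        buf[offset + 1] = (byte) (value >>> 8);
        buf[offset + 2] = (byte) (value >>> 16);
        buf[offset + 3] = (byte) (value >>> 24);
        return buf;
    }

    // The same effect expressed as a single little-endian 4-byte store.
    static byte[] storeCombined(int value, int offset) {
        byte[] buf = new byte[8];
        ByteBuffer.wrap(buf).order(ByteOrder.LITTLE_ENDIAN).putInt(offset, value);
        return buf;
    }

    // Assumed legality rule: combine freely unless the target demands
    // aligned accesses and the offset is not 4-byte aligned.
    static boolean mayCombine(int offset, boolean alignedAccessOnly) {
        return !alignedAccessOnly || offset % 4 == 0;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.equals(storeBytewise(0x11223344, 1), storeCombined(0x11223344, 1)));
        System.out.println(mayCombine(1, true));
    }
}
```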

@fjeremic
Contributor

fjeremic commented Jul 15, 2021

Yes. Mis-aligned data accesses were being generated on AArch64, and there was no means of encoding instructions to access that data.

Should we be making such APIs pure virtual functions to force concrete CodeGenerators to implement them correctly and not rely on a default value?
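The trade-off can be sketched with a Java abstract method standing in for a C++ pure virtual function. The class and method names below are hypothetical and are not the actual CodeGenerator API; the point is only that a backend cannot be instantiated until it answers the abstract query, whereas a defaulted query can be silently inherited.

```java
final class CapabilityContractSketch {
    // Hypothetical base class for a backend's capability queries.
    abstract static class Backend {
        // No default: every concrete backend is forced to answer.
        abstract boolean alignedAccessOnly();

        // Defaulted: a backend that forgets to override inherits 'false'.
        boolean supportsPrefetch() {
            return false;
        }
    }

    // A concrete backend compiles only because it implements the abstract query.
    static final class StrictBackend extends Backend {
        @Override
        boolean alignedAccessOnly() {
            return true;
        }
    }

    public static void main(String[] args) {
        Backend backend = new StrictBackend();
        System.out.println(backend.alignedAccessOnly());
        System.out.println(backend.supportsPrefetch());
    }
}
```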

@0xdaryl
Contributor Author

0xdaryl commented Oct 7, 2021

Support bit IL opcodes: ihbit, ilbit, inolz, inotz, ipopcnt, lhbit, llbit, lnolz, lnotz, lpopcnt

Make Java API methods (e.g., Integer.lowestOneBit) intrinsics that map to them.
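For reference, these are the Java API methods I would expect to line up with those opcodes. The opcode-to-method pairing in the comments is an assumption based on the names; only the `Integer` and `Long` methods themselves are certain.

```java
final class BitOpcodeIntrinsics {
    // int forms; the trailing comments give the assumed matching opcode.
    static int[] intResults(int x) {
        return new int[] {
            Integer.highestOneBit(x),         // ihbit
            Integer.lowestOneBit(x),          // ilbit
            Integer.numberOfLeadingZeros(x),  // inolz
            Integer.numberOfTrailingZeros(x), // inotz
            Integer.bitCount(x)               // ipopcnt
        };
    }

    // long forms; the count-style methods return int and are widened here.
    static long[] longResults(long x) {
        return new long[] {
            Long.highestOneBit(x),            // lhbit
            Long.lowestOneBit(x),             // llbit
            Long.numberOfLeadingZeros(x),     // lnolz
            Long.numberOfTrailingZeros(x),    // lnotz
            Long.bitCount(x)                  // lpopcnt
        };
    }

    public static void main(String[] args) {
        // 22 is 0b10110
        System.out.println(java.util.Arrays.toString(intResults(22)));  // [16, 2, 27, 1, 3]
        System.out.println(java.util.Arrays.toString(longResults(22L))); // [16, 2, 59, 1, 3]
    }
}
```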

@0xdaryl
Contributor Author

0xdaryl commented Oct 12, 2021

Profile guarded devirtualization: Z (#13606)

@0xdaryl
Contributor Author

0xdaryl commented Feb 11, 2022

Implement supportsDirectJNICallsForAOT(). Prior to enabling, ensure the direct JNI sequence has proper relocations added.

@0xdaryl
Contributor Author

0xdaryl commented Jun 8, 2023

Counting recompilation: AArch64 (#17543)
