Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preserve vector registers across JIT helpers on Power #11752

Closed
gacholio opened this issue Jan 22, 2021 · 28 comments · Fixed by #12239
Closed

Preserve vector registers across JIT helpers on Power #11752

gacholio opened this issue Jan 22, 2021 · 28 comments · Fixed by #12239
Assignees
Labels
arch:power comp:jit comp:vm project:panama Used to track Project Panama related work

Comments

@gacholio
Copy link
Contributor

The JIT helper contract is that all registers are preserved/updated with the exception of a possible return value. Once the Power JIT starts using vector registers, the helpers must preserve them.

@gacholio
Copy link
Contributor Author

@IBMJimmyk @gita-omr @zl-wang

@gacholio
Copy link
Contributor Author

A few basic questions to begin:

  1. How many VRs are there, how big are they and what alignment is required for preserving?

  2. Do the VRs overlap the FPRs (so if VRs are saved, FPRs need not be done separately).

I will need to save argument FPRs in certain cases (method resolve) for decompile even when VRs are saved (I don't want to deal with differing locations for the FPR values in the decompiler).

  1. Assuming that the JIT private linkage will not preserve any VRs, would it be worth not preserving VRs at method resolve helpers?

My initial feeling is that it would not be worth the effort.

  1. Treatment of VRs in the helpers will be controlled by J9_EXTENDED_RUNTIME_USE_VECTOR_REGISTERS as it is on Z

@IBMJimmyk
Copy link
Contributor

There are 64 128-bit vector registers. They are numbered from vsr0 to vsr63. These are also called Vector-Scalar Register. I don't recall the performance implications of unaligned access.

vsr0 to vsr31 overlap with the floating point registers fpr0 to fpr31. fpr0 overlaps the MSB half of vsr0. Similarly fpr1 overlaps the MSB half of vsr1 and so on for all 32 registers.

vsr32 to vsr63 overlap with the VMX registers which are vr0 to vr31. They fully overlap all 128 bits in each register. If you use VMX instructions to load or store to memory (eg. lvx), the access must be aligned to be correct. lvx can only load from a 16 byte aligned location.

@gacholio
Copy link
Contributor Author

gacholio commented Jan 22, 2021

So that brings up the next questions:

  1. What instructions are to be used to save/restore?

Are the instructions available in our assemblers for all P platforms?

  1. What are the C ABIs wrt vector registers and will the VM be required to save some on entry to the interpreter (given that the JIT will preserve none of them)? Does it matter where they are saved (i.e. are there specific places in the C stack frame that the ABI requires)?

@zl-wang
Copy link
Contributor

zl-wang commented Jan 22, 2021

How many VRs are there, how big are they and what alignment is required for preserving?

There are 64 VSRs, each is currently 16 bytes, and no strict alignment required but, of course, preferring 16 alignment for performance and easier access purposes.

Do the VRs overlap the FPRs (so if VRs are saved, FPRs need not be done separately).

Yes, FPRs overlap with upper-left quarter of VSRs. i.e. fpr0 overlapping with the top-half of vsr0, fpr1 with the top-half of vsr1, so on so forth. VMX/Altivec VRs (32 of them) overlap with the last portion of VSRs, i.e. v0 overlapping with vsr32, v1 with vsr33, ...., and v31 with vsr63.

Assuming that the JIT private linkage will not preserve any VRs, would it be worth not preserving VRs at method resolve helpers?

That is a reasonable thing to do.

I have a little concern for the performance of frequently-called helpers with this blindly saving/restoring all 64 VSRs (1KB in total), such as interface method look-up.

@gacholio
Copy link
Contributor Author

Interface lookup is also at a method invocation point, so I believe there could be no live VRs at that point. It's probably not difficult to split the helpers into those that require VR treatment and those that don't.

@zl-wang
Copy link
Contributor

zl-wang commented Jan 22, 2021

What instructions are to be used to save/restore?

lxvd2x or lxvw4x can be used to restore all of them, while corresponding stxvd2x or stxvw4x can be used to save all of them.

Are the instructions available in our assemblers for all P platforms?

No. power7 needs to be specified to have those assembly instructions. such as, . machine power7 pseudo op or something like that.

What are the C ABIs wrt vector registers and will the VM be required to save some on entry to the interpreter (given that the JIT will preserve none of them)? Does it matter where they are saved (i.e. are there specific places in the C stack frame that the ABI requires)?

vsr52 - vsr63 are preserved in the ABI. Entering interpreter (from JIT) doesn't need to save/restore VSRs just like FPRs. For C stack unwinding purpose, the saving area in C stack is the traditional location at the higher-end of the frame.

@zl-wang
Copy link
Contributor

zl-wang commented Jan 22, 2021

Interface lookup is also at a method invocation point, so I believe there could be no live VRs at that point. It's probably not difficult to split the helpers into those that require VR treatment and those that don't.

That would be good to do.

@zl-wang
Copy link
Contributor

zl-wang commented Jan 22, 2021

starting POWER10, there are better instructions to save/restore them (immediate offset form, instead of index form), but it is a long shot in the future for us to drop POWER7/POWER8/POWER9.

@zl-wang
Copy link
Contributor

zl-wang commented Jan 22, 2021

image

@zl-wang
Copy link
Contributor

zl-wang commented Jan 22, 2021

The preserved VR/VSR(s) are saved next (lower address) to the preserved GPRs ... as shown above.

@gacholio
Copy link
Contributor Author

Can someone please point me to some doc for the vector instructions?

@zl-wang
Copy link
Contributor

zl-wang commented Jan 26, 2021

Can someone please point me to some doc for the vector instructions?

https://openpowerfoundation.org/?resource_lib=power-isa-version-3-0

3.0 corresponding to POWER9 ... look for lxvd2x and stxvd2x

@zl-wang
Copy link
Contributor

zl-wang commented Jan 26, 2021

those two instructions existed starting on POWER7, by the way.

@gacholio
Copy link
Contributor Author

I've read the doc, and it's typically confusing - can you provide examples of the instruction use? It seems to imply that I have to have a register preloaded with the constant 4. To store multiple registers sequentially, do I need to keep adding 16 to the base register?

@gacholio
Copy link
Contributor Author

Sorry, I was reading the doc for the 4x forms. The 2x ones seem more sane, though it's a shame there's no constant offset form.

@gacholio
Copy link
Contributor Author

Would this be how I load vector registers 0 and 1 from consecutive storage?

lxvd2x 0,base,offset
addi offset,offset,16
lxvd2x 1,base,offset

or

lxvd2x 0,0,addr
addi addr,addr,16
lxvd2x 1,0,addr

@zl-wang
Copy link
Contributor

zl-wang commented Jan 26, 2021

both will work.

@gacholio
Copy link
Contributor Author

Track my changes here:

https://github.com/gacholio/openj9/tree/vector

I won't promise it will always compile/work.

So far, I've modified the outer (JIT execution) native stack frame, and saved/restored the required VRs upon entry/exit from the interpreter.

@gacholio
Copy link
Contributor Author

@gita-omr Latest code in the branch should be preserving VRs across the helper calls now.

It's unoptimized and untested (everything compiles and works when the vector enable bit is not set).

@gita-omr
Copy link
Contributor

Thanks!

@rmnattas
Copy link
Contributor

rmnattas commented Feb 3, 2021

I believe there's an overlap in the code in saving/restoring VSR and FPR registers.
https://github.com/gacholio/openj9/blob/dfdbe7d0245c48723b29cf559e04f4c9c2f296c7/runtime/oti/phelpers.m4#L456-L734

The instructions lxvd2x and stxvd2x index VSR (not VR) registers and saving the registers VSR[0-51] then FPR[0-13] (loading FPR[0-13] then VSR[0-51] in restore) overlap.
FPR store/restore is under L_no_VR_save_ (which could be confusing and may should be using VSR instead of VR in the name) but there's no branch to not call FPR store/restore when VSR store/restore is performed/will-be-performed

Screen Shot 2021-02-03 at 2 55 43 PM

Screen Shot 2021-02-03 at 2 55 52 PM

@gacholio
Copy link
Contributor Author

gacholio commented Feb 3, 2021

This code is not optimized - for some of the helpers, I require the FPRs to be stored even when VRs are in use. The restore code restores the FPRs first, then the VRs if need be, so all of the values should be correctly preserved - this code could certainly be modified to do one or the other instead of both.

@0xdaryl
Copy link
Contributor

0xdaryl commented Feb 25, 2021

@knn-k FYI

@0xdaryl 0xdaryl added the project:panama Used to track Project Panama related work label Feb 28, 2021
gacholio added a commit to gacholio/openj9 that referenced this issue Mar 22, 2021
Optionally preserve the vector registers in the assembly helpers on
Power. This is controlled by the
J9_EXTENDED_RUNTIME_USE_VECTOR_REGISTERS bit in the J9JavaVM
extendedRuntimeFlags field.

Fixes: eclipse-openj9#11752

[ci skip]

Signed-off-by: Graham Chapman <graham_chapman@ca.ibm.com>
gacholio added a commit to gacholio/openj9 that referenced this issue Mar 23, 2021
Optionally preserve the vector registers in the assembly helpers on
Power. This is controlled by the
J9_EXTENDED_RUNTIME_USE_VECTOR_REGISTERS bit in the J9JavaVM
extendedRuntimeFlags field.

Fixes: eclipse-openj9#11752

[ci skip]

Signed-off-by: Graham Chapman <graham_chapman@ca.ibm.com>
@pshipton
Copy link
Member

Reopen since #12239 and #12275 where reverted.

@gacholio
Copy link
Contributor Author

gacholio commented Apr 5, 2021

Reopening for 32-bit support.

@gacholio gacholio reopened this Apr 5, 2021
@gacholio
Copy link
Contributor Author

gacholio commented Apr 7, 2021

@gita-omr @pshipton Can we have the discussion about 32-bit support?

@gacholio
Copy link
Contributor Author

gacholio commented Apr 7, 2021

32-bit support will be added at a later date if we deem it necessary.

@gacholio gacholio closed this as completed Apr 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arch:power comp:jit comp:vm project:panama Used to track Project Panama related work
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants