8336245: AArch64: remove extra register copy when converting from long to pointer #20157

fg1417 · 2024-07-12T13:44:25Z

In the cases like:

  UNSAFE.putLong(address + off1 + 1030, lseed);
  UNSAFE.putLong(address + 1023, lseed);
  UNSAFE.putLong(address + off2 + 1001, lseed);

Unsafe intrinsifies direct memory access using a long as the base address, generating a CastX2P node converting long to pointer in C2. Then we get optoassembly code like:

  ldr  R10, [R15, #120]    # int ! Field: address
  ldr  R11, [R16, #136]    # int ! Field: off1
  ldr  R12, [R16, #144]    # int ! Field: off2
  add  R11, R11, R10
  mov R11, R11    # long -> ptr
  add  R12, R12, R10
  mov R10, R10    # long -> ptr
  add R11, R11, #1030    # ptr
  str  R17, [R11]    # int
  add R10, R10, #1023    # ptr
  str  R17, [R10]    # int
  mov R10, R12    # long -> ptr
  add R10, R10, #1001    # ptr
  str  R17, [R10]    # int

In aarch64, the conversion from long to pointer could be a nop but C2 doesn't know it. On the existing code, we do nothing for mov dst src only when dst == src [1], then we have assembly:

  ldr    x10, [x15,#120]
  ldp    x11, x12, [x16,#136]
  add    x11, x11, x10
  add    x12, x12, x10
  add    x11, x11, #0x406
  str    x17, [x11]
  add    x10, x10, #0x3ff
  str    x17, [x10]
  mov    x10, x12  <--- extra register copy
  add    x10, x10, #0x3e9
  str    x17, [x10]

There is still one extra register copy, which we're trying to remove in this patch.

This patch folds CastX2P into memory operands by introducing indirectX2P and indOffX2P. We also create a new opclass iRegPorL2P to remove extra copies from CastX2P in pointer addition.

Tier 1~3 passed on aarch64. No obvious change in size of libjvm.so

[1]

jdk/src/hotspot/cpu/aarch64/aarch64.ad

Line 7906 in 5c612c2

instruct castX2P(iRegPNoSp dst, iRegL src) %{

Progress

Change must be properly reviewed (1 review required, with at least 1 Reviewer)
Change must not contain extraneous whitespace
Commit message must refer to an issue

Issue

JDK-8336245: AArch64: remove extra register copy when converting from long to pointer (Enhancement - P4)

Reviewers

Andrew Haley (@theRealAph - Reviewer)
Andrew Dinn (@adinn - Reviewer)

Contributors

Andrew Haley <aph@openjdk.org>

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/20157/head:pull/20157
$ git checkout pull/20157

Update a local copy of the PR:
$ git checkout pull/20157
$ git pull https://git.openjdk.org/jdk.git pull/20157/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 20157

View PR using the GUI difftool:
$ git pr show -t 20157

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/20157.diff

Webrev

Link to Webrev Comment

…g to pointer In the cases like: ``` UNSAFE.putLong(address + off1 + 1030, lseed); UNSAFE.putLong(address + 1023, lseed); UNSAFE.putLong(address + off2 + 1001, lseed); ``` Unsafe intrinsifies direct memory access using a long as the base address, generating a `CastX2P` node converting long to pointer in C2. Then we get optoassembly code like: ``` ldr R10, [R15, openjdk#120] # int ! Field: address ldr R11, [R16, openjdk#136] # int ! Field: off1 ldr R12, [R16, openjdk#144] # int ! Field: off2 add R11, R11, R10 mov R11, R11 # long -> ptr add R12, R12, R10 mov R10, R10 # long -> ptr add R11, R11, openjdk#1030 # ptr str R17, [R11] # int add R10, R10, openjdk#1023 # ptr str R17, [R10] # int mov R10, R12 # long -> ptr add R10, R10, openjdk#1001 # ptr str R17, [R10] # int ``` In aarch64, the conversion from long to pointer could be a nop but C2 doesn't know it. On the existing code, we do nothing for `mov dst src` only when `dst` == `src` [1], then we have assembly: ``` ldr x10, [x15,openjdk#120] ldp x11, x12, [x16,openjdk#136] add x11, x11, x10 add x12, x12, x10 add x11, x11, #0x406 str x17, [x11] add x10, x10, #0x3ff str x17, [x10] mov x10, x12 <--- extra register copy add x10, x10, #0x3e9 str x17, [x10] ``` There is still one extra register copy, which we're trying to remove in this patch. This patch folds `CastX2P` into memory operands by introducing `indirectX2P` and `indOffX2P`. We also create a new opclass `iRegPorL2P` to remove extra copies from `CastX2P` in pointer addition. Tier 1~3 passed on aarch64. No obvious change in size of libjvm.so [1] https://github.com/openjdk/jdk/blob/5c612c230b0a852aed5fd36e58b82ebf2e1838af/src/hotspot/cpu/aarch64/aarch64.ad#L7906

bridgekeeper · 2024-07-12T13:45:01Z

👋 Welcome back fgao! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2024-07-12T13:45:18Z

@fg1417 This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8336245: AArch64: remove extra register copy when converting from long to pointer

Co-authored-by: Andrew Haley <aph@openjdk.org>
Reviewed-by: aph, adinn

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 173 new commits pushed to the master branch:

34ee06f: 8332850: javac crashes if container for repeatable annotation is not found
00b5348: 8337192: [BACKOUT] JDK-8336098 G1: Refactor G1RebuildRSAndScrubTask
3baff2a: 8335393: C2: assert(!had_error) failed: bad dominance
8081f87: 8334342: Add MergeStore JMH benchmarks
9d87918: 8335191: RISC-V: verify perf of chacha20
6e228ce: 8336254: Virtual thread implementation + test updates
d3e51da: 8334085: Test crash: assert(thread->held_monitor_count() == 0) failed: Must be
0898ab7: 8336741: Optimize LocalTime.toString with StringBuilder.repeat
24f67d0: 8334232: Optimize C1 classes layout
021c2c3: 8337067: Test runtime/classFileParserBug/Bad_NCDFE_Msg.java won't compile
... and 163 more: https://git.openjdk.org/jdk/compare/fb9a227e02ebf826edb762283e15dd7e402f8433...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

openjdk · 2024-07-12T13:45:53Z

@fg1417 The following label will be automatically applied to this pull request:

hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

fg1417 · 2024-07-12T13:46:26Z

/contributor add @theRealAph

openjdk · 2024-07-12T13:47:06Z

@fg1417
Contributor Andrew Haley <aph@openjdk.org> successfully added.

fg1417 · 2024-07-12T13:49:32Z

src/hotspot/share/opto/machnode.cpp

-  if (t->isa_intptr_t() && offset != 0 && offset != Type::OffsetBot) {
+
+  if (t->isa_intptr_t() &&
+#if !defined(AARCH64)


After applying the operand "IndirectX2P", we may have some patterns like:

str val, [CastX2P base]

The code path here will resolve the base, which is actually a intptr, not a ptr, and the offset is 0.

I guess the code here was intended to support [base, offset], where base can be a intptr but offset can not be 0. I'm not sure why there is such a limitation that offset can not be 0, maybe for some old machines?

I don't think the limitation is applied to aarch64 machines now. So I unblock it for aarch64.

I think it's the other way around. Isn't this code saying that if the address is an intptr + a nonzero offset, then the returned type is bottom, ie nothing? What effect does this change have?

Thanks for review! Yeah, this code says if the address is an intptr + a nonzero offset, then return TypeRawPtr::BOTTOM. Then it continues the verification.

Without the change here, an intptr + a zero offset would fail to assert on next lines,

jdk/src/hotspot/share/opto/machnode.cpp

Lines 409 to 413 in a96de6d

// be conservative if we do not recognize the type

if (tp == nullptr) {

assert(false, "this path may produce not optimal code");

return TypePtr::BOTTOM;

}

AFAIK, the return value TypeRawPtr::BOTTOM represents raw access to memory here. And an intptr + a zero offset is also a valid raw access, so I unblock it here. WDYT? Thanks.

I learn something every day, I guess. It's been a long while since I looked, but I expected "pointer to anything" to be TOP, not BOTTOM. Thinking some more, a TypeRawPtr must be casted to a usable physical type in order to use it, so "pointer to nothing" makes more sense than "pointer to anything".

mlbridge · 2024-07-12T13:52:04Z

Webrevs

00: Full (06f62c62)

This patch forces `CastX2P` to be a two-address instruction, so that C2 could allocate the same register for dst and src. Then we can remove the instruction completely in the assembly. The motivation comes from some cast operations like `castPP`. The difference for ADLC between `castPP` and `CastX2P` lies in that `CastX2P` always has different types for dst and src. We can force ADLC to generate an extra `two_adr()` for `CastX2P` like it does automatically for `castPP`, which could tell register allocator that the instruction needs the same register for dst and src. However, sometimes, RA and GCM in C2 can't work as we expected. For example, we have Assembly on the existing code: ``` ldp x10, x11, [x17,openjdk#136] add x10, x10, x15 add x11, x11, x10 ldr x12, [x17,openjdk#152] str x16, [x10] add x10, x12, x15 str x16, [x11] str x16, [x10] ``` After applying the patch, the assembly is: ``` ldr x10, [x16,openjdk#136] <--- 1 add x10, x10, x15 ldr x11, [x16,openjdk#144] <--- 2 mov x13, x10 <--- 3 str x17, [x13] ldr x12, [x16,openjdk#152] add x10, x11, x10 str x17, [x10] add x10, x12, x15 str x17, [x10] ``` C2 generate a totally extra mov, see 3, and we even lost the chance to merge load pair, see 1 and 2. That's terrible. Although this scenario would disappear after combining with openjdk#20157, I'm still not sure if this patch is worthwhile.

fg1417 · 2024-07-12T14:15:34Z

#20159 is also to fix the same issue. Please feel free to review the draft PR. Thanks.

theRealAph · 2024-07-15T11:00:39Z

This will need quite a lot of testing, perhaps higher tiers and jcstress. You can test these two PRs together.

fg1417 · 2024-07-17T15:02:24Z

#20159 is also to fix the same issue. Please feel free to review the draft PR. Thanks.

This will need quite a lot of testing, perhaps higher tiers and jcstress. You can test these two PRs together.

Thanks for approval, @theRealAph. I'll test jcstress on my local.

Could you help review and test these two PRs with higher tiers please? @TobiHartmann @vnkozlov Thanks.

TobiHartmann · 2024-07-18T11:52:20Z

Sure, I'll run this through our testing and report back once it passed.

TobiHartmann · 2024-07-22T11:29:07Z

All tests passed.

fg1417 · 2024-07-22T15:58:40Z

All tests passed.

@TobiHartmann thanks for your testing!

Jcstress also passed.

fg1417 · 2024-07-25T08:44:20Z

Can I have a second review please :-)

adinn · 2024-07-25T09:37:42Z

src/hotspot/cpu/aarch64/aarch64.ad

+operand immLOffset()
 %{
-  predicate(Address::offset_ok_for_immed(n->get_long(), 0));
+  predicate(n->get_long() >= -256 && n->get_long() <= 65520);


Why is this using hard wired constants rather than using Address::offset_ok_for_immed?

Also, why is the constant value 65520?

I think Address::offset_ok_for_immed is too restrictive: we want a predicate that is the superset of all possible address offsets.

jshell> ((1<<12)-1) <<4 $3 ==> 65520

Yes, I realise that this is 16 less than 65536. However, there are two things I don't follow.

In the original code immLoffset was only used to define indOffLN i.e. a long offset used with a narrow pointer. The use of Address::offset_ok_for_immed(n->get_long(), 0) in the predicate limited narrow pointer offsets to -256 <= offset <= (2^12 - 1). With this change the top end of the range is now (2^12 - 1) << 4. I am wondering why that is appropriate?

The change allows immLOffset to be used in the definition of indOffX2P. I am not clear why indOffX2P is not just defined using the existing operand immLoffset16 which has as its predicate Address::offset_ok_for_immed(n->get_long(), 4). The only difference I can see is that the alternative predicate used here will accept a positive offset that is not 16 byte aligned. Is that the intention of the redefinition? Again, why is that appropriate?

The change allows immLOffset to be used in the definition of indOffX2P. I am not clear why indOffX2P is not just defined using the existing operand immLoffset16 which has as its predicate Address::offset_ok_for_immed(n->get_long(), 4).

After this change, immLOffset is a more general-purpose type than immLoffset16. immLOffset matches all possible address offsets, along with some impossible ones. For example, it matches all of the misaligned Unsafe accesses at any offset, regardless of operand size. In the (rare) event that an operand size and offset don't fit a single instruction, we'll split the instruction when we emit it.

After this patch there will be a few rare cases where we have a regression in code size, but it's worth it for the simplicity and the size of the matcher logic, which would otherwise explode. I don't expect any significant regression in execution time.

This patch is not the last word on the matter; later patches may well further reduce the number of integer offset types in a similar way. I don't think that many of the offsetL/I/X/P types do anything useful, and we'd probably profit from removing them, but that's another patch for anther day.

Ok, I see. The use of immLoffset as defined in this patch is actually correct for narrow oops and, indeed, for all other address base types. It allows for all possible offsets that might fit into a load an immediate slot. Whether we can legitimately encode the operand offset as an immediate or need instead to use an auxiliary add does not actually depend on the type of the address base but on the size of the datum fetched by the indirect load that consumes the operand. So, an indirect operand with offset 4098 would be too big to encode in an ldrb, fine to encode in an ldrh and invalid for encoding in an ldrw or ldrx because it is not suitably aligned.

That does imply we should get rid of the other (now redundant) immLoffset operands. However, we can do that in a follow-up patch because it is not what this fix is addressing

fg1417 · 2024-07-26T09:34:52Z

@theRealAph @adinn Thanks for your reviews!

I'll integrate it.

/integrate

openjdk · 2024-07-26T09:36:25Z

Going to push as commit d10afa2.
Since your change was applied there have been 184 commits pushed to the master branch:

7f11935: 8337167: StringSize deduplication
487450c: 8216471: GTK LnF: Frame is clipped and does not show JTable,Tooltip and JTree demo in SwingSet2 demo
29f0f17: 8336879: Always true condition 'img != null' in GTKPainter.paintPopupMenuBackground
ee839b7: 8337239: Fix simple -Wzero-as-null-pointer-constant warnings in classfile code
0584af2: 8336685: Shenandoah: Remove experimental incremental update mode
8c28f25: 8337124: (fs) sun.nio.fs.WindowsSecurity.enablePrivilege should pin when continuations supported
b5b5a5b: 8336667: IAE in DerInputStream.toByteArray
cf0d9e0: 8337037: compiler internal options are not printing the stacktrace after a compiler crash
81ed028: 8337060: Test java/foreign/TestConcurrentClose.java failed: IllegalStateException: SegmentAccessor::doAccess method not being compiled
e74edba: 8336640: Shenandoah: Parallel worker use in parallel_heap_region_iterate
... and 174 more: https://git.openjdk.org/jdk/compare/fb9a227e02ebf826edb762283e15dd7e402f8433...master

Your commit was automatically rebased without conflicts.

openjdk · 2024-07-26T09:36:31Z

@fg1417 Pushed as commit d10afa2.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

openjdk bot added the rfr Pull request is ready for review label Jul 12, 2024

openjdk bot added the hotspot hotspot-dev@openjdk.org label Jul 12, 2024

fg1417 commented Jul 12, 2024

View reviewed changes

This was referenced Jul 12, 2024

8336464: C2: Force CastX2P to be a two-address instruction #20159

Closed

8338442: AArch64: Clean up IndOffXX type and let legitimize_address() fix out-of-range operands #16991

Closed

theRealAph approved these changes Jul 15, 2024

View reviewed changes

openjdk bot added the ready Pull request is ready to be integrated label Jul 15, 2024

adinn reviewed Jul 25, 2024

View reviewed changes

adinn approved these changes Jul 25, 2024

View reviewed changes

openjdk bot added the integrated Pull request has been integrated label Jul 26, 2024

openjdk bot closed this Jul 26, 2024

openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Jul 26, 2024

	// be conservative if we do not recognize the type
	if (tp == nullptr) {
	assert(false, "this path may produce not optimal code");
	return TypePtr::BOTTOM;
	}

8336245: AArch64: remove extra register copy when converting from long to pointer #20157

8336245: AArch64: remove extra register copy when converting from long to pointer #20157

Uh oh!

Conversation

fg1417 commented Jul 12, 2024 • edited by openjdk bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Progress

Issue

Reviewers

Contributors

Reviewing

Webrev

Uh oh!

bridgekeeper bot commented Jul 12, 2024

Uh oh!

openjdk bot commented Jul 12, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

openjdk bot commented Jul 12, 2024

Uh oh!

fg1417 commented Jul 12, 2024

Uh oh!

openjdk bot commented Jul 12, 2024

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mlbridge bot commented Jul 12, 2024

Webrevs

Uh oh!

fg1417 commented Jul 12, 2024

Uh oh!

theRealAph commented Jul 15, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

fg1417 commented Jul 17, 2024

Uh oh!

TobiHartmann commented Jul 18, 2024

Uh oh!

TobiHartmann commented Jul 22, 2024

Uh oh!

fg1417 commented Jul 22, 2024

Uh oh!

fg1417 commented Jul 25, 2024

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

adinn Jul 25, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

fg1417 commented Jul 26, 2024

Uh oh!

openjdk bot commented Jul 26, 2024

Uh oh!

openjdk bot commented Jul 26, 2024

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

4 participants

fg1417 commented Jul 12, 2024 •

edited by openjdk bot

Loading

openjdk bot commented Jul 12, 2024 •

edited

Loading

theRealAph commented Jul 15, 2024 •

edited

Loading

adinn Jul 25, 2024 •

edited

Loading