
Conversation


@fg1417 fg1417 commented Dec 6, 2023

On LP64 systems, if the heap can be moved into low virtual address space (below 4 GB) and the heap size is smaller than the interesting threshold of 4 GB, we can use the unscaled decoding pattern for narrow klass decoding. It means that a generic field reference can be decoded as:

cast<64> (32-bit compressed reference) + field_offset
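For readers unfamiliar with the pattern, a minimal illustrative sketch of this unscaled decode (not code from the patch; the class and method names here are made up):

```java
// Illustrative sketch only: unscaled narrow-reference decoding on LP64
// when the heap base is below 4 GB and the decode shift is 0.
public class UnscaledDecode {
    static long decode(int compressedRef, long fieldOffset) {
        // Zero-extend the 32-bit compressed reference to 64 bits,
        // then add the field offset -- no shift (scaling) is applied.
        return (compressedRef & 0xFFFFFFFFL) + fieldOffset;
    }
}
```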

When the field_offset is an immediate, on the aarch64 platform the unscaled decoding pattern matches perfectly with a direct addressing mode, i.e., base_plus_offset, supported by LDR/STR instructions. But for a given data width, not all immediates can be encoded in the instruction field of LDR/STR [1]; the valid ranges differ as the data width varies.

For example, when we try to load a value of long type at an offset of 1030, the address expression is (AddP (DecodeN base) 1030). Before the patch, the expression was matched with operand indOffIN(). But for 64-bit LDR/STR, the signed immediate byte offset must be in the range -256 to 255, or the positive immediate byte offset must be a multiple of 8 in the range 0 to 32760 [2]. 1030 can't be encoded in the instruction field, so after matching, when we check the instruction encoding, the assertion fails.
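To make the ranges above concrete, here is a hedged Java sketch (not the HotSpot code; offsetOkForImmed is a hypothetical helper mirroring what Address::offset_ok_for_immed checks, per the rules in [2]):

```java
// Hypothetical sketch of the LDR/STR immediate byte-offset rules above.
public class ImmedOffsetCheck {
    // shift = log2 of the access size in bytes, e.g. 3 for 64-bit LDR/STR.
    static boolean offsetOkForImmed(long offset, int shift) {
        // Unscaled form (LDUR/STUR): signed 9-bit byte offset, -256..255.
        if (offset >= -256 && offset <= 255) return true;
        // Scaled form (unsigned-offset LDR/STR): non-negative multiple of
        // the access size, with a 12-bit scaled immediate (0..4095).
        long size = 1L << shift;
        return offset >= 0 && offset % size == 0 && (offset / size) <= 4095;
    }
}
```

With shift = 3, an offset of 1030 is rejected (outside -256..255 and not a multiple of 8), while 32760 (= 4095 * 8) is accepted, matching the 64-bit ranges quoted above.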

In this patch, we filter out invalid immediates when deciding whether the current addressing mode can be matched as base_plus_offset. We introduce indOffIN4/indOffLN4 and indOffIN8/indOffLN8 for the 32-bit and 64-bit data types separately. E.g., for memory4, we remove the generic indOffIN/indOffLN, which match the wrong unscaled immediate range, and replace them with indOffIN4/indOffLN4.

Since 8-bit and 16-bit LDR/STR instructions also support the unscaled decoding pattern, we add the addressing mode to the memory1 and memory2 lists by introducing indOffIN1/indOffLN1 and indOffIN2/indOffLN2.

We also remove the unused operands indOffI/indOffL/indOffIN/indOffLN to avoid misuse.

Tier 1-3 passed on aarch64.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8338442: AArch64: Clean up IndOffXX type and let legitimize_address() fix out-of-range operands (Enhancement - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/16991/head:pull/16991
$ git checkout pull/16991

Update a local copy of the PR:
$ git checkout pull/16991
$ git pull https://git.openjdk.org/jdk.git pull/16991/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 16991

View PR using the GUI difftool:
$ git pr show -t 16991

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/16991.diff

Webrev

Link to Webrev Comment

[1] https://github.com/openjdk/jdk/blob/8db7bad992a0f31de9c7e00c2657c18670539102/src/hotspot/cpu/aarch64/assembler_aarch64.inline.hpp#L33
[2] https://developer.arm.com/documentation/ddi0602/2023-09/Base-Instructions/LDR--immediate---Load-Register--immediate--?lang=en

bridgekeeper bot commented Dec 6, 2023

👋 Welcome back fgao! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Dec 6, 2023

openjdk bot commented Dec 6, 2023

@fg1417 The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Dec 6, 2023

mlbridge bot commented Dec 6, 2023

Webrevs

@theRealAph
Contributor

This is a complex fix for a corner case that can trivially be fixed via legitimize_address. Unless we can find something simpler, we should do so.

@theRealAph
Contributor

One question: is this only about misaligned loads?

@dean-long
Member

One question: is this only about misaligned loads?

It looks to me that it also handles aligned loads with too-large offsets.

@dean-long
Member

After this change, immIOffset and immLOffset appear to be obsolete.

Author

fg1417 commented Dec 7, 2023

One question: is this only about misaligned loads?

It looks to me that it also handles aligned loads with too-large offsets.

Yes. E.g., for the long type, before the patch only aligned offsets in the range 0 to 4095 were accepted; after the patch, aligned offsets in the range 0 to 32760 are accepted.

Author

fg1417 commented Dec 7, 2023

After this change, immIOffset and immLOffset appear to be obsolete.

Removed them in the new commit. Thanks!

@theRealAph
Contributor

I think this patch is excessive for the problem and introduces a lot of code duplication. Maybe it would be simpler, smaller, and faster to check for exactly what we need:

diff --git a/src/hotspot/cpu/aarch64/aarch64.ad b/src/hotspot/cpu/aarch64/aarch64.ad
index 233f9b6af7c..ea842912ce9 100644
--- a/src/hotspot/cpu/aarch64/aarch64.ad
+++ b/src/hotspot/cpu/aarch64/aarch64.ad
@@ -5911,7 +5911,8 @@ operand indIndexN(iRegN reg, iRegL lreg)
 
 operand indOffIN(iRegN reg, immIOffset off)
 %{
-  predicate(CompressedOops::shift() == 0);
+  predicate(CompressedOops::shift() == 0
+            && Address::offset_ok_for_immed(n->in(3)->find_int_con(min_jint), exact_log2(sizeof(jint))));
   constraint(ALLOC_IN_RC(ptr_reg));
   match(AddP (DecodeN reg) off);
   op_cost(0);
@@ -5926,7 +5927,8 @@ operand indOffIN(iRegN reg, immIOffset off)
 
 operand indOffLN(iRegN reg, immLoffset off)
 %{
-  predicate(CompressedOops::shift() == 0);
+  predicate(CompressedOops::shift() == 0
+            && Address::offset_ok_for_immed(n->in(3)->find_long_con(min_jint), exact_log2(sizeof(jlong))));
   constraint(ALLOC_IN_RC(ptr_reg));
   match(AddP (DecodeN reg) off);
   op_cost(0);

@dean-long
Member

@theRealAph , your patch only works if indOffIN is used in memory4 and indOffLN is used in memory8, right?
Introducing new operands like indOffIN4 is consistent with how the code currently works with indOffI4. In fact, I think the new indOffIN<size> could be folded into the existing indOffI<size> by using multiple match lines and a better predicate.

@theRealAph
Contributor

@theRealAph , your patch only works if indOffIN is used in memory4 and indOffLN is used in memory8, right? Introducing new operands like indOffIN4 is consistent with how the code currently works with indOffI4.

Yes, it is, but clearly that does not scale, leading to a great profusion of operand kinds.

In fact I think the new indOffIN<size> could be folded into the existing indOffI<size> by using multiple match lines and a better predicate.

Sure, that would be better still. Best of all would IMO be for each kind of memory access to check its offset operand by calling ok_for_immed, but let's see what folding and the use of better predicates does.


bridgekeeper bot commented Jan 5, 2024

@fg1417 This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

Contributor

eme64 commented Jan 5, 2024

@fg1417 what is the state on this?

The example here may look like a strange edge-case. But with my plans this may become more common:
JDK-8318446 C2: implement StoreNode::Ideal_merge_stores

I will merge stores, which can then look like unaligned stores of a larger type.
This bug here blocks my progress. I'm not in a hurry, just curious if there is now a plan how to proceed here ;)

@theRealAph
Contributor

@fg1417 what is the state on this?

The example here may look like a strange edge-case. But with my plans this may become more common: JDK-8318446 C2: implement StoreNode::Ideal_merge_stores

I will merge stores, which can then look like unaligned stores of a larger type. This bug here blocks my progress. I'm not in a hurry, just curious if there is now a plan how to proceed here ;)

The problem with this PR is that the code is way too complex for such a simple problem. The port is correct as it is, in the release build.

The only problem is an assertion. We could simply remove that assertion, but if it were me I'd fix the problem properly. Both @dean-long and I have suggested ways to improve this patch with less code. If @fg1417 decides to drop this PR I'll fix it.

Author

fg1417 commented Jan 8, 2024

Sorry, I can't work on this right now. @theRealAph could you help to push the changes please? Thanks.

Contributor

eme64 commented Jan 29, 2024

@theRealAph do you want to fix this? Otherwise I'll just push a PR removing the assert, since it blocks this: #16245

@theRealAph
Contributor

@theRealAph do you want to fix this? Otherwise I'll just push a PR removing the assert, since it blocks this: #16245

I've just started working on it. If you like, push a PR removing the assert, and then my fix will re-insert it.

Contributor

theRealAph commented Jan 31, 2024

I'm withdrawing my objection to this PR.

I've had a good look at some alternatives, and none of them are any better. I've reluctantly concluded that, given the design of C2, there's no better way to fix it.

My apologies to @fg1417 . You were right.

Author

fg1417 commented Feb 2, 2024

@theRealAph thanks for your kind review. All comments inspired me a lot and helped me think more. Also, thanks for your help on fixing it!

Contributor

eme64 commented Feb 22, 2024

@fg1417 are you still working on this?

@theRealAph
Contributor

@fg1417 are you still working on this?

The best thing to do now is to remove the failing assertion.

@theRealAph
Contributor

On 6/6/24 13:42, Fei Gao wrote:

Sorry, did you mean loading from base plus offset, like ldr x0, [x6, #8] or ldr x0, [x6, x7], takes one more cycle than loading from base
register only, like ldr x0, [x6]? Does the address addition take one
cycle?

We know that, on many Arm cores, Store μOPs are split into address and
data μOPs which are executed separately. That doesn't usually cause
any additional delay, because cores execute many operations in
parallel, so an address generation μOP for base+offset very probably
will execute in parallel with some previous instructions, meaning that
the target address is ready before it is needed. This split of address
generation must happen regardless of whether a store (or a load) is a
single instruction

str x0, [x1, #80]

or a pair of instructions

add x8, x1, #80; str x0, [x8].

Of course, a pair of instructions occupies twice as much icache space,
and you can run out of instruction decode bandwidth. However, in the
case of Unsafe operations, I don't believe that an occasional
unnecessary two-instruction operation will result in a performance
regression.

Author

fg1417 commented Jun 11, 2024

On 6/6/24 13:42, Fei Gao wrote:

Sorry, did you mean loading from base plus offset, like ldr x0, [x6, #8] or ldr x0, [x6, x7], takes one more cycle than loading from base
register only, like ldr x0, [x6]? Does the address addition take one
cycle?

We know that, on many Arm cores, Store μOPs are split into address and data μOPs which are executed separately. That doesn't usually cause any additional delay, because cores execute many operations in parallel, so an address generation μOP for base+offset very probably will execute in parallel with some previous instructions, meaning that the target address is ready before it is needed. This split of address generation must happen regardless of whether a store (or a load) is a single instruction

str x0, [x1, #80]

or a pair of instructions

add x8, x1, #80; str x0, [x8].

Of course, a pair of instructions occupies twice as much icache space, and you can run out of instruction decode bandwidth. However, in the case of Unsafe operations, I don't believe that an occasional unnecessary two-instruction operation will result in a performance regression.

Thanks for your kind explanation @theRealAph . That quite makes sense to me. I'll continue processing this pull request to implement it.

Author

fg1417 commented Jun 24, 2024

However, I don't think that all the IndOffXX types do us any good. It would be simpler and faster to match a general-purpose IndOff type and then let legitimize_address() fix any out-of-range operands. That'd reduce the size of the match rules and the time it takes to run them.

Hi @theRealAph @dean-long , the new commit implemented the scheme discussed before. It keeps two general operands immIOffset and immLOffset and lets legitimize_address() fix any out-of-range operands for different memory operations. To avoid confusion, it also keeps indOffI and indOffIN as separate operands.

Could you help review it please? Thanks!

@theRealAph
Contributor

That looks good. I'll do a more detailed review tomorrow.

Contributor

@theRealAph theRealAph left a comment


Thanks. This is a good cleanup.

I've tried a bunch of different things, and I haven't seen any measurable performance regressions.

But I do see this code expansion for an unaligned store long:

             0x000000010fca2210:   mov          x2, x22
            ;; legitimize_address {
             0x000000010fca2214:   add          x8, x2, #0x3ff
            ;; } legitimize_address
             0x000000010fca2218:   str          x4, [x8]

instead of, currently,

            0x000000010e5024f0:   add		x3, x22, #0x3ff
            0x000000010e5024f4:   str		x2, [x3]

Huh? What on Earth can have caused that added register copy? It looks like C2 has decided that the indOffL operand must clobber its input register x22, but it can't be that.

Looking at OptoAssembly for the same test, I see:

21c +   mov R11, R23	# long -> ptr
220     str  R12, [R11, #1023]	# int

whereas existing code has OptoAssembly

0184 +  add R2, R6, #1023	# ptr
0188    str  R4, [R2]	# int

At a guess, the add operation in the existing code does the long->ptr conversion as well as the addition.

This is very annoying. It's not terribly important, though.

@theRealAph
Contributor

I think this fixes it. Unsafe intrinsifies direct memory access using a long as the base address, and it seems C2 needs to be told that the conversion from long->ptr is a nop. Having said that, I'm not sure why this PR causes the extra register copy.

diff --git a/src/hotspot/cpu/aarch64/aarch64.ad b/src/hotspot/cpu/aarch64/aarch64.ad
index 985784c70fa..ed8cf0c698b 100644
--- a/src/hotspot/cpu/aarch64/aarch64.ad
+++ b/src/hotspot/cpu/aarch64/aarch64.ad
@@ -5134,6 +5134,20 @@ operand indOffL(iRegP reg, immLOffset off)
   %}
 %}
 
+operand indOffX2P(iRegL reg, immLOffset off)
+%{
+  constraint(ALLOC_IN_RC(ptr_reg));
+  match(AddP (CastX2P reg) off);
+  op_cost(0);
+  format %{ "[$reg, $off]" %}
+  interface(MEMORY_INTER) %{
+    base($reg);
+    index(0xffffffff);
+    scale(0x0);
+    disp($off);
+  %}
+%}
+
 operand indirectN(iRegN reg)
 %{
   predicate(CompressedOops::shift() == 0);
@@ -5469,7 +5483,7 @@ opclass vmem(indirect, indIndex, indOffI, indOffL, indOffIN, indOffLN);
 // memory is used to define read/write location for load/store
 // instruction defs. we can turn a memory op into an Address
 
-opclass memory(indirect, indIndexScaled, indIndexScaledI2L, indIndexI2L, indIndex, indOffI, indOffL,
+opclass memory(indirect, indIndexScaled, indIndexScaledI2L, indIndexI2L, indIndex, indOffI, indOffL, indOffX2P,
                indirectN, indIndexScaledN, indIndexScaledI2LN, indIndexI2LN, indIndexN, indOffIN, indOffLN);
 
 

Author

fg1417 commented Jul 3, 2024

Hi @theRealAph , thanks for your work and kind explanation!

I tried to write a small case to reproduce the problem you found, with java -XX:-TieredCompilation -XX:CompileCommand=compileonly,TestUnsafe*::"<clinit>" --add-exports java.base/jdk.internal.misc=ALL-UNNAMED -XX:+PrintAssembly TestUnsafe

import jdk.internal.misc.Unsafe;

public class TestUnsafe {

    public static final int LEN1 = 2040;
    public static final int LEN2 = 1020;

    static final Unsafe UNSAFE = Unsafe.getUnsafe();

    public static long lseed = 1;
    public static long lres = lseed;
    public static long off1 = 16;
    public static long off2 = 16;

    public static class TestLong {
        private static long address = UNSAFE.allocateMemory(LEN1);
        private static long heater = UNSAFE.allocateMemory(LEN2);

        static {
            for (int k = 0; k < 10_000_000; k++) {
                for (int i = 0; i < LEN2; i++) {
                    UNSAFE.putLong(heater+i, lseed);
                }
            }
            
            UNSAFE.putLong(address + off1 + 1030, lseed);
            UNSAFE.putLong(address + 1023, lseed);
            UNSAFE.putLong(address + off2 + 1001, lseed);
        }
    }

    static void test() {
        TestLong ta = new TestLong();
        lres = UNSAFE.getLong(ta.address + 1023);
        System.out.println(lres);
        lres = UNSAFE.getLong(ta.address + 127);
        System.out.println(lres);
    }

    public static void main(String[] strArr) {
        test();
    }
}

The existing code has the following OptoAssembly:

240 +   ldr  R10, [R15, #120]	# int ! Field: TestUnsafe$TestLong.address
244 +   ldr  R11, [R16, #136]	# int ! Field: TestUnsafe.off1
248 +   ldr  R12, [R16, #144]	# int ! Field: TestUnsafe.off2
248 +   add  R11, R11, R10
24c +   mov R11, R11	# long -> ptr
24c     add  R12, R12, R10
250 +   mov R10, R10	# long -> ptr
250     add R11, R11, #1030	# ptr
254 +   str  R17, [R11]	# int
258     add R10, R10, #1023	# ptr
25c +   str  R17, [R10]	# int
260     mov R10, R12	# long -> ptr
264 +   add R10, R10, #1001	# ptr
268     str  R17, [R10]	# int

Because we emit nothing for mov dst, src when dst == src, we then have this assembly:

ldr	x10, [x15,#120]
ldp	x11, x12, [x16,#136]
add	x11, x11, x10
add	x12, x12, x10
add	x11, x11, #0x406
str	x17, [x11]
add	x10, x10, #0x3ff
str	x17, [x10]
mov	x10, x12
add	x10, x10, #0x3e9
str	x17, [x10]

We can still see the extra register copy. So I guess it is not caused by this PR and existed before? Maybe, as you said, C2 needs to be told that the conversion from long->ptr is a nop.

After applying the PR, we have:

240 +   ldr  R10, [R15, #120]	# int ! Field: TestUnsafe$TestLong.address
244 +   ldr  R11, [R16, #136]	# int ! Field: TestUnsafe.off1
248 +   add  R11, R11, R10
24c     ldr  R12, [R16, #144]	# int ! Field: TestUnsafe.off2
250 +   mov R11, R11	# long -> ptr
250 +   str  R17, [R11, #1030]	# int
258     add  R11, R12, R10
25c +   mov R10, R10	# long -> ptr
25c     str  R17, [R10, #1023]	# int
264 +   mov R10, R11	# long -> ptr
268     str  R17, [R10, #1001]	# int

After applying the PR with your fix, we have:

240 +   ldr  R10, [R15, #120]	# int ! Field: TestUnsafe$TestLong.address
244 +   ldr  R11, [R16, #136]	# int ! Field: TestUnsafe.off1
248 +   add  R11, R11, R10
24c     str  R17, [R11, #1030]	# int
254 +   ldr  R12, [R16, #144]	# int ! Field: TestUnsafe.off2
258 +   str  R17, [R10, #1023]	# int
260 +   add  R10, R12, R10
264     str  R17, [R10, #1001]	# int

I'll update the PR with your fix soon. Thanks!

Contributor

adinn commented Jul 3, 2024

We can still see the extra register copy. So I guess it is not caused by this PR and existed before?

If so then might it be better to relocate Andrew's patch to a separate issue? We may want to backport it independent from this change.

Author

fg1417 commented Jul 4, 2024

We can still see the extra register copy. So I guess it is not caused by this PR and existed before?

If so then might it be better to relocate Andrew's patch to a separate issue? We may want to backport it independent from this change.

Thanks for your review @adinn . That sounds quite worthwhile. I'm afraid we can't relocate the fix directly, though, as it depends on part of my cleanup in this PR. I'll try to split out the common part to keep the separate fix simple and clean. Thanks.

@theRealAph
Contributor

We can still see the extra register copy. So I guess it is not caused by this PR and existed before?

If so then might it be better to relocate Andrew's patch to a separate issue? We may want to backport it independent from this change.

And there may be a more powerful way to fix it than my suggestion. I'm thinking of something analogous to iRegIorL2I which could be used instead of regP as an input to all address-producing patterns.

Author

fg1417 commented Jul 12, 2024

We can still see the extra register copy. So I guess it is not caused by this PR and existed before?

If so then might it be better to relocate Andrew's patch to a separate issue? We may want to backport it independent from this change.

And there may be a more powerful way to fix it than my suggestion. I'm thinking of something analgous to iRegIorL2I which may be used instead of regP as an input to all address-producing patterns.

Hi @theRealAph @adinn, I proposed two pull requests, #20157 and #20159, for the issue. #20159 is a draft because I'm not sure if it's worthwhile. I'd appreciate it if you could help review them. Thanks!

Author

fg1417 commented Jul 30, 2024

Rebased to the latest JDK after merging d10afa2

I don't think the failure of linux-x64 build is caused by this PR.

Can I have a review please? Thanks.

Author

fg1417 commented Aug 12, 2024

Hi @theRealAph @adinn @dean-long , can I have a review please? Thanks :)

Contributor

@theRealAph theRealAph left a comment


Looks good. That's a nice simplification and cleanup.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Aug 12, 2024
Author

fg1417 commented Aug 13, 2024

Looks good. That's a nice simplification and cleanup.

@theRealAph thanks for your approval!

Member

@dean-long dean-long left a comment


Looks good.

@TobiHartmann
Member

This PR is still referring to JDK-8319690 which got integrated. Should this refer to a new RFE?

Contributor

eme64 commented Aug 15, 2024

@TobiHartmann @fg1417 Yes, the bug as such is already "fixed" (I had removed the assert because it was not necessary for correctness). I guess this should be an RFE. Either as a cleanup or performance improvement (if it can be measured).

@fg1417 fg1417 changed the title 8319690: [AArch64] C2 compilation hits offset_ok_for_immed: assert "c2 compiler bug" 8338442: AArch64: Clean up IndOffXX type and let legitimize_address() fix out-of-range operands Aug 15, 2024
Author

fg1417 commented Aug 15, 2024

@dean-long thanks for your approval!

Thanks for your suggestions @eme64 @TobiHartmann. Updated with a new RFE :)

I'll integrate it.

/integrate


openjdk bot commented Aug 15, 2024

Going to push as commit 3859131.
Since your change was applied there have been 208 commits pushed to the master branch.

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Aug 15, 2024
@openjdk openjdk bot closed this Aug 15, 2024
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Aug 15, 2024

openjdk bot commented Aug 15, 2024

@fg1417 Pushed as commit 3859131.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.
