8365290: [perf] x86 ArrayFill intrinsic generates SPLIT_STORE for unaligned arrays #26747

IvaVladimir · 2025-08-12T14:54:22Z

On the SRF platform, runs of the ArrayFill tests with the intrinsic enabled report a ~2x score drop for several sizes, caused by a large number of 'MEM_UOPS_RETIRED.SPLIT_STORES' events. The 'good' case for ArraysFill.testCharFill with size=8195 reports:
MEM_UOPS_RETIRED.SPLIT_LOADS | 22.6711
MEM_UOPS_RETIRED.SPLIT_STORES | 4.0859
while for the 'bad' case these metrics are:
MEM_UOPS_RETIRED.SPLIT_LOADS | 69.1785
MEM_UOPS_RETIRED.SPLIT_STORES | 259200.3659

With alignment to the cache-line size there are no score drops due to split stores, but a small reduction may be reported due to extra …

SRF 6740E	Size	orig	patched	patched/orig
ArraysFill.testByteFill	16	152031.2	157001.2	1.03
ArraysFill.testByteFill	31	125795.9	177399.2	1.41
ArraysFill.testByteFill	250	57961.69	120981.9	2.09
ArraysFill.testByteFill	266	44900.15	147893.8	3.29
ArraysFill.testByteFill	511	61908.17	129830.1	2.10
ArraysFill.testByteFill	2047	32255.51	41986.6	1.30
ArraysFill.testByteFill	2048	31928.97	42154.3	1.32
ArraysFill.testByteFill	8195	10690.15	11036.3	1.03
ArraysFill.testIntFill	16	145030.7	318796.9	2.20
ArraysFill.testIntFill	31	134138.4	212487	1.58
ArraysFill.testIntFill	250	74179.23	79522.66	1.07
ArraysFill.testIntFill	266	68112.72	60116.49	0.88
ArraysFill.testIntFill	511	39693.28	36225.09	0.91
ArraysFill.testIntFill	2047	11504.14	10616.91	0.92
ArraysFill.testIntFill	2048	11244.71	10969.14	0.98
ArraysFill.testIntFill	8195	2751.289	2692.216	0.98
ArraysFill.testLongFill	16	212532.5	212526	1.00
ArraysFill.testLongFill	31	137432.4	137283.3	1.00
ArraysFill.testLongFill	250	43185	43159.78	1.00
ArraysFill.testLongFill	266	42172.22	42170.5	1.00
ArraysFill.testLongFill	511	23370.15	23370.86	1.00
ArraysFill.testLongFill	2047	6123.008	6122.73	1.00
ArraysFill.testLongFill	2048	5793.722	5792.855	1.00
ArraysFill.testLongFill	8195	616.552	616.585	1.00
ArraysFill.testShortFill	16	152088.6	265646.1	1.75
ArraysFill.testShortFill	31	137369.8	185596.4	1.35
ArraysFill.testShortFill	250	58872.03	99621.15	1.69
ArraysFill.testShortFill	266	91085.31	93746.62	1.03
ArraysFill.testShortFill	511	65331.96	78003.83	1.19
ArraysFill.testShortFill	2047	21716.32	21216.81	0.98
ArraysFill.testShortFill	2048	21664.91	21328.72	0.98
ArraysFill.testShortFill	8195	5922.547	5799.964	0.98

Progress

Change must be properly reviewed (1 review required, with at least 1 Reviewer)
Change must not contain extraneous whitespace
Commit message must refer to an issue

Issue

JDK-8365290: [perf] x86 ArrayFill intrinsic generates SPLIT_STORE for unaligned arrays (Enhancement - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/26747/head:pull/26747
$ git checkout pull/26747

Update a local copy of the PR:
$ git checkout pull/26747
$ git pull https://git.openjdk.org/jdk.git pull/26747/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 26747

View PR using the GUI difftool:
$ git pr show -t 26747

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/26747.diff

Using Webrev

Link to Webrev Comment


bridgekeeper · 2025-08-12T14:55:30Z

👋 Welcome back vaivanov! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2025-08-12T14:57:27Z

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

openjdk · 2025-08-12T14:58:42Z

@IvaVladimir The following label will be automatically applied to this pull request:

hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

mlbridge · 2025-08-12T15:02:20Z

Webrevs

sviswa7 · 2025-08-26T23:00:19Z

@IvaVladimir Thanks for looking into this. It would be good to make this intrinsic (via OptimizeFill) available by default for ECore platforms by making the following change in vm_version_x86.cpp:

-    if (MaxVectorSize < 32 || !VM_Version::supports_avx512vlbw()) {
+    if (MaxVectorSize < 32 || (!EnableX86ECoreOpts && !VM_Version::supports_avx512vlbw())) {
       OptimizeFill = false;
     }

sviswa7 · 2025-08-26T23:12:28Z

src/hotspot/cpu/x86/macroAssembler_x86.cpp

+      // align 'big' arrays to 64 bytes (cache line size) to minimize split_stores
+      cmpptr(count, 256<<shift);
+      jcc(Assembler::below, L_fill_32_bytes);
+


I see you have an overhead for small sizes; maybe we could do a check for small sizes before line 5885, something like below:

movdl(xtmp, value);
vpbroadcastd(xtmp, xtmp, Assembler::AVX_256bit);
subptr(count, 16 << shift);
jcc(Assembler::less, L_check_fill_32_bytes);

Alternatively, move the entire if (EnableX86ECoreOpts) { } block to line 5933, adjusting the jump labels accordingly.

sviswa7 · 2025-08-26T23:13:39Z

src/hotspot/cpu/x86/macroAssembler_x86.cpp

+      cmpptr(count, 256<<shift);
+      jcc(Assembler::below, L_fill_32_bytes);
+
+      BIND(L_align_64_bytes);


Need to add an align(16) before BIND(L_align_64_bytes);


IvaVladimir · 2025-09-01T15:29:16Z

The later alignment change improves performance slightly. Current numbers are:
SRF | size | jdk26 | patched with "+optFill" | patched/jdk26
ArraysFill.testByteFill | 16 | 151937.634 | 175045.819 | 1.15
ArraysFill.testByteFill | 31 | 125661.092 | 211226.668 | 1.68
ArraysFill.testByteFill | 250 | 57599.684 | 123670.638 | 2.15
ArraysFill.testByteFill | 266 | 44617.505 | 147306.352 | 3.30
ArraysFill.testByteFill | 511 | 61541.499 | 129234.48 | 2.10
ArraysFill.testByteFill | 2047 | 32073.997 | 41503.438 | 1.29
ArraysFill.testByteFill | 2048 | 31729.263 | 41977.271 | 1.32
ArraysFill.testByteFill | 8195 | 10620.363 | 10911.334 | 1.03
ArraysFill.testIntFill | 16 | 144924.577 | 264101.45 | 1.82
ArraysFill.testIntFill | 31 | 128877.207 | 211225.233 | 1.64
ArraysFill.testIntFill | 250 | 73785.182 | 79204.674 | 1.07
ArraysFill.testIntFill | 266 | 67703.171 | 75436.831 | 1.11
ArraysFill.testIntFill | 511 | 39489.095 | 36011.078 | 0.91
ArraysFill.testIntFill | 2047 | 11431.835 | 10509.545 | 0.92
ArraysFill.testIntFill | 2048 | 11178.661 | 10882.991 | 0.97
ArraysFill.testIntFill | 8195 | 2629.065 | 2601.19 | 0.99
ArraysFill.testLongFill | 16 | 211218.892 | 211250.585 | 1.00
ArraysFill.testLongFill | 31 | 133026.186 | 137374.876 | 1.03
ArraysFill.testLongFill | 250 | 42907.745 | 42937.988 | 1.00
ArraysFill.testLongFill | 266 | 41935.645 | 41920.801 | 1.00
ArraysFill.testLongFill | 511 | 23217.606 | 23227.904 | 1.00
ArraysFill.testLongFill | 2047 | 6083.099 | 6083.384 | 1.00
ArraysFill.testLongFill | 2048 | 5751.203 | 5753.409 | 1.00
ArraysFill.testLongFill | 8195 | 612.17 | 612.634 | 1.00
ArraysFill.testShortFill | 16 | 151917.079 | 352122.571 | 2.32
ArraysFill.testShortFill | 31 | 138000.217 | 226271.221 | 1.64
ArraysFill.testShortFill | 250 | 58641.362 | 99043.571 | 1.69
ArraysFill.testShortFill | 266 | 90499.649 | 93200.335 | 1.03
ArraysFill.testShortFill | 511 | 64958.462 | 77930.734 | 1.20
ArraysFill.testShortFill | 2047 | 21577.954 | 21210.006 | 0.98
ArraysFill.testShortFill | 2048 | 21538.005 | 21429.382 | 0.99
ArraysFill.testShortFill | 8195 | 5883.097 | 5775.499 | 0.98

sviswa7 · 2025-09-02T22:58:27Z

src/hotspot/cpu/x86/macroAssembler_x86.cpp

+            movl(Address(to, 0), value);
+            addptr(to, 4);
+            subptr(count, 1<<shift);
+            jmpb(L_align_64_bytes);


This should be a conditional jump.

JDK-8365290 [perf] x86 ArrayFill intrinsic generates SPLIT_STORE for unaligned arrays

f390613

openjdk bot changed the title ~~JDK-8365290 [perf] x86 ArrayFill intrinsic generates SPLIT_STORE for unaligned arrays~~ 8365290: [perf] x86 ArrayFill intrinsic generates SPLIT_STORE for unaligned arrays Aug 12, 2025

openjdk bot added the hotspot (hotspot-dev@openjdk.org) and rfr (Pull request is ready for review) labels Aug 12, 2025

sviswa7 reviewed Aug 26, 2025


IvaVladimir added 3 commits August 29, 2025 15:46

JDK-8365290 [perf] x86 ArrayFill intrinsic generates SPLIT_STORE for unaligned arrays

bca697c

JDK-8365290 [perf] x86 ArrayFill intrinsic generates SPLIT_STORE for unaligned arrays

7bc1b45

JDK-8365290 [perf] x86 ArrayFill intrinsic generates SPLIT_STORE for unaligned arrays

a428899

sviswa7 reviewed Sep 2, 2025
