[llvm-mca][AArch64] Merge Neoverse NEON tests (NFC) #170881

c-rhodes · 2025-12-05T16:24:11Z

Follow-on from #170324 to also refactor the NEON tests to reuse the
input assembly across all Neoverse cores.

Posting as a single PR, but to help with review I've staged the commits
so that each step is easier to follow, since the differences between the
inputs are a little confusing. I can also post as individual patches if
that would help.

The approach is as follows:

- Inputs for Neoverse N1/N2/N3 NEON tests are already identical, so
  first combine those.
- Inputs for V2/V3/V3AE NEON tests are also already identical, but
  differ from N-cores, so combine those separately.
- Most significantly, input for V1 differs from all other cores
  primarily because of 24f0901 ([MCA] Adding missing instructions in AArch64 Neoverse V1 tests #128892).
- Split out features that are not supported across all cores.
  - Split out FEAT_I8MM, FEAT_FHM, FEAT_FCMA. N1 doesn't have these
    features but all other Neoverse cores do. Also adds coverage for
    N2/N3 since they were missing tests.
  - Split out FEAT_BF16. V1 doesn't have this feature but all other
    Neoverse cores do. Also adds coverage for N1/N2/N3 since they were
    missing tests.
  - Split out FEAT_FRINTTS. V1/N1 don't have this feature but all other
    Neoverse cores do. Also adds coverage for N2/N3 since they were
    missing tests.
- Bring Neoverse V2/V3/V3AE and N1/N2/N3 NEON tests in line. Comparing
  N[1-3] against V[2-3], the only change the N cores have that V[2-3]
  don't is:

    < st4 { v0.d, v1.d, v2.d, v3.d }[1], [x0], x5
    ---
    > st4 { v0.b, v1.b, v2.b, v3.b }[9], [x0], x5

  So we take it for all cores. The rest of the diff is instructions in
  V[2-3] that aren't in N cores, so we also take them.

  All Neoverse cores can optionally support the Cryptographic Extension.
  The related features (AES, ...) are enabled by default for V1/N1 but not
  the other cores, so they need to be explicitly enabled via -mattr.
- Finally, bring Neoverse V1 in line with V2/V3/V3AE/N1/N2/N3:
  - loads/stores are blended
  - duplicates with different spaces like shll v0.2d, v0.2s, #32 are
    removed
  - the rest of the diff is instructions in V1 that are not tested in the
    other cores, so we add them for the other cores
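
As a rough illustration of the last step in the list above (dropping duplicates that differ only in spacing, then taking the union of the inputs), here is a minimal Python sketch. It is not the script used for this PR: `normalize` and `merge_inputs` are hypothetical helper names, the sample instruction lists are made up apart from the `shll` line, and the "loads/stores are blended" step is not modelled.

```python
import re


def normalize(line: str) -> str:
    """Collapse whitespace so spacing-only variants compare equal."""
    # Squash runs of whitespace, then force "comma + one space" separators.
    text = re.sub(r"\s+", " ", line.strip())
    return re.sub(r"\s*,\s*", ", ", text)


def merge_inputs(*inputs: list[str]) -> list[str]:
    """Union of instruction lists in first-seen order, without duplicates."""
    seen: set[str] = set()
    merged: list[str] = []
    for lines in inputs:
        for line in lines:
            key = normalize(line)
            if key and key not in seen:
                seen.add(key)
                merged.append(key)
    return merged


# Illustrative inputs only; the real files contain far more instructions.
other_cores = ["abs v0.16b, v0.16b", "add v0.2d, v0.2d, v0.2d"]
v1_only = [
    "shll v0.2d, v0.2s, #32",
    "shll  v0.2d,v0.2s,  #32",  # same instruction, different spacing
    "abs v0.16b, v0.16b",
]

for insn in merge_inputs(other_cores, v1_only):
    print(insn)
```

The merged list keeps one copy of `shll v0.2d, v0.2s, #32` and appends it after the instructions the other cores already had.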

At this point N[1-3]-neon-instructions.s match. V[2-3]-neon-instructions.s also match. This commit unifies them, so afterwards the last remaining diff across the Neoverse NEON tests is between V1 (which has better coverage due to 24f0901 llvm#128892) and all the other Neoverse cores.

Comparing N[1-3] against V[2-3], the only change the N cores have that V[2-3] don't is:

  908c783
  < st4 { v0.d, v1.d, v2.d, v3.d }[1], [x0], x5
  ---
  > st4 { v0.b, v1.b, v2.b, v3.b }[9], [x0], x5

Checked whether this has different performance characteristics:

  llvm-mca -mtriple=aarch64 -mcpu=neoverse-n1
  6 5 1.50 * st4 { v0.b, v1.b, v2.b, v3.b }[1], [x0]
  6 5 1.50 * st4 { v0.b, v1.b, v2.b, v3.b }[9], [x0]

  llvm-mca -mtriple=aarch64 -mcpu=neoverse-n2
  6 6 1.50 * st4 { v0.b, v1.b, v2.b, v3.b }[1], [x0]
  6 6 1.50 * st4 { v0.b, v1.b, v2.b, v3.b }[9], [x0]

  llvm-mca -mtriple=aarch64 -mcpu=neoverse-n3
  4 2 1.00 * st4 { v0.b, v1.b, v2.b, v3.b }[1], [x0]
  4 2 1.00 * st4 { v0.b, v1.b, v2.b, v3.b }[9], [x0]

and an imm of 1 matches 9, so let's go with that.

The rest of the diff is instructions in V[2-3] that aren't in N cores, so we take them.

All Neoverse cores can optionally support the Cryptographic Extension. The related features (AES, ...) are enabled by default for V1/N1 but not the other cores, so they need to be explicitly enabled via -mattr.
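
The per-core comparison in this commit message can be checked mechanically. The sketch below is only an illustration: it parses the rows quoted above and compares their numeric columns, assuming these are llvm-mca's instruction-info columns (#uOps, latency, reciprocal throughput) followed by flag markers. `Row`, `parse_row` and `same_cost` are hypothetical names, and the rows are copied from the message rather than regenerated. Like the message, it only compares the two `.b` forms.

```python
from typing import NamedTuple


class Row(NamedTuple):
    uops: int
    latency: int
    rthroughput: float
    asm: str


def parse_row(line: str) -> Row:
    """Parse a row such as '6 5 1.50 * st4 { ... }[1], [x0]'."""
    uops, latency, rthroughput, rest = line.split(None, 3)
    tokens = rest.split()
    # Skip flag markers ('*' may-load/may-store, 'U' side effects).
    while tokens and tokens[0] in ("*", "U"):
        tokens.pop(0)
    return Row(int(uops), int(latency), float(rthroughput), " ".join(tokens))


def same_cost(a: Row, b: Row) -> bool:
    """True when the two rows agree on every numeric column."""
    return a[:3] == b[:3]


# Rows as quoted in the commit message above.
REPORTED = {
    "neoverse-n1": (
        "6 5 1.50 * st4 { v0.b, v1.b, v2.b, v3.b }[1], [x0]",
        "6 5 1.50 * st4 { v0.b, v1.b, v2.b, v3.b }[9], [x0]",
    ),
    "neoverse-n2": (
        "6 6 1.50 * st4 { v0.b, v1.b, v2.b, v3.b }[1], [x0]",
        "6 6 1.50 * st4 { v0.b, v1.b, v2.b, v3.b }[9], [x0]",
    ),
    "neoverse-n3": (
        "4 2 1.00 * st4 { v0.b, v1.b, v2.b, v3.b }[1], [x0]",
        "4 2 1.00 * st4 { v0.b, v1.b, v2.b, v3.b }[9], [x0]",
    ),
}

for cpu, (first, second) in REPORTED.items():
    print(cpu, same_cost(parse_row(first), parse_row(second)))
```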

Since the last commit unified these tests, there are now two identical inputs, so we delete one and keep the other; which one doesn't matter. CHECK lines remain unchanged. The last remaining diff after this commit is between V1 and all other cores, which are now covered by V2-V3-neon-instructions.s.
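
The claim that two inputs are identical is easy to confirm before deleting one of them. A small sketch using the standard library's `filecmp`; `identical_groups` is a hypothetical helper and the paths passed to it are up to the caller:

```python
import filecmp
from pathlib import Path


def identical_groups(paths: list[Path]) -> list[list[Path]]:
    """Group files whose contents are byte-for-byte identical."""
    groups: list[list[Path]] = []
    for path in paths:
        for group in groups:
            # shallow=False forces a content comparison, not just os.stat().
            if filecmp.cmp(group[0], path, shallow=False):
                group.append(path)
                break
        else:
            groups.append([path])
    return groups
```

Any group with more than one member is a candidate for keeping a single shared input.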

…sts inline

- loads/stores are blended
- duplicates with different spaces like 'shll v0.2d, v0.2s, #32' are removed
- the rest of the diff is instructions in V1 that are not tested in the other cores, so we add them for the other cores

This instruction was in the N cores but not V. When comparing the diff:

  < st4 { v0.d, v1.d, v2.d, v3.d }[1], [x0], x5
  ---
  > st4 { v0.b, v1.b, v2.b, v3.b }[9], [x0], x5

I misread it and thought only the immediate was different, when in fact the element size is also different. Performance is different, so I've re-added it.
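
To make the misread concrete: the two lines differ in the element size (`.d` vs `.b`) as well as the lane index. A minimal sketch that extracts both from a single-lane structure store; `lane_operand` and its regular expression are hypothetical and only cover operands written like the ones in this diff.

```python
import re

# Lanes available in a 128-bit vector register for each element size.
LANES = {"b": 16, "h": 8, "s": 4, "d": 2}

_LANE_RE = re.compile(r"\{\s*v\d+\.([bhsd])\b.*?\}\[(\d+)\]")


def lane_operand(asm: str) -> tuple[str, int]:
    """Return (element size, lane index) of a '{ vN.T, ... }[i]' operand."""
    match = _LANE_RE.search(asm)
    if match is None:
        raise ValueError(f"no lane operand in: {asm}")
    size, index = match.group(1), int(match.group(2))
    if index >= LANES[size]:
        raise ValueError(f"lane {index} out of range for .{size}")
    return size, index


old = "st4 { v0.d, v1.d, v2.d, v3.d }[1], [x0], x5"
new = "st4 { v0.b, v1.b, v2.b, v3.b }[9], [x0], x5"

print(lane_operand(old))  # ('d', 1)
print(lane_operand(new))  # ('b', 9)
```

Lane 9 is only valid for byte elements, which is one way to spot that the second line cannot be the same instruction with a different immediate.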

Asher8118

LGTM, thanks for the change!

Asher8118 · 2025-12-08T15:29:52Z

llvm/test/tools/llvm-mca/AArch64/Neoverse/Inputs/complxnum-instructions.s

Nit: I would rename this file and the rest of the associated tests to complex-add-instructions instead.

c-rhodes · Dec 9, 2025

I've just been using the -mattr feature name so this would be inconsistent then so I think I'll just leave as is. It is slightly annoying a character was skipped tho, would one more character hurt!

Asher8118 · Dec 9, 2025

> I've just been using the -mattr feature name so this would be inconsistent then so I think I'll just leave as is

I see, no problem then. The only reason I asked is because I thought it made it more clear which instructions were getting tested. But it's in no way a big deal.

rj-jesus

Thanks very much for working on this, LGTM!

Do you think it would make sense to place these inputs somewhere outside Neoverse/?

c-rhodes · 2025-12-09T11:01:23Z

> Thanks very much for working on this, LGTM!
>
> Do you think it would make sense to place these inputs somewhere outside Neoverse/?

Thanks for reviewing! Yes I think that'd be a good direction to go eventually, I'd like to do the SVE instructions next but not sure when I'll get to that.

c-rhodes added 12 commits December 4, 2025 16:14

[llvm-mca][AArch64] Split out FEAT_FRINTTS Neoverse tests

6456b4c

V1/N1 don't have this feature but all other Neoverse cores do. Reduces the V1/N1-neon-instructions.s diff against other cores. Also adds coverage for N2/N3 since they were missing tests.

[llvm-mca][AArch64] Split out Neoverse V2/V3/V3AE neon tests

bf6771e

The inputs for these tests are identical, CHECK lines remain unchanged.

[llvm-mca][AArch64] Split out Neoverse N1/N2/N3 neon tests

1fc74f4

The inputs for these tests are identical, CHECK lines remain unchanged.

[llvm-mca][AArch64] Split out FEAT_BF16 Neoverse tests

7cf24bf

V1 doesn't have this feature but all other Neoverse cores do. Also adds coverage for N1/N2/N3 since they were missing tests.

[llvm-mca][AArch64] Split out FEAT_FCMA Neoverse tests

2420f11

N1 doesn't have this feature but all other Neoverse cores do. Also adds coverage for N2/N3 since they were missing tests.

[llvm-mca][AArch64] Split out FEAT_FHM Neoverse tests

ac9c48a

N1 doesn't have this feature but all other Neoverse cores do. Also adds coverage for N2/N3 since they were missing tests.

[llvm-mca][AArch64] Split out FEAT_I8MM Neoverse tests

6f484cd

N1 doesn't have this feature but all other Neoverse cores do. Also adds coverage for N2/N3 since they were missing tests.
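
The feature split described in these commit messages can be summarised as a small table. The sketch below encodes only what the messages state (which cores lack each feature) and has not been checked against the core specifications; `CORES`, `LACKING` and `cores_for` are hypothetical names used for illustration.

```python
# Neoverse cores covered by this PR, as named in the description.
CORES = ["N1", "N2", "N3", "V1", "V2", "V3", "V3AE"]

# Cores that, per the commit messages above, do not have each feature.
LACKING = {
    "FEAT_I8MM": {"N1"},
    "FEAT_FHM": {"N1"},
    "FEAT_FCMA": {"N1"},
    "FEAT_BF16": {"V1"},
    "FEAT_FRINTTS": {"V1", "N1"},
}


def cores_for(feature: str) -> list[str]:
    """Cores that should get a test for the given split-out feature."""
    return [core for core in CORES if core not in LACKING[feature]]


for feature in LACKING:
    print(feature, cores_for(feature))
```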

c-rhodes requested review from Asher8118, davemgreen and rj-jesus December 8, 2025 13:22

c-rhodes marked this pull request as ready for review December 8, 2025 13:23

Asher8118 approved these changes Dec 8, 2025

rj-jesus approved these changes Dec 9, 2025

c-rhodes merged commit 95bd878 into llvm:main Dec 9, 2025
12 checks passed
