Skip to content

Conversation

@c-rhodes
Copy link
Collaborator

@c-rhodes c-rhodes commented Dec 5, 2025

Follow-on from #170324 to also refactor the NEON tests to reuse the
input assembly across all Neoverse cores.

Posting as a single PR, but to help with review I've staged the commits
to hopefully make it clearer how this is done as it's a little confusing
understanding the differences. I can also post as individual patches if
that would help.

The approach is as follows:

  • Inputs for Neoverse N1/N2/N3 NEON tests are already identical, so
    first combine those.
  • Inputs for V2/V3/V3AE NEON tests are also already identical, but
    differ from N-cores, so combine those separately.
  • Most significantly, input for V1 differs from all other cores
    primarily because of 24f0901 ([MCA] Adding missing instructions in AArch64 Neoverse V1 tests #128892).
  • Split out features that are not supported across all cores.
    • Split out FEAT_I8MM, FEAT_FHM, FEAT_FCMA. N1 doesn't have this
      feature but all other Neoverse cores do. Also adds coverage for
      N2/N3 since they were missing tests.
    • Split out FEAT_BF16. V1 doesn't have this feature but all other
      Neoverse cores do. Also adds coverage for N1/N2/N3 since they were
      missing tests.
    • Split out FEAT_FRINTTS. V1/N1 don't have this feature but all other
      Neoverse cores do. Also adds coverage for N2/N3 since they were
      missing tests.
  • Bring Neoverse V2/V3/V3AE and N1/N2/N3 neon tests inline. Comparing
    N[1-3] against V[2-3] the only change the N cores have that V[2-3]
    dont is:
  < st4 { v0.d, v1.d, v2.d, v3.d }[1], [x0], x5
  ---
  > st4 { v0.b, v1.b, v2.b, v3.b }[9], [x0], x5

So we take it for all cores. The rest of the diff is
instructions in V[2-3] that arent in N cores, so we also take them.

All Neoverse cores can optionally support the Cryptographic Extension.
The related features (AES, ...) are enabled by default for V1/N1 but not
the other cores, so need to be explicitly enabled via -mattr.

  • Finally bring Neoverse V1 inline with V2/V3/V3AE/N1/N2/N3
    • loads/stores are blended
    • duplicates with different spaces like shll v0.2d, v0.2s, #32 are
      removed
    • the rest of the diff is instructions in V1 that are not tested in the
      other cores, so we add them for the other cores

V1/N1 don't have this feature but all other Neoverse cores do. Reduces
the V1/N1-neon-instructions.s diff against other cores. Also adds
coverage for N2/N3 since they were missing tests.
The inputs for these tests are identical, CHECK lines remain unchanged.
The inputs for these tests are identical, CHECK lines remain unchanged.
V1 doesn't have this feature but all other Neoverse cores do. Also adds
coverage for N1/N2/N3 since they were missing tests.
N1 doesn't have this feature but all other Neoverse cores do. Also adds
coverage for N2/N3 since they were missing tests.
N1 doesn't have this feature but all other Neoverse cores do. Also adds
coverage for N2/N3 since they were missing tests.
N1 doesn't have this feature but all other Neoverse cores do. Also adds
coverage for N2/N3 since they were missing tests.
At this point N[1-3]-neon-instructions.s match.
V[2-3]-neon-instructions.s also match.

So this commit unifies these so after this commit the last remaining
diff across the Neoverse NEON tests is between V1 (which has better
coverage due to 24f0901 llvm#128892) and all the other Neoverse cores.

Comparing N[1-3] against V[2-3] the only change the N cores have that
V[2-3] dont is:

908c783
< st4 { v0.d, v1.d, v2.d, v3.d }[1], [x0], x5
---
> st4 { v0.b, v1.b, v2.b, v3.b }[9], [x0], x5

Checked if this has difference performance characteristics:

  llvm-mca -mtriple=aarch64 -mcpu=neoverse-n1
  6      5     1.50           *            st4   { v0.b, v1.b, v2.b, v3.b }[1], [x0]
  6      5     1.50           *            st4   { v0.b, v1.b, v2.b, v3.b }[9], [x0]

  llvm-mca -mtriple=aarch64 -mcpu=neoverse-n2
  6      6     1.50           *            st4   { v0.b, v1.b, v2.b, v3.b }[1], [x0]
  6      6     1.50           *            st4   { v0.b, v1.b, v2.b, v3.b }[9], [x0]

  llvm-mca -mtriple=aarch64 -mcpu=neoverse-n3
  4      2     1.00           *            st4   { v0.b, v1.b, v2.b, v3.b }[1], [x0]
  4      2     1.00           *            st4   { v0.b, v1.b, v2.b, v3.b }[9], [x0]

and imm of 1 matches 9, so lets go with that. The rest of the diff is
instructions in V[2-3] that arent in N cores, so we take them.

All Neoverse cores can optionally support the Cryptographic Extension.
The related features (AES, ...) are enabled by default for V1/N1 but not
the other cores, so need to be explicitly enabled via -mattr.
Since the last commit unified these tests there's now two identical
inputs, so we delete one and keep the other. Which one doesnt matter.
CHECK lines remain unchanged.

The last remaining diff after this commit is between V1 and all other
cores that are now covered by (V2-V3-neon-instructions.s).
…sts inline

- loads/stores are blended
- duplicates with different spaces like 'shll   v0.2d, v0.2s, llvm#32' are
  removed
- the rest of the diff is instructions in V1 that are not tested in the
  other cores, so we add them for the other cores
Follow-on from llvm#170324 to also refactor the NEON tests to reuse the
input assembly across all Neoverse cores.

Posting as a single PR, but to help with review I've staged the commits
to hopefully make it clearer how this is done as it's a little confusing
understand the differences. I can also post as individual patches if
that would help.

The approach is as follows:

- Inputs for Neoverse N1/N2/N3 NEON tests are already identical, so
  first combine those.
- Inputs for V2/V3/V3AE NEON tests are also already identical, but
  differ from N-cores, so combine those separately.
- Most significantly, input for V1 differs from all other cores
  primarily because of 24f0901 (llvm#128892).
- Split out features that are not supported across all cores.
  - Split out FEAT_I8MM, FEAT_FHM, FEAT_FCMA. N1 doesn't have this
    feature but all other Neoverse cores do. Also adds coverage for
    N2/N3 since they were missing tests.
  - Split out FEAT_BF16. V1 doesn't have this feature but all other
    Neoverse cores do. Also adds coverage for N1/N2/N3 since they were
    missing tests.
  - Split out FEAT_FRINTTS. V1/N1 don't have this feature but all other
    Neoverse cores do. Also adds coverage for N2/N3 since they were
    missing tests.
- Bring Neoverse V2/V3/V3AE and N1/N2/N3 neon tests inline. Comparing
  N[1-3] against V[2-3] the only change the N cores have that V[2-3]
  dont is:

  < st4 { v0.d, v1.d, v2.d, v3.d }[1], [x0], x5
  ---
  > st4 { v0.b, v1.b, v2.b, v3.b }[9], [x0], x5

  Checked if this has difference performance characteristics:

    llvm-mca -mtriple=aarch64 -mcpu=neoverse-n1
    6      5     1.50           *            st4   { v0.b, v1.b, v2.b, v3.b }[1], [x0]
    6      5     1.50           *            st4   { v0.b, v1.b, v2.b, v3.b }[9], [x0]

    llvm-mca -mtriple=aarch64 -mcpu=neoverse-n2
    6      6     1.50           *            st4   { v0.b, v1.b, v2.b, v3.b }[1], [x0]
    6      6     1.50           *            st4   { v0.b, v1.b, v2.b, v3.b }[9], [x0]

    llvm-mca -mtriple=aarch64 -mcpu=neoverse-n3
    4      2     1.00           *            st4   { v0.b, v1.b, v2.b, v3.b }[1], [x0]
    4      2     1.00           *            st4   { v0.b, v1.b, v2.b, v3.b }[9], [x0]

  and imm of 1 matches 9, so took that. The rest of the diff is
  instructions in V[2-3] that arent in N cores, so we take them.

  All Neoverse cores can optionally support the Cryptographic Extension.
  The related features (AES, ...) are enabled by default for V1/N1 but not
  the other cores, so need to be explicitly enabled via -mattr.
- Finally bring Neoverse V1 inline with V2/V3/V3AE/N1/N2/N3
  - loads/stores are blended
  - duplicates with different spaces like 'shll   v0.2d, v0.2s, llvm#32' are
    removed
  - the rest of the diff is instructions in V1 that are not tested in the
    other cores, so we add them for the other cores
this instruction was in the N cores but not V. When comparing the diff:

  < st4 { v0.d, v1.d, v2.d, v3.d }[1], [x0], x5
  ---
  > st4 { v0.b, v1.b, v2.b, v3.b }[9], [x0], x5

I misread it and thought only the immediate was different when infact
the element size is also different. Performance is different so I've
re-added it.
@c-rhodes c-rhodes marked this pull request as ready for review December 8, 2025 13:23
Copy link
Contributor

@Asher8118 Asher8118 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the change!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I would rename this file and the rest of the associated tests to complex-add-instructions instead.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just been using the -mattr feature name so this would be inconsistent then so I think I'll just leave as is. It is slightly annoying a character was skipped tho, would one more character hurt!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just been using the -mattr feature name so this would be inconsistent then so I think I'll just leave as is

I see, no problem then. The only reason I asked is because I thought it made it more clear which instructions were getting tested. But it's in no way a big deal.

Copy link
Contributor

@rj-jesus rj-jesus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks very much for working on this, LGTM!

Do you think it would make sense to place these inputs somewhere outside Neoverse/?

@c-rhodes
Copy link
Collaborator Author

c-rhodes commented Dec 9, 2025

Thanks very much for working on this, LGTM!

Do you think it would make sense to place these inputs somewhere outside Neoverse/?

Thanks for reviewing! Yes I think that'd be a good direction to go eventually, I'd like to do the SVE instructions next but not sure when I'll get to that.

@c-rhodes c-rhodes merged commit 95bd878 into llvm:main Dec 9, 2025
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants