-
Notifications
You must be signed in to change notification settings - Fork 15.5k
[llvm-mca][AArch64] Merge Neoverse NEON tests (NFC) #170881
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[llvm-mca][AArch64] Merge Neoverse NEON tests (NFC) #170881
Conversation
V1/N1 don't have this feature but all other Neoverse cores do. Reduces the V1/N1-neon-instructions.s diff against other cores. Also adds coverage for N2/N3 since they were missing tests.
The inputs for these tests are identical, CHECK lines remain unchanged.
The inputs for these tests are identical, CHECK lines remain unchanged.
V1 doesn't have this feature but all other Neoverse cores do. Also adds coverage for N1/N2/N3 since they were missing tests.
N1 doesn't have this feature but all other Neoverse cores do. Also adds coverage for N2/N3 since they were missing tests.
N1 doesn't have this feature but all other Neoverse cores do. Also adds coverage for N2/N3 since they were missing tests.
N1 doesn't have this feature but all other Neoverse cores do. Also adds coverage for N2/N3 since they were missing tests.
At this point N[1-3]-neon-instructions.s match. V[2-3]-neon-instructions.s also match. So this commit unifies these so after this commit the last remaining diff across the Neoverse NEON tests is between V1 (which has better coverage due to 24f0901 llvm#128892) and all the other Neoverse cores. Comparing N[1-3] against V[2-3] the only change the N cores have that V[2-3] dont is: 908c783 < st4 { v0.d, v1.d, v2.d, v3.d }[1], [x0], x5 --- > st4 { v0.b, v1.b, v2.b, v3.b }[9], [x0], x5 Checked if this has difference performance characteristics: llvm-mca -mtriple=aarch64 -mcpu=neoverse-n1 6 5 1.50 * st4 { v0.b, v1.b, v2.b, v3.b }[1], [x0] 6 5 1.50 * st4 { v0.b, v1.b, v2.b, v3.b }[9], [x0] llvm-mca -mtriple=aarch64 -mcpu=neoverse-n2 6 6 1.50 * st4 { v0.b, v1.b, v2.b, v3.b }[1], [x0] 6 6 1.50 * st4 { v0.b, v1.b, v2.b, v3.b }[9], [x0] llvm-mca -mtriple=aarch64 -mcpu=neoverse-n3 4 2 1.00 * st4 { v0.b, v1.b, v2.b, v3.b }[1], [x0] 4 2 1.00 * st4 { v0.b, v1.b, v2.b, v3.b }[9], [x0] and imm of 1 matches 9, so lets go with that. The rest of the diff is instructions in V[2-3] that arent in N cores, so we take them. All Neoverse cores can optionally support the Cryptographic Extension. The related features (AES, ...) are enabled by default for V1/N1 but not the other cores, so need to be explicitly enabled via -mattr.
Since the last commit unified these tests there's now two identical inputs, so we delete one and keep the other. Which one doesnt matter. CHECK lines remain unchanged. The last remaining diff after this commit is between V1 and all other cores that are now covered by (V2-V3-neon-instructions.s).
…sts inline - loads/stores are blended - duplicates with different spaces like 'shll v0.2d, v0.2s, llvm#32' are removed - the rest of the diff is instructions in V1 that are not tested in the other cores, so we add them for the other cores
Follow-on from llvm#170324 to also refactor the NEON tests to reuse the input assembly across all Neoverse cores. Posting as a single PR, but to help with review I've staged the commits to hopefully make it clearer how this is done as it's a little confusing understand the differences. I can also post as individual patches if that would help. The approach is as follows: - Inputs for Neoverse N1/N2/N3 NEON tests are already identical, so first combine those. - Inputs for V2/V3/V3AE NEON tests are also already identical, but differ from N-cores, so combine those separately. - Most significantly, input for V1 differs from all other cores primarily because of 24f0901 (llvm#128892). - Split out features that are not supported across all cores. - Split out FEAT_I8MM, FEAT_FHM, FEAT_FCMA. N1 doesn't have this feature but all other Neoverse cores do. Also adds coverage for N2/N3 since they were missing tests. - Split out FEAT_BF16. V1 doesn't have this feature but all other Neoverse cores do. Also adds coverage for N1/N2/N3 since they were missing tests. - Split out FEAT_FRINTTS. V1/N1 don't have this feature but all other Neoverse cores do. Also adds coverage for N2/N3 since they were missing tests. - Bring Neoverse V2/V3/V3AE and N1/N2/N3 neon tests inline. Comparing N[1-3] against V[2-3] the only change the N cores have that V[2-3] dont is: < st4 { v0.d, v1.d, v2.d, v3.d }[1], [x0], x5 --- > st4 { v0.b, v1.b, v2.b, v3.b }[9], [x0], x5 Checked if this has difference performance characteristics: llvm-mca -mtriple=aarch64 -mcpu=neoverse-n1 6 5 1.50 * st4 { v0.b, v1.b, v2.b, v3.b }[1], [x0] 6 5 1.50 * st4 { v0.b, v1.b, v2.b, v3.b }[9], [x0] llvm-mca -mtriple=aarch64 -mcpu=neoverse-n2 6 6 1.50 * st4 { v0.b, v1.b, v2.b, v3.b }[1], [x0] 6 6 1.50 * st4 { v0.b, v1.b, v2.b, v3.b }[9], [x0] llvm-mca -mtriple=aarch64 -mcpu=neoverse-n3 4 2 1.00 * st4 { v0.b, v1.b, v2.b, v3.b }[1], [x0] 4 2 1.00 * st4 { v0.b, v1.b, v2.b, v3.b }[9], [x0] and imm of 1 matches 9, so took that. The rest of the diff is instructions in V[2-3] that arent in N cores, so we take them. All Neoverse cores can optionally support the Cryptographic Extension. The related features (AES, ...) are enabled by default for V1/N1 but not the other cores, so need to be explicitly enabled via -mattr. - Finally bring Neoverse V1 inline with V2/V3/V3AE/N1/N2/N3 - loads/stores are blended - duplicates with different spaces like 'shll v0.2d, v0.2s, llvm#32' are removed - the rest of the diff is instructions in V1 that are not tested in the other cores, so we add them for the other cores
this instruction was in the N cores but not V. When comparing the diff:
< st4 { v0.d, v1.d, v2.d, v3.d }[1], [x0], x5
---
> st4 { v0.b, v1.b, v2.b, v3.b }[9], [x0], x5
I misread it and thought only the immediate was different when infact
the element size is also different. Performance is different so I've
re-added it.
Asher8118
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for the change!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I would rename this file and the rest of the associated tests to complex-add-instructions instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've just been using the -mattr feature name so this would be inconsistent then so I think I'll just leave as is. It is slightly annoying a character was skipped tho, would one more character hurt!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've just been using the -mattr feature name so this would be inconsistent then so I think I'll just leave as is
I see, no problem then. The only reason I asked is because I thought it made it more clear which instructions were getting tested. But it's in no way a big deal.
rj-jesus
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks very much for working on this, LGTM!
Do you think it would make sense to place these inputs somewhere outside Neoverse/?
Thanks for reviewing! Yes I think that'd be a good direction to go eventually, I'd like to do the SVE instructions next but not sure when I'll get to that. |
Follow-on from #170324 to also refactor the NEON tests to reuse the
input assembly across all Neoverse cores.
Posting as a single PR, but to help with review I've staged the commits
to hopefully make it clearer how this is done as it's a little confusing
understanding the differences. I can also post as individual patches if
that would help.
The approach is as follows:
first combine those.
differ from N-cores, so combine those separately.
primarily because of 24f0901 ([MCA] Adding missing instructions in AArch64 Neoverse V1 tests #128892).
feature but all other Neoverse cores do. Also adds coverage for
N2/N3 since they were missing tests.
Neoverse cores do. Also adds coverage for N1/N2/N3 since they were
missing tests.
Neoverse cores do. Also adds coverage for N2/N3 since they were
missing tests.
N[1-3] against V[2-3] the only change the N cores have that V[2-3]
dont is:
So we take it for all cores. The rest of the diff is
instructions in V[2-3] that arent in N cores, so we also take them.
All Neoverse cores can optionally support the Cryptographic Extension.
The related features (AES, ...) are enabled by default for V1/N1 but not
the other cores, so need to be explicitly enabled via -mattr.
shll v0.2d, v0.2s, #32areremoved
other cores, so we add them for the other cores