[CPUID] Add ISA entries for A64FX and M1 #44194
Why add a separate ISA entry just for the a64fx? Is it needed somewhere upstream? Also, since I think only the a64fx has that ISA combination, I guess you could add fullfp16 too. Besides that, LGTM.
Agreed; it's probably more useful to define something like |
The M1 feature detection has been merged (#41924), so that should be working right now; it just needs the ISA then.
This is mainly used in BinaryBuilder to target specific microarchitectures. We definitely don't want to target all CPUs out there, but we need prototypes of somewhat relevant CPU families (like for the Intel chips above).
It doesn't have it? For the record, with #41924, on the base M1 I get
julia> Base.BinaryPlatforms.CPUID.cpu_isa()
Base.BinaryPlatforms.CPUID.ISA(Set(UInt32[0x00000004, 0x00000006, 0x00000007, 0x00000014, 0x0000000c, 0x00000008, 0x00000017]))
which corresponds to the set
julia> @eval Base.BinaryPlatforms.CPUID JL_AArch64_aes, JL_AArch64_sha2, JL_AArch64_crc, JL_AArch64_dotprod, JL_AArch64_rdm, JL_AArch64_lse, JL_AArch64_fp16fml
(0x00000004, 0x00000006, 0x00000007, 0x00000014, 0x0000000c, 0x00000008, 0x00000017)
For reference, features enabled by Apple Clang on this CPU are
I'll update the PR later |
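As a minimal sketch of how the numeric codes in the M1 output above map back to feature names: the table below only contains the seven code/name pairs printed in this thread, and `FEATURE_NAMES` and `describe` are hypothetical helpers for illustration, not part of `Base.BinaryPlatforms.CPUID`.

```julia
# Code => name pairs exactly as printed in the REPL output above.
# FEATURE_NAMES and describe are illustrative names, not Base API.
const FEATURE_NAMES = Dict{UInt32,Symbol}(
    0x00000004 => :aes,
    0x00000006 => :sha2,
    0x00000007 => :crc,
    0x00000008 => :lse,
    0x0000000c => :rdm,
    0x00000014 => :dotprod,
    0x00000017 => :fp16fml,
)

# Translate a collection of feature codes into a sorted list of names;
# codes missing from the table are reported as :unknown.
describe(features) = sort!([get(FEATURE_NAMES, f, :unknown) for f in features])

# The set reported by cpu_isa() on the base M1 in this thread.
m1 = Set(UInt32[0x00000004, 0x00000006, 0x00000007, 0x00000014,
                0x0000000c, 0x00000008, 0x00000017])

m1_names = describe(m1)
println(m1_names)
```

On a real machine the same idea could presumably be applied to the `JL_AArch64_*` constants directly, as done later in this thread with `names(CPUID; all=true)`.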
Line 328 in 1a3da30
then this is wrong. Though I checked upstream llvm and I agreed with it. |
I mean, I showed above what we detect with Line 20 in 1a3da30
Also, for reference, these are the features enabled by the Fujitsu compiler in clang mode (based on LLVM 7):
LLVM claims that it has fullfp16: https://github.com/llvm/llvm-project/blob/97c151de3de0266b896bb01e98b005fb31f6d3cd/llvm/lib/Target/AArch64/AArch64.td#L984-L986 |
Alright:
julia> CPUID.test_cpu_feature(CPUID.JL_AArch64_fullfp16)
true
The ARM C/C++ compiler reference says that I think we need to refine Line 97 in 1a3da30
|
For completeness, these are all the features we can detect on A64FX, among all those we know:
julia> using Base.BinaryPlatforms.CPUID
julia> aarch64_features = filter!(n -> startswith(String(n), "JL_AArch64"), (names(CPUID; all=true)));
julia> filter!(x -> last(x), [(feat, CPUID.test_cpu_feature(getfield(CPUID, feat))) for feat in aarch64_features])
11-element Vector{Tuple{Symbol, Bool}}:
(:JL_AArch64_aes, 1)
(:JL_AArch64_ccpp, 1)
(:JL_AArch64_complxnum, 1)
(:JL_AArch64_crc, 1)
(:JL_AArch64_fullfp16, 1)
(:JL_AArch64_lse, 1)
(:JL_AArch64_rdm, 1)
(:JL_AArch64_sha2, 1)
(:JL_AArch64_sve, 1)
(:JL_AArch64_v8_1a, 1)
(:JL_AArch64_v8_2a, 1)
base/cpuid.jl
@eval function cpu_isa()
    return ISA(Set{UInt32}(feat for feat in $(ALL_FEATURES[normalize_arch(String(Sys.ARCH))]) if test_cpu_feature(feat)))
end
I realised we can avoid always recomputing the list of features for the current architecture and just inline it at precompile time. On my laptop, before:
julia> @benchmark CPUID.cpu_isa()
BenchmarkTools.Trial: 10000 samples with 48 evaluations.
Range (min … max): 895.604 ns … 90.301 μs ┊ GC (min … max): 0.00% … 98.34%
Time (median): 984.552 ns ┊ GC (median): 0.00%
Time (mean ± σ): 1.153 μs ± 2.866 μs ┊ GC (mean ± σ): 9.56% ± 3.80%
▃▆█▆▃▁
▇██████▇▆▄▃▂▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
896 ns Histogram: frequency by time 2 μs <
Memory estimate: 1.41 KiB, allocs estimate: 17.
after:
julia> @benchmark CPUID.cpu_isa()
BenchmarkTools.Trial: 10000 samples with 155 evaluations.
Range (min … max): 679.871 ns … 18.916 μs ┊ GC (min … max): 0.00% … 89.46%
Time (median): 745.200 ns ┊ GC (median): 0.00%
Time (mean ± σ): 849.687 ns ± 709.196 ns ┊ GC (mean ± σ): 4.12% ± 4.95%
▁▆█▇▅▃▃▃▃▃▃▂▂▁▁▁ ▂▃ ▂
█████████████████▇█▇▇▇▆▆▇▆▆▆▆▆▆▄▅▅▄▁▅▄▅▅▅▃▁▁▄▁▃▁▁▃▄▃▃▄▃▁▁▄▆██ █
680 ns Histogram: log(frequency) by time 1.97 μs <
Memory estimate: 848 bytes, allocs estimate: 7.
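The inlining trick described above can be sketched in isolation. This is only an illustration under stated assumptions: `FEATURES_BY_ARCH`, `CURRENT_ARCH`, and `fake_test_feature` are made-up stand-ins for `ALL_FEATURES`, `normalize_arch(String(Sys.ARCH))`, and `test_cpu_feature`, and the feature numbers are arbitrary.

```julia
# Hypothetical per-architecture tables; not the real contents of base/cpuid.jl.
const FEATURES_BY_ARCH = Dict(
    "x86_64"  => UInt32[1, 2, 3],
    "aarch64" => UInt32[4, 6, 7, 8],
)

# Assumption: stands in for normalize_arch(String(Sys.ARCH)).
const CURRENT_ARCH = "aarch64"

# Stand-in for test_cpu_feature: pretend only even-numbered features are present.
fake_test_feature(feat::UInt32) = iseven(feat)

# Variant 1: performs the dictionary lookup on every call.
detect_dynamic() =
    Set{UInt32}(f for f in FEATURES_BY_ARCH[CURRENT_ARCH] if fake_test_feature(f))

# Variant 2: `$(...)` is evaluated once, when this definition is evaluated,
# so the resulting vector is spliced into the method body as a literal.
@eval detect_inlined() =
    Set{UInt32}(f for f in $(FEATURES_BY_ARCH[CURRENT_ARCH]) if fake_test_feature(f))

println(detect_dynamic() == detect_inlined())
```

Both variants return the same set; the second one simply avoids recomputing which list to iterate over at run time, which is consistent with the drop in allocations reported in the benchmarks above.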
Nice! Fewer allocations, always a good thing!
With the latest version:
julia> @benchmark Base.BinaryPlatforms.CPUID.cpu_isa()
BenchmarkTools.Trial: 10000 samples with 196 evaluations.
Range (min … max): 480.342 ns … 12.980 μs ┊ GC (min … max): 0.00% … 94.89%
Time (median): 527.505 ns ┊ GC (median): 0.00%
Time (mean ± σ): 598.004 ns ± 670.168 ns ┊ GC (mean ± σ): 6.65% ± 5.69%
▄▆███▇▆▄▄▄▃▄▃▂▂▂▂▁▁▁▁▁▁▁▂▁▁▁▂▁ ▂
▆██████████████████████████████▇▇▇▆█▇█▆▇▆▇▇▇▆▅▅▇▇▅▅▃▆▆▃▆▆▅▄▄▅ █
480 ns Histogram: log(frequency) by time 1 μs <
Memory estimate: 848 bytes, allocs estimate: 7.
I believe it's a bit faster because the new version collects only the features we are interested in, instead of all of those for the given architecture, so we're just doing fewer iterations. The new version is also closer in spirit to what we're currently doing.
@yuyichao do you know whether A64FX requires
julia> CPUID.test_cpu_feature(CPUID.JL_AArch64_aes)
false
while on Fugaku I get
julia> CPUID.test_cpu_feature(CPUID.JL_AArch64_aes)
true
Also, https://github.com/llvm/llvm-project/blob/97c151de3de0266b896bb01e98b005fb31f6d3cd/llvm/lib/Target/AArch64/AArch64.td#L984-L986 lists only
As for the spec I only know as much as the llvm and gcc target feature set says.... An independent way to check the feature set would be
I have no idea, but do the two have the same midr? Their values should be available under |
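For readers who want to compare MIDR values by hand, here is a small sketch that splits a `midr_el1` value into its fields, assuming the standard Arm layout (implementer in bits 31:24, variant in 23:20, architecture in 19:16, part number in 15:4, revision in 3:0). `decode_midr` is a hypothetical helper, and the sample value is the one reported for A64FX in this thread.

```julia
# Split a MIDR_EL1 value into its fields (standard Arm bit layout assumed).
# decode_midr is an illustrative helper, not an existing Julia function.
function decode_midr(midr::Integer)
    return (
        implementer  = (midr >> 24) & 0xff,   # 0x46 is Fujitsu
        variant      = (midr >> 20) & 0xf,
        architecture = (midr >> 16) & 0xf,
        part         = (midr >> 4) & 0xfff,
        revision     = midr & 0xf,
    )
end

# Value read from /sys/devices/system/cpu/cpu1/regs/identification/midr_el1
fields = decode_midr(0x00000000461f0010)
println(fields)
```

The decoded implementer, variant, part, and revision should line up with the `CPU implementer`, `CPU variant`, `CPU part`, and `CPU revision` lines that `/proc/cpuinfo` prints on the same machine.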
Isambard:
$ cat /sys/devices/system/cpu/cpu1/regs/identification/midr_el1
0x00000000461f0010
$ LD_SHOW_AUXV=1 /bin/true
AT_SYSINFO_EHDR: 0xffffa28f0000
AT_??? (0x33): 0x1270
AT_HWCAP: 415fe7
AT_PAGESZ: 65536
AT_CLKTCK: 100
AT_PHDR: 0xaaaaaf000040
AT_PHENT: 56
AT_PHNUM: 9
AT_BASE: 0xffffa2900000
AT_FLAGS: 0x0
AT_ENTRY: 0xaaaaaf0016e0
AT_UID: 415400694
AT_EUID: 415400694
AT_GID: 415400694
AT_EGID: 415400694
AT_SECURE: 0
AT_RANDOM: 0xffffd9c9a258
AT_EXECFN: /bin/true
AT_PLATFORM: aarch64
$ head -n8 /proc/cpuinfo
processor : 0
BogoMIPS : 200.00
Features : fp asimd evtstrm sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve
CPU implementer : 0x46
CPU architecture: 8
CPU variant : 0x1
CPU part : 0x001
CPU revision : 0
$ julia -E 'using Base.BinaryPlatforms.CPUID; CPUID.test_cpu_feature(CPUID.JL_AArch64_aes)'
false
So it looks like AES is indeed not here on this chip? For comparison, on Fugaku:
$ cat /sys/devices/system/cpu/cpu1/regs/identification/midr_el1
0x00000000461f0010
$ LD_SHOW_AUXV=1 /bin/true
AT_SYSINFO_EHDR: 0x400000070000
AT_??? (0x33): 0x1270
AT_HWCAP: 415fff
AT_PAGESZ: 65536
AT_CLKTCK: 100
AT_PHDR: 0xaaaaaaaa0040
AT_PHENT: 56
AT_PHNUM: 9
AT_BASE: 0x400000000000
AT_FLAGS: 0x0
AT_ENTRY: 0xaaaaaaaa16e0
AT_UID: 14463
AT_EUID: 14463
AT_GID: 14026
AT_EGID: 14026
AT_SECURE: 0
AT_RANDOM: 0xffffffffe1f8
AT_EXECFN: /bin/true
AT_PLATFORM: aarch64
$ head -n8 /proc/cpuinfo
processor : 0
BogoMIPS : 200.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve
CPU implementer : 0x46
CPU architecture: 8
CPU variant : 0x1
CPU part : 0x001
CPU revision : 0
$ julia -E 'using Base.BinaryPlatforms.CPUID; CPUID.test_cpu_feature(CPUID.JL_AArch64_aes)'
true
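The two `AT_HWCAP` values above differ only in two bits. A rough sketch of decoding them, assuming the Linux arm64 `HWCAP_*` bit order (bit 0 is `fp`, bit 1 is `asimd`, and so on up to bit 22 for `sve`); `HWCAP_NAMES` and `decode_hwcap` are hypothetical names used only for this illustration.

```julia
# Names of the Linux arm64 AT_HWCAP bits, index 1 corresponding to bit 0.
# Assumption: this follows the kernel's HWCAP_* ordering up to bit 22 (sve).
const HWCAP_NAMES = [
    "fp", "asimd", "evtstrm", "aes", "pmull", "sha1", "sha2", "crc32",
    "atomics", "fphp", "asimdhp", "cpuid", "asimdrdm", "jscvt", "fcma",
    "lrcpc", "dcpop", "sha3", "sm3", "sm4", "asimddp", "sha512", "sve",
]

# Return the names of all bits that are set in `hwcap`.
decode_hwcap(hwcap::Integer) =
    [name for (i, name) in enumerate(HWCAP_NAMES) if ((hwcap >> (i - 1)) & 1) == 1]

isambard = decode_hwcap(0x415fe7)  # AT_HWCAP reported on Isambard
fugaku   = decode_hwcap(0x415fff)  # AT_HWCAP reported on Fugaku

println(join(isambard, " "))
println(setdiff(fugaku, isambard))  # features present only on Fugaku
```

Under that assumption the decoded lists reproduce the `Features` lines from `/proc/cpuinfo`, and the only difference between the two machines is `aes` and `pmull`.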
Does seem like it... But I suspect it's mainly a question for Fujitsu. Their online document shows aes instructions in the performance section but doesn't seem to mention anywhere whether support for it is conditional. Nor does it mention the values for the system registers the same way the one from ARM does...
Ok, thanks for confirming it, then I'll remove AES, as it appears not to be always there (and LLVM doesn't seem to require it either). |
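To illustrate why dropping AES matters, here is a sketch that assumes an ISA prototype matches a CPU when every feature of the prototype is present on that CPU (a plain subset check). The prototypes below are hypothetical and use the kernel's feature names from `/proc/cpuinfo` rather than the `JL_AArch64_*` constants; `matches` is an illustrative helper.

```julia
# Feature lists copied from the /proc/cpuinfo output earlier in this thread.
isambard = Set(String.(split("fp asimd evtstrm sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve")))
fugaku   = Set(String.(split("fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve")))

# Hypothetical A64FX prototypes, with and without AES (illustration only).
proto_with_aes    = Set(["aes", "sha2", "crc32", "atomics", "asimdrdm", "fphp", "fcma", "sve"])
proto_without_aes = setdiff(proto_with_aes, ["aes"])

# Assumption: a prototype matches when it is a subset of the detected features.
matches(proto, cpu) = issubset(proto, cpu)

for (name, cpu) in (("Isambard", isambard), ("Fugaku", fugaku))
    println(name, ": with AES = ", matches(proto_with_aes, cpu),
            ", without AES = ", matches(proto_without_aes, cpu))
end
```

With AES in the prototype only the Fugaku feature set would match; without it, both machines do.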
* [CPUID] Rework how current ISA is determined
* [CPUID] Add ISA entry for A64FX
* [CPUID] Add ISA entry for Apple Silicon M1
* [CPUID] Simplify collection of full set of features for architecture
* [CPUID] Remove AES from A64FX ISA, not all chips appear to have it
(cherry picked from commit f45b6ad)
For the record, other people have observed the same differences between A64FX on Isambard and Fugaku: archspec/archspec-json#23 At least I'm glad it isn't just julia 🙂 |
On A64FX we get
which corresponds to the following set
which is literally the armv8.2-a+crypto set + JL_AArch64_sve.
I'd even go as far as removing armv8.4-a+crypto+sve: I don't think there is any existing CPU at the moment with all those capabilities, so it isn't very useful there.
Also, what about backporting to v1.6 and v1.7?