8352675: Support Intel AVX10 converged vector ISA feature detection #24329
/label add hotspot-compiler-dev
force-pushed from ff03a06 to b95ac21
@jatin-bhateja Please do not rebase or force-push to an active PR as it invalidates existing review comments. Note for future reference, the bots always squash all changes into a single commit automatically as part of the integration. See OpenJDK Developers’ Guide for more information.
Just leaving a few drive-by comments, I'm really not very familiar with this code. It would be nice if someone from Intel reviewed this also.
Also: you should probably update some more copyright dates ;)
int res = jio_snprintf(
            buf, sizeof(buf),
            "(%u cores per cpu, %u threads per core) family %d model %d stepping %d microcode 0x%x",
            cores_per_cpu(), threads_per_core(),
            cpu_family(), _model, _stepping, os::cpu_microcode_revision());
assert(res > 0, "not enough temporary space allocated");
insert_features_names(buf + res, sizeof(buf) - res, _features_names);
insert_features_names(_features, buf + res, sizeof(buf) - res, _features_names);
x86 is the only platform which uses insert_features_names. Other platforms rely on macros. Maybe it's time to do the same on x86?
@@ -56,6 +56,9 @@ class Abstract_VM_Version: AllStatic {

  // CPU feature flags, can be affected by VM settings.
  static uint64_t _features;
  // Extra CPU feature flags used when all 64 bits of _features are exhausted for
  // on a given target, currently only used for x86_64, can be affected by VM settings.
  static uint64_t _extra_features;
That's unfortunate. Maybe it's time to turn _features into a fixed size (platform-specific) bitmap instead? (RegMask is one existing example.) Having 2 independent fields is error-prone (look at _cpu_features).
force-pushed from 5b3f3b6 to 5d09adb
/label add graal-dev
It looks much better! Thanks, Jatin.
I'm curious why you don't represent the feature bitmap as a POD (with all the accessors on it) and pass it around by value when needed. (Its size will vary across platforms, but will be fixed at runtime.) It should significantly simplify the implementation.
As an example, take a look at RegMask in C2. It accommodates significantly more bits than needed for VM_Version.
Hi @iwanowww,
I'm not suggesting to reuse RegMask, but introduce a separate class (e.g., VMFeatures) and embed its instances into Abstract_VM_Version (as JVMCI can still operate on in-memory representation at (BTW all CPU feature constants in
class VM_Features {
 public:
  using FeatureVector = uint64_t [MAX_FEATURE_VEC_SIZE];
Do you think it would be better to refactor this into a separate class analogous to std::bitset? You can start with only implementing test, set, reset. This would help in other use cases, too.
In essence, what we have currently is a bitmap implementation, but its utility is limited to VM_Version for now. The current approach simplifies the JVMCI side of the handling. We have an existing bitset utility in src/hotspot/share/utilities/bitMap.hpp, and we currently have multiple feature-detection implementations for different targets; it would be good to have a unified solution in the future. For now, our intent is just to lift the hard limit of 64 feature bits for the x86 target.
Very nice!
I made a cleanup pass over the code [1]. Feel free to incorporate it or let me know if you have any questions/concerns.
Meanwhile, submitted it for testing.
[1] iwanowww@35aeb88
JVMCI changes look good. Will run some Graal tests on this PR
There are some SA-related failures. Fixed by [1]. Otherwise, testing results are good.
[1] iwanowww@9d4b85a
Testing results (hs-tier1 - hs-tier4) are clean.
CPU features in Graal remain the same after this PR. Passed all Graal compiler unit tests.
@@ -452,13 +461,11 @@ class VM_Version_StubGenerator: public StubCodeGenerator {
    __ lea(rsi, Address(rbp, in_bytes(VM_Version::std_cpuid1_offset())));
    __ movl(rcx, 0x18000000); // cpuid1 bits osxsave | avx
    __ andl(rcx, Address(rsi, 8)); // cpuid1 bits osxsave | avx
    __ cmpl(rcx, 0x18000000);
    __ jccb(Assembler::notEqual, done); // jump if AVX is not supported
    __ jccb(Assembler::equal, done); // jump if AVX is not supported
This and all the following places with multi-bit check still need to be fixed. If you walk through stock and new code in this PR when Address(rsi, 8) on line 468 has 0x10000000, you will observe that stock code will jump to done and new code will not jump to done. Let me know if I am missing something.
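The walk-through above can be checked with a small model of the flag logic. This is only a sketch, not the stub code: the function names are hypothetical, and because the quoted hunk does not show whether cmpl survives in the new sequence, both readings are modeled as labeled assumptions.

```cpp
#include <cassert>
#include <cstdint>

// cpuid1 ECX bits osxsave (bit 27) | avx (bit 28), as in the quoted stub.
constexpr std::uint32_t kOsxsaveAvx = 0x18000000;

// Stock sequence: andl; cmpl mask; jccb(notEqual, done).
// Jumps to done unless BOTH bits are set.
constexpr bool stock_jumps(std::uint32_t ecx) {
  return (ecx & kOsxsaveAvx) != kOsxsaveAvx;
}

// Reading A (assumption): cmpl is kept and the condition becomes equal.
constexpr bool new_jumps_with_cmp(std::uint32_t ecx) {
  return (ecx & kOsxsaveAvx) == kOsxsaveAvx;
}

// Reading B (assumption): cmpl is dropped and jccb(equal) tests the
// zero flag left by andl, so it jumps only when NEITHER bit is set.
constexpr bool new_jumps_without_cmp(std::uint32_t ecx) {
  return (ecx & kOsxsaveAvx) == 0;
}

// The input from the review comment: AVX set, OSXSAVE clear.
inline void check_reviewer_case() {
  const std::uint32_t ecx = 0x10000000;
  assert(stock_jumps(ecx));             // stock code jumps to done
  assert(!new_jumps_with_cmp(ecx));     // new code does not
  assert(!new_jumps_without_cmp(ecx));  // new code does not
}
```

Under either reading the two sequences disagree for 0x10000000. A single-bit test can use the zero flag from andl directly, but an "all bits set" test over a multi-bit mask needs the compare against the full mask.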
Rest of the PR looks good to me.
Looks good to me.
Thanks @iwanowww, @sviswa7, @mur47x111, @merykitty for your reviews.
/integrate
Going to push as commit 3b336a9.
Your commit was automatically rebased without conflicts.
@jatin-bhateja Pushed as commit 3b336a9. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.
This patch adds the necessary CPUID feature detection for AVX10 ISA versions 1 and 2. In terms of architectural state save and restore, AVX10 is isomorphic to AVX512 support up to Granite Rapids. State components affected by the AVX10 extension include the SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers.
The patch has been regression-tested with tier1 and JVMCI tests.
Please review and share your feedback.
Best Regards,
Jatin
[1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html
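As a rough sketch of the enumeration the summary refers to, the decoder below takes raw CPUID register values as inputs rather than executing CPUID itself. The bit positions (CPUID.(EAX=07H, ECX=01H):EDX bit 19 for AVX10 presence, CPUID.(EAX=24H, ECX=00H):EBX bits 7:0 for the converged ISA version) are as I read the specification in [1] and should be checked against it. The struct and function names are hypothetical and are not the identifiers used in the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type for this sketch.
struct Avx10Info {
  bool supported;        // AVX10 enumerated by CPUID
  std::uint8_t version;  // converged vector ISA version, 0 if unsupported
};

// leaf7_sub1_edx: CPUID.(EAX=07H, ECX=01H):EDX, read by the caller
// leaf24_ebx:     CPUID.(EAX=24H, ECX=00H):EBX, read by the caller
// Bit positions are assumptions taken from the AVX10 specification.
inline Avx10Info decode_avx10(std::uint32_t leaf7_sub1_edx,
                              std::uint32_t leaf24_ebx) {
  const bool supported = ((leaf7_sub1_edx >> 19) & 1u) != 0;
  const std::uint8_t version =
      supported ? static_cast<std::uint8_t>(leaf24_ebx & 0xFFu) : 0;
  return Avx10Info{supported, version};
}

// Usage with made-up register values: AVX10 bit set, version field 2.
inline void example_decode() {
  const Avx10Info info = decode_avx10(1u << 19, 0x00000002u);
  assert(info.supported);
  assert(info.version == 2);
}
```

A real detector would additionally confirm that the OS enables saving the state components listed above before reporting the feature as usable.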
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/24329/head:pull/24329
$ git checkout pull/24329
Update a local copy of the PR:
$ git checkout pull/24329
$ git pull https://git.openjdk.org/jdk.git pull/24329/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 24329
View PR using the GUI difftool:
$ git pr show -t 24329
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/24329.diff