feat: Intel AMX support #5065
Codecov Report
Additional details and impacted files:
@@            Coverage Diff             @@
##             main    #5065      +/-   ##
==========================================
- Coverage   83.18%   83.15%   -0.03%
==========================================
  Files         247      248       +1
  Lines       26816    26901      +85
==========================================
+ Hits        22306    22370      +64
- Misses       4510     4531      +21
Do we already want to add a changelog entry for this? I know the title says "make perf tests work", but really what we're doing is making Firecracker support AMX xP
I'd like to declare official support of Intel AMX when snapshot restore is supported :)
Overall lgtm, but I think that `fam_len` thing is so confusing right now. If we don't want to (or can't) change the API, I think we should explain better in the comment what is happening, because it makes my head spin every time I look at it.
Bindings for ARCH_REQ_XCOMP_GUEST_PERM and ARCH_XCOMP_TILEDATA are required to enable Intel AMX's XTILEDATA for XSAVE. Note that the required bits were added in kernel v6.4+. Signed-off-by: Takahiro Itazuri <itazur@amazon.com>
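A minimal sketch of what these bindings amount to, assuming the values from the kernel's `arch/x86/include/uapi/asm/prctl.h`: the VMM calls `arch_prctl(ARCH_REQ_XCOMP_GUEST_PERM, ARCH_XCOMP_TILEDATA)`, and `ARCH_GET_XCOMP_GUEST_PERM` writes the permitted XSTATE features as a 64-bit bitmap. The helper `tiledata_permitted()` is hypothetical and only decodes that bitmap; the syscall itself is left out so the sketch needs no FFI.

```rust
// Constants mirroring arch/x86/include/uapi/asm/prctl.h (assumed values).

/// arch_prctl() code: read the XSTATE features a guest may use (bitmap via pointer).
pub const ARCH_GET_XCOMP_GUEST_PERM: u64 = 0x1024;
/// arch_prctl() code: request permission for a dynamic XSTATE feature for guests.
pub const ARCH_REQ_XCOMP_GUEST_PERM: u64 = 0x1025;
/// XSTATE component number of AMX TILECFG (same as its XCR0 bit).
pub const ARCH_XCOMP_TILECFG: u64 = 17;
/// XSTATE component number of AMX TILEDATA (same as its XCR0 bit).
pub const ARCH_XCOMP_TILEDATA: u64 = 18;

/// Hypothetical helper: does the bitmap written by
/// `arch_prctl(ARCH_GET_XCOMP_GUEST_PERM, &mut bitmap)` include TILEDATA?
pub fn tiledata_permitted(guest_perm_bitmap: u64) -> bool {
    guest_perm_bitmap & (1u64 << ARCH_XCOMP_TILEDATA) != 0
}
```

As the later commits explain, the request has to be made before `KVM_GET_SUPPORTED_CPUID` so that the returned CPUID reports TILEDATA consistently with TILECFG.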
`Eq` and `PartialEq` are not necessary for `KvmError`; rather, they prevent me from adding error variants that wrap `std::io::Error` to handle syscall errors. Signed-off-by: Takahiro Itazuri <itazur@amazon.com>
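A minimal sketch of why the derives get in the way, using a hypothetical error enum (the variant names are illustrative, not Firecracker's): `std::io::Error` implements neither `PartialEq` nor `Eq`, so `#[derive(PartialEq, Eq)]` on an enum wrapping it would not compile. Tests can still inspect such errors with `matches!` and `raw_os_error()`.

```rust
use std::fmt;
use std::io;

/// Hypothetical error type; variant names are illustrative.
/// Only `Debug` is derived: `#[derive(PartialEq, Eq)]` would fail to compile
/// because `std::io::Error` implements neither trait.
#[derive(Debug)]
pub enum KvmError {
    /// A required KVM capability is missing (capability number).
    CapabilityMissing(u32),
    /// A syscall such as arch_prctl() failed.
    Syscall(io::Error),
}

impl fmt::Display for KvmError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            KvmError::CapabilityMissing(cap) => write!(f, "missing KVM capability {}", cap),
            KvmError::Syscall(err) => write!(f, "syscall failed: {}", err),
        }
    }
}

impl std::error::Error for KvmError {}
```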
Intel AMX (Advanced Matrix Extensions) was introduced in Intel Sapphire Rapids to accelerate deep learning and AI workloads. Since it requires a larger area to save its state, the TILEDATA feature is disabled by default. We request permission for it by default because it can still be disabled via a CPU template. Without the request, kernels prior to v6.4 have a bug where `KVM_GET_SUPPORTED_CPUID` returns an inconsistent state by default (TILECFG enabled but TILEDATA disabled), causing a #GP fault in the guest on the `xsetbv` instruction. Signed-off-by: Takahiro Itazuri <itazur@amazon.com>
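The inconsistent state can be illustrated with a small sketch. It assumes only the bit positions stated in this PR (TILECFG = bit 17 and TILEDATA = bit 18 of `CPUID.(EAX=0DH,ECX=0):EAX`, mirroring XCR0). `amx_state()` and `mask_amx()` are hypothetical helpers, not Firecracker's CPU template code. They show why a template that disables AMX has to clear both bits together.

```rust
// Bit positions shared by XCR0 and CPUID.(EAX=0DH,ECX=0):EAX, as stated in this PR.
const XFEATURE_TILECFG: u32 = 17;
const XFEATURE_TILEDATA: u32 = 18;
const AMX_MASK: u32 = (1 << XFEATURE_TILECFG) | (1 << XFEATURE_TILEDATA);

#[derive(Debug, PartialEq, Eq)]
enum AmxState {
    /// Neither bit set: AMX hidden from the guest.
    Disabled,
    /// Both bits set: XSETBV accepts this combination.
    Enabled,
    /// Exactly one bit set: the guest's XSETBV would raise #GP.
    HalfEnabled,
}

/// Hypothetical helper: classify the AMX bits of CPUID leaf 0xD, sub-leaf 0, EAX.
fn amx_state(leaf_0d_eax: u32) -> AmxState {
    match leaf_0d_eax & AMX_MASK {
        0 => AmxState::Disabled,
        AMX_MASK => AmxState::Enabled,
        _ => AmxState::HalfEnabled,
    }
}

/// Hypothetical helper: clear TILECFG and TILEDATA together, as a CPU
/// template that disables AMX would need to.
fn mask_amx(leaf_0d_eax: u32) -> u32 {
    leaf_0d_eax & !AMX_MASK
}
```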
Intel AMX is an XSTATE feature, and TILEDATA is disabled by default because it requires a larger area to save its state than the traditional 4096 bytes. Instead, the Linux kernel allows VMMs to request guest permission via `arch_prctl()`. As such, the size of the buffer required to save XSTATE is dynamic. To support a dynamically-sized buffer, `KVM_CAP_XSAVE2` was introduced along with `KVM_GET_XSAVE2`. Accordingly, kvm-bindings added `Xsave`, an alias of `FamStructWrapper` for the `kvm_xsave` struct with a FAM at the end, and kvm-ioctls added `get_xsave2()` for `KVM_GET_XSAVE2` and `set_xsave2()`, which takes `Xsave` and calls `KVM_SET_XSAVE`. Change the type of `xsave` in `VcpuState` from `kvm_xsave` to `Xsave`. Use `get_xsave2()` and `set_xsave2()`. Signed-off-by: Takahiro Itazuri <itazur@amazon.com>
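Since the FAM length of `Xsave` is easy to misread, here is a sketch of the arithmetic under stated assumptions. It assumes that `KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2)` reports the required buffer size in bytes, that the fixed part of `kvm_xsave` is `__u32 region[1024]` (4096 bytes), and that the FAM is an array of `u32`. `XsaveBuf` is a hypothetical plain-Rust stand-in, not the kvm-bindings `Xsave`. It shows that the FAM length counts `u32` entries beyond the first 4096 bytes, not bytes.

```rust
use std::mem::size_of;

// Assumed layout: struct kvm_xsave { __u32 region[1024]; __u32 extra[]; }
const KVM_XSAVE_REGION_LEN: usize = 1024;
const KVM_XSAVE_BASE_BYTES: usize = KVM_XSAVE_REGION_LEN * size_of::<u32>();

/// Number of `u32` FAM entries needed for a buffer of `xsave2_size` bytes,
/// where `xsave2_size` is the size reported for KVM_CAP_XSAVE2.
fn xsave_fam_len(xsave2_size: usize) -> usize {
    // Bytes beyond the fixed 4096-byte region (guard against smaller inputs).
    let extra_bytes = xsave2_size.saturating_sub(KVM_XSAVE_BASE_BYTES);
    // Round up so a partial trailing u32 still gets room.
    (extra_bytes + size_of::<u32>() - 1) / size_of::<u32>()
}

/// Hypothetical stand-in for a kvm_xsave with FAM (not kvm-bindings' `Xsave`).
struct XsaveBuf {
    region: [u32; KVM_XSAVE_REGION_LEN],
    extra: Vec<u32>,
}

impl XsaveBuf {
    fn new(xsave2_size: usize) -> Self {
        XsaveBuf {
            region: [0; KVM_XSAVE_REGION_LEN],
            extra: vec![0; xsave_fam_len(xsave2_size)],
        }
    }

    /// Total size in bytes of the fixed region plus the FAM.
    fn size_bytes(&self) -> usize {
        (self.region.len() + self.extra.len()) * size_of::<u32>()
    }
}
```

For example, an input of 11008 bytes yields 6912 extra bytes, i.e. 1728 FAM entries, while an input of exactly 4096 bytes yields an empty FAM.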
`KVM_GET_XSAVE2` is called when taking a snapshot, so it has to be allowed by the seccomp filter. Signed-off-by: Takahiro Itazuri <itazur@amazon.com>
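Seccomp filters match ioctls by their raw request number, so it helps to see how `KVM_GET_XSAVE2` is encoded. This is a minimal sketch. It assumes the generic Linux `_IOC` layout used on x86 (direction in bits 31..30, size in 29..16, type in 15..8, number in 7..0) and the definitions in the kernel's `linux/kvm.h`, where `KVM_GET_XSAVE2` is `_IOR(KVMIO, 0xcf, struct kvm_xsave)` and so encodes the 4096-byte fixed size. The `ioc()` helper is hypothetical, and Firecracker's seccomp filter format is not shown.

```rust
// Linux _IOC encoding (x86): dir[31:30] | size[29:16] | type[15:8] | nr[7:0].
const IOC_WRITE: u32 = 1;
const IOC_READ: u32 = 2;
const KVMIO: u32 = 0xAE;
// sizeof(struct kvm_xsave) without the FAM.
const KVM_XSAVE_SIZE: u32 = 4096;

/// Hypothetical helper mirroring the kernel's _IOC() macro.
const fn ioc(dir: u32, ty: u32, nr: u32, size: u32) -> u32 {
    (dir << 30) | (size << 16) | (ty << 8) | nr
}

// Request numbers as assumed from linux/kvm.h: _IOR/_IOW(KVMIO, nr, struct kvm_xsave).
const KVM_GET_XSAVE: u32 = ioc(IOC_READ, KVMIO, 0xa4, KVM_XSAVE_SIZE);
const KVM_SET_XSAVE: u32 = ioc(IOC_WRITE, KVMIO, 0xa5, KVM_XSAVE_SIZE);
const KVM_GET_XSAVE2: u32 = ioc(IOC_READ, KVMIO, 0xcf, KVM_XSAVE_SIZE);
```

`KVM_GET_XSAVE2` has its own request number (0x9000AECF, or 2415963855 in decimal), so a filter that only allows `KVM_GET_XSAVE` would reject it. `KVM_SET_XSAVE` keeps its existing number.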
Intel AMX support was introduced but it is only supported on Intel Sapphire Rapids at the moment. We have to skip Intel AMX tests on older processors. Signed-off-by: Takahiro Itazuri <itazur@amazon.com>
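The integration tests themselves are not shown here. As a hedged illustration of the host-side check, the Rust sketch below reads `CPUID.(EAX=07H,ECX=0):EDX`, where, per the Intel SDM, bit 22 enumerates AMX-BF16, bit 24 AMX-TILE and bit 25 AMX-INT8. `AmxCpuid` and `host_amx()` are hypothetical names. A test could be skipped when `tile` is false.

```rust
/// AMX sub-features enumerated in CPUID.(EAX=07H,ECX=0):EDX.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub struct AmxCpuid {
    pub bf16: bool,
    pub tile: bool,
    pub int8: bool,
}

impl AmxCpuid {
    /// Decode the EDX value of leaf 7, sub-leaf 0.
    pub fn from_leaf7_edx(edx: u32) -> Self {
        let bit = |n: u32| edx & (1u32 << n) != 0;
        AmxCpuid { bf16: bit(22), tile: bit(24), int8: bit(25) }
    }
}

/// Query the host CPU; reports no AMX when leaf 7 is unavailable.
#[cfg(target_arch = "x86_64")]
#[allow(unused_unsafe)] // these intrinsics are `unsafe fn` on some toolchains
pub fn host_amx() -> AmxCpuid {
    use std::arch::x86_64::__cpuid_count;
    // Leaf 0 EAX holds the highest supported standard leaf.
    let max_leaf = unsafe { __cpuid_count(0, 0) }.eax;
    if max_leaf < 7 {
        return AmxCpuid::from_leaf7_edx(0);
    }
    let edx = unsafe { __cpuid_count(7, 0) }.edx;
    AmxCpuid::from_leaf7_edx(edx)
}

/// Non-x86_64 hosts never report AMX.
#[cfg(not(target_arch = "x86_64"))]
pub fn host_amx() -> AmxCpuid {
    AmxCpuid::from_leaf7_edx(0)
}
```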
To check that Intel AMX is indeed supported inside the guest, check that the related features are listed in the CPUID output. Signed-off-by: Takahiro Itazuri <itazur@amazon.com>
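Inside the guest, one way to run this check is to look for the `amx_bf16`, `amx_tile` and `amx_int8` flags that Linux reports in `/proc/cpuinfo` for these CPUID bits. The actual tests may inspect CPUID output differently. `missing_amx_flags()` is a hypothetical helper that takes the file contents as a string, so it can be exercised without booting a guest.

```rust
use std::collections::HashSet;

/// Linux /proc/cpuinfo flag names for the AMX CPUID bits.
const AMX_FLAGS: [&str; 3] = ["amx_bf16", "amx_tile", "amx_int8"];

/// Hypothetical helper: return the AMX flags absent from the first
/// `flags` line of a /proc/cpuinfo dump (all of them if there is none).
fn missing_amx_flags(cpuinfo: &str) -> Vec<&'static str> {
    let flags: HashSet<&str> = cpuinfo
        .lines()
        .find(|line| line.starts_with("flags"))
        .and_then(|line| line.split_once(':'))
        .map(|(_, list)| list.split_whitespace().collect())
        .unwrap_or_default();
    AMX_FLAGS
        .iter()
        .copied()
        .filter(|flag| !flags.contains(*flag))
        .collect()
}
```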
Now all the required changes for Intel AMX have been done :) Signed-off-by: Takahiro Itazuri <itazur@amazon.com>
This is the first PR for Intel Sapphire Rapids (EC2 7th-gen Intel instance type) support.
Note that this PR focuses on Intel AMX support and any integration tests for Intel Sapphire Rapids will be added in the upcoming PR.
Changes
- Call `arch_prctl()` before `KVM_GET_SUPPORTED_CPUID` to boot guests successfully.
- Handle the dynamically-sized `kvm_xsave` to support snapshot restore of Intel AMX.

Reason
Intel AMX (Advanced Matrix Extensions) is a new instruction set introduced in Intel Sapphire Rapids for deep learning / AI workloads. Intel AMX state is handled by `XSAVE`/`XRSTOR`, the instructions that save/restore the state of extended CPU features to/from memory (e.g. on a context switch). Which states are saved/restored is configured by writing a bit vector to XCR0 via the `XSETBV` instruction. Intel AMX introduces two new bits (TILECFG and TILEDATA) in (1) that bit vector (`XCR0.TILECFG[bit 17]` and `XCR0.TILEDATA[bit 18]`) and (2) CPUID, which enumerates their support (`CPUID.(EAX=0DH,ECX=0):EAX.TILECFG[bit 17]` and `CPUID.(EAX=0DH,ECX=0):EAX.TILEDATA[bit 18]`).

Since the memory required to save the TILEDATA state is 8 KB, which is larger than the previously statically allocated size (4 KB), the Linux kernel disables TILEDATA by default and allows userspace applications to enable it dynamically via the `arch_prctl()` syscall. KVM behaves the same way by default. To enable TILEDATA for guests, the VMM has to call `arch_prctl(ARCH_REQ_XCOMP_GUEST_PERM, ...)`, which makes `KVM_GET_SUPPORTED_CPUID` return a value with `CPUID.(EAX=0DH,ECX=0):EAX.TILEDATA[bit 18]` set. Conversely, without `arch_prctl()` prior to `KVM_GET_SUPPORTED_CPUID`, it returns an inconsistent state where `CPUID.(EAX=0DH,ECX=0):EAX.TILECFG[bit 17]` is set but `CPUID.(EAX=0DH,ECX=0):EAX.TILEDATA[bit 18]` is not. If such a half-enabled AMX CPUID is passed to `KVM_SET_CPUID2` as is, the guest crashes with a general protection fault during boot (see Appendix). This is because the Linux kernel executes `XSETBV` during boot with all the XSAVE feature bits enumerated in CPUID, and `XSETBV` only accepts the two Intel AMX bits when both are set or both are cleared.

This `KVM_GET_SUPPORTED_CPUID` bug (returning the half-enabled state) was fixed in kernel v6.4. In any case, since Firecracker's CPU template feature lets users mask CPU features (including Intel AMX), we enable TILEDATA by default so that guests also boot on earlier kernels.

To support a dynamically-sized XSTATE buffer, the Linux kernel extended the existing `kvm_xsave` by adding a flexible array member (FAM) at the end. Along with it, the `KVM_GET_XSAVE2` API was added and the `KVM_SET_XSAVE` API was extended. To support these changes, rust-vmm added (1) `kvm_xsave2`, which holds `kvm_xsave` and the length of the FAM, (2) `Xsave` as a `FamStructWrapper` of `kvm_xsave2`, (3) `get_xsave2()` for `KVM_GET_XSAVE2`, and (4) `set_xsave2()`, which takes `Xsave` and calls `KVM_SET_XSAVE`. Accordingly, this PR uses these methods and structs to support Intel AMX in Firecracker.

License Acceptance
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check `CONTRIBUTING.md`.

PR Checklist
`tools/devtool checkstyle` to verify that the PR passes the automated style checks.
how they are solving the problem in a clear and encompassing way.
[ ] I have updated any relevant documentation (both in code and in the docs) in the PR.
[ ] I have mentioned all user-facing changes in `CHANGELOG.md`.
[ ] If a specific issue led to this PR, this PR closes the issue.
[ ] When making API changes, I have followed the Runbook for Firecracker API changes.
integration tests.
[ ] I have linked an issue to every new `TODO`.
`rust-vmm`.

Appendix: GP fault on guest boot without Intel AMX enablement
With this PR, the GP fault doesn't happen. (Note that a functional test for CPU feature set will be added in an upcoming PR for functional integration tests support.)