
Commit 1ae0995

agraf authored and bonzini committed
KVM: x86: Allow deflecting unknown MSR accesses to user space
MSRs are weird. Some of them are normal control registers, such as EFER.
Some however are registers that really are model specific, not very
interesting to virtualization workloads, and not performance critical.
Others again are really just windows into package configuration.

Out of these MSRs, only the first category is necessary to implement in
kernel space. Rarely accessed MSRs, MSRs that should be fine tuned against
certain CPU models and MSRs that contain information on the package level
are much better suited for user space to process. However, over time we have
accumulated a lot of MSRs that are not in the first category, but are still
handled by in-kernel KVM code.

This patch adds a generic interface to handle WRMSR and RDMSR from user
space. With this, any future MSR that is part of the latter categories can
be handled in user space.

Furthermore, it allows us to replace the existing "ignore_msrs" logic with
something that applies per-VM rather than on the full system. That way you
can run productive VMs in parallel to experimental ones where you don't care
about proper MSR handling.

Signed-off-by: Alexander Graf <graf@amazon.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Message-Id: <20200925143422.21718-3-graf@amazon.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
1 parent 90218e4 commit 1ae0995


6 files changed, 226 additions and 15 deletions


Documentation/virt/kvm/api.rst

Lines changed: 77 additions & 8 deletions
@@ -4872,14 +4872,13 @@ to the byte array.
 
 .. note::
 
-      For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI, KVM_EXIT_PAPR and
-      KVM_EXIT_EPR the corresponding
-
-      operations are complete (and guest state is consistent) only after userspace
-      has re-entered the kernel with KVM_RUN. The kernel side will first finish
-      incomplete operations and then check for pending signals. Userspace
-      can re-enter the guest with an unmasked signal pending to complete
-      pending operations.
+      For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI, KVM_EXIT_PAPR,
+      KVM_EXIT_EPR, KVM_EXIT_X86_RDMSR and KVM_EXIT_X86_WRMSR the corresponding
+      operations are complete (and guest state is consistent) only after userspace
+      has re-entered the kernel with KVM_RUN. The kernel side will first finish
+      incomplete operations and then check for pending signals. Userspace
+      can re-enter the guest with an unmasked signal pending to complete
+      pending operations.
 
 ::
 
@@ -5166,6 +5165,43 @@ Note that KVM does not skip the faulting instruction as it does for
 KVM_EXIT_MMIO, but userspace has to emulate any change to the processing state
 if it decides to decode and emulate the instruction.
 
+::
+
+		/* KVM_EXIT_X86_RDMSR / KVM_EXIT_X86_WRMSR */
+		struct {
+			__u8 error; /* user -> kernel */
+			__u8 pad[7];
+			__u32 reason; /* kernel -> user */
+			__u32 index; /* kernel -> user */
+			__u64 data; /* kernel <-> user */
+		} msr;
+
+Used on x86 systems. When the VM capability KVM_CAP_X86_USER_SPACE_MSR is
+enabled, MSR accesses to registers that would invoke a #GP by KVM kernel code
+will instead trigger a KVM_EXIT_X86_RDMSR exit for reads and KVM_EXIT_X86_WRMSR
+exit for writes.
+
+The "reason" field specifies why the MSR trap occurred. User space will only
+receive MSR exit traps when a particular reason was requested during through
+ENABLE_CAP. Currently valid exit reasons are:
+
+	KVM_MSR_EXIT_REASON_UNKNOWN - access to MSR that is unknown to KVM
+	KVM_MSR_EXIT_REASON_INVAL - access to invalid MSRs or reserved bits
+
+For KVM_EXIT_X86_RDMSR, the "index" field tells user space which MSR the guest
+wants to read. To respond to this request with a successful read, user space
+writes the respective data into the "data" field and must continue guest
+execution to ensure the read data is transferred into guest register state.
+
+If the RDMSR request was unsuccessful, user space indicates that with a "1" in
+the "error" field. This will inject a #GP into the guest when the VCPU is
+executed again.
+
+For KVM_EXIT_X86_WRMSR, the "index" field tells user space which MSR the guest
+wants to write. Once finished processing the event, user space must continue
+vCPU execution. If the MSR write was unsuccessful, user space also sets the
+"error" field to "1".
+
 ::
 
 		/* Fix the size of the union. */
@@ -5855,6 +5891,28 @@ controlled by the kvm module parameter halt_poll_ns. This capability allows
 the maximum halt time to specified on a per-VM basis, effectively overriding
 the module parameter for the target VM.
 
+7.21 KVM_CAP_X86_USER_SPACE_MSR
+-------------------------------
+
+:Architectures: x86
+:Target: VM
+:Parameters: args[0] contains the mask of KVM_MSR_EXIT_REASON_* events to report
+:Returns: 0 on success; -1 on error
+
+This capability enables trapping of #GP invoking RDMSR and WRMSR instructions
+into user space.
+
+When a guest requests to read or write an MSR, KVM may not implement all MSRs
+that are relevant to a respective system. It also does not differentiate by
+CPU type.
+
+To allow more fine grained control over MSR handling, user space may enable
+this capability. With it enabled, MSR accesses that match the mask specified in
+args[0] and trigger a #GP event inside the guest by KVM will instead trigger
+KVM_EXIT_X86_RDMSR and KVM_EXIT_X86_WRMSR exit notifications which user space
+can then handle to implement model specific MSR handling and/or user notifications
+to inform a user that an MSR was not handled.
+
 8. Other capabilities.
 ======================
 
@@ -6196,3 +6254,14 @@ distribution...)
 
 If this capability is available, then the CPNC and CPVC can be synchronized
 between KVM and userspace via the sync regs mechanism (KVM_SYNC_DIAG318).
+
+8.26 KVM_CAP_X86_USER_SPACE_MSR
+-------------------------------
+
+:Architectures: x86
+
+This capability indicates that KVM supports deflection of MSR reads and
+writes to user space. It can be enabled on a VM level. If enabled, MSR
+accesses that would usually trigger a #GP by KVM into the guest will
+instead get bounced to user space through the KVM_EXIT_X86_RDMSR and
+KVM_EXIT_X86_WRMSR exit notifications.
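
The exit protocol documented in this file can be illustrated from the user space side. The following is a minimal sketch, not code from this patch: `struct msr_exit` and the `EXIT_X86_*` constants are local mirrors of the uAPI definitions so that the example builds without `<linux/kvm.h>`, and `DEMO_MSR_INDEX`, `demo_msr_value` and `handle_msr_exit()` are hypothetical names chosen for illustration. A real VMM would operate on `run->msr` from the mmap'ed `struct kvm_run` and then call KVM_RUN again so the kernel can complete the access.

```c
#include <stdint.h>

/* Local mirrors of the values added by this patch (assumption: a real VMM
 * would take these from <linux/kvm.h> instead). */
#define EXIT_X86_RDMSR 29
#define EXIT_X86_WRMSR 30

struct msr_exit {
	uint8_t  error;   /* user -> kernel */
	uint8_t  pad[7];
	uint32_t reason;  /* kernel -> user */
	uint32_t index;   /* kernel -> user */
	uint64_t data;    /* kernel <-> user */
};

/* Hypothetical MSR index emulated by this sketch; not a real register. */
#define DEMO_MSR_INDEX 0x4b564dffu

/* Backing store for the single hypothetical MSR. */
static uint64_t demo_msr_value;

/*
 * Fill in the reply for one MSR exit. Setting error to 1 asks KVM to
 * inject a #GP on the next KVM_RUN; leaving it 0 completes the access.
 */
void handle_msr_exit(uint32_t exit_reason, struct msr_exit *msr)
{
	if (msr->index != DEMO_MSR_INDEX) {
		msr->error = 1;
		return;
	}

	switch (exit_reason) {
	case EXIT_X86_RDMSR:
		msr->data = demo_msr_value;  /* value handed back to the guest */
		msr->error = 0;
		break;
	case EXIT_X86_WRMSR:
		demo_msr_value = msr->data;  /* value the guest wrote */
		msr->error = 0;
		break;
	default:
		msr->error = 1;              /* not an MSR exit */
		break;
	}
}
```

In either direction the access only takes effect in the guest once user space re-enters the kernel with KVM_RUN, as the note at the top of this file's diff states.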

arch/x86/include/asm/kvm_host.h

Lines changed: 3 additions & 0 deletions
@@ -961,6 +961,9 @@ struct kvm_arch {
 	bool guest_can_read_msr_platform_info;
 	bool exception_payload_enabled;
 
+	/* Deflect RDMSR and WRMSR to user space when they trigger a #GP */
+	u32 user_space_msr_mask;
+
 	struct kvm_pmu_event_filter *pmu_event_filter;
 	struct task_struct *nx_lpage_recovery_thread;
 };

arch/x86/kvm/emulate.c

Lines changed: 16 additions & 2 deletions
@@ -3701,21 +3701,35 @@ static int em_dr_write(struct x86_emulate_ctxt *ctxt)
 
 static int em_wrmsr(struct x86_emulate_ctxt *ctxt)
 {
+	u64 msr_index = reg_read(ctxt, VCPU_REGS_RCX);
 	u64 msr_data;
+	int r;
 
 	msr_data = (u32)reg_read(ctxt, VCPU_REGS_RAX)
 		| ((u64)reg_read(ctxt, VCPU_REGS_RDX) << 32);
-	if (ctxt->ops->set_msr(ctxt, reg_read(ctxt, VCPU_REGS_RCX), msr_data))
+	r = ctxt->ops->set_msr(ctxt, msr_index, msr_data);
+
+	if (r == X86EMUL_IO_NEEDED)
+		return r;
+
+	if (r)
 		return emulate_gp(ctxt, 0);
 
 	return X86EMUL_CONTINUE;
 }
 
 static int em_rdmsr(struct x86_emulate_ctxt *ctxt)
 {
+	u64 msr_index = reg_read(ctxt, VCPU_REGS_RCX);
 	u64 msr_data;
+	int r;
+
+	r = ctxt->ops->get_msr(ctxt, msr_index, &msr_data);
+
+	if (r == X86EMUL_IO_NEEDED)
+		return r;
 
-	if (ctxt->ops->get_msr(ctxt, reg_read(ctxt, VCPU_REGS_RCX), &msr_data))
+	if (r)
 		return emulate_gp(ctxt, 0);
 
 	*reg_write(ctxt, VCPU_REGS_RAX) = (u32)msr_data;
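
Both emulator paths above move a 64-bit MSR value through the EDX:EAX register pair. As a standalone sketch of that packing (the helper names `msr_from_edx_eax()` and `msr_to_edx_eax()` are hypothetical and do not exist in the kernel), the arithmetic can be written as:

```c
#include <stdint.h>

/* WRMSR direction: low 32 bits come from RAX, high 32 bits from RDX.
 * Upper halves of both registers are ignored, as in em_wrmsr(). */
uint64_t msr_from_edx_eax(uint64_t rax, uint64_t rdx)
{
	return (uint32_t)rax | (rdx << 32);
}

/* RDMSR direction: split a 64-bit value into the two 32-bit halves
 * that end up in RAX and RDX. */
void msr_to_edx_eax(uint64_t data, uint64_t *rax, uint64_t *rdx)
{
	*rax = (uint32_t)data;
	*rdx = data >> 32;
}
```

The cast to a 32-bit type before combining matters: without it, stale upper bits of RAX would leak into the high half of the MSR value.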

arch/x86/kvm/x86.c

Lines changed: 116 additions & 4 deletions
@@ -1590,12 +1590,89 @@ int kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data)
 }
 EXPORT_SYMBOL_GPL(kvm_set_msr);
 
+static int complete_emulated_msr(struct kvm_vcpu *vcpu, bool is_read)
+{
+	if (vcpu->run->msr.error) {
+		kvm_inject_gp(vcpu, 0);
+		return 1;
+	} else if (is_read) {
+		kvm_rax_write(vcpu, (u32)vcpu->run->msr.data);
+		kvm_rdx_write(vcpu, vcpu->run->msr.data >> 32);
+	}
+
+	return kvm_skip_emulated_instruction(vcpu);
+}
+
+static int complete_emulated_rdmsr(struct kvm_vcpu *vcpu)
+{
+	return complete_emulated_msr(vcpu, true);
+}
+
+static int complete_emulated_wrmsr(struct kvm_vcpu *vcpu)
+{
+	return complete_emulated_msr(vcpu, false);
+}
+
+static u64 kvm_msr_reason(int r)
+{
+	switch (r) {
+	case -ENOENT:
+		return KVM_MSR_EXIT_REASON_UNKNOWN;
+	default:
+		return KVM_MSR_EXIT_REASON_INVAL;
+	}
+}
+
+static int kvm_msr_user_space(struct kvm_vcpu *vcpu, u32 index,
+			      u32 exit_reason, u64 data,
+			      int (*completion)(struct kvm_vcpu *vcpu),
+			      int r)
+{
+	u64 msr_reason = kvm_msr_reason(r);
+
+	/* Check if the user wanted to know about this MSR fault */
+	if (!(vcpu->kvm->arch.user_space_msr_mask & msr_reason))
+		return 0;
+
+	vcpu->run->exit_reason = exit_reason;
+	vcpu->run->msr.error = 0;
+	memset(vcpu->run->msr.pad, 0, sizeof(vcpu->run->msr.pad));
+	vcpu->run->msr.reason = msr_reason;
+	vcpu->run->msr.index = index;
+	vcpu->run->msr.data = data;
+	vcpu->arch.complete_userspace_io = completion;
+
+	return 1;
+}
+
+static int kvm_get_msr_user_space(struct kvm_vcpu *vcpu, u32 index, int r)
+{
+	return kvm_msr_user_space(vcpu, index, KVM_EXIT_X86_RDMSR, 0,
+				   complete_emulated_rdmsr, r);
+}
+
+static int kvm_set_msr_user_space(struct kvm_vcpu *vcpu, u32 index, u64 data, int r)
+{
+	return kvm_msr_user_space(vcpu, index, KVM_EXIT_X86_WRMSR, data,
+				   complete_emulated_wrmsr, r);
+}
+
 int kvm_emulate_rdmsr(struct kvm_vcpu *vcpu)
 {
 	u32 ecx = kvm_rcx_read(vcpu);
 	u64 data;
+	int r;
+
+	r = kvm_get_msr(vcpu, ecx, &data);
 
-	if (kvm_get_msr(vcpu, ecx, &data)) {
+	/* MSR read failed? See if we should ask user space */
+	if (r && kvm_get_msr_user_space(vcpu, ecx, r)) {
+		/* Bounce to user space */
+		return 0;
+	}
+
+	/* MSR read failed? Inject a #GP */
+	if (r) {
 		trace_kvm_msr_read_ex(ecx);
 		kvm_inject_gp(vcpu, 0);
 		return 1;
@@ -1613,8 +1690,18 @@ int kvm_emulate_wrmsr(struct kvm_vcpu *vcpu)
 {
 	u32 ecx = kvm_rcx_read(vcpu);
 	u64 data = kvm_read_edx_eax(vcpu);
+	int r;
 
-	if (kvm_set_msr(vcpu, ecx, data)) {
+	r = kvm_set_msr(vcpu, ecx, data);
+
+	/* MSR write failed? See if we should ask user space */
+	if (r && kvm_set_msr_user_space(vcpu, ecx, data, r)) {
+		/* Bounce to user space */
+		return 0;
+	}
+
+	/* MSR write failed? Inject a #GP */
+	if (r) {
 		trace_kvm_msr_write_ex(ecx, data);
 		kvm_inject_gp(vcpu, 0);
 		return 1;
@@ -3526,6 +3613,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_EXCEPTION_PAYLOAD:
 	case KVM_CAP_SET_GUEST_DEBUG:
 	case KVM_CAP_LAST_CPU:
+	case KVM_CAP_X86_USER_SPACE_MSR:
 		r = 1;
 		break;
 	case KVM_CAP_SYNC_REGS:
@@ -5046,6 +5134,10 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 		kvm->arch.exception_payload_enabled = cap->args[0];
 		r = 0;
 		break;
+	case KVM_CAP_X86_USER_SPACE_MSR:
+		kvm->arch.user_space_msr_mask = cap->args[0];
+		r = 0;
+		break;
 	default:
 		r = -EINVAL;
 		break;
@@ -6378,13 +6470,33 @@ static void emulator_set_segment(struct x86_emulate_ctxt *ctxt, u16 selector,
 static int emulator_get_msr(struct x86_emulate_ctxt *ctxt,
 			    u32 msr_index, u64 *pdata)
 {
-	return kvm_get_msr(emul_to_vcpu(ctxt), msr_index, pdata);
+	struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
+	int r;
+
+	r = kvm_get_msr(vcpu, msr_index, pdata);
+
+	if (r && kvm_get_msr_user_space(vcpu, msr_index, r)) {
+		/* Bounce to user space */
+		return X86EMUL_IO_NEEDED;
+	}
+
+	return r;
 }
 
 static int emulator_set_msr(struct x86_emulate_ctxt *ctxt,
 			    u32 msr_index, u64 data)
 {
-	return kvm_set_msr(emul_to_vcpu(ctxt), msr_index, data);
+	struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
+	int r;
+
+	r = kvm_set_msr(vcpu, msr_index, data);
+
+	if (r && kvm_set_msr_user_space(vcpu, msr_index, data, r)) {
+		/* Bounce to user space */
+		return X86EMUL_IO_NEEDED;
+	}
+
+	return r;
 }
 
 static u64 emulator_get_smbase(struct x86_emulate_ctxt *ctxt)
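
The decision made by `kvm_msr_reason()` and `kvm_msr_user_space()` in this file can be modelled outside the kernel. The sketch below is an illustration under assumptions, not kernel code: `MSR_EXIT_REASON_*` are local mirrors of the uAPI bits, and `msr_reason_from_error()` and `msr_should_exit()` are hypothetical names. It only captures which failed accesses are bounced to user space and which fall through to the #GP path.

```c
#include <errno.h>
#include <stdint.h>

/* Local mirrors of KVM_MSR_EXIT_REASON_* from this patch. */
#define MSR_EXIT_REASON_INVAL   (1u << 0)
#define MSR_EXIT_REASON_UNKNOWN (1u << 1)

/* Map the error returned by the in-kernel MSR handler to a reason bit:
 * -ENOENT means "unknown MSR", anything else is treated as invalid. */
uint32_t msr_reason_from_error(int r)
{
	return r == -ENOENT ? MSR_EXIT_REASON_UNKNOWN : MSR_EXIT_REASON_INVAL;
}

/*
 * Return 1 if a failed access would exit to user space, 0 otherwise.
 * A successful access (r == 0) never exits; a failed one exits only if
 * user space requested that reason in the mask passed via ENABLE_CAP.
 */
int msr_should_exit(uint32_t user_space_msr_mask, int r)
{
	if (r == 0)
		return 0;
	return (user_space_msr_mask & msr_reason_from_error(r)) != 0;
}
```

With a mask of 0, which is the state of a VM that never enabled the capability, every failed access takes the existing #GP path, so the default behaviour is unchanged.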

include/trace/events/kvm.h

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@
 	ERSN(NMI), ERSN(INTERNAL_ERROR), ERSN(OSI), ERSN(PAPR_HCALL),	\
 	ERSN(S390_UCONTROL), ERSN(WATCHDOG), ERSN(S390_TSCH), ERSN(EPR),\
 	ERSN(SYSTEM_EVENT), ERSN(S390_STSI), ERSN(IOAPIC_EOI),          \
-	ERSN(HYPERV), ERSN(ARM_NISV)
+	ERSN(HYPERV), ERSN(ARM_NISV), ERSN(X86_RDMSR), ERSN(X86_WRMSR)
 
 TRACE_EVENT(kvm_userspace_exit,
 	    TP_PROTO(__u32 reason, int errno),

include/uapi/linux/kvm.h

Lines changed: 13 additions & 0 deletions
@@ -248,6 +248,8 @@ struct kvm_hyperv_exit {
 #define KVM_EXIT_IOAPIC_EOI       26
 #define KVM_EXIT_HYPERV           27
 #define KVM_EXIT_ARM_NISV         28
+#define KVM_EXIT_X86_RDMSR        29
+#define KVM_EXIT_X86_WRMSR        30
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */
@@ -413,6 +415,16 @@ struct kvm_run {
 			__u64 esr_iss;
 			__u64 fault_ipa;
 		} arm_nisv;
+		/* KVM_EXIT_X86_RDMSR / KVM_EXIT_X86_WRMSR */
+		struct {
+			__u8 error; /* user -> kernel */
+			__u8 pad[7];
+#define KVM_MSR_EXIT_REASON_INVAL	(1 << 0)
+#define KVM_MSR_EXIT_REASON_UNKNOWN	(1 << 1)
+			__u32 reason; /* kernel -> user */
+			__u32 index; /* kernel -> user */
+			__u64 data; /* kernel <-> user */
+		} msr;
 		/* Fix the size of the union. */
 		char padding[256];
 	};
@@ -1037,6 +1049,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_SMALLER_MAXPHYADDR 185
 #define KVM_CAP_S390_DIAG318 186
 #define KVM_CAP_STEAL_TIME 187
+#define KVM_CAP_X86_USER_SPACE_MSR 188
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
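
The `msr` member added to the `kvm_run` union uses explicit padding so that its layout has no compiler-inserted holes. The following sketch checks that layout at compile time with C11 `_Static_assert`. `struct msr_exit_layout` is a hypothetical local mirror using `<stdint.h>` types in place of `__u8`/`__u32`/`__u64`; real code would check `struct kvm_run` from `<linux/kvm.h>` instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the msr member added to struct kvm_run (assumption:
 * fixed-width stdint types stand in for the kernel's __u8/__u32/__u64). */
struct msr_exit_layout {
	uint8_t  error;   /* user -> kernel */
	uint8_t  pad[7];  /* keeps "reason" at offset 8 with no hidden hole */
	uint32_t reason;  /* kernel -> user */
	uint32_t index;   /* kernel -> user */
	uint64_t data;    /* kernel <-> user */
};

_Static_assert(offsetof(struct msr_exit_layout, reason) == 8,
	       "reason follows the 1-byte error field and 7 bytes of padding");
_Static_assert(offsetof(struct msr_exit_layout, index) == 12,
	       "index follows reason");
_Static_assert(offsetof(struct msr_exit_layout, data) == 16,
	       "data is naturally aligned at offset 16");
_Static_assert(sizeof(struct msr_exit_layout) == 24,
	       "24 bytes in total");
_Static_assert(sizeof(struct msr_exit_layout) <= 256,
	       "fits within the 256-byte padding of the kvm_run union");
```

Because every field sits at a naturally aligned offset, the structure has the same layout for 32-bit and 64-bit user space.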
