@@ -272,18 +272,6 @@ the VCPU file descriptor can be mmap-ed, including:
272
272
KVM_CAP_DIRTY_LOG_RING, see section 8.3.
273
273
274
274
275
- 4.6 KVM_SET_MEMORY_REGION
276
- -------------------------
277
-
278
- :Capability: basic
279
- :Architectures: all
280
- :Type: vm ioctl
281
- :Parameters: struct kvm_memory_region (in)
282
- :Returns: 0 on success, -1 on error
283
-
284
- This ioctl is obsolete and has been removed.
285
-
286
-
287
275
4.7 KVM_CREATE_VCPU
288
276
-------------------
289
277
@@ -368,17 +356,6 @@ see the description of the capability.
368
356
Note that the Xen shared info page, if configured, shall always be assumed
369
357
to be dirty. KVM will not explicitly mark it such.
370
358
371
- 4.9 KVM_SET_MEMORY_ALIAS
372
- ------------------------
373
-
374
- :Capability: basic
375
- :Architectures: x86
376
- :Type: vm ioctl
377
- :Parameters: struct kvm_memory_alias (in)
378
- :Returns: 0 (success), -1 (error)
379
-
380
- This ioctl is obsolete and has been removed.
381
-
382
359
383
360
4.10 KVM_RUN
384
361
------------
@@ -1332,7 +1309,7 @@ yet and must be cleared on entry.
1332
1309
__u64 userspace_addr; /* start of the userspace allocated memory */
1333
1310
};
1334
1311
1335
- /* for kvm_memory_region ::flags */
1312
+ /* for kvm_userspace_memory_region ::flags */
1336
1313
#define KVM_MEM_LOG_DIRTY_PAGES (1UL << 0)
1337
1314
#define KVM_MEM_READONLY (1UL << 1)
1338
1315
@@ -1377,10 +1354,6 @@ the memory region are automatically reflected into the guest. For example, an
1377
1354
mmap() that affects the region will be made visible immediately. Another
1378
1355
example is madvise(MADV_DROP).
1379
1356
1380
- It is recommended to use this API instead of the KVM_SET_MEMORY_REGION ioctl.
1381
- The KVM_SET_MEMORY_REGION does not allow fine grained control over memory
1382
- allocation and is deprecated.
1383
-
1384
1357
1385
1358
4.36 KVM_SET_TSS_ADDR
1386
1359
---------------------
@@ -3293,6 +3266,7 @@ valid entries found.
3293
3266
----------------------
3294
3267
3295
3268
:Capability: KVM_CAP_DEVICE_CTRL
3269
+ :Architectures: all
3296
3270
:Type: vm ioctl
3297
3271
:Parameters: struct kvm_create_device (in/out)
3298
3272
:Returns: 0 on success, -1 on error
@@ -3333,6 +3307,7 @@ number.
3333
3307
:Capability: KVM_CAP_DEVICE_CTRL, KVM_CAP_VM_ATTRIBUTES for vm device,
3334
3308
KVM_CAP_VCPU_ATTRIBUTES for vcpu device
3335
3309
KVM_CAP_SYS_ATTRIBUTES for system (/dev/kvm) device (no set)
3310
+ :Architectures: x86, arm64, s390
3336
3311
:Type: device ioctl, vm ioctl, vcpu ioctl
3337
3312
:Parameters: struct kvm_device_attr
3338
3313
:Returns: 0 on success, -1 on error
@@ -4104,80 +4079,71 @@ flags values for ``struct kvm_msr_filter_range``:
4104
4079
``KVM_MSR_FILTER_READ``
4105
4080
4106
4081
Filter read accesses to MSRs using the given bitmap. A 0 in the bitmap
4107
- indicates that a read should immediately fail, while a 1 indicates that
4108
- a read for a particular MSR should be handled regardless of the default
4082
+ indicates that read accesses should be denied, while a 1 indicates that
4083
+ a read for a particular MSR should be allowed regardless of the default
4109
4084
filter action.
4110
4085
4111
4086
``KVM_MSR_FILTER_WRITE``
4112
4087
4113
4088
Filter write accesses to MSRs using the given bitmap. A 0 in the bitmap
4114
- indicates that a write should immediately fail, while a 1 indicates that
4115
- a write for a particular MSR should be handled regardless of the default
4089
+ indicates that write accesses should be denied, while a 1 indicates that
4090
+ a write for a particular MSR should be allowed regardless of the default
4116
4091
filter action.
4117
4092
4118
- ``KVM_MSR_FILTER_READ | KVM_MSR_FILTER_WRITE``
4119
-
4120
- Filter both read and write accesses to MSRs using the given bitmap. A 0
4121
- in the bitmap indicates that both reads and writes should immediately fail,
4122
- while a 1 indicates that reads and writes for a particular MSR are not
4123
- filtered by this range.
4124
-
4125
4093
flags values for ``struct kvm_msr_filter``:
4126
4094
4127
4095
``KVM_MSR_FILTER_DEFAULT_ALLOW``
4128
4096
4129
4097
If no filter range matches an MSR index that is getting accessed, KVM will
4130
- fall back to allowing access to the MSR.
4098
+ allow accesses to all MSRs by default.
4131
4099
4132
4100
``KVM_MSR_FILTER_DEFAULT_DENY``
4133
4101
4134
4102
If no filter range matches an MSR index that is getting accessed, KVM will
4135
- fall back to rejecting access to the MSR. In this mode, all MSRs that should
4136
- be processed by KVM need to explicitly be marked as allowed in the bitmaps.
4103
+ deny accesses to all MSRs by default.
4137
4104
4138
- This ioctl allows user space to define up to 16 bitmaps of MSR ranges to
4139
- specify whether a certain MSR access should be explicitly filtered for or not.
4105
+ This ioctl allows userspace to define up to 16 bitmaps of MSR ranges to deny
4106
+ guest MSR accesses that would normally be allowed by KVM. If an MSR is not
4107
+ covered by a specific range, the "default" filtering behavior applies. Each
4108
+ bitmap range covers MSRs from [base .. base+nmsrs).
4140
4109
4141
- If this ioctl has never been invoked, MSR accesses are not guarded and the
4142
- default KVM in-kernel emulation behavior is fully preserved.
4110
+ If an MSR access is denied by userspace, the resulting KVM behavior depends on
4111
+ whether or not KVM_CAP_X86_USER_SPACE_MSR's KVM_MSR_EXIT_REASON_FILTER is
4112
+ enabled. If KVM_MSR_EXIT_REASON_FILTER is enabled, KVM will exit to userspace
4113
+ on denied accesses, i.e. userspace effectively intercepts the MSR access. If
4114
+ KVM_MSR_EXIT_REASON_FILTER is not enabled, KVM will inject a #GP into the guest
4115
+ on denied accesses.
4116
+
4117
+ If an MSR access is allowed by userspace, KVM will emulate and/or virtualize
4118
+ the access in accordance with the vCPU model. Note, KVM may still ultimately
4119
+ inject a #GP if an access is allowed by userspace, e.g. if KVM doesn't support
4120
+ the MSR, or to follow architectural behavior for the MSR.
4121
+
4122
+ By default, KVM operates in KVM_MSR_FILTER_DEFAULT_ALLOW mode with no MSR range
4123
+ filters.
4143
4124
4144
4125
Calling this ioctl with an empty set of ranges (all nmsrs == 0) disables MSR
4145
4126
filtering. In that mode, ``KVM_MSR_FILTER_DEFAULT_DENY`` is invalid and causes
4146
4127
an error.
4147
4128
4148
- As soon as the filtering is in place, every MSR access is processed through
4149
- the filtering except for accesses to the x2APIC MSRs (from 0x800 to 0x8ff);
4150
- x2APIC MSRs are always allowed, independent of the ``default_allow`` setting,
4151
- and their behavior depends on the ``X2APIC_ENABLE`` bit of the APIC base
4152
- register.
4153
-
4154
4129
.. warning::
4155
- MSR accesses coming from nested vmentry/vmexit are not filtered.
4130
+ MSR accesses as part of nested VM-Enter/VM-Exit are not filtered.
4156
4131
This includes both writes to individual VMCS fields and reads/writes
4157
4132
through the MSR lists pointed to by the VMCS.
4158
4133
4159
- If a bit is within one of the defined ranges, read and write accesses are
4160
- guarded by the bitmap's value for the MSR index if the kind of access
4161
- is included in the ``struct kvm_msr_filter_range`` flags. If no range
4162
- cover this particular access, the behavior is determined by the flags
4163
- field in the kvm_msr_filter struct: ``KVM_MSR_FILTER_DEFAULT_ALLOW``
4164
- and ``KVM_MSR_FILTER_DEFAULT_DENY``.
4165
-
4166
- Each bitmap range specifies a range of MSRs to potentially allow access on.
4167
- The range goes from MSR index [base .. base+nmsrs]. The flags field
4168
- indicates whether reads, writes or both reads and writes are filtered
4169
- by setting a 1 bit in the bitmap for the corresponding MSR index.
4170
-
4171
- If an MSR access is not permitted through the filtering, it generates a
4172
- #GP inside the guest. When combined with KVM_CAP_X86_USER_SPACE_MSR, that
4173
- allows user space to deflect and potentially handle various MSR accesses
4174
- into user space.
4134
+ x2APIC MSR accesses cannot be filtered (KVM silently ignores filters that
4135
+ cover any x2APIC MSRs).
4175
4136
4176
4137
Note, invoking this ioctl while a vCPU is running is inherently racy. However,
4177
4138
KVM does guarantee that vCPUs will see either the previous filter or the new
4178
4139
filter, e.g. MSRs with identical settings in both the old and new filter will
4179
4140
have deterministic behavior.
4180
4141
4142
+ Similarly, if userspace wishes to intercept on denied accesses,
4143
+ KVM_MSR_EXIT_REASON_FILTER must be enabled before activating any filters, and
4144
+ left enabled until after all filters are deactivated. Failure to do so may
4145
+ result in KVM injecting a #GP instead of exiting to userspace.
4146
+
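The filtering rules above can be modelled in plain C. The following is a minimal userspace sketch of the decision, not KVM's implementation: the ``sketch_*`` names and the ``SKETCH_FILTER_*`` values are hypothetical stand-ins (real code should use the definitions from ``<linux/kvm.h>``), the first matching range is assumed to win, bit 0 of bitmap byte 0 is assumed to describe the MSR at ``base``, and the x2APIC window is assumed to be 0x800 through 0x8ff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; real code should use <linux/kvm.h>. */
#define SKETCH_FILTER_READ  (1u << 0)
#define SKETCH_FILTER_WRITE (1u << 1)

struct sketch_range {
    uint32_t flags;        /* SKETCH_FILTER_READ and/or SKETCH_FILTER_WRITE */
    uint32_t nmsrs;        /* number of MSRs covered by this range */
    uint32_t base;         /* first MSR index covered */
    const uint8_t *bitmap; /* one bit per MSR: 0 = deny, 1 = allow */
};

struct sketch_filter {
    bool default_allow;    /* models DEFAULT_ALLOW vs. DEFAULT_DENY */
    size_t nranges;
    const struct sketch_range *ranges;
};

/* Returns true if the filter does not deny the access. */
static bool sketch_msr_allowed(const struct sketch_filter *f,
                               uint32_t index, uint32_t access)
{
    assert(access == SKETCH_FILTER_READ || access == SKETCH_FILTER_WRITE);

    /* Assumption: x2APIC MSRs (0x800..0x8ff) are never filtered. */
    if (index >= 0x800 && index <= 0x8ff)
        return true;

    for (size_t i = 0; i < f->nranges; i++) {
        const struct sketch_range *r = &f->ranges[i];

        if (!(r->flags & access))
            continue;                 /* range does not filter this access type */
        if (index < r->base || index - r->base >= r->nmsrs)
            continue;                 /* outside [base .. base+nmsrs) */

        uint32_t bit = index - r->base;
        return (r->bitmap[bit / 8] >> (bit % 8)) & 1;
    }

    /* No range matched: the "default" filtering behavior applies. */
    return f->default_allow;
}
```

A ``true`` result only means that userspace did not deny the access; as noted above, KVM may still inject a #GP for other reasons.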
4181
4147
4.98 KVM_CREATE_SPAPR_TCE_64
4182
4148
----------------------------
4183
4149
@@ -5339,6 +5305,7 @@ KVM_PV_ASYNC_CLEANUP_PERFORM
5339
5305
union {
5340
5306
__u8 long_mode;
5341
5307
__u8 vector;
5308
+ __u8 runstate_update_flag;
5342
5309
struct {
5343
5310
__u64 gfn;
5344
5311
} shared_info;
@@ -5416,6 +5383,14 @@ KVM_XEN_ATTR_TYPE_XEN_VERSION
5416
5383
event channel delivery, so responding within the kernel without
5417
5384
exiting to userspace is beneficial.
5418
5385
5386
+ KVM_XEN_ATTR_TYPE_RUNSTATE_UPDATE_FLAG
5387
+ This attribute is available when the KVM_CAP_XEN_HVM ioctl indicates
5388
+ support for KVM_XEN_HVM_CONFIG_RUNSTATE_UPDATE_FLAG. It enables the
5389
+ XEN_RUNSTATE_UPDATE flag which allows guest vCPUs to safely read
5390
+ other vCPUs' vcpu_runstate_info. Xen guests enable this feature via
5391
+ the VM_ASST_TYPE_runstate_update_flag of the HYPERVISOR_vm_assist
5392
+ hypercall.
5393
+
5419
5394
4.127 KVM_XEN_HVM_GET_ATTR
5420
5395
--------------------------
5421
5396
@@ -6473,31 +6448,33 @@ if it decides to decode and emulate the instruction.
6473
6448
6474
6449
Used on x86 systems. When the VM capability KVM_CAP_X86_USER_SPACE_MSR is
6475
6450
enabled, MSR accesses to registers that would invoke a #GP by KVM kernel code
6476
- will instead trigger a KVM_EXIT_X86_RDMSR exit for reads and KVM_EXIT_X86_WRMSR
6451
+ may instead trigger a KVM_EXIT_X86_RDMSR exit for reads and KVM_EXIT_X86_WRMSR
6477
6452
exit for writes.
6478
6453
6479
- The "reason" field specifies why the MSR trap occurred. User space will only
6480
- receive MSR exit traps when a particular reason was requested during through
6454
+ The "reason" field specifies why the MSR interception occurred. Userspace will
6455
+ only receive MSR exits when a particular reason was requested through
6481
6456
ENABLE_CAP. Currently valid exit reasons are:
6482
6457
6483
6458
KVM_MSR_EXIT_REASON_UNKNOWN - access to MSR that is unknown to KVM
6484
6459
KVM_MSR_EXIT_REASON_INVAL - access to invalid MSRs or reserved bits
6485
6460
KVM_MSR_EXIT_REASON_FILTER - access blocked by KVM_X86_SET_MSR_FILTER
6486
6461
6487
- For KVM_EXIT_X86_RDMSR, the "index" field tells user space which MSR the guest
6488
- wants to read. To respond to this request with a successful read, user space
6462
+ For KVM_EXIT_X86_RDMSR, the "index" field tells userspace which MSR the guest
6463
+ wants to read. To respond to this request with a successful read, userspace
6489
6464
writes the respective data into the "data" field and must continue guest
6490
6465
execution to ensure the read data is transferred into guest register state.
6491
6466
6492
- If the RDMSR request was unsuccessful, user space indicates that with a "1" in
6467
+ If the RDMSR request was unsuccessful, userspace indicates that with a "1" in
6493
6468
the "error" field. This will inject a #GP into the guest when the VCPU is
6494
6469
executed again.
6495
6470
6496
- For KVM_EXIT_X86_WRMSR, the "index" field tells user space which MSR the guest
6497
- wants to write. Once finished processing the event, user space must continue
6498
- vCPU execution. If the MSR write was unsuccessful, user space also sets the
6471
+ For KVM_EXIT_X86_WRMSR, the "index" field tells userspace which MSR the guest
6472
+ wants to write. Once finished processing the event, userspace must continue
6473
+ vCPU execution. If the MSR write was unsuccessful, userspace also sets the
6499
6474
"error" field to "1".
6500
6475
6476
+ See KVM_X86_SET_MSR_FILTER for details on the interaction with MSR filtering.
6477
+
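As an illustration of the RDMSR/WRMSR protocol described above, here is a minimal sketch of how a VMM might fill in the exit payload before resuming the vCPU. It assumes a local ``struct sketch_msr_exit`` that only mirrors the "error", "reason", "index" and "data" fields named in the text; the MSR index ``SKETCH_EMULATED_MSR`` and the helper names are made up for the example.

```c
#include <stdint.h>

/* Hypothetical mirror of the fields named in the text above. */
struct sketch_msr_exit {
    uint8_t  error;   /* userspace -> KVM: 1 requests a #GP */
    uint32_t reason;  /* KVM -> userspace */
    uint32_t index;   /* KVM -> userspace: MSR being accessed */
    uint64_t data;    /* read result or value being written */
};

/* Made-up index of the single MSR this sketch emulates. */
#define SKETCH_EMULATED_MSR 0xC0001234u

/* Handle a read exit: supply data on success, flag an error otherwise. */
static void sketch_handle_rdmsr(struct sketch_msr_exit *msr, uint64_t shadow)
{
    if (msr->index == SKETCH_EMULATED_MSR) {
        msr->data = shadow;
        msr->error = 0;
    } else {
        msr->error = 1;   /* KVM injects #GP when the vCPU runs again */
    }
}

/* Handle a write exit: latch the value on success, flag an error otherwise. */
static void sketch_handle_wrmsr(struct sketch_msr_exit *msr, uint64_t *shadow)
{
    if (msr->index == SKETCH_EMULATED_MSR) {
        *shadow = msr->data;
        msr->error = 0;
    } else {
        msr->error = 1;
    }
}
```

In either case the VMM must run the vCPU again afterwards so that KVM can complete the instruction or deliver the #GP.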
6501
6478
::
6502
6479
6503
6480
@@ -7263,19 +7240,27 @@ the module parameter for the target VM.
7263
7240
:Parameters: args[0] contains the mask of KVM_MSR_EXIT_REASON_* events to report
7264
7241
:Returns: 0 on success; -1 on error
7265
7242
7266
- This capability enables trapping of #GP invoking RDMSR and WRMSR instructions
7267
- into user space.
7243
+ This capability allows userspace to intercept RDMSR and WRMSR instructions if
7244
+ access to an MSR is denied. By default, KVM injects #GP on denied accesses.
7268
7245
7269
7246
When a guest requests to read or write an MSR, KVM may not implement all MSRs
7270
7247
that are relevant to a respective system. It also does not differentiate by
7271
7248
CPU type.
7272
7249
7273
- To allow more fine grained control over MSR handling, user space may enable
7250
+ To allow more fine grained control over MSR handling, userspace may enable
7274
7251
this capability. With it enabled, MSR accesses that match the mask specified in
7275
- args[0] and trigger a #GP event inside the guest by KVM will instead trigger
7276
- KVM_EXIT_X86_RDMSR and KVM_EXIT_X86_WRMSR exit notifications which user space
7277
- can then handle to implement model specific MSR handling and/or user notifications
7278
- to inform a user that an MSR was not handled.
7252
+ args[0] and would trigger a #GP inside the guest will instead trigger
7253
+ KVM_EXIT_X86_RDMSR and KVM_EXIT_X86_WRMSR exit notifications. Userspace
7254
+ can then implement model specific MSR handling and/or user notifications
7255
+ to inform a user that an MSR was not emulated/virtualized by KVM.
7256
+
7257
+ The valid mask flags are:
7258
+
7259
+ KVM_MSR_EXIT_REASON_UNKNOWN - intercept accesses to unknown (to KVM) MSRs
7260
+ KVM_MSR_EXIT_REASON_INVAL - intercept accesses that are architecturally
7261
+ invalid according to the vCPU model and/or mode
7262
+ KVM_MSR_EXIT_REASON_FILTER - intercept accesses that are denied by userspace
7263
+ via KVM_X86_SET_MSR_FILTER
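The effect of the mask can be summarized in a small model. This is a sketch under stated assumptions rather than KVM code: the ``SKETCH_REASON_*`` bit values are local placeholders (real code should use the ``KVM_MSR_EXIT_REASON_*`` definitions from ``<linux/kvm.h>``), and ``sketch_on_failed_access()`` is a hypothetical helper that answers only one question, namely whether a failing access exits to userspace or results in a #GP.

```c
#include <stdint.h>

/* Placeholder bit values; take the real ones from <linux/kvm.h>. */
#define SKETCH_REASON_INVAL   (1u << 0)
#define SKETCH_REASON_UNKNOWN (1u << 1)
#define SKETCH_REASON_FILTER  (1u << 2)

enum sketch_outcome {
    SKETCH_INJECT_GP,          /* default: guest receives #GP */
    SKETCH_EXIT_TO_USERSPACE   /* KVM_EXIT_X86_RDMSR / KVM_EXIT_X86_WRMSR */
};

/*
 * enabled_mask models args[0] passed when enabling the capability;
 * reason is the single reason bit describing why the access failed.
 */
static enum sketch_outcome sketch_on_failed_access(uint64_t enabled_mask,
                                                   uint32_t reason)
{
    return (enabled_mask & reason) ? SKETCH_EXIT_TO_USERSPACE
                                   : SKETCH_INJECT_GP;
}
```

For example, a VMM that enables only the filter reason would see exits for accesses denied via KVM_X86_SET_MSR_FILTER, while accesses to MSRs unknown to KVM would still produce a #GP in the guest.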
7279
7264
7280
7265
7.22 KVM_CAP_X86_BUS_LOCK_EXIT
7281
7266
-------------------------------
@@ -7936,7 +7921,7 @@ KVM_EXIT_X86_WRMSR exit notifications.
7936
7921
This capability indicates that KVM supports that accesses to user defined MSRs
7937
7922
may be rejected. With this capability exposed, KVM exports new VM ioctl
7938
7923
KVM_X86_SET_MSR_FILTER which user space can call to specify bitmaps of MSR
7939
- ranges that KVM should reject access to.
7924
+ ranges that KVM should deny access to.
7940
7925
7941
7926
In combination with KVM_CAP_X86_USER_SPACE_MSR, this allows user space to
7942
7927
trap and emulate MSRs that are outside of the scope of KVM as well as
@@ -8080,12 +8065,13 @@ KVM device "kvm-arm-vgic-its" when dirty ring is enabled.
8080
8065
This capability indicates the features that Xen supports for hosting Xen
8081
8066
PVHVM guests. Valid flags are::
8082
8067
8083
- #define KVM_XEN_HVM_CONFIG_HYPERCALL_MSR (1 << 0)
8084
- #define KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL (1 << 1)
8085
- #define KVM_XEN_HVM_CONFIG_SHARED_INFO (1 << 2)
8086
- #define KVM_XEN_HVM_CONFIG_RUNSTATE (1 << 3)
8087
- #define KVM_XEN_HVM_CONFIG_EVTCHN_2LEVEL (1 << 4)
8088
- #define KVM_XEN_HVM_CONFIG_EVTCHN_SEND (1 << 5)
8068
+ #define KVM_XEN_HVM_CONFIG_HYPERCALL_MSR (1 << 0)
8069
+ #define KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL (1 << 1)
8070
+ #define KVM_XEN_HVM_CONFIG_SHARED_INFO (1 << 2)
8071
+ #define KVM_XEN_HVM_CONFIG_RUNSTATE (1 << 3)
8072
+ #define KVM_XEN_HVM_CONFIG_EVTCHN_2LEVEL (1 << 4)
8073
+ #define KVM_XEN_HVM_CONFIG_EVTCHN_SEND (1 << 5)
8074
+ #define KVM_XEN_HVM_CONFIG_RUNSTATE_UPDATE_FLAG (1 << 6)
8089
8075
8090
8076
The KVM_XEN_HVM_CONFIG_HYPERCALL_MSR flag indicates that the KVM_XEN_HVM_CONFIG
8091
8077
ioctl is available, for the guest to set its hypercall page.
@@ -8117,6 +8103,18 @@ KVM_XEN_VCPU_ATTR_TYPE_VCPU_ID/TIMER/UPCALL_VECTOR vCPU attributes.
8117
8103
related to event channel delivery, timers, and the XENVER_version
8118
8104
interception.
8119
8105
8106
+ The KVM_XEN_HVM_CONFIG_RUNSTATE_UPDATE_FLAG flag indicates that KVM supports
8107
+ the KVM_XEN_ATTR_TYPE_RUNSTATE_UPDATE_FLAG attribute in the KVM_XEN_SET_ATTR
8108
+ and KVM_XEN_GET_ATTR ioctls. This controls whether KVM will set the
8109
+ XEN_RUNSTATE_UPDATE flag in guest memory mapped vcpu_runstate_info during
8110
+ updates of the runstate information. Note that versions of KVM which support
8111
+ the RUNSTATE feature above, but not the RUNSTATE_UPDATE_FLAG feature, will
8112
+ always set the XEN_RUNSTATE_UPDATE flag when updating the guest structure,
8113
+ which is perhaps counterintuitive. When this flag is advertised, KVM will
8114
+ behave more correctly, not using the XEN_RUNSTATE_UPDATE flag until/unless
8115
+ specifically enabled (by the guest making the hypercall, causing the VMM
8116
+ to enable the KVM_XEN_ATTR_TYPE_RUNSTATE_UPDATE_FLAG attribute).
8117
+
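The compatibility rule in the paragraph above can be written down as a small decision function. This is a sketch only: the ``SKETCH_XEN_*`` macros copy the bit positions of KVM_XEN_HVM_CONFIG_RUNSTATE and KVM_XEN_HVM_CONFIG_RUNSTATE_UPDATE_FLAG from the flag list above, while ``sketch_kvm_sets_update_flag()`` and its parameters are hypothetical names introduced for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions copied from the flag list above. */
#define SKETCH_XEN_RUNSTATE             (1u << 3)
#define SKETCH_XEN_RUNSTATE_UPDATE_FLAG (1u << 6)

/*
 * xen_caps:     flags reported for KVM_CAP_XEN_HVM
 * attr_enabled: whether the VMM enabled the
 *               KVM_XEN_ATTR_TYPE_RUNSTATE_UPDATE_FLAG attribute
 *
 * Returns whether KVM is expected to set XEN_RUNSTATE_UPDATE while
 * updating the guest's vcpu_runstate_info.
 */
static bool sketch_kvm_sets_update_flag(uint32_t xen_caps, bool attr_enabled)
{
    if (!(xen_caps & SKETCH_XEN_RUNSTATE))
        return false;          /* no runstate support at all */

    if (!(xen_caps & SKETCH_XEN_RUNSTATE_UPDATE_FLAG))
        return true;           /* older KVM: flag is always set */

    return attr_enabled;       /* newer KVM: only when enabled */
}
```

A VMM would typically evaluate the capability flags once at startup and enable the attribute only after the guest makes the corresponding vm_assist hypercall.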
8120
8118
8.31 KVM_CAP_PPC_MULTITCE
8121
8119
-------------------------
8122
8120