
Commit 8fa590b

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:

 "ARM64:
   - Enable the per-vcpu dirty-ring tracking mechanism, together with an option to keep the good old dirty log around for pages that are dirtied by something other than a vcpu.
   - Switch to the relaxed parallel fault handling, using RCU to delay page table reclaim and giving better performance under load.
   - Relax the MTE ABI, allowing a VMM to use the MAP_SHARED mapping option, which multi-process VMMs such as crosvm rely on (see merge commit 382b5b8: "Fix a number of issues with MTE, such as races on the tags being initialised vs the PG_mte_tagged flag as well as the lack of support for VM_SHARED when KVM is involved. Patches from Catalin Marinas and Peter Collingbourne").
   - Merge the pKVM shadow vcpu state tracking that allows the hypervisor to have its own view of a vcpu, keeping that state private.
   - Add support for the PMUv3p5 architecture revision, bringing support for 64bit counters on systems that support it, and fix the not-quite-compliant CHAIN-ed counter support for the machines that actually exist out there.
   - Fix a handful of minor issues around 52bit VA/PA support (64kB pages only) as a prefix of the oncoming support for 4kB and 16kB pages.
   - Pick a small set of documentation and spelling fixes, because no good merge window would be complete without those.

  s390:
   - Second batch of the lazy destroy patches
   - First batch of KVM changes for kernel virtual != physical address support
   - Removal of an unused function

  x86:
   - Allow compiling out SMM support
   - Cleanup and documentation of SMM state save area format
   - Preserve interrupt shadow in SMM state save area
   - Respond to generic signals during slow page faults
   - Fixes and optimizations for the non-executable huge page errata fix.
   - Reprogram all performance counters on PMU filter change
   - Cleanups to Hyper-V emulation and tests
   - Process Hyper-V TLB flushes from a nested guest (i.e. from an L2 guest running on top of an L1 Hyper-V hypervisor)
   - Advertise several new Intel features
   - x86 Xen-for-KVM:
      - Allow the Xen runstate information to cross a page boundary
      - Allow XEN_RUNSTATE_UPDATE flag behaviour to be configured
      - Add support for 32-bit guests in SCHEDOP_poll
   - Notable x86 fixes and cleanups:
      - One-off fixes for various emulation flows (SGX, VMXON, NRIPS=0).
      - Reinstate IBPB on emulated VM-Exit that was incorrectly dropped a few years back when eliminating unnecessary barriers when switching between vmcs01 and vmcs02.
      - Clean up vmread_error_trampoline() to make it more obvious that params must be passed on the stack, even for x86-64.
      - Let userspace set all supported bits in MSR_IA32_FEAT_CTL irrespective of the current guest CPUID.
      - Fudge around a race with TSC refinement that results in KVM incorrectly thinking a guest needs TSC scaling when running on a CPU with a constant TSC, but no hardware-enumerated TSC frequency.
      - Advertise (on AMD) that the SMM_CTL MSR is not supported
      - Remove unnecessary exports

  Generic:
   - Support for responding to signals during page faults; introduces new FOLL_INTERRUPTIBLE flag that was reviewed by mm folks

  Selftests:
   - Fix an inverted check in the access tracking perf test, and restore support for asserting that there aren't too many idle pages when running on bare metal.
   - Fix build errors that occur in certain setups (unsure exactly what is unique about the problematic setup) due to glibc overriding static_assert() to a variant that requires a custom message.
   - Introduce actual atomics for clear/set_bit() in selftests
   - Add support for pinning vCPUs in dirty_log_perf_test.
   - Rename the so-called "perf_util" framework to "memstress".
   - Add a lightweight pseudo RNG for guest use, and use it to randomize the access pattern and write vs. read percentage in the memstress tests.
   - Add a common ucall implementation; code dedup and pre-work for running SEV (and beyond) guests in selftests.
   - Provide a common constructor and arch hook, which will eventually be used by x86 to automatically select the right hypercall (AMD vs. Intel).
   - A bunch of added/enabled/fixed selftests for ARM64, covering memslots, breakpoints, stage-2 faults and access tracking.
   - x86-specific selftest changes:
      - Clean up x86's page table management.
      - Clean up and enhance the "smaller maxphyaddr" test, and add a related test to cover generic emulation failure.
      - Clean up the nEPT support checks.
      - Add X86_PROPERTY_* framework to retrieve multi-bit CPUID values.
      - Fix an ordering issue in the AMX test introduced by recent conversions to use kvm_cpu_has(), and harden the code to guard against similar bugs in the future. Anything that triggers caching of KVM's supported CPUID, kvm_cpu_has() in this case, effectively hides opt-in XSAVE features if the caching occurs before the test opts in via prctl().

  Documentation:
   - Remove deleted ioctls from documentation
   - Clean up the docs for the x86 MSR filter.
   - Various fixes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (361 commits)
  KVM: x86: Add proper ReST tables for userspace MSR exits/flags
  KVM: selftests: Allocate ucall pool from MEM_REGION_DATA
  KVM: arm64: selftests: Align VA space allocator with TTBR0
  KVM: arm64: Fix benign bug with incorrect use of VA_BITS
  KVM: arm64: PMU: Fix period computation for 64bit counters with 32bit overflow
  KVM: x86: Advertise that the SMM_CTL MSR is not supported
  KVM: x86: remove unnecessary exports
  KVM: selftests: Fix spelling mistake "probabalistic" -> "probabilistic"
  tools: KVM: selftests: Convert clear/set_bit() to actual atomics
  tools: Drop "atomic_" prefix from atomic test_and_set_bit()
  tools: Drop conflicting non-atomic test_and_{clear,set}_bit() helpers
  KVM: selftests: Use non-atomic clear/set bit helpers in KVM tests
  perf tools: Use dedicated non-atomic clear/set bit helpers
  tools: Take @bit as an "unsigned long" in {clear,set}_bit() helpers
  KVM: arm64: selftests: Enable single-step without a "full" ucall()
  KVM: x86: fix APICv/x2AVIC disabled when vm reboot by itself
  KVM: Remove stale comment about KVM_REQ_UNHALT
  KVM: Add missing arch for KVM_CREATE_DEVICE and KVM_{SET,GET}_DEVICE_ATTR
  KVM: Reference to kvm_userspace_memory_region in doc and comments
  KVM: Delete all references to removed KVM_SET_MEMORY_ALIAS ioctl
  ...
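The memstress bullet above mentions a lightweight pseudo RNG for guest use that randomizes the access pattern and the write vs. read percentage. As an illustration only, a Park-Miller "minimal standard" linear congruential generator (multiplier 48271, modulus 2^31 - 1) is a typical choice for this kind of cheap, deterministic, per-vCPU generator. The struct and function names below are hypothetical, and this is a sketch, not the selftest code itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-vCPU generator state; each guest thread keeps its own. */
struct guest_rand_state {
	uint32_t seed;
};

/*
 * Park-Miller step: seed = seed * 48271 mod (2^31 - 1).
 * The seed must start in [1, 2^31 - 2]; because the modulus is prime the
 * state then never becomes 0.
 */
static uint32_t guest_rand_u32(struct guest_rand_state *st)
{
	assert(st->seed != 0);
	st->seed = (uint32_t)((uint64_t)st->seed * 48271u % 2147483647u);
	return st->seed;
}

/* Decide whether the next access is a write, given a 0..100 percentage. */
static bool guest_rand_is_write(struct guest_rand_state *st,
				uint32_t write_percent)
{
	return guest_rand_u32(st) % 100 < write_percent;
}
```

Giving each vCPU its own seeded state keeps the access pattern reproducible per vCPU without any shared locking in the guest.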
2 parents: 057b40f + 549a715


257 files changed, 12068 insertions(+), 4988 deletions(-)


Documentation/virt/kvm/api.rst

Lines changed: 165 additions & 109 deletions

Documentation/virt/kvm/arm/pvtime.rst

Lines changed: 8 additions & 6 deletions
@@ -23,21 +23,23 @@ the PV_TIME_FEATURES hypercall should be probed using the SMCCC 1.1
 ARCH_FEATURES mechanism before calling it.
 
 PV_TIME_FEATURES
-    ============= ========    ==========
+
+    ============= ========    =================================================
     Function ID:  (uint32)    0xC5000020
     PV_call_id:   (uint32)    The function to query for support.
                               Currently only PV_TIME_ST is supported.
     Return value: (int64)     NOT_SUPPORTED (-1) or SUCCESS (0) if the relevant
                               PV-time feature is supported by the hypervisor.
-    ============= ========    ==========
+    ============= ========    =================================================
 
 PV_TIME_ST
-    ============= ========    ==========
+
+    ============= ========    ==============================================
     Function ID:  (uint32)    0xC5000021
     Return value: (int64)     IPA of the stolen time data structure for this
                               VCPU. On failure:
                               NOT_SUPPORTED (-1)
-    ============= ========    ==========
+    ============= ========    ==============================================
 
 The IPA returned by PV_TIME_ST should be mapped by the guest as normal memory
 with inner and outer write back caching attributes, in the inner shareable
@@ -76,5 +78,5 @@ It is advisable that one or more 64k pages are set aside for the purpose of
 these structures and not used for other purposes, this enables the guest to map
 the region using 64k pages and avoids conflicting attributes with other memory.
 
-For the user space interface see Documentation/virt/kvm/devices/vcpu.rst
-section "3. GROUP: KVM_ARM_VCPU_PVTIME_CTRL".
+For the user space interface see
+:ref:`Documentation/virt/kvm/devices/vcpu.rst <kvm_arm_vcpu_pvtime_ctrl>`.
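The two tables above imply a probing order: a guest asks PV_TIME_FEATURES whether PV_TIME_ST is supported before calling PV_TIME_ST itself. The user-space mock below models that flow. Only the function IDs and the NOT_SUPPORTED (-1) / SUCCESS (0) return codes come from the document; `mock_hvc()` is a hypothetical stand-in for a real HVC/SMC instruction, the IPA is made up, and the preceding SMCCC 1.1 ARCH_FEATURES probe is omitted for brevity.

```c
#include <assert.h>
#include <stdint.h>

#define PV_TIME_FEATURES 0xC5000020u /* function IDs from the tables */
#define PV_TIME_ST       0xC5000021u
#define SMCCC_RET_SUCCESS        0
#define SMCCC_RET_NOT_SUPPORTED (-1)

/* Hypothetical hypervisor model; a real guest would trap with HVC/SMC. */
struct mock_hyp {
	int pvtime_enabled;
	uint64_t st_ipa; /* made-up IPA of the stolen time structure */
};

static int64_t mock_hvc(const struct mock_hyp *hyp, uint32_t fn, uint32_t arg)
{
	switch (fn) {
	case PV_TIME_FEATURES:
		/* Per the table, only PV_TIME_ST can currently be queried. */
		if (hyp->pvtime_enabled && arg == PV_TIME_ST)
			return SMCCC_RET_SUCCESS;
		return SMCCC_RET_NOT_SUPPORTED;
	case PV_TIME_ST:
		if (hyp->pvtime_enabled)
			return (int64_t)hyp->st_ipa;
		return SMCCC_RET_NOT_SUPPORTED;
	default:
		return SMCCC_RET_NOT_SUPPORTED;
	}
}

/* Guest side: returns the IPA, or 0 (in this sketch) if unavailable. */
static uint64_t guest_probe_stolen_time(const struct mock_hyp *hyp)
{
	if (mock_hvc(hyp, PV_TIME_FEATURES, PV_TIME_ST) != SMCCC_RET_SUCCESS)
		return 0;

	int64_t ret = mock_hvc(hyp, PV_TIME_ST, 0);
	return ret < 0 ? 0 : (uint64_t)ret;
}
```

Per the surrounding text, the returned IPA would then be mapped by the guest as normal, write-back cacheable, inner shareable memory.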

Documentation/virt/kvm/devices/arm-vgic-its.rst

Lines changed: 4 additions & 1 deletion
@@ -52,7 +52,10 @@ KVM_DEV_ARM_VGIC_GRP_CTRL
 
   KVM_DEV_ARM_ITS_SAVE_TABLES
     save the ITS table data into guest RAM, at the location provisioned
-    by the guest in corresponding registers/table entries.
+    by the guest in corresponding registers/table entries. Should userspace
+    require a form of dirty tracking to identify which pages are modified
+    by the saving process, it should use a bitmap even if using another
+    mechanism to track the memory dirtied by the vCPUs.
 
   The layout of the tables in guest memory defines an ABI. The entries
   are laid out in little endian format as described in the last paragraph.
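The new text asks userspace to rely on a dirty bitmap for pages written by KVM_DEV_ARM_ITS_SAVE_TABLES, even when the vCPUs are tracked through another mechanism. Assuming the bitmap follows the KVM_GET_DIRTY_LOG convention (one bit per page of the memslot, bit n held in word n / BITS_PER_LONG), a VMM could walk it as sketched below. The helper and visitor names are hypothetical, and fetching the bitmap from KVM is not shown.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Walk a dirty bitmap covering npages pages that start at base_gfn and call
 * visit() (if non-NULL) for every page whose bit is set.
 * Returns the number of dirty pages found.
 */
static size_t for_each_dirty_gfn(const unsigned long *bitmap, size_t npages,
				 uint64_t base_gfn,
				 void (*visit)(uint64_t gfn, void *ctx), void *ctx)
{
	size_t count = 0;

	for (size_t i = 0; i < npages; i++) {
		unsigned long word = bitmap[i / BITS_PER_LONG];

		if (word & (1UL << (i % BITS_PER_LONG))) {
			if (visit)
				visit(base_gfn + i, ctx);
			count++;
		}
	}
	return count;
}

/* Example visitor: record up to 8 dirty GFNs, e.g. pages to re-send. */
struct gfn_list {
	uint64_t gfns[8];
	size_t n;
};

static void record_gfn(uint64_t gfn, void *ctx)
{
	struct gfn_list *l = ctx;

	if (l->n < 8)
		l->gfns[l->n++] = gfn;
}
```

A migration flow would typically run this after the ITS save, merging the result with whatever the vCPU-side mechanism reported.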

Documentation/virt/kvm/devices/vcpu.rst

Lines changed: 2 additions & 0 deletions
@@ -171,6 +171,8 @@ configured values on other VCPUs. Userspace should configure the interrupt
 numbers on at least one VCPU after creating all VCPUs and before running any
 VCPUs.
 
+.. _kvm_arm_vcpu_pvtime_ctrl:
+
 3. GROUP: KVM_ARM_VCPU_PVTIME_CTRL
 ==================================
 
MAINTAINERS

Lines changed: 10 additions & 0 deletions
@@ -11438,6 +11438,16 @@ F:	arch/x86/kvm/svm/hyperv.*
 F:	arch/x86/kvm/svm/svm_onhyperv.*
 F:	arch/x86/kvm/vmx/evmcs.*
 
+KVM X86 Xen (KVM/Xen)
+M:	David Woodhouse <dwmw2@infradead.org>
+M:	Paul Durrant <paul@xen.org>
+M:	Sean Christopherson <seanjc@google.com>
+M:	Paolo Bonzini <pbonzini@redhat.com>
+L:	kvm@vger.kernel.org
+S:	Supported
+T:	git git://git.kernel.org/pub/scm/virt/kvm/kvm.git
+F:	arch/x86/kvm/xen.*
+
 KERNFS
 M:	Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 M:	Tejun Heo <tj@kernel.org>

arch/arm64/Kconfig

Lines changed: 1 addition & 0 deletions
@@ -1988,6 +1988,7 @@ config ARM64_MTE
 	depends on ARM64_PAN
 	select ARCH_HAS_SUBPAGE_FAULTS
 	select ARCH_USES_HIGH_VMA_FLAGS
+	select ARCH_USES_PG_ARCH_X
 	help
 	  Memory Tagging (part of the ARMv8.5 Extensions) provides
 	  architectural support for run-time, always-on detection of

arch/arm64/include/asm/kvm_arm.h

Lines changed: 6 additions & 2 deletions
@@ -135,7 +135,7 @@
  * 40 bits wide (T0SZ = 24). Systems with a PARange smaller than 40 bits are
  * not known to exist and will break with this configuration.
  *
- * The VTCR_EL2 is configured per VM and is initialised in kvm_arm_setup_stage2().
+ * The VTCR_EL2 is configured per VM and is initialised in kvm_init_stage2_mmu.
  *
  * Note that when using 4K pages, we concatenate two first level page tables
  * together. With 16K pages, we concatenate 16 first level page tables.
@@ -340,9 +340,13 @@
  * We have
  *	PAR	[PA_Shift - 1	: 12] = PA	[PA_Shift - 1	: 12]
  *	HPFAR	[PA_Shift - 9	: 4]  = FIPA	[PA_Shift - 1	: 12]
+ *
+ * Always assume 52 bit PA since at this point, we don't know how many PA bits
+ * the page table has been set up for. This should be safe since unused address
+ * bits in PAR are res0.
  */
 #define PAR_TO_HPFAR(par)		\
-	(((par) & GENMASK_ULL(PHYS_MASK_SHIFT - 1, 12)) >> 8)
+	(((par) & GENMASK_ULL(52 - 1, 12)) >> 8)
 
 #define ECN(x) { ESR_ELx_EC_##x, #x }
 
348352

arch/arm64/include/asm/kvm_asm.h

Lines changed: 5 additions & 2 deletions
@@ -76,6 +76,9 @@ enum __kvm_host_smccc_func {
 	__KVM_HOST_SMCCC_FUNC___vgic_v3_save_aprs,
 	__KVM_HOST_SMCCC_FUNC___vgic_v3_restore_aprs,
 	__KVM_HOST_SMCCC_FUNC___pkvm_vcpu_init_traps,
+	__KVM_HOST_SMCCC_FUNC___pkvm_init_vm,
+	__KVM_HOST_SMCCC_FUNC___pkvm_init_vcpu,
+	__KVM_HOST_SMCCC_FUNC___pkvm_teardown_vm,
 };
 
 #define DECLARE_KVM_VHE_SYM(sym)	extern char sym[]
@@ -106,7 +109,7 @@ enum __kvm_host_smccc_func {
 #define per_cpu_ptr_nvhe_sym(sym, cpu) \
 	({ \
 		unsigned long base, off; \
-		base = kvm_arm_hyp_percpu_base[cpu]; \
+		base = kvm_nvhe_sym(kvm_arm_hyp_percpu_base)[cpu]; \
 		off = (unsigned long)&CHOOSE_NVHE_SYM(sym) - \
 		      (unsigned long)&CHOOSE_NVHE_SYM(__per_cpu_start); \
 		base ? (typeof(CHOOSE_NVHE_SYM(sym))*)(base + off) : NULL; \
@@ -211,7 +214,7 @@ DECLARE_KVM_HYP_SYM(__kvm_hyp_vector);
 #define __kvm_hyp_init		CHOOSE_NVHE_SYM(__kvm_hyp_init)
 #define __kvm_hyp_vector	CHOOSE_HYP_SYM(__kvm_hyp_vector)
 
-extern unsigned long kvm_arm_hyp_percpu_base[NR_CPUS];
+extern unsigned long kvm_nvhe_sym(kvm_arm_hyp_percpu_base)[];
 DECLARE_KVM_NVHE_SYM(__per_cpu_start);
 DECLARE_KVM_NVHE_SYM(__per_cpu_end);
 
217220

arch/arm64/include/asm/kvm_host.h

Lines changed: 74 additions & 2 deletions
@@ -73,6 +73,63 @@ u32 __attribute_const__ kvm_target_cpu(void);
 int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
 void kvm_arm_vcpu_destroy(struct kvm_vcpu *vcpu);
 
+struct kvm_hyp_memcache {
+	phys_addr_t head;
+	unsigned long nr_pages;
+};
+
+static inline void push_hyp_memcache(struct kvm_hyp_memcache *mc,
+				     phys_addr_t *p,
+				     phys_addr_t (*to_pa)(void *virt))
+{
+	*p = mc->head;
+	mc->head = to_pa(p);
+	mc->nr_pages++;
+}
+
+static inline void *pop_hyp_memcache(struct kvm_hyp_memcache *mc,
+				     void *(*to_va)(phys_addr_t phys))
+{
+	phys_addr_t *p = to_va(mc->head);
+
+	if (!mc->nr_pages)
+		return NULL;
+
+	mc->head = *p;
+	mc->nr_pages--;
+
+	return p;
+}
+
+static inline int __topup_hyp_memcache(struct kvm_hyp_memcache *mc,
+				       unsigned long min_pages,
+				       void *(*alloc_fn)(void *arg),
+				       phys_addr_t (*to_pa)(void *virt),
+				       void *arg)
+{
+	while (mc->nr_pages < min_pages) {
+		phys_addr_t *p = alloc_fn(arg);
+
+		if (!p)
+			return -ENOMEM;
+		push_hyp_memcache(mc, p, to_pa);
+	}
+
+	return 0;
+}
+
+static inline void __free_hyp_memcache(struct kvm_hyp_memcache *mc,
+				       void (*free_fn)(void *virt, void *arg),
+				       void *(*to_va)(phys_addr_t phys),
+				       void *arg)
+{
+	while (mc->nr_pages)
+		free_fn(pop_hyp_memcache(mc, to_va), arg);
+}
+
+void free_hyp_memcache(struct kvm_hyp_memcache *mc);
+int topup_hyp_memcache(struct kvm_hyp_memcache *mc, unsigned long min_pages);
+
 struct kvm_vmid {
 	atomic64_t id;
 };
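The kvm_hyp_memcache added above is an intrusive free list: each cached page stores the physical address of the next page in its first word, so only `head` and `nr_pages` live outside the pages themselves, and the to_pa/to_va callbacks let the same helpers work under different address translations. To show the push/pop/top-up/drain flow outside the kernel, the sketch below copies that logic into user space. The identity translation (user addresses doubling as fake physical addresses), the malloc-based page allocator, and all `fake_*` names are assumptions for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t phys_addr_t; /* stand-in for the kernel type */

struct kvm_hyp_memcache {
	phys_addr_t head;
	unsigned long nr_pages;
};

/* Identity "translation": user-space addresses double as fake PAs. */
static phys_addr_t fake_to_pa(void *virt) { return (phys_addr_t)(uintptr_t)virt; }
static void *fake_to_va(phys_addr_t phys) { return (void *)(uintptr_t)phys; }

static void *fake_alloc_page(void *arg) { (void)arg; return malloc(4096); }
static void fake_free_page(void *virt, void *arg) { (void)arg; free(virt); }

/* Link a page in by storing the old head in its first word. */
static void push(struct kvm_hyp_memcache *mc, phys_addr_t *p)
{
	*p = mc->head;
	mc->head = fake_to_pa(p);
	mc->nr_pages++;
}

/* Same order as the header: translate first, then check for empty. */
static void *pop(struct kvm_hyp_memcache *mc)
{
	phys_addr_t *p = fake_to_va(mc->head);

	if (!mc->nr_pages)
		return NULL;

	mc->head = *p;
	mc->nr_pages--;
	return p;
}

/* Allocate pages until at least min_pages are cached. */
static int topup(struct kvm_hyp_memcache *mc, unsigned long min_pages)
{
	while (mc->nr_pages < min_pages) {
		phys_addr_t *p = fake_alloc_page(NULL);

		if (!p)
			return -ENOMEM;
		push(mc, p);
	}
	return 0;
}

/* Release every cached page. */
static void drain(struct kvm_hyp_memcache *mc)
{
	while (mc->nr_pages)
		fake_free_page(pop(mc), NULL);
}
```

Popping before the emptiness check is harmless in both versions because the translation only computes an address; nothing is dereferenced until the count has been checked.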
@@ -115,6 +172,13 @@ struct kvm_smccc_features {
 	unsigned long vendor_hyp_bmap;
 };
 
+typedef unsigned int pkvm_handle_t;
+
+struct kvm_protected_vm {
+	pkvm_handle_t handle;
+	struct kvm_hyp_memcache teardown_mc;
+};
+
 struct kvm_arch {
 	struct kvm_s2_mmu mmu;
 
@@ -163,9 +227,19 @@ struct kvm_arch {
 
 	u8 pfr0_csv2;
 	u8 pfr0_csv3;
+	struct {
+		u8 imp:4;
+		u8 unimp:4;
+	} dfr0_pmuver;
 
 	/* Hypercall features firmware registers' descriptor */
 	struct kvm_smccc_features smccc_feat;
+
+	/*
+	 * For an untrusted host VM, 'pkvm.handle' is used to lookup
+	 * the associated pKVM instance in the hypervisor.
+	 */
+	struct kvm_protected_vm pkvm;
 };
 
 struct kvm_vcpu_fault_info {
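The new comment says that, for an untrusted host VM, `pkvm.handle` is what the hypervisor uses to look up the matching pKVM instance. The header does not show how handles are allocated, so the sketch below simply assumes a fixed-size table in which a handle is the slot index plus a nonzero offset, keeping 0 free to mean "no handle". The table size, the offset, and all `model_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int pkvm_handle_t;

#define MODEL_MAX_VMS     8
#define MODEL_HANDLE_BASE 0x1000u /* hypothetical; keeps 0 invalid */

struct model_hyp_vm {
	int nr_vcpus;
};

static struct model_hyp_vm *vm_table[MODEL_MAX_VMS];

/* Insert a VM into the first free slot; returns its handle or 0 if full. */
static pkvm_handle_t model_insert_vm(struct model_hyp_vm *vm)
{
	for (unsigned int idx = 0; idx < MODEL_MAX_VMS; idx++) {
		if (!vm_table[idx]) {
			vm_table[idx] = vm;
			return idx + MODEL_HANDLE_BASE;
		}
	}
	return 0;
}

/* Map a host-supplied handle back to the VM, rejecting bad values. */
static struct model_hyp_vm *model_get_vm(pkvm_handle_t handle)
{
	unsigned int idx = handle - MODEL_HANDLE_BASE;

	if (handle < MODEL_HANDLE_BASE || idx >= MODEL_MAX_VMS)
		return NULL;
	return vm_table[idx];
}

/* Free the slot behind a valid handle; invalid handles are ignored. */
static void model_remove_vm(pkvm_handle_t handle)
{
	if (model_get_vm(handle))
		vm_table[handle - MODEL_HANDLE_BASE] = NULL;
}
```

Range-checking every handle matters in this design because the value comes from the host, which the comment describes as untrusted.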
@@ -925,8 +999,6 @@ int kvm_set_ipa_limit(void);
 #define __KVM_HAVE_ARCH_VM_ALLOC
 struct kvm *kvm_arch_alloc_vm(void);
 
-int kvm_arm_setup_stage2(struct kvm *kvm, unsigned long type);
-
 static inline bool kvm_vm_is_protected(struct kvm *kvm)
 {
 	return false;

arch/arm64/include/asm/kvm_hyp.h

Lines changed: 3 additions & 0 deletions
@@ -123,4 +123,7 @@ extern u64 kvm_nvhe_sym(id_aa64mmfr0_el1_sys_val);
 extern u64 kvm_nvhe_sym(id_aa64mmfr1_el1_sys_val);
 extern u64 kvm_nvhe_sym(id_aa64mmfr2_el1_sys_val);
 
+extern unsigned long kvm_nvhe_sym(__icache_flags);
+extern unsigned int kvm_nvhe_sym(kvm_arm_vmid_bits);
+
 #endif /* __ARM64_KVM_HYP_H__ */

arch/arm64/include/asm/kvm_mmu.h

Lines changed: 1 addition & 1 deletion
@@ -166,7 +166,7 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
 void free_hyp_pgds(void);
 
 void stage2_unmap_vm(struct kvm *kvm);
-int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu);
+int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu, unsigned long type);
 void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu);
 int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
 			  phys_addr_t pa, unsigned long size, bool writable);
