Skip to content

Commit

Permalink
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Browse files Browse the repository at this point in the history
Pull kvm updates from Paolo Bonzini:
 "ARM:

   - Generalized infrastructure for 'writable' ID registers, effectively
     allowing userspace to opt-out of certain vCPU features for its
     guest

   - Optimization for vSGI injection, opportunistically compressing
     MPIDR to vCPU mapping into a table

   - Improvements to KVM's PMU emulation, allowing userspace to select
     the number of PMCs available to a VM

   - Guest support for memory operation instructions (FEAT_MOPS)

   - Cleanups to handling feature flags in KVM_ARM_VCPU_INIT, squashing
     bugs and getting rid of useless code

   - Changes to the way the SMCCC filter is constructed, avoiding wasted
     memory allocations when not in use

   - Load the stage-2 MMU context at vcpu_load() for VHE systems,
     reducing the overhead of errata mitigations

   - Miscellaneous kernel and selftest fixes

  LoongArch:

   - New architecture for kvm.

     The hardware uses the same model as x86, s390 and RISC-V, where
     guest/host mode is orthogonal to supervisor/user mode. The
     virtualization extensions are very similar to MIPS, therefore the
     code also has some similarities but it's been cleaned up to avoid
     some of the historical bogosities that are found in arch/mips. The
     kernel emulates MMU, timer and CSR accesses, while interrupt
     controllers are only emulated in userspace, at least for now.

  RISC-V:

   - Support for the Smstateen and Zicond extensions

   - Support for virtualizing senvcfg

   - Support for virtualized SBI debug console (DBCN)

  S390:

   - Nested page table management can be monitored through tracepoints
     and statistics

  x86:

   - Fix incorrect handling of VMX posted interrupt descriptor in
     KVM_SET_LAPIC, which could result in a dropped timer IRQ

   - Avoid WARN on systems with Intel IPI virtualization

   - Add CONFIG_KVM_MAX_NR_VCPUS, to allow supporting up to 4096 vCPUs
     without forcing more common use cases to eat the extra memory
     overhead.

   - Add virtualization support for AMD SRSO mitigation (IBPB_BRTYPE and
     SBPB, aka Selective Branch Predictor Barrier).

   - Fix a bug where restoring a vCPU snapshot that was taken within 1
     second of creating the original vCPU would cause KVM to try to
     synchronize the vCPU's TSC and thus clobber the correct TSC being
     set by userspace.

   - Compute guest wall clock using a single TSC read to avoid
     generating an inaccurate time, e.g. if the vCPU is preempted
     between multiple TSC reads.

   - "Virtualize" HWCR.TscFreqSel to make Linux guests happy, which
     complain about a "Firmware Bug" if the bit isn't set for select
     F/M/S combos. Likewise "virtualize" (ignore) MSR_AMD64_TW_CFG to
     appease Windows Server 2022.

   - Don't apply side effects to Hyper-V's synthetic timer on writes
     from userspace to fix an issue where the auto-enable behavior can
     trigger spurious interrupts, i.e. do auto-enabling only for guest
     writes.

   - Remove an unnecessary kick of all vCPUs when synchronizing the
     dirty log without PML enabled.

   - Advertise "support" for non-serializing FS/GS base MSR writes as
     appropriate.

   - Harden the fast page fault path to guard against encountering an
     invalid root when walking SPTEs.

   - Omit "struct kvm_vcpu_xen" entirely when CONFIG_KVM_XEN=n.

   - Use the fast path directly from the timer callback when delivering
     Xen timer events, instead of waiting for the next iteration of the
     run loop. This was not done so far because previously proposed code
     had races, but now care is taken to stop the hrtimer at critical
     points such as restarting the timer or saving the timer information
     for userspace.

   - Follow the lead of upstream Xen and ignore the VCPU_SSHOTTMR_future
     flag.

   - Optimize injection of PMU interrupts that are simultaneous with
     NMIs.

   - Usual handful of fixes for typos and other warts.

  x86 - MTRR/PAT fixes and optimizations:

   - Clean up code that deals with honoring guest MTRRs when the VM has
     non-coherent DMA and host MTRRs are ignored, i.e. EPT is enabled.

   - Zap EPT entries when non-coherent DMA assignment stops/start to
     prevent using stale entries with the wrong memtype.

   - Don't ignore guest PAT for CR0.CD=1 && KVM_X86_QUIRK_CD_NW_CLEARED=y

     This was done as a workaround for virtual machine BIOSes that did
     not bother to clear CR0.CD (because ancient KVM/QEMU did not bother
     to set it, in turn), and there's zero reason to extend the quirk to
     also ignore guest PAT.

  x86 - SEV fixes:

   - Report KVM_EXIT_SHUTDOWN instead of EINVAL if KVM intercepts
     SHUTDOWN while running an SEV-ES guest.

   - Clean up the recognition of emulation failures on SEV guests, when
     KVM would like to "skip" the instruction but it had already been
     partially emulated. This makes it possible to drop a hack that
     second guessed the (insufficient) information provided by the
     emulator, and just do the right thing.

  Documentation:

   - Various updates and fixes, mostly for x86

   - MTRR and PAT fixes and optimizations"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (164 commits)
  KVM: selftests: Avoid using forced target for generating arm64 headers
  tools headers arm64: Fix references to top srcdir in Makefile
  KVM: arm64: Add tracepoint for MMIO accesses where ISV==0
  KVM: arm64: selftest: Perform ISB before reading PAR_EL1
  KVM: arm64: selftest: Add the missing .guest_prepare()
  KVM: arm64: Always invalidate TLB for stage-2 permission faults
  KVM: x86: Service NMI requests after PMI requests in VM-Enter path
  KVM: arm64: Handle AArch32 SPSR_{irq,abt,und,fiq} as RAZ/WI
  KVM: arm64: Do not let a L1 hypervisor access the *32_EL2 sysregs
  KVM: arm64: Refine _EL2 system register list that require trap reinjection
  arm64: Add missing _EL2 encodings
  arm64: Add missing _EL12 encodings
  KVM: selftests: aarch64: vPMU test for validating user accesses
  KVM: selftests: aarch64: vPMU register test for unimplemented counters
  KVM: selftests: aarch64: vPMU register test for implemented counters
  KVM: selftests: aarch64: Introduce vpmu_counter_access test
  tools: Import arm_pmuv3.h
  KVM: arm64: PMU: Allow userspace to limit PMCR_EL0.N for the guest
  KVM: arm64: Sanitize PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR} before first run
  KVM: arm64: Add {get,set}_user for PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR}
  ...
  • Loading branch information
torvalds committed Nov 3, 2023
2 parents 5be9911 + 45b890f commit 6803bd7
Show file tree
Hide file tree
Showing 127 changed files with 8,885 additions and 1,503 deletions.
12 changes: 12 additions & 0 deletions Documentation/devicetree/bindings/riscv/extensions.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -128,6 +128,12 @@ properties:
changes to interrupts as frozen at commit ccbddab ("Merge pull
request #42 from riscv/jhauser-2023-RC4") of riscv-aia.
- const: smstateen
description: |
The standard Smstateen extension for controlling access to CSRs
added by other RISC-V extensions in H/S/VS/U/VU modes and as
ratified at commit a28bfae (Ratified (#7)) of riscv-state-enable.
- const: ssaia
description: |
The standard Ssaia supervisor-level extension for the advanced
Expand Down Expand Up @@ -212,6 +218,12 @@ properties:
ratified in the 20191213 version of the unprivileged ISA
specification.

- const: zicond
description:
The standard Zicond extension for conditional arithmetic and
conditional-select/move operations as ratified in commit 95cf1f9
("Add changes requested by Ved during signoff") of riscv-zicond.

- const: zicsr
description: |
The standard Zicsr extension for control and status register
Expand Down
158 changes: 140 additions & 18 deletions Documentation/virt/kvm/api.rst
Original file line number Diff line number Diff line change
Expand Up @@ -416,6 +416,13 @@ Reads the general purpose registers from the vcpu.
__u64 pc;
};

/* LoongArch */
struct kvm_regs {
/* out (KVM_GET_REGS) / in (KVM_SET_REGS) */
unsigned long gpr[32];
unsigned long pc;
};


4.12 KVM_SET_REGS
-----------------
Expand Down Expand Up @@ -506,7 +513,7 @@ translation mode.
------------------

:Capability: basic
:Architectures: x86, ppc, mips, riscv
:Architectures: x86, ppc, mips, riscv, loongarch
:Type: vcpu ioctl
:Parameters: struct kvm_interrupt (in)
:Returns: 0 on success, negative on failure.
Expand Down Expand Up @@ -540,7 +547,7 @@ ioctl is useful if the in-kernel PIC is not used.
PPC:
^^^^

Queues an external interrupt to be injected. This ioctl is overleaded
Queues an external interrupt to be injected. This ioctl is overloaded
with 3 different irq values:

a) KVM_INTERRUPT_SET
Expand Down Expand Up @@ -592,6 +599,14 @@ b) KVM_INTERRUPT_UNSET

This is an asynchronous vcpu ioctl and can be invoked from any thread.

LOONGARCH:
^^^^^^^^^^

Queues an external interrupt to be injected into the virtual CPU. A negative
interrupt number dequeues the interrupt.

This is an asynchronous vcpu ioctl and can be invoked from any thread.


4.17 KVM_DEBUG_GUEST
--------------------
Expand Down Expand Up @@ -737,7 +752,7 @@ signal mask.
----------------

:Capability: basic
:Architectures: x86
:Architectures: x86, loongarch
:Type: vcpu ioctl
:Parameters: struct kvm_fpu (out)
:Returns: 0 on success, -1 on error
Expand All @@ -746,7 +761,7 @@ Reads the floating point state from the vcpu.

::

/* for KVM_GET_FPU and KVM_SET_FPU */
/* x86: for KVM_GET_FPU and KVM_SET_FPU */
struct kvm_fpu {
__u8 fpr[8][16];
__u16 fcw;
Expand All @@ -761,12 +776,21 @@ Reads the floating point state from the vcpu.
__u32 pad2;
};

/* LoongArch: for KVM_GET_FPU and KVM_SET_FPU */
struct kvm_fpu {
__u32 fcsr;
__u64 fcc;
struct kvm_fpureg {
__u64 val64[4];
}fpr[32];
};


4.23 KVM_SET_FPU
----------------

:Capability: basic
:Architectures: x86
:Architectures: x86, loongarch
:Type: vcpu ioctl
:Parameters: struct kvm_fpu (in)
:Returns: 0 on success, -1 on error
Expand All @@ -775,7 +799,7 @@ Writes the floating point state to the vcpu.

::

/* for KVM_GET_FPU and KVM_SET_FPU */
/* x86: for KVM_GET_FPU and KVM_SET_FPU */
struct kvm_fpu {
__u8 fpr[8][16];
__u16 fcw;
Expand All @@ -790,6 +814,15 @@ Writes the floating point state to the vcpu.
__u32 pad2;
};

/* LoongArch: for KVM_GET_FPU and KVM_SET_FPU */
struct kvm_fpu {
__u32 fcsr;
__u64 fcc;
struct kvm_fpureg {
__u64 val64[4];
}fpr[32];
};


4.24 KVM_CREATE_IRQCHIP
-----------------------
Expand Down Expand Up @@ -965,7 +998,7 @@ be set in the flags field of this ioctl:
The KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL flag requests KVM to generate
the contents of the hypercall page automatically; hypercalls will be
intercepted and passed to userspace through KVM_EXIT_XEN. In this
ase, all of the blob size and address fields must be zero.
case, all of the blob size and address fields must be zero.

The KVM_XEN_HVM_CONFIG_EVTCHN_SEND flag indicates to KVM that userspace
will always use the KVM_XEN_HVM_EVTCHN_SEND ioctl to deliver event
Expand Down Expand Up @@ -1070,7 +1103,7 @@ Other flags returned by ``KVM_GET_CLOCK`` are accepted but ignored.
:Extended by: KVM_CAP_INTR_SHADOW
:Architectures: x86, arm64
:Type: vcpu ioctl
:Parameters: struct kvm_vcpu_event (out)
:Parameters: struct kvm_vcpu_events (out)
:Returns: 0 on success, -1 on error

X86:
Expand Down Expand Up @@ -1193,7 +1226,7 @@ directly to the virtual CPU).
:Extended by: KVM_CAP_INTR_SHADOW
:Architectures: x86, arm64
:Type: vcpu ioctl
:Parameters: struct kvm_vcpu_event (in)
:Parameters: struct kvm_vcpu_events (in)
:Returns: 0 on success, -1 on error

X86:
Expand Down Expand Up @@ -1387,7 +1420,7 @@ documentation when it pops into existence).
-------------------

:Capability: KVM_CAP_ENABLE_CAP
:Architectures: mips, ppc, s390, x86
:Architectures: mips, ppc, s390, x86, loongarch
:Type: vcpu ioctl
:Parameters: struct kvm_enable_cap (in)
:Returns: 0 on success; -1 on error
Expand Down Expand Up @@ -1442,7 +1475,7 @@ for vm-wide capabilities.
---------------------

:Capability: KVM_CAP_MP_STATE
:Architectures: x86, s390, arm64, riscv
:Architectures: x86, s390, arm64, riscv, loongarch
:Type: vcpu ioctl
:Parameters: struct kvm_mp_state (out)
:Returns: 0 on success; -1 on error
Expand All @@ -1460,7 +1493,7 @@ Possible values are:

========================== ===============================================
KVM_MP_STATE_RUNNABLE the vcpu is currently running
[x86,arm64,riscv]
[x86,arm64,riscv,loongarch]
KVM_MP_STATE_UNINITIALIZED the vcpu is an application processor (AP)
which has not yet received an INIT signal [x86]
KVM_MP_STATE_INIT_RECEIVED the vcpu has received an INIT signal, and is
Expand Down Expand Up @@ -1516,11 +1549,14 @@ For riscv:
The only states that are valid are KVM_MP_STATE_STOPPED and
KVM_MP_STATE_RUNNABLE which reflect if the vcpu is paused or not.

On LoongArch, only the KVM_MP_STATE_RUNNABLE state is used to reflect
whether the vcpu is runnable.

4.39 KVM_SET_MP_STATE
---------------------

:Capability: KVM_CAP_MP_STATE
:Architectures: x86, s390, arm64, riscv
:Architectures: x86, s390, arm64, riscv, loongarch
:Type: vcpu ioctl
:Parameters: struct kvm_mp_state (in)
:Returns: 0 on success; -1 on error
Expand All @@ -1538,6 +1574,9 @@ For arm64/riscv:
The only states that are valid are KVM_MP_STATE_STOPPED and
KVM_MP_STATE_RUNNABLE which reflect if the vcpu should be paused or not.

On LoongArch, only the KVM_MP_STATE_RUNNABLE state is used to reflect
whether the vcpu is runnable.

4.40 KVM_SET_IDENTITY_MAP_ADDR
------------------------------

Expand Down Expand Up @@ -2841,6 +2880,19 @@ Following are the RISC-V D-extension registers:
0x8020 0000 0600 0020 fcsr Floating point control and status register
======================= ========= =============================================

LoongArch registers are mapped using the lower 32 bits. The upper 16 bits of
that is the register group type.

LoongArch csr registers are used to control guest cpu or get status of guest
cpu, and they have the following id bit patterns::

0x9030 0000 0001 00 <reg:5> <sel:3> (64-bit)

LoongArch KVM control registers are used to implement some new defined functions
such as set vcpu counter or reset vcpu, and they have the following id bit patterns::

0x9030 0000 0002 <reg:16>


4.69 KVM_GET_ONE_REG
--------------------
Expand Down Expand Up @@ -3063,7 +3115,7 @@ as follow::
};

An entry with a "page_shift" of 0 is unused. Because the array is
organized in increasing order, a lookup can stop when encoutering
organized in increasing order, a lookup can stop when encountering
such an entry.

The "slb_enc" field provides the encoding to use in the SLB for the
Expand Down Expand Up @@ -3370,6 +3422,8 @@ return indicates the attribute is implemented. It does not necessarily
indicate that the attribute can be read or written in the device's
current state. "addr" is ignored.

.. _KVM_ARM_VCPU_INIT:

4.82 KVM_ARM_VCPU_INIT
----------------------

Expand Down Expand Up @@ -3455,7 +3509,7 @@ Possible features:
- KVM_RUN and KVM_GET_REG_LIST are not available;

- KVM_GET_ONE_REG and KVM_SET_ONE_REG cannot be used to access
the scalable archietctural SVE registers
the scalable architectural SVE registers
KVM_REG_ARM64_SVE_ZREG(), KVM_REG_ARM64_SVE_PREG() or
KVM_REG_ARM64_SVE_FFR;

Expand Down Expand Up @@ -4401,7 +4455,7 @@ This will have undefined effects on the guest if it has not already
placed itself in a quiescent state where no vcpu will make MMU enabled
memory accesses.

On succsful completion, the pending HPT will become the guest's active
On successful completion, the pending HPT will become the guest's active
HPT and the previous HPT will be discarded.

On failure, the guest will still be operating on its previous HPT.
Expand Down Expand Up @@ -5016,7 +5070,7 @@ before the vcpu is fully usable.

Between KVM_ARM_VCPU_INIT and KVM_ARM_VCPU_FINALIZE, the feature may be
configured by use of ioctls such as KVM_SET_ONE_REG. The exact configuration
that should be performaned and how to do it are feature-dependent.
that should be performed and how to do it are feature-dependent.

Other calls that depend on a particular feature being finalized, such as
KVM_RUN, KVM_GET_REG_LIST, KVM_GET_ONE_REG and KVM_SET_ONE_REG, will fail with
Expand Down Expand Up @@ -5124,6 +5178,24 @@ Valid values for 'action'::
#define KVM_PMU_EVENT_ALLOW 0
#define KVM_PMU_EVENT_DENY 1

Via this API, KVM userspace can also control the behavior of the VM's fixed
counters (if any) by configuring the "action" and "fixed_counter_bitmap" fields.

Specifically, KVM follows the following pseudo-code when determining whether to
allow the guest FixCtr[i] to count its pre-defined fixed event::

FixCtr[i]_is_allowed = (action == ALLOW) && (bitmap & BIT(i)) ||
(action == DENY) && !(bitmap & BIT(i));
FixCtr[i]_is_denied = !FixCtr[i]_is_allowed;

KVM always consumes fixed_counter_bitmap, it's userspace's responsibility to
ensure fixed_counter_bitmap is set correctly, e.g. if userspace wants to define
a filter that only affects general purpose counters.

Note, the "events" field also applies to fixed counters' hardcoded event_select
and unit_mask values. "fixed_counter_bitmap" has higher priority than "events"
if there is a contradiction between the two.

4.121 KVM_PPC_SVM_OFF
---------------------

Expand Down Expand Up @@ -5475,7 +5547,7 @@ KVM_XEN_ATTR_TYPE_EVTCHN
from the guest. A given sending port number may be directed back to
a specified vCPU (by APIC ID) / port / priority on the guest, or to
trigger events on an eventfd. The vCPU and priority can be changed
by setting KVM_XEN_EVTCHN_UPDATE in a subsequent call, but but other
by setting KVM_XEN_EVTCHN_UPDATE in a subsequent call, but other
fields cannot change for a given sending port. A port mapping is
removed by using KVM_XEN_EVTCHN_DEASSIGN in the flags field. Passing
KVM_XEN_EVTCHN_RESET in the flags field removes all interception of
Expand Down Expand Up @@ -6070,6 +6142,56 @@ writes to the CNTVCT_EL0 and CNTPCT_EL0 registers using the SET_ONE_REG
interface. No error will be returned, but the resulting offset will not be
applied.

.. _KVM_ARM_GET_REG_WRITABLE_MASKS:

4.139 KVM_ARM_GET_REG_WRITABLE_MASKS
-------------------------------------------

:Capability: KVM_CAP_ARM_SUPPORTED_REG_MASK_RANGES
:Architectures: arm64
:Type: vm ioctl
:Parameters: struct reg_mask_range (in/out)
:Returns: 0 on success, < 0 on error


::

#define KVM_ARM_FEATURE_ID_RANGE 0
#define KVM_ARM_FEATURE_ID_RANGE_SIZE (3 * 8 * 8)

struct reg_mask_range {
__u64 addr; /* Pointer to mask array */
__u32 range; /* Requested range */
__u32 reserved[13];
};

This ioctl copies the writable masks for a selected range of registers to
userspace.

The ``addr`` field is a pointer to the destination array where KVM copies
the writable masks.

The ``range`` field indicates the requested range of registers.
``KVM_CHECK_EXTENSION`` for the ``KVM_CAP_ARM_SUPPORTED_REG_MASK_RANGES``
capability returns the supported ranges, expressed as a set of flags. Each
flag's bit index represents a possible value for the ``range`` field.
All other values are reserved for future use and KVM may return an error.

The ``reserved[13]`` array is reserved for future use and should be 0, or
KVM may return an error.

KVM_ARM_FEATURE_ID_RANGE (0)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The Feature ID range is defined as the AArch64 System register space with
op0==3, op1=={0, 1, 3}, CRn==0, CRm=={0-7}, op2=={0-7}.

The mask returned array pointed to by ``addr`` is indexed by the macro
``ARM64_FEATURE_ID_RANGE_IDX(op0, op1, crn, crm, op2)``, allowing userspace
to know what fields can be changed for the system register described by
``op0, op1, crn, crm, op2``. KVM rejects ID register values that describe a
superset of the features supported by the system.

5. The kvm_run structure
========================

Expand Down
1 change: 1 addition & 0 deletions Documentation/virt/kvm/arm/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -11,3 +11,4 @@ ARM
hypercalls
pvtime
ptp_kvm
vcpu-features
Loading

0 comments on commit 6803bd7

Please sign in to comment.