rebase patches on top of Ubuntu-6.2.0-39.40
(generated with debian/scripts/import-upstream-tag)
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
(cherry picked from commit ddd91a3b05)
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>

parent be7f6da7d4
commit 93ecd382ac

@@ -55,10 +55,10 @@ Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
  2 files changed, 111 insertions(+)
 
 diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
-index fa73bbcb0c8d..4964bb2e931e 100644
+index 5d47f23514d0..f06df077504b 100644
 --- a/Documentation/admin-guide/kernel-parameters.txt
 +++ b/Documentation/admin-guide/kernel-parameters.txt
-@@ -4209,6 +4209,15 @@
+@@ -4210,6 +4210,15 @@
  				Also, it enforces the PCI Local Bus spec
  				rule that those bits should be 0 in system reset
  				events (useful for kexec/kdump cases).
@@ -1,75 +0,0 @@
-From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
-From: Sean Christopherson <seanjc@google.com>
-Date: Wed, 23 Aug 2023 18:01:04 -0700
-Subject: [PATCH] KVM: x86/mmu: Fix an sign-extension bug with mmu_seq that
- hangs vCPUs
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Upstream commit ba6e3fe25543 ("KVM: x86/mmu: Grab mmu_invalidate_seq in
-kvm_faultin_pfn()") unknowingly fixed the bug in v6.3 when refactoring
-how KVM tracks the sequence counter snapshot.
-
-Take the vCPU's mmu_seq snapshot as an "unsigned long" instead of an "int"
-when checking to see if a page fault is stale, as the sequence count is
-stored as an "unsigned long" everywhere else in KVM.  This fixes a bug
-where KVM will effectively hang vCPUs due to always thinking page faults
-are stale, which results in KVM refusing to "fix" faults.
-
-mmu_invalidate_seq (née mmu_notifier_seq) is a sequence counter used when
-KVM is handling page faults to detect if userspace mappings relevant to
-the guest were invalidated between snapshotting the counter and acquiring
-mmu_lock, i.e. to ensure that the userspace mapping KVM is using to
-resolve the page fault is fresh.  If KVM sees that the counter has
-changed, KVM simply resumes the guest without fixing the fault.
-
-What _should_ happen is that the source of the mmu_notifier invalidations
-eventually goes away, mmu_invalidate_seq becomes stable, and KVM can once
-again fix guest page fault(s).
-
-But for a long-lived VM and/or a VM that the host just doesn't particularly
-like, it's possible for a VM to be on the receiving end of 2 billion (with
-a B) mmu_notifier invalidations.  When that happens, bit 31 will be set in
-mmu_invalidate_seq.  This causes the value to be turned into a 32-bit
-negative value when implicitly cast to an "int" by is_page_fault_stale(),
-and then sign-extended into a 64-bit unsigned when the signed "int" is
-implicitly cast back to an "unsigned long" on the call to
-mmu_invalidate_retry_hva().
-
-As a result of the casting and sign-extension, given a sequence counter of
-e.g. 0x8002dc25, mmu_invalidate_retry_hva() ends up doing
-
-	if (0x8002dc25 != 0xffffffff8002dc25)
-
-and signals that the page fault is stale and needs to be retried even
-though the sequence counter is stable, and KVM effectively hangs any vCPU
-that takes a page fault (EPT violation or #NPF when TDP is enabled).
-
-Reported-by: Brian Rak <brak@vultr.com>
-Reported-by: Amaan Cheval <amaan.cheval@gmail.com>
-Reported-by: Eric Wheeler <kvm@lists.ewheeler.net>
-Closes: https://lore.kernel.org/all/f023d927-52aa-7e08-2ee5-59a2fbc65953@gameservers.com
-Fixes: a955cad84cda ("KVM: x86/mmu: Retry page fault if root is invalidated by memslot update")
-Signed-off-by: Sean Christopherson <seanjc@google.com>
-Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-(cherry-picked from commit 82d811ff566594de3676f35808e8a9e19c5c864c in stable v6.1.51)
-Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
----
- arch/x86/kvm/mmu/mmu.c | 3 ++-
- 1 file changed, 2 insertions(+), 1 deletion(-)
-
-diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
-index 3220c1285984..c42ba5cde7a4 100644
---- a/arch/x86/kvm/mmu/mmu.c
-+++ b/arch/x86/kvm/mmu/mmu.c
-@@ -4261,7 +4261,8 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
-  * root was invalidated by a memslot update or a relevant mmu_notifier fired.
-  */
- static bool is_page_fault_stale(struct kvm_vcpu *vcpu,
---				struct kvm_page_fault *fault, int mmu_seq)
-+				struct kvm_page_fault *fault,
-+				unsigned long mmu_seq)
- {
- 	struct kvm_mmu_page *sp = to_shadow_page(vcpu->arch.mmu->root.hpa);
- 
@@ -45,10 +45,10 @@ index ebbf80d8b8bd..a79b10e57757 100644
  #define MSR_F16H_L2I_PERF_CTL		0xc0010230
  #define MSR_F16H_L2I_PERF_CTR		0xc0010231
 diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
-index 6daf6a8fa0c7..044e3869620c 100644
+index a608a2b78073..154e9c0c16bd 100644
 --- a/arch/x86/kernel/cpu/amd.c
 +++ b/arch/x86/kernel/cpu/amd.c
-@@ -79,6 +79,10 @@ static const int amd_div0[] =
+@@ -80,6 +80,10 @@ static const int amd_div0[] =
  	AMD_LEGACY_ERRATUM(AMD_MODEL_RANGE(0x17, 0x00, 0x0, 0x2f, 0xf),
  			   AMD_MODEL_RANGE(0x17, 0x50, 0x0, 0x5f, 0xf));
  
@@ -59,7 +59,7 @@ index 6daf6a8fa0c7..044e3869620c 100644
  static bool cpu_has_amd_erratum(struct cpuinfo_x86 *cpu, const int *erratum)
  {
  	int osvw_id = *erratum++;
-@@ -1124,6 +1128,10 @@ static void init_amd(struct cpuinfo_x86 *c)
+@@ -1125,6 +1129,10 @@ static void init_amd(struct cpuinfo_x86 *c)
  		pr_notice_once("AMD Zen1 DIV0 bug detected. Disable SMT for full protection.\n");
  		setup_force_cpu_bug(X86_BUG_DIV0);
  	}
@@ -23,10 +23,10 @@ Signed-off-by: Stefan Sterz <s.sterz@proxmox.com>
  1 file changed, 1 insertion(+)
 
 diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
-index fb9cde86930d..db8028864094 100644
+index cf31babfbbb9..99a7e93b2edf 100644
 --- a/arch/x86/kvm/svm/svm.c
 +++ b/arch/x86/kvm/svm/svm.c
-@@ -4921,6 +4921,7 @@ static __init void svm_set_cpu_caps(void)
+@@ -4920,6 +4920,7 @@ static __init void svm_set_cpu_caps(void)
  	if (nested) {
  		kvm_cpu_cap_set(X86_FEATURE_SVM);
  		kvm_cpu_cap_set(X86_FEATURE_VMCBCLEAN);
@@ -48,7 +48,7 @@ index b475d9a582b8..e829fa4c6788 100644
  
  static inline void fpstate_set_confidential(struct fpu_guest *gfpu)
 diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
-index caf33486dc5e..cddd5018e6a4 100644
+index a083f9ac9e4f..1d190761d00f 100644
 --- a/arch/x86/kernel/fpu/core.c
 +++ b/arch/x86/kernel/fpu/core.c
  @@ -369,14 +369,15 @@ int fpu_swap_kvm_fpstate(struct fpu_guest *guest_fpu, bool enter_guest)
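The sign-extension described in the dropped KVM patch (mmu_seq narrowed to "int" and widened back to "unsigned long") can be reproduced in plain userspace C. This is a minimal sketch, not KVM code: retry_needed(), widen_through_int(), stale_with_int() and stale_with_ulong() are hypothetical stand-ins for the comparison in mmu_invalidate_retry_hva() and for the old and fixed is_page_fault_stale() prototypes. It assumes an LP64 target such as x86-64 (32-bit int, 64-bit unsigned long) and a compiler such as GCC or Clang, which define the out-of-range unsigned-to-int conversion as wrapping modulo 2^32 (the C standard leaves it implementation-defined).

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: LP64 target (e.g. x86-64): 32-bit int, 64-bit unsigned long. */
static_assert(sizeof(int) == 4 && sizeof(unsigned long) == 8,
	      "sketch assumes an LP64 target such as x86-64");

/* Hypothetical stand-in for the check in mmu_invalidate_retry_hva():
 * the fault must be retried if the counter changed since the snapshot. */
static bool retry_needed(unsigned long current_seq, unsigned long snapshot)
{
	return current_seq != snapshot;
}

/* Round-trips a snapshot through int, as the old prototype did. The
 * narrowing is implementation-defined; GCC and Clang wrap modulo 2^32,
 * so bit 31 becomes the sign bit and widening back sign-extends it. */
static unsigned long widen_through_int(unsigned long seq)
{
	int narrowed = (int)seq;
	return (unsigned long)narrowed;
}

/* Old prototype (hypothetical): the snapshot arrives as int and is
 * widened again when passed on to the comparison. */
static bool stale_with_int(unsigned long current_seq, int mmu_seq)
{
	return retry_needed(current_seq, (unsigned long)mmu_seq);
}

/* Fixed prototype (hypothetical): the snapshot stays unsigned long. */
static bool stale_with_ulong(unsigned long current_seq, unsigned long mmu_seq)
{
	return retry_needed(current_seq, mmu_seq);
}
```

With a stable counter of 0x8002dc25, widen_through_int() yields 0xffffffff8002dc25, so stale_with_int(0x8002dc25UL, 0x8002dc25UL) reports a stale fault while stale_with_ulong() with the same arguments does not. For counters below 2^31 both variants agree, which is why the hang only appears after roughly 2 billion invalidations.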