cherry-pick fix for KVM vCPU page fault loop
The mailing list thread [0] (found by Friedrich, many thanks!) leading
up to this patch sounds very familiar to issues users reported in the
community forum [1] and enterprise support channel, where a VM would
be stuck for no discernable reason with all vCPU threads spinning.
[0]: https://lore.kernel.org/all/f023d927-52aa-7e08-2ee5-59a2fbc65953@gameservers.com/T/#u
[1]: https://forum.proxmox.com/threads/127459/
Suggested-by: Friedrich Weber <f.weber@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
(cherry picked from commit 6810c247a1)
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
			
			
This commit is contained in:
		
							parent
							
								
									e55e2c5f6b
								
							
						
					
					
						commit
						44f3d669a5
					
				@ -0,0 +1,75 @@
 | 
				
			|||||||
 | 
					From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
 | 
				
			||||||
 | 
					From: Sean Christopherson <seanjc@google.com>
 | 
				
			||||||
 | 
					Date: Wed, 23 Aug 2023 18:01:04 -0700
 | 
				
			||||||
 | 
					Subject: [PATCH] KVM: x86/mmu: Fix an sign-extension bug with mmu_seq that
 | 
				
			||||||
 | 
					 hangs vCPUs
 | 
				
			||||||
 | 
					MIME-Version: 1.0
 | 
				
			||||||
 | 
					Content-Type: text/plain; charset=UTF-8
 | 
				
			||||||
 | 
					Content-Transfer-Encoding: 8bit
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					Upstream commit ba6e3fe25543 ("KVM: x86/mmu: Grab mmu_invalidate_seq in
 | 
				
			||||||
 | 
					kvm_faultin_pfn()") unknowingly fixed the bug in v6.3 when refactoring
 | 
				
			||||||
 | 
					how KVM tracks the sequence counter snapshot.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					Take the vCPU's mmu_seq snapshot as an "unsigned long" instead of an "int"
 | 
				
			||||||
 | 
					when checking to see if a page fault is stale, as the sequence count is
 | 
				
			||||||
 | 
					stored as an "unsigned long" everywhere else in KVM.  This fixes a bug
 | 
				
			||||||
 | 
					where KVM will effectively hang vCPUs due to always thinking page faults
 | 
				
			||||||
 | 
					are stale, which results in KVM refusing to "fix" faults.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					mmu_invalidate_seq (née mmu_notifier_seq) is a sequence counter used when
 | 
				
			||||||
 | 
					KVM is handling page faults to detect if userspace mappings relevant to
 | 
				
			||||||
 | 
					the guest were invalidated between snapshotting the counter and acquiring
 | 
				
			||||||
 | 
					mmu_lock, i.e. to ensure that the userspace mapping KVM is using to
 | 
				
			||||||
 | 
					resolve the page fault is fresh.  If KVM sees that the counter has
 | 
				
			||||||
 | 
					changed, KVM simply resumes the guest without fixing the fault.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					What _should_ happen is that the source of the mmu_notifier invalidations
 | 
				
			||||||
 | 
					eventually goes away, mmu_invalidate_seq becomes stable, and KVM can once
 | 
				
			||||||
 | 
					again fix guest page fault(s).
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					But for a long-lived VM and/or a VM that the host just doesn't particularly
 | 
				
			||||||
 | 
					like, it's possible for a VM to be on the receiving end of 2 billion (with
 | 
				
			||||||
 | 
					a B) mmu_notifier invalidations.  When that happens, bit 31 will be set in
 | 
				
			||||||
 | 
					mmu_invalidate_seq.  This causes the value to be turned into a 32-bit
 | 
				
			||||||
 | 
					negative value when implicitly cast to an "int" by is_page_fault_stale(),
 | 
				
			||||||
 | 
					and then sign-extended into a 64-bit unsigned when the signed "int" is
 | 
				
			||||||
 | 
					implicitly cast back to an "unsigned long" on the call to
 | 
				
			||||||
 | 
					mmu_invalidate_retry_hva().
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					As a result of the casting and sign-extension, given a sequence counter of
 | 
				
			||||||
 | 
					e.g. 0x8002dc25, mmu_invalidate_retry_hva() ends up doing
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
						if (0x8002dc25 != 0xffffffff8002dc25)
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					and signals that the page fault is stale and needs to be retried even
 | 
				
			||||||
 | 
					though the sequence counter is stable, and KVM effectively hangs any vCPU
 | 
				
			||||||
 | 
					that takes a page fault (EPT violation or #NPF when TDP is enabled).
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					Reported-by: Brian Rak <brak@vultr.com>
 | 
				
			||||||
 | 
					Reported-by: Amaan Cheval <amaan.cheval@gmail.com>
 | 
				
			||||||
 | 
					Reported-by: Eric Wheeler <kvm@lists.ewheeler.net>
 | 
				
			||||||
 | 
					Closes: https://lore.kernel.org/all/f023d927-52aa-7e08-2ee5-59a2fbc65953@gameservers.com
 | 
				
			||||||
 | 
					Fixes: a955cad84cda ("KVM: x86/mmu: Retry page fault if root is invalidated by memslot update")
 | 
				
			||||||
 | 
					Signed-off-by: Sean Christopherson <seanjc@google.com>
 | 
				
			||||||
 | 
					Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 | 
				
			||||||
 | 
					(cherry-picked from commit 82d811ff566594de3676f35808e8a9e19c5c864c in stable v6.1.51)
 | 
				
			||||||
 | 
					Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
 | 
				
			||||||
 | 
					---
 | 
				
			||||||
 | 
					 arch/x86/kvm/mmu/mmu.c | 3 ++-
 | 
				
			||||||
 | 
					 1 file changed, 2 insertions(+), 1 deletion(-)
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
 | 
				
			||||||
 | 
					index 3220c1285984..c42ba5cde7a4 100644
 | 
				
			||||||
 | 
					--- a/arch/x86/kvm/mmu/mmu.c
 | 
				
			||||||
 | 
					+++ b/arch/x86/kvm/mmu/mmu.c
 | 
				
			||||||
 | 
					@@ -4261,7 +4261,8 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 | 
				
			||||||
 | 
					  * root was invalidated by a memslot update or a relevant mmu_notifier fired.
 | 
				
			||||||
 | 
					  */
 | 
				
			||||||
 | 
					 static bool is_page_fault_stale(struct kvm_vcpu *vcpu,
 | 
				
			||||||
 | 
					-				struct kvm_page_fault *fault, int mmu_seq)
 | 
				
			||||||
 | 
					+				struct kvm_page_fault *fault,
 | 
				
			||||||
 | 
					+				unsigned long mmu_seq)
 | 
				
			||||||
 | 
					 {
 | 
				
			||||||
 | 
					 	struct kvm_mmu_page *sp = to_shadow_page(vcpu->arch.mmu->root.hpa);
 | 
				
			||||||
 | 
					 
 | 
				
			||||||
		Loading…
	
		Reference in New Issue
	
	Block a user