From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Andy Lutomirski <luto@kernel.org>
Date: Sat, 4 Nov 2017 04:16:12 -0700
Subject: [PATCH] Revert "x86/mm: Stop calling leave_mm() in idle code"
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

CVE-2017-5754

This reverts commit 43858b4f25cf0adc5c2ca9cf5ce5fdf2532941e5.

I removed the leave_mm() calls in question because the heuristic
wasn't needed after that patch. With the original version of my PCID
series, we never flushed a "lazy cpu" (i.e. a CPU running a kernel
thread) due to a flush on the loaded mm.

Unfortunately, that caused architectural issues, so now I've
reinstated these flushes on non-PCID systems in:

    commit b956575bed91 ("x86/mm: Flush more aggressively in lazy TLB mode").

That, in turn, gives us a power management and occasionally
performance regression as compared to old kernels: a process that
goes into a deep idle state on a given CPU and gets its mm flushed
due to activity on a different CPU will wake the idle CPU.

Reinstate the old ugly heuristic: a CPU that goes into ACPI C3 or an
intel_idle state that is likely to cause a TLB flush gets its mm
switched to init_mm before going idle.

FWIW, this heuristic is lousy. Whether the idle state flushes the TLB
isn't a good hint for whether we should change CR3 before idle, except
insofar as the performance hit is a bit lower if the TLB is getting
flushed by the idle code anyway. What we really want to know is
whether we anticipate being idle long enough that the mm is likely to
be flushed before we wake up. This is more a matter of the expected
latency than the idle state that gets chosen. This heuristic also
completely fails on systems that don't know whether the TLB will be
flushed (e.g. AMD systems?). OTOH it may be a bit obsolete anyway:
PCID systems don't presently benefit from this heuristic at all.

We also shouldn't do this callback from the innermost bit of the idle
code due to the RCU nastiness it causes. All the information needed
is available before rcu_idle_enter() needs to happen.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 43858b4f25cf "x86/mm: Stop calling leave_mm() in idle code"
Link: http://lkml.kernel.org/r/c513bbd4e653747213e05bc7062de000bf0202a5.1509793738.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 675357362aeba19688440eb1aaa7991067f73b12)
Signed-off-by: Andy Whitcroft <apw@canonical.com>
Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
(cherry picked from commit b607843145fd0593fcd87e2596d1dc5a1d5f79a5)
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
---
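Note (below the "---", so not part of the commit message): the
heuristic described in the changelog can be sketched as a small
userspace model. This is an illustration only, not the kernel code.
Every toy_* name and the TOY_IDLE_FLAG_TLB_FLUSHED flag are
hypothetical stand-ins; only leave_mm() and init_mm come from the
changelog, and their real implementations are not reproduced here.

```c
#include <stdbool.h>

/* Hypothetical flag: this idle state is expected to flush the TLB. */
#define TOY_IDLE_FLAG_TLB_FLUSHED 0x1u

/* Toy stand-ins for an mm, an idle state and per-CPU TLB state. */
struct toy_mm {
	const char *name;
};

struct toy_idle_state {
	const char *name;
	unsigned int flags;
};

struct toy_cpu {
	struct toy_mm *loaded_mm;
};

/* Stand-in for init_mm, the mm a CPU holds when it has no user mm. */
static struct toy_mm toy_init_mm = { "init_mm" };

/*
 * Model of leave_mm(): switch the CPU to init_mm so that flushes of
 * the previously loaded mm no longer need to target this CPU.
 */
static void toy_leave_mm(struct toy_cpu *cpu)
{
	if (cpu->loaded_mm != &toy_init_mm)
		cpu->loaded_mm = &toy_init_mm;
}

/*
 * The heuristic: if the chosen idle state is flagged as likely to
 * flush the TLB, drop the mm before idling. Returns true when the
 * state triggered the toy_leave_mm() call.
 */
static bool toy_enter_idle(struct toy_cpu *cpu,
			   const struct toy_idle_state *state)
{
	if (state->flags & TOY_IDLE_FLAG_TLB_FLUSHED) {
		toy_leave_mm(cpu);
		return true;
	}
	return false;
}
```

In this model a shallow state without the flag leaves loaded_mm
untouched, while a flagged deep state ends with loaded_mm pointing at
toy_init_mm. As the changelog says, the flag is only a rough proxy for
how long the CPU is expected to stay idle.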
 arch/x86/mm/tlb.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index b27aceaf7ed1..ed06f1593390 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -194,12 +194,22 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 		this_cpu_write(cpu_tlbstate.ctxs[new_asid].ctx_id, next->context.ctx_id);
 		this_cpu_write(cpu_tlbstate.ctxs[new_asid].tlb_gen, next_tlb_gen);
 		write_cr3(build_cr3(next, new_asid));
-		trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH,
-				TLB_FLUSH_ALL);
+
+		/*
+		 * NB: This gets called via leave_mm() in the idle path
+		 * where RCU functions differently.  Tracing normally
+		 * uses RCU, so we need to use the _rcuidle variant.
+		 *
+		 * (There is no good reason for this.  The idle code should
+		 *  be rearranged to call this before rcu_idle_enter().)
+		 */
+		trace_tlb_flush_rcuidle(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
 	} else {
 		/* The new ASID is already up to date. */
 		write_cr3(build_cr3_noflush(next, new_asid));
-		trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, 0);
+
+		/* See above wrt _rcuidle. */
+		trace_tlb_flush_rcuidle(TLB_FLUSH_ON_TASK_SWITCH, 0);
 	}
 
 	this_cpu_write(cpu_tlbstate.loaded_mm, next);
-- 
2.14.2