68be554e71
Use the current ZFS 2.2.4 staging tree [0] with commit deb7a8423 ("Fix
corruption caused by mmap flushing problems") on top. Additionally,
include an open, but ack'd, pull request [1] that avoids a potential
general protection fault due to touching a vbio after it was handed off
to the kernel.

[0]: https://github.com/openzfs/zfs/commits/zfs-2.2.4-staging/
[1]: https://github.com/openzfs/zfs/pull/16049

Both should mostly touch the module code.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
97 lines
4.1 KiB
Diff
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Rob Norris <rob.norris@klarasystems.com>
Date: Thu, 14 Mar 2024 10:57:30 +1100
Subject: [PATCH] abd_iter_page: don't use compound heads on Linux <4.5

Before 4.5 (specifically, torvalds/linux@ddc58f2), head and tail pages
in a compound page were refcounted separately. This means that using the
head page without taking a reference to it could see it cleaned up later
before we're finished with it. Specifically, bio_add_page() would take a
reference, and drop its reference after the bio completion callback
returns.

If the zio is executed immediately from the completion callback, this is
usually ok, as any data is referenced through the tail page referenced
by the ABD, and so becomes "live" that way. If there's a delay in zio
execution (high load, error injection), then the head page can be freed,
along with any dirty flags or other indicators that the underlying
memory is used. Later, when the zio completes and that memory is
accessed, it's either unmapped and an unhandled fault takes down the
entire system, or it is mapped and we end up messing around in someone
else's memory. Both of these are very bad.

The solution on these older kernels is to take a reference to the head
page when we use it, and release it when we're done. There's not really
a sensible way under our current structure to do this; the "best" would
be to keep a list of head page references in the ABD, and release them
when the ABD is freed.

Since this additional overhead is totally unnecessary on 4.5+, where
head and tail pages share refcounts, I've opted to simply not use the
compound head in ABD page iteration on older kernels. This is
theoretically less efficient (though cleaning up head page references
would add overhead), but it's safe, and we still get the other benefits
of not mapping pages before adding them to a bio and not mis-splitting
pages.
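
To illustrate what skipping the compound head changes, here is a minimal
userspace sketch of the head/tail offset arithmetic. The sketch_* names,
the mock page struct and the 4096-byte page size are assumptions made for
illustration only; this is not the ZFS or kernel code.

```c
#include <assert.h>
#include <stddef.h>

#define	SKETCH_PAGESIZE	4096UL	/* assumed page size for this sketch */

/* Hypothetical stand-in for the kernel's struct page. */
struct sketch_page {
	size_t head_idx;	/* array index of this page's compound head */
};

/* Position reported by the iterator: a page plus a byte offset into it. */
struct sketch_pos {
	const struct sketch_page *page;
	size_t doff;
};

/*
 * If use_head is set and pages[idx] is a tail page, report the position
 * relative to the head page instead, adding the distance between the two
 * pages to the offset. Otherwise report the page as-is.
 */
static struct sketch_pos
sketch_iter_page(const struct sketch_page *pages, size_t idx, size_t doff,
    int use_head)
{
	const struct sketch_page *page = &pages[idx];

	/* A head page always precedes its tail pages in the array. */
	assert(page->head_idx <= idx);

	if (use_head && page->head_idx != idx) {
		const struct sketch_page *head = &pages[page->head_idx];
		doff += (size_t)(page - head) * SKETCH_PAGESIZE;
		page = head;
	}
	return ((struct sketch_pos){ page, doff });
}

/* Byte position within the whole page array that a position refers to. */
static size_t
sketch_linear(const struct sketch_page *pages, struct sketch_pos pos)
{
	return ((size_t)(pos.page - pages) * SKETCH_PAGESIZE + pos.doff);
}
```

With use_head set, offset 100 in the third page of a four-page compound
is reported as the head page at offset 2 * 4096 + 100; without it, the
tail page is reported unchanged. Both positions name the same byte; only
the page handed to the caller differs.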

There doesn't appear to be an obvious symbol name or config option we
can match on to discover this behaviour in configure (and the mm/page
APIs have changed a lot since then anyway), so I've gone with a simple
version check.
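
A minimal sketch of how such a version gate compares, assuming a local
macro modelled on the kernel-style encoding (major, minor and sublevel
packed into one integer). The SKETCH_* and sketch_* names are
hypothetical; kernel code takes KERNEL_VERSION and LINUX_VERSION_CODE
from <linux/version.h> and tests them in the preprocessor instead.

```c
#include <assert.h>

/*
 * Hypothetical local version encoding: major, minor and sublevel packed
 * into one integer, with the sublevel clamped to 255.
 */
#define	SKETCH_KERNEL_VERSION(a, b, c) \
	(((a) << 16) + ((b) << 8) + ((c) > 255 ? 255 : (c)))

/* Clamping keeps a large sublevel from spilling into the next minor. */
static_assert(
    SKETCH_KERNEL_VERSION(4, 4, 302) < SKETCH_KERNEL_VERSION(4, 5, 0),
    "4.4.302 must still sort before 4.5.0");

/* Returns 1 if this version code is 4.5.0 or later, 0 otherwise. */
static int
sketch_use_compound_head(unsigned int version_code)
{
	return (version_code >= (unsigned int)SKETCH_KERNEL_VERSION(4, 5, 0));
}
```

A runtime function is used here only so the comparison can be exercised
in userspace; the patch itself makes the decision at compile time.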

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #15533
Closes #15588
(cherry picked from commit c6be6ce1755a3d9a3cbe70256cd8958ef83d8542)
---
 module/os/linux/zfs/abd_os.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/module/os/linux/zfs/abd_os.c b/module/os/linux/zfs/abd_os.c
index 3fe01c0b7..d3255dcbc 100644
--- a/module/os/linux/zfs/abd_os.c
+++ b/module/os/linux/zfs/abd_os.c
@@ -62,6 +62,7 @@
 #include <linux/kmap_compat.h>
 #include <linux/mm_compat.h>
 #include <linux/scatterlist.h>
+#include <linux/version.h>
 #endif
 
 #ifdef _KERNEL
@@ -1061,6 +1062,7 @@ abd_iter_page(struct abd_iter *aiter)
 	}
 	ASSERT(page);
 
+#if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 5, 0)
 	if (PageTail(page)) {
 		/*
 		 * This page is part of a "compound page", which is a group of
@@ -1082,11 +1084,23 @@ abd_iter_page(struct abd_iter *aiter)
 		 * To do this, we need to adjust the offset to be counted from
 		 * the head page. struct page for compound pages are stored
 		 * contiguously, so we can just adjust by a simple offset.
+		 *
+		 * Before kernel 4.5, compound page heads were refcounted
+		 * separately, such that moving back to the head page would
+		 * require us to take a reference to it and release it once
+		 * we're completely finished with it. In practice, that means
+		 * when our caller is done with the ABD, which we have no
+		 * insight into from here. Rather than contort this API to
+		 * track head page references on such ancient kernels, we just
+		 * compile this block out and use the tail pages directly. This
+		 * is slightly less efficient, but makes everything far
+		 * simpler.
 		 */
 		struct page *head = compound_head(page);
 		doff += ((page - head) * PAGESIZE);
 		page = head;
 	}
+#endif
 
 	/* final page and position within it */
 	aiter->iter_page = page;