Compare commits

...

28 Commits

Author SHA1 Message Date
Tony Hutter baa5031456 Tag zfs-2.2.6
META file and changelog updated.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
2024-08-27 14:53:03 -07:00
shodanshok cd42e992b5 Enable L2 cache of all (MRU+MFU) metadata but MFU data only
`l2arc_mfuonly` was added to avoid wasting L2 ARC on read-once MRU
data and metadata. However it can be useful to cache as much
metadata as possible while, at the same time, restricting data
cache to MFU buffers only.

This patch allow for such behavior by setting `l2arc_mfuonly` to 2
(or higher). The list of possible values is the following:
0: cache both MRU and MFU for both data and metadata;
1: cache only MFU for both data and metadata;
2: cache both MRU and MFU for metadata, but only MFU for data.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gionatan Danti <g.danti@assyoma.it>
Closes #16343 
Closes #16402
2024-08-27 14:53:03 -07:00
Ameer Hamza c60df6a801 linux/zvol_os: fix zvol queue limits initialization
zvol queue limits initialization depends on `zv_volblocksize`, but it is
initialized later, leading to several limits being initialized with
incorrect values, including `max_discard_*` limits. This also causes
`blkdiscard` command to consistently fail, as `blk_ioctl_discard` reads
`bdev_max_discard_sectors()` limits as 0, leading to failure. The fix is
straightforward: initialize `zv->zv_volblocksize` early, before setting
the queue limits. This PR should fix `zvol/zvol_misc/zvol_misc_trim`
failure on recent PRs, as the test case issues `blkdiscard` for a zvol.
Additionally, `zvol_misc_trim` was recently enabled in `6c7d41a`,
which is why the issue wasn't identified earlier.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #16454
2024-08-26 15:10:16 -07:00
Rob Norris d8fa32a79d linux/zvol_os: tidy and document queue limit/config setup
It gets hairier again in Linux 6.11, so I want some actual theory of
operation laid out for next time.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16400
2024-08-26 15:10:16 -07:00
Tino Reichardt 88a5ee0706 ZTS: fix zfs_copies_006_pos test on Ubuntu 20.04 (#16409)
This test was failing before:
- FAIL cli_root/zfs_copies/zfs_copies_006_pos (expected PASS)

Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
2024-08-26 15:10:16 -07:00
Tino Reichardt 0465fbecd7 ZTS: fix history_007_pos test on Ubuntu 24.04 (#16410)
The timezone "US/Mountain" isn't supported on newer linux versions.
Using the correct timezone "America/Denver" like it's done in FreeBSD
will fix this. Older Linux distros should behave also okay with this.

Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
2024-08-26 15:10:16 -07:00
Shengqi Chen a99a37991e contrib: link zpool to zfs in bash-completion (#16376)
Currently user won't have completion of `zpool` command until they
trigger completion of `zfs` first. This patch adds a link to `zfs`,
thus user can use both to initialize the completion.

Fixes: #16320

Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
2024-08-26 15:10:16 -07:00
Shengqi Chen 2ca1515374 module/icp/asm-arm/sha2: enable non-SIMD asm kernels on armv5/6
My merged pull request #15557 fixes compilation of sha2 kernels on arm
v5/6. However, the compiler guards only allows sha256/512_armv7_impl to
be used when __ARM_ARCH > 6. This patch enables these ASM kernels on all
arm architectures. Some compiler guards are adjusted accordingly to
avoid the unnecessary compilation of SIMD (e.g., neon, armv8ce) kernels
on old architectures.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #15623
2024-08-26 15:10:16 -07:00
Shengqi Chen bce36e21ca module/icp/asm-arm/sha2: auto detect __ARM_ARCH
This patch uses __ARM_ARCH set by compiler (both
GCC and Clang have this) whenever possible instead
of hardcoding it to 7. This change allows code to
compile on earlier ARM architectures such as armv5te.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #15557
2024-08-26 15:10:16 -07:00
Tony Hutter 86492e3c96 Linux 6.10 compat: META
Update the META file to reflect compatibility with the 6.10 kernel.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #16466
2024-08-22 15:43:46 -07:00
Ameer Hamza 07f0465742 linux/zvol_os.c: cleanup limits for non-blk mq case
Rob Noris suggested that we could clean up redundant limits for the case
of non-blk mq scenario.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #16462
2024-08-22 15:43:34 -07:00
Ameer Hamza 0f9457d1dd linux/zvol_os.c: Fix max_discard_sectors limit for 6.8+ kernel
In kernels 6.8 and later, the zvol block device is allocated with
qlimits passed during initialization. However, the zvol driver does not
set `max_hw_discard_sectors`, which is necessary to properly
initialize `max_discard_sectors`. This causes the `zvol_misc_trim` test
to fail on 6.8+ kernels when invoking the `blkdiscard` command. Setting
`max_hw_discard_sectors` in the `HAVE_BLK_ALLOC_DISK_2ARG` case resolve
the issue.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #16462
2024-08-22 15:43:34 -07:00
Justin Gottula 859f906a4b Fix null ptr deref when renaming a zvol with snaps and snapdev=visible (#16316)
If a zvol is renamed, and it has one or more snapshots, and
snapdev=visible is true for the zvol, then the rename causes a kernel
null pointer dereference error. This has the effect (on Linux, anyway)
of killing the z_zvol taskq kthread, with locks still held; which in
turn causes a variety of zvol-related operations afterward to hang
indefinitely (such as udev workers, among other things).

The problem occurs because of an oversight in #15486
(e36ff84c33). As documented in
dataset_kstats_create, some datasets may not actually have kstats
allocated for them; and at least at the present time, this is true for
snapshots. In practical terms, this means that for snapshots,
dk->dk_kstats will be NULL. The dataset_kstats_rename function
introduced in the patch above does not first check whether dk->dk_kstats
is NULL before proceeding, unlike e.g. the nearby
dataset_kstats_update_* functions.

In the very particular circumstance in which a zvol is renamed, AND that
zvol has one or more snapshots, AND that zvol also has snapdev=visible,
zvol_rename_minors_impl will loop over not just the zvol dataset itself,
but each of the zvol's snapshots as well, so that their device nodes
will be renamed as well. This results in dataset_kstats_create being
called for snapshots, where, as we've established, dk->dk_kstats is
NULL.

Fix this by simply adding a NULL check before doing anything in
dataset_kstats_rename.

This still allows the dataset_name kstat value for the zvol to be
updated (as was the intent of the original patch), and merely blocks
attempts by the code to act upon the zvol's non-kstat-having snapshots.
If at some future time, kstats are added for snapshots, then things
should work as intended in that case as well.

Signed-off-by: Justin Gottula <justin@jgottula.com>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Alan Somers <asomers@gmail.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-08-22 15:42:49 -07:00
Tony Hutter 84a9861536 Linux 6.10 compat: Fix zvol NULL pointer deference
zvol_alloc_non_blk_mq()->blk_queue_set_write_cache() needs the disk
queue setup to prevent a NULL pointer deference.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #16453
2024-08-22 15:42:49 -07:00
Tony Hutter 9d64d1bfad Linux 6.10 compat: fix rpm-kmod and builtin
The 6.10 kernel broke our rpm-kmod builds.  The 6.10 kernel really
wants the source files in the same directory as the object files.
This workaround makes rpm-kmod work again.  It also updates
the builtin kernel codepath to work correctly with 6.10.

See kernel commits:

b1992c3772e6 kbuild: use $(src) instead of $(srctree)/$(src) for source
                     directory
9a0ebe5011f4 kbuild: use $(obj)/ instead of $(src)/ for common pattern
                     rules

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #16439
Closes #16450
2024-08-22 15:42:49 -07:00
Tony Hutter ce22dc2589 ZTS: Use /dev/urandom instead of /dev/random
Use /dev/urandom so we never have to wait on entropy.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #16442
2024-08-22 15:42:14 -07:00
Rob Norris 8479a45abe Linux 6.11: avoid passing "end" sentinel to register_sysctl()
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16400
2024-08-22 15:42:14 -07:00
Rob Norris 8156099cf2 Linux 6.11: add compat macro for page_mapping()
Since the change to folios it has just been a wrapper anyway. Linux has
removed their wrapper, so we add one.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16400
2024-08-22 15:42:14 -07:00
Rob Norris 11ad6124c3 Linux 6.11: add more queue_limit fields with removed setters
These fields are very old, so no detection necessary; we just move them
into the limit setup functions.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16400
2024-08-22 15:42:14 -07:00
Rob Norris 11de432c8b Linux 6.11: IO stats is now a queue feature flag
Apply them with with the rest of the settings.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16400
2024-08-22 15:42:14 -07:00
Rob Norris 464747ffd3 Linux 6.11: first arg to proc_handler is now const
Detect it, and use a macro to make sure we always match the prototype.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16400
2024-08-22 15:42:14 -07:00
Rob Norris 92a8af0f8b Linux 6.11: get backing_dev_info through queue gendisk
It's no longer available directly on the request queue, but its easy to
get from the attached disk.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16400
2024-08-22 15:42:14 -07:00
Rob Norris 4fa84563b8 Linux 6.11: enable queue flush through queue limits
In 6.11 struct queue_limits gains a 'features' field, where, among other
things, flush and write-cache are enabled. Detect it and use it.

Along the way, the blk_queue_set_write_cache() compat wrapper gets a
little cleanup. Since both flags are alway set together, its now a
single bool. Also the very very ancient version that sets q->flush_flags
directly couldn't actually turn it off, so I've fixed that. Not that we
use it, but still.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16400
2024-08-22 15:42:14 -07:00
Mark Johnston 6961d4fb57 ZTS: Add a test to verify that copy_file_range obeys RLIMIT_FSIZE
Signed-off-by: Mark Johnston <markj@FreeBSD.org>

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-08-22 15:31:56 -07:00
Mark Johnston 3a36797ad6 FreeBSD: Fix RLIMIT_FSIZE handling for block cloning
ZFS implements copy_file_range(2) using block cloning when possible.
This implementation must respect the RLIMIT_FSIZE limit.

zfs_clone_range() already checks the limit, so it is safe to remove this
check in zfs_freebsd_copy_file_range().  Moreover, the removed check
produces false positives: the length passed to copy_file_range(2) may be
larger than the input file size; as the man page notes, "for best
performance, call copy_file_range() with the largest len value
possible."  In particular, some existing code passes SSIZE_MAX there.

The check in zfs_clone_range() clamps the length to the input file's
size before checking, but the removed check uses the caller supplied
length, so something like

$ echo a > /tmp/foo
$ limits -f 1024 cat /tmp/foo > /tmp/bar

fails because FreeBSD's cat(1) uses copy_file_range(2) in the manner
described above.

Reported-by: Philip Paeps <philip@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-08-22 15:17:21 -07:00
c1ick ac6500389b zfs: add bounds checking to zil_parse (#16308)
Make sure log record don't stray beyond valid memory region.

There is a lack of verification of the space occupied by fixed members
of lr_t in the zil_parse.

We can create a crafted image to trigger an out of bounds read by
following these steps:
    1) Do some file operations and reboot to simulate abnormal exit
       without umount
    2) zil_chain.zc_nused: 0x1000
    3) First lr_t
       lr_t.lrc_txtype: 0x0
       lr_t.lrc_reclen: 0x1000-0xb8-0x1
       lr_t.lrc_txg: 0x0
       lr_t.lrc_seq: 0x1
    4) Update checksum in zil_chain.zc_eck

Fix:
Add some checks to make sure the remaining bytes are large enough to
hold an log record.

Signed-off-by: XDTG <click1799@163.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2024-08-22 15:12:54 -07:00
Rob Norris 1f055436f3 linux/zvol_os: fix SET_ERROR with negative return codes
SET_ERROR is our facility for tracking errors internally. The negation
is to match the what the kernel expects from us. Thus, the negation
should happen outside of the SET_ERROR.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16364
2024-08-22 15:11:44 -07:00
Tino Reichardt 0172ee525b ZTS: fix io_uring test on RHEL 9 variants (#16411)
Simplify the test, by using the variable "$PLATFORM_ID" in favor
of "$REDHAT_SUPPORT_PRODUCT_VERSION".

Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
2024-08-22 15:06:40 -07:00
35 changed files with 484 additions and 125 deletions
+2 -2
View File
@@ -1,10 +1,10 @@
Meta: 1
Name: zfs
Branch: 1.0
Version: 2.2.5
Version: 2.2.6
Release: 1
Release-Tags: relext
License: CDDL
Author: OpenZFS
Linux-Maximum: 6.9
Linux-Maximum: 6.10
Linux-Minimum: 3.10
+28
View File
@@ -25,6 +25,8 @@ AC_DEFUN([ZFS_AC_KERNEL_BLK_QUEUE_PLUG], [
dnl #
dnl # 2.6.32 - 4.11: statically allocated bdi in request_queue
dnl # 4.12: dynamically allocated bdi in request_queue
dnl # 6.11: bdi no longer available through request_queue, so get it from
dnl # the gendisk attached to the queue
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_BLK_QUEUE_BDI], [
ZFS_LINUX_TEST_SRC([blk_queue_bdi], [
@@ -47,6 +49,30 @@ AC_DEFUN([ZFS_AC_KERNEL_BLK_QUEUE_BDI], [
])
])
AC_DEFUN([ZFS_AC_KERNEL_SRC_BLK_QUEUE_DISK_BDI], [
ZFS_LINUX_TEST_SRC([blk_queue_disk_bdi], [
#include <linux/blkdev.h>
#include <linux/backing-dev.h>
], [
struct request_queue q;
struct gendisk disk;
struct backing_dev_info bdi __attribute__ ((unused));
q.disk = &disk;
q.disk->bdi = &bdi;
])
])
AC_DEFUN([ZFS_AC_KERNEL_BLK_QUEUE_DISK_BDI], [
AC_MSG_CHECKING([whether backing_dev_info is available through queue gendisk])
ZFS_LINUX_TEST_RESULT([blk_queue_disk_bdi], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_BLK_QUEUE_DISK_BDI, 1,
[backing_dev_info is available through queue gendisk])
],[
AC_MSG_RESULT(no)
])
])
dnl #
dnl # 5.9: added blk_queue_update_readahead(),
dnl # 5.15: renamed to disk_update_readahead()
@@ -407,6 +433,7 @@ AC_DEFUN([ZFS_AC_KERNEL_BLK_MQ], [
AC_DEFUN([ZFS_AC_KERNEL_SRC_BLK_QUEUE], [
ZFS_AC_KERNEL_SRC_BLK_QUEUE_PLUG
ZFS_AC_KERNEL_SRC_BLK_QUEUE_BDI
ZFS_AC_KERNEL_SRC_BLK_QUEUE_DISK_BDI
ZFS_AC_KERNEL_SRC_BLK_QUEUE_UPDATE_READAHEAD
ZFS_AC_KERNEL_SRC_BLK_QUEUE_DISCARD
ZFS_AC_KERNEL_SRC_BLK_QUEUE_SECURE_ERASE
@@ -421,6 +448,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_BLK_QUEUE], [
AC_DEFUN([ZFS_AC_KERNEL_BLK_QUEUE], [
ZFS_AC_KERNEL_BLK_QUEUE_PLUG
ZFS_AC_KERNEL_BLK_QUEUE_BDI
ZFS_AC_KERNEL_BLK_QUEUE_DISK_BDI
ZFS_AC_KERNEL_BLK_QUEUE_UPDATE_READAHEAD
ZFS_AC_KERNEL_BLK_QUEUE_DISCARD
ZFS_AC_KERNEL_BLK_QUEUE_SECURE_ERASE
+21
View File
@@ -58,6 +58,13 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_MAKE_REQUEST_FN], [
disk = blk_alloc_disk(lim, NUMA_NO_NODE);
])
ZFS_LINUX_TEST_SRC([blkdev_queue_limits_features], [
#include <linux/blkdev.h>
],[
struct queue_limits *lim = NULL;
lim->features = 0;
])
ZFS_LINUX_TEST_SRC([blk_cleanup_disk], [
#include <linux/blkdev.h>
],[
@@ -114,6 +121,20 @@ AC_DEFUN([ZFS_AC_KERNEL_MAKE_REQUEST_FN], [
AC_MSG_RESULT(yes)
AC_DEFINE([HAVE_BLK_ALLOC_DISK_2ARG], 1, [blk_alloc_disk() exists and takes 2 args])
dnl #
dnl # Linux 6.11 API change:
dnl # struct queue_limits gains a 'features' field,
dnl # used to set flushing options
dnl #
AC_MSG_CHECKING([whether struct queue_limits has a features field])
ZFS_LINUX_TEST_RESULT([blkdev_queue_limits_features], [
AC_MSG_RESULT(yes)
AC_DEFINE([HAVE_BLKDEV_QUEUE_LIMITS_FEATURES], 1,
[struct queue_limits has a features field])
], [
AC_MSG_RESULT(no)
])
dnl #
dnl # 5.20 API change,
dnl # Removed blk_cleanup_disk(), put_disk() should be used.
-17
View File
@@ -1,17 +0,0 @@
AC_DEFUN([ZFS_AC_KERNEL_SRC_MM_PAGE_SIZE], [
ZFS_LINUX_TEST_SRC([page_size], [
#include <linux/mm.h>
],[
unsigned long s;
s = page_size(NULL);
])
])
AC_DEFUN([ZFS_AC_KERNEL_MM_PAGE_SIZE], [
AC_MSG_CHECKING([whether page_size() is available])
ZFS_LINUX_TEST_RESULT([page_size], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_MM_PAGE_SIZE, 1, [page_size() is available])
],[
AC_MSG_RESULT(no)
])
])
+36
View File
@@ -0,0 +1,36 @@
AC_DEFUN([ZFS_AC_KERNEL_SRC_MM_PAGE_SIZE], [
ZFS_LINUX_TEST_SRC([page_size], [
#include <linux/mm.h>
],[
unsigned long s;
s = page_size(NULL);
])
])
AC_DEFUN([ZFS_AC_KERNEL_MM_PAGE_SIZE], [
AC_MSG_CHECKING([whether page_size() is available])
ZFS_LINUX_TEST_RESULT([page_size], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_MM_PAGE_SIZE, 1, [page_size() is available])
],[
AC_MSG_RESULT(no)
])
])
AC_DEFUN([ZFS_AC_KERNEL_SRC_MM_PAGE_MAPPING], [
ZFS_LINUX_TEST_SRC([page_mapping], [
#include <linux/pagemap.h>
],[
struct page *p = NULL;
struct address_space *m = page_mapping(NULL);
])
])
AC_DEFUN([ZFS_AC_KERNEL_MM_PAGE_MAPPING], [
AC_MSG_CHECKING([whether page_mapping() is available])
ZFS_LINUX_TEST_RESULT([page_mapping], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_MM_PAGE_MAPPING, 1, [page_mapping() is available])
],[
AC_MSG_RESULT(no)
])
])
+59
View File
@@ -25,3 +25,62 @@ AC_DEFUN([ZFS_AC_KERNEL_REGISTER_SYSCTL_TABLE], [
AC_MSG_RESULT([no])
])
])
dnl #
dnl # Linux 6.11 register_sysctl() enforces that sysctl tables no longer
dnl # supply a sentinel end-of-table element. 6.6 introduces
dnl # register_sysctl_sz() to enable callers to choose, so we use it if
dnl # available for backward compatibility.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_REGISTER_SYSCTL_SZ], [
ZFS_LINUX_TEST_SRC([has_register_sysctl_sz], [
#include <linux/sysctl.h>
],[
struct ctl_table test_table[] __attribute__((unused)) = {0};
register_sysctl_sz("", test_table, 0);
])
])
AC_DEFUN([ZFS_AC_KERNEL_REGISTER_SYSCTL_SZ], [
AC_MSG_CHECKING([whether register_sysctl_sz exists])
ZFS_LINUX_TEST_RESULT([has_register_sysctl_sz], [
AC_MSG_RESULT([yes])
AC_DEFINE(HAVE_REGISTER_SYSCTL_SZ, 1,
[register_sysctl_sz exists])
],[
AC_MSG_RESULT([no])
])
])
dnl #
dnl # Linux 6.11 makes const the ctl_table arg of proc_handler
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_PROC_HANDLER_CTL_TABLE_CONST], [
ZFS_LINUX_TEST_SRC([has_proc_handler_ctl_table_const], [
#include <linux/sysctl.h>
static int test_handler(
const struct ctl_table *ctl __attribute((unused)),
int write __attribute((unused)),
void *buffer __attribute((unused)),
size_t *lenp __attribute((unused)),
loff_t *ppos __attribute((unused)))
{
return (0);
}
], [
proc_handler *ph __attribute((unused)) =
&test_handler;
])
])
AC_DEFUN([ZFS_AC_KERNEL_PROC_HANDLER_CTL_TABLE_CONST], [
AC_MSG_CHECKING([whether proc_handler ctl_table arg is const])
ZFS_LINUX_TEST_RESULT([has_proc_handler_ctl_table_const], [
AC_MSG_RESULT([yes])
AC_DEFINE(HAVE_PROC_HANDLER_CTL_TABLE_CONST, 1,
[proc_handler ctl_table arg is const])
], [
AC_MSG_RESULT([no])
])
])
+6
View File
@@ -166,9 +166,12 @@ AC_DEFUN([ZFS_AC_KERNEL_TEST_SRC], [
ZFS_AC_KERNEL_SRC_WRITEPAGE_T
ZFS_AC_KERNEL_SRC_RECLAIMED
ZFS_AC_KERNEL_SRC_REGISTER_SYSCTL_TABLE
ZFS_AC_KERNEL_SRC_REGISTER_SYSCTL_SZ
ZFS_AC_KERNEL_SRC_PROC_HANDLER_CTL_TABLE_CONST
ZFS_AC_KERNEL_SRC_COPY_SPLICE_READ
ZFS_AC_KERNEL_SRC_SYNC_BDEV
ZFS_AC_KERNEL_SRC_MM_PAGE_SIZE
ZFS_AC_KERNEL_SRC_MM_PAGE_MAPPING
case "$host_cpu" in
powerpc*)
ZFS_AC_KERNEL_SRC_CPU_HAS_FEATURE
@@ -317,9 +320,12 @@ AC_DEFUN([ZFS_AC_KERNEL_TEST_RESULT], [
ZFS_AC_KERNEL_WRITEPAGE_T
ZFS_AC_KERNEL_RECLAIMED
ZFS_AC_KERNEL_REGISTER_SYSCTL_TABLE
ZFS_AC_KERNEL_REGISTER_SYSCTL_SZ
ZFS_AC_KERNEL_PROC_HANDLER_CTL_TABLE_CONST
ZFS_AC_KERNEL_COPY_SPLICE_READ
ZFS_AC_KERNEL_SYNC_BDEV
ZFS_AC_KERNEL_MM_PAGE_SIZE
ZFS_AC_KERNEL_MM_PAGE_MAPPING
case "$host_cpu" in
powerpc*)
ZFS_AC_KERNEL_CPU_HAS_FEATURE
+1
View File
@@ -1 +1,2 @@
/zfs
/zpool
+8 -4
View File
@@ -1,5 +1,9 @@
nodist_bashcompletion_DATA = %D%/zfs
SUBSTFILES += $(nodist_bashcompletion_DATA)
nodist_bashcompletion_DATA = %D%/zfs %D%/zpool
COMPLETION_FILES = %D%/zfs
SUBSTFILES += $(COMPLETION_FILES)
SHELLCHECKSCRIPTS += $(nodist_bashcompletion_DATA)
$(call SHELLCHECK_OPTS,$(nodist_bashcompletion_DATA)): SHELLCHECK_SHELL = bash
SHELLCHECKSCRIPTS += $(COMPLETION_FILES)
$(call SHELLCHECK_OPTS,$(COMPLETION_FILES)): SHELLCHECK_SHELL = bash
%D%/zpool: %D%/zfs
$(LN_S) zfs $@
+22 -13
View File
@@ -57,6 +57,11 @@ blk_queue_flag_clear(unsigned int flag, struct request_queue *q)
#endif
/*
* 6.11 API
* Setting the flush flags directly is no longer possible; flush flags are set
* on the queue_limits structure and passed to blk_disk_alloc(). In this case
* we remove this function entirely.
*
* 4.7 API,
* The blk_queue_write_cache() interface has replaced blk_queue_flush()
* interface. However, the new interface is GPL-only thus we implement
@@ -68,39 +73,43 @@ blk_queue_flag_clear(unsigned int flag, struct request_queue *q)
* new one is GPL-only. Thus if the GPL-only version is detected we
* implement our own trivial helper.
*/
#if !defined(HAVE_BLK_ALLOC_DISK_2ARG) || \
!defined(HAVE_BLKDEV_QUEUE_LIMITS_FEATURES)
static inline void
blk_queue_set_write_cache(struct request_queue *q, bool wc, bool fua)
blk_queue_set_write_cache(struct request_queue *q, bool on)
{
#if defined(HAVE_BLK_QUEUE_WRITE_CACHE_GPL_ONLY)
if (wc)
if (on) {
blk_queue_flag_set(QUEUE_FLAG_WC, q);
else
blk_queue_flag_clear(QUEUE_FLAG_WC, q);
if (fua)
blk_queue_flag_set(QUEUE_FLAG_FUA, q);
else
} else {
blk_queue_flag_clear(QUEUE_FLAG_WC, q);
blk_queue_flag_clear(QUEUE_FLAG_FUA, q);
}
#elif defined(HAVE_BLK_QUEUE_WRITE_CACHE)
blk_queue_write_cache(q, wc, fua);
blk_queue_write_cache(q, on, on);
#elif defined(HAVE_BLK_QUEUE_FLUSH_GPL_ONLY)
if (wc)
q->flush_flags |= REQ_FLUSH;
if (fua)
q->flush_flags |= REQ_FUA;
if (on)
q->flush_flags |= REQ_FLUSH | REQ_FUA;
else
q->flush_flags &= ~(REQ_FLUSH | REQ_FUA);
#elif defined(HAVE_BLK_QUEUE_FLUSH)
blk_queue_flush(q, (wc ? REQ_FLUSH : 0) | (fua ? REQ_FUA : 0));
blk_queue_flush(q, on ? (REQ_FLUSH | REQ_FUA) : 0);
#else
#error "Unsupported kernel"
#endif
}
#endif /* !HAVE_BLK_ALLOC_DISK_2ARG || !HAVE_BLKDEV_QUEUE_LIMITS_FEATURES */
static inline void
blk_queue_set_read_ahead(struct request_queue *q, unsigned long ra_pages)
{
#if !defined(HAVE_BLK_QUEUE_UPDATE_READAHEAD) && \
!defined(HAVE_DISK_UPDATE_READAHEAD)
#ifdef HAVE_BLK_QUEUE_BDI_DYNAMIC
#if defined(HAVE_BLK_QUEUE_BDI_DYNAMIC)
q->backing_dev_info->ra_pages = ra_pages;
#elif defined(HAVE_BLK_QUEUE_DISK_BDI)
q->disk->bdi->ra_pages = ra_pages;
#else
q->backing_dev_info.ra_pages = ra_pages;
#endif
@@ -21,16 +21,23 @@
/*
* Copyright (c) 2023, 2024, Klara Inc.
* Copyright (c) 2024, Rob Norris <robn@despairlabs.com>
*/
#ifndef _ZFS_MM_COMPAT_H
#define _ZFS_MM_COMPAT_H
#include <linux/mm.h>
#include <linux/pagemap.h>
/* 5.4 introduced page_size(). Older kernels can use a trivial macro instead */
#ifndef HAVE_MM_PAGE_SIZE
#define page_size(p) ((unsigned long)(PAGE_SIZE << compound_order(p)))
#endif
/* 6.11 removed page_mapping(). A simple wrapper around folio_mapping() works */
#ifndef HAVE_MM_PAGE_MAPPING
#define page_mapping(p) folio_mapping(page_folio(p))
#endif
#endif /* _ZFS_MM_COMPAT_H */
+10 -4
View File
@@ -121,20 +121,26 @@ Controls whether buffers present on special vdevs are eligible for caching
into L2ARC.
If set to 1, exclude dbufs on special vdevs from being cached to L2ARC.
.
.It Sy l2arc_mfuonly Ns = Ns Sy 0 Ns | Ns 1 Pq int
.It Sy l2arc_mfuonly Ns = Ns Sy 0 Ns | Ns 1 Ns | Ns 2 Pq int
Controls whether only MFU metadata and data are cached from ARC into L2ARC.
This may be desired to avoid wasting space on L2ARC when reading/writing large
amounts of data that are not expected to be accessed more than once.
.Pp
The default is off,
The default is 0,
meaning both MRU and MFU data and metadata are cached.
When turning off this feature, some MRU buffers will still be present
in ARC and eventually cached on L2ARC.
When turning off this feature (setting it to 0), some MRU buffers will
still be present in ARC and eventually cached on L2ARC.
.No If Sy l2arc_noprefetch Ns = Ns Sy 0 ,
some prefetched buffers will be cached to L2ARC, and those might later
transition to MRU, in which case the
.Sy l2arc_mru_asize No arcstat will not be Sy 0 .
.Pp
Setting it to 1 means to L2 cache only MFU data and metadata.
.Pp
Setting it to 2 means to L2 cache all metadata (MRU+MFU) but
only MFU data (ie: MRU data are not cached). This can be the right setting
to cache as much metadata as possible even when having high data turnover.
.Pp
Regardless of
.Sy l2arc_noprefetch ,
some MFU buffers might be evicted from ARC,
+2 -2
View File
@@ -16,8 +16,8 @@ src = @abs_srcdir@
obj = @abs_builddir@
else
zfs_include = $(srctree)/include/zfs
icp_include = $(srctree)/$(src)/icp/include
zstd_include = $(srctree)/$(src)/zstd/include
icp_include = $(src)/icp/include
zstd_include = $(src)/zstd/include
ZFS_MODULE_CFLAGS += -include $(zfs_include)/zfs_config.h
endif
+13 -9
View File
@@ -118,7 +118,15 @@ const sha256_ops_t sha256_shani_impl = {
};
#endif
#elif defined(__aarch64__) || (defined(__arm__) && __ARM_ARCH > 6)
#elif defined(__aarch64__) || defined(__arm__)
extern void zfs_sha256_block_armv7(uint32_t s[8], const void *, size_t);
const sha256_ops_t sha256_armv7_impl = {
.is_supported = sha2_is_supported,
.transform = zfs_sha256_block_armv7,
.name = "armv7"
};
#if __ARM_ARCH > 6
static boolean_t sha256_have_neon(void)
{
return (kfpu_allowed() && zfs_neon_available());
@@ -129,13 +137,6 @@ static boolean_t sha256_have_armv8ce(void)
return (kfpu_allowed() && zfs_sha256_available());
}
extern void zfs_sha256_block_armv7(uint32_t s[8], const void *, size_t);
const sha256_ops_t sha256_armv7_impl = {
.is_supported = sha2_is_supported,
.transform = zfs_sha256_block_armv7,
.name = "armv7"
};
TF(zfs_sha256_block_neon, tf_sha256_neon);
const sha256_ops_t sha256_neon_impl = {
.is_supported = sha256_have_neon,
@@ -149,6 +150,7 @@ const sha256_ops_t sha256_armv8_impl = {
.transform = tf_sha256_armv8ce,
.name = "armv8-ce"
};
#endif
#elif defined(__PPC64__)
static boolean_t sha256_have_isa207(void)
@@ -192,11 +194,13 @@ static const sha256_ops_t *const sha256_impls[] = {
#if defined(__x86_64) && defined(HAVE_SSE4_1)
&sha256_shani_impl,
#endif
#if defined(__aarch64__) || (defined(__arm__) && __ARM_ARCH > 6)
#if defined(__aarch64__) || defined(__arm__)
&sha256_armv7_impl,
#if __ARM_ARCH > 6
&sha256_neon_impl,
&sha256_armv8_impl,
#endif
#endif
#if defined(__PPC64__)
&sha256_ppc_impl,
&sha256_power8_impl,
+8 -11
View File
@@ -88,7 +88,7 @@ const sha512_ops_t sha512_avx2_impl = {
};
#endif
#elif defined(__aarch64__)
#elif defined(__aarch64__) || defined(__arm__)
extern void zfs_sha512_block_armv7(uint64_t s[8], const void *, size_t);
const sha512_ops_t sha512_armv7_impl = {
.is_supported = sha2_is_supported,
@@ -96,6 +96,7 @@ const sha512_ops_t sha512_armv7_impl = {
.name = "armv7"
};
#if defined(__aarch64__)
static boolean_t sha512_have_armv8ce(void)
{
return (kfpu_allowed() && zfs_sha512_available());
@@ -107,15 +108,9 @@ const sha512_ops_t sha512_armv8_impl = {
.transform = tf_sha512_armv8ce,
.name = "armv8-ce"
};
#endif
#elif defined(__arm__) && __ARM_ARCH > 6
extern void zfs_sha512_block_armv7(uint64_t s[8], const void *, size_t);
const sha512_ops_t sha512_armv7_impl = {
.is_supported = sha2_is_supported,
.transform = zfs_sha512_block_armv7,
.name = "armv7"
};
#if defined(__arm__) && __ARM_ARCH > 6
static boolean_t sha512_have_neon(void)
{
return (kfpu_allowed() && zfs_neon_available());
@@ -127,6 +122,7 @@ const sha512_ops_t sha512_neon_impl = {
.transform = tf_sha512_neon,
.name = "neon"
};
#endif
#elif defined(__PPC64__)
TF(zfs_sha512_ppc, tf_sha512_ppc);
@@ -164,14 +160,15 @@ static const sha512_ops_t *const sha512_impls[] = {
#if defined(__x86_64) && defined(HAVE_AVX2)
&sha512_avx2_impl,
#endif
#if defined(__aarch64__)
#if defined(__aarch64__) || defined(__arm__)
&sha512_armv7_impl,
#if defined(__aarch64__)
&sha512_armv8_impl,
#endif
#if defined(__arm__) && __ARM_ARCH > 6
&sha512_armv7_impl,
&sha512_neon_impl,
#endif
#endif
#if defined(__PPC64__)
&sha512_ppc_impl,
&sha512_power8_impl,
+8 -3
View File
@@ -21,8 +21,11 @@
#if defined(__arm__)
#define __ARM_ARCH__ 7
#define __ARM_MAX_ARCH__ 7
#ifndef __ARM_ARCH
# define __ARM_ARCH__ 7
#else
# define __ARM_ARCH__ __ARM_ARCH
#endif
#if defined(__thumb2__)
.syntax unified
@@ -1834,6 +1837,7 @@ zfs_sha256_block_armv7:
#endif
.size zfs_sha256_block_armv7,.-zfs_sha256_block_armv7
#if __ARM_ARCH__ >= 7
.arch armv7-a
.fpu neon
@@ -2766,4 +2770,5 @@ zfs_sha256_block_armv8:
bx lr @ bx lr
.size zfs_sha256_block_armv8,.-zfs_sha256_block_armv8
#endif
#endif // #if __ARM_ARCH__ >= 7
#endif // #if defined(__arm__)
+8 -3
View File
@@ -21,8 +21,11 @@
#if defined(__arm__)
#define __ARM_ARCH__ 7
#define __ARM_MAX_ARCH__ 7
#ifndef __ARM_ARCH
# define __ARM_ARCH__ 7
#else
# define __ARM_ARCH__ __ARM_ARCH
#endif
#ifndef __KERNEL__
# define VFP_ABI_PUSH vstmdb sp!,{d8-d15}
@@ -490,6 +493,7 @@ zfs_sha512_block_armv7:
#endif
.size zfs_sha512_block_armv7,.-zfs_sha512_block_armv7
#if __ARM_ARCH__ >= 7
.arch armv7-a
.fpu neon
@@ -1819,4 +1823,5 @@ zfs_sha512_block_neon:
VFP_ABI_POP
bx lr @ .word 0xe12fff1e
.size zfs_sha512_block_neon,.-zfs_sha512_block_neon
#endif
#endif // #if __ARM_ARCH__ >= 7
#endif // #if defined(__arm__)
-7
View File
@@ -6268,7 +6268,6 @@ zfs_freebsd_copy_file_range(struct vop_copy_file_range_args *ap)
struct vnode *invp = ap->a_invp;
struct vnode *outvp = ap->a_outvp;
struct mount *mp;
struct uio io;
int error;
uint64_t len = *ap->a_lenp;
@@ -6316,12 +6315,6 @@ zfs_freebsd_copy_file_range(struct vop_copy_file_range_args *ap)
goto out_locked;
#endif
io.uio_offset = *ap->a_outoffp;
io.uio_resid = *ap->a_lenp;
error = vn_rlimit_fsize(outvp, &io, ap->a_fsizetd);
if (error != 0)
goto out_locked;
error = zfs_clone_range(VTOZ(invp), ap->a_inoffp, VTOZ(outvp),
ap->a_outoffp, &len, ap->a_outcred);
if (error == EXDEV || error == EAGAIN || error == EINVAL ||
+47 -6
View File
@@ -22,6 +22,9 @@
*
* Solaris Porting Layer (SPL) Proc Implementation.
*/
/*
* Copyright (c) 2024, Rob Norris <robn@despairlabs.com>
*/
#include <sys/systeminfo.h>
#include <sys/kstat.h>
@@ -43,6 +46,12 @@ typedef struct ctl_table __no_const spl_ctl_table;
typedef struct ctl_table spl_ctl_table;
#endif
#ifdef HAVE_PROC_HANDLER_CTL_TABLE_CONST
#define CONST_CTL_TABLE const struct ctl_table
#else
#define CONST_CTL_TABLE struct ctl_table
#endif
static unsigned long table_min = 0;
static unsigned long table_max = ~0;
@@ -60,7 +69,7 @@ struct proc_dir_entry *proc_spl_kstat = NULL;
#ifdef DEBUG_KMEM
static int
proc_domemused(struct ctl_table *table, int write,
proc_domemused(CONST_CTL_TABLE *table, int write,
void __user *buffer, size_t *lenp, loff_t *ppos)
{
int rc = 0;
@@ -88,7 +97,7 @@ proc_domemused(struct ctl_table *table, int write,
#endif /* DEBUG_KMEM */
static int
proc_doslab(struct ctl_table *table, int write,
proc_doslab(CONST_CTL_TABLE *table, int write,
void __user *buffer, size_t *lenp, loff_t *ppos)
{
int rc = 0;
@@ -135,7 +144,7 @@ proc_doslab(struct ctl_table *table, int write,
}
static int
proc_dohostid(struct ctl_table *table, int write,
proc_dohostid(CONST_CTL_TABLE *table, int write,
void __user *buffer, size_t *lenp, loff_t *ppos)
{
char *end, str[32];
@@ -688,6 +697,37 @@ static void spl_proc_cleanup(void)
}
}
#ifndef HAVE_REGISTER_SYSCTL_TABLE
/*
* Traditionally, struct ctl_table arrays have been terminated by an "empty"
* sentinel element (specifically, one with .procname == NULL).
*
* Linux 6.6 began migrating away from this, adding register_sysctl_sz() so
* that callers could provide the size directly, and redefining
* register_sysctl() to just call register_sysctl_sz() with the array size. It
* retained support for the terminating element so that existing callers would
* continue to work.
*
* Linux 6.11 removed support for the terminating element, instead interpreting
* it as a real malformed element, and rejecting it.
*
* In order to continue support older kernels, we retain the terminating
* sentinel element for our sysctl tables, but instead detect availability of
* register_sysctl_sz(). If it exists, we pass it the array size -1, stopping
* the kernel from trying to process the terminator. For pre-6.6 kernels that
* don't have register_sysctl_sz(), we just use register_sysctl(), which can
* handle the terminating element as it always has.
*/
#ifdef HAVE_REGISTER_SYSCTL_SZ
#define spl_proc_register_sysctl(p, t) \
register_sysctl_sz(p, t, ARRAY_SIZE(t)-1)
#else
#define spl_proc_register_sysctl(p, t) \
register_sysctl(p, t)
#endif
#endif
int
spl_proc_init(void)
{
@@ -698,16 +738,17 @@ spl_proc_init(void)
if (spl_header == NULL)
return (-EUNATCH);
#else
spl_header = register_sysctl("kernel/spl", spl_table);
spl_header = spl_proc_register_sysctl("kernel/spl", spl_table);
if (spl_header == NULL)
return (-EUNATCH);
spl_kmem = register_sysctl("kernel/spl/kmem", spl_kmem_table);
spl_kmem = spl_proc_register_sysctl("kernel/spl/kmem", spl_kmem_table);
if (spl_kmem == NULL) {
rc = -EUNATCH;
goto out;
}
spl_kstat = register_sysctl("kernel/spl/kstat", spl_kstat_table);
spl_kstat = spl_proc_register_sysctl("kernel/spl/kstat",
spl_kstat_table);
if (spl_kstat == NULL) {
rc = -EUNATCH;
goto out;
+1
View File
@@ -69,6 +69,7 @@
#include <sys/zpl.h>
#include <sys/zil.h>
#include <sys/sa_impl.h>
#include <linux/mm_compat.h>
/*
* Programming rules.
+71 -28
View File
@@ -20,6 +20,7 @@
*/
/*
* Copyright (c) 2012, 2020 by Delphix. All rights reserved.
* Copyright (c) 2024, Rob Norris <robn@despairlabs.com>
*/
#include <sys/dataset_kstats.h>
@@ -730,7 +731,7 @@ retry:
#endif
if (zv == NULL) {
rw_exit(&zvol_state_lock);
return (SET_ERROR(-ENXIO));
return (-SET_ERROR(ENXIO));
}
mutex_enter(&zv->zv_state_lock);
@@ -794,10 +795,10 @@ retry:
#ifdef HAVE_BLKDEV_GET_ERESTARTSYS
schedule();
return (SET_ERROR(-ERESTARTSYS));
return (-SET_ERROR(ERESTARTSYS));
#else
if ((gethrtime() - start) > timeout)
return (SET_ERROR(-ERESTARTSYS));
return (-SET_ERROR(ERESTARTSYS));
schedule_timeout(MSEC_TO_TICK(10));
goto retry;
@@ -819,7 +820,7 @@ retry:
if (zv->zv_open_count == 0)
zvol_last_close(zv);
error = SET_ERROR(-EROFS);
error = -SET_ERROR(EROFS);
} else {
zv->zv_open_count++;
}
@@ -1074,11 +1075,42 @@ static const struct block_device_operations zvol_ops = {
#endif
};
/*
* Since 6.9, Linux has been removing queue limit setters in favour of an
* initial queue_limits struct applied when the device is open. Since 6.11,
* queue_limits is being extended to allow more things to be applied when the
* device is open. Setters are also being removed for this.
*
* For OpenZFS, this means that depending on kernel version, some options may
* be set up before the device is open, and some applied to an open device
* (queue) after the fact.
*
* We manage this complexity by having our own limits struct,
* zvol_queue_limits_t, in which we carry any queue config that we're
* interested in setting. This structure is the same on all kernels.
*
* These limits are then applied to the queue at device open time by the most
* appropriate method for the kernel.
*
* zvol_queue_limits_convert() is used on 6.9+ (where the two-arg form of
* blk_alloc_disk() exists). This converts our limits struct to a proper Linux
* struct queue_limits, and passes it in. Any fields added in later kernels are
* (obviously) not set up here.
*
* zvol_queue_limits_apply() is called on all kernel versions after the queue
* is created, and applies any remaining config. Before 6.9 that will be
* everything, via setter methods. After 6.9 that will be whatever couldn't be
* put into struct queue_limits. (This implies that zvol_queue_limits_apply()
* will always be a no-op on the latest kernel we support).
*/
typedef struct zvol_queue_limits {
unsigned int zql_max_hw_sectors;
unsigned short zql_max_segments;
unsigned int zql_max_segment_size;
unsigned int zql_io_opt;
unsigned int zql_physical_block_size;
unsigned int zql_max_discard_sectors;
unsigned int zql_discard_granularity;
} zvol_queue_limits_t;
static void
@@ -1147,6 +1179,11 @@ zvol_queue_limits_init(zvol_queue_limits_t *limits, zvol_state_t *zv,
}
limits->zql_io_opt = zv->zv_volblocksize;
limits->zql_physical_block_size = zv->zv_volblocksize;
limits->zql_max_discard_sectors =
(zvol_max_discard_blocks * zv->zv_volblocksize) >> 9;
limits->zql_discard_granularity = zv->zv_volblocksize;
}
#ifdef HAVE_BLK_ALLOC_DISK_2ARG
@@ -1159,18 +1196,35 @@ zvol_queue_limits_convert(zvol_queue_limits_t *limits,
qlimits->max_segments = limits->zql_max_segments;
qlimits->max_segment_size = limits->zql_max_segment_size;
qlimits->io_opt = limits->zql_io_opt;
qlimits->physical_block_size = limits->zql_physical_block_size;
qlimits->max_discard_sectors = limits->zql_max_discard_sectors;
qlimits->max_hw_discard_sectors = limits->zql_max_discard_sectors;
qlimits->discard_granularity = limits->zql_discard_granularity;
#ifdef HAVE_BLKDEV_QUEUE_LIMITS_FEATURES
qlimits->features =
BLK_FEAT_WRITE_CACHE | BLK_FEAT_FUA | BLK_FEAT_IO_STAT;
#endif
}
#else
#endif
static void
zvol_queue_limits_apply(zvol_queue_limits_t *limits,
struct request_queue *queue)
{
#ifndef HAVE_BLK_ALLOC_DISK_2ARG
blk_queue_max_hw_sectors(queue, limits->zql_max_hw_sectors);
blk_queue_max_segments(queue, limits->zql_max_segments);
blk_queue_max_segment_size(queue, limits->zql_max_segment_size);
blk_queue_io_opt(queue, limits->zql_io_opt);
}
blk_queue_physical_block_size(queue, limits->zql_physical_block_size);
blk_queue_max_discard_sectors(queue, limits->zql_max_discard_sectors);
blk_queue_discard_granularity(queue, limits->zql_discard_granularity);
#endif
#ifndef HAVE_BLKDEV_QUEUE_LIMITS_FEATURES
blk_queue_set_write_cache(queue, B_TRUE);
blk_queue_flag_set(QUEUE_FLAG_IO_STAT, queue);
#endif
}
static int
zvol_alloc_non_blk_mq(struct zvol_state_os *zso, zvol_queue_limits_t *limits)
@@ -1183,7 +1237,6 @@ zvol_alloc_non_blk_mq(struct zvol_state_os *zso, zvol_queue_limits_t *limits)
zso->zvo_disk->minors = ZVOL_MINORS;
zso->zvo_queue = zso->zvo_disk->queue;
zvol_queue_limits_apply(limits, zso->zvo_queue);
#elif defined(HAVE_BLK_ALLOC_DISK_2ARG)
struct queue_limits qlimits;
zvol_queue_limits_convert(limits, &qlimits);
@@ -1196,6 +1249,7 @@ zvol_alloc_non_blk_mq(struct zvol_state_os *zso, zvol_queue_limits_t *limits)
zso->zvo_disk = disk;
zso->zvo_disk->minors = ZVOL_MINORS;
zso->zvo_queue = zso->zvo_disk->queue;
#else
zso->zvo_queue = blk_alloc_queue(NUMA_NO_NODE);
if (zso->zvo_queue == NULL)
@@ -1208,7 +1262,6 @@ zvol_alloc_non_blk_mq(struct zvol_state_os *zso, zvol_queue_limits_t *limits)
}
zso->zvo_disk->queue = zso->zvo_queue;
zvol_queue_limits_apply(limits, zso->zvo_queue);
#endif /* HAVE_BLK_ALLOC_DISK */
#else
zso->zvo_queue = blk_generic_alloc_queue(zvol_request, NUMA_NO_NODE);
@@ -1222,8 +1275,10 @@ zvol_alloc_non_blk_mq(struct zvol_state_os *zso, zvol_queue_limits_t *limits)
}
zso->zvo_disk->queue = zso->zvo_queue;
zvol_queue_limits_apply(limits, zso->zvo_queue);
#endif /* HAVE_SUBMIT_BIO_IN_BLOCK_DEVICE_OPERATIONS */
zvol_queue_limits_apply(limits, zso->zvo_queue);
return (0);
}
@@ -1245,7 +1300,6 @@ zvol_alloc_blk_mq(zvol_state_t *zv, zvol_queue_limits_t *limits)
return (1);
}
zso->zvo_queue = zso->zvo_disk->queue;
zvol_queue_limits_apply(limits, zso->zvo_queue);
zso->zvo_disk->minors = ZVOL_MINORS;
#elif defined(HAVE_BLK_ALLOC_DISK_2ARG)
struct queue_limits qlimits;
@@ -1276,10 +1330,11 @@ zvol_alloc_blk_mq(zvol_state_t *zv, zvol_queue_limits_t *limits)
/* Our queue is now created, assign it to our disk */
zso->zvo_disk->queue = zso->zvo_queue;
zvol_queue_limits_apply(limits, zso->zvo_queue);
#endif
zvol_queue_limits_apply(limits, zso->zvo_queue);
#endif
#endif
return (0);
}
@@ -1288,7 +1343,7 @@ zvol_alloc_blk_mq(zvol_state_t *zv, zvol_queue_limits_t *limits)
* request queue and generic disk structures for the block device.
*/
static zvol_state_t *
zvol_alloc(dev_t dev, const char *name)
zvol_alloc(dev_t dev, const char *name, uint64_t volblocksize)
{
zvol_state_t *zv;
struct zvol_state_os *zso;
@@ -1308,6 +1363,7 @@ zvol_alloc(dev_t dev, const char *name)
zso = kmem_zalloc(sizeof (struct zvol_state_os), KM_SLEEP);
zv->zv_zso = zso;
zv->zv_volmode = volmode;
zv->zv_volblocksize = volblocksize;
list_link_init(&zv->zv_next);
mutex_init(&zv->zv_state_lock, NULL, MUTEX_DEFAULT, NULL);
@@ -1344,8 +1400,6 @@ zvol_alloc(dev_t dev, const char *name)
if (ret != 0)
goto out_kmem;
blk_queue_set_write_cache(zso->zvo_queue, B_TRUE, B_TRUE);
/* Limit read-ahead to a single page to prevent over-prefetching. */
blk_queue_set_read_ahead(zso->zvo_queue, 1);
@@ -1354,9 +1408,6 @@ zvol_alloc(dev_t dev, const char *name)
blk_queue_flag_set(QUEUE_FLAG_NOMERGES, zso->zvo_queue);
}
/* Enable /proc/diskstats */
blk_queue_flag_set(QUEUE_FLAG_IO_STAT, zso->zvo_queue);
zso->zvo_queue->queuedata = zv;
zso->zvo_dev = dev;
zv->zv_open_count = 0;
@@ -1599,7 +1650,8 @@ zvol_os_create_minor(const char *name)
if (error)
goto out_dmu_objset_disown;
zv = zvol_alloc(MKDEV(zvol_major, minor), name);
zv = zvol_alloc(MKDEV(zvol_major, minor), name,
doi->doi_data_block_size);
if (zv == NULL) {
error = SET_ERROR(EAGAIN);
goto out_dmu_objset_disown;
@@ -1609,20 +1661,11 @@ zvol_os_create_minor(const char *name)
if (dmu_objset_is_snapshot(os))
zv->zv_flags |= ZVOL_RDONLY;
zv->zv_volblocksize = doi->doi_data_block_size;
zv->zv_volsize = volsize;
zv->zv_objset = os;
set_capacity(zv->zv_zso->zvo_disk, zv->zv_volsize >> 9);
blk_queue_physical_block_size(zv->zv_zso->zvo_queue,
zv->zv_volblocksize);
blk_queue_max_discard_sectors(zv->zv_zso->zvo_queue,
(zvol_max_discard_blocks * zv->zv_volblocksize) >> 9);
blk_queue_discard_granularity(zv->zv_zso->zvo_queue,
zv->zv_volblocksize);
#ifdef QUEUE_FLAG_DISCARD
blk_queue_flag_set(QUEUE_FLAG_DISCARD, zv->zv_zso->zvo_queue);
#endif
+8 -3
View File
@@ -9055,12 +9055,17 @@ l2arc_write_buffers(spa_t *spa, l2arc_dev_t *dev, uint64_t target_sz)
*/
for (int pass = 0; pass < L2ARC_FEED_TYPES; pass++) {
/*
* If pass == 1 or 3, we cache MRU metadata and data
* respectively.
* pass == 0: MFU meta
* pass == 1: MRU meta
* pass == 2: MFU data
* pass == 3: MRU data
*/
if (l2arc_mfuonly) {
if (l2arc_mfuonly == 1) {
if (pass == 1 || pass == 3)
continue;
} else if (l2arc_mfuonly > 1) {
if (pass == 3)
continue;
}
uint64_t passed_sz = 0;
+3
View File
@@ -201,6 +201,9 @@ dataset_kstats_destroy(dataset_kstats_t *dk)
void
dataset_kstats_rename(dataset_kstats_t *dk, const char *name)
{
if (dk->dk_kstats == NULL)
return;
dataset_kstat_values_t *dkv = dk->dk_kstats->ks_data;
char *ds_name;
+19 -2
View File
@@ -513,9 +513,26 @@ zil_parse(zilog_t *zilog, zil_parse_blk_func_t *parse_blk_func,
for (; lrp < end; lrp += reclen) {
lr_t *lr = (lr_t *)lrp;
/*
* Are the remaining bytes large enough to hold an
* log record?
*/
if ((char *)(lr + 1) > end) {
cmn_err(CE_WARN, "zil_parse: lr_t overrun");
error = SET_ERROR(ECKSUM);
arc_buf_destroy(abuf, &abuf);
goto done;
}
reclen = lr->lrc_reclen;
ASSERT3U(reclen, >=, sizeof (lr_t));
ASSERT3U(reclen, <=, end - lrp);
if (reclen < sizeof (lr_t) || reclen > end - lrp) {
cmn_err(CE_WARN,
"zil_parse: lr_t has an invalid reclen");
error = SET_ERROR(ECKSUM);
arc_buf_destroy(abuf, &abuf);
goto done;
}
if (lr->lrc_seq > claim_lr_seq) {
arc_buf_destroy(abuf, &abuf);
goto done;
+18
View File
@@ -145,6 +145,24 @@ for kernel_version in %{?kernel_versions}; do
%{?kernel_cc} \
%{?kernel_ld} \
%{?kernel_llvm}
# Pre-6.10 kernel builds didn't need to copy over the source files to the
# build directory. However we do need to do it though post-6.10 due to
# these commits:
#
# b1992c3772e6 kbuild: use $(src) instead of $(srctree)/$(src) for source
# directory
#
# 9a0ebe5011f4 kbuild: use $(obj)/ instead of $(src)/ for common pattern
# rules
#
# Note that kmodtool actually copies over the source into the build
# directory, so what we're doing here is normal. For efficiency reasons
# though we just use hardlinks instead of copying.
#
# See https://github.com/openzfs/zfs/issues/16439 for more info.
cp -lR ../%{module}-%{version}/module/* module/
make %{?_smp_mflags}
cd ..
done
+1
View File
@@ -532,6 +532,7 @@ systemctl --system daemon-reload >/dev/null || true
%attr(440, root, root) %config(noreplace) %{_sysconfdir}/sudoers.d/*
%config(noreplace) %{_bashcompletiondir}/zfs
%config(noreplace) %{_bashcompletiondir}/zpool
%files -n libzpool5
%{_libdir}/libzpool.so.*
+2 -1
View File
@@ -81,7 +81,8 @@ tests = ['block_cloning_clone_mmap_cached',
'block_cloning_cross_enc_dataset',
'block_cloning_copyfilerange_fallback_same_txg',
'block_cloning_replay', 'block_cloning_replay_encrypted',
'block_cloning_lwb_buffer_overflow', 'block_cloning_clone_mmap_write']
'block_cloning_lwb_buffer_overflow', 'block_cloning_clone_mmap_write',
'block_cloning_rlimit_fsize']
tags = ['functional', 'block_cloning']
[tests/functional/bootfs]
+2
View File
@@ -330,6 +330,8 @@ elif sys.platform.startswith('linux'):
['SKIP', cfr_reason],
'block_cloning/block_cloning_replay_encrypted':
['SKIP', cfr_reason],
'block_cloning/block_cloning_rlimit_fsize':
['SKIP', cfr_reason],
'cli_root/zfs_rename/zfs_rename_002_pos': ['FAIL', known_reason],
'cli_root/zpool_reopen/zpool_reopen_003_pos': ['FAIL', known_reason],
'cp_files/cp_files_002_pos': ['SKIP', cfr_reason],
+1
View File
@@ -478,6 +478,7 @@ nobase_dist_datadir_zfs_tests_tests_SCRIPTS += \
functional/block_cloning/block_cloning_replay.ksh \
functional/block_cloning/block_cloning_replay_encrypted.ksh \
functional/block_cloning/block_cloning_lwb_buffer_overflow.ksh \
functional/block_cloning/block_cloning_rlimit_fsize.ksh \
functional/bootfs/bootfs_001_pos.ksh \
functional/bootfs/bootfs_002_neg.ksh \
functional/bootfs/bootfs_003_pos.ksh \
@@ -55,7 +55,7 @@ function display_status
((ret |= $?))
typeset mntpnt=$(get_prop mountpoint $pool)
dd if=/dev/random of=$mntpnt/testfile.$$ &
dd if=/dev/urandom of=$mntpnt/testfile.$$ &
typeset pid=$!
zpool iostat -v 1 3 > /dev/null
@@ -0,0 +1,64 @@
#!/bin/ksh -p
#
# CDDL HEADER START
#
# The contents of this file are subject to the terms of the
# Common Development and Distribution License (the "License").
# You may not use this file except in compliance with the License.
#
# You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
# or https://opensource.org/licenses/CDDL-1.0.
# See the License for the specific language governing permissions
# and limitations under the License.
#
# When distributing Covered Code, include this CDDL HEADER in each
# file and include the License file at usr/src/OPENSOLARIS.LICENSE.
# If applicable, add the following below this CDDL HEADER, with the
# fields enclosed by brackets "[]" replaced with your own identifying
# information: Portions Copyright [yyyy] [name of copyright owner]
#
# CDDL HEADER END
#
. $STF_SUITE/include/libtest.shlib
. $STF_SUITE/tests/functional/block_cloning/block_cloning.kshlib
#
# DESCRIPTION:
# When block cloning is used to implement copy_file_range(2), the
# RLIMIT_FSIZE limit must be respected.
#
# STRATEGY:
# 1. Create a pool.
# 2. ???
#
verify_runnable "global"
VDIR=$TEST_BASE_DIR/disk-bclone
VDEV="$VDIR/a"
function cleanup
{
datasetexists $TESTPOOL && destroy_pool $TESTPOOL
rm -rf $VDIR
}
log_onexit cleanup
log_assert "Test for RLIMIT_FSIZE handling with block cloning enabled"
log_must rm -rf $VDIR
log_must mkdir -p $VDIR
log_must truncate -s 1G $VDEV
log_must zpool create -o feature@block_cloning=enabled $TESTPOOL $VDEV
log_must dd if=/dev/urandom of=/$TESTPOOL/file1 bs=1 count=1000
ulimit -f 2
log_must clonefile -f /$TESTPOOL/file1 /$TESTPOOL/file2 0 0 all
ulimit -f 1
log_mustnot clonefile -f /$TESTPOOL/file1 /$TESTPOOL/file3 0 0 all
log_pass "copy_file_range(2) respects RLIMIT_FSIZE"
@@ -153,5 +153,7 @@ function do_vol_test
log_must umount $mntp
fi
# Ubuntu 20.04 wants a sync here
log_must sync
log_must zfs destroy $vol
}
@@ -42,7 +42,7 @@ log_onexit cleanup
log_assert "ensure single-disk pool resumes properly after suspend and clear"
# create a file, and take a checksum, so we can compare later
log_must dd if=/dev/random of=$DATAFILE bs=128K count=1
log_must dd if=/dev/urandom of=$DATAFILE bs=128K count=1
typeset sum1=$(cat $DATAFILE | md5sum)
# make a debug device that we can "unplug"
@@ -37,11 +37,7 @@ export TMP_HISTORY=$TEST_BASE_DIR/tmp_history.$$
export NEW_HISTORY=$TEST_BASE_DIR/new_history.$$
export MIGRATEDPOOLNAME=${MIGRATEDPOOLNAME:-history_pool}
if is_freebsd; then
export TIMEZONE=${TIMEZONE:-America/Denver}
else
export TIMEZONE=${TIMEZONE:-US/Mountain}
fi
export TIMEZONE=${TIMEZONE:-America/Denver}
export HIST_USER="huser"
export HIST_GROUP="hgroup"
@@ -41,13 +41,13 @@ verify_runnable "global"
if ! $(grep -q "CONFIG_IO_URING=y" /boot/config-$(uname -r)); then
log_unsupported "Requires io_uring support"
log_unsupported "Requires io_uring support within Kernel"
fi
if [ -e /etc/os-release ] ; then
source /etc/os-release
if [ -n "$REDHAT_SUPPORT_PRODUCT_VERSION" ] && ((floor($REDHAT_SUPPORT_PRODUCT_VERSION) == 9)) ; then
log_unsupported "Disabled on CentOS 9, fails with 'Operation not permitted'"
if [ $PLATFORM_ID = "platform:el9" ]; then
log_unsupported "Disabled on RHEL 9 variants: fails with 'Operation not permitted'"
fi
fi