Commit Graph

9936 Commits

Author SHA1 Message Date
Rob Norris
2303775fea ZTS: test dmu_tx response when pool suspends
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17355
2025-05-28 10:29:06 -07:00
Rob Norris
55d035e866 dmu_tx_assign: make all VERIFY0 calls use DMU_TX_SUSPEND
This is the cheap way to keep non-user functions working after
break-on-suspend becomes default.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17355
2025-05-28 10:28:59 -07:00
Rob Norris
4653e2f7d3 dmu_tx: break tx assign/wait when pool suspends
This adjusts dmu_tx_assign/dmu_tx_wait to be interruptable if the pool
suspends while they're waiting, rather than just on the initial check
before falling back into a wait.

Since that's not always wanted, add a DMU_TX_SUSPEND flag to ignore
suspend entirely, effectively returning to the previous behaviour.

With that, it shouldn't be possible for anything with a standard
dmu_tx_assign/wait/abort loop to block under failmode=continue.

Also should be a bit tighter than the old behaviour, where a
VERIFY0(dmu_tx_assign(DMU_TX_WAIT)) could technically fail if the pool
is already suspended and failmode=continue.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17355
2025-05-28 10:28:51 -07:00
Rob Norris
ac2e579521 dmu_tx: make DMU_TX_* flags an enum
Mostly for a little more type checking and debugging visibility.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17355
2025-05-28 10:28:46 -07:00
Rob Norris
468d22d60c txg_wait_synced_flags: add TXG_WAIT_SUSPEND flag to not wait if pool suspended
This allows a caller to request a wait for txg sync, with an appropriate
error return if the pool is suspended or becomes suspended during the
wait.

To support this, txg_wait_kick() is added to signal the sync condvar,
which wakes up the waiters, causing them to loop and reconsider their
wait conditions again. zio_suspend() now calls this to trigger the break
if the pool suspends while waiting.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17355
2025-05-28 10:27:46 -07:00
Pavel Snajdr
8487945034
zcp: get_prop: fix encryptionroot and encryption
It was reported that channel programs' zfs.get_prop doesn't work for
dataset properties encryption and encryptionroot.

They are handled in get_special_prop due to the need to call
dsl_dataset_crypt_stats to load those dsl props.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Co-authored-by: Graham Christensen <graham@grahamc.com>
Closes #17280
2025-05-27 20:04:37 -04:00
Rob Norris
06fa8f3f69
zfs_cmd: reorganise zfs_cmd_t to match original size
2aa3fbe761 extended zinject_record_t, and in doing so inadvertently
extended zfs_cmd_t, which broke compatibility with userspace tools
without the change.

This fixes that by using some of the unused space in zfs_cmd_t for the
extra fields. We also add an assert to trigger a compile error if the
size ever changes.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17367
2025-05-27 20:01:06 -04:00
Fedor Uporov
087d7d80c7
ZVOL: Comment platform-specific empty functions bodies on FreeBSD side
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Fedor Uporov <fuporov.vstack@gmail.com>
Closes #17383
2025-05-27 20:00:25 -04:00
Rob Norris
fc617645a3 vdev_disk: remove zfs_vdev_scheduler option
It has existed as a warning since 0.8.3, 5+ years ago. I think people
have had enough time.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17376
2025-05-27 15:06:15 -07:00
Rob Norris
284580c878 dmu_traverse: remove 'ignore_hole_birth' tunable alias
It's been many years, we can probably do without.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17376
2025-05-27 15:05:09 -07:00
Ameer Hamza
2a91d577b1
Expose dataset encryption status via fast stat path
In truenas_pylibzfs, we query list of encrypted datasets several times,
which is expensive. This commit exposes a public API zfs_is_encrypted()
to get encryption status from fast stat path without having to refresh
the properties.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #17368
2025-05-26 22:11:03 -04:00
Don Brady
b048bfa9c1
Allow opt-in of zvol blocks in special class
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Don Brady <dev.fs.zfs@gmail.com>
Closes #14876
2025-05-24 16:44:26 -04:00
Alexander Motin
9d76950d67
ZIL: Improve write log size accounting
Before this change write log size TXG throttling mechanism was
accounting only user payload bytes.  But the actual ZIL both on
disk and especially in memory include headers of hundred(s) of
bytes.  Not accouting those may allow applications like
bonnie++, in their wisdom writing one byte at a time, to consume
excessive amount of memory and ZIL/SLOG in one TXG.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17373
2025-05-23 21:48:46 -04:00
George Amanakis
b01d7bd32d
ZTS: testing for leaked key mappings in encrypted non-raw send
This test covers a bug fixed by commit ea74cde: performing an
incremental non-raw send from an encrypted filesystem followed by
exporting the pool. Before that commit, exporting the sending pool
in this scenario would trigger a panic:

VERIFY(avl_is_empty(&sk->sk_dsl_keys)) failed
PANIC at dsl_crypt.c:353:spa_keystore_fini()
Call Trace:
 spl_dumpstack+0x29/0x2f [spl]
 spl_panic+0xd1/0xe9 [spl]
 spl_assert.constprop.0+0x1a/0x30 [zfs]
 spa_keystore_fini+0xc2/0xf0 [zfs]
 spa_deactivate+0x25f/0x610 [zfs]
 spa_evict_all+0xf4/0x200 [zfs]
 spa_fini+0x13/0x140 [zfs]
 zfs_kmod_fini+0x72/0xc0 [zfs]
 openzfs_fini_os+0x13/0x3a [zfs]
 openzfs_fini+0x9/0x6b8 [zfs]

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #17366
2025-05-23 21:46:51 -04:00
Cameron Harr
92157c840c Refactor man page and CLI help output per mandoc
The man page and the usage statement from the CLI have been refactored
to abide by the ManDoc standard. Style changes include:
 * Upper-case letters before lower-case
 * List short options w/o arguments first
 * Then list short options w/ arguments
 * Then list long arguments

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Cameron Harr <harr1@llnl.gov>
Closes #17357
2025-05-23 09:10:30 -07:00
Cameron Harr
cdb4c44684 Reformat cli help and man page to be in sync
The man page and CLI usage statements were both a little out
of sync and neither fully alphabetized correctly. That has
been fixed. One outstanding question is whether to get rid of
the ellipses on the CLI usage.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Cameron Harr <harr1@llnl.gov>
Closes #16004
Closes #17357
2025-05-23 09:10:21 -07:00
Paul Dagnelie
ddf28f27c5
Fix off-by-one bug in range tree code
Without this fix, zfs_range_tree_find_in could return an overlap when
the found range starts immediately after the searched range, with no
actual overlap.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Closes #17363
2025-05-23 10:33:33 -04:00
Alexander Motin
5c30b24381
Fix null dereference in spa_vdev_remove_cancel_sync()
We don't really need to access space map to know where the metaslab
ends, while msp->ms_sm might be NULL.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Fixes #17164
Fixes #17359
Closes #17361
2025-05-22 10:47:43 -04:00
Andres
a6f20250de
Update 69-vdev.rules.in
Add support to alias md-type devices in udev rules.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andres <a-d-j-i@users.noreply.github.com>
Closes #17345
2025-05-21 10:05:11 -07:00
Rob Norris
a387b7599c lzc_ioctl_fd: add ZFS_IOC_TRACE envvar to enable ioctl tracing
When set, dumps all ZFS ioctl calls and returns and their nvlists to
STDERR, to make debugging and understanding a lot easier.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17344
2025-05-21 09:23:53 -07:00
Rob Norris
c4c3917b2a lzc: move lzc_ioctl_fd() into lzc proper
Name the OS-specific call lzc_ioctl_fd_os(), and make lzc_ioctl_fd()
wrap it, so we can do more in the wrapper.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17344
2025-05-21 09:23:46 -07:00
Rob Norris
f454cc1723 libzfs: ensure all ioctl calls go through lzc_ioctl_fd()
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17344
2025-05-21 09:23:23 -07:00
Richard Yao
2e5e4bb0f8
Add Quality Assurance to pull request template
PRs like #17352 have no applicable checkbox, so let us add one.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Richard Yao <richard@ryao.dev>
Closes #17354
2025-05-20 08:49:22 -07:00
Richard Yao
83fa80a550
dmu_objset_hold_flags() should call dsl_dataset_rele_flags() on error
This was caught when doing a manual check to see if #17352 needed to be
improved to catch mismatches across stack frames of the kind that were
first found in #17340.

Reviewed-by: George Amanakis <gamanakis@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Richard Yao <richard@ryao.dev>
Closes #17353
2025-05-20 08:35:45 -07:00
Ameer Hamza
f0baaa329a
arcstat: prevent ZeroDivisionError when L2ARC becomes empty
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard@ryao.dev>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #17348
2025-05-19 16:27:24 -07:00
Rob Norris
841be1d049 Linux 6.2/6.15: del_timer_sync() renamed to timer_delete_sync()
Renamed in 6.2, and the compat wrapper removed in 6.15. No signature or
functional change apart from that, so a very minimal update for us.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17229
2025-05-19 11:13:00 -07:00
Rob Norris
bb740d66de Linux 6.15: mkdir now returns struct dentry *
The intent is that the filesystem may have a reference to an "old"
version of the new directory, eg if it was keeping it alive because a
remote NFS client still had it open.

We don't need anything like that, so this really just changes things so
we return error codes encoded in pointers.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17229
2025-05-19 11:12:49 -07:00
Richard Yao
d8a33bc0a5
icp: Use explicit_memset() exclusively in gcm_clear_ctx()
d634d20d1b had been intended to fix a
potential information leak issue where the compiler's optimization
passes appeared to remove `memset()` operations that sanitize sensitive
data before memory is freed for use by the rest of the kernel.

When I wrote it, I had assumed that the compiler would not remove the
other `memset()` operations, but upon reflection, I have realized that
this was a bad assumption to make. I would rather have a very slight
amount of additional overhead when calling `gcm_clear_ctx()` than risk a
future compiler remove `memset()` calls. This is likely to happen if
someone decides to try doing link time optimization and the person will
not think to audit the assembly output for issues like this, so it is
best to preempt the possibility before it happens.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Richard Yao <richard@ryao.dev>
Closes #17343
2025-05-19 10:04:05 -07:00
George Amanakis
ea74cdedda
Fix 2 bugs in non-raw send with encryption
Bisecting identified the redacted send/receive as the source of the bug
for issue #12014. Specifically the call to
dsl_dataset_hold_obj(&fromds) has been replaced by
dsl_dataset_hold_obj_flags() which passes a DECRYPT flag and creates
a key mapping. A subsequent dsl_dataset_rele_flag(&fromds) is missing
and the key mapping is not cleared. This may be inadvertedly used, which
results in arc_untransform failing with ECKSUM in:
 arc_untransform+0x96/0xb0 [zfs]
 dbuf_read_verify_dnode_crypt+0x196/0x350 [zfs]
 dbuf_read+0x56/0x770 [zfs]
 dmu_buf_hold_by_dnode+0x4a/0x80 [zfs]
 zap_lockdir+0x87/0xf0 [zfs]
 zap_lookup_norm+0x5c/0xd0 [zfs]
 zap_lookup+0x16/0x20 [zfs]
 zfs_get_zplprop+0x8d/0x1d0 [zfs]
 setup_featureflags+0x267/0x2e0 [zfs]
 dmu_send_impl+0xe7/0xcb0 [zfs]
 dmu_send_obj+0x265/0x360 [zfs]
 zfs_ioc_send+0x10c/0x280 [zfs]

Fix this by restoring the call to dsl_dataset_hold_obj().

The same applies for to_ds: here replace dsl_dataset_rele(&to_ds) with
dsl_dataset_rele_flags().

Both leaked key mappings will cause a panic when exporting the
sending pool or unloading the zfs module after a non-raw send from
an encrypted filesystem.

Contributions-by: Hank Barta <hbarta@gmail.com>
Contributions-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #12014 
Closes #17340
2025-05-19 09:55:00 -07:00
Alexander Motin
e55225be3e
Add explicit DMU_DIRECTIO checks
UIO_DIRECT means we can do Direct I/O, while DMU_DIRECTIO we want
to do it.  First does not automatically means second.  Add few
checks to not use Direct I/O in few cases we don't want it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17342
2025-05-16 23:27:05 -04:00
Alexander Motin
d5616ad34a
Increase meta-dnode redundancy in "some" mode
Loss of one indirect block of the meta dnode likely means loss of
the whole dataset.  It is worse than one file that the man page
promises, and in my opinion is not much better than "none" mode.

This change restores redundancy of the meta-dnode indirect blocks,
while same time still corrects expectations in the man page.

Reviewed-by: Akash B <akash-b@hpe.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17339
2025-05-16 13:23:32 -04:00
Paul Dagnelie
086105f4c4
Cause zpool scan resume commands to get logged in history
Currently, commands that resume a scrub/errorscrub from a paused state
don't get logged in the pool history. This is because resumes actually
return ECANCELED, instead of 0. This causes the tsd code in the common
ioctl logic to not think the ioctl succeeded, which causes the
log_history ioctl to fail with EPERM. However, for resuming a scrub from
a paused state, ECANCELED is success.

There are two options for how to deal with this. The first is the one
that I implemented here; I can't find a good reason for dmu_scan to
return ECANCELED on resume instead of 0, so let's just not. The only
place we check for the ECANCELED value is in zpool_scan, where we just
convert it back to zero.  However, I am aware that this is changing an
ioctl interface, which I believe is a breaking change. I don't think
it's an important change, but maybe there is someone who relies on it.

The other option that could be implemented is to either allow ECANCELED
specifically from dsl_scan in the common ioctl code, or add a generic
facility to the common ioctl code that allows each command to specify
whether or not success happened, regardless of the return values. I am
open to feedback on which option people think would be better.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #17301
2025-05-16 13:19:04 -04:00
Allan Jude
b6916f995e
ARC: parallel eviction
On systems with enormous amounts of memory, the single arc_evict thread
can become a bottleneck if reads and writes are stuck behind it, waiting
for old data to be evicted before new data can take its place.

This commit adds support for evicting from multiple ARC lists in
parallel, by farming the evict work out to some number of threads and
then accumulating their results.

A new tuneable, zfs_arc_evict_threads, sets the number of threads. By
default, it will scale based on the number of CPUs.

Sponsored-by: Expensify, Inc.
Sponsored-by: Klara, Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Youzhong Yang <youzhong@gmail.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Mateusz Piotrowski <mateusz.piotrowski@klarasystems.com>
Signed-off-by: Alexander Stetsenko <alex.stetsenko@klarasystems.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Co-authored-by: Rob Norris <rob.norris@klarasystems.com>
Co-authored-by: Mateusz Piotrowski <mateusz.piotrowski@klarasystems.com>
Co-authored-by: Alexander Stetsenko <alex.stetsenko@klarasystems.com>
Closes #16486
2025-05-14 10:38:32 -04:00
Alexander Motin
89a8a91582
ARC: Notify dbuf cache about target size reduction
ARC target size might drop significantly under memory pressure,
especially if current ARC size was much smaller than the target.
Since dbuf cache size is a fraction of the target ARC size, it
might need eviction too.  Aside of memory from the dbuf eviction
itself, it might help ARC by making more buffers evictable.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17314
2025-05-14 10:34:14 -04:00
Alexander Motin
0aa83dce99
Linux: Stop using NR_FILE_PAGES for ARC scaling
I've found that QEMU/KVM guest memory accounted as shared also
included into NR_FILE_PAGES.  But it is actually a non-evictable
anonymous memory.  Using it as a base for zfs_arc_pc_percent
parameter makes ARC to ignore shrinker requests while page cache
does not really have anything to evict, ending up in OOM killer
killing the QEMU process.

Instead use of NR_ACTIVE_FILE + NR_INACTIVE_FILE should represent
the part of a page cache that is actually evictable, which should
be safer to use as a reference for ARC scaling.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17334
2025-05-14 09:29:02 -04:00
Tony Hutter
b55256e5bb
runners: Add option to install custom kernel on Fedora
Allow installing a custom kernel version from the Fedora experimental
kernel repos onto the github runners.  This is useful for testing if
ZFS works against a newer kernel.

Fedora has a number of repos with experimental kernel packages. This
PR allows installs from kernels in these repos:

@kernel-vanilla/stable
@kernel-vanilla/mainline
(https://fedoraproject.org/wiki/Kernel_Vanilla_Repositories)

You will need to manually kick of a github runner to test with a custom
kernel version.  To do that, go to the github actions tab under
'zfs-qemu' and click the drop-down for 'run workflow'.  In there you
will see a text box to specify the version (like '6.14').  The scripts
will do their best to match the version to the newest matching version
that the repos support (since they're may be multiple nightly versions
of, say, '6.14').  A full list of kernel versions can be seen in the
dependency stage output if you kick off a manual run.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #17156
2025-05-13 14:43:35 -07:00
Alexander Motin
734eba251d
Wire O_DIRECT also to Uncached I/O (#17218)
Before Direct I/O was implemented, I've implemented lighter version
I called Uncached I/O.  It uses normal DMU/ARC data path with some
optimizations, but evicts data from caches as soon as possible and
reasonable.  Originally I wired it only to a primarycache property,
but now completing the integration all the way up to the VFS.

While Direct I/O has the lowest possible memory bandwidth usage,
it also has a significant number of limitations.  It require I/Os
to be page aligned, does not allow speculative prefetch, etc.  The
Uncached I/O does not have those limitations, but instead require
additional memory copy, though still one less than regular cached
I/O.  As such it should fill the gap in between.  Considering this
I've disabled annoying EINVAL errors on misaligned requests, adding
a tunable for those who wants to test their applications.

To pass the information between the layers I had to change a number
of APIs.  But as side effect upper layers can now control not only
the caching, but also speculative prefetch.  I haven't wired it to
VFS yet, since it require looking on some OS specifics.  But while
there I've implemented speculative prefetch of indirect blocks for
Direct I/O, controllable via all the same mechanisms.

Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Fixes #17027
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2025-05-13 14:26:55 -07:00
diwakar-kristappagari
e2ba0f7643
vdev_id: symlinks creation for multipath disk partitions (#17331)
It has been observed that the symlinks are not being created
for the disk partitions on multipath enabled systems.
This fix addresses the issue.

Signed-off-by: Diwakar Kristappagari <diwakar-k@hpe.com>
Reviewed-by: Akash B <akash-b@hpe.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
2025-05-13 13:49:13 -07:00
Rob Norris
485a2d0112 AUTHORS/mailmap: update with new contributors
Thanks everyone!

Signed-off-by: Rob Norris <robn@despairlabs.com>
2025-05-13 09:24:03 -07:00
Rob Norris
ae2caf9cb0 update_authors: output possible mailmap additions
Once we've selected a best ident for the AUTHORS file, it makes sense to
set up a corresponding mailmap entry for any other ident for that
committer, to ensure the git history also reflects this into the future.

So, here we output potential mailmap updates for a human to consider.

For the moment, this needs to be done by a human, because update_authors
uses git to get the author names, and thus is reliant on the mailmap
contents to generate its output, so having it update mailmap directly
would introduce a circular dependency that I'm not totally sure about.
It's definitely better than having to go back through the history and
check each commit by hand though.

Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
2025-05-13 09:24:03 -07:00
Rob Norris
8e318fda80 update_authors: consider Signed-off-by trailers for committer idents
I've increasingly found that commits from new contributors have the
author set in the "Github noreply" obfuscated style. If they do have a
better canonical choice, it's usually in the Signed-off-by: trailer in
the commit message.

I had avoided using these in the first version of this program because
they aren't always present, aren't always correct, and some commits have
multiple signoffs. It seems however that requiring either the name or
the email address to match the commit author sufficiently narrows the
scope to be useful for the "Github noreply" situation, which is really
the main sticking point. And of course, if it gets it wrong, overriding
in .mailmap or AUTHORS is always an option.

Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
2025-05-13 09:24:03 -07:00
Rob Norris
b2284aedab
metaslab_alloc: make hint BP and DVA const (#17324)
Nothing modifies them, and nothing should, so lets try to enforce that.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.

Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
2025-05-12 10:52:46 -07:00
Alexander Motin
49fbdd4533
Introduce zfs rewrite subcommand (#17246)
This allows to rewrite content of specified file(s) as-is without
modifications, but at a different location, compression, checksum,
dedup, copies and other parameter values.  It is faster than read
plus write, since it does not require data copying to user-space.
It is also faster for sync=always datasets, since without data
modification it does not require ZIL writing.  Also since it is
protected by normal range range locks, it can be done under any
other load.  Also it does not affect file's modification time or
other properties.

Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
2025-05-12 10:22:17 -07:00
Rob Norris
9aae14a14a
test-runner: rework output dir construction
The old code would compare all the test group names to work out some
sort of common path, but it didn't appear to work consistently,
sometimes placing output in a top-level dir, other times in one or more
subdirs. (I confess, I do not quite understand what it's supposed to
do).

This is a very simple rework that simply looks at all the test group
paths, removes common leading components, and uses the remainder as the
output directory. This should work because groups paths are unique, and
means we get a output dir tree of roughly the same shape as the test
groups in the runfiles and the test source dirs themselves.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: @ImAwsumm
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17167
2025-05-11 15:24:05 -04:00
Mariusz Zaborski
8b9c4e643b
spa: clear checkpoint information during retry
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mariusz Zaborski <oshogbo@FreeBSD.org>
Closes #17319
2025-05-11 12:49:38 -04:00
Rob Norris
2ee5b51a57
linux/uio: remove "skip" offset for UIO_ITER
For UIO_ITER, we are just wrapping a kernel iterator. It will take care
of its own offsets if necessary. We don't need to do anything, and if we
do try to do anything with it (like advancing the iterator by the skip
in zfs_uio_advance) we're just confusing the kernel iterator, ending up
at the wrong position or worse, off the end of the memory region.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17298
2025-05-11 12:46:40 -04:00
Alan Somers
c17bdc4914
More aggressively assert that db_mtx protects db.db_data
db.db_mtx must be held any time that db.db_data is accessed.  All of
these functions do have the lock held by a parent; add assertions to
ensure that it stays that way.

See https://github.com/openzfs/zfs/discussions/17118

* Refactor dbuf_read_bonus to make it obvious why db_rwlock isn't
required.

* Refactor dbuf_hold_copy to eliminate the db_rwlock
Copy data into the newly allocated buffer before assigning it to the db.
That way, there will be no need to take db->db_rwlock.

* Refactor dbuf_read_hole
In the case of an indirect hole, initialize the newly allocated buffer
before assigning it to the dmu_buf_impl_t.

Sponsored by:	ConnectWise
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes #17209
2025-05-09 09:02:26 -04:00
Fedor Uporov
1a8f5ad3b0
zvol: Enable zvol threading functionality on FreeBSD
Make zvol I/O requests processing asynchronous on FreeBSD side in some
cases. Clone zvol threading logic and required module parameters from
Linux side. Make zvol threadpool creation/destruction logic shared for
both Linux and FreeBSD.
The IO requests are processed asynchronously in next cases:
- volmode=geom: if IO request thread is geom thread or cannot sleep.
- volmode=cdev: if IO request passed thru struct cdevsw .d_strategy
routine, mean is AIO request.
In all other cases the IO requests are processed synchronously. The
volthreading zvol property is ignored on FreeBSD side.

Sponsored-by: vStack, Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: @ImAwsumm
Signed-off-by: Fedor Uporov <fuporov.vstack@gmail.com>
Closes #17169
2025-05-08 15:25:40 -04:00
Alan Somers
f13d760aa8
Delete dead code: dbuf_loan_arcbuf
It's been dead ever since 5fa356ea44

Sponsored by:	ConnectWise
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by:	Alan Somers <asomers@gmail.com>
Closes #17119
2025-05-08 10:34:11 -04:00
Rob Norris
4800181b3b
ioctl: remove FICLONE/FICLONERANGE/FIDEDUPERANGE compat
These are only required to support these ioctls on Linux <4.5. Since
4.18 is our cutoff, we don't need this code anymore.

Also removing related test things that will never match again.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17308
2025-05-08 10:32:52 -04:00