Commit Graph

9938 Commits

Author SHA1 Message Date
Tony Hutter
faefa5ffc3 Linux 6.15 compat: META
Update the META file to reflect compatibility with the 6.15
kernel.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #17393
2025-06-17 10:50:27 -07:00
Germano Massullo
777d8ee345 Fix mixed-use-of-spaces-and-tabs rpmlint warning
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Germano Massullo <germano.massullo@gmail.com>
Closes #17461
2025-06-17 10:50:27 -07:00
Rob Norris
b00bc81b05 ioctl: remove FICLONE/FICLONERANGE/FIDEDUPERANGE compat
These are only required to support these ioctls on Linux <4.5. Since
4.18 is our cutoff, we don't need this code anymore.

Also removing related test things that will never match again.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17308
2025-06-17 10:50:27 -07:00
Alexander Motin
f7e6dcc68d Relax zfs_vnops_read_chunk_size limitations
It makes no sense to limit the read size below the block size, since
the DMU will consume resources for the whole block anyway, and the
current zfs_vnops_read_chunk_size is only 1MB, which is smaller
than the maximum block size of 16MB.  Plus, in the case of misaligned
Uncached I/O, the buffer may get evicted between the chunks,
requiring repeated I/Os.

On 64-bit platforms, increase zfs_vnops_read_chunk_size to 32MB.
This reduces dependence on the speculative prefetcher when an
application requests a specific size: the read does not wait for the
prefetcher to start, and the prefetcher does not fetch more than needed.

Also, while here, we don't need to align reads to the chunk size,
but only to the block size, which is smaller and so more forgiving.

My profiles show ~4% CPU time savings when reading 16MB blocks.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17415
2025-06-17 10:50:27 -07:00
Rob Norris
e2de00ca44 dmu_traverse: remove 'ignore_hole_birth' tunable alias
It's been many years, we can probably do without.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17376
2025-06-17 10:50:27 -07:00
Allan Jude
8e9ffe1b4f ARC: parallel eviction
On systems with enormous amounts of memory, the single arc_evict thread
can become a bottleneck if reads and writes are stuck behind it, waiting
for old data to be evicted before new data can take its place.

This commit adds support for evicting from multiple ARC lists in
parallel, by farming the evict work out to some number of threads and
then accumulating their results.

A new tunable, zfs_arc_evict_threads, sets the number of threads. By
default, it will scale based on the number of CPUs.

Sponsored-by: Expensify, Inc.
Sponsored-by: Klara, Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Youzhong Yang <youzhong@gmail.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Mateusz Piotrowski <mateusz.piotrowski@klarasystems.com>
Signed-off-by: Alexander Stetsenko <alex.stetsenko@klarasystems.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Co-authored-by: Rob Norris <rob.norris@klarasystems.com>
Co-authored-by: Mateusz Piotrowski <mateusz.piotrowski@klarasystems.com>
Co-authored-by: Alexander Stetsenko <alex.stetsenko@klarasystems.com>
Closes #16486
2025-06-17 10:50:26 -07:00
Don Brady
6b67a5bdd3 During pool export flush the ARC asynchronously
This also includes removing L2 vdevs asynchronously.

This commit also guarantees that spa_load_guid is unique.

The zpool reguid feature introduced the spa_load_guid, which is a
transient value used for runtime identification purposes in the ARC.
This value is not the same as the spa's persistent pool guid.

However, the value is seeded from spa_generate_load_guid() which
does not check for uniqueness against the spa_load_guid from other
pools.  Although extremely rare, you can end up with two different
pools sharing the same spa_load_guid value! So we guarantee that
the value is always unique and additionally not still in use by an
async arc flush task.
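
The uniqueness guarantee described above can be sketched as a simple retry loop. This is only an illustration under stated assumptions: guid_in_use() and generate_unique_load_guid() are hypothetical names, rand() stands in for the kernel's random source, the in-use set is modelled as a flat array, and rejecting zero is an assumption of the sketch rather than something the commit message states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: true if `guid` appears in the `in_use` array. */
static bool
guid_in_use(uint64_t guid, const uint64_t *in_use, size_t n)
{
	assert(in_use != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (in_use[i] == guid)
			return (true);
	}
	return (false);
}

/*
 * Hypothetical sketch: keep drawing candidates until one is non-zero and
 * is not claimed by another pool or by a still-running async flush task
 * (both modelled here by the `in_use` array).
 */
static uint64_t
generate_unique_load_guid(const uint64_t *in_use, size_t n)
{
	uint64_t guid;

	do {
		/* rand() is a stand-in for a proper random source. */
		guid = ((uint64_t)rand() << 32) ^ (uint64_t)rand();
	} while (guid == 0 || guid_in_use(guid, in_use, n));

	return (guid);
}
```

Collisions are extremely rare, so the loop is expected to finish on the first pass in practice; the check only matters for the rare duplicate.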

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Don Brady <don.brady@klarasystems.com>
Closes #16215
2025-06-17 10:50:26 -07:00
Alexander Motin
4f1b91e343 CI: Automate some GitHub PR status labels manipulations
- Set/remove "Work in Progress"/"Code Review Needed" for drafts.
- Remove "Accepted", "Inactive", "Revision Needed" and "Stale" on
  pushes and reopens.

I hope this reduces the chances of PRs being forgotten, due to stale
labels, after the requested modifications are done.  It is better to
have no labels than incorrect ones saying there is nothing to look at.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16721
2025-06-17 10:50:26 -07:00
Rob Norris
a65225ec7e FreeBSD: zfs_putpages: don't undirty pages until after write completes
zfs_putpages() would put the entire range of pages onto the ZIL, then
return VM_PAGER_OK for each page to the kernel. However, an associated
zil_commit() or txg sync had not happened at this point, so the write
may not actually be on disk.

So, we rework it to use a ZIL commit callback, and do the post-write
work of undirtying the page and signaling completion there. We return
VM_PAGER_PEND to the kernel instead so it knows that we will take care
of it.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Mark Johnston <markj@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17445
2025-06-17 10:50:26 -07:00
Rob Norris
9c0f5bc183 zfs_log_write: only put the callback on the last itx
If a write is split across multiple itxs, we only want the callback on
the last one, otherwise it will be called for every itx associated with
this single write, which makes it very hard to know what to clean up.
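
A rough model of the idea, not the actual zfs_log_write() code: log_rec_t, split_write() and count_done() are hypothetical names invented for this sketch. A write is split into records of bounded size, and the completion callback is attached to the final record only, so it runs exactly once per write.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*write_done_cb_t)(void *arg);

/* Hypothetical stand-in for a log record; this is not the real itx_t. */
typedef struct log_rec {
	size_t		lr_offset;
	size_t		lr_length;
	write_done_cb_t	lr_callback;	/* NULL on all but the last record */
	void		*lr_callback_arg;
} log_rec_t;

/* Example callback: counts how many times completion was signalled. */
static void
count_done(void *arg)
{
	(*(int *)arg)++;
}

/*
 * Split [off, off + len) into records of at most `max_chunk` bytes.
 * Only the record that completes the write carries the callback.
 * Returns the number of records filled in.
 */
static size_t
split_write(size_t off, size_t len, size_t max_chunk,
    log_rec_t *recs, size_t max_recs, write_done_cb_t cb, void *arg)
{
	size_t n = 0;

	assert(max_chunk > 0);
	while (len > 0 && n < max_recs) {
		size_t chunk = len < max_chunk ? len : max_chunk;

		recs[n].lr_offset = off;
		recs[n].lr_length = chunk;
		recs[n].lr_callback = NULL;
		recs[n].lr_callback_arg = NULL;
		off += chunk;
		len -= chunk;
		n++;
	}

	/* Attach the callback only if the whole write was covered. */
	if (n > 0 && len == 0) {
		recs[n - 1].lr_callback = cb;
		recs[n - 1].lr_callback_arg = arg;
	}
	return (n);
}
```

With a 10-byte write and a 4-byte chunk limit, this yields three records, and only the third one has a non-NULL callback, so cleanup code sees a single completion.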

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Mark Johnston <markj@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17445
2025-06-17 10:50:26 -07:00
Rob Norris
e1dd433a44 zpl_sync_fs: work around kernels that ignore sync_fs errors
If the kernel will honour our error returns, use them. If not, fool it
by setting a writeback error on the superblock, if available.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17420
2025-06-17 10:50:26 -07:00
Rob Norris
08cec6532e zfs_sync: return error when pool suspends
If the pool is suspended, we'll just block in zil_commit(). If the
system is shutting down, blocking wouldn't help anyone. So, we should
keep this test for now, but at least return an error for anyone who is
actually interested.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17420
2025-06-17 10:50:26 -07:00
Rob Norris
d944641502 zfs_sync: remove support for impossible scenarios
The superblock pointer will always be set, as will z_log, so remove code
supporting cases that can't occur (on Linux at least).

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17420
2025-06-17 10:50:26 -07:00
Rob Norris
c758072b2f zts: test syncfs() behaviour when pool suspends
Fairly coarse, but if it returns while the pool suspends, it must be
with an error.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17420
2025-06-17 10:50:26 -07:00
Alexander Motin
0c9cdd1606 Improve block cloning transactions accounting
The previous dmu_tx_count_clone() was broken, stating that cloning is
similar to free.  While they may be similar in some respects, cloning
is not net-free.  It will likely consume space and memory, and
unlike free, it will do so regardless of whether the destination has
the blocks (usually it does not, so the previous code did nothing).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17431
2025-06-17 10:50:26 -07:00
Alexander Motin
23fad19818 Reduce zfs_dmu_offset_next_sync penalty
Looking at txg_wait_synced(, 0), I noticed that it always syncs
5 TXGs: 3 TXG_CONCURRENT_STATES + 2 TXG_DEFER_SIZE.  But in the case
of dmu_offset_next() we do not care about deferred frees.  And we might
not even need to sync all 3 concurrent TXGs if the dnode was not
dirtied in the last few TXGs.

This patch makes dmu_offset_next() sync one TXG at a time until
the dnode is clean, but no more than 3 (TXG_CONCURRENT_STATES) times.
My tests with random simultaneous writes and seeks over many files
on HDD pool show 7-14% performance increase.
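
The bounded loop described above can be modelled roughly as follows. Only TXG_CONCURRENT_STATES (with its value of 3) comes from the message; model_dnode_t, model_dnode_is_dirty(), model_sync_one_txg() and wait_until_clean() are hypothetical stand-ins for this sketch and are not the real DMU interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define	TXG_CONCURRENT_STATES	3

/* Hypothetical model: the dnode stays dirty for `dirty_txgs` more syncs. */
typedef struct model_dnode {
	int	dirty_txgs;
} model_dnode_t;

static bool
model_dnode_is_dirty(const model_dnode_t *dn)
{
	return (dn->dirty_txgs > 0);
}

/* Stand-in for waiting on a single TXG to sync. */
static void
model_sync_one_txg(model_dnode_t *dn)
{
	if (dn->dirty_txgs > 0)
		dn->dirty_txgs--;
}

/*
 * Sync one TXG at a time until the dnode is clean, giving up after
 * TXG_CONCURRENT_STATES rounds. Returns the number of TXGs waited on.
 */
static int
wait_until_clean(model_dnode_t *dn)
{
	int synced = 0;

	assert(dn != NULL);
	while (model_dnode_is_dirty(dn) && synced < TXG_CONCURRENT_STATES) {
		model_sync_one_txg(dn);
		synced++;
	}
	return (synced);
}
```

In this model a clean dnode waits on zero TXGs and a dnode dirtied only in the most recent TXG waits on one, instead of the fixed five that the message attributes to the old behaviour.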

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17434
2025-06-17 10:50:26 -07:00
Alexander Motin
3897e86bd1 Make TX abort after assign safer
It is not right, but there are a few cases where a TX is aborted
after being assigned in case of error.  To handle this better on
production systems, add extra cleanup steps.

While here, replace a couple of dmu_tx_abort() calls in simple cases.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17438
2025-06-17 10:50:26 -07:00
Alexander Motin
4c8d0471fa Allow zero compression if dedup is enabled
Having high-refcount dedup entries for zero blocks is inefficient
when they could be recorded as holes instead.  Normally, zero
compression is not done if compression is disabled to not confuse
naive benchmarks.  But with dedup enabled, it is expected that the
write will be skipped anyway, so we are just optimizing the way it
is skipped.
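
As a minimal sketch of the policy described above, assuming the decision reduces to two booleans: buf_is_all_zero() and should_try_zero_elision() are hypothetical helpers written for this illustration, and the real write pipeline may apply further conditions that are not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if every byte of the buffer is zero. */
static bool
buf_is_all_zero(const uint8_t *buf, size_t len)
{
	assert(buf != NULL || len == 0);
	for (size_t i = 0; i < len; i++) {
		if (buf[i] != 0)
			return (false);
	}
	return (true);
}

/*
 * Hypothetical policy check: look for all-zero blocks when compression
 * is enabled, or (the case this change adds) when dedup is enabled.
 */
static bool
should_try_zero_elision(bool compress_enabled, bool dedup_enabled)
{
	return (compress_enabled || dedup_enabled);
}
```

A block that passes both checks would be recorded as a hole rather than as yet another reference to a shared dedup entry.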

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17435
2025-06-17 10:50:26 -07:00
Tino Reichardt
ea3a600bba ZTS: Enable io_uring on CentOS Stream 9 and 10 also
The io_uring interface is available as a Technology Preview.
Details: https://access.redhat.com/solutions/4723221

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #17447
2025-06-17 10:50:26 -07:00
Attila Fülöp
e9c1e08e07 Linux build: silence objtool warnings
After #17401 the Linux build produces some stack related warnings.

Silence them with the `STACK_FRAME_NON_STANDARD` macro.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17410
2025-06-17 10:50:26 -07:00
Brian Behlendorf
1688d9991d CI: Retire Fedora 40 builder
Fedora 40 has gone EOL as of May 2025, retire the CI builder.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17408
2025-06-17 10:50:26 -07:00
Tino Reichardt
e0ad633c64 ZTS: Enable io_uring support on el9/el10
The io_uring interface is available as a Technology Preview.
Details: https://access.redhat.com/solutions/4723221

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #17397
2025-06-17 10:50:26 -07:00
Tino Reichardt
da4dfa85eb ZTS: Add AlmaLinux 10
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #17397
2025-06-17 10:50:26 -07:00
Rob Norris
7be33d2d40 abd_os: move headers from libzpool to libspl
5b9e695 added specific userspace versions of abd_os.h and abd_impl_os.h
for libzpool. However, abd.h and abd_impl.h, which include them, are
packaged with libzfs, so other programs building against libzfs can
fail to build, either because the headers aren't installed, or because
they aren't on any standard include path.

So, move abd_os.h and abd_impl_os.h to libspl, where they will be
installed alongside abd.h and abd_impl.h in a known path.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16940
Closes #17390
Closes #17394
2025-06-17 10:50:26 -07:00
Alexander Motin
bf4baee81e Set spa_final_txg in spa_unload()
I've noticed that after some dedup tests, system reboot ends in an
assertion about the ms_defer tree not being free.  It seems to be caused
by DDT flushing still freeing some blocks while ZFS is trying to reach
a final steady state.  This happens because spa_final_txg, while set by
spa_export_common() on pool export, is not set when spa_unload()
is called by spa_evict_all() on system reboot/shutdown.  Setting
spa_final_txg in spa_unload() fixes this issue.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17395
2025-06-17 10:50:26 -07:00
Ameer Hamza
e93d15f112 zpool: clarify ZPOOL_STATUS_REMOVED_DEV status message
Disks can be removed either by the administrator via hotplug or by the
kernel when a disk failure occurs. The previous message implied that
removal was always manual, which could be confusing.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #17400
2025-06-17 10:50:26 -07:00
Ameer Hamza
f292b0f146 vdev: skip faulting disks pending removal
This patch fixes a race where vdev_remove_wanted may be set after probe
initiation, which could otherwise trigger redundant fault and removal.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #17400
2025-06-17 10:50:26 -07:00
Brian Behlendorf
bda0bc6304 CI: Retire Ubuntu 20.04 builder
Ubuntu 20.04 has gone EOL as of April 2025, retire the CI builder.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17403
2025-06-17 10:50:26 -07:00
Rob Norris
04493ca819 linux/zvol_os: don't try to set disk ops if alloc fails
If the kernel fails to allocate the gendisk, zvo_disk will be NULL, and
dereferencing it will explode. So don't do that.

Sponsored-by: Klara, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17396
2025-06-17 10:50:26 -07:00
Attila Fülöp
2c53fe7764 Linux build: always use objtool
We silence `objtool` warnings on some object files using
`OBJECT_FILES_NON_STANDARD_some_file.o`. Nowadays `objtool` is
needed for CPU vulnerability mitigations and a lot more
functionality so its use is desirable.

Just remove the `OBJECT_FILES_NON_STANDARD` definitions. A follow-up
commit is needed to make the offending files standard and address
the compile time warnings.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #17401
Closes #17364
2025-06-17 10:50:26 -07:00
Rob Norris
d7bb6bbf13 tunables: fix spelling
Three occurrences with an 'e', and all of them mine. Maybe it's a
British thing?

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17377
2025-06-17 10:50:26 -07:00
Rob Norris
b8f80812a3 tunables: remove __check_old_set_param workaround
This was fully removed from Linux in 4.15, so we won't be seeing it
again.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17377
2025-06-17 10:50:26 -07:00
Rob Norris
97696962b5 tunables: remove unused param get/set aliases
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17377
2025-06-17 10:50:26 -07:00
Rob Norris
06fd6dc6f7 tunables: use Linux ullong param ops for u64
Since 3.17 Linux has provided param ops for 64-bit ints, so we don't
need to use our own anymore.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17377
2025-06-17 10:50:26 -07:00
Rob Norris
28ff5ff1c6 tunables: remove support for s64 tunables
Nothing uses them now.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17377
2025-06-17 10:50:26 -07:00
Rob Norris
e9002887e2 tunables: remove direct use of module_param_cb
The use for spl_taskq_kick was the only use, and the comment that
module_param_call is obsolete is no longer true - it's still very much
used even in recent kernels.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17377
2025-06-17 10:50:26 -07:00
Rob Norris
840b070ec7 tunables: remove FreeBSD compat macros for Linux module params
Nothing in any FreeBSD code uses them.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17377
2025-06-17 10:50:26 -07:00
Rob Norris
d02d3add0d tunables: ensure tunable and variable have same define gate
If a variable is only available in the kernel, then the tunable should
also only be available there.

This matters very little so long as we don't have userspace tunables,
but it's still good hygiene.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17377
2025-06-17 10:50:26 -07:00
Rob Norris
cc5724f38d tunables: don't assert initialisation in impl getters
It actually doesn't matter if it's not initialised when we first query
the current value; it just returns empty-string. A crash is quite
obnoxious even if it is a rare case.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17377
2025-06-17 10:50:26 -07:00
Rob Norris
8317244270 zfs_log: make zfs_immediate_write_sz uint
Likely it's only int64 for comparison with ssize_t, which is signed.
However, it would make no sense for it to be less than 0 or greater than
4G, so making it a regular uint will make it safe for comparison and
remove the only S64 tunable in core.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17377
2025-06-17 10:50:26 -07:00
Paul Dagnelie
65cf521353 Only interrupt active disk I/Os in failmode=continue
failmode=continue is in a sorry state. Originally designed to fix a very
specific problem, it causes crashes and panics for most people who end
up trying to use it. At this point, we should either remove it entirely,
or try to make it more usable.

With this patch, I choose the latter. While the feature is fundamentally
unpredictable and prone to race conditions, it should be possible to get
it to the point where it can at least sometimes be useful for some
users. This patch fixes one of the major issues with failmode=continue:
it interrupts even ZIOs that are patiently waiting in line behind stuck
IOs.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Co-authored-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Closes #17372
2025-06-17 10:49:40 -07:00
Pavel Snajdr
08caad8257 zcp: get_prop: fix encryptionroot and encryption
It was reported that channel programs' zfs.get_prop doesn't work for
dataset properties encryption and encryptionroot.

They are handled in get_special_prop due to the need to call
dsl_dataset_crypt_stats to load those dsl props.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Co-authored-by: Graham Christensen <graham@grahamc.com>
Closes #17280
2025-06-17 10:49:40 -07:00
Fedor Uporov
d187e3e1a7 ZVOL: Comment platform-specific empty functions bodies on FreeBSD side
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Fedor Uporov <fuporov.vstack@gmail.com>
Closes #17383
2025-06-17 10:49:40 -07:00
Ameer Hamza
1215c3b609 Expose dataset encryption status via fast stat path
In truenas_pylibzfs, we query the list of encrypted datasets several times,
which is expensive. This commit exposes a public API, zfs_is_encrypted(),
to get the encryption status from the fast stat path without having to
refresh the properties.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #17368
2025-06-17 10:49:40 -07:00
Alexander Motin
2fe0d5df94 ZIL: Improve write log size accounting
Before this change, the write log size TXG throttling mechanism
accounted only for user payload bytes.  But the actual ZIL, both on
disk and especially in memory, includes headers of hundred(s) of
bytes.  Not accounting for those may allow applications like
bonnie++, in their wisdom writing one byte at a time, to consume
an excessive amount of memory and ZIL/SLOG in one TXG.
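
To see why the headers matter, here is a small arithmetic sketch. The 192-byte overhead is an assumed figure chosen only for illustration (the message says just "hundred(s) of bytes"), and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed per-record overhead, for illustration only. */
#define	ASSUMED_RECORD_OVERHEAD	192ULL

/* Old-style accounting in this model: user payload bytes only. */
static uint64_t
accounted_payload_only(uint64_t nwrites, uint64_t bytes_per_write)
{
	return (nwrites * bytes_per_write);
}

/* New-style accounting in this model: payload plus per-record header. */
static uint64_t
accounted_with_headers(uint64_t nwrites, uint64_t bytes_per_write)
{
	/* Guard the addition against wrap-around. */
	assert(bytes_per_write <= UINT64_MAX - ASSUMED_RECORD_OVERHEAD);
	return (nwrites * (bytes_per_write + ASSUMED_RECORD_OVERHEAD));
}
```

Under this assumption, 1000 one-byte writes count as 1000 bytes with payload-only accounting but as 193000 bytes once headers are included, which is much closer to what they actually occupy.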

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17373
2025-06-17 10:49:40 -07:00
George Amanakis
fa545db846 ZTS: testing for leaked key mappings in encrypted non-raw send
This test covers a bug fixed by commit ea74cde: performing an
incremental non-raw send from an encrypted filesystem followed by
exporting the pool. Before that commit, exporting the sending pool
in this scenario would trigger a panic:

VERIFY(avl_is_empty(&sk->sk_dsl_keys)) failed
PANIC at dsl_crypt.c:353:spa_keystore_fini()
Call Trace:
 spl_dumpstack+0x29/0x2f [spl]
 spl_panic+0xd1/0xe9 [spl]
 spl_assert.constprop.0+0x1a/0x30 [zfs]
 spa_keystore_fini+0xc2/0xf0 [zfs]
 spa_deactivate+0x25f/0x610 [zfs]
 spa_evict_all+0xf4/0x200 [zfs]
 spa_fini+0x13/0x140 [zfs]
 zfs_kmod_fini+0x72/0xc0 [zfs]
 openzfs_fini_os+0x13/0x3a [zfs]
 openzfs_fini+0x9/0x6b8 [zfs]

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #17366
2025-06-17 10:49:40 -07:00
Cameron Harr
cfb9cba51c Refactor man page and CLI help output per mandoc
The man page and the usage statement from the CLI have been refactored
to abide by the ManDoc standard. Style changes include:
 * Upper-case letters before lower-case
 * List short options w/o arguments first
 * Then list short options w/ arguments
 * Then list long arguments

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Cameron Harr <harr1@llnl.gov>
Closes #17357
2025-06-17 10:49:40 -07:00
Cameron Harr
b647336cc4 Reformat cli help and man page to be in sync
The man page and CLI usage statements were both a little out
of sync and neither fully alphabetized correctly. That has
been fixed. One outstanding question is whether to get rid of
the ellipses on the CLI usage.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Cameron Harr <harr1@llnl.gov>
Closes #16004
Closes #17357
2025-06-17 10:49:40 -07:00
Paul Dagnelie
b9324a1e75 Fix off-by-one bug in range tree code
Without this fix, zfs_range_tree_find_in could return an overlap when
the found range starts immediately after the searched range, with no
actual overlap.
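
The boundary case can be illustrated with half-open ranges [start, start + size). This sketch is not the range tree code itself: ranges_overlap() and ranges_overlap_off_by_one() are hypothetical helpers, and the second one only imitates the class of bug described, not the exact previous implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Correct test for half-open ranges [start, start + size). */
static bool
ranges_overlap(uint64_t a_start, uint64_t a_size,
    uint64_t b_start, uint64_t b_size)
{
	assert(a_size > 0 && b_size > 0);
	uint64_t a_end = a_start + a_size;
	uint64_t b_end = b_start + b_size;

	return (a_start < b_end && b_start < a_end);
}

/*
 * Illustrative buggy variant: using <= treats a range that starts
 * exactly where the other one ends as overlapping.
 */
static bool
ranges_overlap_off_by_one(uint64_t a_start, uint64_t a_size,
    uint64_t b_start, uint64_t b_size)
{
	assert(a_size > 0 && b_size > 0);
	uint64_t a_end = a_start + a_size;
	uint64_t b_end = b_start + b_size;

	return (a_start <= b_end && b_start <= a_end);
}
```

For a searched range [0, 10) and a found range [10, 15), the correct test reports no overlap, while the inclusive variant reports one.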

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Closes #17363
2025-06-17 10:49:40 -07:00
Alexander Motin
64e77fdf3b Fix null dereference in spa_vdev_remove_cancel_sync()
We don't really need to access the space map to know where the metaslab
ends, and msp->ms_sm might be NULL.
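
A simplified sketch of the idea, assuming the metaslab carries its own start and size: model_metaslab_t and model_space_map_t are hypothetical cut-down structures for this illustration, not the real metaslab_t or space_map_t, and metaslab_end() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cut-down space map. */
typedef struct model_space_map {
	uint64_t	sm_start;
	uint64_t	sm_size;
} model_space_map_t;

/* Hypothetical cut-down metaslab; ms_sm may legitimately be NULL. */
typedef struct model_metaslab {
	uint64_t		ms_start;
	uint64_t		ms_size;
	model_space_map_t	*ms_sm;
} model_metaslab_t;

/*
 * Compute the end offset from the metaslab's own fields, so a missing
 * space map is never dereferenced.
 */
static uint64_t
metaslab_end(const model_metaslab_t *msp)
{
	assert(msp != NULL);
	return (msp->ms_start + msp->ms_size);
}
```

Because the helper never touches ms_sm, it gives the same answer whether or not a space map has been allocated for the metaslab.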

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Fixes #17164
Fixes #17359
Closes #17361
(cherry picked from commit 5c30b24381)
2025-05-28 16:00:28 -07:00