Commit Graph

10334 Commits

Author SHA1 Message Date
Ameer Hamza
74bbdda1ef Fix snapshot automount expiry cancellation deadlock
A deadlock occurs when snapshot expiry tasks are cancelled while holding
locks. The snapshot expiry task (snapentry_expire) spawns an umount
process and waits for it to complete. Concurrently, ARC memory pressure
triggers arc_prune which calls zfs_exit_fs(), attempting to cancel the
expiry task while holding locks. The umount process spawned by the
expiry task blocks trying to acquire locks held by arc_prune, which is
blocked waiting for the expiry task to complete. This creates a circular
dependency: expiry task waits for umount, umount waits for arc_prune,
arc_prune waits for expiry task.

Fix by adding non-blocking cancellation support to taskq_cancel_id().
The zfs_exit_fs() path calls zfsctl_snapshot_unmount_delay() to
reschedule the unmount, which needs to cancel any existing expiry task.
It now uses non-blocking cancellation to avoid waiting while holding
locks, breaking the deadlock by returning immediately when the task is
already running.

The per-entry se_taskqid_lock has been removed, with all taskqid
operations now protected by the global zfs_snapshot_lock held as
WRITER. Additionally, an se_in_umount flag prevents recursive waits when
zfsctl_destroy() is called during unmount. The taskqid is now only
cleared by the caller on successful cancellation; running tasks clear
their own taskqid upon completion.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #17941
2025-12-10 10:21:29 -08:00
Ameer Hamza
663dc86de2 Fix taskq NULL pointer dereference on timer race
Remove unsafe timer_pending() check in taskq_cancel_id() that created a
race where:
- Timer expires and timer_pending() returns FALSE
- task_done() frees task with tqent_func = NULL
- Timer callback executes and queues freed task
- Worker thread crashes executing NULL function

Always call timer_delete_sync() unconditionally to ensure timer callback
completes before task is freed.

Reliably reproducible by injecting mdelay(10) after setting CANCEL flag
to widen the race window, combined with frequent task cancellations
(e.g., snapshot automount expiry).

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #17942
2025-12-10 10:21:29 -08:00
Brian Behlendorf
145c606c60 Linux 6.18 compat: META (#18039)
Update the META file to reflect compatibility with the 6.18
kernel.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-12-10 10:21:29 -08:00
Rob Norris
c9845a1332 Linux: work around use of GPL-only symbol kasan_flag_enabled
We may not be able to avoid our code referencing the symbol, but we can
ensure that a symbol of that name is available to the linker during
build, and so not require linking the GPL-exported version.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #18009
Closes #18040
2025-12-10 10:21:29 -08:00
Chunwei Chen
028d66b9dd Fix ddtprune causing space leak
In zio_ddt_free(), if a pruned dde is still in the DDT, the function
does nothing, which causes a space leak.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Closes #17982
Closes #17983
2025-12-10 10:21:29 -08:00
Tony Hutter
206487b9b1 CI: Fix Ubuntu 22.04 rsend failures
For whatever reason, the single `log_note` in the `directory_diff`
function causes the function to stop executing on Ubuntu 22.  This
causes most of the rsend tests to fail.  Remove the line since it's only
informational.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
2025-12-10 10:21:29 -08:00
Alex
f8572e2a97 Fix a declaration position of the nth_page.
Fix a compile-time bug introduced by commit 87df5e4. On Linux kernel
6.18.0 the build failed with:
"zfs/module/os/linux/zfs/abd_os.c:920:32: error: implicit declaration
of function ‘nth_page’; did you mean ‘pte_page’?
[-Werror=implicit-function-declaration]".

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: agiUnderground <alex.dev.cv@gmail.com>
Closes #18034
2025-12-10 10:21:29 -08:00
Brian Behlendorf
8c1eaea952 CI: exclude signed-off-by/reviewed-by from 72 char limit
Allow an author or reviewer's name and email address to exceed
the 72 character limit enforced by the commitcheck target.

Reviewed-by: RageLtMan <rageltman@sempervictus>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #18030
2025-12-10 10:21:29 -08:00
bspengler-oss
25d755e108 Fix HIGHMEM/kmap API violation in zfs_uiomove_bvec_impl()
Fix another instance where ZFS assumes multiple pages can be
mapped at once via zfs_kmap_local(), resulting in crashes and
potential memory corruption on HIGHMEM-enabled (typically 32-bit)
systems.

Reviewed-by: RageLtMan <rageltman@sempervictus>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: bspengler-oss <94915855+bspengler-oss@users.noreply.github.com>
Closes #15668
Closes #18030
2025-12-10 10:21:29 -08:00
bspengler-oss
5946eeb8df Preserve LIFO ordering of kmap ops in abd_raidz_gen_iterate()
ZFS typically preserves proper LIFO ordering regarding map/unmap
operations that wrap the Linux kernel's kmap interfaces that
require such ordering, but one instance in abd_raidz_gen_iterate()
did not.
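
The LIFO requirement can be illustrated with a small user-space model. This is only a sketch: map_local() and unmap_local() are hypothetical stand-ins that track mappings on a stack; they are not the kernel's kmap interfaces or the ZFS wrappers.

```c
#include <stddef.h>

/*
 * User-space model only. map_local()/unmap_local() are hypothetical
 * stand-ins for kmap-style calls; mappings are tracked on a stack.
 */
#define	SK_MAX_MAPPED	8

static const void *sk_mapped[SK_MAX_MAPPED];
static int sk_nmapped;

/* "Map" a page by pushing it; returns the address, or NULL if full. */
static const void *
map_local(const void *page)
{
	if (sk_nmapped == SK_MAX_MAPPED)
		return (NULL);
	sk_mapped[sk_nmapped++] = page;
	return (page);
}

/* Only the most recent mapping may be released; -1 on a violation. */
static int
unmap_local(const void *addr)
{
	if (sk_nmapped == 0 || sk_mapped[sk_nmapped - 1] != addr)
		return (-1);
	sk_nmapped--;
	return (0);
}

/* Map two buffers, release them in the given order, count violations. */
static int
map_pair_and_release(int lifo)
{
	static char a[1], b[1];
	int bad = 0;

	sk_nmapped = 0;
	(void) map_local(a);
	(void) map_local(b);
	if (lifo) {
		bad += (unmap_local(b) != 0);	/* last mapped, first out */
		bad += (unmap_local(a) != 0);
	} else {
		bad += (unmap_local(a) != 0);	/* violation: b is on top */
		bad += (unmap_local(b) != 0);
	}
	return (bad);
}
```

In this model, map_pair_and_release(1) reports no violations, while map_pair_and_release(0) reports one for the out-of-order release.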

Similar issues have been fixed in the Linux kernel in the past,
see for instance CVE-2025-39899 for userfaultfd.

Reviewed-by: RageLtMan <rageltman@sempervictus>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: bspengler-oss <94915855+bspengler-oss@users.noreply.github.com>
Closes #15668
Closes #18030
2025-12-10 10:21:29 -08:00
bspengler-oss
5e271995d1 Fix interaction of abd_iter_map()/abd_iter_unmap() with HIGHMEM
HIGHMEM kmap interfaces operate on only a single page at a time
yet ZFS hadn't accounted for this, resulting in crashes and
potential memory corruption on HIGHMEM (typically 32-bit) systems.
This was caught by PaX's KERNSEAL feature as it makes use of
HIGHMEM functionality on x64.

On typical 64-bit systems, this issue wouldn't have been observed,
as the map interfaces simply fall back to returning an address in
lowmem where the contiguous pages can be accessed directly.
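
A minimal user-space sketch of the single-page rule follows. It assumes a 4096-byte page and a hypothetical helper, copy_from_pages(); it is not the ZFS code, only an illustration of never touching more than one page per step.

```c
#include <stddef.h>
#include <string.h>

#define	SK_PAGE_SIZE	4096u	/* assumed page size for this model */

/*
 * Copy len bytes, starting at byte offset off, out of an array of
 * separately "mapped" pages. Each step stays within a single page.
 */
static size_t
copy_from_pages(char *dst, char *const *pages, size_t off, size_t len)
{
	size_t done = 0;

	while (done < len) {
		size_t pg = (off + done) / SK_PAGE_SIZE;
		size_t pgoff = (off + done) % SK_PAGE_SIZE;
		size_t chunk = SK_PAGE_SIZE - pgoff;

		if (chunk > len - done)
			chunk = len - done;
		/* In a kernel, exactly one page would be mapped here ... */
		memcpy(dst + done, pages[pg] + pgoff, chunk);
		/* ... and unmapped again before moving to the next page. */
		done += chunk;
	}
	return (done);
}
```

A copy that straddles a page boundary is split into two chunks, one per page, instead of assuming that the second page is addressable through the first page's mapping.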

Joint work with the PaX Team, tested by Mark van Dijk

Reviewed-by: RageLtMan <rageltman@sempervictus>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: bspengler-oss <94915855+bspengler-oss@users.noreply.github.com>
Closes #15668
Closes #18030
2025-12-10 10:21:29 -08:00
Mark Johnston
a2f768f61f FreeBSD: Fix a potential null dereference in zfs_freebsd_fsync()
In general it's possible for a vnode to not have an associated VM
object.  This happens in particular with named pipes, which have
some distinct VOPs, defined in zfs_fifoops.  Thus, this chunk of
zfs_freebsd_fsync() needs to check for the FIFO case, like other
vm_object_mightbedirty() callers do.

(Note that vn_flush_cached_data() calls are predicated on
zn_has_cached_data() returning true, and it checks for a NULL v_object
pointer already.)

Fixes: ef4058fcdc
Reported-by: Collin Funk <collin.funk1@gmail.com>
Reviewed-by: Sean Eric Fagan <sef@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #18015
2025-12-10 10:21:29 -08:00
Alan Somers
872266a5f3 During CI, use nproc instead of sysctl -n hw.ncpu
The latter may give the wrong result if cpusets are in use.

Sponsored by: ConnectWise
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes #18012
2025-12-10 10:21:29 -08:00
Brian Behlendorf
ed87bc593f ZTS: Add slow_vdev_degraded_sit_out retry
While not common, the draid3 vdev type has been observed to
not always sit out a vdev when run in the CI.  To prevent
continued false positives, allow the test to be retried up
to three times before considering it a failure.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #18003
2025-12-10 10:21:29 -08:00
Alexander Motin
e1f0baa546 FreeBSD: Remove HAVE_INLINE_FLSL use
These macros have been deprecated in the FreeBSD kernel for several
years, and unneeded for much longer.  Instead, as on Linux, let the
kernel leave it to the compiler to do the right thing.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18004
2025-12-10 10:21:29 -08:00
Alexander Motin
071369803e raidz_test: Restore rand_data protection
It feels dirty to modify the protection of memory allocated via libc,
but at least we should try to restore it before freeing.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17977
2025-12-10 10:21:29 -08:00
Alexander Motin
6e10a51b74 raidz_test: Fix ZIO ABDs initialization
- When filling ABDs of several segments, consider offset.
- "Corrupt" ABDs with actually different data to fail something.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17977
2025-12-10 10:21:29 -08:00
Alexander Motin
001ce40cd4 raidz_test: Set io_offset reasonably
- io_offset of 1 makes no sense.  Set default to 0.
- Initialize io_offset in all cases.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17977
2025-12-10 10:21:29 -08:00
Alexander Motin
68c1df8db3 ZFS: Enable more logs for raidz_001_neg
The output is not so big here, so let's collect something useful.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17977
2025-12-10 10:21:29 -08:00
Alexander Motin
a41ef36858 DDT: Reduce global DDT lock scope during writes
Before this change DDT lock was taken 4 times per written block,
and as effectively a pool-wide lock it can be highly congested.
This change introduces a new per-entry dde_io_lock, protecting some
fields during I/O ready and done stages, so that we don't need the
global lock there.

According to my write tests on a 64-thread system with 4KB blocks, this
significantly reduces the global lock contention, reducing CPU usage
from 100% to the expected ~80% and increasing write throughput by 10%.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17960
2025-12-10 10:21:29 -08:00
Alexander Motin
a785ddc5f3 DDT: Switch to using wmsums for lookup stats
ddt_lookup() is very busy code under a highly congested global
lock.  Anything we can save here is very important.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17980
2025-12-10 10:21:29 -08:00
Alexander Motin
2aad3dee23 DDT: Make children writes inherit allocator
Even though unlike gang children it is not so critical for dedup
children to inherit parent's allocator, there is still no reason
for them to have allocation policy different from normal writes.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17961
2025-12-10 10:21:29 -08:00
Tony Hutter
cdbe788a39 CI: zfs-test-packages: Add in new repos
Test install from our new repos: zfs-latest, zfs-legacy,
zfs-2.3, zfs-2.2, from the zfs-test-packages workflow.
This on-demand workflow is used to verify that the zfs RPMs
in the repos are correct.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #17956
2025-12-10 10:21:29 -08:00
Rob Norris
d12eb47d96 config/kmap_atomic: initialise test data
6.18 changes kmap_atomic() to take a const pointer. This is no problem
for the places we use it, but Clang fails the test due to a warning
about being unable to guarantee that uninitialised data will definitely
not change. Easily solved by forcibly initialising it.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17954
2025-12-10 10:21:29 -08:00
Rob Norris
304810208e zvol_id: make array length properly known at compile time
Using strlen() in a static array declaration is a GCC extension. Clang
calls it "gnu-folding-constant" and warns about it, which breaks the
build. If it were widespread we could just turn off the warning, but
since there's only one case, let's just change the array to an explicit
size.
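
For illustration, here is a sketch of the problem and of one portable alternative. The commit itself switched to an explicit size; the sizeof form shown here is a different way to get a true constant expression. All names (SK_DEV_PREFIX, SK_NAME_MAX, dev_path, build_dev_path) are hypothetical and not taken from zvol_id.

```c
#include <stddef.h>
#include <string.h>

#define	SK_DEV_PREFIX	"/dev/"	/* hypothetical prefix */
#define	SK_NAME_MAX	64	/* hypothetical name length limit */

/*
 * The form Clang warns about relies on the compiler folding strlen():
 *
 *	static char dev_path[strlen(SK_DEV_PREFIX) + SK_NAME_MAX + 1];
 *
 * sizeof on a string literal is a genuine constant expression. It
 * counts the trailing NUL, hence the "- 1".
 */
static char dev_path[(sizeof (SK_DEV_PREFIX) - 1) + SK_NAME_MAX + 1];

/* Build "<prefix><name>" in dev_path; NULL if the name is too long. */
static const char *
build_dev_path(const char *name)
{
	if (strlen(name) > SK_NAME_MAX)
		return (NULL);
	strcpy(dev_path, SK_DEV_PREFIX);
	strcat(dev_path, name);
	return (dev_path);
}
```

With these assumed values the array holds 5 + 64 + 1 bytes, and its size is known at compile time without any extension.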

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17954
2025-12-10 10:21:29 -08:00
Rob Norris
aa091a17bd Linux: bump -std to gnu11
Linux switched from -std=gnu89 to -std=gnu11 in 5.18
(torvalds/linux@e8c07082a8). We've always overridden that with gnu99
because we use some newer features.

More recent kernels are using C11 features in headers that we include.
GCC generally doesn't seem to care, but more recent versions of Clang
seem to be enforcing our gnu99 override more strictly, which breaks the
build in some configurations.

Just bumping our "override" to match the kernel seems to be the easiest
workaround. It's an effective no-op since 5.18, while still allowing us
to build on older kernels.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17954
2025-12-10 10:21:29 -08:00
Alexx Saver
f45622ff42 chksum: run 256K benchmark on demand, preserve chksum_stat_data
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexx Saver <lzsaver.eth@ethermail.io>
Co-authored-by: Adam Moss <c@yotes.com>
Closes #17945
Closes #17946
2025-12-10 10:21:29 -08:00
Alexander Motin
2e09f166f0 FreeBSD: Fix uninitialized variable error
On FreeBSD errno is defined as (* __error()), which means the compiler
can't tell whether two consecutive reads will return the same value.
Without that knowledge, the reported error is formally correct.

Caching errno in a local variable fixes the issue.
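
The general pattern can be sketched as follows. This is a simplified illustration rather than the code touched by the commit, and try_open() is a hypothetical helper.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Try to open a path read-only. Returns 0 on success, or the errno
 * value captured immediately after the failing call.
 */
static int
try_open(const char *path)
{
	int fd = open(path, O_RDONLY);

	if (fd < 0) {
		/*
		 * Read errno exactly once. Every later decision uses the
		 * local copy, so the compiler sees a single value and no
		 * intervening call can overwrite it.
		 */
		int err = errno;

		return (err);
	}
	(void) close(fd);
	return (0);
}
```

Because all branches test the same local value, the compiler can prove that exactly one of them is taken, which is what a repeated read of errno cannot guarantee.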

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17975
2025-12-10 10:21:29 -08:00
Shreshth3
c8ecd63acd zpool: fix special vdev -v -o conflict
Right now, running `zpool list` with -v and -o passed
does not work properly for special vdevs. This commit
fixes that problem.

See the discussion on #17839.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Shreshth Srivastava <shreshthsrivastava2@gmail.com>
Closes #17932
2025-12-10 10:21:29 -08:00
Brian Behlendorf
d06ebddee4 CI: Add smatch static analysis workflow
Smatch is an actively maintained kernel-aware static analyzer
for C with a low false positive rate.  Since the code checker
can be run relatively quickly against the entire OpenZFS code
base (15 min) it makes sense to add it as a GitHub Actions
workflow.  Today smatch reports a significant number of warnings
so the workflow is configured to always pass as long as the
analysis was run.  The results are available for reference.
Long term it would be ideal to resolve all of the errors/warnings
at which point the workflow can be updated to fail when new
problems are detected.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Toomas Soome <tsoome@me.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17935
2025-12-09 15:34:45 -08:00
Toomas Soome
040c533280 cmd/zpool cstyle issues
add missing headers.
usage() is no-return, so anything after call to it is unreachable code.
use (void) cast where we do ignore return value.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #17885
2025-12-09 15:34:13 -08:00
Brian Behlendorf
099f69ff5d Tag 2.4.0-rc4
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2025-11-12 13:10:09 -08:00
Brian Behlendorf
7a919fb70c Update all ABI files
Refresh all ABI files using the CI generated files to reflect
the library interfaces to be published for the 2.4 release.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17911
2025-11-12 13:07:36 -08:00
Brian Behlendorf
5714090fb9 libspl: hide zfs_tunable_* symbols
The zfs_tunable_* functions are a public interface which is
part of the internal libspl convenience library.  They should
be hidden to prevent an unnecessary ABI change in installed
libraries which link against libspl (e.g. libzfs_core, libuutil).

We do already leak long standing libspl symbols.  This commit is
solely intended to prevent leaking these new ones until this is
properly sorted out.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17911
2025-11-12 13:07:32 -08:00
Brian Behlendorf
5b2489caf2 Bump SONAME of libzfs and libzpool
The ABI of libzfs and libzpool have breaking changes since the
last major release.  Bump the SONAME for the upcoming 2.4 release
branch to libzfs7 and libzpool7.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17911
2025-11-12 13:07:28 -08:00
Brian Behlendorf
ff536b1538 Bump SONAME on libnvpair
The nvlist_snprintf() function was added to the ABI of libnvpair.
No other symbols were modified or removed.  Bump the library-info
SONAME current and age args to reflect this is a minor library
version update.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17911
2025-11-12 13:07:23 -08:00
Adi-Goll
7ebb5e9b3f Reduce timeout to zero when running inside a container
Detect container environments and set timeout to zero unless
ZFS_MODULE_TIMEOUT is already set. This avoids an unnecessary ten
second delay after running zfs/zpool commands in a container where
/dev/zfs is unavailable.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Adi Gollamudi <adigollamudi@gmail.com>
Closes #15165
Closes #17922
2025-11-12 13:07:20 -08:00
Mariusz Zaborski
1e8c96d7d5 Add knob to disable slow io notifications
Introduce a new vdev property `VDEV_PROP_SLOW_IO_REPORTING` that
allows users to disable notifications for slow devices.
This prevents ZED and/or ZFSD from degrading the pool due to slow
I/O.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mariusz Zaborski <oshogbo@FreeBSD.org>
Closes #17477
2025-11-12 13:07:14 -08:00
Alexander Motin
41878d57ea Add BRT support to zpool prefetch command
Implement BRT (Block Reference Table) prefetch functionality similar
to existing DDT prefetch.  This allows preloading BRT metadata into
ARC to improve performance for block cloning operations and frees
of earlier cloned blocks.

Make -t parameter optional.  When omitted, prefetch all supported
metadata types (both DDT and BRT now).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17890
2025-11-12 13:07:09 -08:00
Alexander Motin
002bc3da6a BRT: Increase block size from 4KB to 8KB
According to my observations, BRT ZAPs are typically compressible
3:1 for data and 2:1 for indirects.  With ashift=12, typical these
days, this means that by increasing the block size to 8KB we may get
most of the possible compression, cutting the on-disk and in-ARC BRT
footprint in half at the cost of some compression/decompression
overhead, but without real write inflation, only some increase in
dirty data.

Increasing to 32KB, as for DDT, could further improve compression
and storage efficiency, but at the cost of write inflation and a
much bigger increase in dirty data, which we cannot properly control
now.  So let's leave this for a time when the BRT log gets implemented.
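
The arithmetic behind these numbers can be checked with a simplified model. It assumes that an allocation is the compressed size rounded up to the 2^ashift sector size and ignores every other allocation detail; allocated_size() is a hypothetical helper.

```c
#include <stdint.h>

/*
 * Bytes allocated for one block in this simplified model: the
 * compressed size rounded up to a whole 2^ashift sector.
 */
static uint64_t
allocated_size(uint64_t logical, uint64_t ratio, unsigned ashift)
{
	uint64_t sector = UINT64_C(1) << ashift;
	/* ceil(logical / ratio) */
	uint64_t compressed = (logical + ratio - 1) / ratio;

	return ((compressed + sector - 1) / sector * sector);
}
```

With ashift=12, a 4KB block at 3:1 still occupies 4096 bytes, so compression saves nothing. An 8KB block at either 3:1 or 2:1 also fits in 4096 bytes, which halves the footprint. A 32KB block at 3:1 rounds up to 12288 bytes, about 37.5% of the original size.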

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17916
2025-11-12 13:07:04 -08:00
Alexander Motin
e895c76194 ZAP: Remove dmu_object_info_from_dnode() call
dmu_object_info_from_dnode() takes two locks and copies plenty of
data that we don't need in zap_lockdir_impl().  Just read dn_type
directly in this hot path.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17921
2025-11-12 13:07:00 -08:00
Rob Norris
ac0bc4cc00 spa_misc: add an API for spa_namespace_lock
This is useful as debugging support, as it lets namespace lock
operations be traced directly. It will also be useful for future work to
reduce the use of spa_namespace_lock, traditionally a source of
difficult deadlocks.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17906
2025-11-12 13:06:54 -08:00
Alexander Motin
e305c7d596 BRT: Fix ranges to blocks conversion math
BRT_RANGESIZE_TO_NBLOCKS() takes the number of ranges as its argument.
To get the number of blocks we should multiply it by the entry size,
not divide by it, as the macro did because of missing parentheses.

Before #17875 this could cause small memory corruptions for vdevs
bigger than 64TB, but the change made the bug more noticeable.
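
The effect of the missing parentheses can be shown with a hypothetical reconstruction. The macros below are not the actual BRT definitions, and the block and entry sizes are assumed values chosen only to make the difference visible.

```c
#include <stdint.h>

#define	SK_BLOCKSIZE	32768u	/* hypothetical block size in bytes */
#define	SK_ENTRYSIZE	8u	/* hypothetical entry size in bytes */

/* Missing parentheses: the entry size ends up as a second divisor. */
#define	SK_NBLOCKS_BAD(n)	\
	(((n) - 1) / SK_BLOCKSIZE / SK_ENTRYSIZE + 1)

/* Intended: one block holds SK_BLOCKSIZE / SK_ENTRYSIZE entries. */
#define	SK_NBLOCKS_GOOD(n)	\
	(((n) - 1) / (SK_BLOCKSIZE / SK_ENTRYSIZE) + 1)

static uint64_t
nblocks_bad(uint64_t nranges)
{
	return (SK_NBLOCKS_BAD(nranges));
}

static uint64_t
nblocks_good(uint64_t nranges)
{
	return (SK_NBLOCKS_GOOD(nranges));
}
```

For one million ranges the parenthesized form yields 245 blocks, while the unparenthesized form yields only 4. An array sized from the smaller number would be far too short, which is the kind of under-allocation that leads to memory corruption.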

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17886
Closes #17915
2025-11-12 13:06:48 -08:00
Adi-Goll
e1734111fd Update man page description of zpool rewind
Update description of zpool import --rewind-to-checkpoint in
man/man7/zpoolconcepts.7 to explain that rewinding automatically
discards a checkpoint.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Adi Gollamudi <adigollamudi@gmail.com>
Closes #12646
Closes #17918
2025-11-12 13:06:43 -08:00
Alexander Motin
aaf374bd40 ZIO: Set minimum number of free issue threads to 32
Free issue threads might block waiting for synchronous DDT, BRT or
GANG header reads. So unlike other taskqs using ZTI_SCALE to scale
with number of CPUs, here we also need some amount of threads to
potentially saturate pool reads.  I am not sure we always want the
96 threads we had before ZTI_SCALE introduction at #11966 on small
systems, but let's make it at least 32.

While here, make free taskqs configurable, similar to read and
write ones.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17903
2025-11-12 13:06:39 -08:00
rmacklem
583db40030 FreeBSD: Add support for _PC_CASE_INSENSITIVE
FreeBSD now has a pathconf name called _PC_CASE_INSENSITIVE
used to check if a file system performs case insensitive
name lookups.

This patch adds support for this name.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rick Macklem <rmacklem@uoguelph.ca>
Closes #17908
2025-11-12 13:06:36 -08:00
Brian Behlendorf
84dd55510b zstd: disable intrinsics
Disable the aarch64 NEON SIMD intrinsics for kernel builds.  Safely
using them in the kernel context requires saving/restoring the FPU
registers which is not currently done.

Additionally, remove the aarch64 optimized PREFETCH_L1 and PREFETCH_L2
instructions.  Rely on the more portable compiler built-ins.

This lets us remove the problematic workaround in the aarch64_compat.h
header which undefines the __aarch64__ macro.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17904
Closes #17852
2025-11-12 13:06:22 -08:00
Adi-Goll
015729a11b Fix typo in vdev_raidz.c
Change the spelling of "begining" on line 4875 to
"beginning".

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Adi Gollamudi <adigollamudi@gmail.com>
Closes #17905
2025-11-12 13:06:19 -08:00
Toomas Soome
4fd926ab40 libzfs: ignoring unreachable code
The thread runs an infinite loop and, under a certain condition, exits
both the loop and the thread with pthread_exit(). After the loop there
is also code that calls pthread_cleanup_pop() and returns from the
thread.

The problem is that modern compilers can recognize that the statements
after the loop are never reached, and therefore treat them as dead
code.

Instead of calling pthread_exit(), it is better to break out of the
loop and let the final statements work as intended, because we need to
keep pthread_cleanup_pop() anyhow. Of course, it is a matter of taste
whether return or pthread_exit() is the very last statement in this
function.
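
The resulting shape can be sketched like this. It is a minimal illustration, not the libzfs thread; worker(), worker_cleanup() and run_worker() are hypothetical names.

```c
#include <pthread.h>
#include <stddef.h>

static int cleanup_ran;		/* set by the cleanup handler */

static void
worker_cleanup(void *arg)
{
	(void) arg;
	cleanup_ran = 1;
}

/* Thread body: loop until the work runs out, then fall through. */
static void *
worker(void *arg)
{
	int *remaining = arg;

	pthread_cleanup_push(worker_cleanup, NULL);
	for (;;) {
		if (*remaining == 0)
			break;		/* instead of pthread_exit(NULL) */
		(*remaining)--;
	}
	pthread_cleanup_pop(1);		/* reachable; runs the handler */
	return (NULL);
}

/* Run one worker to completion; returns 1 if the handler ran. */
static int
run_worker(int items)
{
	pthread_t tid;

	cleanup_ran = 0;
	if (pthread_create(&tid, NULL, worker, &items) != 0)
		return (-1);
	(void) pthread_join(tid, NULL);
	return (cleanup_ran);
}
```

Because the loop ends with break, the compiler can see that pthread_cleanup_pop() and the final return are reachable, and the push/pop pair stays balanced in one lexical scope as POSIX requires.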

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #17900
2025-11-12 13:06:15 -08:00
Rob Norris
7b121388fb man: describe zfs-rewrite method and properties
We've heard anecdotes that suggest some
confusion/surprise/disappointment that a changed recordsize is not
applied during rewrite. Until such time as we actually can do that, we
can at least explicitly mention it as something that doesn't work.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17898
2025-11-12 13:06:10 -08:00