Compare commits

405 Commits

Author SHA1 Message Date
Ryan Moeller
ac0fd40c8c Add zpool properties for allocation class space
The existing zpool properties accounting pool space (size, allocated,
fragmentation, expandsize, free, capacity) are based on the normal
metaslab class or are cumulative properties of several classes combined.

Add properties reporting the space accounting metrics for each metaslab
class individually.

Also introduce pool-wide AVAIL, USABLE, and USED properties reporting
values corresponding to FREE, SIZE, and ALLOC deflated for raidz.

Update ZTS to recognize the new properties and validate reported values.

While in zpool_get_parsable.cfg, add "fragmentation" to the list of
parsable properties.

Sponsored-by: Klara, Inc.
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Ryan Moeller <ryan.moeller@klarasystems.com>
Closes #18238
2026-03-02 15:50:23 -08:00
Ryan Moeller
6ba3f915d0 zcommon: Fix description of vdev capacity format
Capacity is reported as a percentage not a size.

Sponsored-by: Klara, Inc.
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Ryan Moeller <ryan.moeller@klarasystems.com>
Closes #18238
2026-03-02 15:49:23 -08:00
Akash B
f8e5af53e9
Fix redundant declaration of dsl_pool_t
Remove redundant dsl_pool variable and duplicate spa_get_dsl()
call in vdev_rebuild_thread.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Akash B <akash-b@hpe.com>
Closes #18263
2026-02-27 10:39:52 -08:00
Andriy Tkachuk
f8457fbdc4
Fix deadlock on dmu_tx_assign() from vdev_rebuild()
vdev_rebuild() is always called with spa_config_lock held in
RW_WRITER mode. However, when it tries to call dmu_tx_assign()
the latter may hang on dmu_tx_wait() waiting for available txg.
But that available txg may not happen because txg_sync takes
spa_config_lock in order to process the current txg. So we have
a deadlock case here:

 - dmu_tx_assign() waits for txg holding spa_config_lock;
 - txg_sync waits for spa_config_lock not progressing with txg.

Here are the stacks:

    __schedule+0x24e/0x590
    schedule+0x69/0x110
    cv_wait_common+0xf8/0x130 [spl]
    __cv_wait+0x15/0x20 [spl]
    dmu_tx_wait+0x8e/0x1e0 [zfs]
    dmu_tx_assign+0x49/0x80 [zfs]
    vdev_rebuild_initiate+0x39/0xc0 [zfs]
    vdev_rebuild+0x84/0x90 [zfs]
    spa_vdev_attach+0x305/0x680 [zfs]
    zfs_ioc_vdev_attach+0xc7/0xe0 [zfs]

    cv_wait_common+0xf8/0x130 [spl]
    __cv_wait+0x15/0x20 [spl]
    spa_config_enter+0xf9/0x120 [zfs]
    spa_sync+0x6d/0x5b0 [zfs]
    txg_sync_thread+0x266/0x2f0 [zfs]

The solution is to pass txg returned by spa_vdev_enter(spa)
at the top of spa_vdev_attach() to vdev_rebuild() and call
dmu_tx_create_assigned(txg) which doesn't wait for txg.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Akash B <akash-b@hpe.com>
Reviewed-by: Alek Pinchuk <apinchuk@axcient.com>
Signed-off-by: Andriy Tkachuk <andriy.tkachuk@seagate.com>
Closes #18210
Closes #18258
2026-02-26 11:18:02 -08:00
Rob Norris
f3d4c79496
zpl_super: prefer "new" mount API when available
This API has been available since kernel 5.2, and having it available
(almost) everywhere should give us a lot more flexibility for mount
management in the future.

Sponsored-by: TrueNAS
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18260
2026-02-25 13:17:33 -08:00
Rob Norris
09c27a14a3 icp: add SHA512 implementation using Intel SHA512 extensions
Generated from crypto/sha/asm/sha512-x86_64.pl in
openssl/openssl@241d4826f8.

Sponsored-by: TrueNAS
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Attila Fülöp <attila@fueloep.org>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18233
2026-02-25 12:48:30 -08:00
Rob Norris
3547a358fd simd: detect and surface support for Intel SHA512 extensions
Recent Intel CPUs (starting with Arrow Lake and Lunar Lake) include new
vectorised SHA512 instructions. Detect them and make them available to
the rest of the system.

Note the internal name "sha512ext". This is to disambiguate from other
uses of "sha512".

Sponsored-by: TrueNAS
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Attila Fülöp <attila@fueloep.org>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18233
2026-02-25 12:47:48 -08:00
clefru
6495dafd58
range_tree: use zfs_panic_recover() for partial-overlap remove
zfs_range_tree_remove_impl() used a bare panic() when a segment to be
removed was not completely overlapped by an existing tree entry.  Every
other consistency check in range_tree.c uses zfs_panic_recover(), which
respects the zfs_recover tunable and allows pools with on-disk
corruption to be imported and recovered.  This one call was
inconsistent, making the partial-overlap case unrecoverable regardless
of zfs_recover.

Replace panic() with zfs_panic_recover() so that operators can set
zfs_recover=1 to import a corrupted pool and reclaim data, consistent
with all other range tree error paths.
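
A minimal userspace sketch of the pattern, assuming a tunable that downgrades
a fatal consistency check to a logged warning.  The names example_recover,
example_warnings and example_panic_recover() are hypothetical stand-ins and
are not the actual zfs_panic_recover() implementation:

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the zfs_recover tunable. */
static int example_recover = 0;

/* Count of downgraded failures, kept only so the sketch is observable. */
static int example_warnings = 0;

/*
 * Report a consistency failure.  With the tunable clear this is fatal;
 * with it set, the failure is logged and the caller continues.
 */
static void
example_panic_recover(const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
	fputc('\n', stderr);

	if (!example_recover)
		abort();
	example_warnings++;
}
```

A bare panic() corresponds to the abort() branch being taken unconditionally,
which is why the tunable had no effect on that one check.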

Related-to: https://github.com/openzfs/zfs/issues/13483
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Clemens Fruhwirth <clemens@endorphin.org>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Closes #18255
2026-02-25 11:26:10 -08:00
Tony Hutter
4da3f059a3
CI: Remove deprecated Fedora 41
Fedora 41 was deprecated on Dec 15 2025.  Remove it from CI tests.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #18261
2026-02-25 11:20:23 -08:00
Alexander Motin
991fc56fae
Introduce dedupused/dedupsaved pool properties
Currently there is only a dedup ratio reported via pool properties.
If dedup is enabled only for some datasets, it is impossible to say
how much space the ratio actually covers.  Fix this by introducing
dedupused/dedupsaved pool properties, similar to earlier added
block cloning ones.  Combined with work to expose allocation classes
stats, it should give user-space enough visibility to correlate
`zpool list` and `zfs list` space numbers.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Ryan Moeller <ryan.moeller@klarasystems.com>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18245
2026-02-25 09:41:38 -05:00
Mateusz Piotrowski
3408332d71
zhack: Fix importing large allocation profiles on small pools (#18256)
This patch fixes a segmentation fault in zhack metaslab leak which might
be triggered by feeding zhack with a fragmentation profile that's
exported from a pool larger than the target pool.

Fixes: 8f15d2e4d5
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>

Signed-off-by: Mateusz Piotrowski <mateusz.piotrowski@klarasystems.com>
2026-02-24 10:24:22 -08:00
Rob Norris
0f608aa6ca Linux 7.0: add shims for the fs_context-based mount API
The traditional mount API has been removed, so detect when it's not
available and instead use a small adapter to allow our existing mount
functions to keep working.

Sponsored-by: TrueNAS
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18216
2026-02-23 09:45:12 -08:00
Rob Norris
d34fd6cff3 Linux 7.0: posix_acl_to_xattr() now allocates memory
Kernel devs noted that almost all callers to posix_acl_to_xattr() would
check the ACL value size and allocate a buffer before making the call. To
reduce the repetition, they've changed it to allocate this buffer
internally and return it.

Unfortunately that's not true for us; most of our calls are from
xattr_handler->get() to convert a stored ACL to an xattr, and that call
provides a buffer. For now we have no other option, so this commit
detects the new version and wraps to copy the value back into the
provided buffer and then free it.
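
A minimal userspace sketch of the wrapping approach, assuming an allocating
converter and a caller that supplies its own buffer.  The functions
example_convert_alloc() and example_convert_into() are hypothetical and do
not reproduce the kernel's posix_acl_to_xattr() signature:

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for an allocating converter: returns a newly
 * allocated buffer and stores its length in *sizep, or NULL on failure.
 */
static void *
example_convert_alloc(const char *src, size_t *sizep)
{
	size_t len = strlen(src);
	void *out = malloc(len == 0 ? 1 : len);

	if (out == NULL)
		return (NULL);
	memcpy(out, src, len);
	*sizep = len;
	return (out);
}

/*
 * Adapter with the older calling convention: the caller provides the
 * buffer.  Returns the number of bytes required, or a negative errno.
 * A NULL buffer is treated as a size query.
 */
static long
example_convert_into(const char *src, void *buf, size_t buflen)
{
	size_t size = 0;
	void *tmp = example_convert_alloc(src, &size);
	long ret = (long)size;

	if (tmp == NULL)
		return (-ENOMEM);
	if (buf != NULL) {
		if (size > buflen)
			ret = -ERANGE;
		else
			memcpy(buf, tmp, size);
	}
	/* The temporary allocation never escapes the adapter. */
	free(tmp);
	return (ret);
}
```

The cost of this shape is one extra allocation and copy per call, which is
the trade-off the message above accepts for now.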

Sponsored-by: TrueNAS
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18216
2026-02-23 09:44:48 -08:00
Rob Norris
204de946eb Linux 7.0: blk_queue_nonrot() renamed to blk_queue_rot()
It does exactly the same thing, just inverts the return. Detect its
presence or absence and call the right one.
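
A minimal sketch of the detect-and-invert idea in plain C.  The struct, the
HAVE_EXAMPLE_QUEUE_ROT macro and the example_* functions are hypothetical
stand-ins for the configure check result and the kernel helpers:

```c
#include <stdbool.h>

/* Hypothetical stand-in for a kernel request queue. */
struct example_queue {
	bool rotational;
};

/* HAVE_EXAMPLE_QUEUE_ROT stands in for a configure-time detection result. */
#ifdef HAVE_EXAMPLE_QUEUE_ROT
/* Newer interface: reports whether the device is rotational. */
static bool
example_queue_rot(const struct example_queue *q)
{
	return (q->rotational);
}
#define	example_is_nonrot(q)	(!example_queue_rot(q))
#else
/* Older interface: reports whether the device is non-rotational. */
static bool
example_queue_nonrot(const struct example_queue *q)
{
	return (!q->rotational);
}
#define	example_is_nonrot(q)	(example_queue_nonrot(q))
#endif
```

Callers use only example_is_nonrot(), so the inverted return value is
handled in one place regardless of which helper exists.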

Sponsored-by: TrueNAS
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18216
2026-02-23 09:44:20 -08:00
Attila Fülöp
7744f04962
SIMD: libspl: test the correct CPUID bit for AVX512VL
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #18254
2026-02-23 09:42:25 -08:00
Christos Longros
6a717f31e6
Improve misleading error messages for ZPOOL_STATUS_CORRUPT_POOL
When devices are missing or claimed by another subsystem (e.g.
mdadm, LVM), zpool import reports "The pool metadata is corrupted"
and suggests destroying the pool. This is misleading because the
metadata is not necessarily corrupted -- it may simply be incomplete
due to inaccessible devices.

Update the status, action, and recovery messages to acknowledge
that missing devices can trigger this status, and suggest checking
device availability before resorting to pool destruction.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chris Longros <chris.longros@gmail.com>
Closes #18251
Closes #8236
2026-02-23 09:41:24 -08:00
Louis Leseur
bbf0106c6b
build: get objtool from $kernelbuild
On systems where `$kernelsrc` is different than `$kernelbuild`, the
objtool binary will be located in `$kernelbuild` as it's the result of
running `make prepare` during kernel build.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Attila Fülöp <attila@fueloep.org>
Signed-off-by: Louis Leseur <louis.leseur@gmail.com>
Closes #18248
Closes #18249
2026-02-23 09:39:51 -08:00
MigeljanImeri
4975430cf5
Add vdev property to disable vdev scheduler
Added vdev property to disable the vdev scheduler.
The intention behind this property is to improve IOPS
performance when using O_DIRECT.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: MigeljanImeri <ImeriMigel@gmail.com>
Closes #17358
2026-02-23 09:34:33 -08:00
Tony Hutter
d2f5cb3a50
Move range_tree, btree, highbit64 to common code
Break out the range_tree, btree, and highbit64/lowbit64 code from kernel
space into shared kernel and userspace code.  This is needed for the
updated `zpool status -vv` error byte range reporting that will be
coming in a future commit.  That commit needs the range_tree code in
kernel and userspace.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #18133
2026-02-22 11:43:51 -08:00
Rob Norris
168023b603
Linux 7.0: explicitly set setlease handler to kernel implementation
The upcoming 7.0 kernel will no longer fall back to generic_setlease(),
instead returning EINVAL if .setlease is NULL. So, we set it explicitly.

To ensure that we catch any future kernel change, adds a sanity test for
F_SETLEASE and F_GETLEASE too. Since this is a Linux-specific test,
also a small adjustment to the test runner to allow OS-specific helper
programs.

Sponsored-by: TrueNAS
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18215
2026-02-22 11:39:06 -08:00
Rob Norris
d11c661544 zdb: handle key load/derive failures a bit more gracefully
There's no real need to outright crash if key loading fails; we can
just unwind nicely.

Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18230
2026-02-20 13:37:43 -08:00
Rob Norris
9f874ad092 zdb: don't try to load key for unencrypted dataset
Previously using -K/--key on an unencrypted dataset would trip a VERIFY,
because the dataset has nowhere to load the key into.

Now, just ignore it. This makes zdb much easier to drive when there's a
mix of encrypted and unencrypted datasets, as the key can be provided for
all of them (at least, assuming the same encryption root, which is a
common enough case).

Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18230
2026-02-20 13:37:11 -08:00
Rob Norris
b021cb60aa ZTS: make get_same_blocks() fail harder if zdb fails
Because it's called in $(...), it will swallow all errors, so we have to
work harder to recognise failure and echo a string that can't ever match
what the test is expecting.

Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18230
2026-02-20 13:36:49 -08:00
Rob Norris
aeb9fb3828 sha2_test: do correctness checks for all implementations
Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Attila Fülöp <attila@fueloep.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18232
2026-02-19 15:16:36 -08:00
Rob Norris
b291d9aa22 get_cpu_freq: handle CPUs with variable frequency
If a CPU has variable frequency, then lscpu will list separate "CPU min
freq" and "CPU max freq" values. In this case, take the maximum.

Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Attila Fülöp <attila@fueloep.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18232
2026-02-19 15:16:18 -08:00
Alexander Motin
d06a1d9ac3
Fix available space accounting for special/dedup (#18222)
Currently, spa_dspace (base to calculate dataset AVAIL) only includes
the normal allocation class capacity, but dd_used_bytes tracks space
allocated across all classes.  Since we don't want to report free
space of other classes as available (we can't promise new allocations
will be able to use it), report only allocated space, similar to how
we report space saved by dedup and block cloning.

Since we need deflated space here, make allocation classes track
deflated allocated space also.  While here, make mc_deferred also
deflated, matching its use contexts.  Also while there, use
atomic_load() to read the allocation class stats.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18190
Closes #18222
2026-02-19 10:36:35 -08:00
Tony Hutter
640a217faf
CI: Test & fix Linux ZFS built-in build
ZFS can be built directly into the Linux kernel.  Add a test build
of this to the CI to verify it works.  The test build is only enabled
on Fedora runners (since they run the newest kernels) and is done in
parallel with ZTS.  The test build is done on vm2, since it typically
finishes ~15min before vm1 and thus has time to spare.

In addition:

- Update 'copy-builtin' to check that $1 is a directory
- Fix some VERIFYs that were causing the built-in build to fail

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #18234
2026-02-19 10:15:41 -08:00
Attila Fülöp
c8a72a27e5
ICP: AES-GCM assembly: remove unused Gmul functions
In the AES-GCM assembly files we are defining Gmul functions we
don't use anywhere.

Just remove the dead code.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #18226
2026-02-19 10:10:02 -08:00
Alexander Motin
370570890f
Remove parent ZIO from dbuf_prefetch()
I am not sure why it was added there 10 years ago, but it seems not
needed now.  According to my tests removing it improves sequential
read performance with recordsize=4K by 5-10% by reducing the CPU
overhead in prefetcher.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Akash B <akash-b@hpe.com>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18214
2026-02-18 18:12:13 -08:00
Attila Fülöp
d489677280
ICP: AES-GCM VAES-AVX2: fix typos and document source files
Require AVX2 compiler support and document source files for
`aesni-gcm-avx2-vaes.S`.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #18225
2026-02-17 16:51:32 -08:00
Jessica Clarke
bfb276e55c
freebsd: Fix TIMESPEC_OVERFLOW for PowerPC
Once upon a time, 32-bit PowerPC did indeed have a 32-bit time_t, but
FreeBSD 12.0 switched to a 64-bit time_t for PowerPC as an ABI break,
which predates the addition of FreeBSD support to OpenZFS. Moreover,
64-bit PowerPC has existed since FreeBSD 9.0, where __powerpc__ is also
defined (alongside __powerpc64__ to disambiguate), which has always had
a 64-bit time_t. This code has therefore always been wrong for all
PowerPC variants. Fix this by limiting the 32-bit case to just i386,
which is the only architecture in FreeBSD to have a 32-bit time_t and
not have broken ABI, due to its special legacy compatibility status.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Jessica Clarke <jrtc27@jrtc27.com>
Closes #18217
Closes #18218
2026-02-17 16:46:02 -08:00
Attila Fülöp
bee53d8c10
Linux 6.19 compat: in-tree build: fix duplicate GCM assembly functions
Linux 6.19 added an AES-GCM VAES-AVX2 assembly implementation. It's
basically a translation from the BoringSSL perlasm syntax to macro
assembly. We're using the same source, but in the form of the
perlasm-generated flat assembly, which shares some global function
names with the former. When building in-tree this results in the linker
failing due to the duplicate symbols.

To avoid the error we prepend `icp_` via a macro to our function
names.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Moch <mail@alexmoch.com>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #18204
Closes #18224
2026-02-17 13:09:41 -08:00
Alexander Motin
0f9564e85b
Simplify dnode_level_is_l2cacheable()
We should not dereference through dn_handle->dnh_dnode once we
already have a dnode pointer.  The result will be the same.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18212
2026-02-16 10:34:22 -05:00
Alexander Motin
ba970eb202
Cleanup allocation class selection
- For multilevel gang blocks it seemed possible to fall back from
normal to special class, since they don't have a proper object type,
and DMU_OT_NONE is a "metadata".  They should never fall back.
 - Fix possible inversion with zfs_user_indirect_is_special = 0,
when indirects are written to normal vdev, while small data goes to
special.  Make small indirect blocks also follow special_small_blocks there.
 - With special_small_blocks now applying to both files and ZVOLs,
make it apply to all non-metadata without extra checks, since there
are no other non-metadata types.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18208
2026-02-16 10:33:21 -05:00
Mariusz Zaborski
cdf89f413c
Flush RRD only when TXGs contain data
This change modifies the behavior of spa_sync_time_logger when
flushing the RRD database.

Previously, once the sync interval elapsed, a flush would always
be generated. On solid-state devices, especially when the pool was
otherwise idle, this caused disks to wake up solely to write RRD
data. Since RRD is best-effort telemetry, this behavior is
unnecessary and wasteful.

With this change, spa_sync_time_logger delays flushing until a TXG
that already contains data is being synced. The RRD update is
appended to that TXG instead of forcing the creation of
a new write-only TXG.

During pool export, flushing is forced regardless of whether
the TXG contains user data. At that stage, data durability takes
precedence and a write must be issued.
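
The decision described above can be sketched as a small predicate.  This is
an illustration only: example_should_flush() and its parameters are
hypothetical, and treating export as overriding the interval check is an
assumption, not something taken from spa_sync_time_logger:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide whether to flush the RRD in the TXG being synced.
 * Assumption: export forces a flush even if the interval has not elapsed.
 */
static bool
example_should_flush(uint64_t now, uint64_t last_flush, uint64_t interval,
    bool txg_has_data, bool exporting)
{
	if (exporting)
		return (true);
	if (now - last_flush < interval)
		return (false);
	/* Interval elapsed: piggyback only on a TXG that already has data. */
	return (txg_has_data);
}
```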

Sponsored by: [Wasabi Technology, Inc.; Klara, Inc.]
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com>
Closes #18082
Closes #18138
2026-02-11 11:35:45 -08:00
Marc Sladek
cc184fe98b
Fix send:raw permission for send -w -I
When performing an incremental raw send with intermediates (-w -I),
the standard 'send' permission was incorrectly required instead of
allowing 'send:raw'. This was due to a strict boolean comparison on
the 'rawok' flag in zfs_secpolicy_send() with non-boolean value.

This change normalizes the 'rawok' variable to be strictly 0/1 and
updates the test suite to properly verify delegated raw send behavior.
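
A minimal sketch of the pitfall, not the actual zfs_secpolicy_send() code:
a flag holding a non-zero value other than 1 fails a strict comparison
against a boolean until it is normalized.  EXAMPLE_RAWOK_FLAG and both
functions are hypothetical:

```c
#include <stdbool.h>

/* Hypothetical flag bit; any non-zero value other than 1 shows the issue. */
#define	EXAMPLE_RAWOK_FLAG	0x4

/* Strict comparison: true only when the value is exactly 1. */
static bool
rawok_strict(int rawok)
{
	return (rawok == true);
}

/* Normalized comparison: any non-zero value is collapsed to 1 first. */
static bool
rawok_normalized(int rawok)
{
	rawok = !!rawok;
	return (rawok == true);
}
```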

Introduced-by: https://github.com/openzfs/zfs/pull/17543
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Marc Sladek <marc@sladek.dev>
Closes #18198
Closes #18193
2026-02-11 10:30:26 -08:00
Tony Hutter
3463d40779
ZTS: Fix zed_synchronous_zedlet
Wait for scrub_finish (as the comments in the code suggest) rather
than trim_finish in zed_synchronous_zedlet.ksh.  This seems to
work around the ZTS failures in #18192.  Also, fix some typos.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #18192
Closes #18196
2026-02-11 10:05:14 -08:00
Tony Hutter
fdd70565cb
Linux 6.19 compat: META
Update the META file to reflect compatibility with the 6.19
kernel.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #18197
2026-02-11 09:37:02 -08:00
Christos Longros
040ba7a7ca
libzfs: improve error message for zpool create with ENXIO
When zpool create fails because a vdev cannot be opened (ENXIO),
the error falls through to zpool_standard_error() which reports
the generic 'one or more devices is currently unavailable'. This
is misleading when the real cause is a block size mismatch or
other device open failure.

Add an explicit ENXIO case in zpool_create()'s error handling to
provide a more descriptive message.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christos Longros <chris.longros@gmail.com>
Closes #18184
Closes #11087
2026-02-10 13:19:44 -08:00
Tony Hutter
e601a1fb77
CI: Test build Lustre against ZFS
The Lustre filesystem calls a number of exported ZFS functions.  Do a
test build on the Almalinux runners to make sure we're not breaking
Lustre.  We do the Lustre build in parallel with the normal ZTS test
for efficiency, since ZTS isn't very CPU intensive. The full Lustre
build takes around 15min when run on its own.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #18161
2026-02-10 09:54:17 -08:00
Alexander Motin
aa29455dd7
Restrict cloning with different properties
While technically it's not a problem to clone between datasets with
different properties, it might create expectation of new properties
being applied during data move, while actually it won't happen.
For copies and checksum it may mean incorrect safety expectations.
For dedup, compression and special_small_blocks -- performance and
space usage. New zfs_bclone_strict_properties tunable controls it.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18180
2026-02-10 09:53:24 -08:00
rmacklem
1412bdc6c2
zfs_vnops_os.c: Move a vput() to after zfs_setattr_dir()
Without this patch, the following crash can occur when
a file system is configured with "xattr=dir".

VNASSERT failed: locked not true at
 /posix-acl/freebsd-rdma/sys/kern/vfs_subr.c:5786 (assert_vop_locked)
    hold count flags ()
    flags ()
    lock type zfs: UNLOCKED
panic: zfs_dirent_lookup: vnode is not locked but should be
cpuid = 3
time = 1770520763
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b
vpanic() at vpanic+0x136/frame 0xfffffe00914c8270
panic() at panic+0x43/frame 0xfffffe00914c82d0
assert_vop_locked() at assert_vop_locked+0x78
zfs_dirent_lookup() at zfs_dirent_lookup+0x41
zfs_setattr_dir() at zfs_setattr_dir+0x123
zfs_setattr() at zfs_setattr+0x1389
zfs_freebsd_setattr() at zfs_freebsd_setattr+0x56b
VOP_SETATTR_APV() at VOP_SETATTR_APV+0x5d
setfown() at setfown+0xb1
kern_fchownat() at kern_fchownat+0x192

This patch fixes the problem by moving the vput() call for
attrzp to after the zfs_setattr_dir() call that takes it as
an argument.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rick Macklem <rmacklem@uoguelph.ca>
Closes: #18188
2026-02-10 09:29:37 -05:00
Tim Hatch
64bae56b00
Include missing newline in 'man' error
Because the `strerror` result doesn't include a newline, we need to add
one.  Observed on a minimal system that doesn't have `man` installed,
which behaves like this before the fix:

```
[root@upper tim]# zpool help import
couldn't run man program: No such file or directory[root@upper tim]#
```
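
A minimal sketch of the fix, assuming the message is built with a printf
style format.  format_man_error() is a hypothetical helper used only to make
the behaviour checkable; it is not the function changed by this commit:

```c
#include <stdio.h>
#include <string.h>

/*
 * Format an error line for a failed attempt to run the man program.
 * strerror() returns the message without a trailing newline, so the
 * format string supplies one.  Returns the snprintf() result.
 */
static int
format_man_error(char *buf, size_t len, int err)
{
	return (snprintf(buf, len, "couldn't run man program: %s\n",
	    strerror(err)));
}
```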

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Hatch <tim@timhatch.com>
Closes #18183
2026-02-09 10:19:08 -08:00
Alexander Motin
2646bd5585
Allow rewrite skip cloned and snapshotted blocks
Rewrite of cloned and snapshotted blocks can allocate additional
space, that may be undesired.  In some cases it may make sense
to still rewrite snapshotted blocks, expecting the snapshots to
rotate with time, freeing space.  In other cases rewrite of cloned
blocks may be acceptable, despite persistent space usage increase.
For this reason add them as separate flags to `zfs rewrite`.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18179
2026-02-09 10:17:56 -08:00
Rob Norris
15fbf534c6
AUTHORS: add names of recent new contributors
"Welcome to my house! Enter freely. Go safely, and leave something of
the happiness you bring!"

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #18189
2026-02-09 10:11:09 -08:00
Brian Behlendorf
ae488e496f ZTS: update the relevant mmp test cases
- mmp_concurrent_import: added test case to verify concurrent
  import correctness.  The pool may only be imported once.

- mmp_exported_import: an activity check is now required for pools
  which were cleanly exported if the system and pool hostids don't
  match.

- mmp_inactive_import: an activity check is now required for any
  pool which wasn't cleanly exported, even if the system and pool
  hostids match.

- mmp_on_uberblocks: updated expected uberblocks to take into account
  the value MMP_INTERVAL_DEFAULT is set to.

- mmp_reset_interval: reduce the number of iterations from 10 to 3.
  This is sufficient to verify functionality and significantly speeds
  up the test.

- mmp_on_uberblocks: adjust the thresholds and increase the runtime
  to avoid false positives observed in CI.

- Update tests to use 'zhack action idle' instead of ztest to improve
  the reliability of the tests.

- Add additional log_note messages to test cases which have multiple
  verification steps to make it clear which portion of a test failed
  when reviewing the logs.

- Replace default_setup/cleanup_noexit calls with 'zpool create' and
  'zpool destroy' calls to avoid additional unnecessary dataset
  creation work.

- Update activity/noactivity check helper functions to use the
  ZFS_LOAD_INFO_DEBUG information now available from 'zpool import'
  to determine if this activity check ran and why.  This is more
  reliable in the CI than measuring the runtime.

- Removed all mmp tests from the zts-report.py exceptions list.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Akash B <akash-b@hpe.com>
2026-02-09 09:36:18 -08:00
Brian Behlendorf
d4c0e52188 zhack: add "action idle" subcommand
In order to reliably test the multihost protection we need two (or more)
systems attempting to import the pool at the same time.  Historically, we've
used ztest running in userspace to simulate an active pool and attempted to
import the pool with the kernel modules.  This works but ztest is a bit
unwieldy for this and if it crashes for unrelated reasons it can result
in false positives.

All we really need is the pool imported in userspace so the MMP thread is
active and writing out uberblocks.  We can extend zhack which already knows
how to import the pool read/write and add an option to leave the pool open
and idle.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Akash B <akash-b@hpe.com>
2026-02-09 09:36:14 -08:00
Brian Behlendorf
731ff0a5ac zhack: add -G option to dump debug buffer
Add a -G option to zhack to dump the internal debug buffer on exit.
We were able to use the same code from zdb for this which was nice.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Akash B <akash-b@hpe.com>
2026-02-09 09:36:10 -08:00
Brian Behlendorf
20176224ee mmp: claim sequence id before final import
As part of SPA_LOAD_IMPORT add an additional activity check to
detect simultaneous imports from different hosts.  This check is
only required when the timing is such that there's no activity
for the read-only tryimport check to detect.  This extra
safety check operates as follows:

1. Repeats the following MMP check 10 times:
  a. Write out an MMP uberblock with the best txg and a random
     sequence id to all primary pool vdevs.
  b. Verify a minimum number of good writes such that even if
     the pool appears degraded on the remote host it will see
     at least one of the updated MMP uberblocks.
  c. Wait for the MMP interval; this leaves a window for other
     racing hosts to make similar modifications which can be
     detected.
  d. Call vdev_uberblock_load() to determine the best uberblock
     to use, this should be the MMP uberblock just written.
  e. Verify the txg and random sequence number match the MMP
     uberblock written in 1a.

2. Restore the original MMP uberblocks.  This allows the check
   to be performed again if the pool fails to import for an
   unrelated reason.

This change also includes some refactoring and minor improvements.

- Never try loading earlier txgs during import when the import
  fails with EREMOTEIO or EINTR.  These errors don't indicate
  the txg is damaged but instead that it's either in use on a
  remote host or the import was interactively cancelled.  No
  rewind is also performed for EBADD which can result from a
  stale trusted config when doing a verbatim import.

- Refactor the code for consistent logging of the multihost
  activity check using spa_load_note() and console messages
  indicating when the activity check was triggered and the result.

- Added MMP_*_MASK and MMP_SEQ_CLEAR() macros to allow easier
  modification of the sequence number in an uberblock.

- Added ZFS_LOAD_INFO_DEBUG environment variable which can be
  set to dump to stdout the spa_load_info nvlist returned
  during import.  This is used by the updated mmp test cases
  to determine if an activity check was run and its result.

- Standardize the mmp messages similarly to make it easier to
  find all the relevant mmp lines in the debug log.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Akash B <akash-b@hpe.com>
2026-02-09 09:36:01 -08:00
Brian Behlendorf
2f048ced4d mmp: add spa_load_name() for tryimport
Tryimport adds a unique prefix to the pool name to avoid name
collisions.  This makes it awkward to log user-friendly info
during a tryimport.  Add a spa_load_name() function which can
be used to report the unmodified pool name.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Akash B <akash-b@hpe.com>
2026-02-09 09:35:03 -08:00
Brian Behlendorf
62a1bf7d19 mmp: move "Starting import" log message
Move the "Starting import" log message into the import block so
it's matched with the "Finished importing" debug message.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Akash B <akash-b@hpe.com>
2026-02-09 09:34:57 -08:00
Brian Behlendorf
a9564b1787 mmp: further restrict mmp exported pool check
For cleanly exported pools there exists a small window where
both systems may determine it's safe to import the pool and skip
the activity check.  Only allow the check to be skipped when the
last imported hostid matches the system's hostid and the pool was
cleanly exported.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Akash B <akash-b@hpe.com>
2026-02-09 09:32:58 -08:00
Austin Wise
4f180e095a
Fix activating large_microzap on receive
This ensures that the in-memory state of the feature is recorded and
that `dsl_dataset_activate_feature` is not called when the feature
is already active.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Austin Wise <AustinWise@gmail.com>
Closes #18143
Closes #18144
2026-02-05 15:48:03 -08:00
George Shammas
20f94ef24a
pyzfs: remove unimplemented libzfs_core functions from pyzfs
As per #9008, pyzfs implements and documents several functions that
would be very useful, but they try to call C functions in libzfs_core.
These functions do not exist in libzfs_core, and in the ~7 years since
the ticket was created they still have not been added.

It seems unlikely that these functions will get implemented, though 2
years ago, ~5 years after that ticket, lzc_get_props was implemented in
23a489a411, which enabled getting properties in
pyzfs. Sadly the first thing the pyzfs function for lzc_get_props does
is call _list, which calls lzc_list, which is not implemented. And the
functions to set or inherit properties are still missing.

Having these functions in pyzfs is misleading; they are footguns and
time wasters when evaluating pyzfs.

Removing these functions from pyzfs means that _if_ these functions are
added in libzfs_core, then pyzfs will also need to re-implement these
functions. It's a shame, because these py functions have good
documentation and tests. Funny enough the tests are auto skipped if it
detects that the functions don't exist in libzfs_core.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: George Shammas <george@shamm.as>
Closes #9008
Closes #18162
2026-02-05 15:34:55 -08:00
Alexander Motin
21bbe7cb67
Improve caching for dbuf prefetches
To avoid read errors with a transaction open, dmu_tx_check_ioerr()
is used to read everything required in advance.  But there seems
to be a chance for the buffer to be evicted from the dbuf cache in
between, which results in immediate eviction from ARC, which may
require an additional disk read later in a place where error handling
is problematic.

To partially work around this, introduce a new flag DMU_IS_PREFETCH,
relayed to ARC as ARC_FLAG_PREFETCH | ARC_FLAG_PRESCIENT_PREFETCH,
making ARC delay eviction by at least several seconds, or until the
actual read inside the transaction, which will promote it to demand
access.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18160
2026-02-04 10:12:32 -08:00
Ameer Hamza
00d69b0f72 arc: remove unused l2df_size and l2df_type from l2arc_data_free_t
These fields became unused when ABD was introduced in a6255b7fc.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #18093
2026-02-04 10:07:26 -08:00
Ameer Hamza
6f17052743 cache_012_pos: disable compression to ensure L2ARC wrap
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #18093
2026-02-04 10:07:22 -08:00
Ameer Hamza
13552d754f ZTS: Add L2ARC DWPD and parallel writes tests
Add four new functional tests to validate L2ARC DWPD rate limiting and
parallel write features:

- l2arc_dwpd_ratelimit_pos: Verifies DWPD rate limiting with different
  values (0, 100, 1000, 10000) and ordering
- l2arc_dwpd_reimport_pos: Verifies DWPD rate limiting persists after
  pool export/import
- l2arc_multidev_scaling_pos: Verifies parallel write scaling ratio
  (dual devices achieve ~2× single device throughput)
- l2arc_multidev_throughput_pos: Verifies absolute parallel write
  throughput scales with device count (~32MB/s per device)

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #18093
2026-02-04 10:07:16 -08:00
Ameer Hamza
48d3f7fac9 man: Update L2ARC tunables for DWPD and parallel writes
Add l2arc_dwpd_limit, remove l2arc_write_boost, update related tunables.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #18093
2026-02-04 10:07:11 -08:00
Ameer Hamza
d1f290f1ea L2ARC: Implement DWPD-based rate limiting with adaptive feed intervals
Add DWPD (Drive Writes Per Day) rate limiting to control L2ARC write
speeds and protect SSD endurance. Write rate is constrained by the
minimum of l2arc_write_max and DWPD-calculated budget. Devices
accumulate unused write budget over 24-hour periods with automatic reset
and carry-over. Writes occur in controlled bursts (max 50MB) with
adaptive intervals to achieve target rates. Applies after initial device
fill.
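
The rate calculation described above can be illustrated with a short sketch. It is not the L2ARC implementation: the function names are hypothetical, the limit is assumed to be expressed in hundredths of a drive write per day (100 meaning 1.0 DWPD), 0 is assumed to mean "no DWPD limit", and the 24-hour carry-over of unused budget is left out.

```c
#include <stdint.h>

#define	SECONDS_PER_DAY	86400ULL
#define	BURST_MAX_BYTES	(50ULL * 1024 * 1024)	/* "max 50MB" bursts */

/* Bytes the device may absorb per day (dwpd_pct: 100 == 1.0 DWPD). */
uint64_t
dwpd_daily_budget(uint64_t dev_bytes, uint64_t dwpd_pct)
{
	return (dev_bytes / 100 * dwpd_pct);
}

/* Effective rate in bytes/s: the minimum of write_max and the DWPD rate. */
uint64_t
effective_write_rate(uint64_t write_max, uint64_t dev_bytes,
    uint64_t dwpd_pct)
{
	uint64_t dwpd_rate;

	if (dwpd_pct == 0)	/* assumption: 0 disables the DWPD limit */
		return (write_max);
	dwpd_rate = dwpd_daily_budget(dev_bytes, dwpd_pct) / SECONDS_PER_DAY;
	return (dwpd_rate < write_max ? dwpd_rate : write_max);
}

/* Milliseconds between bursts so that the average matches the rate. */
uint64_t
burst_interval_ms(uint64_t rate, uint64_t burst)
{
	if (burst > BURST_MAX_BYTES)
		burst = BURST_MAX_BYTES;
	if (rate == 0)
		return (UINT64_MAX);
	return (burst * 1000 / rate);
}
```

Under these assumptions a slower target rate stretches the interval between bursts rather than shrinking every write, which is one way to read "adaptive intervals".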

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #18093
2026-02-04 10:07:07 -08:00
Ameer Hamza
b525525b44 L2ARC: Implement per-device feed threads for parallel writes
Transform L2ARC from single global feed thread to per-device threads,
enabling parallel writes to multiple L2ARC devices. Each device runs
its own feed thread independently, improving multi-device throughput.
Previously, a single thread served all devices sequentially; now each
device writes concurrently. Threads are created during device addition
and torn down on removal.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #18093
2026-02-04 10:07:02 -08:00
Ameer Hamza
825dc41ad4 L2ARC: Preserve L2HDR in arc_release() for in-flight writes
When arc_release() is called on a header with a single buffer and
L2_WRITING set, the L2HDR must be preserved for ABD cleanup (similar
to the arc_hdr_destroy() case). If we destroy the L2HDR here, later
arc_write() will allocate a new ABD and call arc_hdr_free_abd(),
which needs b_l2hdr.b_dev to properly defer ABD cleanup, causing
VERIFY(HDR_HAS_L2HDR(hdr)) to fail.

Allocate a new header for the buffer in the single_buf_l2writing
case (single buffer + L2_WRITING), leaving the original header with
L2HDR intact. The original header becomes an "orphan" (no buffers, no
b_pabd) but retains device association for ABD cleanup when
l2arc_write_done() completes.

The shared buffer case (HDR_SHARED_DATA) is excluded because L2ARC
makes its own transformed copy via l2arc_apply_transforms(), so the
original ABD is not used by the L2 write. The header can be safely
reused without allocating a new one.

For proper evictable space accounting, arc_buf_remove() must be
called before remove_reference() in the single_buf_l2writing path.
This ensures arc_evictable_space_increment() (during remove_reference)
and arc_evictable_space_decrement() (during destruction) see the
same state (b_buf=NULL), preventing accounting leaks that cause
module unload to hang with non-zero esize.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #18093
2026-02-04 10:06:57 -08:00
Ameer Hamza
b8610c3d93 L2ARC: Reorder header destruction for in-flight L2 writes
With multiple L2ARC devices, headers can be destroyed asynchronously
(e.g., during zpool sync) while L2_WRITING is set. The original code
destroyed L2HDR before L1HDR, causing ABDs to lose their device
association (b_l2hdr.b_dev) when arc_hdr_free_abd() is called.

This caused ABDs to be added to the global free-on-write list without
device information. When any L2ARC device completed its write and
attempted to free these orphaned ABDs, it would panic on
ASSERT(!list_link_active(&abd->abd_gang_link)) because the ABD was
still part of another device's vdev_queue I/O aggregation gang.

Fix by extending l2ad_mtx lock scope to cover L1HDR destruction and
reordering to destroy L1HDR before L2HDR when L2_WRITING is set. This
ensures arc_hdr_free_abd() can access b_l2hdr.b_dev to properly tag
ABDs with their device for deferred cleanup.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #18093
2026-02-04 10:06:51 -08:00
Ameer Hamza
2f41b9d865 L2ARC: Implement persistent markers with consistent tail scanning
This commit introduces per-sublist persistent markers that eliminate
redundant tail scanning between L2ARC iterations, providing significant
CPU efficiency improvements. Markers are pre-allocated during device
initialization and properly cleaned up during device removal.

The implementation uses conditional behavior based on device capacity:
small devices (capacity < arc_c) retain original HEAD/TAIL scanning
based on ARC warmup state, while large devices (capacity >= arc_c)
use the persistent marker approach for optimal CPU efficiency.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #18093
2026-02-04 10:06:47 -08:00
Ameer Hamza
3523b5f3f9 L2ARC: Implement even-depth multi-sublist scanning
The introduction of ARC multilists made L2ARC writing quite random,
depending on whether it found something to write in a randomly selected
sublist. This created inconsistent write patterns and poor utilization
of available sublists leading to uneven cache population.

This commit replaces random selection with systematic scanning across
all sublists within each burst. Fair headroom distribution ensures
even-depth traversal across all sublists until the target write size
is reached. Round-robin processing with random starting points eliminates
sequential bias while maintaining predictable write behavior.

The systematic approach provides consistent L2ARC filling patterns
and better utilization of available ARC data across all sublists.
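
A minimal sketch of even-depth, round-robin scanning, under simplifying assumptions: each sublist is modelled as a plain array of buffer sizes, `even_depth_scan()` is a hypothetical name, and the random starting sublist is passed in as `start` so the behaviour is deterministic.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * sizes[i] holds the buffer sizes on sublist i in scan order and
 * lens[i] its length.  taken[i] receives the number of buffers picked
 * from sublist i.  Returns the total number of bytes selected.
 */
uint64_t
even_depth_scan(const uint64_t *const *sizes, const size_t *lens,
    size_t nlists, size_t start, uint64_t target, size_t *taken)
{
	uint64_t total = 0;
	int progress = 1;

	for (size_t i = 0; i < nlists; i++)
		taken[i] = 0;

	/* Each outer pass goes one buffer deeper into every sublist. */
	for (size_t depth = 0; progress && total < target; depth++) {
		progress = 0;
		for (size_t n = 0; n < nlists && total < target; n++) {
			size_t i = (start + n) % nlists;

			if (depth >= lens[i])
				continue;	/* sublist exhausted */
			total += sizes[i][depth];
			taken[i]++;
			progress = 1;
		}
	}
	return (total);
}
```

Because every sublist is visited at depth `d` before any sublist is visited at depth `d + 1`, no single sublist is drained while others are left untouched.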

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #18093
2026-02-04 10:05:53 -08:00
Dennis Værum
07ae463d1a
Added support for multiple homes in pam_zfs_key module (#18084)
This implements support for having multiple datasets unlocked and
mounted when a session is opened.
Example: `homes=rpool/home,tank/users`
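
As an illustration of how a comma-separated `homes` value could be split into dataset prefixes, here is a small sketch. `split_homes()` is a hypothetical helper and not necessarily how the module parses the option; skipping empty items is an assumption.

```c
#include <stddef.h>
#include <string.h>

/*
 * Splits "a,b,c" in place and stores up to max pointers in out.
 * Returns the number of non-empty items found.
 */
size_t
split_homes(char *value, char **out, size_t max)
{
	size_t n = 0;
	char *p = value;

	while (p != NULL && n < max) {
		char *comma = strchr(p, ',');

		if (comma != NULL)
			*comma = '\0';	/* terminate the current item */
		if (*p != '\0')
			out[n++] = p;	/* skip empty items */
		p = (comma != NULL) ? comma + 1 : NULL;
	}
	return (n);
}
```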

Extra unit tests have been added.

A man page has been added (`man 8 pam_zfs_key`). A few
references to the new man page have also been added in other documents.

Signed-off-by: Dennis Vestergaard Værum <github@varum.dk>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
2026-02-03 16:09:10 -08:00
Erik Larsson
7e33476a7c
Fix build for Linux 6.18 with PowerPC/RISC-V kernels. (#18145)
The macro 'flush_dcache_page(...)' modifies the page flags, but in Linux
6.18 the type of the page flags changed from 'unsigned long' to the
struct type 'memdesc_flags_t' with a single member 'f' which is the page
flags field.

Signed-off-by: Erik Larsson <catacombae@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2026-02-02 14:16:10 -08:00
John Cabaj
13601e2d24
Linux 6.19: handle --werror with CONFIG_OBJTOOL_WERROR=y
Linux upstream commit 56754f0f46f6 ("objtool: Rename
--Werror to --werror") did just that, so we should check for
either "--Werror" or "--werror", else the build will fail.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Attila Fülöp <attila@fueloep.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: John Cabaj <john.cabaj@canonical.com>
Closes #18152
2026-02-02 10:19:18 -08:00
Tony Hutter
da9e8ff0df
CI: Fix qemu-1-setup failure, remove debug stuff
- For whatever reason, the runner will now startup with either two 75GB
  disks or one 150GB disk.  Previously the runner was always booting
  with two 75GB, but about a quarter of the time it now starts up
  with a single 150GB disk.  This caused qemu-1-setup.sh to fail
  since it expected the two 75GB disks.  This commit updates
  qemu-1-setup.sh to work with either disk config.

- Remove the watchdog from qemu-1-setup.sh.  It didn't turn out to be
  useful.

- Remove the timestamps that zfs-qemu.yml added to the qemu-1-setup.sh
  output.  The timestamps were redundant, since you can already
  download timestamped logs from the GitHub web interface.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #18166
2026-01-31 12:40:55 -08:00
Brooks Davis
b364720524
nvpair: chase FreeBSD xdrproc_t definition
As of FreeBSD 16, xdrproc_t will take exactly two arguments in both
kernel and userspace in line with the Linux kernel.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Alan Somers <asomers@freebsd.org>
Signed-off-by: Brooks Davis <brooks@capabilitieslimited.co.uk>
Closes #18154
2026-01-28 21:41:33 -05:00
Mariusz Zaborski
a157ef62a1
Make sure we can still write data to txg
The final txgs are used only to clear out any remaining deferred
frees, and we cannot write new data to them. Make sure we do not
try to do so.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com>
Closes #18139
2026-01-26 21:33:21 -05:00
Alexander Motin
35b2d39709
Lock db_mtx around arc_release() in a couple of places
* Lock db_mtx around arc_release() in dbuf_release_bp()

While this function is called only in sync context, the same buffer
can be touched by dbuf_hold_impl() in open context, creating races.
All other accesses to arc_release() are already protected by db_mtx,
so just take it here too.

Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>

* Lock db_mtx in sa_byteswap()

While SA code seems protected by sa_lock, there is a back door of
dmu_objset_userquota_get_ids(), that may hold and access the dbuf
without sa_lock, relying only on db_mtx. Taking db_mtx here should
protect both the arc_release() and the data for db_buf.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18146
2026-01-26 21:32:16 -05:00
Alek P
cd895f0e57
remove thread unsafe debug code causing FreeBSD double free panic
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alan Somers <asomers@gmail.com>
Signed-off-by: Alek Pinchuk <apinchuk@axcient.com>
Closes #18140
2026-01-21 10:00:34 -08:00
Alexander Moch
28291536bc Zstd: Document update policy
Add the Zstd update policy to the subtree README.

Also update the documented location of zstd-in.c to match upstream
changes, and normalize naming from 'ZSTD' to 'Zstd'.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Moch <mail@alexmoch.com>
Closes #18089
2026-01-20 13:41:24 -08:00
Alexander Moch
2d5a9b6a4c Zstd: Restore SPDX license identifiers
When updating Zstandard to version 1.5.7 the SPDX license identifiers
were lost. This commit restores them.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Moch <mail@alexmoch.com>
Closes #18089
2026-01-20 13:41:18 -08:00
Alexander Moch
e7f9734bc7 Zstd: Fix ASan poisoning for pooled Zstd contexts
The Zstd context mempool can reuse buffers that were previously poisoned
under AddressSanitizer, leading to false-positive use-after-poison reports
during zloop and other stress tests.

Explicitly unpoison memory when handing buffers out to Zstd and poison the
user-visible region again when buffers are returned to the pool. This makes
the allocator ASan-correct while preserving existing pooling behavior.

Also fix non-standard void * pointer arithmetic in zstd_free() and remove an
early return in zstd_dctx_alloc() so kmem_type/kmem_size are always set on
pool hits.

This only affects ASan bookkeeping in user space, does not change runtime
behavior in non-ASan configurations, and does not affect on-disk formats.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Moch <mail@alexmoch.com>
Closes #18089
2026-01-20 13:41:12 -08:00
Alexander Moch
a2ac9cd606 Zstd: Integrate v1.5.7 into the ZFS build system
This commit builds on the previous zstd library update and adds the
necessary ZFS integration and build system changes required to make
zstd 1.5.7 compile and function correctly.

Changes:
- Add zstd_preSplit.c (new in 1.5.7) to all build systems.
- Enable x86_64 assembly in userspace (huf_decompress_amd64.S).
- Disable assembly in kernel for RETHUNK/IBT compatibility.
- Disable intrinsics in kernel for EL10 x86_64-v3 baseline.
- Disable tracing in kernel builds for AArch64 compatibility.
- Fix ZSTD_isError symbol renaming with __asm__ directive.
- Rename abs64 to ZSTD_abs64 (FreeBSD kernel conflict).
- Fix bitstream.h attributes (MEM_STATIC -> FORCE_INLINE_TEMPLATE).
- Remove xxhash.c from BSD build (now header-only).
- Update symbol names in zstd_compat_wrapper.h.
- Ignore checkstyle for zstd-in.c.

Kernel assembly disabled for security mitigation compatibility. User
space retains full performance.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Moch <mail@alexmoch.com>
Closes #18089
2026-01-20 13:41:06 -08:00
Alexander Moch
bbcddb127a Zstd: Update bundled library to v1.5.7 without further adjustments
This commit only replaces the bundled source and does not include any
ZFS integration changes. Because the build depends on integration
adjustments, it will fail until the accompanying integration commit is
applied.

Upstream release: https://github.com/facebook/zstd/releases/tag/v1.5.7

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Moch <mail@alexmoch.com>
Closes #18089
2026-01-20 13:40:37 -08:00
Mark Johnston
54b141fab5
FreeBSD: Remove references to DEBUG_VFS_LOCKS
This option is removed upstream in favour of plain INVARIANTS.

VNASSERT is always defined so I see no reason to use it conditionally.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #18136
2026-01-19 08:55:17 -08:00
Martin Matuška
8605bdfdda
FreeBSD: unbreak compilation on i386
tests/zfs-tests/cmd/mmap_seek.c: use correct printf specifier
module/zfs/vdev.c: vdev_clear(): correctly cast argument to
atomic_add_64().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #18096
2026-01-14 17:02:41 -08:00
Alan Somers
3fffe4e707
Fix --enable-invariants on FreeBSD
The make symbols were never getting forwarded to the correct make
subprocess.  As far as I can tell, this has never worked.  Either that,
or something has changed in the behavior of make.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes #18131
2026-01-14 14:54:12 -08:00
shuppy
09e4e01e93
Fix history logging for zpool create -t
`zpool create` is supposed to log the command to the new pool’s history,
as a special record that never gets evicted from the ring buffer. but
when you create a pool with `zpool create -t`, no such record is ever
logged (#18102). that bug may be the cause of issues like #16408.

`zpool create -t` (83e9986f6e) and `zpool
import -t` (26b42f3f9d) are both designed
to override the on-disk zpool property `name` with an in-core
“temporary” name, but they work somewhat differently under the hood.

importing with a temporary name sets `spa->spa_import_flags |=
ZFS_IMPORT_TEMP_NAME` in ZFS_IOC_POOL_IMPORT, which tells
spa_write_cachefile() and spa_config_generate() to use the
ZPOOL_CONFIG_POOL_NAME in `spa->spa_config` instead of `spa->spa_name`.

creating with a temporary name permanently(!) sets the internal zpool
property `tname` (ZPOOL_PROP_TNAME) in the `zc->zc_nvlist_src` of
ZFS_IOC_POOL_CREATE, which tells zfs_ioc_pool_create()
(4ceb8dd6fd) and spa_create() to use that
name instead of `zc->zc_name`, then sets `spa->spa_import_flags |=
ZFS_IMPORT_TEMP_NAME` like an import.

but zfsdev_ioctl_common() fails to check for `tname` when saving the
pool name to `zfs_allow_log_key`, so when we call ZFS_IOC_LOG_HISTORY,
we call spa_open() on the wrong pool name and get ENOENT, so the logging
silently fails.

this patch fixes #18102 by checking for `tname` in zfsdev_ioctl_common()
like we do in zfs_ioc_pool_create().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: delan azabani <dazabani@igalia.com>
Closes #18118
Closes #18102
2026-01-14 14:51:51 -08:00
Alexander Motin
765929cb4e
DDT: Add locking for table ZAP destruction
Similar to BRT, DDT ZAP can be destroyed by sync context when it
becomes empty.  Respectively similar to BRT introduce RW-lock to
protect open context methods from the destruction.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18115
2026-01-13 15:07:15 -08:00
Rob Norris
1051c3d211 spdxcheck: enforce SPDX license tags on build system files
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #18077
2026-01-08 15:08:32 -08:00
Rob Norris
85391ee931 build: add SPDX license tags to build system files
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #18077
2026-01-08 15:08:03 -08:00
Andrew Walker
aca58dbb65
Add fh_to_parent export definition
This commit adds support for converting a file handle to its
parent dentry. This is called in exportfs_decode_fh_raw()
when subtree checking is enabled in NFS. Defining this and
handling the expanded filehandles allows the knfsd to succeed
in handling the file handle where it might otherwise fail
with ESTALE when trying to open by filehandle.

A side effect of this change is that name_to_handle_at(2)
and open_by_handle_at(2) now support AT_HANDLE_CONNECTABLE.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Andrew Walker <andrew.walker@truenas.com>
Closes #18099
2026-01-08 15:06:12 -08:00
Rob Norris
f2b4ed3fe5 spl: remove a _KERNEL check
This code is only compiled for the Linux kernel module, so that define
is always set.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #18117
2026-01-08 10:33:44 -08:00
Rob Norris
02a631139f spl: unexport kstat_proc_entry functions
These are used to implement the kstat and procfs_list interfaces, and
aren't used from outside. There's no need to export them.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #18117
2026-01-08 10:33:37 -08:00
Rob Norris
662f33f323 spl: lift 64-bit math compat out to separate file
It's a lot of rarely-compiled code, so move it to the side to make other
code easier to read.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #18117
2026-01-08 10:33:32 -08:00
Rob Norris
2ca6e880da spl: remove old atomic lock
Long ago, SPL atomics were implemented as a global spinlock over
conventional operations. In 5e9b5d832b (2009-10) they were converted to
proper atomics, with the spinlock retained as a fallback.

The switch to compile with the fallback was later removed in a91258913f
(2018-05), but the code it enabled wasn't. So let's do that.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #18117
2026-01-08 10:33:14 -08:00
Dimitry Andric
2f1f25217f
icp: emit .note.GNU-stack section for all ELF targets
On FreeBSD, linking the zfs kernel module with binutils ld 2.44 shows
the following warning:

    ld: warning: aesni-gcm-avx2-vaes.o: missing .note.GNU-stack section
    implies executable stack
    ld: NOTE: This behaviour is deprecated and will be removed in a
    future version of the linker

Some of the `.S` files under `module/icp/asm-x86_64/modes` check whether
to emit the `.note.GNU-stack` section using:

    #if defined(__linux__) && defined(__ELF__)

We could also test for `defined(__FreeBSD__)`, but since all other
`.S` files in the OpenZFS tree use:

    #ifdef __ELF__

it would seem more logical to use that instead. Any recent ELF platform
should support these note sections by now.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Dimitry Andric <dimitry@andric.com>
Closes #18119
2026-01-08 09:21:12 -08:00
Austin Wise
794f1587db
When receiving a stream with the large block flag, activate feature
ZFS send streams include a feature flag DMU_BACKUP_FEATURE_LARGE_BLOCKS
to indicate the presence of large blocks in the dataset. On the sending
side, this flag is included if the `-L` flag is passed to `zfs send`
and the feature is active in the dataset. On the receive side, the
stream is refused if the feature is active in the destination dataset
but the stream does not include the feature flag.

The problem is the feature is only activated when a large block is
born. If a large block has been born in the destination, but never
the source, the send can't work. This can arise when sending streams
back and forth between two datasets.

This commit fixes the problem by always activating the large blocks
feature when receiving a stream with the large block feature flag.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Austin Wise <AustinWise@gmail.com>
Closes #18105
2026-01-07 16:47:12 -08:00
Jitendra Patidar
2301755dfb
Fix zfs_open() to skip zil_async_to_sync() for the snapshot
Fix zfs_open() to skip zil_async_to_sync() for the snapshot, as it won't
have any transactions. zfsvfs->z_log is NULL for the snapshot.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Jitendra Patidar <jitendra.patidar@nutanix.com>
Closes #18091
2026-01-06 10:58:56 -08:00
Wolfgang Hoschek
c77f17b750
Add snapshots_changed_nsecs dataset property
Add a read-only dataset property, snapshots_changed_nsecs, which
exposes the nanosecond resolution version of snapshots_changed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Wolfgang Hoschek <wolfgang.hoschek@mac.com>
Closes #17998
Closes #18031
2026-01-06 09:36:20 -08:00
shuppy
6eef5cdc94
ZTS: add regression test for #17180
In #17180, we fixed an interesting bug that i believe i hit in one of my
pools, but as far as i can tell, there was no test for it.

this patch adds a regression test for #17180, minimised from my attempts
to reproduce the bug in a way that resembled the history of my pool.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Adam Moss <c@yotes.com>
Signed-off-by: delan azabani <dazabani@igalia.com>
Closes #18109
2026-01-06 09:33:03 -08:00
Dimitry Andric
2dbd6af5e4
Rename several printf attributes declarations to __printf__
For kernel builds on FreeBSD, we redefine `__printf__` to
`__freebsd_kprintf__`, to support FreeBSD kernel printf(9) extensions
with clang.

In OpenZFS various printf related functions are declared with
`__attribute__((format(printf, X, Y)))`, so these won't work with the
above redefinition. With clang 21 and higher, this leads to errors
similar to:

    sys/contrib/openzfs/module/zfs/spa_misc.c:414:38: error: passing
    'printf' format string where 'freebsd_kprintf' format string is
    expected [-Werror,-Wformat]
      414 |         (void) vsnprintf(buf, sizeof (buf), fmt, adx);
          |                                             ^

Since attribute names can always be spelled with leading and trailing
double underscores, rename these instances.
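
A minimal sketch of the double-underscore spelling on a GCC/clang toolchain follows. `sketch_snprintf()` is a hypothetical wrapper used only for illustration; it is not one of the functions touched by this change.

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * The attribute and the archetype are spelled __format__ and
 * __printf__, so a macro that redefines __printf__ (as described
 * above for FreeBSD kernel builds) also applies to this declaration.
 * Arguments 3 and 4 are the format string and the first variadic
 * argument.
 */
int sketch_snprintf(char *buf, size_t len, const char *fmt, ...)
    __attribute__((__format__(__printf__, 3, 4)));

int
sketch_snprintf(char *buf, size_t len, const char *fmt, ...)
{
	va_list adx;
	int n;

	va_start(adx, fmt);
	n = vsnprintf(buf, len, fmt, adx);
	va_end(adx);
	return (n);
}
```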

Note that in the FreeBSD base system we usually use `__printflike` from
`<sys/cdefs.h>`, but that does not apply to OpenZFS.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Dimitry Andric <dimitry@andric.com>
Closes #18095
2026-01-05 14:15:22 -08:00
Andrew Walker
312bdab0f5
Add handling for STATX_CHANGE_COOKIE
This commit adds handling for the STATX_CHANGE_COOKIE so that
we can properly surface the ZFS znode sequence to NFS clients via
knfsd.

If knfsd does not have STATX_CHANGE_COOKIE in statx result then
it will synthesize the NFS change_info4 structure and related
change4id values algorithmically based on the ctime value of the
file. Since internally ZFS is using ktime_get_coarse_real_ts64()
for the timestamp calculation here it introduces the possibility
that the change will not increment the change4id of directories
/ files causing a failure in the client to invalidate its attr
cache (among other things). See RFC 8881 Section 10.8 for
discussion of how clients may implement name and directory
caching.

Notable in this commit is that we are not initializing the
inode->i_version to the znode->z_seq number. The reason for this
is that we're intentionally not setting `SB_I_VERSION`. This
indicates that the filesystem manages its own i_version and
so it is not populated in the generic_fillattr.

The following compares a tight loop of setattr over the NFSv4
protocol while tracking nfsd4_change_attribute.

Before change:
inode, change_attribute
4723, 7590032215978780890
4723, 7590032215978780890
4723, 7590032215978780890
4723, 7590032215982780865
4723, 7590032215982780865

After change:
inode, change_attribute
7602, 7590032992517123951
7602, 7590032992517123952
7602, 7590032992517123953
7602, 7590032992517123954
7602, 7590032992517123955

Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Andrew Walker <andrew.walker@truenas.com>
Closes #18097
2026-01-05 14:06:28 -08:00
Rob Norris
a1319bf654
kmem: don't add __GFP_RECLAIMABLE for KM_VMEM allocations
vmalloc()'d memory is not movable/reclaimable, so __GFP_RECLAIMABLE is
not a valid flag, and since 6.19 the kernel warns if you use it.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #18107
2026-01-05 13:35:13 -08:00
Ivan Shapovalov
dbb3f247ed
cmd/zfs: clone: accept -u to not mount newly created datasets
Signed-off-by: Ivan Shapovalov <intelfx@intelfx.name>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18080
2026-01-05 12:21:56 -05:00
Alexander Moch
b9b84445ea
CI: Add Alpine Linux 3.23 runner to the pipeline (#18087)
Add an Alpine Linux 3.23 runner to the CI chain to run OpenZFS builds
and tests against musl libc.

Currently, zfs_send_sparse is killed after 10 minutes on Alpine, causing
cascading EBUSY failures in the test suite. With zfs_send_sparse
disabled, the ZFS test suite reaches a pass rate of 94.62%.

This commit introduces the required Alpine-specific setup and a small
set of shell and cloud-init compatibility fixes that also apply to
existing Linux runners.

The Alpine runner is not enabled by default and is not executed for new
pull requests.

Sponsored-by: ERNW Research GmbH - https://ernw-research.de/

Signed-off-by: Alexander Moch <amoch@ernw.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
2025-12-30 09:29:48 -08:00
Alexander Moch
e72f3054e3
cmd/ztest: avoid PATH_MAX stack allocation in ztest_get_zdb_bin() (#18085)
Calling realpath(path, buf) can trigger fortified header wrappers that
allocate a PATH_MAX-sized temporary buffer on the stack, exceeding the
4 KiB frame limit on some systems. Use the heap-allocating
realpath(path, NULL) form instead.
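
A minimal sketch of the heap-allocating form, assuming a POSIX.1-2008
libc (the wrapper name `resolve_path` is hypothetical and is not the
code in ztest):

```c
#define	_XOPEN_SOURCE	700
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical wrapper: resolve a path without placing a PATH_MAX-sized
 * buffer on the stack.  With a NULL second argument, realpath() returns
 * a malloc()ed string that the caller must free(), or NULL on error.
 */
char *
resolve_path(const char *path)
{
	assert(path != NULL);
	return (realpath(path, NULL));
}
```

The caller owns the returned string and releases it with free() once
the resolved path is no longer needed.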

Sponsored-by: ERNW Research GmbH - https://ernw-research.de/

Signed-off-by: Alexander Moch <amoch@ernw.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-12-29 11:16:34 -08:00
Rob Norris
f041375b52 kmem: don't add __GFP_COMP for KM_VMEM allocations
It hasn't been necessary since Linux 3.13
(torvalds/linux@a57a49887e), and since 6.19 the kernel warns if you
use it.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #18053
2025-12-23 12:54:34 -08:00
Rob Norris
f95e306266 kmem: don't pass __GFP_HIGHMEM to __vmalloc
Since Linux 4.12 (torvalds/linux@19809c2da2) __GFP_HIGHMEM has been
automatically added to calls to __vmalloc() internally, so we don't need
it anymore. This is good, because since 6.19 the kernel warns if you use
__GFP_HIGHMEM.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #18053
2025-12-23 12:54:11 -08:00
Rob Norris
3c8665cb5d Linux 6.19: replace i_state access with inode_state_read_once()
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #18053
2025-12-23 12:53:32 -08:00
Ivan Shapovalov
3c4193333b zed.d, contrib: fix shellcheck errors in scripts
Not sure why this was not caught by CI; perhaps my shellcheck is new
enough to catch more things.

Signed-off-by: Ivan Shapovalov <intelfx@intelfx.name>
2025-12-23 11:12:21 -08:00
Ivan Shapovalov
e28d980d68 man: cosmetic: fix typos; use consistent spelling for "non-existing"
Signed-off-by: Ivan Shapovalov <intelfx@intelfx.name>
2025-12-23 11:12:21 -08:00
Ivan Shapovalov
1e7280cece zfs_main: cosmetic: add missing flag to the comment for create
Signed-off-by: Ivan Shapovalov <intelfx@intelfx.name>
2025-12-23 11:12:21 -08:00
Ivan Shapovalov
9880ac3080 zvol: cosmetic: fix up volthreading property short name
Signed-off-by: Ivan Shapovalov <intelfx@intelfx.name>
2025-12-23 11:12:21 -08:00
Rob Norris
654e7628d6 u8_textprep: move into module/zfs
Now that it's built into the main zfs module in all cases, there's no
reason to put it in its own dir.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #18071
2025-12-22 14:58:36 -08:00
Rob Norris
309006a0c6 libunicode: merge into libzpool
It's a single source file that is not used anywhere else, so there's no
reason to keep it separate.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #18071
2025-12-22 14:58:20 -08:00
Tony Hutter
648a9a2938
CI: Test 2.4.x in qemu-test-repo-vm.sh, quick mode
The qemu-test-repo-vm.sh script test-installs ZFS from different
repos.  Have it test from the new 2.4.x repos as well.

Also add a checkbox to run in "lookup mode".  This just does a
quick lookup to see what version is installed in each repo.  It does
not do a test install and module load.  It only takes 3min to run vs
over an hour for the full version.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #18070
2025-12-19 19:57:19 -08:00
Rob Norris
0d44b58d7f
libshare: fold into libzfs and reorg headers a little
libzfs is the only user of libshare, and only internally, so there's no
particular reason to build it separately, nor to export its symbols. So,
pull it into libzfs proper, remove its "public" header, and hide its
symbols.

The bare minimum "public" API is just to count and enumerate the
supported share types. These are moved to libzfs.h with the other share
API.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #18072
2025-12-19 19:52:33 -08:00
Alexander Motin
962e68865e
Use reduced precision for scan times
Scan time limits do not need precision beyond 1ms.  Switching
scn_sync_start_time and spa_sync_starttime from gethrtime() to
getlrtime() saves ~3% of CPU time during resilver scan stage.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18061
2025-12-18 10:22:11 -08:00
Alexander Motin
a83bb15fcd
Reduce minimal scrub/resilver times
With higher throughput and lower latency of modern devices ZFS can
happily live with pretty short (fractions of a second) TXGs.  But
the two decade old multi-second minimal time limits can almost stop
payload writes by extending TXGs beyond dirty data limits of ARC
ability to amortize it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18060
2025-12-18 10:21:45 -08:00
Allan Jude
1d43387dd8
zdb: Add -O option for -r to specify object-id
"zdb -r -O pool/dataset obj-id destination" will copy
the file with object-id obj-id to the named destination;
without -O it'll still be interpreted as a pathname.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Akash B <akash-b@hpe.com>
Signed-off-by: Sean Eric Fagan <sean.fagan@klarasystems.com>
Closes #16307
2025-12-18 09:25:09 -08:00
Mark Maybee
7ff329ac2e
Fix rangelock test for growing block size
If the file already has more than one block, then the current
block size cannot change. But if the file block size is less
than the maximum block size supported by the file system, and
there are multiple blocks in the file, the current code will
almost always extend the rangelock to its maximum size.
This means that all writes become serialized and even reads
are slowed as they will more often contend with writes. This
commit adjusts the test so that we will not lock the entire
range if there is more than one block in the file already.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Mark Maybee <mark.maybee@perforce.com>
Closes #18046
Closes #18064
2025-12-18 09:23:38 -08:00
Alexander Motin
051a8c7494
Bypass snprintf() in quota checks if no quotas set
This improves synthetic 1 byte write speed by ~2.5%.

Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18063
2025-12-17 21:59:47 -05:00
Alexander Motin
0550abd4b8
RAIDZ: Remove some excessive logging
There was some per-I/O logging into dbgmsg in the RAIDZ code that
increased CPU load and wiped useful content out of dbgmsg, for
example during routine disk replacement process.  I don't think
we need it to be that verbose.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18059
2025-12-17 14:00:01 -08:00
Turbo Fredriksson
0ba3403323 Change shellcheck and checkbashism triggers.
Newer versions of `shellcheck` and `checkbashism` find more issues than
previous ones, so fix those.

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Closes #18000
2025-12-16 09:15:51 -08:00
Turbo Fredriksson
6c6a469bea Replace bashisms in ZFS shell function stub.
The `type` command is an optional feature in POSIX, so shouldn't be
used.

Instead, use `command -v`, which commit
  e865e7809e
did, but it missed this file.

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Closes #18000
2025-12-16 09:15:51 -08:00
Turbo Fredriksson
1842d6b3cb Make lines stay within 80 char limit.
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Closes #18000
2025-12-16 09:15:51 -08:00
Turbo Fredriksson
ead77e952e Add some comments to clarify the mounting of filesystems.
There's no real documentation (which should probably be written!),
so instead document in the code, as best we can, what's going on
with the mounting of file systems to make future updates easier.

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Closes #18000
2025-12-16 09:15:51 -08:00
Turbo Fredriksson
01cb64510d Standardise if/then/else and for/do/done lines.
More code standard changes, where if/then is on different lines.
To have it on the same, or on different lines, can be argued, but
we need to pick one, and try not to mix how to do things.

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Closes #18000
2025-12-16 09:15:51 -08:00
Turbo Fredriksson
29819a0177 Add missing initrd config variables.
The `ZFS_INITRD_ADDITIONAL_DATASETS` variable is used in the initrd
script to boot additional OS file systems besides the root file system.
But it wasn't included as an example in the config files.

The `ZFS_POOL_EXCEPTIONS` *was* included in the example defaults file,
but it was not exported, so not available in the initrd.

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Closes #18000
2025-12-16 09:15:51 -08:00
Turbo Fredriksson
4af8e28a59 Remove unnecessary sourcing of variables.
The file `/etc/default/zfs` is already sourced by the `/etc/zfs/zfs-functions`,
so no need to source it again.

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Closes #18000
2025-12-16 09:15:51 -08:00
Turbo Fredriksson
94975ff79b Fix issue with finding degraded pool(s).
When a pool is degraded, or needs special action, the `zpool import`
(without pool to import) line will report:
```
  pool: rpool
    id: 01234567890123456789
 state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:
   [..]
```
If the import with the pool name fails, it is supposed to try importing
using the pool ID.

However, the script is also getting the `action` line (and probably `scrub:`
if/when that's available):
  pool; The pool can be imported using its name or numeric identifier.;config:;
which causes issues on subsequent import attempts.

Clean up the information by rewriting the `sed` command line.

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Closes #18000
2025-12-16 09:15:51 -08:00
Turbo Fredriksson
33dd57e1b4 Prefix all variables that are local with underscore.
This is just to make them easier to see.

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Closes #18000
2025-12-16 09:15:51 -08:00
Turbo Fredriksson
d3b447de4e Shell script good practices changes.
It's considered good practice to:
1) Wrap the variable name in `{}`.
   As in `${variable}` instead of `$variable`.
2) Put variables in `"`.

Also some minor error message tuning.

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Closes #18000
2025-12-16 09:15:51 -08:00
Turbo Fredriksson
61ab032ae0 Fix potential global variable overwrite.
In a previous commit (e865e7809e), the
`local` keyword was removed in functions because of bashism.

Removing bashisms is correct, however this could cause variable overwrites,
since several functions use the same variable name.

So this commit makes function variables unique in the (now) global name
space.

The problem from the original bug report (see #17963) could not be duplicated,
but it is still sane to make sure that variables stay unique.

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Closes #18000
2025-12-16 09:15:51 -08:00
Tony Hutter
32faecb0c2
CI: Use Ubuntu mirrors instead of azure (#18057)
Use the official Ubuntu apt mirrors instead of
azure.archive.ubuntu.com, since that mirror can be slow:

    https://github.com/actions/runner-images/issues/7048

This can help speed up the 'Setup QEMU' stage.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #18057
2025-12-16 09:15:18 -08:00
Alan Somers
a69a90b49e
Remove the obsolete FreeBSD 14.2-RELEASE from CI
Sponsored by:	ConnectWise
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by:	Alan Somers <asomers@gmail.com>
Closes #18013
2025-12-15 15:13:04 -08:00
Tony Hutter
842fb1c135
CI: Change timeout values
The 'Setup QEMU' CI step updates and installs all packages necessary to
startup QEMU.  Typically the step takes a little over a minute, but
we've seen cases where it can legitimately take more than 45
minutes.  Change the timeout to 60 minutes.

In addition, change the 'Install dependencies' timeout to 60min since
we've also seen timeouts there.

Lastly, remove all timeouts from the zfs-qemu-packages workflow.
We do this so that we can always build packages from a branch, even if
the time it takes to do a CI step changes over time.  It's ok to
eliminate the timeouts from the zfs-qemu-packages completely since that
workflow is only run manually.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #18056
2025-12-15 14:58:01 -08:00
Alexander Motin
22e89aca88
DDT: Fix compressed entry buffer size
The first byte of the entry after compression is used for algorithm
and byte order flag.  We should decrement when calling compression/
decompression algorithm.
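
The size adjustment can be sketched as follows.  This is an
illustration under an assumed layout (one header byte followed by the
compressed data); `ENTRY_HDR_SIZE` and `entry_payload` are hypothetical
names, not the DDT code itself:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: byte 0 is a header, the rest is compressed data. */
#define	ENTRY_HDR_SIZE	1

/*
 * Return the compressed payload inside a stored entry and its length.
 * The header byte is excluded from the length that would be handed to
 * the compression or decompression routine.
 */
const uint8_t *
entry_payload(const uint8_t *entry, size_t entry_len, size_t *payload_len)
{
	assert(entry != NULL && payload_len != NULL);
	assert(entry_len >= ENTRY_HDR_SIZE);
	*payload_len = entry_len - ENTRY_HDR_SIZE;
	return (entry + ENTRY_HDR_SIZE);
}
```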

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18055
2025-12-15 14:52:44 -08:00
Alexander Motin
3b1ff816bd
DDT: Add/use zap_lookup_length_uint64_by_dnode()
Unlike other ZAP consumers, DDT does not know, due to compression,
how big an entry it is reading from ZAP.  Due to this it called
zap_length_uint64_by_dnode() and zap_lookup_uint64_by_dnode(),
each of which does full ZAP entry lookup.

Introduction of the combined ZAP method dramatically reduces the
CPU overhead and lock contention at the DBUF layer.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18048
2025-12-15 14:38:34 -08:00
Alexander Motin
ff5414406f
DDT: Switch to using ZAP _by_dnode() interfaces
As was previously done for BRT, avoid holding/releasing DDT ZAP
dnodes for every access.  Instead hold the dnodes during all their
life time, never releasing.

While at this, add _by_dnode() interfaces for zap_length_uint64()
and zap_count(), actively used by DDT code.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18047
2025-12-15 09:49:14 -08:00
Alexander Motin
46d6f1fe56
DDT: Move logs searches out of the lock
Postponing entry removal from the DDT log in case of hit till the later
single-threaded sync stage keeps ddl_tree stable during the
multi-threaded ZIO processing stage.  This allows dropping the DDT lock
before the search instead of after, reducing the contention a lot.

Actually ddt_log_update_entry() was already handling the case of
entry present in the active log, so we only need to remove it from
flushing log, if the entry happens to be there.

My tests with parallel 4KB block writes show throughput increase
from 480MB/s (122K blocks/s) to 827MB/s (212K blocks/s), even
though still limited by the global DDT lock contention.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18044
2025-12-15 09:17:04 -08:00
Alexander Motin
3d76ba2737
Improve async destroy processing timing
Previous code effectively enforced that all async free ZIOs were
_issued_ within the TXG timeout.  But they could take forever to
complete, especially if the required metadata were not in ARC.

This patch introduces periodic waits every 2000 ZIOs, which should
give at least somewhat reasonable TXG timings even for single HDD
pools with empty ARC.  And makes them complete within half of the
TXG timeout, since we might still need time to sync DDT and BRT.

While there, change zfs_max_async_dedup_frees semantics to include
also clone and gang blocks, which are similar.  Bump the default
value from set long ago to be more forgiving to block cloning
(still not having logs and benefiting from large TXGs), now that
we have better working time limits.  The limit now is a possible
amount of dirty data produced by BRT updates.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18043
2025-12-11 18:46:08 -08:00
Alexander Motin
f72fd378c8 Defer async destroys on pool import
We've observed a number of cases when pool import stuck for many
minutes due to large async destroy trying to load DDT or BRT from
HDD pool.  While proper destroy dosage is a separate problem,
let's give the import process a chance to complete before that at all.
It may be not enough if there is a lot of ZIL to replay, but that
is harder to cover, since those are in separate syscalls.

Code investigation showed that we already have this mechanism used
for scrub/resilver, so this patch converts SCAN_IMPORT_WAIT_TXGS
into a tunable and applies it to async destroys also.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18033
2025-12-11 18:44:46 -08:00
Alexander Motin
d976587a35 ZTS: Fix zvol_misc_fua SLOG writes check
Instead of comparing number of SLOG writes to number of normal
writes we should just make sure SLOG got the required number of
writes.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18033
2025-12-11 18:43:59 -08:00
Alexander Motin
20f09eae42
ZIO: ZIO_STAGE_DDT_WRITE is a blocking stage
ddt_lookup() in zio_ddt_write() might require synchronous DDT ZAP
read.  Running it from interrupt taskq might lead to deadlock.
Inclusion of ZIO_STAGE_DDT_WRITE into ZIO_BLOCKING_STAGES should
hopefully fix that, even though I am not sure how I got there.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17981
2025-12-10 19:51:53 -05:00
Alexander Motin
d393166c54
ARC: Increase parallel eviction batching
Before parallel eviction implementation zfs_arc_evict_batch_limit
caused loop exits after evicting 10 headers.  The cost of it is not
big and well motivated.  Now though taskq task exit after the same
10 headers is much more expensive.  To cover the context switch
overhead of taskq introduce another level of batching, controlled
by zfs_arc_evict_batches_limit tunable, used only for parallel
eviction.

My tests including 36 parallel reads with 4KB recordsize that showed
1.4GB/s (~460K blocks/s) before with heavy arc_evict_lock contention,
now show 6.5GB/s (~1.6M blocks/s) without arc_evict_lock contention.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17970
2025-12-10 13:03:01 -08:00
Rob Norris
9fdb854109
Linux: work around use of GPL-only symbol kasan_flag_enabled
We may not be able to avoid our code referencing the symbol, but we can
ensure that a symbol of that name is available to the linker during
build, and so not require linking the GPL-exported version.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #18009
Closes #18040
2025-12-10 10:04:57 -08:00
Chunwei Chen
0c194352b5
Fix ddtprune causing space leak
In zio_ddt_free, if a pruned dde is still in ddt, it would do nothing
and cause space leak.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Closes #17982
Closes #17983
2025-12-10 10:02:14 -08:00
Alexander Moch
ff47dd35e2
Ensure 64-bit off_t is used in user space instead of loff_t
Use 64-bit POSIX off_t in user space instead of the Linux kernel type
loff_t. This is enforced at configure time via AC_SYS_LARGEFILE and
AC_CHECK_SIZEOF([off_t]). loff_t remains in shared headers where they
mirror Linux VFS interfaces, and on FreeBSD we typedef loff_t to off_t
in those headers since libc does not provide it.
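
A minimal sketch of the kind of check this implies in user space.  The
configure-time macros are the ones named above; the compile-time
assertion and the helper `offset_fits` are assumptions added for
illustration:

```c
#define	_FILE_OFFSET_BITS	64
#include <assert.h>
#include <sys/types.h>

/* Fail the build if off_t is not 64 bits wide. */
static_assert(sizeof (off_t) == 8, "off_t must be 64 bits wide");

/*
 * Hypothetical helper: report whether a 64-bit offset survives a round
 * trip through off_t without truncation.
 */
int
offset_fits(long long offset)
{
	off_t o = (off_t)offset;

	return ((long long)o == offset);
}
```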

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Moch <mail@alexmoch.com>
Closes #18020
2025-12-10 09:45:39 -08:00
Ameer Hamza
48842c0a41
ZTS: Add test for snapshot automount race
Add snapshot_019_pos to verify parallel snapshot automount operations
don't cause AVL tree panic. Regression test for commit 4ce030e025.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #18035
2025-12-10 09:16:45 -08:00
Brian Behlendorf
a3238a745e
Linux 6.18 compat: META (#18039)
Update the META file to reflect compatibility with the 6.18
kernel.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-12-10 09:04:24 -08:00
Tony Hutter
5d40e0ed70
CI: Fix Ubuntu 22.04 rsend failures
For whatever reason, the single `log_note` in the `directory_diff`
function causes the function to stop executing on Ubuntu 22.  This
causes most of the rsend tests to fail.  Remove the line since it's only
informational.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #18032
2025-12-09 20:04:51 -08:00
Alex
104da9657a
Fix a declaration position of the nth_page.
Compile-time bug introduced by commit 87df5e4.
Fix for the compilation error (Linux kernel 6.18.0):
"zfs/module/os/linux/zfs/abd_os.c:920:32: error: implicit declaration
of function ‘nth_page’; did you mean ‘pte_page’?
[-Werror=implicit-function-declaration]".

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: agiUnderground <alex.dev.cv@gmail.com>
Closes #18034
2025-12-09 15:45:51 -08:00
Alexander Motin
a62c62120e
ARC: Pre-convert zfs_arc_min_prefetch_ms
There is no need to do MSEC_TO_TICK() for each evicted ARC header.
We can do it when tunables are set, since we already have separate
internal variables for those.
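
The idea can be sketched like this, assuming a fixed tick rate and
hypothetical names (`SKETCH_HZ`, `set_min_prefetch_ms`,
`prefetch_too_young`); it is not the ARC code:

```c
#include <assert.h>
#include <stdint.h>

#define	SKETCH_HZ	100	/* assumed tick rate, illustration only */
#define	SKETCH_MSEC_TO_TICK(ms)	\
	(((uint64_t)(ms) * SKETCH_HZ + 999) / 1000)

/* Internal copy of the tunable, already converted to ticks. */
static uint64_t min_prefetch_ticks;

/* Called when the tunable changes: convert once, not per header. */
void
set_min_prefetch_ms(uint64_t ms)
{
	min_prefetch_ticks = SKETCH_MSEC_TO_TICK(ms);
}

/* Hot path: a plain comparison with no per-call conversion. */
int
prefetch_too_young(uint64_t now_ticks, uint64_t access_ticks)
{
	assert(now_ticks >= access_ticks);
	return (now_ticks - access_ticks < min_prefetch_ticks);
}
```

The conversion cost moves from every evicted header to the rare moment
the tunable is written.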

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17965
2025-12-09 12:07:10 -08:00
Alexander Motin
09492e0f21
Reduce dataset buffers re-dirtying
For each block written or freed ZFS dirties ds_dbuf of the dataset.
While dbuf_dirty() has a fast path for already dirty dbufs, it still
requires taking the lock and doing some things visible in the profiler.

Investigation showed ds_dbuf dirtying by dsl_dataset_block_born()
and some of dsl_dataset_block_kill() are just not needed, since
by the time they are called in sync context the ds_dbuf is already
dirtied by dsl_dataset_sync().

Tests show this reducing large file deletion time by ~3% by saving
CPU time of single-threaded part of the sync thread.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18028
2025-12-09 09:18:09 -08:00
Brian Behlendorf
574d5f3313 CI: exclude signed-off-by/reviewed-by from 72 char limit
Allow an author or reviewer's name and email address to exceed
the 72 character limit enforced by the commitcheck target.

Reviewed-by: RageLtMan <rageltman@sempervictus>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #18030
2025-12-09 09:12:32 -08:00
bspengler-oss
060bc8b70d Fix HIGHMEM/kmap API violation in zfs_uiomove_bvec_impl()
Fix another instance where ZFS assumes multiple pages can be
mapped at once via zfs_kmap_local(), resulting in crashes and
potential memory corruption on HIGHMEM-enabled (typically 32-bit)
systems.

Reviewed-by: RageLtMan <rageltman@sempervictus>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: bspengler-oss <94915855+bspengler-oss@users.noreply.github.com>
Closes #15668
Closes #18030
2025-12-09 09:12:24 -08:00
bspengler-oss
2cab0554c0 Preserve LIFO ordering of kmap ops in abd_raidz_gen_iterate()
ZFS typically preserves proper LIFO ordering regarding map/unmap
operations that wrap the Linux kernel's kmap interfaces that
require such ordering, but one instance in abd_raidz_gen_iterate()
did not.

Similar issues have been fixed in the Linux kernel in the past,
see for instance CVE-2025-39899 for userfaultfd.

Reviewed-by: RageLtMan <rageltman@sempervictus>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: bspengler-oss <94915855+bspengler-oss@users.noreply.github.com>
Closes #15668
Closes #18030
2025-12-09 09:12:16 -08:00
bspengler-oss
87df5e4872 Fix interaction of abd_iter_map()/abd_iter_unmap() with HIGHMEM
HIGHMEM kmap interfaces operate on only a single page at a time
yet ZFS hadn't accounted for this, resulting in crashes and
potential memory corruption on HIGHMEM (typically 32-bit) systems.
This was caught by PaX's KERNSEAL feature as it makes use of
HIGHMEM functionality on x64.

On typical 64-bit systems, this issue wouldn't have been observed,
as the map interfaces simply fall back to returning an address in
lowmem where the contiguous pages can be accessed directly.

Joint work with the PaX Team, tested by Mark van Dijk

Reviewed-by: RageLtMan <rageltman@sempervictus>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: bspengler-oss <94915855+bspengler-oss@users.noreply.github.com>
Closes #15668
Closes #18030
2025-12-09 09:10:32 -08:00
Ameer Hamza
4ce030e025
Fix snapshot automount race causing duplicate mounts and AVL tree panic
Multiple threads racing to automount the same snapshot can both spawn
mount helper processes that successfully complete, causing both parent
threads to attempt AVL tree registration and triggering a VERIFY()
panic in avl_add(). This occurs because the fsconfig/fsmount API lacks
the serialization provided by traditional mount() via lock_mount().

The fix adds a per-entry mutex (se_mtx) to zfs_snapentry_t that
serializes mount and unmount operations on the same snapshot. The first
mount thread creates a pending entry with se_spa=NULL and holds se_mtx
during the helper execution. Concurrent mounts find the pending entry
and return success without spawning duplicate helpers. Unmount waits on
se_mtx if a mount is pending, ensuring proper serialization. This allows
different snapshots to mount in parallel while preventing the AVL panic.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #17943
2025-12-08 13:49:11 -08:00
Mark Johnston
86b064469d
FreeBSD: Fix a potential null dereference in zfs_freebsd_fsync()
In general it's possible for a vnode to not have an associated VM
object.  This happens in particular with named pipes, which have
some distinct VOPs, defined in zfs_fifoops.  Thus, this chunk of
zfs_freebsd_fsync() needs to check for the FIFO case, like other
vm_object_mightbedirty() callers do.

(Note that vn_flush_cached_data() calls are predicated on
zn_has_cached_data() returning true, and it checks for a NULL v_object
pointer already.)

Fixes: ef4058fcdc
Reported-by: Collin Funk <collin.funk1@gmail.com>
Reviewed-by: Sean Eric Fagan <sef@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #18015
2025-12-08 13:46:30 -08:00
Alan Somers
89f729dcca
During CI, use nproc instead of sysctl -n hw.ncpu
The latter may give the wrong result if cpusets are in use.

Sponsored by:	ConnectWise
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by:	Alan Somers <asomers@gmail.com>
Closes #18012
2025-12-04 16:57:15 -08:00
Brian Behlendorf
dfb0875200
ZTS: Add slow_vdev_degraded_sit_out retry
While not common, the draid3 vdev type has been observed to
not always sit out a vdev when run in the CI.  To prevent
continued false positives allow the test to be retried up
to three times before considering it a failure.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #18003
2025-12-04 09:10:22 -08:00
Alexander Moch
05e2747bf2
Provide loff_t via <fcntl.h> on musl-based Linux systems
Musl exposes loff_t only as a macro in <fcntl.h> when _GNU_SOURCE is
defined. Including <fcntl.h> ensures the type is available, and a
fallback typedef is provided when no macro is defined. This fixes build
failures on musl systems.
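
A minimal sketch of the include-plus-fallback pattern described above.
The exact guard used in the tree is not shown here; the __GLIBC__ check
and the sketch_offset_add() helper are assumptions made for this
example.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* musl only defines the loff_t macro with this set */
#endif
#include <fcntl.h>
#include <sys/types.h>

/*
 * Fallback for C libraries that provide loff_t neither as a macro
 * (musl) nor as a typedef (glibc). The __GLIBC__ check is an assumption
 * of this sketch: a typedef cannot be detected with #ifdef.
 */
#if !defined(loff_t) && !defined(__GLIBC__)
typedef off_t loff_t;
#endif

/* Hypothetical helper, present only to show the type being used. */
static loff_t
sketch_offset_add(loff_t base, loff_t delta)
{
    return (base + delta);
}
```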

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Moch <mail@alexmoch.com>
Closes #18002
2025-12-02 12:14:09 -08:00
Alexander Motin
ffaea08319
FreeBSD: Remove HAVE_INLINE_FLSL use
These macros have been deprecated in the FreeBSD kernel for several
years, and unneeded for much longer.  Instead, similar to Linux, let
the kernel let the compiler do the right thing.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18004
2025-12-02 12:13:16 -08:00
Ameer Hamza
88d012a1d6
Fix snapshot automount expiry cancellation deadlock
A deadlock occurs when snapshot expiry tasks are cancelled while holding
locks. The snapshot expiry task (snapentry_expire) spawns an umount
process and waits for it to complete. Concurrently, ARC memory pressure
triggers arc_prune which calls zfs_exit_fs(), attempting to cancel the
expiry task while holding locks. The umount process spawned by the
expiry task blocks trying to acquire locks held by arc_prune, which is
blocked waiting for the expiry task to complete. This creates a circular
dependency: expiry task waits for umount, umount waits for arc_prune,
arc_prune waits for expiry task.

Fix by adding non-blocking cancellation support to taskq_cancel_id().
The zfs_exit_fs() path calls zfsctl_snapshot_unmount_delay() to
reschedule the unmount, which needs to cancel any existing expiry task.
It now uses non-blocking cancellation to avoid waiting while holding
locks, breaking the deadlock by returning immediately when the task is
already running.

The per-entry se_taskqid_lock has been removed, with all taskqid
operations now protected by the global zfs_snapshot_lock held as
WRITER. Additionally, an se_in_umount flag prevents recursive waits when
zfsctl_destroy() is called during unmount. The taskqid is now only
cleared by the caller on successful cancellation; running tasks clear
their own taskqid upon completion.
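
The cancellation contract described above can be modelled as a small
state machine. This is a single-threaded sketch under stated
assumptions, not the taskq code: every type, field and function name
below is hypothetical, and the real implementation holds locks that are
omitted here.

```c
#include <errno.h>

typedef enum {
    SKETCH_TASK_NONE,    /* nothing scheduled */
    SKETCH_TASK_PENDING, /* queued, not started */
    SKETCH_TASK_RUNNING  /* currently executing */
} sketch_task_state_t;

typedef struct sketch_entry {
    int se_id;                    /* 0 when no task is recorded */
    sketch_task_state_t se_state;
} sketch_entry_t;

/*
 * Non-blocking cancel: never waits for a running task.
 * Returns 0 if a pending task was removed, EBUSY if the task is
 * already running, ENOENT if there was nothing to cancel.
 */
static int
sketch_cancel_nowait(sketch_entry_t *e)
{
    switch (e->se_state) {
    case SKETCH_TASK_PENDING:
        e->se_state = SKETCH_TASK_NONE;
        return (0);
    case SKETCH_TASK_RUNNING:
        return (EBUSY);
    default:
        return (ENOENT);
    }
}

/* Caller side: clear the recorded id only when the cancel succeeded. */
static int
sketch_cancel_expiry(sketch_entry_t *e)
{
    int err = sketch_cancel_nowait(e);

    if (err == 0)
        e->se_id = 0;
    return (err);
}

/* Task side: a running task clears its own id when it completes. */
static void
sketch_task_done(sketch_entry_t *e)
{
    e->se_state = SKETCH_TASK_NONE;
    e->se_id = 0;
}
```

Because the EBUSY path returns immediately, a caller that holds locks
never sleeps waiting for the running task, which is what breaks the
cycle.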

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #17941
2025-12-01 14:43:42 -08:00
Alexander Motin
4754ac8529 raidz_test: Restore rand_data protection
It feels dirty to modify the protection of memory allocated via libc,
but at least we should try to restore it before freeing.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17977
2025-12-01 14:34:52 -08:00
Alexander Motin
338d432b42 raidz_test: Fix ZIO ABDs initialization
 - When filling ABDs of several segments, consider offset.
 - "Corrupt" ABDs with actually different data to fail something.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17977
2025-12-01 14:34:48 -08:00
Alexander Motin
95b2eb50f2 raidz_test: Set io_offset reasonably
 - io_offset of 1 makes no sense.  Set default to 0.
 - Initialize io_offset in all cases.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17977
2025-12-01 14:34:43 -08:00
Alexander Motin
3647fa3902 ZFS: Enable more logs for raidz_001_neg
The output is not so big here, so let's collect something useful.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17977
2025-12-01 14:34:19 -08:00
Alexander Motin
928eccc5bc
DDT: Reduce global DDT lock scope during writes
Before this change, the DDT lock was taken 4 times per written block,
and as effectively a pool-wide lock it can be highly congested.
This change introduces a new per-entry dde_io_lock, protecting some
fields during I/O ready and done stages, so that we don't need the
global lock there.

According to my write tests on a 64-thread system with 4KB blocks, this
significantly reduces the global lock contention, reducing CPU usage
from 100% to the expected ~80%, and increasing write throughput by 10%.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17960
2025-12-01 10:44:10 -08:00
Alexander Motin
a5b665df39
DDT: Switch to using wmsums for lookup stats
ddt_lookup() is very busy code under a highly congested global
lock.  Anything we can save here is very important.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17980
2025-12-01 10:36:31 -08:00
Alexander Motin
48f33c1ef2
DDT: Make children writes inherit allocator
Even though unlike gang children it is not so critical for dedup
children to inherit parent's allocator, there is still no reason
for them to have allocation policy different from normal writes.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17961
2025-12-01 10:30:27 -08:00
Tony Hutter
9a453b2050
CI: zfs-test-packages: Add in new repos
Test install from our new repos: zfs-latest, zfs-legacy,
zfs-2.3, zfs-2.2, from the zfs-test-packages workflow.
This on-demand workflow is used to verify that the zfs RPMs
in the repos are correct.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #17956
2025-12-01 10:24:33 -08:00
Rob Norris
bfd137d92b config/kmap_atomic: initialise test data
6.18 changes kmap_atomic() to take a const pointer. This is no problem
for the places we use it, but Clang fails the test due to a warning
about being unable to guarantee that uninitialised data will definitely
not change. Easily solved by forcibly initialising it.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17954
2025-12-01 10:19:46 -08:00
Rob Norris
b7e00c7397 zvol_id: make array length properly known at compile time
Using strlen() in a static array declaration is a GCC extension. Clang
calls it "gnu-folding-constant" and warns about it, which breaks the
build. If it were widespread we could just turn off the warning, but
since there's only one case, let's just change the array to an explicit
size.
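
For illustration, a sketch of a declaration that stays a true constant
expression. The commit itself chose an explicit size; the sizeof form
below is a different, also portable, alternative. The SKETCH_* names
and the "/dev/" prefix are made up for this example.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_PREFIX "/dev/" /* hypothetical prefix */

/*
 * sizeof on a string literal is an integer constant expression and
 * counts the trailing NUL, so sizeof - 1 equals strlen() without
 * relying on the compiler folding a function call.
 */
#define SKETCH_PREFIX_LEN (sizeof (SKETCH_PREFIX) - 1)
#define SKETCH_NAME_MAX 64

/* Non-portable form: static char buf[strlen(SKETCH_PREFIX) + 65]; */
static char sketch_path[SKETCH_PREFIX_LEN + SKETCH_NAME_MAX + 1];

/* Build "<prefix><name>" in the static buffer, or NULL if name is too long. */
static const char *
sketch_build_path(const char *name)
{
    size_t n = strlen(name);

    if (n > SKETCH_NAME_MAX)
        return (NULL);
    memcpy(sketch_path, SKETCH_PREFIX, SKETCH_PREFIX_LEN);
    memcpy(sketch_path + SKETCH_PREFIX_LEN, name, n + 1);
    return (sketch_path);
}
```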

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17954
2025-12-01 10:19:40 -08:00
Rob Norris
c631f5e6c2 Linux: bump -std to gnu11
Linux switched from -std=gnu89 to -std=gnu11 in 5.18
(torvalds/linux@e8c07082a8). We've always overridden that with gnu99
because we use some newer features.

More recent kernels are using C11 features in headers that we include.
GCC generally doesn't seem to care, but more recent versions of Clang
seem to be enforcing our gnu99 override more strictly, which breaks the
build in some configurations.

Just bumping our "override" to match the kernel seems to be the easiest
workaround. It's an effective no-op since 5.18, while still allowing us
to build on older kernels.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17954
2025-12-01 10:19:11 -08:00
Alexx Saver
39303febac
chksum: run 256K benchmark on demand, preserve chksum_stat_data
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexx Saver <lzsaver.eth@ethermail.io>
Co-authored-by: Adam Moss <c@yotes.com>
Closes #17945
Closes #17946
2025-12-01 10:14:52 -08:00
Alexander Motin
7f7d4934cb
FreeBSD: Fix uninitialized variable error
On FreeBSD errno is defined as (* __error()), which means the compiler
can't tell whether two consecutive reads will return the same value.
And without this knowledge the reported error is formally right.

Caching of the errno in local variable fixes the issue.
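
A minimal sketch of the caching pattern, assuming a hypothetical
sketch_parse_long() helper rather than the code touched by this commit.
errno is read exactly once into a local, so every later decision is
made on the same value.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Parse a decimal number. Returns 0 and stores the value in *out on
 * success, or an errno value on failure (*out is left untouched).
 */
static int
sketch_parse_long(const char *s, long *out)
{
    char *end;
    long v;
    int err;

    errno = 0;
    v = strtol(s, &end, 10);
    err = errno; /* read once; errno may expand to a function call */

    if (err != 0)
        return (err);
    if (end == s || *end != '\0')
        return (EINVAL);

    *out = v;
    return (0);
}
```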

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17975
2025-11-25 05:16:35 -05:00
Rob Norris
e37937f42d
ztest: fix broken random call
Bad copypasta in 4d451bae8a, leading to random stuff being blasted all
over the stack, destroying the program.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Sean Eric Fagan <sean.fagan@klarasystems.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17957
2025-11-24 12:43:15 -05:00
Shreshth3
1f3444f2bb
zpool: fix special vdev -v -o conflict
Right now, running `zpool list` with -v and -o passed
does not work properly for special vdevs. This commit
fixes that problem.

See the discussion on #17839.
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Shreshth Srivastava <shreshthsrivastava2@gmail.com>
Closes #17932
2025-11-19 08:30:20 -08:00
Ameer Hamza
36e4f18883
Fix taskq NULL pointer dereference on timer race
Remove unsafe timer_pending() check in taskq_cancel_id() that created a
race where:
- Timer expires and timer_pending() returns FALSE
- task_done() frees task with tqent_func = NULL
- Timer callback executes and queues freed task
- Worker thread crashes executing NULL function

Always call timer_delete_sync() unconditionally to ensure timer callback
completes before task is freed.

Reliably reproducible by injecting mdelay(10) after setting CANCEL flag
to widen the race window, combined with frequent task cancellations
(e.g., snapshot automount expiry).

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #17942
2025-11-19 08:21:10 -08:00
Rob Norris
71609a9264 zfs: replace tpool with taskq
They're basically the same thing; let's just carry one.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17948
2025-11-19 08:16:51 -08:00
Rob Norris
be7d8eaf54 taskq: initialize tsd on first use
Doing it this way means that callers don't have to call
system_taskq_init() and also get the system and system_delay taskqs that
they possibly don't even want.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17948
2025-11-19 08:16:11 -08:00
Brian Behlendorf
06c73cffab
CI: Add smatch static analysis workflow
Smatch is an actively maintained kernel-aware static analyzer
for C with a low false positive rate.  Since the code checker
can be run relatively quickly against the entire OpenZFS code
base (15 min) it makes sense to add it as a GitHub Actions
workflow.  Today smatch reports a significant number of warnings
so the workflow is configured to always pass as long as the
analysis was run.  The results are available for reference.
Long term it would be ideal to resolve all of the errors/warnings
at which point the workflow can be updated to fail when new
problems are detected.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Toomas Soome <tsoome@me.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17935
2025-11-17 12:33:40 -08:00
Rob Norris
74b50a71a0 libuutil: remove packaging
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17934
2025-11-17 06:23:17 -08:00
Rob Norris
adb316f411 libuutil: remove the whole thing
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17934
2025-11-17 06:23:05 -08:00
Rob Norris
871fa61d26 zfs: replace uu_list with sys/list
Let's just use the list implementation we use everywhere else.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17934
2025-11-17 06:22:48 -08:00
Rob Norris
b593748287 zfs: replace uu_avl with sys/avl
Let's just use the AVL implementation we use everywhere else.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17934
2025-11-17 06:21:26 -08:00
Toomas Soome
e63d026b91
cmd/zpool cstyle issues
add missing headers.
usage() is no-return, so anything after call to it is unreachable code.
use (void) cast where we do ignore return value.
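
The last two points can be sketched as below. sketch_usage() and
sketch_flag() are hypothetical and are not the zpool functions; the
_Noreturn specifier is one C11 way to mark a function that never
returns.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical usage helper; _Noreturn tells the compiler it never returns. */
_Noreturn static void
sketch_usage(void)
{
    /* The return value of fprintf() is deliberately ignored. */
    (void) fprintf(stderr, "usage: sketch -<flag>\n");
    exit(2);
}

/* Return the flag character of an argument such as "-v". */
static int
sketch_flag(const char *arg)
{
    if (arg == NULL || arg[0] != '-' || arg[1] == '\0')
        sketch_usage();
    /* No return or break is needed after the call: it cannot come back. */

    return (arg[1]);
}
```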

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #17885
2025-11-14 15:58:50 -08:00
Brian Behlendorf
6015edb374 lib: update ABI meta following libspl changes
In theory they should not have resulted in a change. In practice, the
way visibility is set up currently means that many of our convenience
libraries will "leak through" into the available symbols in our public
libraries.

In this commit, we're seeing all the new symbols in libspl through
libuutil, libzfs and libzfs_core. Importantly, none have been removed,
so consumers of these libraries will not notice.

Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17861
2025-11-12 10:25:14 -08:00
Rob Norris
23d17f3587 libspl/random: add switch to force pseudo-random numbers for all calls
ztest wants to force all kernel random calls to use the pseudo-random
generator (/dev/urandom), to avoid depleting the system entropy pool
just for testing.

Up until the previous commit, it did this by switching the path that the
libzpool (now libspl) random API would use to get random data from; that
is, it took advantage of an implementation detail.

Now that that hole is closed to it, we need another method. This commit
introduces that; a simple API call to enable/disable "force pseudo"
mode.
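
The shape of such a switch might look like the sketch below. The
function names are hypothetical and are not the libspl API; the use of
/dev/random as the non-pseudo source is an assumption of this example.

```c
#include <stdbool.h>

/* Hidden switch, off by default; only reachable through the setter. */
static bool sketch_force_pseudo = false;

/* Enable or disable "force pseudo" mode for all later requests. */
static void
sketch_random_force_pseudo(bool on)
{
    sketch_force_pseudo = on;
}

/*
 * Pick the device a request is served from. With the switch on, every
 * request uses the pseudo-random source, whatever the caller asked for.
 */
static const char *
sketch_random_source(bool want_pseudo)
{
    if (want_pseudo || sketch_force_pseudo)
        return ("/dev/urandom");
    return ("/dev/random"); /* assumed non-pseudo source */
}
```

A test program would call the setter once at startup and would no
longer need to know which path variable the library reads from.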

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:04:30 -08:00
Rob Norris
4d451bae8a libspl: hide global data objects
Currently libspl is a static archive that is linked into multiple shared
objects, which then re-export its symbols. We intend to fix this soon.

For the moment though, most programs shipped with OpenZFS depend on two
or more of these shared objects, and see the same symbols twice. For
functions this is not a problem, as they do not have any mutable state
and so the linker can simply select the first one and use that for all.

For global data objects however, each shared object will have direct
(non-relocatable) references to its own instance of the symbol, such
that changes on one will not necessarily be seen by the other. While
this shouldn't be a problem in practice as these reexported interfaces
are not supposed to be used, they are technically undefined behaviour in
C (C17 6.9.2) and are reported by ASAN as a violation of C++'s "One
Definition Rule".

To fix this, we hide these globals inside their compilation units, and
add access functions and macros as appropriate to preserve the existing
API (though not ABI).
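
A minimal sketch of the hide-and-access pattern, using a made-up
sketch_verbose variable rather than any real libspl global.

```c
/* Header side: the object is no longer declared extern. */
int sketch_get_verbose(void);
void sketch_set_verbose(int level);

/* Source-compatible read access for existing callers (API, not ABI). */
#define sketch_verbose (sketch_get_verbose())

/*
 * Implementation side: internal linkage keeps the object out of the
 * exported symbols, so no other shared object can bind to it directly.
 */
static int sketch_verbose_value = 0;

int
sketch_get_verbose(void)
{
    return (sketch_verbose_value);
}

void
sketch_set_verbose(int level)
{
    sketch_verbose_value = level;
}
```

Code that previously read the global still compiles unchanged through
the macro, while code that assigned to it has to move to the setter.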

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:04:22 -08:00
Rob Norris
e282e98e79 libzpool: add zfs_impl.c, remove from libicp
This isn't used by libicp directly, but is by some clients, and relies
on headers specific to the zfs module, which makes using it difficult
otherwise.

Also switch the checksum tests over to use libzpool, so they can get
access to it. That's not exactly what we want in the long term, but the
icp and zfs modules have a complicated relationship so this will do for
now.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:04:15 -08:00
Brian Behlendorf
677d6ed730 zfs_context: remove duplicate includes
Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:04:03 -08:00
Brian Behlendorf
a49158c064 icp: remove global icp includes
Only include the required icp headers.  There's no need to
include sys/zfs_context.h and pull in all of the zfs headers.

Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:03:51 -08:00
Brian Behlendorf
913bdbf4d1 libzpool: remove global libzpool includes
Only include the zfs headers where they're currently required to
compile.  Unfortunately, including zfs_ioctl.h in user space pulls
in a bunch of internal zfs headers as a side effect.  We'll need
to move these structures in to a new shared header to avoid this.
We should not need to add the LIBZPOOL_CPPFLAGS when building the
zed, zinject, zpool, libzfs, or libzfs_core.

Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:03:15 -08:00
Rob Norris
99d7453b43 libzpool: add BE_POSIX_VENDOR for userspace bootenv
This is mostly a placeholder; it's not actually clear if a boot
environment makes any sense for userspace. Still, "posix" is the likely
future name of libzpool as a port, and this define is mandatory, so let's
roll with it for now.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:03:07 -08:00
Brian Behlendorf
801d9b4f96 debug: move all of the debug bits out of the spl
Pull all of the internal debug infrastructure up in to the zfs
code to clean up the layering.  Remove all the dodgy usage of
SET_ERROR and DTRACE_PROBE from the spl.  Luckily it was
lightly used in the spl layer so we're not losing much.

Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:02:51 -08:00
Rob Norris
eceb5b32e9 libspl: move loff_t declaration from zfs_context.h
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:02:46 -08:00
Rob Norris
5305d0f8b9 zfs_context: move empty __init/__exit macros to sys/debug.h
These are kind-of compiler attribute placeholders, so they go here with
the others for now.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:02:42 -08:00
Rob Norris
292438295d libspl: move compiler attribute macros from zfs_context.h
sys/debug.h is not really the right place for them, but we already have
some there for libspl, so it is at least convenient.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:02:35 -08:00
Rob Norris
a43edeefaf libzutil: move NN_NUMBUF_SZ from zfs_context.h nearer to nicenum()
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:02:29 -08:00
Rob Norris
b9d2e7782f libspl: common sysmacros.h
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:02:25 -08:00
Rob Norris
248c7ed0d2 libspl: move DTRACE_PROBE macros from zfs_context.h
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:02:20 -08:00
Rob Norris
03b2e5c40c libspl: move remaining ddi_* prototypes from zfs_context.h
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:02:12 -08:00
Rob Norris
559597b66c zfs_context: remove misc unused
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:02:08 -08:00
Rob Norris
ee0e86cfb5 libzpool: remove unused userspace ioctl policy functions
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:02:04 -08:00
Rob Norris
b5af61b569 libspl: move zone definitions from zfs_context.h
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:02:00 -08:00
Rob Norris
70a1fadaf2 libspl: move SID implementation from libzpool
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:01:56 -08:00
Rob Norris
faa295b9a6 libspl: move SID definitions from zfs_context.h; remove kernel gate
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:01:48 -08:00
Rob Norris
2b4a0dd6c0 libspl: move callb stubs from zfs_context.h
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:01:44 -08:00
Rob Norris
9d609098cd libspl: move random impl from libzpool
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:01:39 -08:00
Rob Norris
1911501c7d libspl: move random definitions from zfs_context.h
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:01:32 -08:00
Rob Norris
55fb30ebe6 zfs_context: move vn_dumpdir to libzpool
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:01:28 -08:00
Rob Norris
daff6b7e35 libspl: move utsname() etc to sys/misc.h; initialise in libspl_init()
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:01:21 -08:00
Rob Norris
6cf6f091cf libspl: move physmem to sys/systm.h; initialise at libspl_init()
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:01:17 -08:00
Rob Norris
d02ea5170a libspl: init/fini
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:01:08 -08:00
Rob Norris
4e3b88927c libzpool: separate driver-side include
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:01:04 -08:00
Rob Norris
0c6be03fd7 zfs_context: remove duplicated access control stuff; remove kernel gate
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:00:52 -08:00
Rob Norris
335f46b219 libspl: move ptob() from zfs_context.h
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:00:46 -08:00
Rob Norris
bca4ca7949 libspl: add include guards for sys/string.h
The extra inclusion via xvattr.h appears to upset the linter in CI. I'm
not entirely sure what its complaint is, but removing sys/string.h
entirely is not quite possible yet, and include guards are rarely a bad
idea, so this will do.
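
For reference, a sketch of what an include guard buys, with a header
body inlined twice to simulate a double inclusion. SKETCH_STRING_H and
sketch_str_t are hypothetical names, not the contents of sys/string.h.

```c
/* First inclusion: the guard macro is not defined yet, so the body is seen. */
#ifndef SKETCH_STRING_H
#define SKETCH_STRING_H

#include <stddef.h>

typedef struct sketch_str {
    const char *ss_buf;
    size_t ss_len;
} sketch_str_t;

static inline size_t
sketch_str_len(const sketch_str_t *s)
{
    return (s->ss_len);
}

#endif /* SKETCH_STRING_H */

/*
 * Second inclusion, for example via another header: the guard is now
 * defined, so the preprocessor skips the body. Without the guard the
 * struct below would be a redefinition error.
 */
#ifndef SKETCH_STRING_H
#define SKETCH_STRING_H
typedef struct sketch_str {
    int conflicting_member;
} sketch_str_t;
#endif /* SKETCH_STRING_H */
```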

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:00:41 -08:00
Rob Norris
db1c58095e libspl: move vattr and xvattr definitions from zfs_context.h
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:00:24 -08:00
Rob Norris
cf1044a15f libspl: move kmem implementation from libzpool
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:00:21 -08:00
Rob Norris
8b5d919d4e libspl: move kmem definitions from zfs_context.h
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:00:17 -08:00
Rob Norris
ee89fefe4d libspl: move procfs_list implementation from libzpool
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:00:13 -08:00
Rob Norris
8700fc669b libspl: move procfs_list definitions from zfs_context.h
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:00:10 -08:00
Rob Norris
586eba95de libspl: move kstat implementation from libzpool
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:00:06 -08:00
Rob Norris
ce7a894af1 libspl: move kstat definitions from zfs_context.h, slim down to basics
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 10:00:03 -08:00
Rob Norris
8c022088a7 libspl: move tsd definitions from zfs_context.h
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 09:59:59 -08:00
Rob Norris
c0984c936f libspl: move cred implementation from libzpool
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 09:59:55 -08:00
Rob Norris
52cf8eac42 libspl: move cred definitions from zfs_context.h
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 09:59:51 -08:00
Rob Norris
3823492ca1 libspl: move taskq implementation from libzpool
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 09:59:47 -08:00
Rob Norris
a2e10ebfd3 libspl: move taskq definitions from zfs_context.h
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 09:59:43 -08:00
Rob Norris
0c60920d09 libspl: move thread implementation from libzpool
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 09:59:40 -08:00
Rob Norris
21ae59a53b libspl: move thread definitions from zfs_context.h
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 09:59:14 -08:00
Rob Norris
7234d69748 libspl: move cmn_err definitions from zfs_context.h
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 09:59:09 -08:00
Rob Norris
e7a856d954 libspl: move condvar implementation from libzpool
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 09:59:03 -08:00
Rob Norris
a9f3733376 libspl: move condvar definitions from zfs_context.h
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 09:58:59 -08:00
Rob Norris
40ddba8256 libspl: move rwlock implementation from libzpool
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 09:58:55 -08:00
Rob Norris
c7eb0a7633 libspl: move rwlock definitions from zfs_context.h
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 09:58:50 -08:00
Rob Norris
3e37ea85af libspl: move mutex implementation from libzpool
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 09:58:44 -08:00
Rob Norris
cc119fbb48 libspl: move mutex headers from zfs_context.h
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 09:58:37 -08:00
Rob Norris
ba2ff4b42c libspl: move time definitions from zfs_context_os.h
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 09:58:31 -08:00
Rob Norris
37d5df62e0 libzpool: move ZFS-specific headers from libspl
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 09:58:27 -08:00
Rob Norris
f49b93e2c7 libzpool: move zfs_context_os.h from libspl
Keeping the spl/zfs module split, libzpool is the zfs module for
userspace. Headers and functions specific to it belong there.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 09:58:18 -08:00
Rob Norris
5588f189a7 libspl: single zfs_context_os.h
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17861
2025-11-12 09:53:34 -08:00
Brian Behlendorf
cb6b249f8c Update all ABI files
Refresh all ABI files using the CI generated files to reflect
the library interfaces to be published for the 2.4 release.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17911
2025-11-12 09:39:00 -08:00
Brian Behlendorf
64c4b6d17a libspl: hide zfs_tunable_* symbols
The zfs_tunable_* functions are a public interface which are
part of the internal libspl convenience library.  They should
be hidden to prevent an unnecessary ABI change in installed
libraries which link against libspl (e.g. libzfs_core, libuutil).

We do already leak long standing libspl symbols.  This commit is
solely intended to prevent leaking these new ones until this is
properly sorted out.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17911
2025-11-12 09:38:54 -08:00
Brian Behlendorf
e4fe41a79f Bump SONAME of libzfs and libzpool
The ABI of libzfs and libzpool have breaking changes since the
last major release.  Bump the SONAME for the upcoming 2.4 release
branch to libzfs7 and libzpool7.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17911
2025-11-12 09:38:48 -08:00
Brian Behlendorf
80cfdbb19f Bump SONAME on libnvpair
The nvlist_snprintf() function was added to the ABI of libnvpair.
No other symbols were modified or removed.  Bump the library-info
SONAME current and age args to reflect this is a minor library
version update.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17911
2025-11-12 09:38:20 -08:00
Adi-Goll
24aaf3a3f9
Reduce timeout to zero when running inside a container
Detect container environments and set timeout to zero unless
ZFS_MODULE_TIMEOUT is already set. This avoids an unnecessary ten
second delay after running zfs/zpool commands in a container where
/dev/zfs is unavailable.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Adi Gollamudi <adigollamudi@gmail.com>
Closes #15165
Closes #17922
2025-11-11 15:01:37 -08:00
Mariusz Zaborski
02fdd26e51
Add knob to disable slow io notifications
Introduce a new vdev property `VDEV_PROP_SLOW_IO_REPORTING` that
allows users to disable notifications for slow devices.
This prevents ZED and/or ZFSD from degrading the pool due to slow
I/O.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mariusz Zaborski <oshogbo@FreeBSD.org>
Closes #17477
2025-11-11 10:42:17 -08:00
Alexander Motin
b4f073b5a6
Add BRT support to zpool prefetch command
Implement BRT (Block Reference Table) prefetch functionality similar
to existing DDT prefetch.  This allows preloading BRT metadata into
ARC to improve performance for block cloning operations and frees
of earlier cloned blocks.

Make -t parameter optional.  When omitted, prefetch all supported
metadata types (both DDT and BRT now).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17890
2025-11-10 16:16:22 -08:00
Alexander Motin
cc5cae5475
BRT: Increase block size from 4KB to 8KB
According to my observations, BRT ZAPs are typically compressible
3:1 for data and 2:1 for indirects.  With ashift=12, typical these
days, increasing the block size to 8KB should get us most of the
possible compression, cutting the on-disk and in-ARC BRT footprint
in half at the cost of some compression/decompression overhead,
but without real write inflation, only some dirty data increase.

Increasing to 32KB, similar to DDT, could further improve compression
and storage efficiency, but at the cost of write inflation and a
much bigger dirty data increase, which we cannot properly control
now.  So let's leave this for a time when the BRT log gets implemented.
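
As a rough illustration of the arithmetic above, the following C sketch
models the on-disk footprint for different block sizes.  It is a
simplified model, not OpenZFS code: the helper names are hypothetical,
the ratios are the observations quoted in this message, and the only
rule modeled is that each compressed block is rounded up to a multiple
of 1 << ashift.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: size actually allocated for one block of
 * `logical` bytes that compresses by `ratio`:1, rounded up to the
 * allocation unit implied by ashift.
 */
static uint64_t
allocated_size(uint64_t logical, uint64_t ratio, unsigned ashift)
{
	assert(ratio > 0);

	uint64_t unit = 1ULL << ashift;
	uint64_t compressed = (logical + ratio - 1) / ratio;	/* ceiling */

	return (((compressed + unit - 1) / unit) * unit);
}

/*
 * Hypothetical helper: on-disk bytes needed to store `total` logical
 * bytes split into blocks of `blksz` bytes (total must be a multiple
 * of blksz in this simplified model).
 */
static uint64_t
footprint(uint64_t total, uint64_t blksz, uint64_t ratio, unsigned ashift)
{
	assert(blksz > 0 && total % blksz == 0);

	return ((total / blksz) * allocated_size(blksz, ratio, ashift));
}
```

Under these assumptions, with ashift=12 and 3:1 compression, 32KB of
logical data occupies 32KB when stored in 4KB blocks, 16KB in 8KB
blocks, and 12KB in a single 32KB block, which is consistent with the
halving described above.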

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17916
2025-11-10 15:44:46 -08:00
Alexander Motin
72b2a9571a
ZAP: Remove dmu_object_info_from_dnode() call
dmu_object_info_from_dnode() takes two locks and copies plenty of
data that we don't need in zap_lockdir_impl().  Just read dn_type
directly in this hot path.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17921
2025-11-10 14:26:15 -08:00
Rob Norris
6e12f0bd77
spa_misc: add an API for spa_namespace_lock
This is useful as debugging support, as it lets namespace lock
operations be traced directly. It will also be useful for future work to
reduce the use of spa_namespace_lock, traditionally a source of
difficult deadlocks.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17906
2025-11-10 14:23:39 -08:00
Alexander Motin
8aaed7dc42
BRT: Fix ranges to blocks conversion math
BRT_RANGESIZE_TO_NBLOCKS() takes number of ranges as its argument.
To get number of blocks we should multiply it by the entry size,
not divide by it, as it was due to missing parentheses.

Before #17875 this could cause small memory corruptions for vdevs
bigger than 64TB, but the change made the bug more noticeable.
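
The class of bug described here can be sketched in a few lines of C.
The macro names, block size and entry size below are hypothetical and
chosen only for illustration; they are not copied from the BRT code.
The point is that a / b / c divides by both b and c, whereas the
intended expression divides by the number of entries per block.

```c
#include <stddef.h>
#include <stdint.h>

#define	DEMO_BLOCKSIZE	4096			/* hypothetical block size */
#define	DEMO_ENTSIZE	sizeof (uint16_t)	/* hypothetical entry size */

/* Buggy form: without parentheses this divides by the entry size. */
#define	DEMO_NBLOCKS_BUGGY(n)	\
	(((n) - 1) / DEMO_BLOCKSIZE / DEMO_ENTSIZE + 1)

/* Fixed form: divide by the number of entries that fit in one block. */
#define	DEMO_NBLOCKS_FIXED(n)	\
	(((n) - 1) / (DEMO_BLOCKSIZE / DEMO_ENTSIZE) + 1)

/* Thin wrappers so the two forms can be compared side by side. */
static uint64_t
nblocks_buggy(uint64_t nranges)
{
	return (DEMO_NBLOCKS_BUGGY(nranges));
}

static uint64_t
nblocks_fixed(uint64_t nranges)
{
	return (DEMO_NBLOCKS_FIXED(nranges));
}
```

With 8192 two-byte entries (16KB of data), nblocks_fixed() yields 4
blocks while nblocks_buggy() yields 1, so anything sized from the buggy
value would be too small.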

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17886
Closes #17915
2025-11-10 13:58:39 -08:00
Adi-Goll
57b1b99d31
Update man page description of zpool rewind
Update description of zpool import --rewind-to-checkpoint in
man/man7/zpoolconcepts.7 to explain that rewinding automatically
discards a checkpoint.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Adi Gollamudi <adigollamudi@gmail.com>
Closes #12646
Closes #17918
2025-11-10 10:13:13 -08:00
Alexander Motin
baefe098ee
ZIO: Set minimum number of free issue threads to 32
Free issue threads might block waiting for synchronous DDT, BRT or
GANG header reads. So unlike other taskqs using ZTI_SCALE to scale
with number of CPUs, here we also need some amount of threads to
potentially saturate pool reads.  I am not sure we always want the
96 threads we had before ZTI_SCALE was introduced in #11966 on small
systems, but let's make it at least 32.

While here, make free taskqs configurable, similar to read and
write ones.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17903
2025-11-08 14:41:53 -05:00
rmacklem
e26b9fc871
FreeBSD: Add support for _PC_CASE_INSENSITIVE
FreeBSD now has a pathconf name called _PC_CASE_INSENSITIVE
used to check if a file system performs case insensitive
name lookups.

This patch adds support for this name.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rick Macklem <rmacklem@uoguelph.ca>
Closes #17908
2025-11-08 13:20:23 -05:00
Brian Behlendorf
962474d1a2
zstd: disable intrinsics
Disable the aarch64 NEON SIMD intrinsics for kernel builds.  Safely
using them in the kernel context requires saving/restoring the FPU
registers which is not currently done.

Additionally, remove the aarch64 optimized PREFETCH_L1 and PREFETCH_L2
instructions.  Rely on the more portable compiler built-ins.

This lets us remove the problematic workaround in the aarch64_compat.h
header which undefines the __aarch64__ macro.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17904
Closes #17852
2025-11-07 10:01:12 -08:00
Adi-Goll
54876ee85e
Fix typo in vdev_raidz.c
Change the spelling of "begining" on line 4875 to
"beginning".

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Adi Gollamudi <adigollamudi@gmail.com>
Closes #17905
2025-11-07 09:55:03 -08:00
Toomas Soome
fe553581f0
libzfs: ignoring unreachable code
We have an infinite loop and, on a certain condition, we exit this loop
and the thread with pthread_exit(). But after this loop we also
have code to perform pthread_cleanup_pop() and return from the
thread.

The problem is that modern compilers are able to recognize that we
never actually reach the statements after the loop, and therefore
treat them as dead code.

I think that, instead of calling pthread_exit(), it is better to break
out of the loop and let the last statements work as intended. This is
because we do need to keep pthread_cleanup_pop() anyhow. Of course,
it is a matter of taste whether we use return or pthread_exit() as the
very last statement in this function.
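
A minimal sketch of the pattern, under stated assumptions: the worker,
its argument struct and the cleanup handler below are hypothetical and
are not the libzfs code.  The sketch also assumes the handler should run
on normal exit, hence pthread_cleanup_pop(1).  It shows the loop being
left with break so that the statements after it are reachable.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical worker state, for illustration only. */
struct worker_arg {
	int	remaining;	/* iterations left before the loop ends */
	bool	cleaned_up;	/* set by the cleanup handler */
};

static void
worker_cleanup(void *p)
{
	((struct worker_arg *)p)->cleaned_up = true;
}

static void *
worker(void *p)
{
	struct worker_arg *wa = p;

	pthread_cleanup_push(worker_cleanup, wa);
	for (;;) {
		if (wa->remaining == 0)
			break;	/* instead of pthread_exit(NULL) */
		wa->remaining--;
	}
	/* Reachable now that the loop exits with break. */
	pthread_cleanup_pop(1);
	return (NULL);
}

/* Run the worker to completion and report whether cleanup happened. */
static bool
run_worker(int iterations)
{
	struct worker_arg wa = { iterations, false };
	pthread_t tid;

	if (pthread_create(&tid, NULL, worker, &wa) != 0)
		return (false);
	(void) pthread_join(tid, NULL);
	return (wa.cleaned_up && wa.remaining == 0);
}
```

Because pthread_cleanup_push() and pthread_cleanup_pop() must pair up
in the same lexical scope, keeping the pop after the loop is required
either way; the break simply makes that code live again.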

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #17900
2025-11-07 09:27:18 -05:00
Rob Norris
336c95372d
man: describe zfs-rewrite method and properties
We've heard anecdotes that suggest some
confusion/surprise/disappointment that a changed recordsize is not
applied during rewrite. Until such time as we actually can do that, we
can at least explicitly mention it as something that doesn't work.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17898
2025-11-07 09:01:59 -05:00
Alexander Ziaee
242fdb58e5
zfs-jail.8: Add introductory sentence, refactor
Add an introductory sentence explaining why the reader may want to use
this command, and establishing the requirement that the jail must be
running. Move other requirements from the description of the subcommands
to follow this for flow and structure. Move the caveat that this is for
FreeBSD down to a canonical CAVEATS section, and cross-reference Linux's
equivalent functionality. Also mention in that section that this utility
cannot be used to delegate the root directory of the jail.

Reported by: Jan Brankamp <crest@rlwinm.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Alexander Ziaee <ziaee@FreeBSD.org>
Closes #17883
2025-11-06 13:53:24 -08:00
Tony Hutter
f93506d1df
Linux 6.17 compat: Fix broken projectquota on 6.17
We need to specifically use the FS_XFLAG_* macros in zpl_ioctl_*attr()
codepaths, and the FS_*_FL macros in the zpl_ioctl_*flags() codepaths.
The earlier code just assumed the FS_*_FL macros for both codepaths.
The 6.17 kernel adds a bitmask check in copy_fsxattr_from_user() that
exposed this error via failing 'projectquota' ZTS tests.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #17884
Closes #17869
2025-11-05 16:22:03 -08:00
Paul Dagnelie
8c225ff1b4
Fix gang write late_arrival bug
When a write comes in via dmu_sync_late_arrival, its txg is equal to the
open TXG. If that write gangs, and we have not yet activated the new
gang header feature, and the gang header we pick can store a larger gang
header, we will try to schedule the upgrade for the open TXG + 1. In
debug mode, this causes an assertion to trip. This PR sets the TXG for
activating the feature to be the larger of either the current open TXG
or the syncing TXG + 1.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #17824
2025-11-05 11:40:22 -08:00
Tino Reichardt
7e7c360256
CI: Update FreeBSD versions and ci-type handling
Update FreeBSD versions:
- add FreeBSD 15.0-STABLE
- add FreeBSD 16.0-CURRENT

So we use the latest versions of each line now:
  - FreeBSD 14.3 (RELEASE)
  - FreeBSD 15.0 (STABLE)
  - FreeBSD 16.0 (CURRENT)

In commits - you may specify which type of CI should run:
- ZFS-CI-Type: quick
- ZFS-CI-Type: linux
- ZFS-CI-Type: freebsd
- ZFS-CI-Type: full

Reviewed-by: Alexx Saver <lzsaver@users.noreply.github.com>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #17896
2025-11-05 09:56:17 -08:00
Toomas Soome
5d33801802
get_key_material_https: label 'kfdok' defined but not used
The label 'kfdok' is only used with O_TMPFILE, we need to use
the same #ifdef around this label.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #17894
2025-11-04 13:13:07 -08:00
Robert Evans
d0294aa758
Update dnode_next_offset_level to accept blkid instead of offset
Currently this function uses L0 offsets which:
1. is hard to read since it maps offsets to blkid and back each call
2. necessitates dnode_next_block to handle edge cases at limits
3. makes it hard to tell if the traversal can loop infinitely

Instead, update this and dnode_next_offset to work in (blkid, index).
This way the blkid manipulations are clear, and it's also clear that
the traversal always terminates since blkid goes one direction.

I've also considered updating dnode_next_offset to operate on blkid.
Callers use both patterns, so maybe another PR can split the cases?

While here tidy up dnode_next_offset_level comments.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Robert Evans <evansr@google.com>
Closes #17792
2025-11-04 13:12:17 -08:00
jamisiveshkumar
a90f816be6
Fix capitalization typo in README.md
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Sivesh Kumar <siveshjami@gmail.com>
Closes #17889
2025-11-03 16:37:58 -08:00
Alexander Motin
6cfc3dba9c
Cleanup ZIO_FLAG_IO_RETRY vs TRYHARD usage
In cases where all issued ZIOs must succeed, and we can't do
anything clever about the errors, we should just explicitly set
ZIO_FLAG_TRYHARD and let the OS do all the reasonable retries.

In other cases, where retries can be different from the original,
for example, some ZIOs are allowed to fail due to redundancy, or
we can disable aggregation on retry to get at least some of
the data, we can do the first pass without TRYHARD, and retry with
ZIO_FLAG_IO_RETRY (which implies TRYHARD semantics) only if needed.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17877
2025-10-30 16:29:48 -07:00
Alexander Motin
ec268cdf97 Fix caching of DDT log and BRT
We read both the DDT log and BRT counters on pool import and then only
append or overwrite in full blocks.  We don't need them in DMU or
ARC caches.  Fortunately we have DMU_UNCACHEDIO for this now.

Even more, we don't need BRT in the non-evictable metadata DMU caches,
since it will likely never fit there, while blocking the cache from
its original users.  Since DMU_OT_IS_METADATA_CACHED() has no way
to differentiate the new metadata types, mark BRT with the storage
type of DMU_OT_DDT_ZAP.  As a side effect this will also put it on
the dedup device, but that should actually be right.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17875
2025-10-30 16:28:28 -07:00
Alexander Motin
ea125eeb5d BRT: Round bv_entcount up to BRT_BLOCKSIZE
Since we set the bv_mos_brtvdev block size, and since we keep the dirty
bitmap at the same granularity, we should do the allocations
and writes at that granularity too.  Otherwise the last block write is
short, which will be odd once we implement writing of only dirty
blocks, and also requires read-modify-write at the DMU layer.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17875
2025-10-30 16:28:05 -07:00
Joseph Anthony Pasquale Holsten
033dbdc982
autogen.sh: remove workaround for automake <1.14, needed for EL <=7
Ultimately this is a revert of 779ac93, which according to
@nabijaczleweli is to paper over automake <1.14's lack of
%reldir% support.

As I understand it, EL8 is the lowest current build target.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Joseph Holsten <joseph@josephholsten.com>
Closes #17878
2025-10-30 16:26:56 -07:00
Brian Behlendorf
f819b41c78
Retire ZoL patch scripts
Remove the out of date helper scripts originally used to port
Illumos commits to the ZoL repository.  Due to layout changes
made to this repository they're no longer entirely correct.
Remove them to make it clear they're no longer being used or
actively maintained.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17880
2025-10-30 09:07:38 -07:00
Alexander Motin
dcada084b9
Pass flags to more DMU write/hold functions
Over time many DMU functions got a flags argument to control
prefetch, caching, etc.  A few functions were left without it, even
though a closer look showed that many of them do not require prefetch
due to their access pattern.  This patch adds the flags argument to
dmu_write(), dmu_buf_hold_array() and dmu_buf_hold_array_by_bonus(),
passing DMU_READ_NO_PREFETCH where applicable.

I am going to also pass DMU_UNCACHEDIO to some of them later.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17872
2025-10-29 11:17:51 -07:00
Quartz
3caf66c25b
man: Update zpool-event subclass names and document new types
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Quartz <yyhran@163.com>
Closes #17868
2025-10-28 12:49:05 -07:00
Toomas Soome
67e716329b
ZTS: autotrim_config.ksh is missing pool type
The functional/trim tests create pools of different types to test
trim, but autotrim_config.ksh omits the type from the zpool
create command line while looping over the different pool
types.

Sponsored-by: Edgecast Cloud LLC.
Signed-off-by: Toomas Soome <tsoome@me.com>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17874
2025-10-28 12:41:36 -04:00
Ryan Libby
0455150f11
FreeBSD zio_crypt.c: initialize uio variables before access
In zio_crypt_key_wrap and zio_crypt_key_unwrap, the cuio_s variable was
not initialized before the calls to zfs_uio_init, leading to
uninitialized access to cuio_s.uio_offset.  Initialize it to avoid gcc
warnings.

Similar issue as fixed in 2bf152021 ("Fix gcc uninitialized warning in
FreeBSD zio_crypt.c")

Signed-off-by: Ryan Libby <rlibby@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17863
2025-10-23 21:23:25 -04:00
Rob Norris
fc519b2c11
mailmap/AUTHORS: update with recent new contributors
We’re not always on the same page, but at least we’re in the same book.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17860
2025-10-22 09:27:54 -07:00
Shreshth3
44704616b4
zpool: fix conflict with -v and -o options
Right now, the -v and -o options for `zpool list` work independently,
but when paired, the -v "wins out" and the -o effect is lost. This
commit fixes that problem.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Shreshth Srivastava <shreshthsrivastava2@gmail.com>
Closes #11040
Closes #17839
2025-10-21 15:10:52 -07:00
Rob Norris
72f41454a6
ZTS: fail test run if test runner crashes unexpectedly
zfs-tests.sh executes test-runner.py to do the actual test work. Any
exit code < 4 is interpreted as success, with the actual value
describing the outcome of the tests inside.

If a Python program crashes in some way (eg an uncaught exception), the
process exit code is 1.

Taken together, this means that test-runner.py can crash during setup,
but return a "success" error code to zfs-tests.sh, which will report and
exit 0. This in turn causes the CI runner to believe the test run
completed successfully.

This commit addresses this by making zfs-tests.sh interpret an exit code
of 255 as a failure in the runner itself. Then, in test-runner.py, the
"fail()" function defaults to a 255 return, and the main function gets
wrapped in a generic exception handler, which prints it and calls
fail().

All together, this should mean that any unexpected failure in the test
runner itself will be propagated out of zfs-tests.sh for CI or any other
calling program to deal with.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17858
2025-10-21 14:34:39 -07:00
Jean-Sébastien Pédron
3a55e76b84
FreeBSD: zfs_getpages: Don't zero freshly allocated pages
Initially, `zfs_getpages()` is provided with an array of busy pages by
the vnode pager. It then tries to acquire the range lock, but if there
is a concurrent `zfs_write()` running and fails to acquire that range
lock, it "unbusies" the pages to avoid a deadlock with `zfs_write()`.
After that, it grabs the pages again and retries to acquire the range
lock, and so on.

Once it has the range lock, it filters out valid pages, then copies DMU
data to the remaining invalid pages.

The problem is that freshly allocated zero'd pages it grabbed itself are
marked as valid. Therefore they are skipped by the second part of the
function and DMU data is never copied to these pages. This causes mapped
pages to contain zeros instead of the expected file content.

This was discovered while working on RabbitMQ on FreeBSD. I could
reproduce the problem easily with the following commands:

    git clone https://github.com/rabbitmq/rabbitmq-server.git
    cd rabbitmq-server/deps/rabbit

    gmake distclean-ct RABBITMQ_METADATA_STORE=mnesia \
      ct-amqp_client t=cluster_size_3:leader_transfer_stream_send

The testsuite fails because there is a sendfile(2) that can happen
concurrently to a write(2) on the same file. This leads to sendfile(2)
or read(2) (after the sendfile) sending/returning data with zeros, which
causes a function to crash.

The patch consists of not setting the `VM_ALLOC_ZERO` flag when
`zfs_getpages()` grabs pages again. Then, the last page is zero'd if it
is invalid, in case it would be partially filled with the end of the
file content. Other pages are either valid (and will be skipped) or they
will be entirely overwritten by the file content.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Mark Johnston <markj@FreeBSD.org>
Signed-off-by: Jean-Sébastien Pédron <dumbbell@FreeBSD.org>
Closes #17851
2025-10-20 17:04:21 -07:00
Rob Norris
fe8b50f09f Linux 6.18: generic_drop_inode() and generic_delete_inode() renamed
Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
2025-10-20 16:01:04 -07:00
Rob Norris
3651888182 sha256_generic: make internal functions a little more private
Linux 6.18 has conflicting prototypes for various sha256_* and sha512_*
functions, which we get through a very long include chain. That's tough
to fix right now; easier is just to rename our internal functions.

Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
2025-10-20 16:01:04 -07:00
Rob Norris
8911360a41 Linux 6.18: namespace type moved to ns_common
The namespace type has moved from the namespace ops struct to the
"common" base namespace struct. Detect this and define a macro that does
the right thing for both versions.

Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
2025-10-20 16:01:04 -07:00
Rob Norris
76c238f1ba Linux 6.18: replace write_cache_pages()
Linux 6.18 removed write_cache_pages() without a usable replacement.
Here we implement a minimal zpl_write_cache_pages() that finds the dirty
pages within the mapping, gets them into the expected state and hands
them off to zfs_putpage(), which handles the rest.

Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
2025-10-20 16:01:04 -07:00
Rob Norris
39db4bda80 Linux 6.18: block_device_operations->getgeo takes struct gendisk*
Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
2025-10-20 16:01:04 -07:00
Rob Norris
5de4a297e7 Linux 6.18: convert ida_simple_* calls
ida_simple_get() and ida_simple_remove() are removed in 6.18. However,
since 4.19 they have been simple wrappers around ida_alloc() and
ida_free(), so we can just use those directly.

Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
2025-10-20 16:01:04 -07:00
Rob Norris
9d50ee59dc Linux 6.18: replace nth_page()
Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
2025-10-20 16:01:04 -07:00
Andrew Walker
adacf020ce
Fix return value for setting zvol threading
We must return -1 instead of ENOENT if the special zvol threading
property set function can't locate the dataset (this would typically
happen with an encrypted and unmounted zvol) so that the operation
gets inserted properly into the nvlist for operations to set. This
is because we want the property to be set once the zvol is
decrypted again.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Andrew Walker <awalker@ixsystems.com>
Closes #17836
2025-10-20 15:21:40 -07:00
Shreshth3
3ea8ca8c0f
zdb: fix bug with -A flag
Fixes #10544.

According to the manpage, zdb -A should
ignore all assertions. But it currently
does not do that. This commit fixes
this bug.

Signed-off-by: Shreshth Srivastava <shreshthsrivastava2@gmail.com>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17825
2025-10-20 09:30:20 -04:00
Andrew Walker
783a02b5d3
Fix ZFS_READONLY implementation on Linux
MS-FSCC 2.6 is the governing document for
DOS attribute behavior. It specifies the following:

For a file, applications can read the file but
cannot write to it or delete it. For a directory,
applications cannot delete it, but applications can
create and delete files from the directory.

Signed-off-by: Andrew Walker <awalker@ixsystems.com>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17837
2025-10-20 09:28:57 -04:00
Brian Behlendorf
5a03e358fc
Update device removal documentation
Make a minor update to the 'zpool remove' man page to clarify both
raidz and draid pools do not support removal, and change sector to
ashift which is what we actually care about.

Update the big theory comment in vdev_removal.c to accurately reflect
which types of vdevs can be removed.  Furthermore, I've added some
discussion for the casual reader to briefly explain the top-level
vdev removal restrictions.  This has been a common area of confusion
and it's not intuitive where they come from without understanding
the implementation details.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17847
2025-10-20 09:26:51 -04:00
Rob Norris
6ae99d2692
mmap_seek: print error code and text on failure
If lseek() returns an unexpected error, it's useful to know the error
code to help connect it to the trouble spot inside the module.

Since the two seek functions should be basically identical, lift them
into a single generic function.
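
A generic helper of this kind might look like the sketch below.  This
is an assumption about the shape of such a function, not the actual
test code: the name seek_expect() and its return convention are
hypothetical.  The caller passes the whence value (for example
SEEK_DATA or SEEK_HOLE where the platform defines them), and any
failure is reported with both the numeric errno and its text.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: seek and compare against the expected offset.
 * Returns 0 on a match, the errno value if lseek() fails, or -1 if
 * lseek() succeeds but lands at an unexpected offset.
 */
static int
seek_expect(int fd, off_t offset, int whence, off_t expected)
{
	off_t got = lseek(fd, offset, whence);

	if (got == (off_t)-1) {
		int err = errno;	/* save before stdio can change it */

		(void) fprintf(stderr,
		    "lseek(%d, %lld, %d) failed: errno %d (%s)\n",
		    fd, (long long)offset, whence, err, strerror(err));
		return (err);
	}
	if (got != expected) {
		(void) fprintf(stderr,
		    "lseek(%d, %lld, %d) returned %lld, expected %lld\n",
		    fd, (long long)offset, whence, (long long)got,
		    (long long)expected);
		return (-1);
	}
	return (0);
}
```

Taking whence as a parameter is what lets one function cover both seek
directions, so the error reporting only has to be written once.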

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Robert Evans <evansr@google.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17843
2025-10-14 15:57:35 -07:00
Ameer Hamza
d1e1b80ffe
CI: Fix FreeBSD 15.0 by staying on ALPHA4 due to broken ALPHA5 image
FreeBSD 15.0-ALPHA5 image fails to boot on cloud VMs due to missing
/boot/efi mount point, causing the system to drop to single user mode
where SSH cannot start. Work around this by staying on ALPHA4 and
setting IGNORE_OSVERSION=yes to bypass pkg's kernel version mismatch
prompt during bootstrap. This allows CI to proceed with ALPHA4 until we
have a stable FreeBSD 15.0 image.

Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17846
2025-10-13 21:19:47 -04:00
Shreshth3
a5af3f2db7
arc: fix small typos
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Shreshth Srivastava <shreshthsrivastava2@gmail.com>
Closes #17840
2025-10-13 11:23:55 -07:00
Rob Norris
0e62831110
libzpool/cmn_err: remove suppression, add stop option, cleanup
A small uplift of the cmn_err() and panic() calls in userspace:

- remove the suppression on CE_NOTE. We have very few of these calls in
  a standard build, it's convenient for "print debugging".

- make prefixes clear and consistent.

- add LIBZPOOL_PANIC_STOP environment variable to send SIGSTOP to the
  process group on a panic, rather than abort(), so all threads remain
  alive for inspection.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17834
2025-10-13 10:55:03 -07:00
Mark Johnston
a9f2a1f361
Fix the type of the raidz_outlier_check_interval_ms parameter
It's an hrtime_t, which is an unsigned long long.  In practice this is
just a U64.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #17833
2025-10-13 10:47:09 -07:00
Alexander Motin
f4276479c9
Suppress some ashift warnings
Do not warn about vdev ashifts being smaller than physical ashifts
in a pool status if the pool ashift property is set and the vdev
ashift satisfies it (bigger or equal), since the user explicitly
requested this.  The ashifts of individual vdevs are still reported.

Do not warn about vdev ashifts in zpool import, since it doesn't
matter much, and we don't even report individual vdev ashifts
there.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17830
2025-10-13 10:41:42 -07:00
Alexander Motin
51de2d76f8
Explicitly set ashift for non-leaf vdevs
Before this change the ashift property was applied only to leaf
vdevs.  As a result, it worked only as a minimal value for parent
vdevs, since a bigger physical_ashift value reported by any child
could be used instead when deciding the parent's ashift, as if the
ashift property was never set.

This change explicitly passes ZPOOL_CONFIG_ASHIFT to all vdevs,
allowing override for parents only if the passed value is below
logical_ashift and so unacceptable.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17826
2025-10-13 10:41:02 -07:00
Ameer Hamza
6045740c8e
zpool_reopen_004_pos: Clear label from offline disk after destroy
zpool_reopen_004_pos destroys a pool with an offline disk, leaving its
label intact. In the TrueNAS local repo, zpool_reopen_005_pos is skipped,
causing zpool_reopen_007_pos to fail as it doesn't use the -f flag when
creating pools, unlike zpool_reopen_005_pos.

Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17831
2025-10-10 11:53:11 -04:00
Dag-Erling Smørgrav
6e5b836e9f
FreeBSD: Correct _PC_MIN_HOLE_SIZE
The actual minimum hole size on ZFS is variable, but we always report
SPA_MINBLOCKSIZE, which is 512.  This may lead applications to believe
that they can reliably create holes at 512-byte boundaries and waste
resources trying to punch holes that ZFS ends up filling anyway.

* In the general case, if the vnode is a regular file, return its
  current block size, or the record size if the file is smaller than
  its own block size.  If the vnode is a directory, return the dataset
  record size.  If it is neither a regular file nor a directory,
  return EINVAL.

* In the control directory case, always return EINVAL.
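
The rules above can be sketched as a small decision function. This is an
illustrative Python sketch only, not the FreeBSD vnode code; the function
name `min_hole_size` and all of its parameters are hypothetical.

```python
import errno


def min_hole_size(kind, file_size=0, block_size=0, record_size=131072,
                  in_ctldir=False):
    """Return the value to report for _PC_MIN_HOLE_SIZE (sketch).

    kind is "file", "dir", or anything else; sizes are in bytes.
    All names here are assumptions made for illustration.
    """
    if in_ctldir:
        # Control directory case: always EINVAL.
        raise OSError(errno.EINVAL, "no hole size in the control directory")
    if kind == "file":
        # A file smaller than its own block size reports the record size.
        return record_size if file_size < block_size else block_size
    if kind == "dir":
        return record_size
    # Neither a regular file nor a directory.
    raise OSError(errno.EINVAL, "not a regular file or directory")
```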

Signed-off-by: Dag-Erling Smørgrav <des@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17750
2025-10-08 09:13:22 -04:00
Shreshth3
ea914e4a43
Add missing include statement
Resolve a build failure for user applications that include <sys/uio.h>.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Shreshth Srivastava <shreshthsrivastava2@gmail.com>
Closes #17781
Closes #17814
2025-10-07 09:21:03 -07:00
Tony Hutter
1861a329fb
zvol: verify IO type is supported
ZVOLs don't support all block layer IO request types.  Add a check for
the IO types we do support.  Also, remove references to
io_is_secure_erase() since they are not supported on ZVOLs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #17803
2025-10-06 16:54:09 -07:00
Mateusz Guzik
346ecac61b
Annotate arc_buf_is_shared as __maybe_unused
Otherwise the compiler warns about it on production FreeBSD builds.

The routine proved resilient to attempts to ifdef on debug.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #17818
2025-10-06 16:43:20 -07:00
Tino Reichardt
3ef2d5ee45
CI: Switch FreeBSD 15 to 15.0-ALPHA4 and add FreeBSD 16
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #17815
2025-10-06 16:41:01 -07:00
Ivan Shapovalov
f8b082b5af zdb: adjust block histogram binning strategy
Previously, a bin included all blocks _starting_ from given size
(e.g., a "4K" bin would include all blocks within the [4K; 8K) region).
This is counter-intuitive and does not match the typical use-case of the
block histogram (that is, to estimate disk usage considering how ZFS'
block allocation works). In other words, if I'm looking at the "4K" row,
I'm interested in records that _fit into_ a 4K block.

Adjust the binning strategy such that a bin includes all blocks _up to_
given size, such that e.g. a "4K" bin would include all blocks within
the (2K; 4K] region.
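
As a rough illustration of the two strategies (a Python sketch, not the
zdb code; the names `bin_old` and `bin_new` are made up here, and a bin
is identified by the exponent k of its 2**k label):

```python
def bin_old(size: int) -> int:
    """Old strategy: bin k holds sizes in [2**k, 2**(k+1))."""
    assert size >= 1
    return size.bit_length() - 1      # floor(log2(size))


def bin_new(size: int) -> int:
    """New strategy: bin k holds sizes in (2**(k-1), 2**k]."""
    assert size >= 1
    return (size - 1).bit_length()    # ceil(log2(size))
```

With these definitions a 3K block lands in the "2K" bin (k = 11) under
the old strategy and in the "4K" bin (k = 12) under the new one.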

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ivan Shapovalov <intelfx@intelfx.name>
Closes #16999
2025-10-06 09:35:32 -07:00
Ivan Shapovalov
3a1a22abb4 zdb: factor out block histogram bin number computation
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ivan Shapovalov <intelfx@intelfx.name>
Closes #16999
2025-10-06 09:35:28 -07:00
Ivan Shapovalov
c0a874fced zdb: add --class=(normal|special|...) to filter blocks by alloc class
When counting blocks to generate block size histograms (`-bb`), accept a
`--class=` argument (as a comma-separated list of either "normal",
"special", "dedup" or "other") to only consider blocks that belong to
these metaslab classes.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ivan Shapovalov <intelfx@intelfx.name>
Closes #16999
2025-10-06 09:35:23 -07:00
Ivan Shapovalov
8e97b98140 zdb: add --bin=(lsize|psize|asize) arg to control histogram binning
When counting blocks to generate block size histograms (`-bb`), accept a
`--bin=` argument to force placing blocks into all three bins based on
*this* size.

E.g. with `--bin=lsize`, a block with lsize=512K, psize=128K, asize=256K
will be placed into the "512K" bin in all three output columns. This
way, by looking at the "512K" row the user will be able to determine
how well ZFS was able to compress blocks of this logical size.

Conversely, with `--bin=psize`, by looking at the "128K" row the user
will be able to determine how much overhead was incurred for storage
of blocks of this physical size.
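
A minimal Python sketch of the idea, assuming each block is described by
its three sizes in bytes and that each column accumulates bytes per bin
(the `histogram` helper, the dict layout, and the use of a power-of-two
exponent as the bin key are assumptions for illustration, not zdb's
data structures):

```python
from collections import defaultdict

COLUMNS = ("lsize", "psize", "asize")


def histogram(blocks, bin_by=None):
    """Build per-column histograms keyed by power-of-two exponent.

    With bin_by=None each column is binned by its own size; with
    bin_by set to one of COLUMNS, that size picks the row for all
    three columns.
    """
    hist = {col: defaultdict(int) for col in COLUMNS}
    for blk in blocks:
        for col in COLUMNS:
            key = blk[bin_by] if bin_by else blk[col]
            row = (key - 1).bit_length()   # bin holding sizes up to 2**row
            hist[col][row] += blk[col]
    return hist
```

For the block in the example above, `histogram([...], bin_by="lsize")`
puts 512K, 128K and 256K into the same 2**19 ("512K") row of the lsize,
psize and asize columns respectively.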

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ivan Shapovalov <intelfx@intelfx.name>
Closes #16999
2025-10-06 09:35:02 -07:00
Ivan Shapovalov
1269fa9b79 zdb: convert ALLOCATED_OPT into anonymous enum
We are adding more long-only options, so use an enum for all of them
to avoid manually numbering these constants.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ivan Shapovalov <intelfx@intelfx.name>
Closes #16999
2025-10-06 09:34:03 -07:00
Rob Norris
5605a6d79b pool_iter_refresh: don't refresh pools twice
In "all pools" mode, pool_iter_refresh() will call zpool_iter(), which
will call zpool_refresh_stats() before calling add_pool(). If we already
have the pool, this is a different handle, so we just release it and
return. Back in pool_iter_refresh(), we then call zpool_refresh_stats()
again for our handle on the same pool.

All together, this means we're doing two ZFS_IOC_POOL_STATS calls into
the kernel for every pool in the system. This isn't wrong, but it does
double the pressure on global locks.

Instead, we add a new function zpool_refresh_stats_from_handle() that
simply copies the pool config and state from one handle to another, and
use it to update our handle before we release it in add_pool(), so we
only have one call per pool per interval.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17807
2025-10-03 14:39:09 -07:00
Rob Norris
5f09781cca pool_iter_refresh: don't flag existing pools as refreshed
zpool_iter() passes the callback a new instance of zpool_handle_t each
time, so the existing handle in the pool_list AVL never actually gets a
refresh. Internally, that means its zpool_config is never updated, and
the old config is never moved to zpool_old_config. As a result,
print_iostat() never sees any updated config, and so repeats the first
line forever.

This is the simplest workaround: just don't mark existing pools as
refreshed. pool_list_refresh() will see this and refresh them.
The downside is a second call to ZFS_IOC_POOL_STATS for existing pools,
because zpool_iter() just called it for the handle we threw away.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17807
2025-10-03 14:39:00 -07:00
Rob Norris
1a32adca0f zpool iostat: update pool counter when skipping boot row
When skipping the boot row (with -y), the early loop meant we weren't
updating the "last_npools" count. That means the count never advanced
past zero, so cb_iteration was always reset to 0, leading to it being
"stuck" on the boot line, printing the header and nothing else forever.

Updating the pool counter on every loop sorts that out: it advances,
cb_iteration moves properly, and normal rows are printed.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17807
2025-10-03 14:38:35 -07:00
Ameer Hamza
ac2d8c80b6
Make mount/share errors non-fatal for zfs create/clone
If zfs_mount_and_share() fails, the error propagates to zfs create/clone
commands despite successful operation. If create/clone operations were
successful, there's no point in making zfs_mount_and_share() failures
fatal.

Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17799
2025-10-02 11:24:26 -04:00
Igor Ostapenko
cb3c18a9a9 ddt prune: Add SCL_ZIO deadlock workaround
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Igor Ostapenko <igor.ostapenko@klarasystems.com>
Closes #17793
2025-10-01 15:17:09 -07:00
Igor Ostapenko
e829e2fd04 spa_config: Rename spa_config_enter_mmp() to spa_config_enter_priority()
Originally this was created for MMP, but now new cases are emerging
where the same mechanism is required. Hence the name's generalization.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Igor Ostapenko <igor.ostapenko@klarasystems.com>
Closes #17793
2025-10-01 15:16:04 -07:00
Robert Evans
8869caae5f
zinject: Introduce ready delay fault injection
This adds a pause to the ZIO pipeline in the ready stage for
matching I/O (data, dnode, or raw bookmark).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Akash B <akash-b@hpe.com>
Signed-off-by: Robert Evans <evansr@google.com>
Closes #17787
2025-10-01 12:17:13 -07:00
Paul Dagnelie
fa4d4b1f80
Fix display of default xattr to show 'sa'
When the default value of the xattr property was changed from 'dir' to
'sa', the code that displays the property's value was not affected. The
problem with this state of affairs is that 1) user tooling that
specifically looked for 'sa' before will be confused now that the code
displays 'on' instead. And 2) users may be confused when manually
running the commands about which specific type of xattr is in use unless
they are up to date on the latest zfs changes.

The fix here is to show the actual type always, rather than 'on' if we
happen to be using the default. This turns out to be easy to do, by
simply reordering the list of xattr values in the properties code. When
the property is displayed, we iterate down the table until we find a row
with a matching value, and use that row's name as the
display. Reordering the row fixes the display without affecting any
other code.
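
The effect of the reordering can be sketched as a first-match table
lookup. This is an illustrative Python sketch; the numeric values and
the `display_name` helper are placeholders, not the actual property
table.

```python
# Placeholder values for illustration only.
XATTR_OFF, XATTR_DIR, XATTR_SA = 0, 1, 2


def display_name(table, value):
    """Return the name of the first row whose value matches."""
    for name, row_value in table:
        if row_value == value:
            return name
    raise KeyError(value)


# "on" is an alias for the default ("sa"); row order decides which
# name is shown for that value.
before = [("off", XATTR_OFF), ("on", XATTR_SA),
          ("dir", XATTR_DIR), ("sa", XATTR_SA)]
after = [("off", XATTR_OFF), ("sa", XATTR_SA),
         ("dir", XATTR_DIR), ("on", XATTR_SA)]
```

Parsing is unaffected in this sketch: both tables still map the name
"on" to the same value, only the displayed name changes.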

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Closes #17801
2025-10-01 12:14:56 -07:00
Shreshth3
32ce74ff32
docs: fix a few small typos (#17804)
Signed-off-by: Shreshth Srivastava <shreshthsrivastava2@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-10-01 10:15:46 -07:00
nav1s
102ff2a640
manuals: fix typos in zpool-upgrade man page
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: nav1s <nav1s@proton.me>
Closes #17797
2025-09-29 16:43:22 -07:00
hoshinomori
e4a407f29f
range_tree: drop duplicate zfs_ prefix from rs_set_fill_raw
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: hoshinomori <hoshinomori@owarisekai.moe>
Closes #17800
2025-09-29 16:38:52 -07:00
Rob Norris
f0a95e8971
zpool iostat: refresh pool list every interval
When running zpool iostat in interval mode, it would not notice any new
pools created or imported, and would forget any destroyed or exported,
so would not notice if they came back. This leads to outputting "no
pools available" every interval until killed.

It looks like this was at least intended to work; the comment above
zpool_do_iostat() indicates that it is expected to "deal with pool
creation/destruction" and that pool_list_update() would detect new
pools. That call however was removed in 3e43edd2c5, though it's unclear
if that broke this behaviour and it wasn't noticed, or if it never
worked, or if something later broke it. That said, the lack of
pool_list_update() is only part of the reason it doesn't work properly.

The fundamental problem is that the various things involved in
refreshing or updating the list of pools would aggressively ignore,
remove, skip or fail on pools that stop existing, or that already exist.
Mostly this meant that once a pool is removed from the list, it will
never be seen again. Restoring pool_list_update() to the
zpool_do_iostat() loop only partially fixes this - it would find "new"
pools again, but only in the "all pools" (no args) mode, and because its
iterator callback add_pool() would abort the iterator if it already has
a pool listed, it would only add pools if there weren't any already.

So, this commit reworks the structure somewhat. pool_list_update()
becomes pool_list_refresh(), and will ensure the state of all pools in
the list are updated. In the "all pools" mode, it will also add new
pools and remove pools that disappear, but when a fixed list of pools is
used, the list doesn't change, only the state of the pools within it.

The rest of the commit is adjusting things for this much simpler
structure. Regardless of the mode in use, pool_list_refresh() will
always do the right thing, so the driver code can just get on with the
display.

Now that pools can appear and disappear, I've made it so the header (if
enabled) is re-printed when the list changes, so that it's easier to see
what's happening if the column widths change.
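
The intended behaviour of the two modes can be summarised with a small
Python sketch. It models the pool list as a dict of name to state; the
`refresh` function and its return shape are hypothetical and only
illustrate the semantics described above, not the zpool code.

```python
def refresh(listed, live, all_pools):
    """Return (new_list, changed) after one refresh interval.

    listed: dict of pool name -> last known state
    live:   dict of pool name -> current state for pools that exist
    In "all pools" mode the list follows the live set; with a fixed
    set of pools only the states are updated (None if the pool is gone).
    """
    if all_pools:
        new_list = dict(live)
    else:
        new_list = {name: live.get(name) for name in listed}
    # A change in membership is what would trigger re-printing the header.
    changed = set(new_list) != set(listed)
    return new_list, changed
```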

Since this is all rather complicated, I've included tests for the "all
pools" and "set of pools" modes.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17786
2025-09-29 16:35:27 -07:00
Tony Hutter
75be5f2973
CI: Add ZTS -O option, log Setup Testing Machines step
Add a -O option to zfs-test.sh to dump debug information on test
timeout.  The debug info includes:

- 30 lines from 'top'
- /proc/<PID>/stack output of process with highest CPU usage
- Last lines strace-ing process with highest CPU usage
- /proc/sysrq-trigger kernel stack traces

All debug information gets dumped to /dev/kmsg (Linux only).

In addition, print out the VM console lines from the "Setup Testing
Machines" step.  We have often seen VMs time out at this step and don't
know why.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #17753
2025-09-29 16:32:05 -07:00
Tony Hutter
8d4c3ee9e6
zvol: Fix blk-mq sync
The zvol blk-mq codepaths would erroneously send FLUSH and TRIM
commands down the read codepath, rather than write.  This fixes
the issue, and updates the zvol_misc_fua test to verify that
sync writes are actually happening.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #17761
Closes #17765
2025-09-29 16:29:20 -07:00
Brian Behlendorf
4ff25e9013
CI: Switch FreeBSD 15 to 15.0-ALPHA3
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17795
2025-09-26 20:52:57 -04:00
Brian Behlendorf
a44985315e
CI: Remove Buildbot references
The Buildbot CI infrastructure has been fully replaced by GitHub
Actions.  Remove any lingering references from the repository.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17794
2025-09-26 15:32:41 -07:00
Brian Behlendorf
79be201806
Linux 6.17 compat: META
Update the META file to reflect compatibility with the 6.17
kernel.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17789
2025-09-26 10:00:18 -07:00
Brian Behlendorf
aecd6deeb3
CI: update perf and bpftools with the kernel packages
When updating a Fedora instance to an experimental kernel make sure
to include the matching versioned perf and bpftool packages.  This
helps ensure there are no unexpected conflicts which would prevent
the new packages from being installed.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17791
2025-09-25 17:47:32 -07:00
patrickxia
5c38029f4b
zdb: add ZFS_KEYFORMAT_RAW support for -K option
This change adds support for ZFS_KEYFORMAT_RAW to zdb_derive_key in 
zdb.c. The implementation reads the raw key from the file specified 
by the -K option which is consistent with how raw keys are handled in 
the other parts of ZFS, along with a check to ensure that the keyfile 
doesn't have too many bytes.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Patrick Xia <patrickx@google.com>
Closes #17783
2025-09-25 12:05:42 -07:00
Robert Evans
26b0f561be
dnode_next_offset: backtrack if lower level does not match
This changes the basic search algorithm from a single search up and down
the tree to a full depth-first traversal to handle conditions where the
tree matches at a higher level but not a lower level.

Normally higher level blocks always point to matching blocks, but there
are cases where this does not happen:

1. Racing block pointer updates from dbuf_write_ready.

   Before f664f1ee7f (#8946), both dbuf_write_ready and
   dnode_next_offset held dn_struct_rwlock which protected against
   pointer writes from concurrent syncs.

   This no longer applies, so sync context can, e.g., clear or fill all
   L1->L0 BPs before the L2->L1 BP and higher BPs are updated.

   dnode_free_range in particular can reach this case and skip over L1
   blocks that need to be dirtied. Later, sync will panic in
   free_children when trying to clear a non-dirty indirect block.

   This case was found with ztest.

2. txg > 0, non-hole case. This is #11196.

   Freeing blocks/dnodes breaks the assumption that a match at a higher
   level implies a match at a lower level when filtering txg > 0.

   Whenever some but not all L0 blocks are freed, the parent L1 block is
   rewritten. Its updated L2->L1 BP reflects a newer birth txg.

   Later when searching by txg, if the L1 block matches since the txg is
   newer, it is possible that none of the remaining L1->L0 BPs match if
   none have been updated.

   The same behavior is possible with dnode search at L0.

   This is reachable from dsl_destroy_head for synchronous freeing.
   When this happens open context fails to free objects leaving sync
   context stuck freeing potentially many objects.

   This is also reachable from traverse_pool for extreme rewind where it
   is theoretically possible that datasets not dirtied after txg are
   skipped if the MOS has high enough indirection to trigger this case.

In both of these cases, without backtracking the search ends prematurely,
as an ESRCH result implies no more matches in the entire object.
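
The difference between the two search shapes can be shown on a toy tree.
This Python sketch is not dnode_next_offset(); the node layout (dicts
with "name", "match" and optional "children") and both function names
are assumptions for illustration.

```python
def first_match_no_backtrack(node):
    """Follow the first matching child at each level; never go back up."""
    while node.get("children"):
        nxt = next((c for c in node["children"] if c["match"]), None)
        if nxt is None:
            return None          # search ends prematurely
        node = nxt
    return node["name"] if node["match"] else None


def first_match_dfs(node):
    """Depth-first traversal that backtracks to later siblings."""
    if not node["match"]:
        return None
    if not node.get("children"):
        return node["name"]
    for child in node["children"]:
        found = first_match_dfs(child)
        if found is not None:
            return found
    return None


# The first L1 block claims a match but none of its L0 children do.
tree = {"name": "L2", "match": True, "children": [
    {"name": "L1a", "match": True, "children": [
        {"name": "a0", "match": False},
        {"name": "a1", "match": False}]},
    {"name": "L1b", "match": True, "children": [
        {"name": "b0", "match": True}]},
]}
```

On this tree the single-pass search gives up under `L1a`, while the
depth-first version backtracks and finds `b0`.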

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Akash B <akash-b@hpe.com>
Signed-off-by: Robert Evans <evansr@google.com>
Closes #16025
Closes #11196
2025-09-25 11:06:28 -07:00
Brian Behlendorf
c722bf8812
Add interface to spa_get_worst_case_min_alloc() function
Provide an interface to retrieve the lowest and highest minimum
allocation size for the normal allocation class.  This can be used
by external consumers of the DMU to estimate potential wasted
capacity when setting the recordsize for an object.

The new "min_alloc" and "max_alloc" keys are added to the pool
configuration and used by default_volblocksize() to warn when
an inefficient block size is requested.  For older kmods which
don't yet include the new keys, fall back to the previous logic.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17758
2025-09-25 09:35:35 -07:00
Brian Behlendorf
0e1a53a8c0
Fix 'zpool add' safety check corner cases
Three cases were discovered where 'zpool add' would fail to
warn when adding vdevs to a pool with a mismatched replication
level.  These are:

  1. When a pool contains mixed file and disk vdevs.
  2. When a pool contains an active dRAID distributed spare
  3. When a pool contains an active hot spare

The lack of warnings is caused by get_replication() assessing
the current pool configuration as inconsistent and disabling
the mismatched replication check for the new pool configuration
after 'zpool add'.  This change updates get_replication() to
be slightly more tolerant in the non-fatal case.

The zpool_add_010_pos.ksh test case was split into separate
tests: zpool_add_warn_create.ksh, zpool_add_warn_degraded.ksh,
and zpool_add_warn_removal.ksh.  These tests were extended to
include coverage for dRAID pools and the three scenarios
described above.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17780
2025-09-25 09:32:59 -07:00
Brian Behlendorf
3e9347c9f7
ZTS: update upgrade_readonly_pool.ksh
Modify the test case to use the `zfs mount` command instead
of directly calling the mount command, create a dedicated dataset,
and use the default mount point.  These changes are intended to
preserve the intent of the original test case and resolve some
spurious mount failures which have been observed by the CI.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17785
2025-09-25 09:31:10 -07:00
jozzsi
b2196fbedf
contrib: dracut: install dependent kernel modules
Eliminates the need for the following workaround

> Add other drivers to dracut:

```
if grep mpt3sas /proc/modules; then
  echo 'force_drivers+=" mpt3sas "'  >> /etc/dracut.conf.d/zfs.conf
fi
if grep virtio_blk /proc/modules; then
  echo 'filesystems+=" virtio_blk "' >> /etc/dracut.conf.d/fs.conf
fi
```

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jo Zzsi <jozzsicsataban@gmail.com>
Closes #17762
2025-09-23 16:58:38 -07:00
Alexander Motin
ea37c30fcb
zdb: Fix asize overflow in verify_livelist_allocs()
Spacemap entry might be too big to fit into a block pointer ashift.
We hit an assertion trying to run `zdb -bvy` on a large pool.  But
it seems the code does not really need size there, since we only
need to search for a range of offsets, so setting it to zero should
just make btree return position just before the first entry.  I
suspect the previous code could actually miss the first entry
due to this if its size was smaller.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17764
2025-09-23 16:09:37 -07:00
trick2011
876f705cc4
Use "vdev" instead of "devices" when referring to vdevs
Update documentation to use the correct terminology.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: trick2011 <trick2011@users.noreply.github.com>
Closes #17734
Closes #17755
2025-09-23 16:08:07 -07:00
Tony Hutter
11787965e0
ZTS: Fix stale symlinks with zfs-helpers.sh
zfs-helpers.sh is a utility script that sets up udev symlinks so you
can run ZTS from a local ZFS git workspace.  However, it doesn't check
that the udev symlinks point to the current workspace.  They may point
to an old workspace that has been deleted.  This means the udev rules
never get executed, which in turn causes the zvol tests to fail.

This commit removes old symlinks that do not point to the current
ZFS workspace.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #17766
2025-09-23 12:58:14 -07:00
jozzsi
6ba51da93b
contrib: dracut: always include zfs kernel module
This commit fixes the issue and includes the zfs kernel
module even when dracut is used in hostonly mode.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jo Zzsi <jozzsicsataban@gmail.com>
Closes #17754
2025-09-18 16:56:45 -07:00
Alan Somers
545d66204d
Fix a printf format specifier on FreeBSD/i386
This is breaking the build on FreeBSD/i386.  Originally committed
downstream as https://github.com/freebsd/freebsd-src/commit/2d76470b701

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by:	Alan Somers <asomers@gmail.com>
Sponsored by:	ConnectWise
Closes #17705
2025-09-17 16:32:29 -07:00
Alan Somers
3387d34093
Fix atomic-alignment warnings in libspl on FreeBSD/i386
On i386, Clang complains about misaligned atomic operations.  Silence
these warnings to fix the build on FreeBSD/i386.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alan Somers <asomers@gmail.com>
Sponsored by:	ConnectWise
Closes #17708
2025-09-17 16:31:27 -07:00
Rob Norris
ab8cc63c77 linux/super: add tunable to request immediate reclaim of unused dentries
Traditionally, unused dentries would be cached in the dentry cache until
the associated entry is no longer on disk. The cached dentry continues
to hold an inode reference, causing the inode to be pinned (see previous
commit).

Here we implement the dentry op d_delete, which is roughly analogous to
the drop_inode superblock op, and add a zfs_delete_dentry tunable to
control its behaviour. By default it continues the traditional
behaviour, but when the tunable is enabled, we signal that an unused
dentry should be freed immediately, releasing its inode reference, and
so allowing that inode to be deleted if no longer in use.

Sponsored-by: Klara, Inc.
Sponsored-by: Fastmail Pty Ltd
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17746
2025-09-17 08:16:32 -07:00
Rob Norris
ab93b4b70e linux/super: add tunable to request immediate reclaim of unused inodes
Traditionally, unused inodes would be held on the superblock inode cache
until the associated on-disk file is removed or the kernel requests
reclaim.  On filesystems with millions of rarely-used files, this can be
a lot of unusable memory.

Here we implement the superblock drop_inode method, and add a
zfs_delete_inode tunable to control its behaviour. By default it
continues the traditional behaviour, but when the tunable is enabled, we
signal that the inode should be deleted immediately when the last
reference is dropped, rather than cached. This releases the associated
data to the dbuf cache and ARC, allowing them to be reclaimed normally.

Sponsored-by: Klara, Inc.
Sponsored-by: Fastmail Pty Ltd
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17746
2025-09-17 08:15:56 -07:00
buzzingwires
ffe93aee0a Add typesets to zhack label repair test scripts
As a quality assurance measure, `typeset` is added to local variable
declarations to actually enforce their intended scope.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: buzzingwires <buzzingwires@outlook.com>
Closes #17732
2025-09-17 08:13:24 -07:00
buzzingwires
1d2d812986 Refactor zhack label repair and fix -c regression on nonzero TXG
This commit fixes a likely regression introduced by 64db435 where the
checksum repair functionality (`-c` or default behavior) will perform
checks and access data associated with the newer undetach (`-u`)
functionality, resulting in a failure when an uberblock's TXG is not 0
as required by `-u` but not `-c`.

Additionally, code is refactored for better separation of tasks.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: buzzingwires <buzzingwires@outlook.com>
Closes #17732
2025-09-17 08:13:06 -07:00
Rob Norris
d36684201f man: add silent rules for mancheck
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17747
2025-09-17 08:10:32 -07:00
Rob Norris
45ac6045cc mancheck: allow single files
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17747
2025-09-17 08:10:28 -07:00
Rob Norris
faf2db3435 Shellcheck.am: add silent rules for shellcheck and checkbashisms
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17747
2025-09-17 08:10:08 -07:00
Alexander Motin
d147ed7d26
CI: Switch FreeBSD 15 to 15.0-ALPHA2
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17749
2025-09-15 12:15:31 -07:00
Igor Ostapenko
58b84289e8
Fix txg_log_time ZAP key typo
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Igor Ostapenko <igor.ostapenko@klarasystems.com>
Sponsored-by: Klara, Inc.
Closes #17748
2025-09-15 10:43:43 -07:00
Kyle Evans
5c46baa1ce
zfsprops(7): attempt to clarify the keylocation description
The current description is somewhat difficult to parse through, and in
some cases is a little unclear as to the behavior.

Split it into paragraphs based on the three distinct behaviors you
may get: prompt, file URL, HTTP(S) URL.  The descriptions of the file
and HTTP(S) behavior seem fine, but prompt is a little vague; expand
on it and make it clear that the behavior is actively based on whether
the inquisitor of key-data is provided with a tty for stdin or not.

Also clarify *why* one shouldn't "place keys which should be kept secret
on the command line" and note that you *have* to supply the key via
stdin if it's a raw key, just to be sure.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Kyle Evans <kevans@FreeBSD.org>
Closes #17742
2025-09-15 10:26:17 -07:00
Brian Behlendorf
f330b463de
ZTS: default to random data in fill_fs
Update the fill_fs helper function to request a random fill pattern
when the "data" argument isn't specified.  This ensures the default
behavior is to perform a more realistic fill of incompressible blocks.

Additionally, update a few test cases to specify a random fill.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17739
2025-09-15 09:37:33 -07:00
Brian Behlendorf
4b764fb01a
ZTS: Fix zfs_send_delegation_user test
Correct the path in the common.run file.  The zfs_send_delegation_user
test is installed under cli_user not cli_root.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17740
2025-09-15 09:30:57 -07:00
Rob Norris
f319ff3570
vdev_disk_close: take disk write lock before destroying it
Many IO operations are submitted to the kernel async, and so the zio can
complete and run followup actions before the submission call returns. If one
of the followup actions closes the disk (eg during pool create/import),
the initiator may be left holding a lock on the disk at destruction.

Instead, take the write lock before finishing up and decoupling the disk
state from the vdev proper. The caller will hold until all IO is
submitted and locks released.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17719
2025-09-15 09:12:24 -07:00
Alexander Motin
3f4312a0a4
Fix two infinite loops if dmu_prefetch_max set to zero
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17692
Closes #17729
2025-09-13 12:58:48 -04:00
Paul Dagnelie
9b772f328b
Fix time database update calculations
The time database update math assumed that the timestamps were in
nanoseconds, but at some point in the development or review process they
changed to seconds. This PR fixes the math to use seconds instead.
    
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #17735
2025-09-12 16:33:36 -07:00
Brian Behlendorf
455c36156c
ZTS: refreserv/refreserv_raidz improvements
Several small changes intended to make this test reliable.

- Leave the default compression enabled for the pool and switch
  to using /dev/urandom as the data source.  Functionally this
  shouldn't impact the test but it's preferable to test with
  the pool defaults when possible.

- Verify the device is created and removed as required.  Switch
  to a unique volume name for more clarity in the logs.

- Use the ZVOL_DEVDIR to specify the device path.

- Speed up the test by creating the pool with an ashift=12 and
  testing 4K, 8K, 128K volblocksizes.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17725
2025-09-12 11:08:53 -07:00
Alexander Motin
bc8bcfc71a
Fix type in dbrrd_closest()
For ABS() to work, the argument must be signed, but rrdd_time is
uint64_t.  Clang noticed it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Fixes #16853
Closes #17733
2025-09-12 11:05:38 -07:00
Alexander Motin
cb5f9aa582
FreeBSD: Satisfy ASSERT_VOP_IN_SEQC()
zfs_aclset_common() might be called for newly created or not even
created vnodes, that triggers assertions on newer FreeBSD versions
with DEBUG_VFS_LOCKS included into INVARIANTS.  In the first case
make sure to call vn_seqc_write_begin()/_end(), in the second just
skip the assertion.

The same has to be done for the project management IOCTL and
file-based extended attributes, since those do not go through VFS.

Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17722
2025-09-12 13:29:27 -04:00
Paul Dagnelie
35f47cb4f4
Make new zhack test a little more reliable
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #17728
2025-09-12 10:07:24 -07:00
Chunwei Chen
37cd30f714
Fix ddle memleak in ddt_log_load
In ddt_log_load(), when a duplicate entry is removed from the flushing
tree, the entry is not freed, causing a memory leak.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Co-authored-by: Chunwei Chen <david.chen@nutanix.com>
Closes #17657
Closes #17730
2025-09-12 10:05:06 -07:00
JT Pennington
955fbc5ade Add send:encrypted test
Create tests for the new send:encrypted permission

Sponsored-by: Klara, Inc.
Sponsored-by: Karakun AG
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: JT Pennington <jt.pennington@klarasystems.com>
Closes #17543
2025-09-12 09:53:54 -07:00
Allan Jude
7b1cc9eb61 ZFS allow send:encrypted
A new `zfs allow` permission that ONLY allows sending replication
streams in raw (encrypted) mode, so encrypted data will not be
decrypted as part of the replication process.

Sponsored-by: Klara, Inc.
Sponsored-by: Karakun AG
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Co-authored-by: JT Pennington <jt.pennington@klarasystems.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #17543
2025-09-12 09:53:31 -07:00
Tony Hutter
654f2dcb42
zed: Add synchronous zedlets
Historically, ZED has blindly spawned off zedlets in parallel and never
worried about their completion order.  This means that you can
potentially have zedlets for event number 2 starting before zedlets for
event number 1 had finished.  Most of the time this is fine, and it
actually helps a lot when the system is getting spammed with hundreds
of events.

However, there are times when you want your zedlets to be executed
in sequence with the event ID.  That is where synchronous zedlets
come in.

ZED will wait for all previously spawned zedlets to finish before
running a synchronous zedlet.  Synchronous zedlets are guaranteed to be
the only zedlet running.  No other zedlets may run in parallel with a
synchronous zedlet.  Users should be careful to only use synchronous
zedlets when needed, since they decrease parallelism.

To make a zedlet synchronous, simply add a "-sync-" immediately
following the event name in the zedlet's file name:

	EVENT_NAME-sync-ZEDLETNAME.sh

For example, if you wanted a synchronous statechange script:

	statechange-sync-myzedlet.sh
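
The naming rule above can be sketched as a small classifier. This is only an illustration of the file-name convention, not how ZED itself parses zedlet names; `zedlet_mode` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify a zedlet by its file name.
# Prints "sync" when the name is EVENT_NAME-sync-*, otherwise "async".
zedlet_mode() {
    local event="$1" file="$2"
    case "$file" in
    "${event}-sync-"*) echo "sync" ;;
    *) echo "async" ;;
    esac
}

zedlet_mode statechange statechange-sync-myzedlet.sh
zedlet_mode statechange statechange-led.sh
```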

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #17335
2025-09-11 11:34:07 -07:00
Brian Behlendorf
bc0b5318aa
Prevent scrubbing a read-only pool
While it would be nice to be able to scrub a pool imported read-only
this will currently trip an ASSERT.  Before we can support this there
are some design challenges which need to be thought through first.

For starters, a read-only import skips reading certain information
from disk which it knows won't be needed, such as the space maps.
Furthermore, the scrub process expects to checkpoint its progress,
update the on-disk error log, and issue repair IO.  None of which
would be possible when the pool is imported read-only.

Each of these wrinkles can certainly be handled, but that will take
some significant work.  In the meanwhile we disable the 'zpool scrub'
command when the pool is imported read-only.

Reviewed-by: Alan Somers <asomers@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #17527
Closes #17717
2025-09-11 10:58:46 -07:00
Paul Dagnelie
d64711c202 Detect a slow raidz child during reads
A single slow responding disk can affect the overall read
performance of a raidz group.  When a raidz child disk is
determined to be a persistent slow outlier, then have it
sit out during reads for a period of time. The raidz group
can use parity to reconstruct the data that was skipped.

Each time a slow disk is placed into a sit out period, its
`vdev_stat.vs_slow_ios` count is incremented and a zevent
class `ereport.fs.zfs.delay` is posted.

The length of the sit out period can be changed using the
`raid_read_sit_out_secs` module parameter.  Setting it to
zero disables slow outlier detection.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Contributions-by: Don Brady <don.brady@klarasystems.com>
Contributions-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17227
2025-09-10 15:25:03 -07:00
Paul Dagnelie
0620c979a5 Remove RAIDZ reconstruct flags from debug defaults
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Closes #17227
2025-09-10 15:24:50 -07:00
Tony Hutter
7f7c58389e
ZTS: Print warning if running ZTS user_run test locally
Print a warning if you're attempting to run a ZTS test that calls
'user_run', and the ephemeral user doesn't have permissions to
access the test binaries.

This can happen if you're running ZTS from a local git repo.  In
that case the test user (say, 'testuser1') may need access to the
ZTS binaries in:

/home/<your_username>/zfs/tests/zfs-tests/bin/

... but 'testuser1' doesn't have permission to enter your home dir:

/home/<your_username>

The warning will help alert users to what is going on.  This will
not be an issue when ZTS is actually installed on the system
(via 'make install' or from packages).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #17721
2025-09-10 14:55:58 -07:00
Alan Somers
cd6db758f3
Fix the build of crypto_test on LP32 architectures
test->id is a uint64_t, not a long.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by:	Alan Somers <asomers@gmail.com>
Sponsored by:	ConnectWise
Closes #17707
2025-09-10 11:27:39 -07:00
Paul Dagnelie
bc4aac0395 Enable zhack to work properly with 4k sector size disks
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Closes #17576
2025-09-10 11:13:55 -07:00
Paul Dagnelie
8f15d2e4d5 Add allocation profile export and zhack subcommand for import
When attempting to debug performance problems on large systems, one of
the major factors that affect performance is free space
fragmentation. This heavily affects the allocation process, which is an
area of active development in ZFS. Unfortunately, fragmenting a large
pool for testing purposes is time consuming; it usually involves filling
the pool and then repeatedly overwriting data until the free space
becomes fragmented, which can take many hours. And even if the time is
available, artificial workloads rarely generate the same fragmentation
patterns as the natural workloads they're attempting to mimic.

This patch has two parts. First, in zdb, we add the ability to export
the full allocation map of the pool. It iterates over each vdev,
printing every allocated segment in the ms_allocatable range tree. This
can be done while the pool is online, though in that case the allocation
map may actually be from several different TXGs as new ones are loaded
on demand.

The second is a new subcommand for zhack, zhack metaslab leak (and its
supporting kernel changes). This is a zhack subcommand that imports a
pool and then modifies the range trees of the metaslabs, allowing the
sync process to write them out normally. It does not currently store
those allocations anywhere to make them reversible, and there is no
corresponding free subcommand (which would be extremely dangerous); this
is an irreversible process, only intended for performance testing. The
only way to reclaim the space afterwards is to destroy the pool or roll
back to a checkpoint.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #17576
2025-09-10 11:13:24 -07:00
Shengqi Chen
92ca3ae56a contrib/debian: install files into merged /usr
This commit synchronizes the debian packaging files with the distro
version (also maintained by me) as much as possible.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Colm Buckley <colm@tuatha.org>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #17712
2025-09-10 10:45:26 -07:00
Shengqi Chen
9ae20cf03d cmd: rename arcstat to zarcstat
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Colm Buckley <colm@tuatha.org>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #16357
Closes #17712
2025-09-10 10:45:21 -07:00
Shengqi Chen
a5571a0dd1 cmd: rename arc_summary to zarcsummary
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Colm Buckley <colm@tuatha.org>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #16357
Closes #17712
2025-09-10 10:45:13 -07:00
Shengqi Chen
d3429a75b0 Remove renaming notice and symlinks for arcstat and arc_summary
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Colm Buckley <colm@tuatha.org>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #16357
Closes #17712
2025-09-10 10:44:17 -07:00
Tony Hutter
c6fe41cac5
CI: Increase setup timeout to 20min, add timestamps
- Increase qemu-1-setup.sh timeout to 20min since it sometimes
  fails to complete after 15min.

- Timestamp all qemu-1-setup.sh lines to look for hangs.

- Add a 'watchdog' process to print out the top running process every
  30sec to help with debugging.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #17714
2025-09-10 10:25:58 -07:00
Rob Norris
fe2f7cf6d7
linux/rw_destroy: assert no holders before destroying
While rw_destroy() may do nothing on Linux, we still want to make sure
that we don't have any holders outstanding like we do for mutexes.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17718
2025-09-10 08:59:57 -07:00
Rob Norris
7939bad5e7 Linux 6.17: d_set_d_op() is no longer available
We only have extremely narrow uses, so move it all into a single
function that does only what we need, with and without d_set_d_op().

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17621
2025-09-09 13:44:43 -07:00
Rob Norris
9e5e95c24d config: restore ZFS_AC_KERNEL_DENTRY tests
Accidentally removed calls in ed048fdc5b.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17621
2025-09-09 13:44:18 -07:00
Tony Hutter
4b83891db0
ZTS: Fix fault_limits timeouts
fault_limits would often hit the 10min timeout and be killed on Fedora
41-42.  Investigation showed that the 'fill_fs' portion of the test,
which would fill the pool with junk data before vdev replacement, was
writing highly compressible data (~126x), which would have taxed the
CPUs, potentially causing the timeout.

The fix is to write random data and reduce the number of writes.
This has an added benefit that more real data is being written to the
pool (~1GB) vs the old way (~300-400MB).  It also speeds up the test.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #17709
2025-09-09 13:42:01 -07:00
Alan Somers
e29bfa5bd0
Fix warnings about sha2_is_supported on FreeBSD/i386
This is one problem currently preventing OpenZFS from building on
FreeBSD/i386.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alan Somers <asomers@gmail.com>
Sponsored by:	ConnectWise
Closes #17704
2025-09-09 09:56:38 -07:00
Alan Somers
a2424312c4
Fix the build on 32-bit FreeBSD with GCC
GCC complains about casting a 64-bit integer to a 32-bit pointer.
Originally committed downstream as
https://github.com/freebsd/freebsd-src/commit/2d76470b701

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by:	Alan Somers <asomers@gmail.com>
Sponsored by:	ConnectWise
Closes #17706
2025-09-09 08:56:43 -07:00
rmacklem
59f8f5dfe1
zfs_vnops_os.c: Add support for the _PC_CLONE_BLKSIZE name
FreeBSD now has a pathconf name called _PC_CLONE_BLKSIZE
which is the block size supported for block cloning for
the file system.  Since ZFS's block size varies per file,
return the largest size likely to be used, or zero if block
cloning is not supported.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Rick Macklem <rmacklem@uoguelph.ca>
Closes #17645
2025-09-09 08:52:40 -07:00
Rob Norris
8266fa5858
cmd: force zarcstat/zarc_summary recreation at install
If the target already exists, ln will fail. Force it to recreate the
symlinks.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17702
2025-09-08 14:36:14 -07:00
Chunwei Chen
e3c3e86c04
Fix wrong dedup_table_size for legacy dedup
If we call ddt_log_load() for legacy ddt, we will end up going into
ddt_log_update_stats() and filling uninitialized value into ddo_dspace.
This value will then get added to dedup_table_size during
ddt_get_dedup_object_stats().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17019
Closes #17699

Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Co-authored-by: Chunwei Chen <david.chen@nutanix.com>
2025-09-08 14:02:51 -07:00
Rob Norris
ced72fdd69
tunables: remove legacy FreeBSD aliases
These are old pre-OpenZFS tunable names that have long been
available via either conventional ZFS_MODULE_PARAM tunables or through
kstats. There's no point doubling up anymore, so delete them.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17375
2025-09-08 10:03:01 -07:00
Shengqi Chen
b9c6b0e09b Install zarcstat and zarcsummary in deb / rpm build rules
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #16357
Closes #17695
2025-09-05 10:00:48 -07:00
Shengqi Chen
c69b7ea6ca Install zarcstat and zarcsummary symlinks in Makefile
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #16357
Closes #17695
2025-09-05 10:00:48 -07:00
Shengqi Chen
ffba31c236 Add upcoming renaming notice for arc_summary and arcstat
They will become zarcsummary and zarcstat in 2.4.0.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #16357
Closes #17695
2025-09-05 10:00:48 -07:00
Shengqi Chen
dfc2c32590 ci: fix syntax issues in zfs-qemu.yml
Otherwise it might become `if [ == "" ]` which is ill-formed.
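
The failure mode can be sketched in plain bash. This assumes an ordinary shell variable as a stand-in for the workflow value that may expand to nothing; `ci_type` is a hypothetical name, not taken from zfs-qemu.yml.

```shell
#!/usr/bin/env bash
# Stand-in for a value that may expand to nothing (hypothetical variable).
ci_type=""

# Unquoted: the shell sees `[ == "" ]`, which test(1) rejects as malformed,
# so the command returns a non-zero status and the else branch runs.
if [ $ci_type == "" ] 2>/dev/null; then
    unquoted_result="matched"
else
    unquoted_result="error or no match"
fi

# Quoted: the shell sees `[ "" == "" ]`, which is well-formed and true.
if [ "$ci_type" == "" ]; then
    quoted_result="matched"
else
    quoted_result="no match"
fi

echo "unquoted: $unquoted_result"
echo "quoted: $quoted_result"
```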

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #17695
2025-09-05 10:00:45 -07:00
Shengqi Chen
11b5c50238 ci: use real head sha instead of GITHUB_SHA when generating CI type
GitHub creates a merge commit on top of the real head, so the check
on HEAD will fail regardless.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #17695
2025-09-05 10:00:37 -07:00
Tony Hutter
0e88a0e1ea
CI: Increase 'Setup QEMU' timeout to 15 minutes
We've seen Fedora 42 still setting up after 10 min.  Change the timeout
to 15 min.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #17697
2025-09-05 09:08:15 -07:00
Maksym Shkolnyi
69b65dda8a
config: Add warning if ARCH environment variable is set
If ARCH environment variable is set it can cause the failure of the
kernel modules check during the configure step. The resulting error
will be confusing, and may look like this:

>    checking for kernel config option compatibility... done
>    checking whether CONFIG_MODULES is defined... no
>    configure: error:
>        *** This kernel does not include the required loadable module
>        *** support!

Detect when ARCH is set and print a warning.
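
A check of this kind might look like the sketch below. It is an assumption about the general shape only; the actual check lives in the configure machinery, and `warn_if_arch_set` and its message text are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper; not the actual configure check.
warn_if_arch_set() {
    # Warn only when ARCH is present and non-empty in the environment.
    if [ -n "${ARCH:-}" ]; then
        echo "warning: ARCH=${ARCH} is set in the environment;" \
            "the kernel module checks may fail" >&2
    fi
}

warn_if_arch_set
```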

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Maksym Shkolnyi <maksym.shkolnyi@workato.com>
Closes #17680
2025-09-03 11:24:17 -07:00
Rob Norris
64d3143e82
zvol: reject suspend attempts when zvol is shutting down
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17690
2025-09-03 11:13:09 -07:00
Brian Behlendorf
9acedbacee
config: Fix LLVM-21 -Wuninitialized-const-pointer warning
LLVM-21 enables -Wuninitialized-const-pointer which results in the
following compiler warning and the bdev_file_open_by_path() interface
not being detected for 6.9 and newer kernels.  The blk_holder_ops
are not used by the ZFS code so we can safely use a NULL argument
for this check.

    bdev_file_open_by_path/bdev_file_open_by_path.c:110:54: error:
    variable 'h' is uninitialized when passed as a const pointer
    argument here [-Werror,-Wuninitialized-const-pointer]

Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17682
Closes #17684
2025-09-02 09:34:08 -07:00
Alexander Ziaee
5a8ba4520b
manuals: Audit/bump dates for last content change
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Ziaee <ziaee@FreeBSD.org>
Closes #17676
2025-08-28 16:26:16 -07:00
classabbyamp
ccf5a8a6fc
linux: use sys/stat.h instead of linux/stat.h
glibc includes linux/stat.h for statx, but musl defines its own statx
struct and associated constants, which does not include STATX_MNT_ID
yet. Thus, including linux/stat.h directly should be avoided for
maximum libc compatibility.

Tested on:
  - glibc: x86_64, i686, aarch64, armv7l, armv6l
  - musl: x86_64, aarch64, armv7l, armv6l

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Tested-By: Achill Gilgenast <achill@achill.org>
Signed-off-by: classabbyamp <dev@placeviolette.net>
Closes #17675
2025-08-27 14:42:32 -07:00
Eric A. Borisch
1da2c30bed
Update pam_zfs_key.c default path for FreeBSD
As described in https://github.com/freebsd/freebsd-src/pull/1305,
FreeBSD's installer defaults to zroot/home for user home directories.

For FreeBSD only, set the default prefix for pam_zfs_key to match.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Eric A. Borisch <eborisch@gmail.com>
Closes #17600
2025-08-27 09:36:37 -07:00
ofthesun9
976f765341
Update compatibility.d files
Add an openzfs-2.4 compatibility file for the next release.

While there are no compatibility differences between Linux and
FreeBSD for 2.4, symlinks for the -linux and -freebsd names are
created for any scripts expecting that convention.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: ofthesun9 <olivier@ofthesun.net>
Closes #17672
Closes #17673
2025-08-25 16:47:19 -07:00
Shawn Bayern
ee7c362645
Add description of default sorting behavior to zfs_list.8
The sorting logic is all in cmd/zfs/zfs_iter.c.  I borrowed
where I could from the comments in the source code, but please
note that the comment to zfs_sort() is a little imprecise, or at
least incomplete, because it doesn't give any indication of the
chronological sort that will be used by default for snapshots in
zfs_compare().

While adding this description, I took the liberty to copy-edit
the rest of the file lightly.

In those edits, I've removed "If specified, you can list
property information by the absolute pathname or the relative
pathname" because, in context, it seems more confusing than
helpful.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Shawn Bayern <sbayern@law.fsu.edu>
Closes #15713
Closes #15869
2025-08-25 16:45:47 -07:00
Ivan Shapovalov
14bad10f96 config: add and use KERNEL_CC check for -Wno-format-zero-length
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ivan Shapovalov <intelfx@intelfx.name>
Closes #16997
2025-08-25 11:26:13 -07:00
Ivan Shapovalov
e903177b56 config: cleanup KERNEL_CC checks, fix broken status output
If $KERNEL_CC was not defined, configure status output would print an
empty string where the kernel compiler should have been. Fix this and
simplify the code generally.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ivan Shapovalov <intelfx@intelfx.name>
Closes #16997
2025-08-25 11:26:13 -07:00
Tony Hutter
d247538e15
ZTS: add mount_loopback to test zfs behind loop dev
Add a test case to reproduce issue #17277:

1. Make a pool
2. Write a file to the pool
3. Mount the file as a loopback device
4. Make an XFS filesystem on the loopback device
5. Mount the XFS filesystem... <hangs>

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Issue #17277
Closes #17329
2025-08-25 11:20:46 -07:00
Mark Johnston
0d54ae2880
zdb: Fix format strings on 32-bit systems
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #17665
2025-08-25 08:59:41 -07:00
youzhongyang
b6bd3228bb
Synchronize the update of feature refcount
The concurrent execution of feature_sync() can lead to a panic due 
to an unprotected update of the feature refcount.  Resolve this by
using the spa->spa_feat_stats_lock to synchronize the update of the 
refcount.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Youzhong Yang <yyang@mathworks.com>
Closes #17184
Closes #17632
2025-08-22 16:35:58 -07:00
Cong Zhang
e7485d04f1
Prompt user to unlock when login from dropbear
Update the zfsunlock initramfs hook to provide instructions on how
to unlock the root filesystem when appropriate.  The intent is to
make the dropbear ssh MOTD more user friendly.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Signed-off-by: Cong Zhang <13283869+congzhangzh@users.noreply.github.com>
Closes #17661
Closes #17662
2025-08-22 13:11:41 -07:00
Brian Behlendorf
f1f74577cb Update META
Increase the version to 2.4.99 to indicate the master branch is
newer than the 2.4.x release.  This ensures packages built from
master branch are considered to be newer than the last release.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2025-08-22 12:01:19 -07:00
533 changed files with 37076 additions and 19481 deletions

View File

@ -78,11 +78,6 @@ case "$OS" in
OPTS[0]="--boot"
OPTS[1]="uefi=on"
;;
fedora41)
OSNAME="Fedora 41"
OSv="fedora-unknown"
URL="https://download.fedoraproject.org/pub/fedora/linux/releases/41/Cloud/x86_64/images/Fedora-Cloud-Base-Generic-41-1.4.x86_64.qcow2"
;;
fedora42)
OSNAME="Fedora 42"
OSv="fedora-unknown"

View File

@ -58,7 +58,7 @@ jobs:
strategy:
fail-fast: false
matrix:
os: ['almalinux8', 'almalinux9', 'almalinux10', 'fedora41', 'fedora42', 'fedora43']
os: ['almalinux8', 'almalinux9', 'almalinux10', 'fedora42', 'fedora43']
runs-on: ubuntu-24.04
steps:
- uses: actions/checkout@v4

View File

@ -48,7 +48,7 @@ jobs:
os_selection='["almalinux8", "almalinux9", "almalinux10", "debian12", "fedora42", "freebsd15-0s", "ubuntu24"]'
;;
linux)
os_selection='["almalinux8", "almalinux9", "almalinux10", "centos-stream9", "centos-stream10", "debian11", "debian12", "debian13", "fedora41", "fedora42", "fedora43", "ubuntu22", "ubuntu24"]'
os_selection='["almalinux8", "almalinux9", "almalinux10", "centos-stream9", "centos-stream10", "debian11", "debian12", "debian13", "fedora42", "fedora43", "ubuntu22", "ubuntu24"]'
;;
freebsd)
os_selection='["freebsd13-5r", "freebsd14-3r", "freebsd13-5s", "freebsd14-3s", "freebsd15-0s", "freebsd16-0c"]'


@ -53,6 +53,7 @@ Jason Harmening <jason.harmening@gmail.com>
Jeremy Faulkner <gldisater@gmail.com>
Jinshan Xiong <jinshan.xiong@gmail.com>
John Poduska <jpoduska@datto.com>
Joseph Holsten <joseph@josephholsten.com>
Jo Zzsi <jozzsicsataban@gmail.com>
Justin Scholz <git@justinscholz.de>
Ka Ho Ng <khng300@gmail.com>
@ -72,10 +73,12 @@ Roberto Ricci <io@r-ricci.it>
Roberto Ricci <ricci@disroot.org>
Rob Norris <robn@despairlabs.com>
Rob Norris <rob.norris@klarasystems.com>
Rob Norris <rob.norris@truenas.com>
Sam Lunt <samuel.j.lunt@gmail.com>
Sanjeev Bagewadi <sanjeev.bagewadi@gmail.com>
Sebastian Wuerl <s.wuerl@mailbox.org>
SHENGYI HONG <aokblast@FreeBSD.org>
Sivesh Kumar <siveshjami@gmail.com>
Stoiko Ivanov <github@nomore.at>
Tamas TEVESZ <ice@extreme.hu>
WHR <msl0000023508@gmail.com>
@ -83,8 +86,12 @@ Yanping Gao <yanping.gao@xtaotech.com>
Youzhong Yang <youzhong@gmail.com>
# Signed-off-by: overriding Author:
Alexander Moch <mail@alexmoch.com> <amoch@ernw.de>
Alexander Moch <mail@alexmoch.com> <github@alexanderjulian.de>
Alexander Ziaee <ziaee@FreeBSD.org> <concussious@runbox.com>
delan azabani <dazabani@igalia.com> <delan@azabani.com>
Felix Schmidt <felixschmidt20@aol.com> <f.sch.prototype@gmail.com>
George Shammas <george@shamm.as> <georgyo@gmail.com>
Jean-Sébastien Pédron <dumbbell@FreeBSD.org> <jean-sebastien.pedron@dumbbell.fr>
Konstantin Belousov <kib@FreeBSD.org> <kib@kib.kiev.ua>
Olivier Certner <olce@FreeBSD.org> <olce.freebsd@certner.fr>
@ -108,6 +115,7 @@ Ned Bass <bass6@llnl.gov> <bass6@zeno1.(none)>
Tulsi Jain <tulsi.jain@delphix.com> <tulsi.jain@Tulsi-Jains-MacBook-Pro.local>
# Mappings from Github no-reply addresses
Adi Gollamudi <adigollamudi@gmail.com> <68113680+Adi-Goll@users.noreply.github.com>
ajs124 <git@ajs124.de> <ajs124@users.noreply.github.com>
Alek Pinchuk <apinchuk@axcient.com> <alek-p@users.noreply.github.com>
Aleksandr Liber <aleksandr.liber@perforce.com> <61714074+AleksandrLiber@users.noreply.github.com>
@ -125,6 +133,7 @@ bernie1995 <bernie.pikes@gmail.com> <42413912+bernie1995@users.noreply.github.co
Bojan Novković <bnovkov@FreeBSD.org> <72801811+bnovkov@users.noreply.github.com>
Boris Protopopov <boris.protopopov@actifio.com> <bprotopopov@users.noreply.github.com>
Brad Forschinger <github@bnjf.id.au> <bnjf@users.noreply.github.com>
Brad Spengler <94915855+bspengler-oss@users.noreply.github.com>>
Brandon Thetford <brandon@dodecatec.com> <dodexahedron@users.noreply.github.com>
buzzingwires <buzzingwires@outlook.com> <131118055+buzzingwires@users.noreply.github.com>
Cedric Maunoury <cedric.maunoury@gmail.com> <38213715+cedricmaunoury@users.noreply.github.com>
@ -138,6 +147,7 @@ Daniel Kobras <d.kobras@science-computing.de> <sckobras@users.noreply.github.com
Daniel Reichelt <hacking@nachtgeist.net> <nachtgeist@users.noreply.github.com>
David Quigley <david.quigley@intel.com> <dpquigl@users.noreply.github.com>
Dennis R. Friedrichsen <dennis.r.friedrichsen@gmail.com> <31087738+dennisfriedrichsen@users.noreply.github.com>
Dennis Vestergaard Værum <github@varum.dk> <6872940+dvaerum@users.noreply.github.com>
Dex Wood <slash2314@gmail.com> <slash2314@users.noreply.github.com>
DHE <git@dehacked.net> <DeHackEd@users.noreply.github.com>
Dmitri John Ledkov <dimitri.ledkov@canonical.com> <19779+xnox@users.noreply.github.com>

AUTHORS (14 lines changed)

@ -14,6 +14,7 @@ CONTRIBUTORS:
Adam D. Moss <c@yotes.com>
Adam Leventhal <ahl@delphix.com>
Adam Stevko <adam.stevko@gmail.com>
Adi Gollamudi <adigollamudi@gmail.com>
adisbladis <adis@blad.is>
Adrian Chadd <adrian@freebsd.org>
Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
@ -32,8 +33,10 @@ CONTRIBUTORS:
Alek Pinchuk <alek@nexenta.com>
Aleksandr Liber <aleksandr.liber@perforce.com>
Aleksa Sarai <cyphar@cyphar.com>
Alex <simplecodemaster@gmail.com>
Alexander Eremin <a.eremin@nexenta.com>
Alexander Lobakin <alobakin@pm.me>
Alexander Moch <mail@alexmoch.com>
Alexander Motin <mav@freebsd.org>
Alexander Pyhalov <apyhalov@gmail.com>
Alexander Richardson <Alexander.Richardson@cl.cam.ac.uk>
@ -46,6 +49,7 @@ CONTRIBUTORS:
Alex McWhirter <alexmcwhirter@triadic.us>
Alex Reece <alex@delphix.com>
Alex Wilson <alex.wilson@joyent.com>
Alexx Saver <lzsaver.eth@ethermail.io>
Alex Zhuravlev <alexey.zhuravlev@intel.com>
Allan Jude <allanjude@freebsd.org>
Allen Holl <allen.m.holl@gmail.com>
@ -89,6 +93,7 @@ CONTRIBUTORS:
Arun KV <arun.kv@datacore.com>
Arvind Sankar <nivedita@alum.mit.edu>
Attila Fülöp <attila@fueloep.org>
Austin Wise <AustinWise@gmail.com>
Avatat <kontakt@avatat.pl>
Bart Coddens <bart.coddens@gmail.com>
Basil Crow <basil.crow@delphix.com>
@ -110,6 +115,7 @@ CONTRIBUTORS:
Boris Protopopov <boris.protopopov@nexenta.com>
Brad Forschinger <github@bnjf.id.au>
Brad Lewis <brad.lewis@delphix.com>
Brad Spengler <bspengler-oss@users.noreply.github.com>
Brandon Thetford <brandon@dodecatec.com>
Brian Atkinson <bwa@g.clemson.edu>
Brian Behlendorf <behlendorf1@llnl.gov>
@ -196,7 +202,9 @@ CONTRIBUTORS:
David Quigley <david.quigley@intel.com>
Debabrata Banerjee <dbanerje@akamai.com>
D. Ebdrup <debdrup@freebsd.org>
delan azabani <dazabani@igalia.com>
Dennis R. Friedrichsen <dennis.r.friedrichsen@gmail.com>
Dennis Vestergaard Værum <github@varum.dk>
Denys Rtveliashvili <denys@rtveliashvili.name>
Derek Dai <daiderek@gmail.com>
Derek Schrock <dereks@lifeofadishwasher.com>
@ -223,6 +231,7 @@ CONTRIBUTORS:
Eric Desrochers <eric.desrochers@canonical.com>
Eric Dillmann <eric@jave.fr>
Eric Schrock <Eric.Schrock@delphix.com>
Erik Larsson <catacombae@gmail.com>
Ethan Coe-Renner <coerenner1@llnl.gov>
Etienne Dechamps <etienne@edechamps.fr>
Evan Allrich <eallrich@gmail.com>
@ -255,6 +264,7 @@ CONTRIBUTORS:
George Diamantopoulos <georgediam@gmail.com>
George Gaydarov <git@gg7.io>
George Melikov <mail@gmelikov.ru>
George Shammas <george@shamm.as>
George Wilson <gwilson@delphix.com>
Georgy Yakovlev <ya@sysdump.net>
Gerardwx <gerardw@alum.mit.edu>
@ -343,6 +353,7 @@ CONTRIBUTORS:
Joe Stein <joe.stein@delphix.com>
John-Mark Gurney <jmg@funkthat.com>
John Albietz <inthecloud247@gmail.com>
John Cabaj <john.cabaj@canonical.com>
John Eismeier <john.eismeier@gmail.com>
John Gallagher <john.gallagher@delphix.com>
John Layman <jlayman@sagecloud.com>
@ -358,6 +369,7 @@ CONTRIBUTORS:
Jorgen Lundman <lundman@lundman.net>
Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
Jose Luis Duran <jlduran@gmail.com>
Joseph Holsten <joseph@josephholsten.com>
Josh Soref <jsoref@users.noreply.github.com>
Joshua M. Clulow <josh@sysmgr.org>
José Luis Salvador Rufo <salvador.joseluis@gmail.com>
@ -622,6 +634,7 @@ CONTRIBUTORS:
Simon Guest <simon.guest@tesujimath.org>
Simon Howard <fraggle@soulsphere.org>
Simon Klinkert <simon.klinkert@gmail.com>
Sivesh Kumar <siveshjami@gmail.com>
Sowrabha Gopal <sowrabha.gopal@delphix.com>
Spencer Kinny <spencerkinny1995@gmail.com>
Srikanth N S <srikanth.nagasubbaraoseetharaman@hpe.com>
@ -704,6 +717,7 @@ CONTRIBUTORS:
Windel Bouwman <windel@windel.nl>
Wojciech Małota-Wójcik <outofforest@users.noreply.github.com>
Wolfgang Bumiller <w.bumiller@proxmox.com>
Wolfgang Hoschek <wolfgang.hoschek@mac.com>
XDTG <click1799@163.com>
Xin Li <delphij@FreeBSD.org>
Xinliang Liu <xinliang.liu@linaro.org>

META (2 lines changed)

@ -1,7 +1,7 @@
Meta: 1
Name: zfs
Branch: 1.0
Version: 2.4.1
Version: 2.4.99
Release: 1
Release-Tags: relext
License: CDDL


@ -1,3 +1,4 @@
# SPDX-License-Identifier: CDDL-1.0
CLEANFILES =
dist_noinst_DATA =
INSTALL_DATA_HOOKS =
@ -132,6 +133,7 @@ cstyle:
! -name 'zfs_config.*' ! -name '*.mod.c' \
! -name 'opt_global.h' ! -name '*_if*.h' \
! -name 'zstd_compat_wrapper.h' \
! -path './module/zstd/zstd-in.c' \
! -path './module/zstd/lib/*' \
! -path './include/sys/lua/*' \
! -path './module/lua/l*.[ch]' \


@ -1,3 +1,4 @@
#!/bin/sh
# SPDX-License-Identifier: CDDL-1.0
autoreconf -fiv "$(dirname "$0")" && rm -rf "$(dirname "$0")"/autom4te.cache


@ -1,3 +1,4 @@
# SPDX-License-Identifier: CDDL-1.0
bin_SCRIPTS =
bin_PROGRAMS =
sbin_SCRIPTS =
@ -35,8 +36,8 @@ zhack_SOURCES = \
zhack_LDADD = \
libzpool.la \
libzfs_core.la \
libnvpair.la
libnvpair.la \
librange_tree.la
ztest_CFLAGS = $(AM_CFLAGS) $(KERNEL_CFLAGS)
ztest_CPPFLAGS = $(AM_CPPFLAGS) $(LIBZPOOL_CPPFLAGS)


@ -1,3 +1,4 @@
# SPDX-License-Identifier: CDDL-1.0
raidz_test_CFLAGS = $(AM_CFLAGS) $(KERNEL_CFLAGS)
raidz_test_CPPFLAGS = $(AM_CPPFLAGS) $(LIBZPOOL_CPPFLAGS)


@ -33,6 +33,7 @@
#include <sys/vdev_raidz_impl.h>
#include <assert.h>
#include <stdio.h>
#include <libzpool.h>
#include "raidz_test.h"
static int *rand_data;


@ -1,3 +1,4 @@
# SPDX-License-Identifier: CDDL-1.0
zdb_CPPFLAGS = $(AM_CPPFLAGS) $(LIBZPOOL_CPPFLAGS)
zdb_CFLAGS = $(AM_CFLAGS) $(LIBCRYPTO_CFLAGS)
@ -13,6 +14,8 @@ zdb_LDADD = \
libzdb.la \
libzpool.la \
libzfs_core.la \
libnvpair.la
libnvpair.la \
libbtree.la \
librange_tree.la
zdb_LDADD += $(LIBCRYPTO_LIBS)


@ -36,6 +36,7 @@
* Copyright (c) 2021 Toomas Soome <tsoome@me.com>
* Copyright (c) 2023, 2024, Klara Inc.
* Copyright (c) 2023, Rob Norris <robn@despairlabs.com>
* Copyright (c) 2026, TrueNAS.
*/
#include <stdio.h>
@ -89,6 +90,7 @@
#include <sys/zstd/zstd.h>
#include <sys/backtrace.h>
#include <libzpool.h>
#include <libnvpair.h>
#include <libzutil.h>
#include <libzfs_core.h>
@ -3389,14 +3391,14 @@ zdb_derive_key(dsl_dir_t *dd, uint8_t *key_out)
static char encroot[ZFS_MAX_DATASET_NAME_LEN];
static boolean_t key_loaded = B_FALSE;
static void
static int
zdb_load_key(objset_t *os)
{
dsl_pool_t *dp;
dsl_dir_t *dd, *rdd;
uint8_t key[WRAPPING_KEY_LEN];
uint64_t rddobj;
int err;
int err = 0;
dp = spa_get_dsl(os->os_spa);
dd = os->os_dsl_dataset->ds_dir;
@ -3409,10 +3411,14 @@ zdb_load_key(objset_t *os)
dsl_dir_rele(rdd, FTAG);
if (!zdb_derive_key(dd, key))
fatal("couldn't derive encryption key");
err = EINVAL;
dsl_pool_config_exit(dp, FTAG);
if (err != 0) {
fprintf(stderr, "couldn't derive encryption key\n");
return (err);
}
ASSERT3U(dsl_dataset_get_keystatus(dd), ==, ZFS_KEYSTATUS_UNAVAILABLE);
dsl_crypto_params_t *dcp;
@ -3428,16 +3434,20 @@ zdb_load_key(objset_t *os)
dsl_crypto_params_free(dcp, (err != 0));
fnvlist_free(crypto_args);
if (err != 0)
fatal(
"couldn't load encryption key for %s: %s",
if (err != 0) {
fprintf(stderr,
"couldn't load encryption key for %s: %s\n",
encroot, err == ZFS_ERR_CRYPTO_NOTSUP ?
"crypto params not supported" : strerror(err));
return (err);
}
ASSERT3U(dsl_dataset_get_keystatus(dd), ==, ZFS_KEYSTATUS_AVAILABLE);
printf("Unlocked encryption root: %s\n", encroot);
key_loaded = B_TRUE;
return (0);
}
static void
@ -3480,15 +3490,30 @@ open_objset(const char *path, const void *tag, objset_t **osp)
path, strerror(err));
return (err);
}
dsl_dataset_long_hold(dmu_objset_ds(*osp), tag);
dsl_pool_rele(dmu_objset_pool(*osp), tag);
/* succeeds or dies */
zdb_load_key(*osp);
/*
* Only try to load the key and unlock the dataset if it is
* actually encrypted; otherwise we'll just crash. Just
* ignore the -K switch entirely otherwise; it's useful to be
* able to provide even if it's not needed.
*/
if ((*osp)->os_encrypted) {
dsl_dataset_long_hold(dmu_objset_ds(*osp), tag);
dsl_pool_rele(dmu_objset_pool(*osp), tag);
/* release it all */
dsl_dataset_long_rele(dmu_objset_ds(*osp), tag);
dsl_dataset_rele(dmu_objset_ds(*osp), tag);
err = zdb_load_key(*osp);
/* release it all */
dsl_dataset_long_rele(dmu_objset_ds(*osp), tag);
dsl_dataset_rele(dmu_objset_ds(*osp), tag);
if (err != 0) {
*osp = NULL;
return (err);
}
} else {
dmu_objset_rele(*osp, tag);
}
}
int ds_hold_flags = key_loaded ? DS_HOLD_FLAG_DECRYPT : 0;
@ -3497,6 +3522,7 @@ open_objset(const char *path, const void *tag, objset_t **osp)
if (err != 0) {
(void) fprintf(stderr, "failed to hold dataset '%s': %s\n",
path, strerror(err));
*osp = NULL;
return (err);
}
dsl_dataset_long_hold(dmu_objset_ds(*osp), tag);
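
The zdb hunks above change zdb_load_key() from a void function that calls fatal() into one that prints a message and returns an error, and open_objset() now sets *osp to NULL on its failure paths. A minimal sketch of that error-propagation shape follows. Every demo_* name here is a hypothetical stand-in, and none of this is the real zdb or DSL API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for an objset; not the real objset_t. */
typedef struct demo_objset {
	int do_encrypted;	/* nonzero if a key must be loaded */
	int do_key_ok;		/* nonzero if key derivation would succeed */
} demo_objset_t;

/* Returns 0 on success or an errno value; never exits the process. */
int
demo_load_key(const demo_objset_t *os)
{
	assert(os != NULL);
	if (!os->do_key_ok) {
		(void) fprintf(stderr, "couldn't derive encryption key\n");
		return (EINVAL);
	}
	return (0);
}

/* Only attempts the key load for encrypted objects. */
int
demo_open(demo_objset_t *candidate, demo_objset_t **osp)
{
	int err = 0;

	*osp = candidate;
	if ((*osp)->do_encrypted)
		err = demo_load_key(*osp);
	if (err != 0)
		*osp = NULL;	/* do not hand back a half-opened object */
	return (err);
}
```

Returning the error lets the caller decide whether to stop or carry on, and clearing the out-parameter keeps callers from touching a handle that was never fully opened.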


@ -1,3 +1,4 @@
# SPDX-License-Identifier: CDDL-1.0
include $(srcdir)/%D%/zed.d/Makefile.am
zed_CFLAGS = $(AM_CFLAGS)
@ -37,8 +38,7 @@ zed_SOURCES = \
zed_LDADD = \
libzfs.la \
libzfs_core.la \
libnvpair.la \
libuutil.la
libnvpair.la
zed_LDADD += -lrt $(LIBATOMIC_LIBS) $(LIBUDEV_LIBS) $(LIBUUID_LIBS)
zed_LDFLAGS = -pthread


@ -29,7 +29,6 @@
#include <stddef.h>
#include <string.h>
#include <libuutil.h>
#include <libzfs.h>
#include <sys/types.h>
#include <sys/time.h>
@ -96,7 +95,7 @@ typedef struct zfs_case {
uint32_t zc_version;
zfs_case_data_t zc_data;
fmd_case_t *zc_case;
uu_list_node_t zc_node;
list_node_t zc_node;
id_t zc_remove_timer;
char *zc_fru;
er_timeval_t zc_when;
@ -126,8 +125,7 @@ zfs_de_stats_t zfs_stats = {
/* wait 15 seconds after a removal */
static hrtime_t zfs_remove_timeout = SEC2NSEC(15);
uu_list_pool_t *zfs_case_pool;
uu_list_t *zfs_cases;
static list_t zfs_cases;
#define ZFS_MAKE_RSRC(type) \
FM_RSRC_CLASS "." ZFS_ERROR_CLASS "." type
@ -174,8 +172,8 @@ zfs_case_unserialize(fmd_hdl_t *hdl, fmd_case_t *cp)
zcp->zc_remove_timer = fmd_timer_install(hdl, zcp,
NULL, zfs_remove_timeout);
uu_list_node_init(zcp, &zcp->zc_node, zfs_case_pool);
(void) uu_list_insert_before(zfs_cases, NULL, zcp);
list_link_init(&zcp->zc_node);
list_insert_head(&zfs_cases, zcp);
fmd_case_setspecific(hdl, cp, zcp);
@ -206,8 +204,8 @@ zfs_other_serd_cases(fmd_hdl_t *hdl, const zfs_case_data_t *zfs_case)
next_check = gethrestime_sec() + CASE_GC_TIMEOUT_SECS;
}
for (zcp = uu_list_first(zfs_cases); zcp != NULL;
zcp = uu_list_next(zfs_cases, zcp)) {
for (zcp = list_head(&zfs_cases); zcp != NULL;
zcp = list_next(&zfs_cases, zcp)) {
zfs_case_data_t *zcd = &zcp->zc_data;
/*
@ -257,8 +255,8 @@ zfs_mark_vdev(uint64_t pool_guid, nvlist_t *vd, er_timeval_t *loaded)
/*
* Mark any cases associated with this (pool, vdev) pair.
*/
for (zcp = uu_list_first(zfs_cases); zcp != NULL;
zcp = uu_list_next(zfs_cases, zcp)) {
for (zcp = list_head(&zfs_cases); zcp != NULL;
zcp = list_next(&zfs_cases, zcp)) {
if (zcp->zc_data.zc_pool_guid == pool_guid &&
zcp->zc_data.zc_vdev_guid == vdev_guid) {
zcp->zc_present = B_TRUE;
@ -304,8 +302,8 @@ zfs_mark_pool(zpool_handle_t *zhp, void *unused)
/*
* Mark any cases associated with just this pool.
*/
for (zcp = uu_list_first(zfs_cases); zcp != NULL;
zcp = uu_list_next(zfs_cases, zcp)) {
for (zcp = list_head(&zfs_cases); zcp != NULL;
zcp = list_next(&zfs_cases, zcp)) {
if (zcp->zc_data.zc_pool_guid == pool_guid &&
zcp->zc_data.zc_vdev_guid == 0)
zcp->zc_present = B_TRUE;
@ -321,8 +319,8 @@ zfs_mark_pool(zpool_handle_t *zhp, void *unused)
if (nelem == 2) {
loaded.ertv_sec = tod[0];
loaded.ertv_nsec = tod[1];
for (zcp = uu_list_first(zfs_cases); zcp != NULL;
zcp = uu_list_next(zfs_cases, zcp)) {
for (zcp = list_head(&zfs_cases); zcp != NULL;
zcp = list_next(&zfs_cases, zcp)) {
if (zcp->zc_data.zc_pool_guid == pool_guid &&
zcp->zc_data.zc_vdev_guid == 0) {
zcp->zc_when = loaded;
@ -389,8 +387,7 @@ zpool_find_load_time(zpool_handle_t *zhp, void *arg)
static void
zfs_purge_cases(fmd_hdl_t *hdl)
{
zfs_case_t *zcp;
uu_list_walk_t *walk;
zfs_case_t *zcp, *next;
libzfs_handle_t *zhdl = fmd_hdl_getspecific(hdl);
/*
@ -410,8 +407,8 @@ zfs_purge_cases(fmd_hdl_t *hdl)
/*
* Mark the cases as not present.
*/
for (zcp = uu_list_first(zfs_cases); zcp != NULL;
zcp = uu_list_next(zfs_cases, zcp))
for (zcp = list_head(&zfs_cases); zcp != NULL;
zcp = list_next(&zfs_cases, zcp))
zcp->zc_present = B_FALSE;
/*
@ -425,12 +422,11 @@ zfs_purge_cases(fmd_hdl_t *hdl)
/*
* Remove those cases which were not found.
*/
walk = uu_list_walk_start(zfs_cases, UU_WALK_ROBUST);
while ((zcp = uu_list_walk_next(walk)) != NULL) {
for (zcp = list_head(&zfs_cases); zcp != NULL; zcp = next) {
next = list_next(&zfs_cases, zcp);
if (!zcp->zc_present)
fmd_case_close(hdl, zcp->zc_case);
}
uu_list_walk_end(walk);
}
/*
@ -660,8 +656,8 @@ zfs_fm_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl, const char *class)
zfs_ereport_when(hdl, nvl, &er_when);
for (zcp = uu_list_first(zfs_cases); zcp != NULL;
zcp = uu_list_next(zfs_cases, zcp)) {
for (zcp = list_head(&zfs_cases); zcp != NULL;
zcp = list_next(&zfs_cases, zcp)) {
if (zcp->zc_data.zc_pool_guid == pool_guid) {
pool_found = B_TRUE;
pool_load = zcp->zc_when;
@ -867,8 +863,8 @@ zfs_fm_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl, const char *class)
* Pool level fault. Before solving the case, go through and
* close any open device cases that may be pending.
*/
for (dcp = uu_list_first(zfs_cases); dcp != NULL;
dcp = uu_list_next(zfs_cases, dcp)) {
for (dcp = list_head(&zfs_cases); dcp != NULL;
dcp = list_next(&zfs_cases, dcp)) {
if (dcp->zc_data.zc_pool_guid ==
zcp->zc_data.zc_pool_guid &&
dcp->zc_data.zc_vdev_guid != 0)
@ -1088,8 +1084,7 @@ zfs_fm_close(fmd_hdl_t *hdl, fmd_case_t *cs)
if (zcp->zc_data.zc_has_remove_timer)
fmd_timer_remove(hdl, zcp->zc_remove_timer);
uu_list_remove(zfs_cases, zcp);
uu_list_node_fini(zcp, &zcp->zc_node, zfs_case_pool);
list_remove(&zfs_cases, zcp);
fmd_hdl_free(hdl, zcp, sizeof (zfs_case_t));
}
@ -1117,23 +1112,11 @@ _zfs_diagnosis_init(fmd_hdl_t *hdl)
if ((zhdl = libzfs_init()) == NULL)
return;
if ((zfs_case_pool = uu_list_pool_create("zfs_case_pool",
sizeof (zfs_case_t), offsetof(zfs_case_t, zc_node),
NULL, UU_LIST_POOL_DEBUG)) == NULL) {
libzfs_fini(zhdl);
return;
}
if ((zfs_cases = uu_list_create(zfs_case_pool, NULL,
UU_LIST_DEBUG)) == NULL) {
uu_list_pool_destroy(zfs_case_pool);
libzfs_fini(zhdl);
return;
}
list_create(&zfs_cases,
sizeof (zfs_case_t), offsetof(zfs_case_t, zc_node));
if (fmd_hdl_register(hdl, FMD_API_VERSION, &fmd_info) != 0) {
uu_list_destroy(zfs_cases);
uu_list_pool_destroy(zfs_case_pool);
list_destroy(&zfs_cases);
libzfs_fini(zhdl);
return;
}
@ -1148,24 +1131,18 @@ void
_zfs_diagnosis_fini(fmd_hdl_t *hdl)
{
zfs_case_t *zcp;
uu_list_walk_t *walk;
libzfs_handle_t *zhdl;
/*
* Remove all active cases.
*/
walk = uu_list_walk_start(zfs_cases, UU_WALK_ROBUST);
while ((zcp = uu_list_walk_next(walk)) != NULL) {
while ((zcp = list_remove_head(&zfs_cases)) != NULL) {
fmd_hdl_debug(hdl, "removing case ena %llu",
(long long unsigned)zcp->zc_data.zc_ena);
uu_list_remove(zfs_cases, zcp);
uu_list_node_fini(zcp, &zcp->zc_node, zfs_case_pool);
fmd_hdl_free(hdl, zcp, sizeof (zfs_case_t));
}
uu_list_walk_end(walk);
uu_list_destroy(zfs_cases);
uu_list_pool_destroy(zfs_case_pool);
list_destroy(&zfs_cases);
zhdl = fmd_hdl_getspecific(hdl);
libzfs_fini(zhdl);
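
In the zfs_purge_cases() hunk above, the UU_WALK_ROBUST walker is replaced by a loop that fetches the next element before acting on the current one. Presumably this is because closing a case can unlink and free it, as zfs_fm_close() does with list_remove(). A minimal sketch of that save-next-before-removal pattern follows. It uses a hand-rolled singly linked list rather than the OpenZFS list_t API, and all demo_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for zfs_case_t; not the OpenZFS list_t API. */
typedef struct demo_case {
	int dc_id;
	int dc_present;
	struct demo_case *dc_next;
} demo_case_t;

/* Push a new case on the head of the list; returns the new head. */
demo_case_t *
demo_push(demo_case_t *head, int id, int present)
{
	demo_case_t *dc = malloc(sizeof (*dc));

	assert(dc != NULL);
	dc->dc_id = id;
	dc->dc_present = present;
	dc->dc_next = head;
	return (dc);
}

/* Free every case not marked present; returns the number freed. */
int
demo_purge(demo_case_t **headp)
{
	demo_case_t **linkp = headp;
	demo_case_t *dc, *next;
	int purged = 0;

	for (dc = *headp; dc != NULL; dc = next) {
		next = dc->dc_next;	/* save before dc may be freed */
		if (!dc->dc_present) {
			*linkp = next;	/* unlink dc from the list */
			free(dc);
			purged++;
		} else {
			linkp = &dc->dc_next;
		}
	}
	return (purged);
}

/* Count the cases remaining on the list. */
int
demo_count(const demo_case_t *head)
{
	int n = 0;

	for (; head != NULL; head = head->dc_next)
		n++;
	return (n);
}
```

Reading dc->dc_next in the loop's increment expression instead would dereference memory that has already been freed whenever the current element was purged.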


@ -82,7 +82,7 @@
#include <sys/sunddi.h>
#include <sys/sysevent/eventdefs.h>
#include <sys/sysevent/dev.h>
#include <thread_pool.h>
#include <sys/taskq.h>
#include <pthread.h>
#include <unistd.h>
#include <errno.h>
@ -98,7 +98,7 @@ typedef void (*zfs_process_func_t)(zpool_handle_t *, nvlist_t *, boolean_t);
libzfs_handle_t *g_zfshdl;
list_t g_pool_list; /* list of unavailable pools at initialization */
list_t g_device_list; /* list of disks with asynchronous label request */
tpool_t *g_tpool;
taskq_t *g_taskq;
boolean_t g_enumeration_done;
pthread_t g_zfs_tid; /* zfs_enum_pools() thread */
@ -749,8 +749,8 @@ zfs_iter_pool(zpool_handle_t *zhp, void *data)
continue;
if (zfs_toplevel_state(zhp) >= VDEV_STATE_DEGRADED) {
list_remove(&g_pool_list, pool);
(void) tpool_dispatch(g_tpool, zfs_enable_ds,
pool);
(void) taskq_dispatch(g_taskq, zfs_enable_ds,
pool, TQ_SLEEP);
break;
}
}
@ -1347,9 +1347,9 @@ zfs_slm_fini(void)
/* wait for zfs_enum_pools thread to complete */
(void) pthread_join(g_zfs_tid, NULL);
/* destroy the thread pool */
if (g_tpool != NULL) {
tpool_wait(g_tpool);
tpool_destroy(g_tpool);
if (g_taskq != NULL) {
taskq_wait(g_taskq);
taskq_destroy(g_taskq);
}
while ((pool = list_remove_head(&g_pool_list)) != NULL) {


@ -1,3 +1,4 @@
# SPDX-License-Identifier: CDDL-1.0
zedconfdir = $(sysconfdir)/zfs/zed.d
dist_zedconf_DATA = \
%D%/zed-functions.sh \


@ -1,3 +1,4 @@
# SPDX-License-Identifier: CDDL-1.0
sbin_PROGRAMS += zfs
CPPCHECKTARGETS += zfs
@ -12,8 +13,7 @@ zfs_SOURCES = \
zfs_LDADD = \
libzfs.la \
libzfs_core.la \
libnvpair.la \
libuutil.la
libnvpair.la
zfs_LDADD += $(LTLIBINTL)


@ -28,7 +28,6 @@
*/
#include <libintl.h>
#include <libuutil.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
@ -50,14 +49,16 @@
* When finished, we have an AVL tree of ZFS handles. We go through and execute
* the provided callback for each one, passing whatever data the user supplied.
*/
typedef struct callback_data callback_data_t;
typedef struct zfs_node {
zfs_handle_t *zn_handle;
uu_avl_node_t zn_avlnode;
callback_data_t *zn_callback;
avl_node_t zn_avlnode;
} zfs_node_t;
typedef struct callback_data {
uu_avl_t *cb_avl;
struct callback_data {
avl_tree_t cb_avl;
int cb_flags;
zfs_type_t cb_types;
zfs_sort_column_t *cb_sortcol;
@ -65,9 +66,7 @@ typedef struct callback_data {
int cb_depth_limit;
int cb_depth;
uint8_t cb_props_table[ZFS_NUM_PROPS];
} callback_data_t;
uu_avl_pool_t *avl_pool;
};
/*
* Include snaps if they were requested or if this a zfs list where types
@ -99,13 +98,12 @@ zfs_callback(zfs_handle_t *zhp, void *data)
if ((zfs_get_type(zhp) & cb->cb_types) ||
((zfs_get_type(zhp) == ZFS_TYPE_SNAPSHOT) && include_snaps)) {
uu_avl_index_t idx;
avl_index_t idx;
zfs_node_t *node = safe_malloc(sizeof (zfs_node_t));
node->zn_handle = zhp;
uu_avl_node_init(node, &node->zn_avlnode, avl_pool);
if (uu_avl_find(cb->cb_avl, node, cb->cb_sortcol,
&idx) == NULL) {
node->zn_callback = cb;
if (avl_find(&cb->cb_avl, node, &idx) == NULL) {
if (cb->cb_proplist) {
if ((*cb->cb_proplist) &&
!(*cb->cb_proplist)->pl_all)
@ -120,7 +118,7 @@ zfs_callback(zfs_handle_t *zhp, void *data)
return (-1);
}
}
uu_avl_insert(cb->cb_avl, node, idx);
avl_insert(&cb->cb_avl, node, idx);
should_close = B_FALSE;
} else {
free(node);
@ -286,7 +284,7 @@ zfs_compare(const void *larg, const void *rarg)
if (rat != NULL)
*rat = '\0';
ret = strcmp(lname, rname);
ret = TREE_ISIGN(strcmp(lname, rname));
if (ret == 0 && (lat != NULL || rat != NULL)) {
/*
* If we're comparing a dataset to one of its snapshots, we
@ -340,11 +338,11 @@ zfs_compare(const void *larg, const void *rarg)
* with snapshots grouped under their parents.
*/
static int
zfs_sort(const void *larg, const void *rarg, void *data)
zfs_sort(const void *larg, const void *rarg)
{
zfs_handle_t *l = ((zfs_node_t *)larg)->zn_handle;
zfs_handle_t *r = ((zfs_node_t *)rarg)->zn_handle;
zfs_sort_column_t *sc = (zfs_sort_column_t *)data;
zfs_sort_column_t *sc = ((zfs_node_t *)larg)->zn_callback->cb_sortcol;
zfs_sort_column_t *psc;
for (psc = sc; psc != NULL; psc = psc->sc_next) {
@ -414,7 +412,7 @@ zfs_sort(const void *larg, const void *rarg, void *data)
return (-1);
if (lstr)
ret = strcmp(lstr, rstr);
ret = TREE_ISIGN(strcmp(lstr, rstr));
else if (lnum < rnum)
ret = -1;
else if (lnum > rnum)
@ -438,13 +436,6 @@ zfs_for_each(int argc, char **argv, int flags, zfs_type_t types,
callback_data_t cb = {0};
int ret = 0;
zfs_node_t *node;
uu_avl_walk_t *walk;
avl_pool = uu_avl_pool_create("zfs_pool", sizeof (zfs_node_t),
offsetof(zfs_node_t, zn_avlnode), zfs_sort, UU_DEFAULT);
if (avl_pool == NULL)
nomem();
cb.cb_sortcol = sortcol;
cb.cb_flags = flags;
@ -489,8 +480,8 @@ zfs_for_each(int argc, char **argv, int flags, zfs_type_t types,
sizeof (cb.cb_props_table));
}
if ((cb.cb_avl = uu_avl_create(avl_pool, NULL, UU_DEFAULT)) == NULL)
nomem();
avl_create(&cb.cb_avl, zfs_sort,
sizeof (zfs_node_t), offsetof(zfs_node_t, zn_avlnode));
if (argc == 0) {
/*
@ -531,25 +522,20 @@ zfs_for_each(int argc, char **argv, int flags, zfs_type_t types,
* At this point we've got our AVL tree full of zfs handles, so iterate
* over each one and execute the real user callback.
*/
for (node = uu_avl_first(cb.cb_avl); node != NULL;
node = uu_avl_next(cb.cb_avl, node))
for (node = avl_first(&cb.cb_avl); node != NULL;
node = AVL_NEXT(&cb.cb_avl, node))
ret |= callback(node->zn_handle, data);
/*
* Finally, clean up the AVL tree.
*/
if ((walk = uu_avl_walk_start(cb.cb_avl, UU_WALK_ROBUST)) == NULL)
nomem();
while ((node = uu_avl_walk_next(walk)) != NULL) {
uu_avl_remove(cb.cb_avl, node);
void *cookie = NULL;
while ((node = avl_destroy_nodes(&cb.cb_avl, &cookie)) != NULL) {
zfs_close(node->zn_handle);
free(node);
}
uu_avl_walk_end(walk);
uu_avl_destroy(cb.cb_avl);
uu_avl_pool_destroy(avl_pool);
avl_destroy(&cb.cb_avl);
return (ret);
}
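
The zfs_iter.c hunks above drop the third `void *data` argument from zfs_sort(). Each node now carries a back-pointer (zn_callback) from which the comparator reads the sort columns, and strcmp() results are wrapped in TREE_ISIGN(), presumably because the AVL code expects exactly -1, 0, or 1. A minimal sketch of both ideas follows. It uses qsort(), which also takes a two-argument comparator, rather than the AVL API, and the demo_* names and the DEMO_ISIGN macro are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sign-normalizing macro: maps any int to -1, 0, or 1. */
#define	DEMO_ISIGN(a)	(((a) > 0) - ((a) < 0))

/* Shared sort settings, playing the role of the callback data. */
typedef struct demo_ctx {
	int dx_reverse;		/* nonzero: sort descending */
} demo_ctx_t;

typedef struct demo_node {
	const char *dn_name;
	const demo_ctx_t *dn_ctx;	/* back-pointer to sort settings */
} demo_node_t;

/* Two-argument comparator: the context comes from the left node. */
int
demo_compare(const void *larg, const void *rarg)
{
	const demo_node_t *l = larg;
	const demo_node_t *r = rarg;
	int ret = DEMO_ISIGN(strcmp(l->dn_name, r->dn_name));

	return (l->dn_ctx->dx_reverse ? -ret : ret);
}

/* Sort an array of nodes that all share the same context. */
void
demo_sort(demo_node_t *nodes, size_t count)
{
	assert(nodes != NULL);
	qsort(nodes, count, sizeof (demo_node_t), demo_compare);
}
```

The approach relies on every node in one container pointing at the same context, so it does not matter which of the two arguments the comparator reads it from.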


@ -42,7 +42,6 @@
#include <getopt.h>
#include <libgen.h>
#include <libintl.h>
#include <libuutil.h>
#include <libnvpair.h>
#include <locale.h>
#include <stddef.h>
@ -440,8 +439,8 @@ get_usage(zfs_help_t idx)
return (gettext("\tredact <snapshot> <bookmark> "
"<redaction_snapshot> ...\n"));
case HELP_REWRITE:
return (gettext("\trewrite [-Prvx] [-o <offset>] [-l <length>] "
"<directory|file ...>\n"));
return (gettext("\trewrite [-CPSrvx] [-o <offset>] "
"[-l <length>] <directory|file ...>\n"));
case HELP_JAIL:
return (gettext("\tjail <jailid|jailname> <filesystem>\n"));
case HELP_UNJAIL:
@ -2853,31 +2852,27 @@ static int us_type_bits[] = {
static const char *const us_type_names[] = { "posixgroup", "posixuser",
"smbgroup", "smbuser", "all" };
typedef struct us_cbdata us_cbdata_t;
typedef struct us_node {
nvlist_t *usn_nvl;
uu_avl_node_t usn_avlnode;
uu_list_node_t usn_listnode;
us_cbdata_t *usn_cbdata;
avl_node_t usn_avlnode;
list_node_t usn_listnode;
} us_node_t;
typedef struct us_cbdata {
struct us_cbdata {
nvlist_t **cb_nvlp;
uu_avl_pool_t *cb_avl_pool;
uu_avl_t *cb_avl;
avl_tree_t cb_avl;
boolean_t cb_numname;
boolean_t cb_nicenum;
boolean_t cb_sid2posix;
zfs_userquota_prop_t cb_prop;
zfs_sort_column_t *cb_sortcol;
size_t cb_width[USFIELD_LAST];
} us_cbdata_t;
};
static boolean_t us_populated = B_FALSE;
typedef struct {
zfs_sort_column_t *si_sortcol;
boolean_t si_numname;
} us_sort_info_t;
static int
us_field_index(const char *field)
{
@ -2890,13 +2885,12 @@ us_field_index(const char *field)
}
static int
us_compare(const void *larg, const void *rarg, void *unused)
us_compare(const void *larg, const void *rarg)
{
const us_node_t *l = larg;
const us_node_t *r = rarg;
us_sort_info_t *si = (us_sort_info_t *)unused;
zfs_sort_column_t *sortcol = si->si_sortcol;
boolean_t numname = si->si_numname;
zfs_sort_column_t *sortcol = l->usn_cbdata->cb_sortcol;
boolean_t numname = l->usn_cbdata->cb_numname;
nvlist_t *lnvl = l->usn_nvl;
nvlist_t *rnvl = r->usn_nvl;
int rc = 0;
@ -3030,25 +3024,22 @@ userspace_cb(void *arg, const char *domain, uid_t rid, uint64_t space,
const char *propname;
char sizebuf[32];
us_node_t *node;
uu_avl_pool_t *avl_pool = cb->cb_avl_pool;
uu_avl_t *avl = cb->cb_avl;
uu_avl_index_t idx;
avl_tree_t *avl = &cb->cb_avl;
avl_index_t idx;
nvlist_t *props;
us_node_t *n;
zfs_sort_column_t *sortcol = cb->cb_sortcol;
unsigned type = 0;
const char *typestr;
size_t namelen;
size_t typelen;
size_t sizelen;
int typeidx, nameidx, sizeidx;
us_sort_info_t sortinfo = { sortcol, cb->cb_numname };
boolean_t smbentity = B_FALSE;
if (nvlist_alloc(&props, NV_UNIQUE_NAME, 0) != 0)
nomem();
node = safe_malloc(sizeof (us_node_t));
uu_avl_node_init(node, &node->usn_avlnode, avl_pool);
node->usn_cbdata = cb;
node->usn_nvl = props;
if (domain != NULL && domain[0] != '\0') {
@ -3150,8 +3141,8 @@ userspace_cb(void *arg, const char *domain, uid_t rid, uint64_t space,
* Check if this type/name combination is in the list and update it;
* otherwise add new node to the list.
*/
if ((n = uu_avl_find(avl, node, &sortinfo, &idx)) == NULL) {
uu_avl_insert(avl, node, idx);
if ((n = avl_find(avl, node, &idx)) == NULL) {
avl_insert(avl, node, idx);
} else {
nvlist_free(props);
free(node);
@ -3325,7 +3316,7 @@ print_us_node(boolean_t scripted, boolean_t parsable, int *fields, int types,
static void
print_us(boolean_t scripted, boolean_t parsable, int *fields, int types,
size_t *width, boolean_t rmnode, uu_avl_t *avl)
size_t *width, boolean_t rmnode, avl_tree_t *avl)
{
us_node_t *node;
const char *col;
@ -3350,7 +3341,7 @@ print_us(boolean_t scripted, boolean_t parsable, int *fields, int types,
(void) printf("\n");
}
for (node = uu_avl_first(avl); node; node = uu_avl_next(avl, node)) {
for (node = avl_first(avl); node; node = AVL_NEXT(avl, node)) {
print_us_node(scripted, parsable, fields, types, width, node);
if (rmnode)
nvlist_free(node->usn_nvl);
@ -3362,9 +3353,6 @@ zfs_do_userspace(int argc, char **argv)
{
zfs_handle_t *zhp;
zfs_userquota_prop_t p;
uu_avl_pool_t *avl_pool;
uu_avl_t *avl_tree;
uu_avl_walk_t *walk;
char *delim;
char deffields[] = "type,name,used,quota,objused,objquota";
char *ofield = NULL;
@ -3383,10 +3371,8 @@ zfs_do_userspace(int argc, char **argv)
us_cbdata_t cb;
us_node_t *node;
us_node_t *rmnode;
uu_list_pool_t *listpool;
uu_list_t *list;
uu_avl_index_t idx = 0;
uu_list_index_t idx2 = 0;
list_t list;
avl_index_t idx = 0;
if (argc < 2)
usage(B_FALSE);
@ -3520,12 +3506,6 @@ zfs_do_userspace(int argc, char **argv)
return (1);
}
if ((avl_pool = uu_avl_pool_create("us_avl_pool", sizeof (us_node_t),
offsetof(us_node_t, usn_avlnode), us_compare, UU_DEFAULT)) == NULL)
nomem();
if ((avl_tree = uu_avl_create(avl_pool, NULL, UU_DEFAULT)) == NULL)
nomem();
/* Always add default sorting columns */
(void) zfs_add_sort_column(&sortcol, "type", B_FALSE);
(void) zfs_add_sort_column(&sortcol, "name", B_FALSE);
@ -3533,10 +3513,12 @@ zfs_do_userspace(int argc, char **argv)
cb.cb_sortcol = sortcol;
cb.cb_numname = prtnum;
cb.cb_nicenum = !parsable;
cb.cb_avl_pool = avl_pool;
cb.cb_avl = avl_tree;
cb.cb_sid2posix = sid2posix;
avl_create(&cb.cb_avl, us_compare,
sizeof (us_node_t), offsetof(us_node_t, usn_avlnode));
for (i = 0; i < USFIELD_LAST; i++)
cb.cb_width[i] = strlen(gettext(us_field_hdr[i]));
@ -3551,59 +3533,52 @@ zfs_do_userspace(int argc, char **argv)
cb.cb_prop = p;
if ((ret = zfs_userspace(zhp, p, userspace_cb, &cb)) != 0) {
zfs_close(zhp);
avl_destroy(&cb.cb_avl);
return (ret);
}
}
zfs_close(zhp);
/* Sort the list */
if ((node = uu_avl_first(avl_tree)) == NULL)
if ((node = avl_first(&cb.cb_avl)) == NULL) {
avl_destroy(&cb.cb_avl);
return (0);
}
us_populated = B_TRUE;
listpool = uu_list_pool_create("tmplist", sizeof (us_node_t),
offsetof(us_node_t, usn_listnode), NULL, UU_DEFAULT);
list = uu_list_create(listpool, NULL, UU_DEFAULT);
uu_list_node_init(node, &node->usn_listnode, listpool);
list_create(&list, sizeof (us_node_t),
offsetof(us_node_t, usn_listnode));
list_link_init(&node->usn_listnode);
while (node != NULL) {
rmnode = node;
node = uu_avl_next(avl_tree, node);
uu_avl_remove(avl_tree, rmnode);
if (uu_list_find(list, rmnode, NULL, &idx2) == NULL)
uu_list_insert(list, rmnode, idx2);
node = AVL_NEXT(&cb.cb_avl, node);
avl_remove(&cb.cb_avl, rmnode);
list_insert_head(&list, rmnode);
}
for (node = uu_list_first(list); node != NULL;
node = uu_list_next(list, node)) {
us_sort_info_t sortinfo = { sortcol, cb.cb_numname };
if (uu_avl_find(avl_tree, node, &sortinfo, &idx) == NULL)
uu_avl_insert(avl_tree, node, idx);
for (node = list_head(&list); node != NULL;
node = list_next(&list, node)) {
if (avl_find(&cb.cb_avl, node, &idx) == NULL)
avl_insert(&cb.cb_avl, node, idx);
}
uu_list_destroy(list);
uu_list_pool_destroy(listpool);
while ((node = list_remove_head(&list)) != NULL) { }
list_destroy(&list);
/* Print and free node nvlist memory */
print_us(scripted, parsable, fields, types, cb.cb_width, B_TRUE,
cb.cb_avl);
&cb.cb_avl);
zfs_free_sort_columns(sortcol);
/* Clean up the AVL tree */
if ((walk = uu_avl_walk_start(cb.cb_avl, UU_WALK_ROBUST)) == NULL)
nomem();
while ((node = uu_avl_walk_next(walk)) != NULL) {
uu_avl_remove(cb.cb_avl, node);
void *cookie = NULL;
while ((node = avl_destroy_nodes(&cb.cb_avl, &cookie)) != NULL) {
free(node);
}
uu_avl_walk_end(walk);
uu_avl_destroy(avl_tree);
uu_avl_pool_destroy(avl_pool);
avl_destroy(&cb.cb_avl);
return (ret);
}
@@ -5409,7 +5384,7 @@ typedef struct deleg_perm {
typedef struct deleg_perm_node {
deleg_perm_t dpn_perm;
uu_avl_node_t dpn_avl_node;
avl_node_t dpn_avl_node;
} deleg_perm_node_t;
typedef struct fs_perm fs_perm_t;
@@ -5421,13 +5396,13 @@ typedef struct who_perm {
char who_ug_name[256]; /* user/group name */
fs_perm_t *who_fsperm; /* uplink */
uu_avl_t *who_deleg_perm_avl; /* permissions */
avl_tree_t who_deleg_perm_avl; /* permissions */
} who_perm_t;
/* */
typedef struct who_perm_node {
who_perm_t who_perm;
uu_avl_node_t who_avl_node;
avl_node_t who_avl_node;
} who_perm_node_t;
typedef struct fs_perm_set fs_perm_set_t;
@@ -5435,8 +5410,8 @@ typedef struct fs_perm_set fs_perm_set_t;
struct fs_perm {
const char *fsp_name;
uu_avl_t *fsp_sc_avl; /* sets,create */
uu_avl_t *fsp_uge_avl; /* user,group,everyone */
avl_tree_t fsp_sc_avl; /* sets,create */
avl_tree_t fsp_uge_avl; /* user,group,everyone */
fs_perm_set_t *fsp_set; /* uplink */
};
@@ -5444,19 +5419,14 @@ struct fs_perm {
/* */
typedef struct fs_perm_node {
fs_perm_t fspn_fsperm;
uu_avl_t *fspn_avl;
avl_tree_t fspn_avl;
uu_list_node_t fspn_list_node;
list_node_t fspn_list_node;
} fs_perm_node_t;
/* top level structure */
struct fs_perm_set {
uu_list_pool_t *fsps_list_pool;
uu_list_t *fsps_list; /* list of fs_perms */
uu_avl_pool_t *fsps_named_set_avl_pool;
uu_avl_pool_t *fsps_who_perm_avl_pool;
uu_avl_pool_t *fsps_deleg_perm_avl_pool;
list_t fsps_list; /* list of fs_perms */
};
static inline const char *
@@ -5519,9 +5489,8 @@ who_type2weight(zfs_deleg_who_type_t who_type)
}
static int
who_perm_compare(const void *larg, const void *rarg, void *unused)
who_perm_compare(const void *larg, const void *rarg)
{
(void) unused;
const who_perm_node_t *l = larg;
const who_perm_node_t *r = rarg;
zfs_deleg_who_type_t ltype = l->who_perm.who_type;
@@ -5532,63 +5501,24 @@ who_perm_compare(const void *larg, const void *rarg, void *unused)
if (res == 0)
res = strncmp(l->who_perm.who_name, r->who_perm.who_name,
ZFS_MAX_DELEG_NAME-1);
if (res == 0)
return (0);
if (res > 0)
return (1);
else
return (-1);
return (TREE_ISIGN(res));
}
static int
deleg_perm_compare(const void *larg, const void *rarg, void *unused)
deleg_perm_compare(const void *larg, const void *rarg)
{
(void) unused;
const deleg_perm_node_t *l = larg;
const deleg_perm_node_t *r = rarg;
int res = strncmp(l->dpn_perm.dp_name, r->dpn_perm.dp_name,
ZFS_MAX_DELEG_NAME-1);
if (res == 0)
return (0);
if (res > 0)
return (1);
else
return (-1);
return (TREE_ISIGN(strncmp(l->dpn_perm.dp_name, r->dpn_perm.dp_name,
ZFS_MAX_DELEG_NAME-1)));
}
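The rewritten comparators above pass the `strncmp()`/`strcmp()` result through `TREE_ISIGN()`, replacing the old explicit 0 / 1 / -1 ladder. A minimal standalone sketch of that normalization follows. `SIGN_OF` and `name_compare` are hypothetical names invented for this illustration; they are assumed to behave like the macro used in the diff, not copied from the tree.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a sign-normalizing macro: yields -1, 0, or 1. */
#define	SIGN_OF(a)	(((a) > 0) - ((a) < 0))

/*
 * Compare two names the way a tree comparator would. strcmp() may return
 * any negative or positive value, so collapse it to exactly -1, 0, or 1.
 */
int
name_compare(const void *larg, const void *rarg)
{
	const char *l = larg;
	const char *r = rarg;

	assert(l != NULL && r != NULL);
	return (SIGN_OF(strcmp(l, r)));
}
```

The branch-free form keeps the comparator to a single expression, which is presumably why the ladders could be dropped.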
static inline void
fs_perm_set_init(fs_perm_set_t *fspset)
{
memset(fspset, 0, sizeof (fs_perm_set_t));
if ((fspset->fsps_list_pool = uu_list_pool_create("fsps_list_pool",
sizeof (fs_perm_node_t), offsetof(fs_perm_node_t, fspn_list_node),
NULL, UU_DEFAULT)) == NULL)
nomem();
if ((fspset->fsps_list = uu_list_create(fspset->fsps_list_pool, NULL,
UU_DEFAULT)) == NULL)
nomem();
if ((fspset->fsps_named_set_avl_pool = uu_avl_pool_create(
"named_set_avl_pool", sizeof (who_perm_node_t), offsetof(
who_perm_node_t, who_avl_node), who_perm_compare,
UU_DEFAULT)) == NULL)
nomem();
if ((fspset->fsps_who_perm_avl_pool = uu_avl_pool_create(
"who_perm_avl_pool", sizeof (who_perm_node_t), offsetof(
who_perm_node_t, who_avl_node), who_perm_compare,
UU_DEFAULT)) == NULL)
nomem();
if ((fspset->fsps_deleg_perm_avl_pool = uu_avl_pool_create(
"deleg_perm_avl_pool", sizeof (deleg_perm_node_t), offsetof(
deleg_perm_node_t, dpn_avl_node), deleg_perm_compare, UU_DEFAULT))
== NULL)
nomem();
list_create(&fspset->fsps_list, sizeof (fs_perm_node_t),
offsetof(fs_perm_node_t, fspn_list_node));
}
static inline void fs_perm_fini(fs_perm_t *);
@@ -5597,21 +5527,13 @@ static inline void who_perm_fini(who_perm_t *);
static inline void
fs_perm_set_fini(fs_perm_set_t *fspset)
{
fs_perm_node_t *node = uu_list_first(fspset->fsps_list);
while (node != NULL) {
fs_perm_node_t *next_node =
uu_list_next(fspset->fsps_list, node);
fs_perm_node_t *node;
while ((node = list_remove_head(&fspset->fsps_list)) != NULL) {
fs_perm_t *fsperm = &node->fspn_fsperm;
fs_perm_fini(fsperm);
uu_list_remove(fspset->fsps_list, node);
free(node);
node = next_node;
}
uu_avl_pool_destroy(fspset->fsps_named_set_avl_pool);
uu_avl_pool_destroy(fspset->fsps_who_perm_avl_pool);
uu_avl_pool_destroy(fspset->fsps_deleg_perm_avl_pool);
list_destroy(&fspset->fsps_list);
}
static inline void
@@ -5626,14 +5548,11 @@ static inline void
who_perm_init(who_perm_t *who_perm, fs_perm_t *fsperm,
zfs_deleg_who_type_t type, const char *name)
{
uu_avl_pool_t *pool;
pool = fsperm->fsp_set->fsps_deleg_perm_avl_pool;
memset(who_perm, 0, sizeof (who_perm_t));
if ((who_perm->who_deleg_perm_avl = uu_avl_create(pool, NULL,
UU_DEFAULT)) == NULL)
nomem();
avl_create(&who_perm->who_deleg_perm_avl, deleg_perm_compare,
sizeof (deleg_perm_node_t),
offsetof(deleg_perm_node_t, dpn_avl_node));
who_perm->who_type = type;
who_perm->who_name = name;
@@ -5643,35 +5562,26 @@ who_perm_init(who_perm_t *who_perm, fs_perm_t *fsperm,
static inline void
who_perm_fini(who_perm_t *who_perm)
{
deleg_perm_node_t *node = uu_avl_first(who_perm->who_deleg_perm_avl);
deleg_perm_node_t *node;
void *cookie = NULL;
while (node != NULL) {
deleg_perm_node_t *next_node =
uu_avl_next(who_perm->who_deleg_perm_avl, node);
uu_avl_remove(who_perm->who_deleg_perm_avl, node);
while ((node = avl_destroy_nodes(&who_perm->who_deleg_perm_avl,
&cookie)) != NULL) {
free(node);
node = next_node;
}
uu_avl_destroy(who_perm->who_deleg_perm_avl);
avl_destroy(&who_perm->who_deleg_perm_avl);
}
static inline void
fs_perm_init(fs_perm_t *fsperm, fs_perm_set_t *fspset, const char *fsname)
{
uu_avl_pool_t *nset_pool = fspset->fsps_named_set_avl_pool;
uu_avl_pool_t *who_pool = fspset->fsps_who_perm_avl_pool;
memset(fsperm, 0, sizeof (fs_perm_t));
if ((fsperm->fsp_sc_avl = uu_avl_create(nset_pool, NULL, UU_DEFAULT))
== NULL)
nomem();
if ((fsperm->fsp_uge_avl = uu_avl_create(who_pool, NULL, UU_DEFAULT))
== NULL)
nomem();
avl_create(&fsperm->fsp_sc_avl, who_perm_compare,
sizeof (who_perm_node_t), offsetof(who_perm_node_t, who_avl_node));
avl_create(&fsperm->fsp_uge_avl, who_perm_compare,
sizeof (who_perm_node_t), offsetof(who_perm_node_t, who_avl_node));
fsperm->fsp_set = fspset;
fsperm->fsp_name = fsname;
@@ -5680,46 +5590,41 @@ fs_perm_init(fs_perm_t *fsperm, fs_perm_set_t *fspset, const char *fsname)
static inline void
fs_perm_fini(fs_perm_t *fsperm)
{
who_perm_node_t *node = uu_avl_first(fsperm->fsp_sc_avl);
while (node != NULL) {
who_perm_node_t *next_node = uu_avl_next(fsperm->fsp_sc_avl,
node);
who_perm_node_t *node;
void *cookie = NULL;
while ((node = avl_destroy_nodes(&fsperm->fsp_sc_avl,
&cookie)) != NULL) {
who_perm_t *who_perm = &node->who_perm;
who_perm_fini(who_perm);
uu_avl_remove(fsperm->fsp_sc_avl, node);
free(node);
node = next_node;
}
node = uu_avl_first(fsperm->fsp_uge_avl);
while (node != NULL) {
who_perm_node_t *next_node = uu_avl_next(fsperm->fsp_uge_avl,
node);
cookie = NULL;
while ((node = avl_destroy_nodes(&fsperm->fsp_uge_avl,
&cookie)) != NULL) {
who_perm_t *who_perm = &node->who_perm;
who_perm_fini(who_perm);
uu_avl_remove(fsperm->fsp_uge_avl, node);
free(node);
node = next_node;
}
uu_avl_destroy(fsperm->fsp_sc_avl);
uu_avl_destroy(fsperm->fsp_uge_avl);
avl_destroy(&fsperm->fsp_sc_avl);
avl_destroy(&fsperm->fsp_uge_avl);
}
static void
set_deleg_perm_node(uu_avl_t *avl, deleg_perm_node_t *node,
set_deleg_perm_node(avl_tree_t *avl, deleg_perm_node_t *node,
zfs_deleg_who_type_t who_type, const char *name, char locality)
{
uu_avl_index_t idx = 0;
avl_index_t idx = 0;
deleg_perm_node_t *found_node = NULL;
deleg_perm_t *deleg_perm = &node->dpn_perm;
deleg_perm_init(deleg_perm, who_type, name);
if ((found_node = uu_avl_find(avl, node, NULL, &idx))
== NULL)
uu_avl_insert(avl, node, idx);
if ((found_node = avl_find(avl, node, &idx)) == NULL)
avl_insert(avl, node, idx);
else {
node = found_node;
deleg_perm = &node->dpn_perm;
@@ -5744,20 +5649,17 @@ static inline int
parse_who_perm(who_perm_t *who_perm, nvlist_t *nvl, char locality)
{
nvpair_t *nvp = NULL;
fs_perm_set_t *fspset = who_perm->who_fsperm->fsp_set;
uu_avl_t *avl = who_perm->who_deleg_perm_avl;
avl_tree_t *avl = &who_perm->who_deleg_perm_avl;
zfs_deleg_who_type_t who_type = who_perm->who_type;
while ((nvp = nvlist_next_nvpair(nvl, nvp)) != NULL) {
const char *name = nvpair_name(nvp);
data_type_t type = nvpair_type(nvp);
uu_avl_pool_t *avl_pool = fspset->fsps_deleg_perm_avl_pool;
deleg_perm_node_t *node =
safe_malloc(sizeof (deleg_perm_node_t));
VERIFY(type == DATA_TYPE_BOOLEAN);
uu_avl_node_init(node, &node->dpn_avl_node, avl_pool);
set_deleg_perm_node(avl, node, who_type, name, locality);
}
@@ -5768,13 +5670,11 @@ static inline int
parse_fs_perm(fs_perm_t *fsperm, nvlist_t *nvl)
{
nvpair_t *nvp = NULL;
fs_perm_set_t *fspset = fsperm->fsp_set;
while ((nvp = nvlist_next_nvpair(nvl, nvp)) != NULL) {
nvlist_t *nvl2 = NULL;
const char *name = nvpair_name(nvp);
uu_avl_t *avl = NULL;
uu_avl_pool_t *avl_pool = NULL;
avl_tree_t *avl = NULL;
zfs_deleg_who_type_t perm_type = name[0];
char perm_locality = name[1];
const char *perm_name = name + 3;
@@ -5790,8 +5690,7 @@ parse_fs_perm(fs_perm_t *fsperm, nvlist_t *nvl)
case ZFS_DELEG_CREATE_SETS:
case ZFS_DELEG_NAMED_SET:
case ZFS_DELEG_NAMED_SET_SETS:
avl_pool = fspset->fsps_named_set_avl_pool;
avl = fsperm->fsp_sc_avl;
avl = &fsperm->fsp_sc_avl;
break;
case ZFS_DELEG_USER:
case ZFS_DELEG_USER_SETS:
@@ -5799,8 +5698,7 @@ parse_fs_perm(fs_perm_t *fsperm, nvlist_t *nvl)
case ZFS_DELEG_GROUP_SETS:
case ZFS_DELEG_EVERYONE:
case ZFS_DELEG_EVERYONE_SETS:
avl_pool = fspset->fsps_who_perm_avl_pool;
avl = fsperm->fsp_uge_avl;
avl = &fsperm->fsp_uge_avl;
break;
default:
@@ -5811,14 +5709,12 @@ parse_fs_perm(fs_perm_t *fsperm, nvlist_t *nvl)
who_perm_node_t *node = safe_malloc(
sizeof (who_perm_node_t));
who_perm = &node->who_perm;
uu_avl_index_t idx = 0;
avl_index_t idx = 0;
uu_avl_node_init(node, &node->who_avl_node, avl_pool);
who_perm_init(who_perm, fsperm, perm_type, perm_name);
if ((found_node = uu_avl_find(avl, node, NULL, &idx))
== NULL) {
if (avl == fsperm->fsp_uge_avl) {
if ((found_node = avl_find(avl, node, &idx)) == NULL) {
if (avl == &fsperm->fsp_uge_avl) {
uid_t rid = 0;
struct passwd *p = NULL;
struct group *g = NULL;
@@ -5857,7 +5753,7 @@ parse_fs_perm(fs_perm_t *fsperm, nvlist_t *nvl)
}
}
uu_avl_insert(avl, node, idx);
avl_insert(avl, node, idx);
} else {
node = found_node;
who_perm = &node->who_perm;
@@ -5874,7 +5770,6 @@ static inline int
parse_fs_perm_set(fs_perm_set_t *fspset, nvlist_t *nvl)
{
nvpair_t *nvp = NULL;
uu_avl_index_t idx = 0;
while ((nvp = nvlist_next_nvpair(nvl, nvp)) != NULL) {
nvlist_t *nvl2 = NULL;
@@ -5887,10 +5782,6 @@ parse_fs_perm_set(fs_perm_set_t *fspset, nvlist_t *nvl)
VERIFY(DATA_TYPE_NVLIST == type);
uu_list_node_init(node, &node->fspn_list_node,
fspset->fsps_list_pool);
idx = uu_list_numnodes(fspset->fsps_list);
fs_perm_init(fsperm, fspset, fsname);
if (nvpair_value_nvlist(nvp, &nvl2) != 0)
@@ -5898,7 +5789,7 @@ parse_fs_perm_set(fs_perm_set_t *fspset, nvlist_t *nvl)
(void) parse_fs_perm(fsperm, nvl2);
uu_list_insert(fspset->fsps_list, node, idx);
list_insert_tail(&fspset->fsps_list, node);
}
return (0);
@@ -6450,7 +6341,7 @@ construct_fsacl_list(boolean_t un, struct allow_opts *opts, nvlist_t **nvlp)
}
static void
print_set_creat_perms(uu_avl_t *who_avl)
print_set_creat_perms(avl_tree_t *who_avl)
{
const char *sc_title[] = {
gettext("Permission sets:\n"),
@@ -6460,9 +6351,9 @@ print_set_creat_perms(uu_avl_t *who_avl)
who_perm_node_t *who_node = NULL;
int prev_weight = -1;
for (who_node = uu_avl_first(who_avl); who_node != NULL;
who_node = uu_avl_next(who_avl, who_node)) {
uu_avl_t *avl = who_node->who_perm.who_deleg_perm_avl;
for (who_node = avl_first(who_avl); who_node != NULL;
who_node = AVL_NEXT(who_avl, who_node)) {
avl_tree_t *avl = &who_node->who_perm.who_deleg_perm_avl;
zfs_deleg_who_type_t who_type = who_node->who_perm.who_type;
const char *who_name = who_node->who_perm.who_name;
int weight = who_type2weight(who_type);
@@ -6479,8 +6370,8 @@ print_set_creat_perms(uu_avl_t *who_avl)
else
(void) printf("\t%s ", who_name);
for (deleg_node = uu_avl_first(avl); deleg_node != NULL;
deleg_node = uu_avl_next(avl, deleg_node)) {
for (deleg_node = avl_first(avl); deleg_node != NULL;
deleg_node = AVL_NEXT(avl, deleg_node)) {
if (first) {
(void) printf("%s",
deleg_node->dpn_perm.dp_name);
@@ -6495,28 +6386,24 @@ print_set_creat_perms(uu_avl_t *who_avl)
}
static void
print_uge_deleg_perms(uu_avl_t *who_avl, boolean_t local, boolean_t descend,
print_uge_deleg_perms(avl_tree_t *who_avl, boolean_t local, boolean_t descend,
const char *title)
{
who_perm_node_t *who_node = NULL;
boolean_t prt_title = B_TRUE;
uu_avl_walk_t *walk;
if ((walk = uu_avl_walk_start(who_avl, UU_WALK_ROBUST)) == NULL)
nomem();
while ((who_node = uu_avl_walk_next(walk)) != NULL) {
for (who_node = avl_first(who_avl); who_node != NULL;
who_node = AVL_NEXT(who_avl, who_node)) {
const char *who_name = who_node->who_perm.who_name;
const char *nice_who_name = who_node->who_perm.who_ug_name;
uu_avl_t *avl = who_node->who_perm.who_deleg_perm_avl;
avl_tree_t *avl = &who_node->who_perm.who_deleg_perm_avl;
zfs_deleg_who_type_t who_type = who_node->who_perm.who_type;
char delim = ' ';
deleg_perm_node_t *deleg_node;
boolean_t prt_who = B_TRUE;
for (deleg_node = uu_avl_first(avl);
deleg_node != NULL;
deleg_node = uu_avl_next(avl, deleg_node)) {
for (deleg_node = avl_first(avl); deleg_node != NULL;
deleg_node = AVL_NEXT(avl, deleg_node)) {
if (local != deleg_node->dpn_perm.dp_local ||
descend != deleg_node->dpn_perm.dp_descend)
continue;
@@ -6566,8 +6453,6 @@ print_uge_deleg_perms(uu_avl_t *who_avl, boolean_t local, boolean_t descend,
if (!prt_who)
(void) printf("\n");
}
uu_avl_walk_end(walk);
}
static void
@@ -6577,10 +6462,10 @@ print_fs_perms(fs_perm_set_t *fspset)
char buf[MAXNAMELEN + 32];
const char *dsname = buf;
for (node = uu_list_first(fspset->fsps_list); node != NULL;
node = uu_list_next(fspset->fsps_list, node)) {
uu_avl_t *sc_avl = node->fspn_fsperm.fsp_sc_avl;
uu_avl_t *uge_avl = node->fspn_fsperm.fsp_uge_avl;
for (node = list_head(&fspset->fsps_list); node != NULL;
node = list_next(&fspset->fsps_list, node)) {
avl_tree_t *sc_avl = &node->fspn_fsperm.fsp_sc_avl;
avl_tree_t *uge_avl = &node->fspn_fsperm.fsp_uge_avl;
int left = 0;
(void) snprintf(buf, sizeof (buf),
@@ -6602,7 +6487,7 @@ print_fs_perms(fs_perm_set_t *fspset)
}
}
static fs_perm_set_t fs_perm_set = { NULL, NULL, NULL, NULL };
static fs_perm_set_t fs_perm_set = {};
struct deleg_perms {
boolean_t un;
@@ -7462,15 +7347,14 @@ append_options(char *mntopts, char *newopts)
static enum sa_protocol
sa_protocol_decode(const char *protocol)
{
for (enum sa_protocol i = 0; i < ARRAY_SIZE(sa_protocol_names); ++i)
if (strcmp(protocol, sa_protocol_names[i]) == 0)
for (enum sa_protocol i = 0; i < SA_PROTOCOL_COUNT; ++i)
if (strcmp(protocol, zfs_share_protocol_name(i)) == 0)
return (i);
(void) fputs(gettext("share type must be one of: "), stderr);
for (enum sa_protocol i = 0;
i < ARRAY_SIZE(sa_protocol_names); ++i)
for (enum sa_protocol i = 0; i < SA_PROTOCOL_COUNT; ++i)
(void) fprintf(stderr, "%s%s",
i != 0 ? ", " : "", sa_protocol_names[i]);
i != 0 ? ", " : "", zfs_share_protocol_name(i));
(void) fputc('\n', stderr);
usage(B_FALSE);
}
@@ -7734,17 +7618,16 @@ zfs_do_share(int argc, char **argv)
typedef struct unshare_unmount_node {
zfs_handle_t *un_zhp;
char *un_mountp;
uu_avl_node_t un_avlnode;
avl_node_t un_avlnode;
} unshare_unmount_node_t;
static int
unshare_unmount_compare(const void *larg, const void *rarg, void *unused)
unshare_unmount_compare(const void *larg, const void *rarg)
{
(void) unused;
const unshare_unmount_node_t *l = larg;
const unshare_unmount_node_t *r = rarg;
return (strcmp(l->un_mountp, r->un_mountp));
return (TREE_ISIGN(strcmp(l->un_mountp, r->un_mountp)));
}
/*
@@ -7926,11 +7809,9 @@ unshare_unmount(int op, int argc, char **argv)
*/
FILE *mnttab;
struct mnttab entry;
uu_avl_pool_t *pool;
uu_avl_t *tree = NULL;
avl_tree_t tree;
unshare_unmount_node_t *node;
uu_avl_index_t idx;
uu_avl_walk_t *walk;
avl_index_t idx;
enum sa_protocol *protocol = NULL,
single_protocol[] = {SA_NO_PROTOCOL, SA_NO_PROTOCOL};
@@ -7946,16 +7827,12 @@ unshare_unmount(int op, int argc, char **argv)
usage(B_FALSE);
}
if (((pool = uu_avl_pool_create("unmount_pool",
avl_create(&tree, unshare_unmount_compare,
sizeof (unshare_unmount_node_t),
offsetof(unshare_unmount_node_t, un_avlnode),
unshare_unmount_compare, UU_DEFAULT)) == NULL) ||
((tree = uu_avl_create(pool, NULL, UU_DEFAULT)) == NULL))
nomem();
offsetof(unshare_unmount_node_t, un_avlnode));
if ((mnttab = fopen(MNTTAB, "re")) == NULL) {
uu_avl_destroy(tree);
uu_avl_pool_destroy(pool);
avl_destroy(&tree);
return (ENOENT);
}
@@ -8020,10 +7897,8 @@ unshare_unmount(int op, int argc, char **argv)
node->un_zhp = zhp;
node->un_mountp = safe_strdup(entry.mnt_mountp);
uu_avl_node_init(node, &node->un_avlnode, pool);
if (uu_avl_find(tree, node, NULL, &idx) == NULL) {
uu_avl_insert(tree, node, idx);
if (avl_find(&tree, node, &idx) == NULL) {
avl_insert(&tree, node, idx);
} else {
zfs_close(node->un_zhp);
free(node->un_mountp);
@@ -8036,14 +7911,10 @@ unshare_unmount(int op, int argc, char **argv)
* Walk the AVL tree in reverse, unmounting each filesystem and
* removing it from the AVL tree in the process.
*/
if ((walk = uu_avl_walk_start(tree,
UU_WALK_REVERSE | UU_WALK_ROBUST)) == NULL)
nomem();
while ((node = uu_avl_walk_next(walk)) != NULL) {
while ((node = avl_last(&tree)) != NULL) {
const char *mntarg = NULL;
uu_avl_remove(tree, node);
avl_remove(&tree, node);
switch (op) {
case OP_SHARE:
if (zfs_unshare(node->un_zhp,
@@ -8066,9 +7937,7 @@ unshare_unmount(int op, int argc, char **argv)
if (op == OP_SHARE)
zfs_commit_shares(protocol);
uu_avl_walk_end(walk);
uu_avl_destroy(tree);
uu_avl_pool_destroy(pool);
avl_destroy(&tree);
} else {
if (argc != 1) {
@@ -9211,11 +9080,17 @@ zfs_do_rewrite(int argc, char **argv)
zfs_rewrite_args_t args;
memset(&args, 0, sizeof (args));
while ((c = getopt(argc, argv, "Pl:o:rvx")) != -1) {
while ((c = getopt(argc, argv, "CPSl:o:rvx")) != -1) {
switch (c) {
case 'C':
args.flags |= ZFS_REWRITE_SKIP_BRT;
break;
case 'P':
args.flags |= ZFS_REWRITE_PHYSICAL;
break;
case 'S':
args.flags |= ZFS_REWRITE_SKIP_SNAPSHOT;
break;
case 'l':
args.len = strtoll(optarg, NULL, 0);
break;

@@ -56,6 +56,7 @@
#include <zfeature_common.h>
#include <libzutil.h>
#include <sys/metaslab_impl.h>
#include <libzpool.h>
static importargs_t g_importargs;
static char *g_pool;
@@ -744,8 +745,11 @@ zhack_do_metaslab_leak(int argc, char **argv)
&start, &size), ==, 2);
ASSERT(vd);
metaslab_t *cur =
vd->vdev_ms[start >> vd->vdev_ms_shift];
size_t idx;
idx = start >> vd->vdev_ms_shift;
if (idx >= vd->vdev_ms_count)
continue;
metaslab_t *cur = vd->vdev_ms[idx];
if (prev != cur) {
if (prev) {
dmu_tx_commit(tx);
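The hunk above stops indexing `vd->vdev_ms[]` directly with `start >> vd->vdev_ms_shift` and first rejects indices at or beyond `vd->vdev_ms_count`. A simplified sketch of that guard is shown here. The `sketch_vdev_t` type and `sketch_ms_lookup()` are hypothetical stand-ins with placeholder `int` entries; they are not the real vdev or metaslab structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a vdev's metaslab table. */
typedef struct sketch_vdev {
	uint64_t	sv_ms_shift;	/* log2 of the metaslab size */
	uint64_t	sv_ms_count;	/* number of entries in sv_ms */
	const int	*sv_ms;		/* one placeholder entry per metaslab */
} sketch_vdev_t;

/* Map a byte offset to its table entry; NULL if the offset is out of range. */
const int *
sketch_ms_lookup(const sketch_vdev_t *vd, uint64_t start)
{
	uint64_t idx = start >> vd->sv_ms_shift;

	if (idx >= vd->sv_ms_count)
		return (NULL);
	return (&vd->sv_ms[idx]);
}
```

Where the diff skips the record with `continue`, this sketch returns `NULL` so the caller can decide what to do with an out-of-range offset.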

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: CDDL-1.0
sbin_PROGRAMS += zinject
CPPCHECKTARGETS += zinject

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: CDDL-1.0
zpool_CFLAGS = $(AM_CFLAGS)
zpool_CFLAGS += $(LIBBLKID_CFLAGS) $(LIBUUID_CFLAGS)
@@ -28,7 +29,6 @@ zpool_LDADD = \
libzfs.la \
libzfs_core.la \
libnvpair.la \
libuutil.la \
libzutil.la
zpool_LDADD += $(LTLIBINTL)

@@ -30,12 +30,10 @@
*/
#include <libintl.h>
#include <libuutil.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <thread_pool.h>
#include <libzfs.h>
#include <libzutil.h>
@@ -52,30 +50,28 @@
typedef struct zpool_node {
zpool_handle_t *zn_handle;
uu_avl_node_t zn_avlnode;
avl_node_t zn_avlnode;
hrtime_t zn_last_refresh;
} zpool_node_t;
struct zpool_list {
boolean_t zl_findall;
boolean_t zl_literal;
uu_avl_t *zl_avl;
uu_avl_pool_t *zl_pool;
avl_tree_t zl_avl;
zprop_list_t **zl_proplist;
zfs_type_t zl_type;
hrtime_t zl_last_refresh;
};
static int
zpool_compare(const void *larg, const void *rarg, void *unused)
zpool_compare(const void *larg, const void *rarg)
{
(void) unused;
zpool_handle_t *l = ((zpool_node_t *)larg)->zn_handle;
zpool_handle_t *r = ((zpool_node_t *)rarg)->zn_handle;
const char *lname = zpool_get_name(l);
const char *rname = zpool_get_name(r);
return (strcmp(lname, rname));
return (TREE_ISIGN(strcmp(lname, rname)));
}
/*
@@ -86,12 +82,11 @@ static int
add_pool(zpool_handle_t *zhp, zpool_list_t *zlp)
{
zpool_node_t *node, *new = safe_malloc(sizeof (zpool_node_t));
uu_avl_index_t idx;
avl_index_t idx;
new->zn_handle = zhp;
uu_avl_node_init(new, &new->zn_avlnode, zlp->zl_pool);
node = uu_avl_find(zlp->zl_avl, new, NULL, &idx);
node = avl_find(&zlp->zl_avl, new, &idx);
if (node == NULL) {
if (zlp->zl_proplist &&
zpool_expand_proplist(zhp, zlp->zl_proplist,
@@ -101,7 +96,7 @@ add_pool(zpool_handle_t *zhp, zpool_list_t *zlp)
return (-1);
}
new->zn_last_refresh = zlp->zl_last_refresh;
uu_avl_insert(zlp->zl_avl, new, idx);
avl_insert(&zlp->zl_avl, new, idx);
} else {
zpool_refresh_stats_from_handle(node->zn_handle, zhp);
node->zn_last_refresh = zlp->zl_last_refresh;
@@ -139,15 +134,8 @@ pool_list_get(int argc, char **argv, zprop_list_t **proplist, zfs_type_t type,
zlp = safe_malloc(sizeof (zpool_list_t));
zlp->zl_pool = uu_avl_pool_create("zfs_pool", sizeof (zpool_node_t),
offsetof(zpool_node_t, zn_avlnode), zpool_compare, UU_DEFAULT);
if (zlp->zl_pool == NULL)
zpool_no_memory();
if ((zlp->zl_avl = uu_avl_create(zlp->zl_pool, NULL,
UU_DEFAULT)) == NULL)
zpool_no_memory();
avl_create(&zlp->zl_avl, zpool_compare,
sizeof (zpool_node_t), offsetof(zpool_node_t, zn_avlnode));
zlp->zl_proplist = proplist;
zlp->zl_type = type;
@@ -194,8 +182,8 @@ pool_list_refresh(zpool_list_t *zlp)
* state.
*/
int navail = 0;
for (zpool_node_t *node = uu_avl_first(zlp->zl_avl);
node != NULL; node = uu_avl_next(zlp->zl_avl, node)) {
for (zpool_node_t *node = avl_first(&zlp->zl_avl);
node != NULL; node = AVL_NEXT(&zlp->zl_avl, node)) {
boolean_t missing;
zpool_refresh_stats(node->zn_handle, &missing);
navail += !missing;
@@ -209,8 +197,8 @@ pool_list_refresh(zpool_list_t *zlp)
/* Walk the list of existing pools, and update or remove them. */
zpool_node_t *node, *next;
for (node = uu_avl_first(zlp->zl_avl); node != NULL; node = next) {
next = uu_avl_next(zlp->zl_avl, node);
for (node = avl_first(&zlp->zl_avl); node != NULL; node = next) {
next = AVL_NEXT(&zlp->zl_avl, node);
/*
* Skip any that were refreshed and are online; they were added
@@ -224,7 +212,7 @@ pool_list_refresh(zpool_list_t *zlp)
boolean_t missing;
zpool_refresh_stats(node->zn_handle, &missing);
if (missing) {
uu_avl_remove(zlp->zl_avl, node);
avl_remove(&zlp->zl_avl, node);
zpool_close(node->zn_handle);
free(node);
} else {
@@ -232,7 +220,7 @@ pool_list_refresh(zpool_list_t *zlp)
}
}
return (uu_avl_numnodes(zlp->zl_avl));
return (avl_numnodes(&zlp->zl_avl));
}
/*
@ -245,8 +233,8 @@ pool_list_iter(zpool_list_t *zlp, int unavail, zpool_iter_f func,
zpool_node_t *node, *next_node;
int ret = 0;
for (node = uu_avl_first(zlp->zl_avl); node != NULL; node = next_node) {
next_node = uu_avl_next(zlp->zl_avl, node);
for (node = avl_first(&zlp->zl_avl); node != NULL; node = next_node) {
next_node = AVL_NEXT(&zlp->zl_avl, node);
if (zpool_get_state(node->zn_handle) != POOL_STATE_UNAVAIL ||
unavail)
ret |= func(node->zn_handle, data);
@@ -261,25 +249,15 @@ pool_list_iter(zpool_list_t *zlp, int unavail, zpool_iter_f func,
void
pool_list_free(zpool_list_t *zlp)
{
uu_avl_walk_t *walk;
zpool_node_t *node;
void *cookie = NULL;
if ((walk = uu_avl_walk_start(zlp->zl_avl, UU_WALK_ROBUST)) == NULL) {
(void) fprintf(stderr,
gettext("internal error: out of memory"));
exit(1);
}
while ((node = uu_avl_walk_next(walk)) != NULL) {
uu_avl_remove(zlp->zl_avl, node);
while ((node = avl_destroy_nodes(&zlp->zl_avl, &cookie)) != NULL) {
zpool_close(node->zn_handle);
free(node);
}
uu_avl_walk_end(walk);
uu_avl_destroy(zlp->zl_avl);
uu_avl_pool_destroy(zlp->zl_pool);
avl_destroy(&zlp->zl_avl);
free(zlp);
}
@@ -289,7 +267,7 @@ pool_list_free(zpool_list_t *zlp)
int
pool_list_count(zpool_list_t *zlp)
{
return (uu_avl_numnodes(zlp->zl_avl));
return (avl_numnodes(&zlp->zl_avl));
}
/*
@ -674,21 +652,21 @@ all_pools_for_each_vdev_gather_cb(zpool_handle_t *zhp, void *cb_vcdl)
static void
all_pools_for_each_vdev_run_vcdl(vdev_cmd_data_list_t *vcdl)
{
tpool_t *t;
t = tpool_create(1, 5 * sysconf(_SC_NPROCESSORS_ONLN), 0, NULL);
if (t == NULL)
taskq_t *tq = taskq_create("vdev_run_cmd",
5 * sysconf(_SC_NPROCESSORS_ONLN), minclsyspri, 1, INT_MAX,
TASKQ_DYNAMIC);
if (tq == NULL)
return;
/* Spawn off the command for each vdev */
for (int i = 0; i < vcdl->count; i++) {
(void) tpool_dispatch(t, vdev_run_cmd_thread,
(void *) &vcdl->data[i]);
(void) taskq_dispatch(tq, vdev_run_cmd_thread,
(void *) &vcdl->data[i], TQ_SLEEP);
}
/* Wait for threads to finish */
tpool_wait(t);
tpool_destroy(t);
taskq_wait(tq);
taskq_destroy(tq);
}
/*

@@ -46,14 +46,12 @@
#include <inttypes.h>
#include <libgen.h>
#include <libintl.h>
#include <libuutil.h>
#include <locale.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <termios.h>
#include <thread_pool.h>
#include <time.h>
#include <unistd.h>
#include <pwd.h>
@@ -2390,7 +2388,7 @@ zpool_do_destroy(int argc, char **argv)
}
typedef struct export_cbdata {
tpool_t *tpool;
taskq_t *taskq;
pthread_mutex_t mnttab_lock;
boolean_t force;
boolean_t hardforce;
@@ -2415,12 +2413,12 @@ zpool_export_one(zpool_handle_t *zhp, void *data)
* zpool_disable_datasets() is not thread-safe for mnttab access.
* So we serialize access here for 'zpool export -a' parallel case.
*/
if (cb->tpool != NULL)
if (cb->taskq != NULL)
(void) pthread_mutex_lock(&cb->mnttab_lock);
int retval = zpool_disable_datasets(zhp, cb->force);
if (cb->tpool != NULL)
if (cb->taskq != NULL)
(void) pthread_mutex_unlock(&cb->mnttab_lock);
if (retval)
@@ -2464,7 +2462,7 @@ zpool_export_task(void *arg)
static int
zpool_export_one_async(zpool_handle_t *zhp, void *data)
{
tpool_t *tpool = ((export_cbdata_t *)data)->tpool;
taskq_t *tq = ((export_cbdata_t *)data)->taskq;
async_export_args_t *aea = safe_malloc(sizeof (async_export_args_t));
/* save pool name since zhp will go out of scope */
@@ -2472,7 +2470,8 @@ zpool_export_one_async(zpool_handle_t *zhp, void *data)
aea->aea_cbdata = data;
/* ship off actual export to another thread */
if (tpool_dispatch(tpool, zpool_export_task, (void *)aea) != 0)
if (taskq_dispatch(tq, zpool_export_task, (void *)aea,
TQ_SLEEP) == TASKQID_INVALID)
return (errno); /* unlikely */
else
return (0);
@@ -2518,7 +2517,7 @@ zpool_do_export(int argc, char **argv)
cb.force = force;
cb.hardforce = hardforce;
cb.tpool = NULL;
cb.taskq = NULL;
cb.retval = 0;
argc -= optind;
argv += optind;
@@ -2532,16 +2531,17 @@ zpool_do_export(int argc, char **argv)
usage(B_FALSE);
}
cb.tpool = tpool_create(1, 5 * sysconf(_SC_NPROCESSORS_ONLN),
0, NULL);
cb.taskq = taskq_create("zpool_export",
5 * sysconf(_SC_NPROCESSORS_ONLN), minclsyspri, 1, INT_MAX,
TASKQ_DYNAMIC);
(void) pthread_mutex_init(&cb.mnttab_lock, NULL);
/* Asynchronously call zpool_export_one using thread pool */
ret = for_each_pool(argc, argv, B_TRUE, NULL, ZFS_TYPE_POOL,
B_FALSE, zpool_export_one_async, &cb);
tpool_wait(cb.tpool);
tpool_destroy(cb.tpool);
taskq_wait(cb.taskq);
taskq_destroy(cb.taskq);
(void) pthread_mutex_destroy(&cb.mnttab_lock);
return (ret | cb.retval);
@@ -3456,7 +3456,7 @@ show_import(nvlist_t *config, boolean_t report_error)
case ZPOOL_STATUS_CORRUPT_POOL:
(void) printf_color(ANSI_YELLOW, gettext("The pool metadata is "
"corrupted.\n"));
"incomplete or corrupted.\n"));
break;
case ZPOOL_STATUS_VERSION_OLDER:
@@ -3704,6 +3704,12 @@ show_import(nvlist_t *config, boolean_t report_error)
(void) printf(gettext("Set a unique system hostid with "
"the zgenhostid(8) command.\n"));
break;
case ZPOOL_STATUS_CORRUPT_POOL:
(void) printf(gettext("The pool cannot be imported due "
"to missing or damaged devices. Ensure\n"
"\t%sall devices are present and not in use by "
"another subsystem.\n"), indent);
break;
default:
(void) printf(gettext("The pool cannot be imported due "
"to damaged devices or data.\n"));
@@ -3949,10 +3955,11 @@ import_pools(nvlist_t *pools, nvlist_t *props, char *mntopts, int flags,
uint_t npools = 0;
tpool_t *tp = NULL;
taskq_t *tq = NULL;
if (import->do_all) {
tp = tpool_create(1, 5 * sysconf(_SC_NPROCESSORS_ONLN),
0, NULL);
tq = taskq_create("zpool_import_all",
5 * sysconf(_SC_NPROCESSORS_ONLN), minclsyspri, 1, INT_MAX,
TASKQ_DYNAMIC);
}
/*
@@ -4001,8 +4008,8 @@ import_pools(nvlist_t *pools, nvlist_t *props, char *mntopts, int flags,
ip->ip_mntthreads = mount_tp_nthr / npools;
ip->ip_err = &err;
(void) tpool_dispatch(tp, do_import_task,
(void *)ip);
(void) taskq_dispatch(tq, do_import_task,
(void *)ip, TQ_SLEEP);
} else {
/*
* If we're importing from cachefile, then
@@ -4051,8 +4058,8 @@ import_pools(nvlist_t *pools, nvlist_t *props, char *mntopts, int flags,
}
}
if (import->do_all) {
tpool_wait(tp);
tpool_destroy(tp);
taskq_wait(tq);
taskq_destroy(tq);
}
/*
@@ -6953,7 +6960,19 @@ collect_vdev_prop(zpool_prop_t prop, uint64_t value, const char *str,
switch (prop) {
case ZPOOL_PROP_SIZE:
case ZPOOL_PROP_NORMAL_SIZE:
case ZPOOL_PROP_SPECIAL_SIZE:
case ZPOOL_PROP_DEDUP_SIZE:
case ZPOOL_PROP_LOG_SIZE:
case ZPOOL_PROP_ELOG_SIZE:
case ZPOOL_PROP_SELOG_SIZE:
case ZPOOL_PROP_EXPANDSZ:
case ZPOOL_PROP_NORMAL_EXPANDSZ:
case ZPOOL_PROP_SPECIAL_EXPANDSZ:
case ZPOOL_PROP_DEDUP_EXPANDSZ:
case ZPOOL_PROP_LOG_EXPANDSZ:
case ZPOOL_PROP_ELOG_EXPANDSZ:
case ZPOOL_PROP_SELOG_EXPANDSZ:
case ZPOOL_PROP_CHECKPOINT:
case ZPOOL_PROP_DEDUPRATIO:
case ZPOOL_PROP_DEDUPCACHED:
@@ -6964,6 +6983,12 @@ collect_vdev_prop(zpool_prop_t prop, uint64_t value, const char *str,
format);
break;
case ZPOOL_PROP_FRAGMENTATION:
case ZPOOL_PROP_NORMAL_FRAGMENTATION:
case ZPOOL_PROP_SPECIAL_FRAGMENTATION:
case ZPOOL_PROP_DEDUP_FRAGMENTATION:
case ZPOOL_PROP_LOG_FRAGMENTATION:
case ZPOOL_PROP_ELOG_FRAGMENTATION:
case ZPOOL_PROP_SELOG_FRAGMENTATION:
if (value == ZFS_FRAG_INVALID) {
(void) strlcpy(propval, "-", sizeof (propval));
} else if (format == ZFS_NICENUM_RAW) {
@@ -6975,6 +7000,12 @@ collect_vdev_prop(zpool_prop_t prop, uint64_t value, const char *str,
}
break;
case ZPOOL_PROP_CAPACITY:
case ZPOOL_PROP_NORMAL_CAPACITY:
case ZPOOL_PROP_SPECIAL_CAPACITY:
case ZPOOL_PROP_DEDUP_CAPACITY:
case ZPOOL_PROP_LOG_CAPACITY:
case ZPOOL_PROP_ELOG_CAPACITY:
case ZPOOL_PROP_SELOG_CAPACITY:
/* capacity value is in parts-per-10,000 (aka permyriad) */
if (format == ZFS_NICENUM_RAW)
(void) snprintf(propval, sizeof (propval), "%llu",
@@ -10615,7 +10646,8 @@ print_status_reason(zpool_handle_t *zhp, status_cbdata_t *cbp,
case ZPOOL_STATUS_CORRUPT_POOL:
(void) snprintf(status, ST_SIZE, gettext("The pool metadata is "
"corrupted and the pool cannot be opened.\n"));
"incomplete or corrupted and the pool cannot be "
"opened.\n"));
zpool_explain_recover(zpool_get_handle(zhp),
zpool_get_name(zhp), reason, zpool_get_config(zhp, NULL),
action, AC_SIZE);

@@ -114,29 +114,3 @@ array64_max(uint64_t array[], unsigned int len)
return (max);
}
/*
* Find highest one bit set.
* Returns bit number + 1 of highest bit that is set, otherwise returns 0.
*/
int
highbit64(uint64_t i)
{
if (i == 0)
return (0);
return (NBBY * sizeof (uint64_t) - __builtin_clzll(i));
}
/*
* Find lowest one bit set.
* Returns bit number + 1 of lowest bit that is set, otherwise returns 0.
*/
int
lowbit64(uint64_t i)
{
if (i == 0)
return (0);
return (__builtin_ffsll(i));
}
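For reference, the two helpers removed in this hunk can be exercised on their own. The sketch below restates them under hypothetical `_sketch` names, with `CHAR_BIT` standing in for `NBBY`, and assumes a GCC- or Clang-compatible compiler for the `__builtin_*` calls.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Returns bit number + 1 of the highest set bit, or 0 when no bit is set. */
int
highbit64_sketch(uint64_t i)
{
	if (i == 0)
		return (0);
	return ((int)(CHAR_BIT * sizeof (uint64_t)) - __builtin_clzll(i));
}

/* Returns bit number + 1 of the lowest set bit, or 0 when no bit is set. */
int
lowbit64_sketch(uint64_t i)
{
	if (i == 0)
		return (0);
	return (__builtin_ffsll((long long)i));
}
```

For an input of 12 (binary 1100), the highest set bit is bit 3 and the lowest is bit 2, so the functions return 4 and 3 respectively. The explicit zero check in `highbit64_sketch()` matters because `__builtin_clzll(0)` is undefined.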

@@ -45,8 +45,6 @@ void *safe_realloc(void *, size_t);
void zpool_no_memory(void);
uint_t num_logs(nvlist_t *nv);
uint64_t array64_max(uint64_t array[], unsigned int len);
int highbit64(uint64_t i);
int lowbit64(uint64_t i);
/*
* Misc utility functions

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: CDDL-1.0
zfsexec_PROGRAMS += zpool_influxdb
CPPCHECKTARGETS += zpool_influxdb

@@ -1,3 +1,4 @@
# SPDX-License-Identifier: CDDL-1.0
zstream_CPPFLAGS = $(AM_CPPFLAGS) $(LIBZPOOL_CPPFLAGS)
sbin_PROGRAMS += zstream

@ -191,9 +191,9 @@ zfs_redup_stream(int infd, int outfd, boolean_t verbose)
#ifdef _ILP32
uint64_t max_rde_size = SMALLEST_POSSIBLE_MAX_RDT_MB << 20;
#else
-uint64_t physmem = sysconf(_SC_PHYS_PAGES) * sysconf(_SC_PAGESIZE);
+uint64_t physbytes = sysconf(_SC_PHYS_PAGES) * sysconf(_SC_PAGESIZE);
uint64_t max_rde_size =
-MAX((physmem * MAX_RDT_PHYSMEM_PERCENT) / 100,
+MAX((physbytes * MAX_RDT_PHYSMEM_PERCENT) / 100,
SMALLEST_POSSIBLE_MAX_RDT_MB << 20);
#endif
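The sizing rule in this hunk is "a percentage of physical memory, but never less than a floor given in MiB". The sketch below isolates that arithmetic. `max_table_bytes` and its parameter names are hypothetical, and the percentage and floor are passed as arguments because the values of `MAX_RDT_PHYSMEM_PERCENT` and `SMALLEST_POSSIBLE_MAX_RDT_MB` are not shown in this diff.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative only: mirror the MAX((physbytes * percent) / 100,
 * floor_mb << 20) expression from the hunk above.
 */
static uint64_t
max_table_bytes(uint64_t physbytes, uint64_t percent, uint64_t floor_mb)
{
	assert(percent <= 100);

	uint64_t share = (physbytes * percent) / 100;
	uint64_t floor_bytes = floor_mb << 20;	/* MiB to bytes */

	return (share > floor_bytes ? share : floor_bytes);
}
```

With made-up inputs of 1 GiB of memory, 20 percent and a 128 MiB floor, the share (214,748,364 bytes) wins; with 100 MiB of memory the floor (134,217,728 bytes) wins. The multiplication happens before the division, as in the original expression, so an extremely large `physbytes` could in principle overflow.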

View File

@ -139,9 +139,10 @@
#include <sys/crypto/icp.h>
#include <sys/zfs_impl.h>
#include <sys/backtrace.h>
#include <libzpool.h>
#include <libspl.h>
static int ztest_fd_data = -1;
static int ztest_fd_rand = -1;
typedef struct ztest_shared_hdr {
uint64_t zh_hdr_size;
@ -902,13 +903,10 @@ ztest_random(uint64_t range)
{
uint64_t r;
ASSERT3S(ztest_fd_rand, >=, 0);
if (range == 0)
return (0);
if (read(ztest_fd_rand, &r, sizeof (r)) != sizeof (r))
fatal(B_TRUE, "short read from /dev/urandom");
random_get_pseudo_bytes((uint8_t *)&r, sizeof (r));
return (r % range);
}
@ -8150,10 +8148,8 @@ ztest_raidz_expand_run(ztest_shared_t *zs, spa_t *spa)
/* Setup a 1 MiB buffer of random data */
uint64_t bufsize = 1024 * 1024;
void *buffer = umem_alloc(bufsize, UMEM_NOFAIL);
random_get_pseudo_bytes((uint8_t *)buffer, bufsize);
if (read(ztest_fd_rand, buffer, bufsize) != bufsize) {
fatal(B_TRUE, "short read from /dev/urandom");
}
/*
* Put some data in the pool and then attach a vdev to initiate
* reflow.
@ -8959,13 +8955,13 @@ main(int argc, char **argv)
exit(EXIT_FAILURE);
}
libspl_init();
/*
* Force random_get_bytes() to use /dev/urandom in order to prevent
* ztest from needlessly depleting the system entropy pool.
*/
random_path = "/dev/urandom";
ztest_fd_rand = open(random_path, O_RDONLY | O_CLOEXEC);
ASSERT3S(ztest_fd_rand, >=, 0);
random_force_pseudo(B_TRUE);
if (!fd_data_str) {
process_options(argc, argv);
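The `ztest_random()` hunk above reduces a raw 64-bit random value into `[0, range)` with a modulo and treats a zero range as a special case. A self-contained sketch of that pattern follows. `read_urandom` and `reduce_to_range` are hypothetical names, and reading `/dev/urandom` through stdio is only a stand-in for whichever random source the caller has; it is not the libspl API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Fill buf with len bytes from /dev/urandom; 0 on success, -1 on failure. */
static int
read_urandom(void *buf, size_t len)
{
	assert(buf != NULL);

	FILE *fp = fopen("/dev/urandom", "rb");
	if (fp == NULL)
		return (-1);
	size_t got = fread(buf, 1, len, fp);
	(void) fclose(fp);
	return (got == len ? 0 : -1);	/* a short read counts as failure */
}

/* Map a raw 64-bit value into [0, range); a zero range yields 0. */
static uint64_t
reduce_to_range(uint64_t r, uint64_t range)
{
	if (range == 0)
		return (0);	/* avoid dividing by zero */
	return (r % range);	/* modulo can introduce a slight bias */
}
```

The modulo reduction is simple and cheap, which suits a stress tool; code that needs a uniform distribution would typically use rejection sampling instead.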

View File

@ -1,3 +1,4 @@
# SPDX-License-Identifier: CDDL-1.0
#
# cppcheck for userspace nodist_*_SOURCES are kernel code and cppcheck goes crazy on them.
#

View File

@ -1,3 +1,4 @@
# SPDX-License-Identifier: CDDL-1.0
#
# Default build rules for all user space components, every Makefile.am
# should include these rules and override or extend them as needed.
@ -8,9 +9,9 @@ AM_CPPFLAGS = \
-include $(top_builddir)/zfs_config.h \
-I$(top_builddir)/include \
-I$(top_srcdir)/include \
-I$(top_srcdir)/module/icp/include \
-I$(top_srcdir)/lib/libspl/include \
-I$(top_srcdir)/lib/libspl/include/os/@ac_system_l@
-I$(top_srcdir)/lib/libspl/include/os/@ac_system_l@ \
-I$(top_srcdir)/lib/libzpool/include
AM_LIBTOOLFLAGS = --silent

View File

@ -1,3 +1,4 @@
# SPDX-License-Identifier: CDDL-1.0
# Global ShellCheck exclusions:
#
# ShellCheck can't follow non-constant source. Use a directive to specify location. [SC1090]

View File

@ -1,3 +1,4 @@
# SPDX-License-Identifier: CDDL-1.0
subst_sed_cmd = \
-e 's|@abs_top_srcdir[@]|$(abs_top_srcdir)|g' \
-e 's|@bindir[@]|$(bindir)|g' \

View File

@ -1,3 +1,4 @@
dnl # SPDX-License-Identifier: CDDL-1.0
dnl #
dnl # Set the target cpu architecture. This allows the
dnl # following syntax to be used in a Makefile.am.

View File

@ -1,3 +1,4 @@
dnl # SPDX-License-Identifier: CDDL-1.0
dnl #
dnl # Enabled -fsanitize=address if supported by $CC.
dnl #

View File

@ -1,3 +1,4 @@
dnl # SPDX-License-Identifier: CDDL-1.0
dnl #
dnl # Check if cppcheck is available.
dnl #

View File

@ -1,3 +1,4 @@
dnl # SPDX-License-Identifier: CDDL-1.0
dnl #
dnl # Check if GNU parallel is available.
dnl #

View File

@ -1,3 +1,4 @@
dnl # SPDX-License-Identifier: CDDL-1.0
dnl #
dnl # The majority of the python scripts are written to be compatible
dnl # with Python 3.6. This option is intended to

View File

@ -1,3 +1,4 @@
dnl # SPDX-License-Identifier: CDDL-1.0
dnl #
dnl # ZFS_AC_PYTHON_MODULE(module_name, [action-if-true], [action-if-false])
dnl #

View File

@ -1,3 +1,4 @@
dnl # SPDX-License-Identifier: CDDL-1.0
dnl #
dnl # Set the flags used for sed in-place edits.
dnl #

View File

@ -1,3 +1,4 @@
dnl # SPDX-License-Identifier: CDDL-1.0
dnl #
dnl # Check if shellcheck and/or checkbashisms are available.
dnl #

View File

@ -1,3 +1,4 @@
dnl # SPDX-License-Identifier: CDDL-1.0
dnl #
dnl # Set the target system
dnl #

View File

@ -1,3 +1,4 @@
# SPDX-License-Identifier: FSFAP
# ===========================================================================
# https://www.gnu.org/software/autoconf-archive/ax_compare_version.html
# ===========================================================================

View File

@ -1,3 +1,4 @@
# SPDX-License-Identifier: FSFAP
# ===========================================================================
# https://www.gnu.org/software/autoconf-archive/ax_count_cpus.html
# ===========================================================================

View File

@ -1,3 +1,4 @@
# SPDX-License-Identifier: GPL-3.0-or-later WITH Autoconf-exception-macro
# ===========================================================================
# https://www.gnu.org/software/autoconf-archive/ax_python_devel.html
# ===========================================================================

View File

@ -1,3 +1,4 @@
# SPDX-License-Identifier: FSFAP
# ===========================================================================
# http://www.gnu.org/software/autoconf-archive/ax_restore_flags.html
# ===========================================================================

View File

@ -1,3 +1,4 @@
# SPDX-License-Identifier: FSFAP
# ===========================================================================
# http://www.gnu.org/software/autoconf-archive/ax_save_flags.html
# ===========================================================================

View File

@ -1,3 +1,4 @@
# SPDX-License-Identifier: CDDL-1.0
PHONY += deb-kmod deb-dkms deb-utils deb deb-local native-deb-local \
native-deb-utils native-deb-kmod native-deb
@ -57,22 +58,21 @@ deb-utils: deb-local rpm-utils-initramfs
debarch=`$(DPKG) --print-architecture`; \
pkg1=$${name}-$${version}.$${arch}.rpm; \
pkg2=libnvpair3-$${version}.$${arch}.rpm; \
-pkg3=libuutil3-$${version}.$${arch}.rpm; \
-pkg4=libzfs7-$${version}.$${arch}.rpm; \
-pkg5=libzpool7-$${version}.$${arch}.rpm; \
-pkg6=libzfs7-devel-$${version}.$${arch}.rpm; \
-pkg7=$${name}-test-$${version}.$${arch}.rpm; \
-pkg8=$${name}-dracut-$${version}.noarch.rpm; \
-pkg9=$${name}-initramfs-$${version}.$${arch}.rpm; \
-pkg10=`ls python3-pyzfs-$${version}.noarch.rpm 2>/dev/null`; \
-pkg11=`ls pam_zfs_key-$${version}.$${arch}.rpm 2>/dev/null`; \
+pkg3=libzfs7-$${version}.$${arch}.rpm; \
+pkg4=libzpool7-$${version}.$${arch}.rpm; \
+pkg5=libzfs7-devel-$${version}.$${arch}.rpm; \
+pkg6=$${name}-test-$${version}.$${arch}.rpm; \
+pkg7=$${name}-dracut-$${version}.noarch.rpm; \
+pkg8=$${name}-initramfs-$${version}.$${arch}.rpm; \
+pkg9=`ls python3-pyzfs-$${version}.noarch.rpm 2>/dev/null`; \
+pkg10=`ls pam_zfs_key-$${version}.$${arch}.rpm 2>/dev/null`; \
## Arguments need to be passed to dh_shlibdeps. Alien provides no mechanism
## to do this, so we install a shim onto the path which calls the real
## dh_shlibdeps with the required arguments.
path_prepend=`mktemp -d /tmp/intercept.XXXXXX`; \
echo "#!$(SHELL)" > $${path_prepend}/dh_shlibdeps; \
echo "`which dh_shlibdeps` -- \
-    -xlibuutil3linux -xlibnvpair3linux -xlibzfs7linux -xlibzpool7linux" \
+    -xlibnvpair3linux -xlibzfs7linux -xlibzpool7linux" \
>> $${path_prepend}/dh_shlibdeps; \
## These -x arguments are passed to dpkg-shlibdeps, which exclude the
## Debianized packages from the auto-generated dependencies of the new debs,

View File

@ -1,3 +1,4 @@
dnl # SPDX-License-Identifier: CDDL-1.0
# find_system_lib.m4 - Macros to search for a system library. -*- Autoconf -*-
dnl requires pkg.m4 from pkg-config

View File

@ -1,3 +1,4 @@
# SPDX-License-Identifier: FSFULLR
# gettext.m4 serial 70 (gettext-0.20)
dnl Copyright (C) 1995-2014, 2016, 2018 Free Software Foundation, Inc.
dnl This file is free software; the Free Software Foundation

View File

@ -1,3 +1,4 @@
# SPDX-License-Identifier: FSFULLR
# host-cpu-c-abi.m4 serial 11
dnl Copyright (C) 2002-2019 Free Software Foundation, Inc.
dnl This file is free software; the Free Software Foundation

View File

@ -1,3 +1,4 @@
# SPDX-License-Identifier: FSFULLR
# iconv.m4 serial 21
dnl Copyright (C) 2000-2002, 2007-2014, 2016-2019 Free Software Foundation,
dnl Inc.

View File

@ -1,3 +1,4 @@
dnl # SPDX-License-Identifier: CDDL-1.0
dnl #
dnl # Linux 5.0: access_ok() drops 'type' parameter:
dnl #

View File

@ -1,3 +1,4 @@
dnl # SPDX-License-Identifier: CDDL-1.0
dnl #
dnl # 3.1 API change,
dnl # posix_acl_equiv_mode now wants an umode_t instead of a mode_t
@ -21,6 +22,35 @@ AC_DEFUN([ZFS_AC_KERNEL_POSIX_ACL_EQUIV_MODE_WANTS_UMODE_T], [
])
])
dnl #
dnl # 7.0 API change
dnl # posix_acl_to_xattr() now allocates and returns the value.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_POSIX_ACL_TO_XATTR_ALLOC], [
ZFS_LINUX_TEST_SRC([posix_acl_to_xattr_alloc], [
#include <linux/fs.h>
#include <linux/posix_acl_xattr.h>
], [
struct user_namespace *ns = NULL;
struct posix_acl *acl = NULL;
size_t size = 0;
gfp_t gfp = 0;
void *xattr = NULL;
xattr = posix_acl_to_xattr(ns, acl, &size, gfp);
])
])
AC_DEFUN([ZFS_AC_KERNEL_POSIX_ACL_TO_XATTR_ALLOC], [
AC_MSG_CHECKING([whether posix_acl_to_xattr() allocates its result]);
ZFS_LINUX_TEST_RESULT([posix_acl_to_xattr_alloc], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_POSIX_ACL_TO_XATTR_ALLOC, 1,
[posix_acl_to_xattr() allocates its result])
], [
AC_MSG_RESULT(no)
])
])
dnl #
dnl # 3.1 API change,
dnl # Check if inode_operations contains the function get_acl
@ -173,12 +203,14 @@ AC_DEFUN([ZFS_AC_KERNEL_INODE_OPERATIONS_SET_ACL], [
AC_DEFUN([ZFS_AC_KERNEL_SRC_ACL], [
ZFS_AC_KERNEL_SRC_POSIX_ACL_EQUIV_MODE_WANTS_UMODE_T
ZFS_AC_KERNEL_SRC_POSIX_ACL_TO_XATTR_ALLOC
ZFS_AC_KERNEL_SRC_INODE_OPERATIONS_GET_ACL
ZFS_AC_KERNEL_SRC_INODE_OPERATIONS_SET_ACL
])
AC_DEFUN([ZFS_AC_KERNEL_ACL], [
ZFS_AC_KERNEL_POSIX_ACL_EQUIV_MODE_WANTS_UMODE_T
ZFS_AC_KERNEL_POSIX_ACL_TO_XATTR_ALLOC
ZFS_AC_KERNEL_INODE_OPERATIONS_GET_ACL
ZFS_AC_KERNEL_INODE_OPERATIONS_SET_ACL
])

View File

@ -1,3 +1,4 @@
dnl # SPDX-License-Identifier: CDDL-1.0
dnl #
dnl # 5.16 API change
dnl # add_disk grew a must-check return code

View File

@ -1,3 +1,4 @@
dnl # SPDX-License-Identifier: CDDL-1.0
dnl #
dnl # 6.10 kernel, check number of args of __assign_str() for trace:
dnl

View File

@ -1,3 +1,4 @@
dnl # SPDX-License-Identifier: CDDL-1.0
dnl #
dnl # 2.6.37 API change
dnl # The dops->d_automount() dentry operation was added as a clean

View File

@ -1,3 +1,4 @@
dnl # SPDX-License-Identifier: CDDL-1.0
dnl #
dnl # Linux 4.8 API,
dnl #

View File

@ -1,3 +1,4 @@
dnl # SPDX-License-Identifier: CDDL-1.0
dnl #
dnl # 5.12 API change removes BIO_MAX_PAGES in favor of bio_max_segs()
dnl # which will handle the logic of setting the upper-bound to a

View File

@ -1,3 +1,4 @@
dnl # SPDX-License-Identifier: CDDL-1.0
dnl #
dnl # 2.6.39 API change,
dnl # blk_start_plug() and blk_finish_plug()
@ -225,6 +226,30 @@ AC_DEFUN([ZFS_AC_KERNEL_BLK_QUEUE_MAX_HW_SECTORS], [
])
])
dnl #
dnl # 7.0 API change
dnl # blk_queue_rot() replaces blk_queue_nonrot() (inverted meaning)
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_BLK_QUEUE_ROT], [
ZFS_LINUX_TEST_SRC([blk_queue_rot], [
#include <linux/blkdev.h>
], [
struct request_queue *q __attribute__ ((unused)) = NULL;
(void) blk_queue_rot(q);
], [])
])
AC_DEFUN([ZFS_AC_KERNEL_BLK_QUEUE_ROT], [
AC_MSG_CHECKING([whether blk_queue_rot() is available])
ZFS_LINUX_TEST_RESULT([blk_queue_rot], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_BLK_QUEUE_ROT, 1,
[blk_queue_rot() is available])
],[
AC_MSG_RESULT(no)
])
])
dnl #
dnl # 2.6.34 API change
dnl # blk_queue_max_segments() consolidates blk_queue_max_hw_segments()
@ -278,6 +303,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_BLK_QUEUE], [
ZFS_AC_KERNEL_SRC_BLK_QUEUE_SECURE_ERASE
ZFS_AC_KERNEL_SRC_BLK_QUEUE_MAX_HW_SECTORS
ZFS_AC_KERNEL_SRC_BLK_QUEUE_MAX_SEGMENTS
ZFS_AC_KERNEL_SRC_BLK_QUEUE_ROT
ZFS_AC_KERNEL_SRC_BLK_MQ_RQ_HCTX
])
@ -290,5 +316,6 @@ AC_DEFUN([ZFS_AC_KERNEL_BLK_QUEUE], [
ZFS_AC_KERNEL_BLK_QUEUE_SECURE_ERASE
ZFS_AC_KERNEL_BLK_QUEUE_MAX_HW_SECTORS
ZFS_AC_KERNEL_BLK_QUEUE_MAX_SEGMENTS
ZFS_AC_KERNEL_BLK_QUEUE_ROT
ZFS_AC_KERNEL_BLK_MQ_RQ_HCTX
])
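Because `blk_queue_rot()` has the opposite sense of `blk_queue_nonrot()`, any code keyed on `HAVE_BLK_QUEUE_ROT` has to negate one of the two results. The userspace mock below only illustrates that inversion. `struct mock_queue`, the mocked predicates and `compat_queue_is_rotational` are all hypothetical stand-ins, and how the module actually consumes `HAVE_BLK_QUEUE_ROT` is not shown in this diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mock stand-in for the kernel's struct request_queue (illustration only). */
struct mock_queue {
	bool rotational;
};

#ifdef HAVE_BLK_QUEUE_ROT
/* Newer-style predicate: true when the device is rotational. */
static bool
blk_queue_rot(const struct mock_queue *q)
{
	assert(q != NULL);
	return (q->rotational);
}
#define	compat_queue_is_rotational(q)	blk_queue_rot(q)
#else
/* Older-style predicate: true when the device is NOT rotational. */
static bool
blk_queue_nonrot(const struct mock_queue *q)
{
	assert(q != NULL);
	return (!q->rotational);
}
#define	compat_queue_is_rotational(q)	(!blk_queue_nonrot(q))
#endif
```

Either way the wrapper answers the same question, so callers would not need to know which predicate the configure check found.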

View File

@ -1,3 +1,4 @@
dnl # SPDX-License-Identifier: CDDL-1.0
dnl #
dnl # 2.6.38 API change,
dnl # Added blkdev_get_by_path()

View File

@ -1,3 +1,4 @@
dnl # SPDX-License-Identifier: CDDL-1.0
dnl #
dnl # 2.6.38 API change
dnl #

View File

@ -1,3 +1,4 @@
dnl # SPDX-License-Identifier: CDDL-1.0
dnl #
dnl # 2.6.33 API change
dnl # Added eops->commit_metadata() callback to allow the underlying

View File

@ -1,3 +1,4 @@
dnl # SPDX-License-Identifier: CDDL-1.0
dnl #
dnl # Certain kernel build options are not supported. These must be
dnl # detected at configure time and cause a build failure. Otherwise

View File

@ -1,3 +1,4 @@
dnl # SPDX-License-Identifier: CDDL-1.0
dnl #
dnl # On certain architectures `__copy_from_user_inatomic`
dnl # is a GPL exported variable and cannot be used by OpenZFS.

View File

@ -1,3 +1,4 @@
dnl # SPDX-License-Identifier: CDDL-1.0
dnl #
dnl # cpu_has_feature() may reference GPL-only cpu_feature_keys on powerpc
dnl #

View File

@ -1,3 +1,4 @@
dnl # SPDX-License-Identifier: CDDL-1.0
dnl #
dnl # Ensure the DECLARE_EVENT_CLASS macro is available to non-GPL modules.
dnl #

View File

@ -1,3 +1,4 @@
dnl # SPDX-License-Identifier: CDDL-1.0
dnl #
dnl # 2.6.28 API change
dnl # Added d_obtain_alias() helper function.

View File

@ -1,3 +1,4 @@
dnl # SPDX-License-Identifier: CDDL-1.0
dnl #
dnl # 2.6.33 API change
dnl # Discard granularity and alignment restrictions may now be set.

View File

@ -1,3 +1,4 @@
dnl # SPDX-License-Identifier: CDDL-1.0
dnl #
dnl # 6.18 API change
dnl # - generic_drop_inode() renamed to inode_generic_drop()

View File

@ -1,3 +1,4 @@
dnl # SPDX-License-Identifier: CDDL-1.0
dnl #
dnl # 6.12 removed f_version from struct file
dnl #

config/kernel-filelock.m4 Normal file
View File

@ -0,0 +1,23 @@
dnl # SPDX-License-Identifier: CDDL-1.0
dnl #
dnl # 6.3 API change
dnl # locking support functions (eg generic_setlease) were moved out of
dnl # linux/fs.h to linux/filelock.h
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_FILELOCK_HEADER], [
ZFS_LINUX_TEST_SRC([filelock_header], [
#include <linux/fs.h>
#include <linux/filelock.h>
], [])
])
AC_DEFUN([ZFS_AC_KERNEL_FILELOCK_HEADER], [
AC_MSG_CHECKING([for standalone filelock header])
ZFS_LINUX_TEST_RESULT([filelock_header], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_FILELOCK_HEADER, 1, [linux/filelock.h exists])
], [
AC_MSG_RESULT(no)
])
])

View File

@ -1,3 +1,4 @@
dnl # SPDX-License-Identifier: CDDL-1.0
AC_DEFUN([ZFS_AC_KERNEL_SRC_COPY_SPLICE_READ], [
dnl #
dnl # Kernel 6.5 - generic_file_splice_read was removed in favor

View File

@ -1,3 +1,4 @@
dnl # SPDX-License-Identifier: CDDL-1.0
dnl #
dnl # Starting from Linux 5.13, flush_dcache_page() becomes an inline
dnl # function and may indirectly reference GPL-only symbols:

View File

@ -1,3 +1,4 @@
dnl # SPDX-License-Identifier: CDDL-1.0
dnl #
dnl # 2.6.28 API change,
dnl # check if fmode_t typedef is defined

View File

@ -1,3 +1,4 @@
dnl # SPDX-License-Identifier: CDDL-1.0
dnl #
dnl # 2.6.38 API change
dnl # follow_down() renamed follow_down_one(). The original follow_down()

View File

@ -1,3 +1,4 @@
dnl # SPDX-License-Identifier: CDDL-1.0
dnl #
dnl # Handle differences in kernel FPU code.
dnl #

View File

@ -1,3 +1,4 @@
dnl # SPDX-License-Identifier: CDDL-1.0
dnl #
dnl # Linux 5.2 API change
dnl #

View File

@ -0,0 +1,33 @@
dnl # SPDX-License-Identifier: CDDL-1.0
dnl #
dnl # 2.6.38 API change
dnl # The .get_sb callback has been replaced by a .mount callback
dnl # in the file_system_type structure.
dnl #
dnl # 5.2 API change
dnl # The new fs_context-based filesystem API is introduced, with the old
dnl # one (via file_system_type.mount) preserved as a compatibility shim.
dnl #
dnl # 7.0 API change
dnl # Compatibility shim removed, so all callers must go through the mount API.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_FS_CONTEXT], [
ZFS_LINUX_TEST_SRC([fs_context], [
#include <linux/fs.h>
#include <linux/fs_context.h>
],[
static struct fs_context fs __attribute__ ((unused)) = { 0 };
static struct fs_context *fsp __attribute__ ((unused));
fsp = vfs_dup_fs_context(&fs);
])
])
AC_DEFUN([ZFS_AC_KERNEL_FS_CONTEXT], [
AC_MSG_CHECKING([whether fs_context exists])
ZFS_LINUX_TEST_RESULT([fs_context], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_FS_CONTEXT, 1, [fs_context exists])
],[
AC_MSG_RESULT(no)
])
])

View File

@ -1,30 +0,0 @@
dnl #
dnl # 2.6.38 API change
dnl # The .get_sb callback has been replaced by a .mount callback
dnl # in the file_system_type structure.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_FST_MOUNT], [
ZFS_LINUX_TEST_SRC([file_system_type_mount], [
#include <linux/fs.h>
static struct dentry *
mount(struct file_system_type *fs_type, int flags,
const char *osname, void *data) {
struct dentry *d = NULL;
return (d);
}
static struct file_system_type fst __attribute__ ((unused)) = {
.mount = mount,
};
],[])
])
AC_DEFUN([ZFS_AC_KERNEL_FST_MOUNT], [
AC_MSG_CHECKING([whether fst->mount() exists])
ZFS_LINUX_TEST_RESULT([file_system_type_mount], [
AC_MSG_RESULT(yes)
],[
ZFS_LINUX_TEST_ERROR([fst->mount()])
])
])

View File

@ -1,3 +1,4 @@
dnl # SPDX-License-Identifier: CDDL-1.0
dnl #
dnl # 6.6 API change,
dnl # fsync_bdev was removed in favor of sync_blockdev

View File

@ -1,3 +1,4 @@
dnl # SPDX-License-Identifier: CDDL-1.0
dnl #
dnl # 5.3 API change
dnl # The generic_fadvise() function is present since 4.19 kernel

View File

@ -1,3 +1,4 @@
dnl # SPDX-License-Identifier: CDDL-1.0
dnl #
dnl # 5.12 API
dnl #

View File

@ -1,3 +1,4 @@
dnl # SPDX-License-Identifier: CDDL-1.0
dnl #
dnl # Check for generic io accounting interface.
dnl #

View File

@ -1,3 +1,4 @@
dnl # SPDX-License-Identifier: CDDL-1.0
dnl #
dnl # 5.17 API change,
dnl #

View File

@ -1,3 +1,4 @@
dnl # SPDX-License-Identifier: CDDL-1.0
dnl #
dnl # 2.6.x API change
dnl #

View File

@ -1,3 +1,4 @@
dnl # SPDX-License-Identifier: CDDL-1.0
dnl #
dnl # 6.0 API change
dnl # struct iattr has two unions for the uid and gid

View File

@ -1,3 +1,4 @@
dnl # SPDX-License-Identifier: CDDL-1.0
dnl #
dnl # 5.12 API
dnl #

View File

@ -1,3 +1,4 @@
dnl # SPDX-License-Identifier: CDDL-1.0
AC_DEFUN([ZFS_AC_KERNEL_SRC_CREATE], [
dnl #
dnl # 6.3 API change

View File

@ -1,3 +1,4 @@
dnl # SPDX-License-Identifier: CDDL-1.0
AC_DEFUN([ZFS_AC_KERNEL_SRC_INODE_GETATTR], [
dnl #
dnl # Linux 6.3 API

View File

@ -1,3 +1,4 @@
dnl # SPDX-License-Identifier: CDDL-1.0
dnl #
dnl # 3.6 API change
dnl #

View File

@ -1,3 +1,4 @@
dnl # SPDX-License-Identifier: CDDL-1.0
AC_DEFUN([ZFS_AC_KERNEL_SRC_PERMISSION], [
dnl #
dnl # 6.3 API change

View File

@ -1,3 +1,4 @@
dnl # SPDX-License-Identifier: CDDL-1.0
AC_DEFUN([ZFS_AC_KERNEL_SRC_INODE_SETATTR], [
dnl #
dnl # Linux 6.3 API

View File

@ -1,3 +1,4 @@
dnl # SPDX-License-Identifier: CDDL-1.0
dnl #
dnl # 6.19 API change. inode->i_state no longer accessible directly; helper
dnl # functions exist.

View File

@ -1,3 +1,4 @@
dnl # SPDX-License-Identifier: CDDL-1.0
AC_DEFUN([ZFS_AC_KERNEL_SRC_INODE_TIMES], [
dnl #

View File

@ -1,3 +1,4 @@
dnl # SPDX-License-Identifier: CDDL-1.0
dnl #
dnl # 2.6.28 API change
dnl # Added insert_inode_locked() helper function.

View File

@ -1,3 +1,4 @@
dnl # SPDX-License-Identifier: CDDL-1.0
dnl #
dnl # 2.6.39 API change,
dnl # The is_owner_or_cap() macro was renamed to inode_owner_or_capable(),

View File

@ -1,3 +1,4 @@
dnl # SPDX-License-Identifier: CDDL-1.0
dnl #
dnl # 6.18: some architectures and config options cause the kasan_ inline
dnl # functions to reference the GPL-only symbol 'kasan_flag_enabled',

Some files were not shown because too many files have changed in this diff