Commit Graph

10592 Commits

Author SHA1 Message Date
Sean Eric Fagan
f109c7bb98
Add the --file-layout (-f) option to zdb(8)
Displays the physical raidz block layout for a given file.
This leverages the internal vdev_raidz_map_alloc() function to find
the map of how the block data is laid out across the child disks.

The column entry for each row looks like:
+------------+
|  D2     43 |
|     6020da |
+------------+
representing here the logical data column 2 that is 43 sectors high
starting at sector 0x6020da.

With -H, the output is a list of disks, LBAs, and block counts,
given in 512 byte block values.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: sean.fagan@klarasystems.com
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@klarasystems.com>
Co-authored-by: Sean Fagan <sean.fagan@klarasystems.com>
Closes #18264
2026-03-12 14:41:23 -07:00
Rob Norris
2b930f63f8
config: fix STATX_MNT_ID detection
statx(2) requires _GNU_SOURCE to be defined in order for sys/stat.h to
produce a definition for struct statx and the STATX_* defines. We get
that at compile time because we pass -D_GNU_SOURCE through to
everything, but in the configure check we aren't setting _GNU_SOURCE, so
we don't find STATX_MNT_ID, and so don't set HAVE_STATX_MNT_ID.

(This was fine before ccf5a8a6fc, because linux/stat.h does not require
_GNU_SOURCE).

Simple fix: in the check, define _GNU_SOURCE before including
sys/stat.h.

Sponsored-by: TrueNAS
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18312
2026-03-12 09:58:54 -07:00
Rob Norris
cff853cec5
contrib/debian: add zilstat.1 manpage to installation list
Missed in 65165df129, which caused Debian packaging to fail.

Sponsored-by: TrueNAS
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18313
2026-03-12 09:17:32 -07:00
Christos Longros
d35951b18d
zpool clear: remove undocumented rewind flags
Remove the -F, -n, and -X flags from zpool clear.  These flags were
inherited from OpenSolaris but are not applicable in this context.
Unlike zpool import, where the pool is not yet loaded and a specific
TXG can be selected, zpool clear operates on an already imported pool
whose in-memory state is ahead of what is on disk.  Rewinding
transactions would require force-exporting the pool first.

The rewind policy passed to zpool_clear() is now always
ZPOOL_NO_REWIND.

Tested on FreeBSD 16.0-CURRENT (amd64).  Verified that -F, -n, and
-X are properly rejected as invalid options and that the usage output
reflects the change.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christos Longros <chris.longros@gmail.com>
Closes #13825
Closes #18300
2026-03-11 15:15:45 -07:00
Christos Longros
65165df129
zilstat: add man page
The zilstat command has no man page. Add zilstat.1 documenting all
options and field definitions based on the source in cmd/zilstat.in.

Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christos Longros <chris.longros@gmail.com>
Closes #18303
2026-03-11 15:11:41 -07:00
Andriy Tkachuk
b403040c4c
draid: fix data corruption after disk clear
Currently, when there there are several faulted disks with attached
dRAID spares, and one of those disks is cleared from errors (zpool
clear), followed by its spare being detached, the data in all the
remaining spares that were attached while the cleared disk was in
FAULTED state might get corrupted (which can be seen by running scrub).
In some cases, when too many disks get cleared at a time, this can
result in data corruption/loss.

dRAID spare is a virtual device whose blocks are distributed among
other disks. Those disks can be also in FAULTED state with attached
spares on their own. When a disk gets sequentially resilvered (rebuilt),
the changes made by that resilvering won't get captured in the DTL
(Dirty Time Log) of other FAULTED disks with the attached spares to
which the data is written during the resilvering (as it would normally
be done for the changes made by the user if a new file is written or
some existing one is deleted). It is because sequential resilvering
works on the block level, without touching or looking into metadata,
so it doesn't know anything about the old BPs or transactions groups
that it is resilvering. So later on, when that disk gets cleared
from errors and healing resilvering is trying to sync all the data
from its spare onto it, all the changes made on its spare during the
resilvering of other disks will be missed because they won't be
captured in its DTL. That's why other dRAID spares may get corrupted.

Here's another way to explain it that might be helpful. Imagine a
scenario:

1. d1 fails and gets resilvered to some spare s1 - OK.
2. d2 fails and gets sequentially resilvered on draid spare s2. Now,
   in some slices, s2 would map to d1, which is failed. But d1 has s1
   spare attached, so the data from that resilvering goes to s1, but
   not recorded in d1's DTL.
3. Now, d1 gets cleared and its s1 gets detached. All the changes
   done by the user (writes or deletions) have their txgs captured
   in d1's DTL, so they will be resilvered by the healing resilver
   from its spare (s1) - that part works fine. But the data which
   was written during resilvering of d2 and went to s1 - that one
   will be missed from d1's DTL and won't get resilvered to it. So
   here we are:
4. s2 under d2 is corrupted in the slices which map to d1, because
   d1 doesn't have that data resilvered from s1.

Now, if there are more failed disks with draid spares attached which
were sequentially resilvered while d1 was failed, d3+s3, d4+s4 and
so on - all their spares will be corrupted. Because, in some slices,
each of them will map to d1 which will miss their data.

Solution: add all known txgs starting from TXG_INITIAL to DTLs of
non-writable devices during sequential resilvering so when healing
resilver starts on disk clear, it would be able to check and heal
blocks from all txgs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Akash B <akash-b@hpe.com>
Signed-off-by: Andriy Tkachuk <andriy.tkachuk@seagate.com>
Closes #18286
Closes #18294
2026-03-11 14:54:20 -07:00
Rob Norris
62fa8bcb3c abi: updates for mnttab cleanup
Sponsored-by: TrueNAS
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18296
2026-03-10 13:07:07 -07:00
Rob Norris
a59e712d25 libspl/mnttab: remove struct extmnttab
The two additional fields are never used by calling code, and we can
replace their sole internal use with an extra stack param.

Sponsored-by: TrueNAS
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18296
2026-03-10 13:07:07 -07:00
Rob Norris
f64f12079c libspl/mnttab: remove getmntany()
Only used for when the mount cache was disabled, but since its always
enabled now, we don't need it.

Sponsored-by: TrueNAS
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18296
2026-03-10 13:07:07 -07:00
Rob Norris
143f410e99 libspl/mnttab: make mnttab source filenames consistent
FreeBSD's getextmntent.c is only separate because it has a different
license to mnttab.c, otherwise it would go there too.

Sponsored-by: TrueNAS
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18296
2026-03-10 13:07:07 -07:00
Rob Norris
c0ea89db9f libzfs/mnttab: shorten names, reorg a bit
We can't change the public interface, but internally we don't need so
much redundant naming.

Sponsored-by: TrueNAS
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18296
2026-03-10 13:07:07 -07:00
Rob Norris
f43cb1fef6 libzfs/mnttab: lift node alloc/free
Sponsored-by: TrueNAS
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18296
2026-03-10 13:07:07 -07:00
Rob Norris
0ecf5e3f62 libzfs/mnttab: always enable the cache
There's no real reason not to enable it always; the `zfs` command always
enables it anyway, and right now there's multiple places that do mount
work that don't go through the cache anyway. Having it always be on lets
us remove a bunch of the fallback code.

Sponsored-by: TrueNAS
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18296
2026-03-10 13:07:07 -07:00
Rob Norris
b5637fba1c libzfs/mnttab: use SPL mutexes
More consistent, less typing, and we can check ownership.

Sponsored-by: TrueNAS
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18296
2026-03-10 13:07:07 -07:00
Rob Norris
02224bca40 libzfs/mnttab: lift mnttab cache into separate file
Sponsored-by: TrueNAS
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18296
2026-03-10 13:07:07 -07:00
Alek P
ae7fcd5f92
fix libzfs diff mem leak in an error path
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Alek Pinchuk <apinchuk@axcient.com>
Closes #18301
2026-03-10 12:39:49 -07:00
Ameer Hamza
5b93d1a218 L2ARC: Fix prev_hdr use-after-free in l2arc_write_sublist
prev_hdr is dereferenced after the sublist lock is dropped for write I/O
but nothing prevents it from being freed during that window. Eliminate
prev_hdr entirely and simplify persistent marker repositioning logic.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #18289
2026-03-10 11:00:23 -07:00
Ameer Hamza
be5d36919a man: Update L2ARC documentation for depth cap and write budget fairness
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #18289
2026-03-10 11:00:18 -07:00
Ameer Hamza
b27a87f399 L2ARC: Write budget fairness for metadata monopolization
Under heavy metadata load, metadata passes can monopolize the write
budget every cycle while data passes get nothing written. Track
consecutive monopolized cycles per device in l2ad_meta_cycles. After
l2arc_meta_cycles (default 2) consecutive cycles where metadata fills
the write budget, skip metadata for one cycle to let data run.  Reset
the counter when nothing is written.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #18289
2026-03-10 11:00:14 -07:00
Ameer Hamza
62ca8f721b L2ARC: Scan-based depth cap for persistent markers
With persistent markers and inclusive scanning, the marker traverses the
entire ARC state across many feed cycles, writing buffers far from the
tail that may no longer be relevant.

Track cumulative bytes scanned per pass in l2arc_ext_scanned. When scans
reach l2arc_ext_headroom_pct (default 25%) of the ARC state size, reset
the pass markers to the tail via lazy reset flags. This keeps markers
focused on the tail zone where buffers soon to be evicted have the most
value for L2ARC.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #18289
2026-03-10 11:00:08 -07:00
Ameer Hamza
15fc3d64c8 L2ARC: Lazy sublist reset flags for persistent markers
Replace direct marker-to-tail manipulation with per-sublist boolean
flags consumed lazily by feed threads.  Each scanning thread resets its
own marker when it sees the flag, rather than having another thread
manipulate the marker directly.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #18289
2026-03-10 11:00:01 -07:00
Ameer Hamza
22fdaf0b1f L2ARC: Even sublist headroom distribution with round-robin selection
The dynamic headroom redistribution formula gave later sublists
progressively larger scanning budgets, and random sublist selection
caused uneven coverage across sublists. For depth cap to work
effectively, each sublist should be equally and fairly treated.
Use equal per-sublist headroom (headroom / num_sublists) for even
distribution and deterministic round-robin selection for fair
coverage across cycles.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #18289
2026-03-10 10:59:41 -07:00
Rob Norris
0b0971f82f README: describe specific kernels/distros we target
Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18295
2026-03-10 09:55:18 -07:00
Rob Norris
97e080c496 config: remove minimum kernel version check
The autoconf checks are more than enough to decide whether or not we can
work with this kernel or not.

Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18295
2026-03-10 09:55:01 -07:00
Ivan Shapovalov
8531621aba zfs_main: create, clone, rename: accept -pp for non-mountable parents
Teach `zfs {create,clone,rename}` to accept a doubled `-p` flag (`-pp`)
to create non-existing ancestor datasets with `canmount=off`.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Ivan Shapovalov <intelfx@intelfx.name>
Closes #17000
2026-03-09 14:50:18 -07:00
Ivan Shapovalov
2f3f1ab1ba libzfs: teach zfs_create_ancestors() to accept properties
This will be used to support creating non-mountable ancestors in zfs(8).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Ivan Shapovalov <intelfx@intelfx.name>
Closes #17000
2026-03-09 14:49:52 -07:00
Ameer Hamza
1eace59060
libzfs: use mount_setattr for selective remount including legacy mounts
When a namespace property is changed via zfs set, libzfs remounts the
filesystem to propagate the new VFS mount flags. The current approach
uses mount(2) with MS_REMOUNT, which reads all namespace properties
from ZFS and applies them together. This has two problems:

1. Linux VFS resets unspecified per-mount flags on remount. If an
   administrator sets a temporary flag (e.g. mount -o remount,noatime),
   a subsequent zfs set on any namespace property clobbers it.

2. Two concurrent zfs set operations on different namespace properties
   can overwrite each other's mount flags.

Additionally, legacy datasets (mountpoint=legacy) were never remounted
on namespace property changes since zfs_is_mountable() returns false
for them.

Add zfs_mount_setattr() which uses mount_setattr(2) to selectively
update only the mount flags that correspond to the changed property.
For legacy datasets, /proc/mounts is iterated to update all
mountpoints. On kernels without mount_setattr (ENOSYS), non-legacy
datasets fall back to a full remount; legacy mounts are skipped to
avoid clobbering temporary flags.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #18257
2026-03-09 11:06:22 -07:00
Alexander Ziaee
d45c8d6489
FreeBSD: Improve dmesg kernel message prefix
Provide intuitive log search keywords and increased system consistency.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by:	Alexander Ziaee <ziaee@FreeBSD.org>
Closes #18290
2026-03-09 10:17:23 -07:00
Christos Longros
304de7f19b
libzfs: handle EDOM error in zpool_create
When creating a pool with devices that have incompatible block sizes,
the kernel returns EDOM. However, zpool_create() did not handle this
errno, falling through to zpool_standard_error() which produced a
confusing message about invalid property values.

Add a case EDOM handler in zpool_create() to return EZFS_BADDEV with
a descriptive auxiliary message, consistent with the existing EDOM
handler in zpool_vdev_add().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christos Longros <chris.longros@gmail.com>
Closes #18268
2026-03-08 12:59:10 -07:00
Andrew Walker
c5905b2cb7
Implement lzc_send_progress
This commit adds an implementation of lzc_send_progress, which
existed in the libzfs_core header, but not in ABI and lacked
an actual implementation. The libzfs_send_progress function
is altered so that it wraps around the lzc operation. This
fills a functional gap in libzfs core.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Andrew Walker <andrew.walker@truenas.com>
Closes #18288
2026-03-06 11:05:58 -08:00
Juhyung Park
c58b8b7dc2
Fix check for .cfi_negate_ra_state on aarch64
Checking for LD_VERSION in unreliable as not all distros define it on
the compiler's preprocessor.

Explicitly check it via autoconf.

This fixes support for Ubuntu 18.04 on arm64.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Juhyung Park <qkrwngud825@gmail.com>
Closes #18262
2026-03-06 11:04:37 -08:00
Rob Norris
e73ada771d
libzpool: lift zfs_file ops out to separate source file
So its easier to remove and replace on non-Unix platforms.

Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18281
2026-03-05 18:07:46 -08:00
Garth Snyder
d979457760
zstream: consolidate shared code
zstream currently contains three identical copies of dump_record(),
which appear to all be drawn from libzfs_sendrecv.c. The original
is marked internal.

This PR adds zstream_util.[hc] and puts the shared code there along with
a couple of other items in common.

No functional changes.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Garth Snyder <garth@garthsnyder.com>
Closes #18284
2026-03-05 15:33:03 -08:00
Idefix2020
5dad9459d5
Add --no-preserve-encryption flag
* Add an option to send datasets with params or replicate
without preserving encryption
* Add a test case for the new functionality

Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Chris Jacobs <idefix2020dev@gmail.com>
Closes #18240
2026-03-05 15:08:17 -08:00
Rob Norris
c329530e6b Add simd_config.h and HAVE_SIMD() selector
We need to select which SIMD variable to check based on the compilation
target: HAVE_KERNEL_xxx for the Linux kernel, HAVE_TOOLCHAIN_xxx for
other platforms.

This adds a HAVE_SIMD() macro returns the right result depending on the
definedness or value of the variable for this target.

The macro is in simd_config.h, which is forcibly included in every
compiler call (like zfs_config.h), to ensure that it can be used
directly without further includes.

Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18285
2026-03-05 15:01:42 -08:00
Rob Norris
35f74f84e6 Convert all HAVE_<name> SIMD gates to HAVE_SIMD(<name>)
The original names no longer exist, and the new ones will need to be
selectable based on the current compilation target.

Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18285
2026-03-05 15:01:37 -08:00
Rob Norris
92a6ab405f config: also do SIMD checks on the kernel toolchain
The kernel may be built with a different compiler, and also includes
objtool, which may fail on unknwon instructions sequences. So, we want
to run the checks a second time for that toolchain too.

Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18285
2026-03-05 15:01:32 -08:00
Rob Norris
c183268019 config: generate SIMD checks from table
No need to repeat all that boilerplate each time!

Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18285
2026-03-05 15:01:26 -08:00
Rob Norris
23bd583830 config: remove checks for unused SIMD gates
Specifically, we don't have any code gated on:

    HAVE_SSE
    HAVE_SSE3
    HAVE_SSE4_2
    HAVE_AVX512CD
    HAVE_AVX512DQ
    HAVE_AVX512IFMA
    HAVE_AVX512VBMI
    HAVE_AVX512PF
    HAVE_AVX512ER

So we can remove them and the checks that probe and generate them.

Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18285
2026-03-05 15:01:20 -08:00
Rob Norris
e4b8d6a56f linux/simd_x86: remove obsolete kernel feature gates
Most of the X86_FEATURE_* defines we use were introduced in kernels much
older than those we support, so there's no need to check for them.

For the history, these are the ones being removed, and the kernel
versions/commits where they were introduced:

    <4.6  torvalds/linux@cd4d09ec6f (refactor/consolidation commit)
        OSXSAVE
        BMI1
        BMI2
        AES
        PCLMULQDQ
        MOVBE
        SHA_NI
        AVX512F
        AVX512CD
        AVX512ER
        AVX512PF

    4.6   torvalds/linux@d050049442
        AVX512BW
        AVX512DQ
        AVX512VL

    4.10  torvalds/linux@a8d9df5a50
        AVX512IFMA
        AVX512VBMI

    4.15  torvalds/linux@c128dbfa0f
        VAES
        VPCLMULQDQ

Sponsored-by: TrueNAS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18285
2026-03-05 15:00:43 -08:00
Alexander Motin
1e1d64d665
Fix log vdev removal issues
When we clear the log, we should clear all the fields, not only
zh_log.  Otherwise remaining ZIL_REPLAY_NEEDED will prevent the
vdev removal.  Handle it also from the other side, when zh_log
is already cleared, while zh_flags is not.

spa_vdev_remove_log() asserts that allocated space on removed log
device is zero.  While it should be so in perfect world, it might
be not if space leaked at any point.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18277
2026-03-04 09:12:14 -05:00
Brian Behlendorf
f6205fdf64
ZTS: Adjust mmp_on_uberblocks threshold
Decrease the number of required uberblock blocks write slightly due
to observed variation when running in the CI.  This should help
avoid future false positives.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #18280
2026-03-03 13:11:51 -08:00
Brian Behlendorf
75659a4e50
ZTS: Add additional exceptions
The following tests have been observed to occasionally fail when
running under the CI.  Updated our exceptions list to track them.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #18274
2026-03-03 11:18:46 -08:00
Rob Norris
1e2c94a043
More consistent use of TREE_* macros in AVL comparators
Where is it appropriate and obvious, use TREE_CMP(), TREE_ISIGN() and
TREE_PCMP() instead or direct comparisons. It can make the code a lot
smaller, less error prone, and easier to read.

Sponsored-by: TrueNAS
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18259
2026-03-03 09:08:23 -08:00
Brian Behlendorf
0f90a797dd
Fix vdev_rebuild_range() tx commit
The spa_sync thread waits on ->spa_txg_zio and will set ZIO_WAIT_DONE
before running the sync tasks.  The dmu_tx_commit() call must be done
after we add the child zio to the ->spa_txg_zio parent otherwise its
possible the child is added after txg_sync has waited.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #18276
2026-03-03 09:05:34 -08:00
Ryan Moeller
ac0fd40c8c Add zpool properties for allocation class space
The existing zpool properties accounting pool space (size, allocated,
fragmentation, expandsize, free, capacity) are based on the normal
metaslab class or are cumulative properties of several classes combined.

Add properties reporting the space accounting metrics for each metaslab
class individually.

Also introduce pool-wide AVAIL, USABLE, and USED properties reporting
values corresponding to FREE, SIZE, and ALLOC deflated for raidz.

Update ZTS to recognize the new properties and validate reported values.

While in zpool_get_parsable.cfg, add "fragmentation" to the list of
parsable properties.

Sponsored-by: Klara, Inc.
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Ryan Moeller <ryan.moeller@klarasystems.com>
Cloes #18238
2026-03-02 15:50:23 -08:00
Ryan Moeller
6ba3f915d0 zcommon: Fix description of vdev capacity format
Capacity is reported as a percentage not a size.

Sponsored-by: Klara, Inc.
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Ryan Moeller <ryan.moeller@klarasystems.com>
Closes #18238
2026-03-02 15:49:23 -08:00
Akash B
f8e5af53e9
Fix redundant declaration of dsl_pool_t
Remove redundant dsl_pool variable and duplicate spa_get_dsl()
call in vdev_rebuild_thread.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Akash B <akash-b@hpe.com>
Closes #18263
2026-02-27 10:39:52 -08:00
Andriy Tkachuk
f8457fbdc4
Fix deadlock on dmu_tx_assign() from vdev_rebuild()
vdev_rebuild() is always called with spa_config_lock held in
RW_WRITER mode. However, when it tries to call dmu_tx_assign()
the latter may hang on dmu_tx_wait() waiting for available txg.
But that available txg may not happen because txg_sync takes
spa_config_lock in order to process the current txg. So we have
a deadlock case here:

 - dmu_tx_assign() waits for txg holding spa_config_lock;
 - txg_sync waits for spa_config_lock not progressing with txg.

Here are the stacks:

    __schedule+0x24e/0x590
    schedule+0x69/0x110
    cv_wait_common+0xf8/0x130 [spl]
    __cv_wait+0x15/0x20 [spl]
    dmu_tx_wait+0x8e/0x1e0 [zfs]
    dmu_tx_assign+0x49/0x80 [zfs]
    vdev_rebuild_initiate+0x39/0xc0 [zfs]
    vdev_rebuild+0x84/0x90 [zfs]
    spa_vdev_attach+0x305/0x680 [zfs]
    zfs_ioc_vdev_attach+0xc7/0xe0 [zfs]

    cv_wait_common+0xf8/0x130 [spl]
    __cv_wait+0x15/0x20 [spl]
    spa_config_enter+0xf9/0x120 [zfs]
    spa_sync+0x6d/0x5b0 [zfs]
    txg_sync_thread+0x266/0x2f0 [zfs]

The solution is to pass txg returned by spa_vdev_enter(spa)
at the top of spa_vdev_attach() to vdev_rebuild() and call
dmu_tx_create_assigned(txg) which doesn't wait for txg.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Akash B <akash-b@hpe.com>
Reviewed-by: Alek Pinchuk <apinchuk@axcient.com>
Signed-off-by: Andriy Tkachuk <andriy.tkachuk@seagate.com>
Closes #18210
Closes #18258
2026-02-26 11:18:02 -08:00
Rob Norris
f3d4c79496
zpl_super: prefer "new" mount API when available
This API has been available since kernel 5.2, and having it available
(almost) everywhere should give us a lot more flexibility for mount
management in the future.

Sponsored-by: TrueNAS
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@truenas.com>
Closes #18260
2026-02-25 13:17:33 -08:00